Guide for AI agents working in the Death by Numbers codebase.
Death by Numbers is a digital scholarly research project examining the London Bills of Mortality (1636-1754). The repository contains three main components:
- bom-data/: Historical datasets, shapefiles, and GeoJSON files
- bom-processing/: Data processing pipeline (Python) and database tools (Go)
- bom-website/: Hugo-based static site with D3.js visualizations
The project transforms historical CSV data into a PostgreSQL database and serves it via a REST API to power interactive visualizations on the public website.
bom/
├── bom-data/                 # Historical datasets and geographic data
│   ├── data-csvs/            # Raw CSV exports from DataScribe
│   ├── geoJSON-files/        # Parish boundaries by time period
│   ├── parish-shapefiles/    # Source shapefiles for GIS data
│   ├── parish-networks/      # Network data for plague spread
│   └── deathdictionary.csv
├── bom-processing/           # Data processing pipeline
│   ├── db/                   # PostgreSQL migrations and Go ETL updater
│   │   ├── migrations/       # golang-migrate SQL files
│   │   └── updater/          # Go program for database imports
│   ├── scripts/
│   │   ├── bompy/            # Python data processing package
│   │   └── bomr/             # Legacy R scripts (no longer used)
│   └── api-docs/             # API documentation
└── bom-website/              # Hugo static site
    ├── content/              # Markdown content (blog, analysis, context)
    ├── themes/dbn/           # Custom TailwindCSS theme
    ├── assets/               # JavaScript and visualizations
    │   ├── js/               # Services and Alpine.js components
    │   └── visualizations/   # D3.js/Observable Plot charts
    └── static/               # Static assets (images)
# Preview site locally with drafts and live reload
make preview
# Build for development (dev.deathbynumbers.org)
make build
# Build for production with minification
make build-prod
# Compile TailwindCSS after layout changes
make tailwind
# OR from themes/dbn/:
npm run build-tw
Theme Development:
cd themes/dbn
npm install -y # Install TailwindCSS dependencies
npm run build-tw # Build TailwindCSS
# Setup and install dependencies (uses uv/Poetry)
make setup # or: make install
# Process all CSV data files
make process-all # Runs Python pipeline
# View processing results
make show-stats # Show record counts and file sizes
make show-logs # List recent log files
make tail-logs # Follow latest log file
# Data management
make clean-data # Remove output CSVs
make clean-logs # Remove log files
make copy-data # Copy raw data from bom-data/ to bompy/data-raw/
# Code quality
make format # Format with black and isort
make lint # Check formatting
make typecheck # Run mypy type checking
make test # Run pytest tests
Manual processing commands:
uv run process_all_data.py # Main processing script
uv run tests/test_bills_processor.py # Test bills processor
uv run tests/test_schema_alignment.py # Test schema alignment
Prerequisites:
- golang-migrate: brew install golang-migrate (macOS)
- PostgreSQL client tools
- Go 1.20+
- Create .env file with DB_CONN_STR and DATA_DIR
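Since the database targets depend on a .env file containing DB_CONN_STR and DATA_DIR, a small preflight check can catch a missing key before a migration is attempted. The following is a minimal sketch, not part of the repository: the function names parse_env and missing_keys are hypothetical, and it assumes the file uses plain KEY=VALUE lines.

```python
from pathlib import Path

# The two keys named in the prerequisites above.
REQUIRED_KEYS = ("DB_CONN_STR", "DATA_DIR")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print(missing_keys(text) or "all required keys present")
```

Run from bom-processing/db, this would list any key that still needs to be set; it does not validate the connection string itself.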
# Run database migrations
make db-up # Apply all migrations
make db-down # Revert migrations
make db-version # Show current migration version
# Import data
make db-update # Run Go importer (imports processed CSVs)
make dry-run # Preview import without changes
# Build Go importer
make build # Compiles to updater/bin/bom-importer
# Code quality
make fmt # Format Go code
make vet # Run go vet
# Convert shapefiles to GeoJSON
./create_geojson.sh [target_directory]
# Default: ../../bom-data/parish-shapefiles
# Import GeoJSON to PostGIS database
./insert_geojson.sh [source_directory] [database_name] [database_user]
# Default source: ../../bom-data/geoJSON-files
Important: After updating parishes_shp data, run this one-time SQL:
UPDATE bom.parishes_shp ps
SET parish_id = p.id
FROM bom.parishes p
WHERE ps.dbn_par = p.canonical_name;
- Hugo: v0.107.0 (static site generator)
- TailwindCSS: v3.2.4 (utility-first CSS)
- Alpine.js: v3.14.1 (reactive JavaScript framework)
- D3.js: v7.9.0 (data visualizations)
- Observable Plot: v0.6.16 (declarative charting)
- Leaflet: v1.9.4 (mapping)
- Pagefind: Search indexing (run via npx)
- Python: 3.10+ with uv/Poetry
- pandas 2.0
- pydantic 2.0
- loguru 0.7
- typer, rich (CLI tools)
- Go: 1.20+
- pgx/v4 (PostgreSQL driver)
- PostgreSQL with PostGIS extension
- golang-migrate for schema management
- Schema: bom with tables for bills, parishes, weeks, causes, christenings, parishes_shp
Organization:
- Package structure: src/bom/ with submodules (models, loaders, processors, extractors, utils)
- Dataclass-based models with type hints
- Configuration-driven: config.py for constants and patterns
Conventions:
- Naming: snake_case for variables, functions, files
- Type hints: All function signatures (Optional[int], List[str], Dict[str, Any])
- Logging: loguru for structured logging (logger.info(), logger.warning(), logger.error())
- Documentation: Google/NumPy style docstrings
Common patterns:
from dataclasses import dataclass
from typing import Optional, Dict, Any

import pandas as pd
from loguru import logger

@dataclass
class BillRecord:
    parish_id: int
    count: Optional[int]
    year: int

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary for pandas DataFrame."""
        return {
            'parish_id': self.parish_id,
            'count': self.count,
            'year': self.year
        }

# Pandas DataFrame processing (records is a list of BillRecord instances)
df = pd.DataFrame([record.to_dict() for record in records])
df = df.dropna(subset=['required_column'])
Key classes:
- CSVLoader: Load and validate CSV files
- SchemaValidator: Validate against database schema
- BillsProcessor, ChristeningsProcessor: Transform data
- ParishExtractor: Extract parish mappings
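To illustrate the kind of check a schema validator performs, here is a minimal standard-library sketch that compares a CSV header against a set of expected columns. It is not the project's SchemaValidator: the function missing_columns is hypothetical, and the expected column set is simply borrowed from the BillRecord example above as an assumption.

```python
import csv
import io

# Assumed column set, borrowed from the BillRecord example; the real schema has more columns.
EXPECTED_COLUMNS = {"parish_id", "count", "year"}


def missing_columns(csv_text: str, expected: set[str]) -> set[str]:
    """Return expected column names that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return expected - {name.strip() for name in header}
```

An empty result means every expected column is present; anything else names the columns a processor would need before the data can be aligned with the database.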
Code quality:
- Format: black (88 char line length)
- Imports: isort (black profile)
- Type checking: mypy (strict mode)
- Testing: pytest
Organization:
- ES6 modules with imports/exports
- Service layer: DataService, URLService, ChartService, CacheService
- Visualization base class: Visualization (extended by specific charts)
- Alpine.js components: Reactive state management
Conventions:
- Naming: camelCase for variables/functions, PascalCase for classes
- Async/await: For all data fetching
- Namespacing: Services on window object (e.g., window.dataService)
- Caching: Multi-level (browser cache, service cache, request deduplication)
Common patterns:
// Visualization class
// Visualization is the project's base class; import it from its module before use.
import * as d3 from 'd3';

export default class DeathsChart extends Visualization {
  constructor(id, data, dimensions) {
    super(id, data, dimensions, {top: 20, right: 30, bottom: 40, left: 50});
  }

  render() {
    const svg = d3.select(`#${this.id}`)
      .append('svg')
      .attr('width', this.width)
      .attr('height', this.height);
    // D3.js rendering code
  }
}

// Alpine.js component
Alpine.data('databaseExplorer', () => ({
  selectedParishes: [],
  loading: false,

  init() {
    this.fetchData();
  },

  async fetchData() {
    this.loading = true;
    try {
      const data = await window.dataService.getBills({
        parishes: this.selectedParishes
      });
      this.updateChart(data);
    } catch (error) {
      console.error('Failed to fetch data:', error);
    } finally {
      this.loading = false;
    }
  }
}));
Data fetching patterns:
- Promise.all for parallel requests
- AbortController for cancellation
- Caching to prevent duplicate API calls
- Service layer abstracts API communication
Observable Plot usage:
import * as Plot from '@observablehq/plot';

const chart = Plot.plot({
  marks: [
    Plot.line(data, {x: 'date', y: 'deaths', stroke: 'cause'}),
    Plot.ruleY([0])
  ],
  width: dimensions.width,
  height: dimensions.height
});
Organization:
- Single main.go file for ETL program
- Flag-based CLI configuration
- Context-driven operations
- Transaction-based imports with temporary tables
Conventions:
- Naming: PascalCase for exported, camelCase for unexported
- Error handling: Explicit returns with wrapping (fmt.Errorf("operation: %w", err))
- Cleanup: defer for resource cleanup
- Database: pgx/v4 for PostgreSQL
Common patterns:
package main

import (
	"context"
	"flag"
	"fmt"
	"os"

	"github.com/jackc/pgx/v4"
)

func main() {
	// main cannot return an error, so delegate to run and exit non-zero on failure.
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

func run() error {
	dbConn := flag.String("db", "", "Database connection string")
	dataDir := flag.String("data", "", "Data directory")
	dryRun := flag.Bool("dry-run", false, "Preview without changes")
	flag.Parse()

	ctx := context.Background()
	conn, err := pgx.Connect(ctx, *dbConn)
	if err != nil {
		return fmt.Errorf("failed to connect: %w", err)
	}
	defer conn.Close(ctx)

	if err := importData(ctx, conn, *dataDir, *dryRun); err != nil {
		return fmt.Errorf("import failed: %w", err)
	}
	return nil
}

func importData(ctx context.Context, conn *pgx.Conn, dataDir string, dryRun bool) error {
	tx, err := conn.Begin(ctx)
	if err != nil {
		return fmt.Errorf("begin transaction: %w", err)
	}
	defer tx.Rollback(ctx) // Safe to call even after commit

	// Create temp tables, import, validate

	if !dryRun {
		if err := tx.Commit(ctx); err != nil {
			return fmt.Errorf("commit: %w", err)
		}
	}
	return nil
}
Organization:
- layouts/_default/baseof.html: Base template with blocks
- layouts/partials/: Reusable components (header, footer, nav)
- layouts/shortcodes/: Content shortcodes (figures, citations, alerts)
- layouts/section/: Section-specific templates
Conventions:
- Go template syntax: {{ .Title }}, {{ range }}, {{ with }}
- TailwindCSS: Inline utility classes
- Alpine.js: For interactivity (x-data, x-show, @click)
- Hugo pipes: Asset processing with fingerprinting
Common patterns:
<!-- Template definition -->
{{ define "main" }}
<div class="container mx-auto px-4 py-8">
  <h1 class="text-3xl font-bold text-dbn-green">{{ .Title }}</h1>
  <div class="prose prose-lg max-w-none">
    {{ .Content }}
  </div>
</div>
{{ end }}

<!-- Asset pipeline -->
{{ $js := resources.Get .Params.script | js.Build | resources.Fingerprint }}
<script src="{{ $js.Permalink }}" defer></script>

<!-- Alpine.js interactivity -->
<div x-data="{ open: false }" class="relative">
  <button @click="open = !open" class="btn-primary">
    Toggle Menu
  </button>
  <div x-show="open" class="absolute z-10 mt-2">
    <!-- Menu content -->
  </div>
</div>

<!-- Shortcode usage in content -->
{{< figure src="/images/plague-chart.png"
   caption="Weekly plague deaths, 1665"
   alt="Line chart showing spike in deaths during Great Plague" >}}
Namespace: bom
Core Tables:
- bill_of_mortality: Individual bill records (1M+ rows)
  - Unique constraint: (parish_id, count_type, year, week_id, bill_type)
- parishes: Parish lookup with canonical names (156 parishes)
- weeks: Unique week records with historical dating (5,393 weeks)
- causes_of_death: Death cause definitions and records
- christenings: Birth/baptism records with gender data
- parishes_shp: PostGIS spatial data for 8+ time periods with parish_id foreign key
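Because bill_of_mortality enforces a unique constraint on (parish_id, count_type, year, week_id, bill_type), rows that collide on that key are worth spotting before an import. The sketch below is one possible pre-import check; find_duplicate_keys is a hypothetical helper, and the sample values used with it (such as "buried" or "weekly") are illustrative assumptions rather than values taken from the dataset.

```python
from collections import Counter

# Column order mirrors the unique constraint described above.
UNIQUE_KEY = ("parish_id", "count_type", "year", "week_id", "bill_type")


def find_duplicate_keys(rows: list[dict]) -> list[tuple]:
    """Return key tuples that occur more than once, in first-seen order."""
    counts = Counter(tuple(row[col] for col in UNIQUE_KEY) for row in rows)
    return [key for key, n in counts.items() if n > 1]
```

With pandas already in the pipeline, the same check could be expressed with DataFrame.duplicated(subset=...); the plain-Python version is shown only to keep the example dependency-free.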
Import order (respects foreign keys):
- years
- parishes
- weeks
- all_bills (becomes bill_of_mortality)
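An import order that respects foreign keys is a topological ordering of the table dependency graph. The sketch below derives such an order with the standard library's graphlib. The dependency map is an assumption made for illustration (in particular that weeks references years); consult the migrations for the actual foreign keys.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each table maps to the tables it references.
DEPENDENCIES = {
    "years": set(),
    "parishes": set(),
    "weeks": {"years"},
    "all_bills": {"years", "parishes", "weeks"},
}


def import_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return tables ordered so that every table follows the tables it references."""
    return list(TopologicalSorter(dependencies).static_order())
```

Any order returned this way places all_bills last, which matches the documented sequence; the relative order of independent tables such as years and parishes is not significant.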
Migration files:
- Located in bom-processing/db/migrations/
- Format: NNNNNN_description.up.sql / NNNNNN_description.down.sql
- Use golang-migrate for management
- Historical transcription → CSV files in bom-data/data-csvs/
- Python processing (bompy) → Normalized CSVs in bompy/data/
- Go ETL updater → PostgreSQL import with validation
- REST API → JSON endpoints at https://data.chnm.org/bom/
- Website visualizations → D3.js charts fetch and render data
Base URL: https://data.chnm.org/bom/
Endpoints:
- /bills - Bill of mortality records
- /parishes - Parish lookup
- /weeks - Week records
- /causes - Cause of death data
- /christenings - Birth/baptism records
Query parameters:
- year, parish, cause, bill_type - Filtering
- limit, offset - Pagination
- Response format: JSON
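The endpoints and query parameters above combine into request URLs in the usual way. The sketch below only builds URLs and page offsets; it sends no requests, and it makes no claim about the accepted value formats or the shape of the JSON response, neither of which is specified here. The helpers build_url and page_offsets are hypothetical.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://data.chnm.org/bom/"


def build_url(endpoint: str, **params) -> str:
    """Join an endpoint to the base URL and append any non-None query parameters."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    url = urljoin(BASE_URL, endpoint.lstrip("/"))
    return f"{url}?{query}" if query else url


def page_offsets(total: int, limit: int) -> list[int]:
    """Return the offset values needed to page through `total` records."""
    return list(range(0, total, limit))
```

For example, build_url("/bills", year=1665, limit=100, offset=0) yields a URL for the first page of one year's records, and page_offsets supplies the offset for each later page once the total is known.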
File naming: YYYY-MM-DD-short-title.md in bom-website/content/blog/
Required YAML frontmatter:
---
title: "Post Title"
date: "2025-01-15"
author:
- fname lname
tags:
- tag1
- tag2
categories:
- category
---
Including images:
- Upload to bom-website/static/images/
- Use Hugo shortcode:
{{< figure src="/images/filename.jpg"
   caption="Image caption"
   alt="Descriptive alt text for accessibility" >}}
Content sections:
- /content/blog/ - Blog posts and announcements
- /content/analysis/ - Research analysis articles
- /content/context/ - Historical context essays
- /content/methodologies/ - Technical documentation
- Create feature branch from main
- Add/edit content or code
- Commit with descriptive messages
- Tag @hepplerj on Slack for preview on dev site
- Senior developer or sysadmin deploys to production
- Import order matters: Years → Parishes → Weeks → Bills (foreign key dependencies)
- Parish ID synchronization: After updating parishes_shp, run the UPDATE query to link parish_id
- Dummy data cleanup: Use DELETE FROM bom.bill_of_mortality WHERE count_type LIKE '%_DUMMY' to remove test data
- Copy data before processing: Run make copy-data from bompy/ to pull latest CSVs from bom-data/
- Schema validation: Python processors validate against expected PostgreSQL schema
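One detail of the dummy data cleanup pattern is worth knowing: in SQL LIKE, an underscore is a wildcard for any single character, so '%_DUMMY' matches any value ending in DUMMY with at least one character before it, not only values ending in a literal _DUMMY. The sketch below demonstrates the difference using an in-memory SQLite table as a stand-in for PostgreSQL; the sample count_type values are invented for illustration.

```python
import sqlite3

# In-memory stand-in for bom.bill_of_mortality; the values are invented samples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill_of_mortality (count_type TEXT)")
conn.executemany(
    "INSERT INTO bill_of_mortality VALUES (?)",
    [("buried_DUMMY",), ("buriedXDUMMY",), ("buried",)],
)


def matching(pattern: str) -> list[str]:
    """Return count_type values matching a LIKE pattern, with backslash as the escape character."""
    rows = conn.execute(
        "SELECT count_type FROM bill_of_mortality "
        "WHERE count_type LIKE ? ESCAPE '\\' ORDER BY count_type",
        (pattern,),
    )
    return [row[0] for row in rows]


loose = matching("%_DUMMY")     # "_" matches any single character
strict = matching("%\\_DUMMY")  # escaped "_" matches only a literal underscore
```

Here loose contains both DUMMY rows while strict contains only buried_DUMMY. If real count_type values could ever end in DUMMY without the underscore, the escaped form is the safer pattern for a DELETE.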
- Hugo version: Project requires Hugo v0.107.0 - newer versions may have breaking changes
- TailwindCSS compilation: Must run make tailwind after adding/modifying Tailwind classes in templates
- Asset fingerprinting: Production builds use Hugo's asset pipeline with fingerprinting for cache busting
- Draft posts: Use make preview to see drafts locally; make build-prod excludes drafts
- Unsafe HTML: Markdown renderer has unsafe: true to allow embedded visualizations
- Search index: Pagefind builds search index after Hugo build completes
- Environment-specific builds:
  - Dev: --buildDrafts --buildFuture --baseURL http://dev.deathbynumbers.org/
  - Prod: --minify --baseURL https://deathbynumbers.org/
- Shapefile directory exclusions: Processing scripts skip "Archived" and "merged" directories
- GeoJSON projection: Output is always WGS84 (EPSG:4326)
- GDAL requirement: Must have ogr2ogr installed for shapefile conversion
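Since the converted GeoJSON is expected to be in WGS84 (EPSG:4326), a quick sanity check is to confirm that every coordinate falls within longitude and latitude bounds. This is only a heuristic sketch: it cannot prove which coordinate reference system a file uses, but projected coordinates in metres will usually fail it. The function names are hypothetical.

```python
import json


def iter_positions(coords):
    """Yield [lon, lat, ...] positions from arbitrarily nested coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def looks_like_wgs84(geojson_text: str) -> bool:
    """Return True if every coordinate lies within valid longitude/latitude ranges."""
    data = json.loads(geojson_text)
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        for lon, lat, *_ in iter_positions(geometry.get("coordinates", [])):
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                return False
    return True
```

A file that fails this check was probably written without reprojection and should be regenerated from the source shapefile.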
- Pre-commit hooks: Website has pre-commit config for JSON, YAML, trailing whitespace
- Python formatting: Black (88 char), isort (black profile), mypy (strict)
- Go formatting: Use go fmt, go vet before committing
- No deployment from local: Production deployment only by senior developer/sysadmin via Ansible
cd bom-processing/scripts/bompy
make test # Run pytest
make test-bills # Test bills processor
make test-schema # Test schema alignment
cd bom-processing/db
make dry-run # Preview database import without changes
cd bom-website
make preview # Manual testing with live server
# Check visualizations, responsive design, content rendering
The website has no automated test suite; it relies on manual QA and production-like previews.
Production deployment:
- Handled by Ansible playbooks
- Only accessible to senior developer or systems administrator
- Docker containerization with nginx
- Builds to /public/ directory
Do NOT:
- Push directly to production
- Deploy from local machine
- Modify deployment configs without approval
Process:
- Complete work on feature branch
- Test locally with make preview
- Tag @hepplerj on Slack for dev deployment
- After approval, senior dev deploys to production
Documentation:
- Main README: /README.md
- Processing README: /bom-processing/README.md
- Website README: /bom-website/README.md
- Data README: /bom-data/README.md
- Dev notes: /DEVNOTES.md
- Contributing guide: /bom-website/CONTRIBUTING.md
External Links:
- Website: https://deathbynumbers.org
- API base: https://data.chnm.org/bom/
- GitHub: https://github.com/chnm/bom
- Hugo docs: https://gohugo.io/documentation/
- TailwindCSS: https://tailwindcss.com/docs
- D3.js: https://d3js.org/
- Alpine.js: https://alpinejs.dev/
Tools:
- Markdown guide: https://markdownguide.org
- golang-migrate: https://github.com/golang-migrate/migrate
- GDAL/OGR: https://gdal.org/
- Principal Investigator: Jessica Otis (jotis2@gmu.edu)
- Technical Lead: Jason Heppler (@hepplerj on Slack, jheppler@gmu.edu)
- Institution: Roy Rosenzweig Center for History and New Media (RRCHNM)
Most common workflows:
# Start website locally
cd bom-website && make preview
# Process new data
cd bom-processing/scripts/bompy
make copy-data && make process-all && make show-stats
# Import to database
cd bom-processing/db
make db-update
# Update TailwindCSS
cd bom-website && make tailwind
# Create blog post
# 1. Create branch in GitHub UI
# 2. Add file: content/blog/YYYY-MM-DD-title.md
# 3. Add YAML frontmatter and content
# 4. Upload images to static/images/
# 5. Commit and tag @hepplerj for preview
Emergency commands:
# Remove all dummy test data
psql $DB_CONN_STR -c "DELETE FROM bom.bill_of_mortality WHERE count_type LIKE '%_DUMMY';"
# Rebuild everything
cd bom-processing/scripts/bompy && make clean && make process-all
cd ../../../bom-website && make build-prod
# Reset Python environment
cd bom-processing/scripts/bompy && make reset && make setup