Orchestration for writing files to S3 to power our APIs
(Update this graph by running uv run make_graph.png)
This project uses uv to manage Python packages.
Install uv first if you don't already have it. Then run:
uv sync
pre-commit install
We're using CDK, so you need the CDK tooling.
It's probably best to use nvm to manage npm, and npm to install the libraries.
Then run:
nvm use --lts
npm install
AWS_PROFILE=dev-aggregatorapi-dc DC_ENVIRONMENT=development cdk bootstrap
AWS_PROFILE=dev-aggregatorapi-dc DC_ENVIRONMENT=development cdk synth
The main concept in this repo is that of a 'layer'. A layer is an artifact produced by code in this repo that can be used elsewhere.
The two current layers are:
This is an AWS Glue table containing AddressBase. It's partitioned by first letter for speedy filtering, and it can be queried from Athena.
The version of AddressBase is taken from the "addressbase cleaned" file that's made in the WDIV account. This is a simple format that squashes address fields into one.
Performs a geo-join between the AddressBase layer and a CSV of every ballot ID and its division WKT.
At the end of the process, one Parquet file per outcode is produced, containing one row per address, each with a list of ballot IDs.
This can be used to look up current elections in other applications.
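To make the output shape concrete, here is a minimal sketch in plain Python of the final "one file per outcode" grouping step. It is not the pipeline's actual code (the real work happens in AWS). The function names, the (uprn, postcode, ballot_ids) row shape and the sample ballot IDs used with it are all assumptions for illustration. The outcode is taken to be the postcode with its last three characters (the inward code) removed.

```python
from collections import defaultdict


def outcode(postcode):
    """Return the outward part of a UK postcode, e.g. 'SW1A 1AA' -> 'SW1A'."""
    cleaned = postcode.replace(" ", "").upper()
    # The inward code is always the final three characters.
    return cleaned[:-3]


def group_by_outcode(rows):
    """Group already-joined rows into one bucket per outcode.

    Each row is a hypothetical (uprn, postcode, ballot_ids) tuple.
    Each bucket stands in for one output file.
    """
    grouped = defaultdict(list)
    for uprn, postcode, ballot_ids in rows:
        grouped[outcode(postcode)].append(
            {"uprn": uprn, "postcode": postcode, "ballot_ids": list(ballot_ids)}
        )
    return dict(grouped)


if __name__ == "__main__":
    # Made-up sample data, not real UPRNs or ballot IDs.
    sample = [
        ("100000000001", "SW1A 1AA", ["example.ballot-a"]),
        ("100000000002", "SW1A 2AA", ["example.ballot-a", "example.ballot-b"]),
        ("100000000003", "M1 1AE", []),
    ]
    for code, addresses in group_by_outcode(sample).items():
        print(code, len(addresses))
```

An application that wants the current elections for an address would then only need to read the one bucket matching that address's outcode.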
A layer is really a CDK stack. To make a new layer, make a stack and drive AWS in the way you normally would.
Look at other layers for patterns. Generally we use AWS Lambda, Glue, Athena, Step Functions and S3 to make an ETL pipeline, but your new layer might need other services.
That being said, there are some handy things you can use:
This is a Lambda function that will run a named Athena query. It saves writing a Lambda per query.
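As a rough illustration of the idea, the sketch below looks up a saved Athena query by name and starts it, using the boto3 Athena client calls list_named_queries, get_named_query and start_query_execution. It is not the function in this repo. The event keys (query_name, output_location, workgroup) and the function names are assumptions, and the client is passed in so the logic can be exercised without AWS.

```python
def run_named_query(athena, query_name, output_location, workgroup="primary"):
    """Start the saved Athena query called `query_name`; return its execution ID."""
    kwargs = {"WorkGroup": workgroup}
    while True:
        page = athena.list_named_queries(**kwargs)
        for query_id in page.get("NamedQueryIds", []):
            query = athena.get_named_query(NamedQueryId=query_id)["NamedQuery"]
            if query["Name"] != query_name:
                continue
            response = athena.start_query_execution(
                QueryString=query["QueryString"],
                QueryExecutionContext={"Database": query["Database"]},
                ResultConfiguration={"OutputLocation": output_location},
                WorkGroup=workgroup,
            )
            return response["QueryExecutionId"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    raise LookupError(f"No named query called {query_name!r} in {workgroup!r}")


def handler(event, context, athena=None):
    """Lambda entry point. The event keys used here are hypothetical."""
    if athena is None:
        import boto3  # imported lazily so the logic above is testable offline

        athena = boto3.client("athena")
    execution_id = run_named_query(
        athena,
        event["query_name"],
        event["output_location"],
        event.get("workgroup", "primary"),
    )
    return {"QueryExecutionId": execution_id}
```

Starting a query does not wait for it to finish. A Step Functions state machine (or a polling loop) would still need to check the execution status before the next step runs.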
This is a Lambda function that will empty an S3 bucket. This is important because Athena will query all files at a prefix, including duplicates of the same file. E.g. if you have 5 copies of AddressBase in a prefix, Athena will return 5 rows per UPRN.
It's useful to be able to empty a prefix before writing new data, and this function will do that.
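For reference, emptying a prefix can be sketched with the boto3 S3 client's list_objects_v2 paginator and delete_objects call, as below. This is an illustrative sketch rather than the function shipped in this repo. The function name and arguments are assumptions, and it does not remove old object versions in a versioned bucket.

```python
def empty_prefix(s3, bucket, prefix=""):
    """Delete every object under `prefix` in `bucket`; return how many were deleted."""
    deleted = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Each page holds at most 1000 keys, which is also the
        # maximum that a single delete_objects request accepts.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        s3.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
        deleted += len(keys)
    return deleted
```

With a real client this would be called as empty_prefix(boto3.client("s3"), "some-bucket", "some/prefix/"), where the bucket and prefix names are placeholders.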