OCR-D webAPI implementation

Run in Docker


git clone
cd ocrd-webapi-implementation

install Nextflow

start Docker

  • Caution: this downloads the ocrd-all:maximum image, which is huge (currently 17 GB)
  • TODO: maybe switch to the minimum or medium image and offer only the processors it contains
docker-compose --env-file things/env-template-docker up -d

test if running

curl localhost:5050
curl localhost:5050/workspace
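The containers may need a while to start, so scripts that talk to the service can wait until it answers before sending requests. A minimal sketch in Python (standard library only), assuming the service is published on localhost:5050 as in the curl commands above; `wait_until_up` is a hypothetical helper, not part of this repository.

```python
import time
import urllib.error
import urllib.request

# Talk to the local service directly, ignoring any HTTP proxy set in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_up(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds have passed.

    Hypothetical helper: returns True as soon as any HTTP response arrives
    (even an error status, which still means the server is running),
    and False if nothing answered before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # A 4xx/5xx status still proves the server is listening.
            err.close()
            return True
        except OSError:
            # Connection refused, reset or timed out: not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_up("http://localhost:5050/", timeout=120)` returns True once the webapi responds. Any HTTP status counts as "up", because even an error response shows that the server is listening.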

Run locally for development


git clone
cd ocrd-webapi-implementation

Create a virtual environment and activate it

make venv
. venv/bin/activate
  • TODO: this does not work if Python 3.7 is not installed (as on my own machine). How can Python 3.7 be installed on any Linux distribution?

start mongodb

  • make start-mongo


uvicorn ocrd_webapi.main:app --host --reload
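If uvicorn starts but requests fail, one thing worth ruling out is that MongoDB is not reachable at OCRD_WEBAPI_DB_URL. The following sketch checks this with a plain TCP connection. `mongo_reachable` is a hypothetical helper, not part of ocrd_webapi, and it assumes a single-host URL of the form mongodb://host:port.

```python
import socket
from urllib.parse import urlsplit

MONGO_DEFAULT_PORT = 27017  # MongoDB's standard port, used if the URL has none


def mongo_reachable(db_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in `db_url` succeeds.

    Only checks that something accepts connections on that port; it does not
    speak the MongoDB protocol or authenticate. Assumes a single-host URL such
    as mongodb://localhost:27017 (no replica-set host lists, no mongodb+srv).
    """
    parts = urlsplit(db_url)
    host = parts.hostname or "localhost"
    port = parts.port or MONGO_DEFAULT_PORT
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable host or timeout: nothing is listening there.
        return False
```

For example, if `mongo_reachable(os.environ["OCRD_WEBAPI_DB_URL"])` returns False after `make start-mongo`, the MongoDB container is probably not running or the URL points to a different port.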

Test this webAPI implementation

test if running



Import the file webapi-tests.postman_collection.json into Postman to run the example requests against the running Docker service. The test workspace ZIPs are in things/.


Get workspace list: curl http://localhost:8000/workspace

Create new workspace: curl -X POST http://localhost:8000/workspace -F workspace=@tests/assets/

Create new workspace with id: curl -X PUT 'http://localhost:8000/workspace/test4711' -F 'workspace=@tests/assets/'

Update existing workspace: curl -X PUT 'http://localhost:8000/workspace/test4711' -F 'workspace=@tests/assets/'

Get single workspace: curl http://localhost:8000/workspace/test4711

Upload workflow: curl -X POST http://localhost:8000/workflow --user {user}:{pw} -F nextflow_script=@things/

Run Workflow: curl -X POST http://localhost:8000/workflow/{workflow-id} -H 'Content-Type: application/json' -d '{"workspace_id":"{workspace-id}", "workflow_parameters": {}}'

Request job status: curl http://localhost:8000/workflow/{workflow-id}/{job-id}

Download workspace: curl http://localhost:8000/workspace/{workspace-id} -H "accept: application/vnd.ocrd+zip" --output
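The same requests can be scripted from Python. The sketch below only uses the endpoints, JSON body and Accept header from the curl examples above; all helper names are hypothetical, and it assumes the local development server on localhost:8000. Requests are built separately from sending them so they can be inspected before use.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Local development server (uvicorn's default port), as in the curl examples above.
BASE_URL = "http://localhost:8000"


def run_workflow_request(workflow_id: str, workspace_id: str,
                         parameters: Optional[dict] = None) -> urllib.request.Request:
    """POST /workflow/{workflow-id} with the JSON body from the curl example."""
    body = {"workspace_id": workspace_id, "workflow_parameters": parameters or {}}
    return urllib.request.Request(
        f"{BASE_URL}/workflow/{workflow_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def job_status_request(workflow_id: str, job_id: str) -> urllib.request.Request:
    """GET /workflow/{workflow-id}/{job-id}."""
    return urllib.request.Request(f"{BASE_URL}/workflow/{workflow_id}/{job_id}")


def download_workspace_request(workspace_id: str) -> urllib.request.Request:
    """GET /workspace/{workspace-id}, asking for the zipped workspace."""
    return urllib.request.Request(
        f"{BASE_URL}/workspace/{workspace_id}",
        headers={"Accept": "application/vnd.ocrd+zip"},
    )


def fetch_json(request: urllib.request.Request) -> dict:
    """Send `request` and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_job(fetch_status: Callable[[], dict], is_done: Callable[[dict], bool],
             interval: float = 5.0, max_polls: int = 120) -> dict:
    """Call `fetch_status` until `is_done` accepts the result or polls run out."""
    for attempt in range(max_polls):
        status = fetch_status()
        if is_done(status):
            return status
        if attempt + 1 < max_polls:
            time.sleep(interval)
    raise TimeoutError(f"job not finished after {max_polls} status requests")
```

For example, `fetch_json(job_status_request("wf1", "job1"))` returns the decoded status of one job. The fields of the status response and which values mean "finished" are not described here, so the `is_done` predicate passed to `poll_job` has to be written against the actual responses. The download request returns a ZIP, so its body should be written to a file with `urllib.request.urlopen` rather than decoded as JSON.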



connect to the running container:

docker exec -it ocrd-webapi bash

start mongodb for local testing

docker run -d -p 27017:27017 --name mongo-4-ocrd --mount type=bind,source="$HOME/zeugs-ohne-backup/ocrd_webapi/mongo-data",target=/data/db mongo:latest

Explanation of environment variables

OCRD_WEBAPI_SERVER_PATH: When users query a workspace, the response contains a URL from which it can be retrieved. This is the only reason the variable is needed. A wrong value does not cause errors; users just cannot retrieve their data from the returned URL and have to adjust it themselves.

OCRD_WEBAPI_PORT: Docker only. The port on localhost where the webapi is available.

OCRD_WEBAPI_MONGO_PORT: Docker only. The port on localhost where MongoDB is available. When developing locally or running tests, it must match the port in OCRD_WEBAPI_DB_URL.

OCRD_WEBAPI_DATADIR_HOST: Docker only. The host side of two volume mounts, under which both the MongoDB data and the webapi data are mounted. When running in development mode, the MongoDB data is accessible here.

OCRD_WEBAPI_DB_URL: Important: the URL at which the webapi expects MongoDB to run.

OCRD_WEBAPI_STORAGE_DIR: Important: the directory where the webapi stores its workspaces and other data. It is also used in docker-compose as the container side of a volume mount, so that the data stored by the webapi can be accessed from the host machine.
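A mismatch between OCRD_WEBAPI_DB_URL and OCRD_WEBAPI_MONGO_PORT only shows up once the webapi fails to reach MongoDB, so a small pre-flight check of the settings can catch it earlier. The sketch below is hypothetical (it is not part of this repository), uses a simplified KEY=VALUE parser rather than docker-compose's full env-file syntax, and applies the port comparison that the description of OCRD_WEBAPI_MONGO_PORT requires for local development and tests.

```python
from typing import Dict, List, Mapping
from urllib.parse import urlsplit

MONGO_DEFAULT_PORT = 27017  # MongoDB's standard port, used if the URL has none


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Simplified: no variable interpolation, no multi-line values; surrounding
    quotes are stripped from values.
    """
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            env[key.strip()] = value.strip().strip("'\"")
    return env


def check_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the webapi settings (empty if none)."""
    problems = []
    # The two variables marked "Important" above must be set.
    for name in ("OCRD_WEBAPI_DB_URL", "OCRD_WEBAPI_STORAGE_DIR"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    # For local development and tests, the MongoDB port must match the DB URL.
    db_url = env.get("OCRD_WEBAPI_DB_URL")
    mongo_port = env.get("OCRD_WEBAPI_MONGO_PORT")
    if db_url and mongo_port:
        url_port = urlsplit(db_url).port or MONGO_DEFAULT_PORT
        if str(url_port) != mongo_port.strip():
            problems.append(
                f"OCRD_WEBAPI_MONGO_PORT={mongo_port} does not match "
                f"port {url_port} in OCRD_WEBAPI_DB_URL"
            )
    return problems
```

Inside Docker the DB URL may refer to the MongoDB container rather than the host port, so the port comparison is meant for the environment used for local development or tests, e.g. `check_env(os.environ)`.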