Vector sync only. This Django app is the shared pipeline that embeds and upserts documents into Pinecone (hybrid dense + sparse, integrated cloud embeddings). It does not crawl GitHub, Slack, or the web; upstream collectors populate PostgreSQL and/or the workspace, then call sync_to_pinecone() or run_cppa_pinecone_sync.
Each run targets exactly one logical source (--app-type), one namespace, one preprocessor import path, and one Pinecone account selected by --pinecone-instance (public or private — see ingestion.PineconeInstance).
Docs: docs/service_api/cppa_pinecone_sync.md · docs/Pinecone_preprocess_guideline.md · docs/Architecture_data_flow.md
- Load sync bookkeeping from this app’s models (`PineconeSyncStatus`, `PineconeFailList`) so runs can resume and retry safely.
- Import and run the preprocessor you pass in (`--preprocessor` dotted path). That callable reads other apps’ rows and/or workspace files and returns document dicts for embedding.
- Chunk, embed, and upsert into Pinecone via `ingestion.PineconeIngestion`, using the API key for the chosen `PineconeInstance` (`PINECONE_API_KEY` vs `PINECONE_PRIVATE_API_KEY` and related settings).
- Update `PineconeSyncStatus` / `PineconeFailList` only; domain tables stay owned by the source app.
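The preprocessor contract is defined in docs/Pinecone_preprocess_guideline.md. The sketch below only illustrates the general shape of an incremental preprocessor, under assumptions: the signature (a single `final_sync_at` argument), the dict keys (`id`, `text`, `metadata`), and the in-memory `ROWS` stand-in are all hypothetical and may not match the real contract.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Stand-in for rows owned by another app (hypothetical data, not a real table).
ROWS = [
    {"pk": 1, "body": "First note", "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"pk": 2, "body": "Second note", "updated_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]


def example_preprocessor(final_sync_at: Optional[datetime]) -> list[dict[str, Any]]:
    """Return document dicts for rows changed since the last successful sync.

    Hypothetical signature: the real contract lives in the preprocess guideline.
    """
    documents = []
    for row in ROWS:
        # With no previous sync recorded, every row is included (full sync).
        if final_sync_at is not None and row["updated_at"] <= final_sync_at:
            continue
        documents.append(
            {
                "id": f"note-{row['pk']}",  # assumed id scheme
                "text": row["body"],
                "metadata": {"updated_at": row["updated_at"].isoformat()},
            }
        )
    return documents


full = example_preprocessor(None)
incremental = example_preprocessor(datetime(2024, 2, 1, tzinfo=timezone.utc))
```

Here `full` contains both rows, while `incremental` contains only the row updated after the given timestamp. The preprocessor only reads; it never writes to the source rows.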
| CLI | Meaning |
|---|---|
| `--pinecone-instance public` (default) | Use the public Pinecone project credentials from Django settings (`PINECONE_API_KEY`, index/host settings, embedding model names). |
| `--pinecone-instance private` | Use the private Pinecone project (`PINECONE_PRIVATE_API_KEY` and its index configuration). |
Pick the instance that matches where the namespace for this app_type was provisioned. Wrong instance → wrong index/credentials, not a second “mode” of ingest.
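As a rough illustration of how the instance choice selects credentials, the sketch below maps a `--pinecone-instance` value to the name of the API key setting. The `PineconeInstance` enum here is a stand-in written for this example (the real `ingestion.PineconeInstance` may be defined differently), and `api_key_setting_for` is a hypothetical helper. Only the two setting names come from this README.

```python
from enum import Enum


class PineconeInstance(str, Enum):
    """Stand-in for ingestion.PineconeInstance (actual definition may differ)."""

    PUBLIC = "public"
    PRIVATE = "private"


# Setting names are taken from this README; the mapping itself is an assumption.
API_KEY_SETTING = {
    PineconeInstance.PUBLIC: "PINECONE_API_KEY",
    PineconeInstance.PRIVATE: "PINECONE_PRIVATE_API_KEY",
}


def api_key_setting_for(instance_name: str) -> str:
    """Map a --pinecone-instance value to the name of the API key setting."""
    instance = PineconeInstance(instance_name)  # raises ValueError on unknown values
    return API_KEY_SETTING[instance]
```

An unknown value such as `"staging"` fails fast with `ValueError` rather than silently falling back to one of the two projects.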
Only sync metadata lives here — not messages, issues, or docs.
| Model | Role |
|---|---|
| `PineconeSyncStatus` | One row per `app_type`; `final_sync_at` marks last successful sync for incrementality with preprocessors. |
| `PineconeFailList` | Failed vector ids (and `app_type`) for retry or audit. |
References: docs/Schema.md, section 9 — CPPA Pinecone Sync · models.py · docs/service_api/cppa_pinecone_sync.md.
- No external “fetch” phase: no scheduled scrape of third-party APIs inside this package.
- No GitHub / Markdown publishing: those belong to tracker apps and `core.operations`.
- No writes to other apps’ domain tables: preprocessors read them; this app only updates `cppa_pinecone_sync_*` tables and calls the Pinecone API.
- One-off sync: `python manage.py run_cppa_pinecone_sync --help` (requires `--app-type`, `--namespace`, `--preprocessor` together).
- Add a new namespace: implement a preprocessor per docs/Pinecone_preprocess_guideline.md, then invoke from the owning collector or Celery task.
Wraps sync_to_pinecone() for a single (app_type, namespace, preprocessor, instance) tuple.
| Option | Description |
|---|---|
| `--app-type` | Logical source id (string your preprocessor understands; often matches the upstream collector’s app type). |
| `--namespace` | Target Pinecone namespace for upserts. |
| `--preprocessor` | Dotted import path to the preprocess callable (e.g. `myapp.preprocessors.foo`). |
| `--pinecone-instance` | `public` (default) or `private`: which Pinecone API credentials / project to use (`PineconeInstance`). |
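The option set above can be mirrored with a plain `argparse` parser, which may help when reasoning about what the command accepts. This is a standalone sketch, not the management command's actual code: `build_parser` and `resolve_dotted_path` are hypothetical helpers, and `json.dumps` and the `slack` values are used only as placeholders so the example runs without the project installed.

```python
import argparse
import importlib


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run_cppa_pinecone_sync")
    parser.add_argument("--app-type", required=True)
    parser.add_argument("--namespace", required=True)
    parser.add_argument("--preprocessor", required=True)
    parser.add_argument(
        "--pinecone-instance", choices=["public", "private"], default="public"
    )
    return parser


def resolve_dotted_path(path: str):
    """Import 'package.module.attr' and return the attribute."""
    module_path, _, attr = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# Placeholder values; a real run would pass a preprocessor such as
# myapp.preprocessors.foo instead of json.dumps.
args = build_parser().parse_args(
    ["--app-type", "slack", "--namespace", "slack-messages", "--preprocessor", "json.dumps"]
)
preprocess = resolve_dotted_path(args.preprocessor)
```

In a Django project, `django.utils.module_loading.import_string` resolves dotted paths in the same way and is usually the more idiomatic choice.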
- Django app label: `cppa_pinecone_sync`
- Path (from repo root): `cppa_pinecone_sync/`
- Registration: Listed under `INSTALLED_APPS` in `config/settings.py` as `cppa_pinecone_sync`.
| Command | Description |
|---|---|
| `run_cppa_pinecone_sync` | Run one preprocessor-driven upsert into Pinecone for the given app type, namespace, and instance. |
Run python manage.py <command> --help for options.
Typical invocation (from repo root, after README prerequisites):
python -m pytest cppa_pinecone_sync/tests/ -v