AIOps Copilot - real-time anomaly detection and root-cause ranking for microservices (FastAPI backend, Streamlit demo, NAB dataset showcase).


RAPHCVR/AIOps


AIOps Copilot

AIOps Copilot is an end-to-end playground for anomaly detection, root-cause analysis, and observability on time-series coming from distributed services. The stack ships with a FastAPI backend, Streamlit dashboard, persistence (SQLite/Postgres), scripted pipelines, Grafana assets, and test suites.

Highlights

  • Real-time anomaly detection: rolling-median MAD baseline and forecast + conformal prediction, with per-series calibration.
  • Root Cause Analysis (RCA): graph propagation with exponential decay, configurable via YAML or OpenTelemetry Tempo traces.
  • FastAPI surface: ingest, batch detect, anomaly history, RCA ranking, dependency graph CRUD, Prometheus-style metrics.
  • Streamlit UI: overview metrics, service drill-down with conformal bands, interactive dependency graph + RCA table.
  • Turn-key datasets: NAB benchmark CSVs, Yahoo S5 miniature set, synthetic simulators, and a pre-built data/aiops.db.
  • Operational scripts: showcase runners, graph applier, Evidently drift reports, simulators, and dataset seeders.
  • CI-ready: lint (Ruff/Black/isort), unit + integration tests (Pytest), Dockerfiles for API and Streamlit.
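
As a rough illustration of the baseline detector's idea (the real implementation lives in app/detectors/baseline.py and is more involved), a rolling-median + MAD scorer can be sketched as:

```python
# Sketch of a rolling-median + MAD anomaly baseline, assuming a fixed trailing
# window and a robust z-score threshold; illustrative only, not the project's code.
from statistics import median

def mad_anomalies(values, window=5, threshold=3.5):
    """Flag points whose robust z-score vs. the trailing window exceeds threshold."""
    flags = []
    for i, y in enumerate(values):
        past = values[max(0, i - window):i]
        if len(past) < window:
            flags.append(False)  # not enough history to judge
            continue
        med = median(past)
        mad = median(abs(p - med) for p in past) or 1e-9  # guard against zero MAD
        score = 0.6745 * abs(y - med) / mad               # robust z-score
        flags.append(score > threshold)
    return flags

series = [10.0, 10.2, 9.9, 10.1, 10.0, 30.0, 10.1]
print(mad_anomalies(series))  # only the spike at 30.0 is flagged
```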

Architecture at a Glance

```mermaid
flowchart LR
    Clients[[Clients]]
    API[FastAPI API]
    DB[(Postgres / SQLite)]
    Detectors[Detectors]
    RCA[RCA Engine]
    Evidently
    Streamlit[Streamlit Dashboard]
    Grafana
    OTel[[Grafana / OTel / Tempo]]

    Clients -->|HTTP| API
    API -->|Persist| DB
    API -->|Scores| Detectors
    API -->|Propagation| RCA
    API -->|Export| Evidently
    API -->|REST| Streamlit
    Streamlit -->|Dashboards| Grafana
    Streamlit -->|Traces & Metrics| OTel
```

Prerequisites

  • Python >= 3.12
  • pip/virtualenv
  • Optional: Docker + Docker Compose, Tempo, Prometheus, Grafana

Quickstart (No Re-training)

Clone the repository and make sure data/aiops.db is present (it contains pre-computed NAB anomalies), then:

```shell
python -m venv .venv
source .venv/bin/activate       # Windows PowerShell: .\.venv\Scripts\Activate.ps1
make setup                      # install dependencies (editable mode)

make run-api                    # start FastAPI with auto reload
make streamlit                  # launch the dashboard (API_BASE_URL defaults to http://localhost:8000)
```

_Streamlit service drill-down with conformal prediction band and anomaly markers._

make setup installs the full dev toolchain, including pmdarima and statsforecast (AutoARIMA + conformal extras). On Windows, make sure Microsoft C++ Build Tools are available before running it.

The backend loads the NAB dependency graph from configs/data.yaml automatically. Existing anomalies and RCA scores stored in data/aiops.db become immediately visible in the UI.

Optional dependencies

If you prefer a runtime-only install (pip install -e .), you can add the forecast extras (AutoARIMA + StatsForecast) manually; make setup already pulls them in through the dev extras.

```shell
pip install -e ".[forecast]"
```

On Windows you may need Microsoft C++ Build Tools before installing pmdarima.
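
These extras power the forecast detector; conceptually, the conformal layer turns held-out forecast residuals into a calibrated band. A minimal split-conformal sketch (an illustration of the idea only, not the code in app/detectors/conformal.py):

```python
# Split-conformal band sketch: the (1 - alpha) quantile of absolute calibration
# residuals becomes a symmetric interval half-width around each forecast.
import math

def conformal_band(residuals, alpha=0.1):
    """Return the split-conformal half-width for coverage level (1 - alpha)."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    k = min(n, math.ceil((n + 1) * (1 - alpha)))  # conformal rank, clipped to n
    return scores[k - 1]

def is_anomaly(y, forecast, half_width):
    """Flag a point that falls outside the conformal band around the forecast."""
    return abs(y - forecast) > half_width

residuals = [0.1, -0.2, 0.3, -0.1, 0.2, 0.4, -0.3, 0.1, 0.2, -0.2]
print(conformal_band(residuals, alpha=0.1))  # → 0.4
```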

Make Targets Reference

| Command | Description |
| --- | --- |
| `make setup` | Upgrade pip and install the project in editable mode with dev extras |
| `make run-api` | Launch FastAPI (reloads when running a single worker; controlled by `UVICORN_*`) |
| `make streamlit` | Open the Streamlit dashboard (`app/viz/dashboard.py`) |
| `make lint` / `make fmt` | Ruff + Black + isort (check or apply) |
| `make test` | Pytest with coverage (`app` package) |
| `make seed-nab` | Normalise NAB CSVs into `data/nab/` |
| `make nab-detect` | Ingest/detect NAB services (baseline detector by default) |
| `make nab-showcase` | Full NAB ingest + detect + RCA summary + optional Evidently report |
| `make full-showcase` | Start the API, apply the graph, run the showcase, optionally launch Streamlit |
| `make graph-nab` | Push `configs/graphs/nab.yaml` to the API |
| `make graph-default` | Restore the microservices graph (`configs/graphs/microservices.yaml`) |
| `make simulate` | Generate synthetic metrics and run detection |
| `make report` | Generate an Evidently drift report for `checkout` |
| `make docker-up` | Compose stack: API, Streamlit, Grafana |
| `make docker-down` | Stop and remove containers/volumes |

All CLI flags exposed in scripts are documented via --help. make nab-detect/nab-showcase accept overrides for detector (forecast or baseline), services list, worker count, chunk size, anomaly limits, and report generation.

FastAPI Endpoints

| Method & Path | Purpose |
| --- | --- |
| `GET /health` | Health probe |
| `POST /ingest/series` | Persist measurements for one service (`SeriesIngestRequest`) |
| `POST /detect/batch` | Run a detector (baseline, or forecast + conformal) on multiple series |
| `GET /anomalies` | Fetch stored anomalies (filter by `service`, `since`, `limit`) |
| `GET /graph` | Retrieve the dependency graph (nodes + weighted edges) |
| `POST /graph` | Merge/normalise an incoming graph payload |
| `GET /rca/topk` | Ranked RCA scores (default `k=5`) |
| `GET /measurements` | Fetch raw measurements (`service`, optional `since`, `limit`) |
| `GET /metrics` | Prometheus exposition (ingested points, anomaly count, detect latency) |

API schemas live in app/models/schemas.py. Settings are controlled via .env (see .env.example) and YAML files in configs/.

Sample requests

```shell
curl -X POST http://localhost:8000/ingest/series \
  -H "Content-Type: application/json" \
  -d '{
        "service_id": "checkout",
        "points": [
          {"ts": "2025-01-01T12:00:00Z", "y": 123.4},
          {"ts": "2025-01-01T12:01:00Z", "y": 120.1}
        ]
      }'

curl -X POST http://localhost:8000/detect/batch \
  -H "Content-Type: application/json" \
  -d '{
        "series": [
          {
            "service_id": "checkout",
            "points": [
              {"ts": "2025-01-01T12:00:00Z", "y": 123.4},
              {"ts": "2025-01-01T12:01:00Z", "y": 120.1}
            ]
          }
        ],
        "detector": "forecast",
        "alpha": 0.1
      }'

# inspect latest measurements
curl "http://localhost:8000/measurements?service_id=checkout&limit=200"
```
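
The same ingest call can be made from Python with only the standard library; the payload shape mirrors the cURL example above:

```python
# Build a SeriesIngestRequest-shaped payload and POST it to /ingest/series.
import json
import urllib.request

def build_ingest_payload(service_id, points):
    """points: iterable of (iso_ts, value) pairs -> dict shaped like the cURL body."""
    return {
        "service_id": service_id,
        "points": [{"ts": ts, "y": y} for ts, y in points],
    }

def post_ingest(payload, base_url="http://localhost:8000"):
    req = urllib.request.Request(
        f"{base_url}/ingest/series",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

payload = build_ingest_payload(
    "checkout",
    [("2025-01-01T12:00:00Z", 123.4), ("2025-01-01T12:01:00Z", 120.1)],
)
# post_ingest(payload)  # requires the API to be running locally
```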

Detection & RCA Pipeline

  1. Ingestion: /ingest/series writes rows into measurements (SQLAlchemy models in app/io/writers.py).
  2. Detection:
    • Baseline: rolling median + MAD (app/detectors/baseline.py).
    • Forecast: AutoARIMA (optional dependency) or Holt-Winters fallback + conformal calibration (app/detectors/forecast.py, app/detectors/conformal.py).
  3. Persistence: detected anomalies stored in anomalies table, conformal ratios saved in api_state.local_scores.
  4. RCA: weighted directed graph (app/rca/graph_builder.py) with exponential decay ranking (app/rca/rca_ranker.py).
  5. Graph sources: configs/data.yaml (default), YAML files under configs/graphs/, or Tempo traces (app/io/otel_tempo.py) if enabled.
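
The propagation step (4) can be illustrated with a small sketch; the edge representation, decay handling, and hop limit here are assumptions for illustration, not the exact logic of app/rca/rca_ranker.py:

```python
# Blame propagation over a weighted directed dependency graph with exponential
# decay per hop: anomalous services push decayed blame onto their dependencies.
def propagate_scores(edges, local_scores, decay=0.5, hops=3):
    """edges: {service: [(dependency, weight), ...]}; returns ranked (service, blame)."""
    blame = dict(local_scores)
    frontier = dict(local_scores)
    for _ in range(hops):
        nxt = {}
        for svc, score in frontier.items():
            for dep, w in edges.get(svc, []):
                contrib = score * w * decay  # decay once per hop
                nxt[dep] = nxt.get(dep, 0.0) + contrib
                blame[dep] = blame.get(dep, 0.0) + contrib
        if not nxt:
            break
        frontier = nxt
    return sorted(blame.items(), key=lambda kv: kv[1], reverse=True)

edges = {"checkout": [("payments", 1.0)], "payments": [("db", 0.8)]}
print(propagate_scores(edges, {"checkout": 1.0}))
```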

Bring Your Own Data

To analyse your own workloads:

  1. Prepare the time series: each service metric needs an identifier (service_id) and a list of {ts, y} points with ISO8601 timestamps. You can ingest in bulk via /ingest/series (see cURL above) or adapt scripts/run_nab_ingest.py by pointing --data-dir to a folder of CSV files (column timestamp/value or ts/y).
  2. Run detection: call /detect/batch or reuse run_nab_ingest.py with --services enumerating the CSV basenames. The detector flag toggles baseline vs forecast.
  3. Customize the graph: adjust configs/data.yaml for static graphs, provide alternative YAML under configs/graphs/, or post new edges with scripts/apply_graph.py --graph <file>. If Tempo tracing is enabled (ENABLE_OTEL=true and TEMPO_BASE_URL set), the API merges live traces into the graph.
  4. Tune settings: override detection batch sizes, RCA decay, database URLs, etc., through .env variables (see .env.example).
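
For step 1, a tiny helper that turns a timestamp/value CSV into the {ts, y} point list expected by /ingest/series might look like this (column names follow the ones mentioned above):

```python
# Convert CSV rows with `timestamp,value` columns into /ingest/series points.
import csv
import io

def csv_to_points(csv_text, ts_col="timestamp", y_col="value"):
    """Parse CSV text into a list of {ts, y} dicts ready for the ingest payload."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"ts": row[ts_col], "y": float(row[y_col])} for row in reader]

sample = "timestamp,value\n2025-01-01T12:00:00Z,123.4\n2025-01-01T12:01:00Z,120.1\n"
print(csv_to_points(sample))
```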

Once ingested, all anomalies/RCA scores become visible in the Streamlit dashboard and Grafana panels.

Streamlit Dashboard

  • Overview: raw /metrics output, active services list.
  • Service: raw measurements overlaid with predictions + conformal band, anomaly markers, key metrics (last anomaly, counts), optional residual view, adjustable history window.
  • Compare: multi-service overlay constrained to the common time window (slider), plus per-service anomaly/severity summary.
  • Graph: interactive PyVis network and expanded top-k RCA scores table.

Set API_BASE_URL to point to the FastAPI instance (defaults to http://localhost:8000).

Scripts & Pipelines

  • scripts/seed_nab.py: copy NAB Real Known Cause CSVs, normalise timestamps.
  • scripts/run_nab_ingest.py: threaded ingest+detect with configurable chunking and detector selection.
  • scripts/run_nab_showcase.py: orchestrate ingest, anomalies fetch, RCA fetch, metrics dump, optional Evidently report.
  • scripts/run_full_showcase.py: spin up uvicorn, apply graph (--graph-file), run showcase, optionally launch Streamlit.
  • scripts/apply_graph.py: POST a YAML graph payload to the API.
  • scripts/run_evidently_report.py: build Evidently drift report (reports/latest.html) retrieving measurements via SQL.
  • scripts/simulate_services.py: generate synthetic multi-service data and run detection once.
  • scripts/seed_yahoo_s5.py: ingest small Yahoo S5 subset.

Use these scripts directly or via the Makefile wrappers.

Configuration

  • Environment variables: .env or system env (Pydantic Settings). Key options include DB URLs, detector parameters, feature toggles.
    • SQLite (default): USE_POSTGRES=false keeps everything in data/aiops.db.
    • Postgres: set USE_POSTGRES=true and provide POSTGRES_HOST, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD. Example:

      ```
      USE_POSTGRES=true
      POSTGRES_HOST=localhost
      POSTGRES_DB=aiops
      POSTGRES_USER=aiops
      POSTGRES_PASSWORD=aiops
      ```

      Run make docker-up to start the bundled Postgres container and Grafana dashboards, or point to your own instance.
  • YAML:
    • configs/app.yaml: app metadata, feature toggles, default ports, Postgres option, etc.
    • configs/model.yaml: detector/conformal/baseline/RCA defaults.
    • configs/data.yaml: default data sources and dependency graph (NAB loaded by default).
    • configs/graphs/*.yaml: alternative graph topologies (NAB, microservices).
  • Grafana: dashboards under grafana/dashboards/, datasource definitions under grafana/provisioning/.
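
The exact schema of the graph YAML files is not spelled out in this README; a hypothetical minimal payload consistent with the API's nodes + weighted edges model (field names here are illustrative only, check configs/graphs/*.yaml for the real shape) could look like:

```yaml
# Hypothetical dependency graph for scripts/apply_graph.py -- illustrative field names.
nodes:
  - checkout
  - payments
edges:
  - source: checkout
    target: payments
    weight: 0.8
```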

Data Artifacts

  • data/aiops.db: SQLite database pre-populated via NAB showcase. Version it if you want users to skip long runs.
  • data/nab/: normalised NAB CSVs (produced by make seed-nab).
  • reports/nab_summary.json: summary created by run_nab_showcase.py.
  • reports/latest.html: Evidently drift report (optional).

Testing & Quality

  • Run make test for the full suite, or target a single test, e.g. pytest tests/test_api.py::test_ingest_detect_and_rca_flow.
  • Lint via make lint; auto-format with make fmt.
  • GitHub Actions (.github/workflows/ci.yml) performs lint, tests, and Docker image builds on main.

Docker Compose

docker-compose.yml spins up the API, Streamlit app, Grafana (with mounted dashboards), and supporting services. Use the Make targets for lifecycle management. Provide the .env file and optional volumes (data/, reports/) to persist state.

Prometheus and Tempo endpoints can be configured through .env (ENABLE_PROMETHEUS, PROMETHEUS_BASE_URL, ENABLE_OTEL, TEMPO_BASE_URL). When enabled, the API fetches metrics/traces from those systems, and Grafana dashboards (grafana/dashboards/) render real-time views alongside the anomalies stored locally.

  • Grafana default credentials: admin/admin (prompted to change on first login).
  • Set PROMETHEUS_BASE_URL or TEMPO_BASE_URL to remote instances if you want to reuse an existing observability stack.
  • Update grafana/provisioning/datasources/datasources.yaml if your Postgres/Prometheus endpoints differ from the defaults exposed by docker-compose.

Roadmap & Known Gaps

  1. Enable Tempo/OTel by default for live graph reconstruction when traces are available.
  2. Add deep-learning detectors (N-BEATS, PatchTST) behind optional extras.
  3. Package reproducible demo datasets plus pre-built dashboards for easy sharing (e.g., Grafana JSON + Streamlit presets).
  4. Extend Prometheus exposure (per-service counters, pipeline metrics).
