This file provides guidance to programming agents when working with code in this repository.
The Apify SDK for Python (apify package on PyPI) is the official library for creating Apify Actors in Python. It provides Actor lifecycle management, storage access (datasets, key-value stores, request queues), event handling, proxy configuration, and pay-per-event charging. It builds on top of the Crawlee web scraping framework and the Apify API Client. Supports Python 3.10–3.14. Build system: hatchling.
# Install dependencies (including dev)
uv sync --all-extras
# Install dev dependencies + pre-commit hooks
uv run poe install-dev
# Format code (also auto-fixes lint issues via ruff check --fix)
uv run poe format
# Lint (format check + ruff check)
uv run poe lint
# Type check
uv run poe type-check
# Run all checks (lint + type-check + unit tests)
uv run poe check-code
# Unit tests (no API token needed)
uv run poe unit-tests
# Run a single test file
uv run pytest tests/unit/actor/test_actor_lifecycle.py
# Run a single test by name
uv run pytest tests/unit/actor/test_actor_lifecycle.py -k "test_name"
# Integration tests (needs APIFY_TEST_USER_API_TOKEN)
uv run poe integration-tests
# E2E tests (needs APIFY_TEST_USER_API_TOKEN, builds/deploys Actors on platform)
uv run poe e2e-tests

- Formatter/Linter: Ruff (line length 120, single quotes for inline, double quotes for docstrings)
- Type checker: ty (targets Python 3.10)
- All ruff rules enabled with specific ignores; see [tool.ruff.lint] in pyproject.toml for the full ignore list
- Tests are exempt from docstring rules (D), assert warnings (S101), and private member access (SLF001)
- Unused imports are allowed in __init__.py files (re-exports)
- Pre-commit hooks: lint check + type check run automatically on commit
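
The quoting convention listed above can be illustrated with a minimal sketch. The function name build_run_label is hypothetical and does not come from the repository; it only shows single quotes for inline strings and double quotes for docstrings.

```python
def build_run_label(actor_name: str, run_number: int) -> str:
    """Return a human-readable label for an Actor run.

    Docstrings use double quotes; inline strings use single quotes.
    """
    # Hypothetical helper, shown only to illustrate the quoting style.
    return f'{actor_name}#{run_number}'
```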
- _actor.py: The _ActorType class is the central API. Actor is a lazy-object-proxy object (lazy_object_proxy.Proxy) wrapping _ActorType. It acts as both a class (e.g. Actor.is_at_home()) and an instance-like context manager (async with Actor:). On __aenter__, the proxy's __wrapped__ is replaced with the active _ActorType instance. _ActorType manages the full Actor lifecycle (init, exit, fail), provides access to storages (open_dataset, open_key_value_store, open_request_queue), and handles events, proxy configuration, charging, and platform API operations (start, call, metamorph, reboot).
- _configuration.py: Configuration extends Crawlee's Configuration with Apify-specific settings (API URL, token, Actor run metadata, proxy settings, charging config). Configuration is populated from environment variables (APIFY_*).
- _charging.py: Pay-per-event billing system. ChargingManager / ChargingManagerImplementation handle charging events against pricing info fetched from the API.
- _proxy_configuration.py: ProxyConfiguration manages Apify proxy setup (residential, datacenter, groups, country targeting).
- _models.py: Pydantic models for API data structures (Actor runs, webhooks, pricing info, etc.).
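
The proxy behavior described for _actor.py can be sketched with the standard library alone. This is not the SDK's implementation: _DemoActorType, _DemoProxy, and DemoActor are hypothetical stand-ins, and the real Actor relies on lazy_object_proxy.Proxy rather than a hand-written __getattr__. The sketch only shows the idea of one global object that forwards attribute access and swaps its wrapped instance when entered as a context manager.

```python
import asyncio


class _DemoActorType:
    """Hypothetical stand-in for _ActorType; not the SDK's real class."""

    def __init__(self) -> None:
        self.initialized = False

    def is_at_home(self) -> bool:
        # Assumption for the sketch: always report "not on the platform".
        return False

    async def init(self) -> None:
        self.initialized = True

    async def exit(self) -> None:
        self.initialized = False


class _DemoProxy:
    """Forwards attribute access to whichever instance is currently wrapped."""

    def __init__(self) -> None:
        self._wrapped = _DemoActorType()

    def __getattr__(self, name: str):
        # Called only for attributes not found on the proxy itself.
        return getattr(self._wrapped, name)

    async def __aenter__(self) -> _DemoActorType:
        # Swap in a fresh instance and make it the active one.
        self._wrapped = _DemoActorType()
        await self._wrapped.init()
        return self._wrapped

    async def __aexit__(self, *exc_info: object) -> None:
        await self._wrapped.exit()


DemoActor = _DemoProxy()


async def main() -> bool:
    async with DemoActor:
        # Attribute access goes through the proxy to the active instance.
        return DemoActor.initialized
```

Calling DemoActor.is_at_home() works without entering the context, while asyncio.run(main()) exercises the context-manager path.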
Four storage client implementations, all implementing Crawlee's abstract storage client interface:
- _apify/: ApifyStorageClient talks to the Apify API for dataset, key-value store, and request queue operations (separate sub-clients for single vs. shared request queues). Used when running on the Apify platform.
- _file_system/: FileSystemStorageClient (alias ApifyFileSystemStorageClient) extends Crawlee's file system client with Apify-specific key-value store behavior.
- _smart_apify/: SmartApifyStorageClient is a hybrid client that writes to both API and local file system for resilience.
- MemoryStorageClient: re-exported from Crawlee for in-memory storage.
Re-exports Crawlee's Dataset, KeyValueStore, and RequestQueue classes.
_apify_event_manager.py: ApifyEventManager extends Crawlee's event system with platform-specific events received via WebSocket connection.
_apify_request_list.py: ApifyRequestList creates request lists from Actor input URLs (supports both direct URLs and "requests from URL" sources).
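
To make the two source kinds concrete, here is a minimal standard-library sketch that separates direct URLs from "requests from URL" entries. It assumes the input items are dicts keyed by url or requestsFromUrl; those key names and the split_sources helper are assumptions for illustration and should be checked against _apify_request_list.py. The sketch does not fetch anything.

```python
def split_sources(sources: list[dict]) -> tuple[list[str], list[str]]:
    """Split input items into direct URLs and remote URL-list locations."""
    direct_urls: list[str] = []
    remote_lists: list[str] = []
    for source in sources:
        if 'requestsFromUrl' in source:
            # Assumed key: points at a remote document that lists URLs.
            remote_lists.append(source['requestsFromUrl'])
        elif 'url' in source:
            # Assumed key: a single URL to enqueue as-is.
            direct_urls.append(source['url'])
    return direct_urls, remote_lists


example_input = [
    {'url': 'https://example.com/a'},
    {'requestsFromUrl': 'https://example.com/urls.txt'},
    {'url': 'https://example.com/b'},
]
```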
Optional integration (apify[scrapy] extra) providing Scrapy scheduler, middlewares, pipelines, and extensions for running Scrapy spiders as Apify Actors.
- crawlee: Base framework providing storage abstractions, event system, configuration, service locator pattern
- apify-client: HTTP client for the Apify API (ApifyClientAsync)
- apify-shared: Shared constants and utilities (ApifyEnvVars, ActorEnvVars, etc.)
Three test levels in tests/:
- unit/: Fast tests with no external dependencies. Use mocked API clients (ApifyClientAsyncPatcher fixture). Run with uv run poe unit-tests.
- integration/: Tests making real Apify API calls but not deploying Actors. Requires APIFY_TEST_USER_API_TOKEN. Run with uv run poe integration-tests.
- e2e/: Full end-to-end tests that build and deploy Actors on the platform. Slowest. Requires APIFY_TEST_USER_API_TOKEN. Use make_actor and run_actor fixtures. Run with uv run poe e2e-tests.
All test levels use pytest-asyncio with asyncio_mode = "auto" (no need for @pytest.mark.asyncio). Tests run in parallel via pytest-xdist (--numprocesses). Each test gets isolated state via the autouse _isolate_test_environment fixture which resets Actor, service_locator, and AliasResolver state. Conftest files live in each subdirectory (tests/unit/conftest.py, etc.) — there is no top-level tests/conftest.py.
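
As a small illustration of the auto mode mentioned above, a coroutine test can be written without any marker. The helper fetch_answer and the test name are hypothetical, not taken from the repository; under pytest-asyncio with asyncio_mode = "auto" such a function would be collected and awaited like any other test.

```python
import asyncio


async def fetch_answer() -> int:
    """Hypothetical coroutine under test."""
    await asyncio.sleep(0)  # yield to the event loop once
    return 42


# No @pytest.mark.asyncio decorator is needed in auto mode.
async def test_fetch_answer() -> None:
    assert await fetch_answer() == 42
```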
- apify_client_async_patcher (unit): ApifyClientAsyncPatcher instance for mocking ApifyClientAsync methods. Patch by method/submethod, tracks call history in .calls.
- make_httpserver / httpserver (unit): session-scoped HTTPServer via pytest-httpserver for HTTP interception.
- apify_client_async (integration/e2e): real ApifyClientAsync using APIFY_TEST_USER_API_TOKEN.
- make_actor (e2e): creates a temporary Actor on the platform from a function, main_py string, or source files dict; cleans up after the session.
- run_actor (e2e): calls an Actor and waits up to 10 minutes for completion.