refactor: Introduce new Apify storage client #470

Open. Wants to merge 24 commits into base: master

Commits:
5c437c9
Rm old Apify storage clients
vdusek Apr 28, 2025
bf55338
Add init version of new Apify storage clients
vdusek May 9, 2025
6b2f82b
Move specific models from Crawlee to SDK
vdusek Jun 12, 2025
38bef68
Adapt to Crawlee v1
vdusek Jun 18, 2025
1f85430
Adapt to Crawlee v1 (p2)
vdusek Jun 23, 2025
a3d68a2
Fix default storage IDs
vdusek Jun 25, 2025
c77e8d5
Fix integration test and Not implemented exception in purge
vdusek Jun 26, 2025
8731aff
Fix unit tests
vdusek Jun 26, 2025
8dfaffb
fix lint
vdusek Jun 26, 2025
53fad07
add KVS record_exists not implemented
vdusek Jun 26, 2025
5869f8e
update to apify client 1.12 and implement record exists
vdusek Jun 26, 2025
82e65fc
Move default storage IDs to Configuration
vdusek Jun 27, 2025
8de950b
opening storages get default id from config
vdusek Jun 27, 2025
98b76c5
Addressing more feedback
vdusek Jun 27, 2025
7b5ee07
Fixing integration test test_push_large_data_chunks_over_9mb
vdusek Jun 27, 2025
afcb8c7
Abstract open method is removed from storage clients
vdusek Jun 30, 2025
3bacab7
fixing generate public url for KVS records
vdusek Jun 30, 2025
287a119
add async metadata getters
vdusek Jul 1, 2025
e45d65b
Merge branch 'master' into new-apify-storage-clients
vdusek Jul 1, 2025
51178ca
better usage of apify config
vdusek Jul 1, 2025
3cd7dfe
renaming
vdusek Jul 2, 2025
6fe9eb3
Merge branch 'master' into new-apify-storage-clients
vdusek Jul 3, 2025
1547cbd
fixes after merge commit
vdusek Jul 3, 2025
bb47efc
Merge branch 'master' into new-apify-storage-clients
vdusek Jul 4, 2025
6 changes: 3 additions & 3 deletions Makefile
@@ -26,13 +26,13 @@ type-check:
 	uv run mypy

 unit-tests:
-	uv run pytest --numprocesses=auto --verbose --cov=src/apify tests/unit
+	uv run pytest --numprocesses=auto -vv --cov=src/apify tests/unit

 unit-tests-cov:
-	uv run pytest --numprocesses=auto --verbose --cov=src/apify --cov-report=html tests/unit
+	uv run pytest --numprocesses=auto -vv --cov=src/apify --cov-report=html tests/unit

 integration-tests:
-	uv run pytest --numprocesses=$(INTEGRATION_TESTS_CONCURRENCY) --verbose tests/integration
+	uv run pytest --numprocesses=$(INTEGRATION_TESTS_CONCURRENCY) -vv tests/integration

 format:
 	uv run ruff check --fix
4 changes: 2 additions & 2 deletions docs/03_concepts/code/03_dataset_exports.py
@@ -11,14 +11,14 @@ async def main() -> None:
         await dataset.export_to(
             content_type='csv',
             key='data.csv',
-            to_key_value_store_name='my-cool-key-value-store',
+            to_kvs_name='my-cool-key-value-store',

Contributor: Is this BC break worth it?

Contributor Author: Let's evaluate all the potential BCs at the end.

Contributor: Sure. I thought we were nearing that now 😁

Contributor Author: Since we're just re-exporting the storages from crawlee here, there will be many more cases than this one. I'm not saying we have to rename this particular argument (and I will undo it if you insist; I just don't like those long identifiers, especially when we can use the KVS abbreviation).

         )

         # Export the data as JSON
         await dataset.export_to(
             content_type='json',
             key='data.json',
-            to_key_value_store_name='my-cool-key-value-store',
+            to_kvs_name='my-cool-key-value-store',
         )

         # Print the exported records
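
One way to soften the backward-compatibility break discussed in the thread above would be to accept both keyword names for a transition period. The following is only a sketch: `resolve_kvs_name` is a hypothetical helper, not part of Crawlee or the Apify SDK, and the PR itself does not add any such shim.

```python
from __future__ import annotations

import warnings
from typing import Any


def resolve_kvs_name(to_kvs_name: str | None = None, **legacy_kwargs: Any) -> str | None:
    """Hypothetical helper: accept the new `to_kvs_name` and the old `to_key_value_store_name`."""
    # Pull out the legacy keyword, if the caller still uses it.
    old_value = legacy_kwargs.pop('to_key_value_store_name', None)

    # Anything else left over is a genuine mistake by the caller.
    if legacy_kwargs:
        raise TypeError(f'Unexpected keyword arguments: {sorted(legacy_kwargs)}')

    if old_value is not None:
        if to_kvs_name is not None:
            raise TypeError('Pass either to_kvs_name or to_key_value_store_name, not both.')
        # Warn, but keep working, so existing callers are not broken immediately.
        warnings.warn(
            'to_key_value_store_name is deprecated, use to_kvs_name instead.',
            DeprecationWarning,
            stacklevel=2,
        )
        return old_value

    return to_kvs_name
```

A method such as `export_to` could call a helper like this at the top of its body and drop the legacy branch in a later major release.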
4 changes: 2 additions & 2 deletions docs/03_concepts/code/conditional_actor_charge.py
@@ -6,8 +6,8 @@ async def main() -> None:
         # Check the dataset because there might already be items
         # if the run migrated or was restarted
         default_dataset = await Actor.open_dataset()
-        dataset_info = await default_dataset.get_info()
-        charged_items = dataset_info.item_count if dataset_info else 0
+        metadata = await default_dataset.get_metadata()
+        charged_items = metadata.item_count

         # highlight-start
         if Actor.get_charging_manager().get_pricing_info().is_pay_per_event:
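
The change above drops the `if dataset_info else 0` fallback, which suggests that `get_metadata()` always returns an object rather than possibly `None`. The sketch below illustrates that calling pattern with stand-in classes. `FakeDataset`, `FakeMetadata`, and `count_charged_items` are hypothetical names used only for illustration, not the real Apify or Crawlee classes.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for the metadata object; only `item_count` is modeled."""

    item_count: int


class FakeDataset:
    """Hypothetical stand-in for a dataset; not the real Apify or Crawlee class."""

    def __init__(self, item_count: int) -> None:
        self._item_count = item_count

    async def get_metadata(self) -> FakeMetadata:
        # Assumption: metadata is always available, so no None check is needed.
        return FakeMetadata(item_count=self._item_count)


async def count_charged_items(dataset: FakeDataset) -> int:
    # Mirrors the new docs snippet: read item_count directly from the metadata.
    metadata = await dataset.get_metadata()
    return metadata.item_count


if __name__ == '__main__':
    print(asyncio.run(count_charged_items(FakeDataset(item_count=3))))
```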
9 changes: 7 additions & 2 deletions pyproject.toml
@@ -34,9 +34,10 @@ keywords = [
     "scraping",
 ]
 dependencies = [
-    "apify-client>=1.11.0",
+    "apify-client>=1.12.0",
     "apify-shared>=1.3.0",
-    "crawlee~=0.6.0",
+    "cachetools>=5.5.0",
+    "crawlee@git+https://github.com/apify/crawlee-python.git@9dfac4b8afb8027979d85947f0db303f384b7158",
     "cryptography>=42.0.0",
     "httpx>=0.27.0",
     # TODO: ensure compatibility with the latest version of lazy-object-proxy
@@ -76,11 +77,15 @@ dev = [
     "respx~=0.22.0",
     "ruff~=0.12.0",
     "setuptools", # setuptools are used by pytest but not explicitly required
+    "types-cachetools>=6.0.0.20250525",
 ]

 [tool.hatch.build.targets.wheel]
 packages = ["src/apify"]

+[tool.hatch.metadata]
+allow-direct-references = true
+
 [tool.ruff]
 line-length = 120
 include = ["src/**/*.py", "tests/**/*.py", "docs/**/*.py", "website/**/*.py"]
4 changes: 2 additions & 2 deletions src/apify/_actor.py
@@ -33,8 +33,8 @@
 from apify._platform_event_manager import EventManager, LocalEventManager, PlatformEventManager
 from apify._proxy_configuration import ProxyConfiguration
 from apify._utils import docs_group, docs_name, get_system_info, is_running_in_ipython
-from apify.apify_storage_client import ApifyStorageClient
 from apify.log import _configure_logging, logger
+from apify.storage_clients import ApifyStorageClient
 from apify.storages import Dataset, KeyValueStore, RequestQueue

 if TYPE_CHECKING:
@@ -89,7 +89,7 @@ def __init__(

         # Create an instance of the cloud storage client, the local storage client is obtained
         # from the service locator.
-        self._cloud_storage_client = ApifyStorageClient.from_config(config=self._configuration)
+        self._cloud_storage_client = ApifyStorageClient()

         # Set the event manager based on whether the Actor is running on the platform or locally.
         self._event_manager = (
33 changes: 33 additions & 0 deletions src/apify/_configuration.py
@@ -140,6 +140,39 @@ class Configuration(CrawleeConfiguration):
         ),
     ] = None

+    default_dataset_id: Annotated[
+        str,
+        Field(
+            validation_alias=AliasChoices(
+                'actor_default_dataset_id',
+                'apify_default_dataset_id',
+            ),
+            description='Default dataset ID used by the Apify storage client when no ID or name is provided.',
+        ),
+    ] = 'default'
+
+    default_key_value_store_id: Annotated[
+        str,
+        Field(
+            validation_alias=AliasChoices(
+                'actor_default_key_value_store_id',
+                'apify_default_key_value_store_id',
+            ),
+            description='Default key-value store ID for the Apify storage client when no ID or name is provided.',
+        ),
+    ] = 'default'
+
+    default_request_queue_id: Annotated[
+        str,
+        Field(
+            validation_alias=AliasChoices(
+                'actor_default_request_queue_id',
+                'apify_default_request_queue_id',
+            ),
+            description='Default request queue ID for the Apify storage client when no ID or name is provided.',
+        ),
+    ] = 'default'
+
     disable_outdated_warning: Annotated[
         bool,
         Field(
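
The new fields each list two aliases and fall back to `'default'`. As a rough illustration of that lookup order, here is a standard-library sketch. It is not how pydantic resolves `AliasChoices` internally; `resolve_default_id` and `DATASET_ALIASES` are hypothetical names, and both the case-insensitive matching and the "first alias wins" rule are assumptions made for the example.

```python
from __future__ import annotations

from typing import Mapping

# Alias names copied from the diff above; the tuple itself is a hypothetical constant.
DATASET_ALIASES = ('actor_default_dataset_id', 'apify_default_dataset_id')


def resolve_default_id(
    env: Mapping[str, str],
    aliases: tuple[str, ...],
    fallback: str = 'default',
) -> str:
    """Return the value of the first alias present in `env`, else the fallback."""
    # Assumption: environment variable names are matched case-insensitively.
    lowered = {name.lower(): value for name, value in env.items()}

    # Assumption: aliases are tried in the order they are declared.
    for alias in aliases:
        value = lowered.get(alias)
        if value:
            return value

    return fallback
```

For example, passing `os.environ` as `env` would pick up `ACTOR_DEFAULT_DATASET_ID` when it is set and fall back to `'default'` otherwise.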
3 changes: 2 additions & 1 deletion src/apify/_proxy_configuration.py
@@ -21,7 +21,8 @@

 if TYPE_CHECKING:
     from apify_client import ApifyClientAsync
-    from crawlee import Request
+
+    from apify import Request

 APIFY_PROXY_VALUE_REGEX = re.compile(r'^[\w._~]+$')
 COUNTRY_CODE_REGEX = re.compile(r'^[A-Z]{2}$')
3 changes: 0 additions & 3 deletions src/apify/apify_storage_client/__init__.py

This file was deleted.

72 changes: 0 additions & 72 deletions src/apify/apify_storage_client/_apify_storage_client.py

This file was deleted.
