vdusek
Contributor

@vdusek commented Sep 15, 2025

Description

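  • Implement support for NDUs (non-default unnamed storages) in the Apify storage client. An NDU is opened by alias, e.g. Actor.open_dataset(alias='my-alias-dataset'). The alias-to-ID mapping is persisted in the run's default key-value store under the __STORAGE_ALIASES_MAPPING key, so the same storage is reopened after a reboot.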
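The resolution step can be pictured with a minimal in-memory sketch. It is modeled only on the behavior visible in the manual test further down and is not the SDK's implementation: the first open of an alias creates an unnamed storage and records its ID in the default key-value store, and every later open (including one after a reboot, which keeps the default key-value store) reuses the recorded ID. FakePlatform, create_unnamed_dataset and open_dataset_by_alias are hypothetical names, and the deterministic IDs are a simplification.

```python
from dataclasses import dataclass, field
from typing import Any

# Key observed in the default key-value store of the manual test run.
ALIASES_MAPPING_KEY = '__STORAGE_ALIASES_MAPPING'


@dataclass
class FakePlatform:
    """Hypothetical in-memory stand-in for the platform's storage API."""

    default_kvs: dict[str, Any] = field(default_factory=dict)
    datasets: dict[str, list[dict[str, Any]]] = field(default_factory=dict)

    def create_unnamed_dataset(self) -> str:
        # Deterministic fake IDs; the real platform generates random ones.
        dataset_id = f'dataset-{len(self.datasets) + 1}'
        self.datasets[dataset_id] = []
        return dataset_id


def open_dataset_by_alias(platform: FakePlatform, alias: str) -> str:
    """Return the ID of the unnamed dataset bound to `alias`, creating it on first use."""
    mapping = platform.default_kvs.setdefault(ALIASES_MAPPING_KEY, {})
    # Key format taken from the observed mapping, e.g. 'alias-dataset-my-alias-dataset'.
    key = f'alias-dataset-{alias}'
    if key not in mapping:
        mapping[key] = platform.create_unnamed_dataset()
    return mapping[key]


platform = FakePlatform()
first = open_dataset_by_alias(platform, 'my-alias-dataset')
# Opening the same alias again (e.g. after a reboot) reuses the stored ID.
again = open_dataset_by_alias(platform, 'my-alias-dataset')
other = open_dataset_by_alias(platform, 'my-alias-dataset-2')
print(platform.default_kvs[ALIASES_MAPPING_KEY])
# prints {'alias-dataset-my-alias-dataset': 'dataset-1', 'alias-dataset-my-alias-dataset-2': 'dataset-2'}
```

Keeping the mapping in the default key-value store is what ties an unnamed storage to a run: it survives reboots together with the run, unlike a named storage, which is shared by name across runs.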

Issues

Testing

  • New integration tests were implemented.

Checklist

  • CI passed

Manual testing Actor

import asyncio

from apify import Actor


# Environment keys identifying the run and its default storages.
ENV_KEYS = (
    'id',
    'build_id',
    'default_dataset_id',
    'default_key_value_store_id',
    'default_request_queue_id',
)


def log_env() -> None:
    """Log the run ID, the build ID, and the IDs of the run's default storages."""
    env = Actor.get_env()
    env_dict = {key: env[key] for key in ENV_KEYS}
    Actor.log.info(f'Environment variables: {env_dict}')


async def main() -> None:
    async with Actor:
        # Run counter kept in the default key-value store, which is preserved across reboots.
        cnt = await Actor.get_value('cnt', 0)
        cnt += 1

        Actor.log.info('Actor is running for the %d time', cnt)
        log_env()

        # Open the default dataset, two unnamed datasets identified by alias, and one named dataset.
        dataset_default = await Actor.open_dataset(force_cloud=True)
        dataset_alias = await Actor.open_dataset(force_cloud=True, alias='my-alias-dataset')
        dataset_alias_2 = await Actor.open_dataset(force_cloud=True, alias='my-alias-dataset-2')
        dataset_named = await Actor.open_dataset(force_cloud=True, name='my-named-dataset')

        Actor.log.info(f'dataset default ID: {dataset_default.id}')
        Actor.log.info(f'dataset alias ID: {dataset_alias.id}')
        Actor.log.info(f'dataset alias 2 ID: {dataset_alias_2.id}')
        Actor.log.info(f'dataset named ID: {dataset_named.id}')

        # Push one item into each dataset on every (re)start.
        await dataset_default.push_data({'data': 'default'})
        await dataset_alias.push_data({'data': 'alias'})
        await dataset_alias_2.push_data({'data': 'alias 2'})
        await dataset_named.push_data({'data': 'named'})

        # Give the platform a few seconds before reading the items back.
        await asyncio.sleep(3)

        dataset_items_default = await dataset_default.list_items()
        dataset_items_alias = await dataset_alias.list_items()
        dataset_items_alias_2 = await dataset_alias_2.list_items()
        dataset_items_named = await dataset_named.list_items()

        Actor.log.info(f'Default dataset items: {dataset_items_default}')
        Actor.log.info(f'Alias dataset items: {dataset_items_alias}')
        Actor.log.info(f'Alias 2 dataset items: {dataset_items_alias_2}')
        Actor.log.info(f'Named dataset items: {dataset_items_named}')

        # Reboot twice, so the Actor starts three times within the same run.
        if cnt < 3:
            await Actor.set_value('cnt', cnt)
            await Actor.reboot()

        Actor.log.info('Actor is finishing...')
        await asyncio.sleep(3)

        # The default storage IDs should match those logged at the first start.
        log_env()


if __name__ == '__main__':
    asyncio.run(main())

Log:

2025-09-16T08:15:29.454Z ACTOR: Pulling Docker image of build Cs6vcRruiN3XWMBde from registry.
2025-09-16T08:15:31.429Z ACTOR: Creating Docker container.
2025-09-16T08:15:31.614Z ACTOR: Starting Docker container.
2025-09-16T08:15:32.780Z Actor is running on the Apify platform, `disable_browser_sandbox` was changed to True.
2025-09-16T08:15:32.783Z [apify] INFO  Initializing Actor...
2025-09-16T08:15:32.788Z [apify] INFO  System info ({"apify_sdk_version": "2.7.1", "apify_client_version": "2.1.0", "crawlee_version": "0.6.13b37", "python_version": "3.13.7", "os": "linux"})
2025-09-16T08:15:32.919Z [apify] INFO  Actor is running for the 1 time
2025-09-16T08:15:32.921Z [apify] INFO  Environment variables: {'id': 'yFiEdI2cQnAwgWuWL', 'build_id': 'Cs6vcRruiN3XWMBde', 'default_dataset_id': 'dzFyI0aGwQGby34fi', 'default_key_value_store_id': '2IMIBuOc6j7OJnhf0', 'default_request_queue_id': 'e498h6IN2aTatWSoN'}
2025-09-16T08:15:33.509Z [apify] INFO  dataset default ID: dzFyI0aGwQGby34fi
2025-09-16T08:15:33.511Z [apify] INFO  dataset alias ID: f7fgsLCbw2wsQ46pa
2025-09-16T08:15:33.512Z [apify] INFO  dataset alias 2 ID: tee4ve0yVg8VkTf5U
2025-09-16T08:15:33.514Z [apify] INFO  dataset named ID: 5derRGi9fgpeknbaH
2025-09-16T08:15:37.086Z [apify] INFO  Default dataset items: [{'data': 'default'}]
2025-09-16T08:15:37.087Z [apify] INFO  Alias dataset items: [{'data': 'alias'}]
2025-09-16T08:15:37.089Z [apify] INFO  Alias 2 dataset items: [{'data': 'alias 2'}]
2025-09-16T08:15:37.091Z [apify] INFO  Named dataset items: [{'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, ... [line-too-long]
2025-09-16T08:15:37.190Z ACTOR: Actor run will reboot.
2025-09-16T08:15:37.192Z ACTOR: Sending Docker container SIGTERM signal.
2025-09-16T08:15:37.221Z ACTOR: Run was rebooted.
2025-09-16T08:15:37.222Z ACTOR: Pulling Docker image of build Cs6vcRruiN3XWMBde from registry.
2025-09-16T08:15:37.224Z ACTOR: Creating Docker container.
2025-09-16T08:15:37.368Z ACTOR: Starting Docker container.
2025-09-16T08:15:38.375Z Actor is running on the Apify platform, `disable_browser_sandbox` was changed to True.
2025-09-16T08:15:38.377Z [apify] INFO  Initializing Actor...
2025-09-16T08:15:38.380Z [apify] INFO  System info ({"apify_sdk_version": "2.7.1", "apify_client_version": "2.1.0", "crawlee_version": "0.6.13b37", "python_version": "3.13.7", "os": "linux"})
2025-09-16T08:15:38.504Z [apify] INFO  Actor is running for the 2 time
2025-09-16T08:15:38.506Z [apify] INFO  Environment variables: {'id': 'yFiEdI2cQnAwgWuWL', 'build_id': 'Cs6vcRruiN3XWMBde', 'default_dataset_id': 'dzFyI0aGwQGby34fi', 'default_key_value_store_id': '2IMIBuOc6j7OJnhf0', 'default_request_queue_id': 'e498h6IN2aTatWSoN'}
2025-09-16T08:15:39.152Z [apify] INFO  dataset default ID: dzFyI0aGwQGby34fi
2025-09-16T08:15:39.154Z [apify] INFO  dataset alias ID: f7fgsLCbw2wsQ46pa
2025-09-16T08:15:39.156Z [apify] INFO  dataset alias 2 ID: tee4ve0yVg8VkTf5U
2025-09-16T08:15:39.158Z [apify] INFO  dataset named ID: 5derRGi9fgpeknbaH
2025-09-16T08:15:42.680Z [apify] INFO  Default dataset items: [{'data': 'default'}, {'data': 'default'}]
2025-09-16T08:15:42.682Z [apify] INFO  Alias dataset items: [{'data': 'alias'}, {'data': 'alias'}]
2025-09-16T08:15:42.684Z [apify] INFO  Alias 2 dataset items: [{'data': 'alias 2'}, {'data': 'alias 2'}]
2025-09-16T08:15:42.686Z [apify] INFO  Named dataset items: [{'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, ... [line-too-long]
2025-09-16T08:15:42.788Z ACTOR: Actor run will reboot.
2025-09-16T08:15:42.790Z ACTOR: Sending Docker container SIGTERM signal.
2025-09-16T08:15:42.811Z ACTOR: Run was rebooted.
2025-09-16T08:15:42.813Z ACTOR: Pulling Docker image of build Cs6vcRruiN3XWMBde from registry.
2025-09-16T08:15:42.815Z ACTOR: Creating Docker container.
2025-09-16T08:15:42.890Z ACTOR: Starting Docker container.
2025-09-16T08:15:44.101Z Actor is running on the Apify platform, `disable_browser_sandbox` was changed to True.
2025-09-16T08:15:44.108Z [apify] INFO  Initializing Actor...
2025-09-16T08:15:44.110Z [apify] INFO  System info ({"apify_sdk_version": "2.7.1", "apify_client_version": "2.1.0", "crawlee_version": "0.6.13b37", "python_version": "3.13.7", "os": "linux"})
2025-09-16T08:15:44.212Z [apify] INFO  Actor is running for the 3 time
2025-09-16T08:15:44.214Z [apify] INFO  Environment variables: {'id': 'yFiEdI2cQnAwgWuWL', 'build_id': 'Cs6vcRruiN3XWMBde', 'default_dataset_id': 'dzFyI0aGwQGby34fi', 'default_key_value_store_id': '2IMIBuOc6j7OJnhf0', 'default_request_queue_id': 'e498h6IN2aTatWSoN'}
2025-09-16T08:15:44.535Z [apify] INFO  dataset default ID: dzFyI0aGwQGby34fi
2025-09-16T08:15:44.537Z [apify] INFO  dataset alias ID: f7fgsLCbw2wsQ46pa
2025-09-16T08:15:44.539Z [apify] INFO  dataset alias 2 ID: tee4ve0yVg8VkTf5U
2025-09-16T08:15:44.541Z [apify] INFO  dataset named ID: 5derRGi9fgpeknbaH
2025-09-16T08:15:48.067Z [apify] INFO  Default dataset items: [{'data': 'default'}, {'data': 'default'}, {'data': 'default'}]
2025-09-16T08:15:48.069Z [apify] INFO  Alias dataset items: [{'data': 'alias'}, {'data': 'alias'}, {'data': 'alias'}]
2025-09-16T08:15:48.071Z [apify] INFO  Alias 2 dataset items: [{'data': 'alias 2'}, {'data': 'alias 2'}, {'data': 'alias 2'}]
2025-09-16T08:15:48.073Z [apify] INFO  Named dataset items: [{'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, {'data': 'named'}, ... [line-too-long]
2025-09-16T08:15:48.075Z [apify] INFO  Actor is finishing...
2025-09-16T08:15:51.068Z [apify] INFO  Environment variables: {'id': 'yFiEdI2cQnAwgWuWL', 'build_id': 'Cs6vcRruiN3XWMBde', 'default_dataset_id': 'dzFyI0aGwQGby34fi', 'default_key_value_store_id': '2IMIBuOc6j7OJnhf0', 'default_request_queue_id': 'e498h6IN2aTatWSoN'}
2025-09-16T08:15:51.070Z [apify] INFO  Exiting Actor ({"exit_code": 0})
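The key observation in the log is that every dataset, including both alias datasets, kept the same ID across the two reboots. As a small illustrative check (a hypothetical helper, not part of the PR), the "dataset ... ID" lines can be copied from the log and grouped by label; a stable storage shows exactly one ID per label.

```python
import re
from collections import defaultdict

# The "dataset ... ID" lines from the three starts in the log above.
LOG_LINES = """\
2025-09-16T08:15:33.509Z [apify] INFO  dataset default ID: dzFyI0aGwQGby34fi
2025-09-16T08:15:33.511Z [apify] INFO  dataset alias ID: f7fgsLCbw2wsQ46pa
2025-09-16T08:15:33.512Z [apify] INFO  dataset alias 2 ID: tee4ve0yVg8VkTf5U
2025-09-16T08:15:33.514Z [apify] INFO  dataset named ID: 5derRGi9fgpeknbaH
2025-09-16T08:15:39.152Z [apify] INFO  dataset default ID: dzFyI0aGwQGby34fi
2025-09-16T08:15:39.154Z [apify] INFO  dataset alias ID: f7fgsLCbw2wsQ46pa
2025-09-16T08:15:39.156Z [apify] INFO  dataset alias 2 ID: tee4ve0yVg8VkTf5U
2025-09-16T08:15:39.158Z [apify] INFO  dataset named ID: 5derRGi9fgpeknbaH
2025-09-16T08:15:44.535Z [apify] INFO  dataset default ID: dzFyI0aGwQGby34fi
2025-09-16T08:15:44.537Z [apify] INFO  dataset alias ID: f7fgsLCbw2wsQ46pa
2025-09-16T08:15:44.539Z [apify] INFO  dataset alias 2 ID: tee4ve0yVg8VkTf5U
2025-09-16T08:15:44.541Z [apify] INFO  dataset named ID: 5derRGi9fgpeknbaH
""".splitlines()

# Matches e.g. "INFO  dataset alias 2 ID: tee4ve0yVg8VkTf5U".
ID_LINE = re.compile(r'INFO\s+dataset (?P<label>.+?) ID: (?P<id>\w+)$')


def ids_per_label(lines: list[str]) -> dict[str, set[str]]:
    """Collect every logged dataset ID, grouped by the dataset's label."""
    seen: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        match = ID_LINE.search(line)
        if match:
            seen[match['label']].add(match['id'])
    return dict(seen)


ids = ids_per_label(LOG_LINES)
# True when each label was logged with a single ID across all starts.
stable = all(len(label_ids) == 1 for label_ids in ids.values())
print(stable)  # prints True
```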

Content of the default KVS under the __STORAGE_ALIASES_MAPPING key:

{
  "alias-dataset-my-alias-dataset": "f7fgsLCbw2wsQ46pa",
  "alias-dataset-my-alias-dataset-2": "tee4ve0yVg8VkTf5U"
}
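To see which unnamed storage each alias resolved to, the mapping can be filtered by storage type. The sketch below is a hypothetical helper that assumes keys follow the alias-<storage type>-<alias> pattern. Only dataset entries appear in this output, so the type labels used for key-value stores and request queues are not confirmed here. In a live Actor the value could presumably be obtained with Actor.get_value('__STORAGE_ALIASES_MAPPING'); here it is copied from the output above.

```python
# Value of __STORAGE_ALIASES_MAPPING from the manual test run above.
MAPPING = {
    'alias-dataset-my-alias-dataset': 'f7fgsLCbw2wsQ46pa',
    'alias-dataset-my-alias-dataset-2': 'tee4ve0yVg8VkTf5U',
}


def aliases_for(mapping: dict[str, str], storage_type: str) -> dict[str, str]:
    """Return {alias: storage ID} for one storage type, assuming 'alias-<type>-<alias>' keys."""
    prefix = f'alias-{storage_type}-'
    return {
        key.removeprefix(prefix): storage_id
        for key, storage_id in mapping.items()
        if key.startswith(prefix)
    }


print(aliases_for(MAPPING, 'dataset'))
# prints {'my-alias-dataset': 'f7fgsLCbw2wsQ46pa', 'my-alias-dataset-2': 'tee4ve0yVg8VkTf5U'}
```

The IDs match the "dataset alias ID" and "dataset alias 2 ID" lines in the log, which confirms that both aliases were resolved through this mapping.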

@vdusek added this to the 123rd sprint - Tooling team milestone Sep 15, 2025
@vdusek self-assigned this Sep 15, 2025
@vdusek added the t-tooling label Sep 15, 2025
@github-actions bot added the tested label Sep 15, 2025
@vdusek added the adhoc and enhancement labels Sep 15, 2025
@vdusek force-pushed the add-support-for-ndu branch from 759052c to ed78ac5 on September 15, 2025 12:31
Contributor

@Pijukatel left a comment


Caching will be updated in #576 after this is merged: apify/crawlee-python#1386

@vdusek force-pushed the add-support-for-ndu branch from 698364c to 1008fa5 on September 16, 2025 07:58
@vdusek merged commit 8721ef5 into master Sep 16, 2025
44 of 46 checks passed
@vdusek deleted the add-support-for-ndu branch September 16, 2025 08:59
Labels
  • adhoc: Ad-hoc unplanned task added during the sprint.
  • enhancement: New feature or request.
  • t-tooling: Issues with this label are in the ownership of the tooling team.
  • tested: Temporary label used only programmatically for some analytics.
Development

Successfully merging this pull request may close these issues.

Add support for non-default unnamed storages
2 participants