Copilot AI commented Aug 25, 2025

This PR implements automatic pagination functionality for the CopernicusDataSearcher class to handle large datasets that exceed the OData API's 1000 result limit per request.

Problem

Previously, the searcher could only retrieve a maximum of 1000 results per query, even when the total available results exceeded this limit. Users had no way to access the complete dataset when searches returned more than 1000 products.

Solution

Added automatic pagination that triggers when count=True and the total result count exceeds the top parameter. The implementation:

  • Detects large datasets: When @odata.count > top, automatically initiates pagination
  • Uses the OData $skip parameter: Makes sequential requests with $skip set to successive multiples of top ($skip=1000, $skip=2000, and so on with the default page size)
  • Combines all results: Merges data from all paginated requests into a single DataFrame
  • Maintains backward compatibility: Existing code continues to work unchanged
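
The trigger condition and the resulting sequence of $skip values can be sketched as below. This is an illustrative sketch only, not the code from this PR: `needs_pagination` and `skip_offsets` are hypothetical helper names, and `total` stands for the `@odata.count` value reported by the first response.

```python
from __future__ import annotations


def needs_pagination(total: int, top: int) -> bool:
    """Return True when the reported total exceeds a single page of results."""
    return total > top


def skip_offsets(total: int, top: int) -> list[int]:
    """Offsets for the follow-up requests.

    The first page (skip=0) is assumed to have been fetched already,
    so the offsets start at `top` and advance in steps of `top`.
    """
    return list(range(top, total, top))
```

With a total of 2500 and a page size of 1000, two follow-up requests are needed (offsets 1000 and 2000); with a total at or below the page size, none are.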

Usage

# Enable pagination for complete dataset retrieval
searcher = CopernicusDataSearcher()
searcher.query_by_filter(
    collection_name='SENTINEL-1',
    product_type='GRD',
    start_date='2022-05-03T00:00:00.000Z',
    end_date='2022-05-03T12:00:00.000Z',
    count=True  # Enables automatic pagination
)

# Automatically retrieves ALL results, not just first 1000
df = searcher.execute_query()  # Could return 2500+ results via multiple API calls

Implementation Details

The solution adds a helper method, _execute_paginated_query(), that:

  • Uses the existing top parameter as the page size (default: 1000)
  • Constructs paginated URLs with appropriate $skip values
  • Handles network errors by stopping and returning the partial results collected so far
  • Preserves all existing query filters and sorting

Example paginated requests generated:

GET /Products?$filter=...&$top=1000&$count=true
GET /Products?$filter=...&$top=1000&$skip=1000
GET /Products?$filter=...&$top=1000&$skip=2000
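
A minimal sketch of how such a request sequence could be produced and consumed, including the partial-results behavior on network errors. It is not the PR's `_execute_paginated_query()`: `build_page_url`, `fetch_all_pages`, and the injected `fetch_json` callable are hypothetical names, and the response shape (`value` plus `@odata.count`) is assumed from standard OData conventions.

```python
from __future__ import annotations

from typing import Any, Callable
from urllib.parse import quote


def build_page_url(base_url: str, filter_expr: str, top: int,
                   skip: int = 0, count: bool = False) -> str:
    """Build one page URL; $skip is omitted for the first page."""
    parts = [f"$filter={quote(filter_expr)}", f"$top={top}"]
    if skip:
        parts.append(f"$skip={skip}")
    if count:
        parts.append("$count=true")
    return base_url + "?" + "&".join(parts)


def fetch_all_pages(fetch_json: Callable[[str], dict[str, Any]],
                    base_url: str, filter_expr: str,
                    top: int = 1000) -> list[Any]:
    """Fetch the first page with $count=true, then follow up with $skip pages.

    `fetch_json` is any callable that takes a URL and returns the decoded
    JSON body (for example a thin wrapper around an HTTP client).
    """
    first = fetch_json(build_page_url(base_url, filter_expr, top, count=True))
    results = list(first.get("value", []))
    total = first.get("@odata.count", len(results))

    for skip in range(top, total, top):
        try:
            page = fetch_json(build_page_url(base_url, filter_expr, top, skip=skip))
        except OSError:
            # Network-level failure: stop and keep what was collected so far.
            break
        results.extend(page.get("value", []))
    return results
```

The returned list of records could then be passed to `pandas.DataFrame(...)` to obtain a single combined DataFrame. Catching `OSError` is a design choice of this sketch; it covers the built-in `ConnectionError` and the network exceptions of common HTTP clients that derive from it.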

Testing

Added a test suite covering:

  • Pagination triggering logic
  • Correct $skip parameter usage
  • Backward compatibility preservation
  • Error handling during pagination
  • Large dataset scenarios (2500+ results)

All existing functionality remains unchanged when count=False (default behavior).


Copilot AI changed the title from "[WIP] Coming soon: Implement a count option for search results more than 1000. In case the count is more than 1000, you will have to lunch more queries with the pagin mechanism: paginating through the results like here:: https://catalogue.dataspace.coperni..." to "Implement automatic pagination for search results exceeding 1000 items" Aug 25, 2025
Copilot AI requested a review from sirbastiano August 25, 2025 14:03
Copilot finished work on behalf of sirbastiano August 25, 2025 14:03