airbyte pull more in a single page #1180
Conversation
Walkthrough

Added pagination (pageSize=100) and sorting (sortKey) parameters to list calls for sources, destinations, and web backend connections in airbyte_service.py, keeping existing response handling unchanged.
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Client
  participant AirbyteService
  participant AirbyteAPI as Airbyte API
  rect rgb(240,248,255)
  note right of AirbyteService: get_sources
  Client->>AirbyteService: get_sources(workspaceId)
  AirbyteService->>AirbyteAPI: POST /sources/list {workspaceId, pageSize:100, sortKey:"actorName_asc"}
  AirbyteAPI-->>AirbyteService: {sources:[...]}
  AirbyteService-->>Client: sources or []
  end
  rect rgb(245,255,240)
  note right of AirbyteService: get_destinations
  Client->>AirbyteService: get_destinations(workspaceId)
  AirbyteService->>AirbyteAPI: POST /destinations/list {workspaceId, pageSize:100, sortKey:"actorName_asc"}
  AirbyteAPI-->>AirbyteService: {destinations:[...]}
  AirbyteService-->>Client: destinations or []
  end
  rect rgb(255,250,240)
  note right of AirbyteService: get_webbackend_connections
  Client->>AirbyteService: get_webbackend_connections(workspaceId)
  AirbyteService->>AirbyteAPI: POST /web_backend/connections/list {workspaceId, pageSize:100, sortKey:"connectionName_asc"}
  AirbyteAPI-->>AirbyteService: {connections:[...]}
  AirbyteService-->>Client: connections or []
  end
```
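The three request bodies shown in the diagram can be sketched as plain dicts. This is illustrative only: `list_payload` is a hypothetical helper (not in `airbyte_service.py`), standing in for the payloads the real `abreq` calls post to the Airbyte API.

```python
# Hypothetical helper sketching the list-call bodies described above.
# The real code builds these dicts inline and passes them to abreq().

def list_payload(workspace_id: str, sort_key: str, page_size: int = 100) -> dict:
    """Build a list-endpoint body with the new pagination/sorting params."""
    return {"workspaceId": workspace_id, "pageSize": page_size, "sortKey": sort_key}

sources_body = list_payload("ws-1", "actorName_asc")
destinations_body = list_payload("ws-1", "actorName_asc")
connections_body = list_payload("ws-1", "connectionName_asc")

print(sources_body)
# → {'workspaceId': 'ws-1', 'pageSize': 100, 'sortKey': 'actorName_asc'}
```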
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #1180   +/-   ##
=======================================
  Coverage   52.53%   52.53%
=======================================
  Files          97       97
  Lines       11463    11463
=======================================
  Hits         6022     6022
  Misses       5441     5441
```
Actionable comments posted: 3
🧹 Nitpick comments (1)
ddpui/ddpairbyte/airbyte_service.py (1)
668-678: Implement pagination for connection listing. Both `get_connections` and `get_webbackend_connections` call the same `connections/list` endpoint without pagination, and `get_webbackend_connections` even includes a `# TODO: move this to paginated apis`. If these endpoints can return large result sets, add pagination parameters (e.g. `pageSize`, `pageToken`) to both functions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- `ddpui/ddpairbyte/airbyte_service.py` (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.10, 6)
```python
res = abreq(
    "sources/list", {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "actorName_asc"}
)
```
Fixed page size can cause silent data truncation.
Setting pageSize: 100 without pagination logic means workspaces with more than 100 sources will return incomplete data. This could lead to sources being invisible in the UI or missing from operations.
Additionally, get_destinations and get_webbackend_connections have TODO comments indicating these should be moved to paginated APIs, but this function doesn't. Consider adding a similar TODO comment for consistency, or implement proper pagination that fetches all pages.
If immediate pagination implementation isn't feasible, at minimum:
- Add a TODO comment for tracking
- Document the 100-item limitation in the function docstring
- Consider logging a warning when exactly 100 items are returned (likely indicates truncation)
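A minimal sketch of that last mitigation, assuming a module-level `logger` and the existing `abreq` helper (stubbed here so the snippet is self-contained; the real one makes an HTTP call):

```python
import logging

logger = logging.getLogger("airbyte_service")
PAGE_SIZE = 100

def abreq(endpoint: str, payload: dict) -> dict:
    # Stub standing in for the real HTTP helper; returns exactly a full page.
    return {"sources": [{"sourceId": str(i)} for i in range(PAGE_SIZE)]}

def get_sources(workspace_id: str) -> dict:
    """Fetch sources; NOTE: currently limited to the first PAGE_SIZE items."""
    # TODO: move this to paginated apis
    res = abreq(
        "sources/list",
        {"workspaceId": workspace_id, "pageSize": PAGE_SIZE, "sortKey": "actorName_asc"},
    )
    if len(res.get("sources", [])) == PAGE_SIZE:
        # Exactly a full page strongly suggests the result set was truncated
        logger.warning(
            "sources/list returned %d items for workspace %s; results may be truncated",
            PAGE_SIZE,
            workspace_id,
        )
    return res
```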
For a complete fix, implement pagination:
```diff
 def get_sources(workspace_id: str) -> List[Dict]:
     """Fetch all sources in an airbyte workspace"""
     if not isinstance(workspace_id, str):
         raise HttpError(400, "Invalid workspace ID")

+    all_sources = []
+    page_size = 100
+    offset = 0
+
+    while True:
-    res = abreq(
-        "sources/list", {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "actorName_asc"}
-    )
+        res = abreq(
+            "sources/list",
+            {
+                "workspaceId": workspace_id,
+                "pageSize": page_size,
+                "sortKey": "actorName_asc",
+                "offset": offset
+            }
+        )
-    if "sources" not in res:
-        logger.error("Sources not found for workspace: %s", workspace_id)
-        raise HttpError(404, "sources not found for workspace")
-    return res
+        if "sources" not in res:
+            logger.error("Sources not found for workspace: %s", workspace_id)
+            raise HttpError(404, "sources not found for workspace")
+
+        all_sources.extend(res["sources"])
+
+        # Break if we got fewer items than page size (last page)
+        if len(res["sources"]) < page_size:
+            break
+
+        offset += page_size
+
+    return {"sources": all_sources}
```

Note: Verify if the Airbyte API uses `offset` or a different pagination mechanism (e.g., `pageToken`).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def get_sources(workspace_id: str) -> List[Dict]:
    """Fetch all sources in an airbyte workspace"""
    if not isinstance(workspace_id, str):
        raise HttpError(400, "Invalid workspace ID")

    all_sources = []
    page_size = 100
    offset = 0

    while True:
        res = abreq(
            "sources/list",
            {
                "workspaceId": workspace_id,
                "pageSize": page_size,
                "sortKey": "actorName_asc",
                "offset": offset
            }
        )

        if "sources" not in res:
            logger.error("Sources not found for workspace: %s", workspace_id)
            raise HttpError(404, "sources not found for workspace")

        all_sources.extend(res["sources"])

        # Break if we got fewer items than page size (last page)
        if len(res["sources"]) < page_size:
            break

        offset += page_size

    return {"sources": all_sources}
```
🤖 Prompt for AI Agents
In ddpui/ddpairbyte/airbyte_service.py around lines 270 to 272, the call to
abreq uses a fixed pageSize=100 which can silently truncate results for
workspaces with >100 sources; either implement proper pagination to iterate all
pages according to Airbyte's pagination scheme (confirm whether it uses
offset/page/limit or pageToken and accumulate results), or at minimum add a TODO
noting pagination is required, update the function docstring to document the
100-item limitation, and add a warning log when the API returns exactly 100
items (indicating likely truncation) so truncation is detectable in logs.
```python
# TODO: move this to paginated apis
res = abreq(
    "destinations/list",
    {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "actorName_asc"},
)
```
Same pagination limitation applies here.
Like get_sources, this has the same critical issue where workspaces with more than 100 destinations will return incomplete data. The TODO comment acknowledges this is temporary, but ensure this is tracked in your issue tracker to prevent it from being forgotten.
Consider applying the same pagination solution as suggested for get_sources, adapting it for destinations. Also verify if the Airbyte API returns pagination metadata (like hasNext or totalCount) that could help implement robust pagination.
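One way to adapt that solution is a shared offset-based helper usable by all three list functions. This is a sketch, not the repository's code: `abreq` is stubbed with a fake dataset so it runs standalone, and the `offset` parameter is an assumption about Airbyte's pagination scheme that needs verifying.

```python
from typing import Dict, List

# Stand-in dataset and stubbed HTTP helper, for illustration only
FAKE_DESTINATIONS = [{"destinationId": str(i)} for i in range(250)]

def abreq(endpoint: str, payload: dict) -> dict:
    start = payload.get("offset", 0)
    return {"destinations": FAKE_DESTINATIONS[start : start + payload["pageSize"]]}

def fetch_all(
    endpoint: str, key: str, workspace_id: str, sort_key: str, page_size: int = 100
) -> List[Dict]:
    """Accumulate every page of an Airbyte list endpoint (assumed offset-based)."""
    items: List[Dict] = []
    offset = 0
    while True:
        res = abreq(
            endpoint,
            {
                "workspaceId": workspace_id,
                "pageSize": page_size,
                "sortKey": sort_key,
                "offset": offset,
            },
        )
        page = res.get(key, [])
        items.extend(page)
        if len(page) < page_size:  # short page => last page
            return items
        offset += page_size

all_dests = fetch_all("destinations/list", "destinations", "ws-1", "actorName_asc")
print(len(all_dests))  # → 250
```

The same helper would cover `sources/list` with `key="sources"` and `web_backend/connections/list` with `key="connections"`, keeping the page-walking logic in one place.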
```python
# TODO: move this to paginated apis
res = abreq(
    "web_backend/connections/list",
    {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "connectionName_asc"},
)
```
Pagination limitation in web backend connections.
This function has the same critical issue: workspaces with more than 100 connections will have incomplete data. The sort key connectionName_asc is appropriate for connections (vs actorName_asc for sources/destinations).
Note that unlike the other functions, this returns res["connections"] directly rather than the full response object. When implementing proper pagination, ensure the return type remains consistent.
🤖 Prompt for AI Agents
In ddpui/ddpairbyte/airbyte_service.py around lines 686-690, the code only
fetches the first 100 connections which drops data for workspaces with >100
connections; update this to call the "web_backend/connections/list" endpoint in
a paginated loop using pageSize (keep 100) and the pageToken/next page mechanism
returned by the API while preserving workspaceId and
sortKey="connectionName_asc", accumulate all res["connections"] across pages and
return the combined list (keep the current return type of res["connections"]);
ensure the loop stops when no next page token is returned and propagate any API
errors as before.
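The token-based loop that prompt describes might look like the sketch below. The `pageToken`/`nextPageToken` fields are assumptions about the API's pagination contract (to be confirmed against the Airbyte docs), and `abreq` is stubbed with canned pages so the snippet runs standalone; note the function still returns the bare connections list, matching the current return type.

```python
from typing import Dict, List, Optional

# Stubbed API pages: token -> (connections, next token). Illustrative only.
_PAGES = {
    None: ([{"connectionId": str(i)} for i in range(100)], "t1"),
    "t1": ([{"connectionId": str(i)} for i in range(100, 150)], None),
}

def abreq(endpoint: str, payload: dict) -> dict:
    conns, nxt = _PAGES[payload.get("pageToken")]
    out = {"connections": conns}
    if nxt:
        out["nextPageToken"] = nxt
    return out

def get_webbackend_connections(workspace_id: str) -> List[Dict]:
    """Combine connections across all pages; same return type as before."""
    all_conns: List[Dict] = []
    token: Optional[str] = None
    while True:
        payload = {
            "workspaceId": workspace_id,
            "pageSize": 100,
            "sortKey": "connectionName_asc",
        }
        if token:
            payload["pageToken"] = token
        res = abreq("web_backend/connections/list", payload)
        all_conns.extend(res["connections"])
        token = res.get("nextPageToken")
        if not token:  # no next page token => last page
            return all_conns

print(len(get_webbackend_connections("ws-1")))  # → 150
```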