Conversation

@Ishankoradia Ishankoradia commented Oct 15, 2025

Summary by CodeRabbit

  • Improvements
    • Sources, destinations, and connections now display in alphabetical order for easier scanning.
    • Lists load up to 100 items per page, improving navigation in large workspaces.
    • More consistent list behavior across views with predictable ordering.
    • Smoother browsing when loading long lists.
    • No action required; changes apply automatically.

coderabbitai bot commented Oct 15, 2025

Walkthrough

Added pagination (pageSize=100) and sorting (sortKey) parameters to list calls for sources, destinations, and web backend connections in airbyte_service.py, keeping existing response handling unchanged.

Changes

Cohort / File(s) Summary
Airbyte list pagination & sorting
ddpui/ddpairbyte/airbyte_service.py
Updated get_sources, get_destinations, and get_webbackend_connections to include pageSize=100 and sortKey (actorName_asc or connectionName_asc) in request payloads; response parsing unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant AirbyteService
  participant AirbyteAPI as Airbyte API

  rect rgb(240,248,255)
  note right of AirbyteService: get_sources
  Client->>AirbyteService: get_sources(workspaceId)
  AirbyteService->>AirbyteAPI: POST /sources/list {workspaceId, pageSize:100, sortKey:"actorName_asc"}
  AirbyteAPI-->>AirbyteService: {sources:[...]}
  AirbyteService-->>Client: sources or []
  end

  rect rgb(245,255,240)
  note right of AirbyteService: get_destinations
  Client->>AirbyteService: get_destinations(workspaceId)
  AirbyteService->>AirbyteAPI: POST /destinations/list {workspaceId, pageSize:100, sortKey:"actorName_asc"}
  AirbyteAPI-->>AirbyteService: {destinations:[...]}
  AirbyteService-->>Client: destinations or []
  end

  rect rgb(255,250,240)
  note right of AirbyteService: get_webbackend_connections
  Client->>AirbyteService: get_webbackend_connections(workspaceId)
  AirbyteService->>AirbyteAPI: POST /web_backend/connections/list {workspaceId, pageSize:100, sortKey:"connectionName_asc"}
  AirbyteAPI-->>AirbyteService: {connections:[...]}
  AirbyteService-->>Client: connections or []
  end
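All three request bodies in the diagram share one shape; a small helper makes that explicit (an illustrative sketch, not code from the PR):

```python
def build_list_payload(workspace_id: str, sort_key: str, page_size: int = 100) -> dict:
    """Build the request body shown in the diagram above.

    sort_key is "actorName_asc" for sources/destinations and
    "connectionName_asc" for web backend connections.
    """
    return {"workspaceId": workspace_id, "pageSize": page_size, "sortKey": sort_key}
```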

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly conveys that the pull request increases the number of items fetched per page from Airbyte, directly reflecting the pagination parameters added to the service calls.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d1d2e0b and 9be5234.

📒 Files selected for processing (1)
  • ddpui/ddpairbyte/airbyte_service.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ddpui/ddpairbyte/airbyte_service.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.10, 6)


@Ishankoradia Ishankoradia requested a review from pradnk October 15, 2025 05:16

codecov bot commented Oct 15, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 52.53%. Comparing base (b79234d) to head (9be5234).

Files with missing lines Patch % Lines
ddpui/ddpairbyte/airbyte_service.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1180   +/-   ##
=======================================
  Coverage   52.53%   52.53%           
=======================================
  Files          97       97           
  Lines       11463    11463           
=======================================
  Hits         6022     6022           
  Misses       5441     5441           

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
ddpui/ddpairbyte/airbyte_service.py (1)

668-678: Implement pagination for connection listing.

Both get_connections and get_webbackend_connections call the same connections/list endpoint without pagination, and get_webbackend_connections even carries a "# TODO: move this to paginated apis" comment. If these endpoints can return large result sets, add pagination parameters (e.g. pageSize, pageToken) to both functions.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between b79234d and d1d2e0b.

📒 Files selected for processing (1)
  • ddpui/ddpairbyte/airbyte_service.py (3 hunks)

Comment on lines +270 to +272
res = abreq(
    "sources/list", {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "actorName_asc"}
)

⚠️ Potential issue | 🟠 Major

Fixed page size can cause silent data truncation.

Setting pageSize: 100 without pagination logic means workspaces with more than 100 sources will return incomplete data. This could lead to sources being invisible in the UI or missing from operations.

Additionally, get_destinations and get_webbackend_connections have TODO comments indicating these should be moved to paginated APIs, but this function doesn't. Consider adding a similar TODO comment for consistency, or implement proper pagination that fetches all pages.

If immediate pagination implementation isn't feasible, at minimum:

  1. Add a TODO comment for tracking
  2. Document the 100-item limitation in the function docstring
  3. Consider logging a warning when exactly 100 items are returned (likely indicates truncation)
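The third mitigation could be a small guard around each list call. This is a sketch, not code from the PR; the logger name mirrors the module-level logger the surrounding code is assumed to use:

```python
import logging

logger = logging.getLogger("airbyte_service")

def warn_if_truncated(items: list, page_size: int = 100) -> list:
    """Log a warning when a list call returns exactly page_size items,
    which likely indicates server-side truncation."""
    if len(items) == page_size:
        logger.warning(
            "list call returned exactly %d items; results may be truncated",
            page_size,
        )
    return items
```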

For a complete fix, implement pagination:

 def get_sources(workspace_id: str) -> List[Dict]:
     """Fetch all sources in an airbyte workspace"""
     if not isinstance(workspace_id, str):
         raise HttpError(400, "Invalid workspace ID")
 
+    all_sources = []
+    page_size = 100
+    offset = 0
+    
+    while True:
-    res = abreq(
-        "sources/list", {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "actorName_asc"}
-    )
+        res = abreq(
+            "sources/list", 
+            {
+                "workspaceId": workspace_id, 
+                "pageSize": page_size, 
+                "sortKey": "actorName_asc",
+                "offset": offset
+            }
+        )
-    if "sources" not in res:
-        logger.error("Sources not found for workspace: %s", workspace_id)
-        raise HttpError(404, "sources not found for workspace")
-    return res
+        if "sources" not in res:
+            logger.error("Sources not found for workspace: %s", workspace_id)
+            raise HttpError(404, "sources not found for workspace")
+        
+        all_sources.extend(res["sources"])
+        
+        # Break if we got fewer items than page size (last page)
+        if len(res["sources"]) < page_size:
+            break
+            
+        offset += page_size
+    
+    return {"sources": all_sources}

Note: Verify if the Airbyte API uses offset or a different pagination mechanism (e.g., pageToken).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

res = abreq(
    "sources/list", {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "actorName_asc"}
)

def get_sources(workspace_id: str) -> List[Dict]:
    """Fetch all sources in an airbyte workspace"""
    if not isinstance(workspace_id, str):
        raise HttpError(400, "Invalid workspace ID")

    all_sources = []
    page_size = 100
    offset = 0

    while True:
        res = abreq(
            "sources/list",
            {
                "workspaceId": workspace_id,
                "pageSize": page_size,
                "sortKey": "actorName_asc",
                "offset": offset,
            },
        )
        if "sources" not in res:
            logger.error("Sources not found for workspace: %s", workspace_id)
            raise HttpError(404, "sources not found for workspace")

        all_sources.extend(res["sources"])

        # Break if we got fewer items than page size (last page)
        if len(res["sources"]) < page_size:
            break

        offset += page_size

    return {"sources": all_sources}
🤖 Prompt for AI Agents
In ddpui/ddpairbyte/airbyte_service.py around lines 270 to 272, the call to
abreq uses a fixed pageSize=100 which can silently truncate results for
workspaces with >100 sources; either implement proper pagination to iterate all
pages according to Airbyte's pagination scheme (confirm whether it uses
offset/page/limit or pageToken and accumulate results), or at minimum add a TODO
noting pagination is required, update the function docstring to document the
100-item limitation, and add a warning log when the API returns exactly 100
items (indicating likely truncation) so truncation is detectable in logs.

Comment on lines +533 to +537
# TODO: move this to paginated apis
res = abreq(
    "destinations/list",
    {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "actorName_asc"},
)

⚠️ Potential issue | 🟠 Major

Same pagination limitation applies here.

Like get_sources, this has the same critical issue where workspaces with more than 100 destinations will return incomplete data. The TODO comment acknowledges this is temporary, but ensure this is tracked in your issue tracker to prevent it from being forgotten.

Consider applying the same pagination solution as suggested for get_sources, adapting it for destinations. Also verify if the Airbyte API returns pagination metadata (like hasNext or totalCount) that could help implement robust pagination.
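A generic helper could drive all three list calls while pagination is being worked out. The sketch below assumes an offset-based scheme where only the final page is shorter than the page size; the "offset" parameter name and the abreq-style callable are assumptions to verify against the Airbyte API docs:

```python
from typing import Callable, Dict, List

def fetch_all(
    list_call: Callable[[Dict], Dict],
    base_payload: Dict,
    items_key: str,
    page_size: int = 100,
) -> List[Dict]:
    """Accumulate items across pages until a short page signals the last one.

    list_call stands in for the project's abreq helper; items_key is the
    response field holding the list ("sources", "destinations", ...).
    """
    items: List[Dict] = []
    offset = 0
    while True:
        res = list_call({**base_payload, "pageSize": page_size, "offset": offset})
        page = res.get(items_key, [])
        items.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return items
```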

Comment on lines +686 to +690
# TODO: move this to paginated apis
res = abreq(
    "web_backend/connections/list",
    {"workspaceId": workspace_id, "pageSize": 100, "sortKey": "connectionName_asc"},
)

⚠️ Potential issue | 🟠 Major

Pagination limitation in web backend connections.

This function has the same critical issue: workspaces with more than 100 connections will have incomplete data. The sort key connectionName_asc is appropriate for connections (vs actorName_asc for sources/destinations).

Note that unlike the other functions, this returns res["connections"] directly rather than the full response object. When implementing proper pagination, ensure the return type remains consistent.

🤖 Prompt for AI Agents
In ddpui/ddpairbyte/airbyte_service.py around lines 686-690, the code only
fetches the first 100 connections which drops data for workspaces with >100
connections; update this to call the "web_backend/connections/list" endpoint in
a paginated loop using pageSize (keep 100) and the pageToken/next page mechanism
returned by the API while preserving workspaceId and
sortKey="connectionName_asc", accumulate all res["connections"] across pages and
return the combined list (keep the current return type of res["connections"]);
ensure the loop stops when no next page token is returned and propagate any API
errors as before.
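The pageToken loop described in the prompt above could be sketched as follows. The "pageToken"/"nextPageToken" field names are assumptions to confirm against the Airbyte web_backend spec, and list_call again stands in for abreq; the return type matches the combined-list behavior the prompt asks to preserve:

```python
from typing import Callable, Dict, List, Optional

def fetch_all_connections(
    list_call: Callable[[Dict], Dict],
    workspace_id: str,
    page_size: int = 100,
) -> List[Dict]:
    """Accumulate connections across pages using a pageToken cursor,
    stopping when no next page token is returned."""
    connections: List[Dict] = []
    token: Optional[str] = None
    while True:
        payload: Dict = {
            "workspaceId": workspace_id,
            "pageSize": page_size,
            "sortKey": "connectionName_asc",
        }
        if token is not None:
            payload["pageToken"] = token
        res = list_call(payload)
        connections.extend(res.get("connections", []))
        token = res.get("nextPageToken")
        if token is None:
            break
    return connections
```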
