DataScan count method does not respect limit #2121

@jayceslesar

Description

Apache Iceberg version

0.9.1 (latest release)

Please describe the bug 🐞

When calling count() on a DataScan, the limit is not respected. This may seem trivial, but if I set a limit of 5, I expect a count of 5 or fewer rows back, at least with a scan-like implementation.

The underlying ArrowScan is not passed the limit parameter:

https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1940

As a result, the count takes longer than necessary, because the scan does not stop once the limit is reached.

The fix will involve more than just passing the limit to the ArrowScan
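To illustrate the behavior I would expect, here is a minimal sketch in plain Python. It does not use PyIceberg at all: `FileTask` and `count_with_limit` are hypothetical names made up for this example, and it assumes the count is built up by summing per-file row counts over the planned tasks. The idea is only that the running total is capped at the limit and that no further tasks are read once the limit is reached.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FileTask:
    """Hypothetical stand-in for a planned scan task (not a PyIceberg class)."""

    record_count: int


def count_with_limit(tasks: Iterable[FileTask], limit: Optional[int] = None) -> int:
    """Sum row counts across tasks, capping the result at `limit`.

    Stops consuming tasks as soon as the running total reaches the limit,
    so later files are never touched.
    """
    if limit is not None and limit <= 0:
        return 0

    total = 0
    for task in tasks:
        total += task.record_count
        if limit is not None and total >= limit:
            # Limit reached: cap the result and skip the remaining tasks.
            return limit
    return total
```

With three files of 3, 4, and 10 rows and a limit of 5, this returns 5 after looking at only the first two files. Without a limit it returns the full 17.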

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
