[FEAT] Profiling: array size #2600

Open
samnlindsay opened this issue Feb 4, 2025 · 0 comments
Labels: enhancement (New feature or request), profiling

@samnlindsay
Contributor

Is your proposal related to a problem?

When working with array columns, it is useful to know the distribution of the number of elements in those arrays. If they are almost always single-valued, it may be worth reconsidering whether an array is necessary at all. If they can grow very large, this should be flagged before developing a model in which such large arrays could cause problems.
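A minimal sketch of the summary this would need, assuming the column is available as an iterable of Python lists (with `None` for nulls). The helper name `array_size_profile` and both thresholds (`single_share_threshold=0.95`, `large_threshold=50`) are hypothetical choices for illustration, not part of Splink's existing API.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def array_size_profile(
    values: Iterable[Optional[Sequence]],
    single_share_threshold: float = 0.95,  # hypothetical cut-off
    large_threshold: int = 50,  # hypothetical cut-off
) -> dict:
    """Summarise the distribution of array lengths in one column.

    Null arrays are counted separately rather than as length 0.
    """
    sizes = Counter()
    null_count = 0
    for v in values:
        if v is None:
            null_count += 1
        else:
            sizes[len(v)] += 1

    non_null = sum(sizes.values())
    # Counter returns 0 for a missing key without inserting it
    single_share = sizes[1] / non_null if non_null else 0.0
    max_size = max(sizes) if sizes else 0
    return {
        "size_counts": dict(sorted(sizes.items())),
        "null_count": null_count,
        "share_single_valued": single_share,
        "max_size": max_size,
        # Almost always one element: maybe an array is not needed
        "mostly_single_valued": single_share >= single_share_threshold,
        # Some arrays exceed the threshold: may be problematic downstream
        "has_large_arrays": max_size > large_threshold,
    }


# Usage on a toy column
column = [["a"], ["b"], None, ["c", "d"], ["e"]]
profile = array_size_profile(column, large_threshold=1)
```

Keeping nulls out of the size counts avoids conflating "no array" with "empty array", which a chart would likely want to show separately.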

Describe the solution you'd like

A prototype chart was generated in https://github.com/moj-analytical-services/data_linking/pull/795


Describe alternatives you've considered

Additional context

Related to #1064 / #1397: do we profile array contents at the moment? If not, a combined array profiling chart (i.e. size distribution plus value distribution for each array column) would be useful.
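One way the combined profile could be assembled, sketched under the assumption that records arrive as a list of dicts and array elements are hashable. `profile_array_columns`, `top_n` and the example column names are hypothetical; this is not how Splink currently profiles columns.

```python
from collections import Counter
from typing import Any, Dict, List, Mapping, Optional, Sequence


def profile_array_columns(
    rows: List[Mapping[str, Optional[Sequence[Any]]]],
    array_columns: List[str],
    top_n: int = 5,
) -> Dict[str, Dict[str, Any]]:
    """For each array column, return both the array-size distribution
    and the most common element values (counted after flattening)."""
    result = {}
    for col in array_columns:
        size_counts = Counter()
        value_counts = Counter()
        for row in rows:
            arr = row.get(col)
            if arr is None:  # skip null arrays entirely
                continue
            size_counts[len(arr)] += 1
            value_counts.update(arr)  # count each element value
        result[col] = {
            "size_distribution": dict(sorted(size_counts.items())),
            "top_values": value_counts.most_common(top_n),
        }
    return result


# Usage on toy records
rows = [
    {"first_names": ["john", "james"], "postcodes": ["AB1"]},
    {"first_names": ["john"], "postcodes": None},
    {"first_names": [], "postcodes": ["AB1", "CD2"]},
]
profile = profile_array_columns(rows, ["first_names", "postcodes"], top_n=2)
```

Computing both distributions in one pass per column keeps the size and value views consistent, so they could be drawn side by side in a single chart.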
