Ability to data profile node outputs for creating data quality checks

<a href="https://github.com/skrawcz"><img src="https://avatars.githubusercontent.com/u/2328071?v=4" align="left" width="96" height="96" hspace="10"></img></a> **Issue by [skrawcz](https://github.com/skrawcz)**
_Tuesday Aug 02, 2022 at 17:00 GMT_
_Originally opened as https://github.com/stitchfix/hamilton/issues/165_

----

**Is your feature request related to a problem? Please describe.**
Data profiling helps bootstrap the creation of data quality checks.
It also facilitates data exploration by providing summary statistics over the data.

**Describe the solution you'd like**
A user should be able to profile their DAG, or a set of nodes, and get out some summary statistics.
Those statistics could then be used to bootstrap data quality checks, e.g. `check_output()` decorators, but the profiling output should also be usable on its own.
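
To make the idea concrete, here is a rough sketch of what profiling a set of node outputs might look like. It is not an existing Hamilton API: `profile_series`, `suggest_checks`, and the `node_outputs` dict are hypothetical names, and the mapping from the suggested settings to actual `check_output()` arguments is an assumption that would need to be worked out.

```python
import pandas as pd


def profile_series(s: pd.Series) -> dict:
    """Hypothetical helper: summary statistics for one numeric node output."""
    return {
        "dtype": str(s.dtype),
        "count": int(s.count()),  # number of non-null values
        "null_fraction": float(s.isna().mean()),
        "min": float(s.min()),
        "max": float(s.max()),
        "mean": float(s.mean()),
    }


def suggest_checks(profile: dict) -> dict:
    """Hypothetical helper: derive candidate check settings from a profile.

    The keys are illustrative; they are assumed, not confirmed, to line up
    with whatever the data quality decorator accepts.
    """
    return {
        "range": (profile["min"], profile["max"]),
        "allow_nans": profile["null_fraction"] > 0,
    }


# Stand-in for the outputs of a few executed nodes, keyed by node name.
node_outputs = {"spend": pd.Series([10.0, 20.0, None, 40.0])}

# The profiles are standalone; the suggestions are an optional second step.
profiles = {name: profile_series(s) for name, s in node_outputs.items()}
suggestions = {name: suggest_checks(p) for name, p in profiles.items()}
```

Keeping `profiles` separate from `suggestions` reflects the requirement that the statistics be useful on their own, with check generation layered on top.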

**Describe alternatives you've considered**
Haven't considered many options, but a few libraries already do data profiling.

**Additional context**
Systems like whylogs and Great Expectations use profiling to improve the user experience.
Standalone libraries like https://github.com/capitalone/DataProfiler also exist.

https://github.com/stitchfix/hamilton/pull/149 does a little to prototype in this area too.

