Add CAG validation to synthesizer.validate #2480

Merged
R-Palazzo merged 14 commits into feature/single-table-CAG from issue-2470-validation on May 14, 2025

Conversation

R-Palazzo
Contributor

CU-86b4pmjph
Resolve #2470

@R-Palazzo self-assigned this on Apr 24, 2025
@sdv-team
Contributor

codecov bot commented Apr 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.77%. Comparing base (e7fed1c) to head (5b86f37).
Report is 3 commits behind head on feature/single-table-CAG.

❗ There is a different number of reports uploaded between BASE (e7fed1c) and HEAD (5b86f37).

HEAD has 1 upload less than BASE

Flag    BASE (e7fed1c)    HEAD (5b86f37)
unit    1                 0

Additional details and impacted files
@@                      Coverage Diff                      @@
##           feature/single-table-CAG    #2480       +/-   ##
=============================================================
- Coverage                     98.55%   83.77%   -14.78%     
=============================================================
  Files                            68       68               
  Lines                          7045     7051        +6     
=============================================================
- Hits                           6943     5907     -1036     
- Misses                          102     1144     +1042     

Flag           Coverage Δ
integration    83.77% <100.00%> (+0.32%) ⬆️
unit           ?
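
The figures in the coverage diff can be reproduced from the hit and line counts it lists. The sketch below is only an arithmetic check, not Codecov's implementation: the helper name `coverage_basis_points` is hypothetical, and the choice to truncate (rather than round) to two decimals is an assumption inferred from these particular numbers. The large drop lines up with the missing `unit` upload on HEAD, which leaves only the integration flag contributing hits.

```python
def coverage_basis_points(hits, lines):
    """Return coverage in hundredths of a percent, truncated (assumed behavior)."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return hits * 10000 // lines


# Counts taken from the coverage diff in the report.
base_bp = coverage_basis_points(6943, 7045)   # BASE: hits / lines
head_bp = coverage_basis_points(5907, 7051)   # HEAD: hits / lines
delta_bp = head_bp - base_bp

# Misses are simply lines minus hits.
base_misses = 7045 - 6943
head_misses = 7051 - 5907

print(f"{base_bp / 100:.2f}% -> {head_bp / 100:.2f}% ({delta_bp / 100:+.2f}%)")
```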

Base automatically changed from issue-24XX-move-multi-table-logic to feature/single-table-CAG on April 28, 2025 18:41
@frances-h force-pushed the issue-2470-validation branch from 202aa7b to d955a9e on April 30, 2025 13:40
@pvk-developer force-pushed the feature/single-table-CAG branch from 12ec5bc to c2c3060 on April 30, 2025 17:42
@R-Palazzo force-pushed the feature/single-table-CAG branch from c2c3060 to 258e10a on May 4, 2025 08:48
@R-Palazzo force-pushed the issue-2470-validation branch 2 times, most recently from b62ec68 to 14d9936 on May 5, 2025 16:41
@R-Palazzo changed the base branch from feature/single-table-CAG to issue-2484-add-version-parameter-to-single-table-synthesizer-get-metadata on May 5, 2025 16:42
@R-Palazzo changed the base branch from issue-2484-add-version-parameter-to-single-table-synthesizer-get-metadata to feature/single-table-CAG on May 5, 2025 16:43
@R-Palazzo changed the base branch from feature/single-table-CAG to issue-2484-add-version-parameter-to-single-table-synthesizer-get-metadata on May 5, 2025 16:44
@R-Palazzo changed the base branch from issue-2484-add-version-parameter-to-single-table-synthesizer-get-metadata to issue-2484-add-version-parameter-to-single-table on May 5, 2025 16:44
Base automatically changed from issue-2484-add-version-parameter-to-single-table to feature/single-table-CAG on May 6, 2025 19:17
@R-Palazzo force-pushed the issue-2470-validation branch from 14d9936 to 6e7813e on May 7, 2025 11:02
@R-Palazzo marked this pull request as ready for review on May 7, 2025 11:57
@R-Palazzo requested a review from a team as a code owner on May 7, 2025 11:57
@R-Palazzo force-pushed the issue-2470-validation branch from 74f045e to 85b506c on May 8, 2025 16:31

@@ -775,31 +768,51 @@ def _transform_helper(self, data):

        return data

    def preprocess(self, data):
        """Transform the raw data to numerical space.
    def _validate_cags(self, data):

Contributor

Should this function just call the user-facing validate_cag function (made for synthetic data)?

def validate_cag(self, synthetic_data):

Contributor

On second thought, these two functions seem slightly different.

@R-Palazzo requested review from amontanez24 and gsheni on May 12, 2025 12:13
@frances-h force-pushed the feature/single-table-CAG branch from 9eba42f to 7347cc6 on May 12, 2025 15:16
@R-Palazzo force-pushed the issue-2470-validation branch from 157d027 to 61cb10e on May 12, 2025 16:31
@R-Palazzo force-pushed the issue-2470-validation branch from 61cb10e to c5b3e6f on May 13, 2025 12:59
Contributor

@amontanez24 left a comment

Looks great! Just one question

@@ -133,6 +134,7 @@ def __init__(self, metadata, locales=['en_US'], synthesizer_kwargs=None):

        self._initialize_models()
        self._fitted = False
        self._constraints_fitted = False

Contributor

This attribute is not used at all during sampling, right? I'm wondering whether we need to worry about backward compatibility here.

Contributor Author

That's a good point. It's not currently used for sampling, and I checked that all the tests pass on enterprise when sdv points to this branch.
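
To illustrate why backward compatibility can matter for a newly added attribute: an object restored from an older saved state would have no `_constraints_fitted` entry, because restoring from a pickle does not run `__init__`. The sketch below shows one common way to guard against that with `__setstate__`. It is only an illustration under stated assumptions: `SynthesizerSketch` is a hypothetical stand-in, and this is not claimed to be what the PR or SDV does.

```python
class SynthesizerSketch:
    """Hypothetical stand-in with the two flags seen in the diff above."""

    def __init__(self):
        self._fitted = False
        self._constraints_fitted = False

    def __setstate__(self, state):
        # State saved before the attribute existed has no entry for it,
        # so fall back to the same default that __init__ would set.
        state.setdefault('_constraints_fitted', False)
        self.__dict__.update(state)


# Simulate restoring an object saved by an older version: pickle creates
# the instance without calling __init__ and then passes the saved state
# to __setstate__.
old_state = {'_fitted': True}
restored = SynthesizerSketch.__new__(SynthesizerSketch)
restored.__setstate__(old_state)

print(restored._fitted, restored._constraints_fitted)
```

Without such a default, any later code path that reads the attribute on a restored object would raise `AttributeError`, which is the kind of breakage the question is probing for.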

@R-Palazzo requested a review from amontanez24 on May 13, 2025 18:15
@@ -233,27 +228,29 @@ def get_metadata(self, version='original'):

        return Metadata.load_from_dict(self.metadata.to_dict())

    def _transform_helper(self, data):
    def _validate_transform_constraints(self, data):

Member

If re-fitting a CAG is a possibility, I would like to suggest a force_fit argument. We would then always set it in validate, but not in preprocess.

Here is why:
Suppose I run synthesizer.fit(data) and then want to re-fit because I realized I fitted with data rather than real_data. The constraints would already have been fitted with data and would never be re-fitted.
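
A minimal sketch of the suggestion, assuming a guard flag like `_constraints_fitted` from the earlier diff. Everything else here is hypothetical and for illustration only: `ConstraintStub`, `SynthesizerSketch`, and `_fit_constraints` are invented names, and the actual change in the PR may be structured differently.

```python
class ConstraintStub:
    """Hypothetical stand-in for a CAG; records the range it was fitted on."""

    def __init__(self):
        self.fit_calls = 0
        self.low = None
        self.high = None

    def fit(self, values):
        self.fit_calls += 1
        self.low, self.high = min(values), max(values)


class SynthesizerSketch:
    """Hypothetical synthesizer showing only the constraint-fitting guard."""

    def __init__(self, constraint):
        self._constraint = constraint
        self._constraints_fitted = False

    def _fit_constraints(self, data, force_fit=False):
        # Fit when nothing has been fitted yet, or when the caller insists.
        if force_fit or not self._constraints_fitted:
            self._constraint.fit(data)
            self._constraints_fitted = True

    def validate(self, data):
        # Validation always re-fits, so it reflects the data passed in now.
        self._fit_constraints(data, force_fit=True)

    def preprocess(self, data):
        # Preprocessing reuses an earlier fit if there is one.
        self._fit_constraints(data)


constraint = ConstraintStub()
synth = SynthesizerSketch(constraint)

synth.validate([1, 2, 3])       # first fit
synth.preprocess([10, 20])      # no re-fit: constraints are already fitted
range_after_preprocess = (constraint.low, constraint.high)

synth.validate([10, 20])        # forced re-fit on the new data
range_after_validate = (constraint.low, constraint.high)
```

Without the `force_fit` path, the second call to `validate` would keep the range learned from the first dataset, which is the stale-fit scenario described above.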

Contributor Author

That's a very good point, thanks! Done in 5b86f37

@R-Palazzo requested a review from pvk-developer on May 14, 2025 09:57
@R-Palazzo merged commit 0a5f98e into feature/single-table-CAG on May 14, 2025
45 checks passed
@R-Palazzo deleted the issue-2470-validation branch on May 14, 2025 11:07
6 participants