Fix ggshield creating too large payloads #823
Conversation
Made a few minor remarks, but this looks good! I like that it moves some duplicated code from Scannable concrete classes up to Scannable itself.
I think you need to rebase your py-gitguardian branch on top of its default branch to pick up the latest py-gitguardian changes, because other parts of GGShield depend on them.
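For illustration, here is a minimal sketch of that kind of lift; `is_longer_than` and the docstrings are assumptions, not necessarily the exact helpers this PR touches:

```python
from abc import ABC, abstractmethod


class Scannable(ABC):
    """Base class for anything ggshield can scan."""

    @property
    @abstractmethod
    def content(self) -> str:
        """Decoded text content, loaded lazily by concrete subclasses."""
        ...

    # Defined once here instead of being repeated in every concrete
    # subclass (for example a file-backed and a string-backed scannable).
    def is_longer_than(self, max_utf8_size: int) -> bool:
        """True if the UTF-8 encoded content exceeds max_utf8_size bytes."""
        return len(self.content.encode("utf-8")) > max_utf8_size
```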
Codecov Report

|          | main   | #823   | +/-    |
|----------|--------|--------|--------|
| Coverage | 91.60% | 91.54% | -0.06% |
| Files    | 168    | 168    |        |
| Lines    | 6929   | 6941   | +12    |
| Hits     | 6347   | 6354   | +7     |
| Misses   | 582    | 587    | +5     |

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Also adds some abstract properties to Scannable.
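A sketch of what such abstract properties can look like; the property names here are assumptions, only the pattern is the point:

```python
from abc import ABC, abstractmethod


class Scannable(ABC):
    # Stacking @property on top of @abstractmethod makes every concrete
    # subclass expose these as read-only attributes.
    @property
    @abstractmethod
    def filename(self) -> str:
        ...

    @property
    @abstractmethod
    def url(self) -> str:
        ...
```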
…ng-too-large-payloads
Looks good now!
Aims at fixing the issue, first reported in #555, where scanning large documents can produce chunks larger than the maximum server payload size.
Depends on a change to py-gitguardian (see the linked PR) to pass the maximum payload size to the GGClient.
To facilitate computing chunk sizes, a utf8_encoded_size property was added; it calls a new _read_content method, as sketched below.
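A rough illustration of that property (a sketch only; the class layout and names may differ from the real code). It builds on the `Scannable` base class sketched above:

```python
from typing import Optional


class File(Scannable):
    """A file-backed scannable; the details here are assumptions."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._content: Optional[str] = None
        self._utf8_size: Optional[int] = None

    def _read_content(self) -> None:
        # Decode the file once, caching both the text and its UTF-8 size.
        if self._content is None:
            with open(self._path, "rb") as f:
                data = f.read()
            self._content = data.decode("utf-8", errors="replace")
            self._utf8_size = len(self._content.encode("utf-8"))

    @property
    def content(self) -> str:
        if self._content is None:
            self._read_content()
        # _read_content() always sets the content, but pyright cannot
        # infer that, hence the extra None check described as clunky below.
        if self._content is None:
            raise ValueError(f"Cannot read content of {self._path}")
        return self._content

    @property
    def utf8_encoded_size(self) -> int:
        if self._utf8_size is None:
            self._read_content()
        if self._utf8_size is None:
            raise ValueError(f"Cannot compute UTF-8 size of {self._path}")
        return self._utf8_size
```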
Some of the code is a bit clunky, with an `if self.content is None` check even though we just called a read method to make sure it wouldn't be None, but this way pyright doesn't complain.
I left a constant margin in the chunk size to account for the encoded metadata and tested it with a scan of 10,000 files of 1 KB each, but maybe it would be better to have the margin adapt to the number of files in the chunk?
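A minimal sketch of the chunking logic with a constant margin, reusing the `Scannable` type from the sketches above. The limit and margin values are assumptions; in the real code the payload limit would come from the server via py-gitguardian:

```python
from typing import Iterable, List

MAX_PAYLOAD_SIZE = 1024 * 1024  # assumption: the real limit comes from the server
METADATA_MARGIN = 4096          # constant margin reserved for encoded metadata


def make_chunks(files: Iterable[Scannable]) -> List[List[Scannable]]:
    """Group scannables so each chunk's encoded size stays under the limit."""
    limit = MAX_PAYLOAD_SIZE - METADATA_MARGIN
    chunks: List[List[Scannable]] = []
    current: List[Scannable] = []
    current_size = 0
    for scannable in files:
        size = scannable.utf8_encoded_size
        # Start a new chunk when adding this document would overflow.
        # An oversized single document still gets a chunk of its own.
        if current and current_size + size > limit:
            chunks.append(current)
            current, current_size = [], 0
        current.append(scannable)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

The adaptive alternative raised in the question above would replace the constant `METADATA_MARGIN` with a per-document overhead added to `size` inside the loop, so the reserved space grows with the number of files in the chunk.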