Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ggshield creating too large payloads #823

Merged
merged 2 commits into from
Jan 11, 2024
Merged

Fix ggshield creating too large payloads #823

merged 2 commits into from
Jan 11, 2024

Conversation

fnareoh
Copy link
Contributor

@fnareoh fnareoh commented Jan 3, 2024

Aims at fixing the issue with scanning large documents where chunks can end up being larger than the maximum server payload, first reported in #555.

Depends on a change to py-gitguardian, see PR here, to pass the server payload to the GGClient.

To facilitate the computation of chunk size an utf8_encoded_size property was added and calls a new method _read_content.
Some of the code is a bit clunky with if self.content is None statement even tough we just called a read method to make sure it wouldn't be None, but this way pyright doesn't complain.

I left a constant margin in the chunk size to encode the metadata and tested it with a scan of 10 000 files of 1Kb but it might be better to have it adapt depending on the number of files in the chunk ?

@fnareoh fnareoh requested a review from agateau-gg January 9, 2024 14:52
Copy link
Collaborator

@agateau-gg agateau-gg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a few minor remarks, but this looks good! I like that it moves some duplicated code from Scannable concrete classes up to Scannable itself.

I think you need to rebase your py-gitguardian branch on top of the default branch to get the latest py-gitguardian changes because other parts of GGShield depend on it.

ggshield/core/scan/scannable.py Outdated Show resolved Hide resolved
ggshield/core/scan/scannable.py Outdated Show resolved Hide resolved
ggshield/core/scan/commit_utils.py Outdated Show resolved Hide resolved
ggshield/core/scan/scannable.py Outdated Show resolved Hide resolved
ggshield/core/scan/scannable.py Outdated Show resolved Hide resolved
ggshield/verticals/secret/secret_scanner.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jan 11, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (68180d6) 91.60% compared to head (a3bab0f) 91.54%.

Files Patch % Lines
ggshield/core/scan/scannable.py 80.95% 4 Missing ⚠️
ggshield/verticals/secret/secret_scanner.py 85.71% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #823      +/-   ##
==========================================
- Coverage   91.60%   91.54%   -0.06%     
==========================================
  Files         168      168              
  Lines        6929     6941      +12     
==========================================
+ Hits         6347     6354       +7     
- Misses        582      587       +5     
Flag Coverage Δ
unittests 91.54% <86.11%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@agateau-gg agateau-gg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now!

@agateau-gg agateau-gg merged commit b8aff7d into GitGuardian:main Jan 11, 2024
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants