Fix ggshield creating too large payloads #823
Conversation
Made a few minor remarks, but this looks good! I like that it moves some duplicated code from Scannable concrete classes up to Scannable itself.
I think you need to rebase your py-gitguardian branch on top of its default branch to pick up the latest py-gitguardian changes, because other parts of GGShield depend on them.
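For illustration, here is a minimal sketch of that kind of lift; `is_longer_than` and the docstrings are assumptions, not necessarily the exact helpers this PR touches:

```python
from abc import ABC, abstractmethod


class Scannable(ABC):
    """Base class for anything ggshield can scan."""

    @property
    @abstractmethod
    def content(self) -> str:
        """Decoded text content, loaded lazily by concrete subclasses."""
        ...

    # Defined once here instead of being repeated in every concrete
    # subclass (for example a file-backed and a string-backed scannable).
    def is_longer_than(self, max_utf8_size: int) -> bool:
        """True if the UTF-8 encoded content exceeds max_utf8_size bytes."""
        return len(self.content.encode("utf-8")) > max_utf8_size
```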
Codecov Report

|          | main   | #823   | +/-    |
|----------|--------|--------|--------|
| Coverage | 91.60% | 91.54% | -0.06% |
| Files    | 168    | 168    |        |
| Lines    | 6929   | 6941   | +12    |
| Hits     | 6347   | 6354   | +7     |
| Misses   | 582    | 587    | +5     |

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Also adds some abstract properties to Scannable.
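A sketch of what such abstract properties can look like; the property names here are assumptions, only the pattern is the point:

```python
from abc import ABC, abstractmethod


class Scannable(ABC):
    # Stacking @property on top of @abstractmethod makes every concrete
    # subclass expose these as read-only attributes.
    @property
    @abstractmethod
    def filename(self) -> str:
        ...

    @property
    @abstractmethod
    def url(self) -> str:
        ...
```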
…ng-too-large-payloads
Looks good now!
Aims at fixing the issue, first reported in #555, where scanning large documents can produce chunks larger than the maximum server payload size.
Depends on a change to py-gitguardian (see the linked PR) to pass the maximum payload size to the GGClient.
To facilitate computing chunk sizes, a utf8_encoded_size property was added; it calls a new _read_content method, as sketched below.
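A rough illustration of that property (a sketch only; the class layout and names may differ from the real code). It builds on the `Scannable` base class sketched above:

```python
from typing import Optional


class File(Scannable):
    """A file-backed scannable; the details here are assumptions."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._content: Optional[str] = None
        self._utf8_size: Optional[int] = None

    def _read_content(self) -> None:
        # Decode the file once, caching both the text and its UTF-8 size.
        if self._content is None:
            with open(self._path, "rb") as f:
                data = f.read()
            self._content = data.decode("utf-8", errors="replace")
            self._utf8_size = len(self._content.encode("utf-8"))

    @property
    def content(self) -> str:
        if self._content is None:
            self._read_content()
        # _read_content() always sets the content, but pyright cannot
        # infer that, hence the extra None check described as clunky below.
        if self._content is None:
            raise ValueError(f"Cannot read content of {self._path}")
        return self._content

    @property
    def utf8_encoded_size(self) -> int:
        if self._utf8_size is None:
            self._read_content()
        if self._utf8_size is None:
            raise ValueError(f"Cannot compute UTF-8 size of {self._path}")
        return self._utf8_size
```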
Some of the code is a bit clunky, with an `if self.content is None` check even though we just called a read method to make sure it wouldn't be None, but this way pyright doesn't complain.
I left a constant margin in the chunk size to account for the encoded metadata and tested it with a scan of 10,000 files of 1 KB each, but maybe it would be better to have the margin adapt to the number of files in the chunk?
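A minimal sketch of the chunking logic with a constant margin, reusing the `Scannable` type from the sketches above. The limit and margin values are assumptions; in the real code the payload limit would come from the server via py-gitguardian:

```python
from typing import Iterable, List

MAX_PAYLOAD_SIZE = 1024 * 1024  # assumption: the real limit comes from the server
METADATA_MARGIN = 4096          # constant margin reserved for encoded metadata


def make_chunks(files: Iterable[Scannable]) -> List[List[Scannable]]:
    """Group scannables so each chunk's encoded size stays under the limit."""
    limit = MAX_PAYLOAD_SIZE - METADATA_MARGIN
    chunks: List[List[Scannable]] = []
    current: List[Scannable] = []
    current_size = 0
    for scannable in files:
        size = scannable.utf8_encoded_size
        # Start a new chunk when adding this document would overflow.
        # An oversized single document still gets a chunk of its own.
        if current and current_size + size > limit:
            chunks.append(current)
            current, current_size = [], 0
        current.append(scannable)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

The adaptive alternative raised in the question above would replace the constant `METADATA_MARGIN` with a per-document overhead added to `size` inside the loop, so the reserved space grows with the number of files in the chunk.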