Full fledged processing of protection profiles #466
@J08nY first batch of commits that refactored Some of my early design notes:
|
Looks OK. But still has conflicts with main. Is the merge commit a real merge commit?
Meh, something was left out, should be fixed by now. |
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

Coverage Diff: main vs. #466

| Metric   | main   | #466   | +/-    |
|----------|--------|--------|--------|
| Coverage | 68.55% | 69.36% | +0.82% |
| Files    | 62     | 69     | +7     |
| Lines    | 7934   | 8341   | +407   |
| Hits     | 5438   | 5785   | +347   |
| Misses   | 2496   | 2556   | +60    |
Hey, the initial draft of the functionality is implemented. Some notes below.

Sample usage

Create and fully process PP dataset

    pp_dset = ProtectionProfileDataset(root_dir="/path/to/pp/directory")
    pp_dset.get_certs_from_web()
    pp_dset.process_auxiliary_datasets()
    pp_dset.download_all_artifacts()
    pp_dset.convert_all_pdfs()
    pp_dset.analyze_certificates()

Access to PP Dataset from CC Dataset

    cc_dset.process_auxiliary_dataset(processed_pp_dataset_root_dir="/path/to/pp/directory")

In such case, Alternatively, When

Notes on PP processing
Next steps
|
src/sec_certs/configuration.py

    pp_latest_full_archive: AnyHttpUrl = Field(
        "https://sec-certs.org/cc/pp.tar.gz",
        description="URL from where to fetch the latest full archive of fully processed PP dataset.",
    )
The pp_latest_snapshot config also needs to change. It will no longer live in the /static/ subdir, but will have the same layout as the CC and FIPS datasets. Could you make the change pls?
Sure, I wanted to discuss this first before changing this.
Could you please add some test(s) that run the PP pipeline at least once? I.e. improve the coverage in the |
Sure 🙃 , see
|
Regression tests

OLD APPROACH
- After `get_certs_from_web()`
--------------------
- # PP-rich certs: 3288
- # PP links: 4291
- # Unique PP links: 269
--------------------
- # PP-rich certs: 3288
- # PP links: 4291
- # Unique PP links: 269

NEW APPROACH
- After `get_certs_from_web()`
--------------------
- # PP-rich certs: 3259
- # PP links: 4292
- # Unique PP links: 266
- After processing ProtectionProfileDataset
--------------------
- # PP-rich certs: 3212
- # PP links: 4232
- # Unique PP links: 264

The decreased number of PPs in the new approach is explained by:
|
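For anyone wanting to reproduce counts like the ones in the regression comparison, here is a minimal sketch of how the three figures (PP-rich certs, PP links, unique PP links) could be computed. The `Cert` class and its `pp_links` field are hypothetical stand-ins for illustration, not the actual sec-certs classes, and the real counting used in this PR may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cert:
    """Hypothetical certificate record; NOT the sec-certs certificate class."""

    name: str
    pp_links: list[str] = field(default_factory=list)


def pp_link_stats(certs: list[Cert]) -> dict[str, int]:
    """Count certs with at least one PP link, all PP links, and distinct PP links."""
    all_links = [link for cert in certs for link in cert.pp_links]
    return {
        "pp_rich_certs": sum(1 for cert in certs if cert.pp_links),
        "pp_links": len(all_links),
        "unique_pp_links": len(set(all_links)),
    }


if __name__ == "__main__":
    sample = [
        Cert("cert-a", ["pp-1.pdf", "pp-2.pdf"]),
        Cert("cert-b", ["pp-1.pdf"]),
        Cert("cert-c"),  # no protection profile attached
    ]
    print(pp_link_stats(sample))
```

Running the same function before and after a pipeline change gives numbers that can be compared directly, which is the spirit of the old-versus-new comparison.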
@J08nY the requested changes should now be incorporated. I will further check that notebooks work. I will also flag the You can start working on the integration. I suggest merging this only once we're confident we can deploy this to sec-certs.org. |
I guess we could do that, but my workflow kind of requires that the web is developed against the main branch here, but I will see. I mean real issues will only be visible after deploy anyway. |
Could you make the pp URLs have /pp/ in the url? To align with the other blueprints on the web: /cc/ and /fips/. |
If we merge to main asynchronously, then we have a broken API for a while. I would appreciate it if you tried to deploy from here. Once it all works, we can merge to main and create a new release. |
What do you mean by broken API? What exactly breaks? |
Well, |
Hi, final batch of updates is here.
@J08nY I consider this done from my side. I'll be just assisting with integration and implementation of review requests. |
I am going to create a new PR based on a branch on which I rebased all of this on top of the current main, to get rid of the messed-up merge commit and have a nicer history. |
Closes #72