Fast version of mine cargo ( no checkpoints ) #746
Signed-off-by: ziad hany <[email protected]>
Thanks @ziadhany, see some suggestions below.
minecode_pipelines/miners/cargo.py
Outdated
PACKAGE_BATCH_SIZE = 500
COMMIT_BATCH_SIZE = 10

BATCH_SIZE = 1000
We don't need to declare this globally.
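One way to avoid the module-level constant is to pass the batch size as a parameter with a default. The sketch below is only an illustration: the `batched` helper and its signature are hypothetical and not part of the PR.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield successive lists of at most ``batch_size`` items.

    Hypothetical helper: the batch size lives in the signature,
    so no global BATCH_SIZE constant is needed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

A caller that needs a different size can then pass `batch_size=...` explicitly instead of editing a global.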
minecode_pipelines/miners/cargo.py
Outdated
COMMIT_BATCH_SIZE = 10

BATCH_SIZE = 1000
CARGO_CHECKPOINT_PATH = "cargo/checkpoints.json"
This checkpoint is no longer needed.
minecode_pipelines/miners/cargo.py
Outdated
purl_files = []
purls = []

for file_path in base_path.rglob("*"):
A blanket rglob can be problematic: it will also iterate over the .git directory.
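A minimal sketch of one alternative, assuming the goal is to walk the cloned repository while never descending into `.git`. The `iter_data_files` name and the `skip_dirs` parameter are hypothetical; the PR may solve this differently.

```python
import os
from pathlib import Path
from typing import FrozenSet, Iterator


def iter_data_files(
    base_path: Path, skip_dirs: FrozenSet[str] = frozenset({".git"})
) -> Iterator[Path]:
    """Yield every file under ``base_path``, pruning directories in ``skip_dirs``.

    Hypothetical helper, not taken from the PR.
    """
    for root, dirs, files in os.walk(base_path):
        # Editing ``dirs`` in place tells os.walk not to descend into them.
        dirs[:] = sorted(d for d in dirs if d not in skip_dirs)
        for name in sorted(files):
            yield Path(root) / name
```

Unlike `rglob("*")` followed by a filter, pruning inside `os.walk` avoids visiting the contents of `.git` at all.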
minecode_pipelines/miners/cargo.py
Outdated
}:
continue

logger(f"Processing file: {file_path}")
This is not helpful; it will simply flood our log.
minecode_pipelines/miners/cargo.py
Outdated
result = store_cargo_packages(packages, cloned_data_repo)
if result:
    purl_file, base_purl = result
    logger(f"Writing package URLs for package '{base_purl}' to {purl_file}")
This log is not helpful.
minecode_pipelines/miners/cargo.py
Outdated
purl_files = []
purls = []

for file_path in base_path.rglob("*"):
Also, we should use LoopProgress when mining purls from resources, for proper progress indication.
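To illustrate the idea behind this suggestion, here is a self-contained stand-in that logs only when progress crosses a percentage step, rather than once per file. It is not the project's LoopProgress, whose exact signature is not shown in this thread; `iter_with_progress` and its parameters are hypothetical.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def iter_with_progress(
    items: Iterable[T],
    total: int,
    logger: Callable[[str], None],
    step_percent: int = 10,
) -> Iterator[T]:
    """Yield ``items``, logging once per ``step_percent`` of progress.

    Hypothetical stand-in for a progress helper such as LoopProgress.
    """
    last_logged = 0
    for count, item in enumerate(items, start=1):
        yield item
        percent = count * 100 // total if total else 100
        if percent >= last_logged + step_percent:
            # Round down to the step boundary so messages stay regular.
            last_logged = percent - percent % step_percent
            logger(f"Progress: {last_logged}% ({count}/{total})")
```

With thousands of files and the default step, this emits about ten log lines in total instead of one line per file, which addresses both the progress-indication request and the log-flooding concern raised earlier.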
Signed-off-by: Keshav Priyadarshi <[email protected]>
No description provided.