-
👋 Assuming you don't want these to count towards download stats (I would assume not), you can actually do this straight off the crate index and static storage, at which point you are limited only by your download speed. (And latency. Probably more the latency, honestly. 🇳🇿)
Short version:
1. Clone https://github.com/rust-lang/crates.io-index.
2. Iterate versions to build download URIs in the form https://static.crates.io/crates/CRATE_NAME/CRATE_NAME-VERSION.crate.
3. Download crates.
4. ...
5. Profit!
Since none of this hits crates.io proper, the API rate limit doesn't apply. Normal GitHub rate limits apply to the first step, of course, but since you're probably just cloning and maybe occasionally pulling that repo, I can't imagine it would matter. The direct downloads off static.crates.io just go straight to our CDNs.
I've actually been working on this problem myself recently (since I want to do security analysis of crates), and I've pulled out the relevant crates from my workspace and pushed them to https://github.com/LawnGnome/librarian if you just want to use/adapt that. (No judgement on the actual code quality! I haven't cleaned it up at all yet for a proper release.) But you could also easily do this with some judicious use of git, jq, xargs, and curl in a shell script if this is mostly a one-off.
-
Oh, this is excellent advice, thank you. I wasn't sure whether accessing
crates directly would be easier. It looks like it'll be much easier.
Thanks also for the pointer about librarian.
…On Tue, 19 Dec 2023, 2:31 pm Adam Harvey, ***@***.***> wrote:
👋
Assuming you don't want these to count towards download stats (I would
assume not), you can actually do this straight off the crate index and
static storage, at which point you are limited only by your download speed.
(And latency. Probably more the latency, honestly. 🇳🇿)
Short version:
1. Clone https://github.com/rust-lang/crates.io-index.
2. Iterate versions to build download URIs in the form
https://static.crates.io/crates/CRATE_NAME/CRATE_NAME-VERSION.crate.
3. Download crates.
4. ...
5. Profit!
Since none of this hits crates.io proper, the API rate limit doesn't
apply. Normal GitHub rate limits apply to the first step, of course, but
since you're probably just cloning and maybe occasionally pulling that
repo, I can't imagine it would matter. The direct downloads off
static.crates.io just go straight to our CDNs.
I've actually been working on this problem myself recently (since I want
to do security analysis of crates), and I've pulled out the relevant crates
from my workspace and pushed them to
https://github.com/LawnGnome/librarian if you just want to use/adapt
that. (No judgement on the actual code quality! I haven't cleaned it up at
all yet for a proper release.) But you could also easily do this with some
judicious use of git, jq, xargs, and curl in a shell script if this is
mostly a one-off.
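For anyone landing here later, steps 1 to 3 above can be sketched in Python roughly as follows. This is a minimal sketch, not the librarian code, and it rests on a few assumptions: the index has already been cloned to a local directory, each index file holds one JSON object per line with `name`, `vers`, and `yanked` fields, and any path component containing a dot (`.git`, `config.json`, and so on) is not a crate file. The helper names and the `user_agent` value you pass in are placeholders of my own.

```python
import json
import shutil
import urllib.request
from pathlib import Path
from typing import Iterator

STATIC_BASE = "https://static.crates.io/crates"


def download_url(name: str, version: str) -> str:
    """Build the static download URL for one crate version."""
    return f"{STATIC_BASE}/{name}/{name}-{version}.crate"


def iter_index_entries(index_root: Path) -> Iterator[dict]:
    """Yield one dict per crate version found in a local index clone."""
    for path in sorted(index_root.rglob("*")):
        rel = path.relative_to(index_root)
        # Crate names never contain dots, so this skips .git, config.json, etc.
        if not path.is_file() or any("." in part for part in rel.parts):
            continue
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield json.loads(line)


def iter_download_urls(index_root: Path, include_yanked: bool = False) -> Iterator[str]:
    """Yield download URLs, skipping yanked versions unless asked otherwise."""
    for entry in iter_index_entries(index_root):
        if entry.get("yanked") and not include_yanked:
            continue
        yield download_url(entry["name"], entry["vers"])


def fetch_crate(url: str, dest: Path, user_agent: str) -> None:
    """Download a single .crate file, identifying the client via User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)
```

Calling `iter_download_urls(Path("crates.io-index"))` on a fresh clone would then give the list of URLs to hand to `fetch_crate` (or to curl). Keeping URL generation separate from downloading makes it easy to dry-run the listing before fetching anything.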
-
There is also this blog post with recommendations in this space: https://www.pietroalbini.org/blog/downloading-crates-io/
-
Hi there crates.io team,
I would like to download and analyse many crates hosted on crates.io. This is a personal research project on the use of unsafe code blocks within the Rust community.
Here is the user-agent string that I'll be using to indicate my bot's usage:
Let me know if there is anything that I should be doing apart from the 1 Hz rate limit to keep resource usage low on your end. I intend to access the .crate files via https://crates.io/api/v1/crates/<crate>/<version>/download.
Thank you!
-- Tim