Commit 3a7ff91

Making a third large revision to the proposal
1 parent 2108e81 commit 3a7ff91

1 file changed (+102, -144 lines changed)

text/0000-crates.io-default-ranking.md

Lines changed: 102 additions & 144 deletions
@@ -11,7 +11,7 @@ difficult to find which crates are meant for a particular purpose and then to
decide among the available crates which one is most suitable in a particular
context. [Categorization][cat-pr] and [badges][badge-pr] are coming to
crates.io; categories help with finding a set of crates to consider and badges
help communicate attributes of crates.

**This RFC aims to create a default ranking of crates within a list of crates
that have a category or keyword in order to make a recommendation to crate users
@@ -95,149 +95,107 @@ By far, the most common attribute people said they considered in the survey was
whether a crate had good documentation. Frequently mentioned when discussing
documentation was the desire to quickly find an example of how to use the crate.

Removed (original lines 98-240):

- Number of lines of documentation in Rust files:
  `grep -r \/\/[\!\/] --binary-files=without-match --include=*.rs . | wc -l`
- Number of lines in the README file, if specified in Cargo.toml
- Number of lines in Rust files: `find . -name '*.rs' | xargs wc -l`

We would then add the lines in the README to the lines of documentation,
subtract the lines of documentation from the total lines of Rust, and divide the
first result by the second to get the ratio of documentation to code. Test code
(and any documentation within test code) *is* part of this calculation.

Any crate in the top 20% of all crates by this ratio would get a badge saying
"well documented".

Additionally, lists of crates would have a badge showing the number of files in
the standard `/examples` directory, if any. A further enhancement would be to
make that badge link to the examples displayed somewhere (crates.io? in the
repository? in the documentation?).

* combine:
  * 1,195 lines of documentation
  * 99 lines in README.md
  * 5,815 lines of Rust
  * (1195 + 99) / (5815 - 1195) = 1294/4620 = .28

* nom:
  * 2,263 lines of documentation
  * 372 lines in README.md
  * 15,661 lines of Rust
  * (2263 + 372) / (15661 - 2263) = 2635/13398 = .20

* peresil:
  * 159 lines of documentation
  * 20 lines in README.md
  * 1,341 lines of Rust
  * (159 + 20) / (1341 - 159) = 179/1182 = .15

* lalrpop: ([in the /lalrpop directory in the repo][lalrpop-repo])
  * 742 lines of documentation
  * 110 lines in ../README.md
  * 94,104 lines of Rust
  * (742 + 110) / (94104 - 742) = 852/93362 = .01

* peg:
  * 3 lines of documentation
  * no readme specified in Cargo.toml
  * 1,531 lines of Rust
  * (3 + 0) / (1531 - 3) = 3/1528 = .00

[lalrpop-repo]: https://github.com/nikomatsakis/lalrpop/tree/master/lalrpop

If we assume these are all the crates on crates.io for this example, then
combine is in the top 20% and would get a badge. None of the crates have files in
`/examples`, so none would have the examples badge.
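
To make the arithmetic above concrete, here is a minimal Rust sketch of the documentation ratio and the top-20% "well documented" cutoff, fed with the line counts of the five parser crates. The `DocCounts` struct and the function names are hypothetical, and rounding the number of badged crates up (so that 20% of 5 crates is 1 crate) is an assumption the proposal does not state.

```rust
/// Raw line counts for one crate, as the `grep`/`find`/`wc` commands would report them.
struct DocCounts {
    name: &'static str,
    doc_lines: u32,    // lines containing `//!` or `///`
    readme_lines: u32, // 0 when Cargo.toml specifies no README
    rust_lines: u32,   // every line in `.rs` files, tests included
}

/// (documentation + README) / (all Rust lines - documentation lines)
fn doc_ratio(c: &DocCounts) -> f64 {
    let documentation = f64::from(c.doc_lines + c.readme_lines);
    // `max(1)` guards against dividing by zero for a crate that is all comments.
    let code = f64::from(c.rust_lines.saturating_sub(c.doc_lines).max(1));
    documentation / code
}

/// Names of the crates whose ratio is in the top `percent`% of `crates`,
/// rounding the number of badged crates up.
fn well_documented(crates: &[DocCounts], percent: usize) -> Vec<&'static str> {
    let mut ranked: Vec<(&'static str, f64)> =
        crates.iter().map(|c| (c.name, doc_ratio(c))).collect();
    // Highest ratio first.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    let badged = (crates.len() * percent + 99) / 100;
    ranked.into_iter().take(badged).map(|(name, _)| name).collect()
}

/// The five parser crates from the example.
fn parser_crates() -> Vec<DocCounts> {
    vec![
        DocCounts { name: "combine", doc_lines: 1195, readme_lines: 99, rust_lines: 5815 },
        DocCounts { name: "nom", doc_lines: 2263, readme_lines: 372, rust_lines: 15661 },
        DocCounts { name: "peresil", doc_lines: 159, readme_lines: 20, rust_lines: 1341 },
        DocCounts { name: "lalrpop", doc_lines: 742, readme_lines: 110, rust_lines: 94104 },
        DocCounts { name: "peg", doc_lines: 3, readme_lines: 0, rust_lines: 1531 },
    ]
}

fn main() {
    let crates = parser_crates();
    for c in &crates {
        println!("{:<8} {:.2}", c.name, doc_ratio(c));
    }
    println!("well documented: {:?}", well_documented(&crates, 20));
}
```

Running it prints the same ratios as the list, to two decimal places, and reports combine as the only well-documented crate.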

### Maintenance

We can add an optional attribute to Cargo.toml that crate authors could use to
self-report their maintenance intentions. The valid values would be along the
lines of the following, and would influence the ranking in the order they're
presented:

- **Actively developed**, meaning new features are being added and bugs are
  being fixed
- **Passively maintained**, meaning there are no plans for new features, but
  the maintainer intends to respond to issues that get filed
- **As-is**, meaning the crate is feature complete, the maintainer does not
  intend to continue working on it or providing support, but it works for the
  purposes it was designed for
- None: we don't display anything, since the maintainer has not chosen to
  specify their intentions; potential crate users will need to investigate on
  their own
- **Experimental**, meaning the author wants to share it with the community but
  is not intending to meet anyone's particular use case
- **Looking for maintainer**, meaning the current maintainer would like to give
  up the crate to someone else

These would be displayed as badges on lists of crates.

These levels would not have any time commitments attached to them: maintainers
who would like to batch changes into releases every 6 months could report
"actively developed" just as much as maintainers who like to release every 6
weeks. This would need to be clearly communicated to set crate user
expectations properly.

This is also inherently a crate author's statement of current intentions, which
may get out of sync with the reality of the crate's maintenance over time.

If I had to guess for the maintainers of the parsing crates, I would assume:

* nom: actively developed
* combine: actively developed
* lalrpop: actively developed
* peg: actively developed
* peresil: passively maintained
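
The self-reported values and their ordering could be modeled as an ordered enum, sketched below in Rust. The `maintenance` key and its kebab-case values (`actively-developed`, `as-is`, and so on) are hypothetical; the proposal does not specify a Cargo.toml syntax. Declaring the variants in the listed order lets the derived `Ord` express "influence the ranking in the order they're presented", with "None" sitting between "As-is" and "Experimental".

```rust
/// Hypothetical self-reported maintenance status. Variants are declared in the
/// order the proposal lists them, so the derived `Ord` ranks earlier ones first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Maintenance {
    ActivelyDeveloped,
    PassivelyMaintained,
    AsIs,
    NotSpecified,
    Experimental,
    LookingForMaintainer,
}

impl Maintenance {
    /// Parse the value of a hypothetical `maintenance` key in Cargo.toml;
    /// a missing key means the author has not stated their intentions.
    fn parse(value: Option<&str>) -> Result<Maintenance, String> {
        match value {
            None => Ok(Maintenance::NotSpecified),
            Some("actively-developed") => Ok(Maintenance::ActivelyDeveloped),
            Some("passively-maintained") => Ok(Maintenance::PassivelyMaintained),
            Some("as-is") => Ok(Maintenance::AsIs),
            Some("experimental") => Ok(Maintenance::Experimental),
            Some("looking-for-maintainer") => Ok(Maintenance::LookingForMaintainer),
            Some(other) => Err(format!("unknown maintenance status: {other}")),
        }
    }

    /// Badge text shown on crate lists; nothing is shown when unspecified.
    fn badge(self) -> Option<&'static str> {
        match self {
            Maintenance::ActivelyDeveloped => Some("Actively developed"),
            Maintenance::PassivelyMaintained => Some("Passively maintained"),
            Maintenance::AsIs => Some("As-is"),
            Maintenance::NotSpecified => None,
            Maintenance::Experimental => Some("Experimental"),
            Maintenance::LookingForMaintainer => Some("Looking for maintainer"),
        }
    }
}

fn main() {
    // The guesses for the parsing crates, listed alphabetically.
    let mut crates = vec![
        ("combine", Maintenance::parse(Some("actively-developed")).unwrap()),
        ("lalrpop", Maintenance::parse(Some("actively-developed")).unwrap()),
        ("nom", Maintenance::parse(Some("actively-developed")).unwrap()),
        ("peg", Maintenance::parse(Some("actively-developed")).unwrap()),
        ("peresil", Maintenance::parse(Some("passively-maintained")).unwrap()),
    ];
    // Stable sort: crates with the same status keep their existing order.
    crates.sort_by_key(|&(_, status)| status);
    for (name, status) in &crates {
        println!("{name}: {}", status.badge().unwrap_or("(no badge)"));
    }
}
```

Because `sort_by_key` is stable, crates that report the same status keep whatever order the other ranking criteria gave them.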

### Quality

Given that so much of "quality" is subjective, we do not have a proposed
quality measure at this time. Involving CI might be useful, but that would
require taking a stand on supported third-party CI providers. The same problem
would exist with test coverage percentage.

Measures we have considered but that we do not have tools to compute at this
time:

- Number of unit and/or integration tests
- Ratio of test code to implementation code

If the community feels the effort to create these tools would be worth the
information, we would investigate these further.

### Popularity

- Number of downloads in the last 90 days: the top, say, 10% most downloaded
  crates would get a bump in ranking and a badge that says "frequently
  downloaded". This can be calculated as part of the [update-downloads][]
  background job.

[update-downloads]: https://github.com/rust-lang/crates.io/blob/master/src/bin/update-downloads.rs

With this proposal, and assuming the 5 parser crates are the only crates on
crates.io, nom would be marked as "frequently downloaded" and the others would
not. nom is currently ranked at #83 in the list of crates by number of
downloads, which easily puts it in the top 10% out of 7,239 crates.
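
A minimal Rust sketch of the "frequently downloaded" badge, assuming crates are sorted by their 90-day download counts and the size of the top 10% is rounded up. The function names are hypothetical, and this does not reflect the actual code of the update-downloads job. The example input reuses the 90-day figures from the survey table in this commit's added text.

```rust
/// How many crates make up the top `percent`% of `total`, rounded up so that a
/// short list still badges at least one crate.
fn cutoff(total: usize, percent: usize) -> usize {
    (total * percent + 99) / 100
}

/// Names of the crates that would get the "frequently downloaded" badge, given
/// (name, downloads in the last 90 days) pairs.
fn frequently_downloaded<'a>(crates: &[(&'a str, u64)], percent: usize) -> Vec<&'a str> {
    let mut sorted = crates.to_vec();
    // Most downloaded first.
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    sorted
        .into_iter()
        .take(cutoff(crates.len(), percent))
        .map(|(name, _)| name)
        .collect()
}

fn main() {
    let parsers: [(&str, u64); 5] = [
        ("nom", 82_975),
        ("combine", 4_252),
        ("lalrpop", 1_928),
        ("peg", 2_190),
        ("peresil", 1_859),
    ];
    println!("{:?}", frequently_downloaded(&parsers, 10));
    // Among all 7,239 crates, any rank up to this one is in the top 10%.
    println!("top 10% cutoff: rank {}", cutoff(7_239, 10));
}
```

With 7,239 crates the cutoff works out to rank 724, so a rank of #83 is well inside it.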

### Credibility

We think credibility is an even more subjective measure than quality. We
considered using the number of other crates an author has, but that would skew
heavily towards [retep998][]. Highlighting Rust team members is also a
possibility since people tend to regard them more highly, but there are many
crate authors who are not on any Rust team who are releasing excellent crates.
We have [an idea for a more personal "favorite authors" list][favs] that we
think would help indicate credibility. With this proposed feature, each person
can define credibility for themselves, which makes this measure less gameable
and less of a popularity contest.

[retep998]: https://crates.io/users/retep998
[favs]: https://github.com/rust-lang/crates.io/issues/494

### Overall

(Combining the new proposals for an overall ranking is a work in progress)

Added (new lines 98-198):

This would be addressed through human evaluation, rather than automatic
evaluation, in two ways:

1. [Render README files on a crate's page on crates.io][render-readme] so that
   people can quickly see for themselves the information that a crate author
   chooses to make available in their README. We can nudge towards having an
   example in the README by adding a template README that includes an Examples
   section [in what `cargo new` generates][cargo-new].
2. Add a mechanism for logged-in crates.io users to indicate that a crate has
   particularly good documentation.
   - This would be a very constrained form of voting/rating: one UI element
     (e.g., an up arrow, a thumbs up, a star, a checkbox, a link) that could be
     toggled from "not indicated" to "this crate has good documentation" and
     vice versa.
   - The number of people who have indicated a crate has good documentation
     would be displayed for each crate.
   - Each indication would only count for a limited time (proposal: 6 months).
     Six months after you voted, your vote would disappear and you could choose
     to renew it. This would prevent older crates from getting too much of an
     advantage, and would keep a high count from becoming inaccurate if many
     new, undocumented features get added to a crate.
   - This would not influence ranking at all, and therefore is less likely to
     be gamed. You'd need to make many GitHub accounts to easily game this.
   - Since there is no negative "this crate has bad documentation" indication,
     nor is there free-form text, the moderation burden should be minimal.

[render-readme]: https://github.com/rust-lang/crates.io/issues/81
[cargo-new]: https://github.com/rust-lang/cargo/issues/3506
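
One way to store these expiring indications, sketched in Rust: each vote records when it was cast, toggling removes a live vote or (re)casts an expired one, and the displayed count only includes votes from the last six months. The `DocVotes` type, numeric user ids, Unix-second timestamps, and the 182-day approximation of "6 months" are all assumptions for illustration.

```rust
use std::collections::HashMap;

/// "6 months", approximated as 182 days of seconds (an assumption; the proposal
/// does not say how a month would be measured).
const VOTE_LIFETIME_SECS: u64 = 182 * 24 * 60 * 60;

/// "Good documentation" indications for one crate: user id -> Unix time the
/// vote was cast. Both representations are hypothetical.
#[derive(Default)]
struct DocVotes {
    votes: HashMap<u64, u64>,
}

impl DocVotes {
    fn is_live(cast_at: u64, now: u64) -> bool {
        now < cast_at + VOTE_LIFETIME_SECS
    }

    /// Toggle a user's indication; returns whether the user now has a live vote.
    fn toggle(&mut self, user: u64, now: u64) -> bool {
        let live = self
            .votes
            .get(&user)
            .map_or(false, |&cast_at| Self::is_live(cast_at, now));
        if live {
            self.votes.remove(&user);
            false
        } else {
            // No vote yet, or an expired one: casting it again renews the window.
            self.votes.insert(user, now);
            true
        }
    }

    /// The number displayed for the crate: votes cast within the last 6 months.
    /// It is only displayed and never fed into the ranking.
    fn live_count(&self, now: u64) -> usize {
        self.votes
            .values()
            .filter(|&&cast_at| Self::is_live(cast_at, now))
            .count()
    }
}

fn main() {
    let mut votes = DocVotes::default();
    let day = 24 * 60 * 60;
    votes.toggle(1, 0);
    votes.toggle(2, 10 * day);
    println!("after 30 days: {}", votes.live_count(30 * day)); // 2
    println!("after 190 days: {}", votes.live_count(190 * day)); // 1: user 1's vote expired at 182 days
}
```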

### Maintenance (and Popularity)

The number of releases in the last 6 months and the number of downloads in the
last 90 days can be combined into an automatic indicator of the status of a
crate. This would be more like a badge and would not influence ranking at all.

- Many recent releases and few downloads indicates an *experimental* crate.
- At least occasional releases in the last 6 months and many recent downloads
  indicates a *mainstream* crate.
- Few to no releases in the last 6 months and few recent downloads indicates an
  *inactive* crate.

In table form:

|                | Many releases | Few releases |
|----------------|---------------|--------------|
| Many downloads | Mainstream    | Mainstream   |
| Few downloads  | Experimental  | Inactive     |

TODO: Decide what cutoff values these measures should have, which we will do
if people are generally in favor of this idea.

By using the number of downloads, crates that are "finished" and stable should
still be regarded as mainstream while many people continue to use them.

Hovering over these labels will show what they mean and how they are
calculated.

A downside of this method is that it does not convey crate author *intent*,
only what one might assume based on these two measures. A crate might get
popular while an author still considers it to be experimental, thus creating
expectations of stability and support.

We might need to experiment with the thresholds for the number of releases and
the number of downloads considered to be "few" and "many".

Alternatives:

- Also factor in the version number: keep the "Experimental" label unless a
  crate version has many downloads *and* its version number is >= 1.0.0. This
  might be better once more crates release a 1.0.0 version.
- Don't show any label for the "Mainstream" category and only label
  "Experimental" or "Inactive" crates.
- Use different words for these concepts
- Use more words for these concepts that more clearly state what is measured:
  - "This crate has many recent releases and many downloads"
  - "This crate has many recent downloads but has not been updated in the last 6
    months"
  - "This crate has many recent releases but few downloads"
  - "This crate has not been updated in the last 6 months and has few downloads"

For the crates used in the survey, assume that any release in the last 6 months
makes a crate "Experimental" rather than "Inactive", and that the cutoff for
"many downloads", whatever its exact value, lies somewhere between nom and
combine, since nom has an order of magnitude more downloads than combine. The
labels would then be:

| Crate   | Releases in last 6 mo | Downloads in last 90 days | Label        |
|---------|-----------------------|---------------------------|--------------|
| nom     | 3                     | 82,975                    | Mainstream   |
| combine | 4                     | 4,252                     | Experimental |
| lalrpop | 3                     | 1,928                     | Experimental |
| peg     | 7                     | 2,190                     | Experimental |
| peresil | 0                     | 1,859                     | Inactive     |
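
The 2x2 table reduces to a small function. This Rust sketch uses hypothetical cutoffs: any release counts as "many releases", and 10,000 downloads in 90 days counts as "many downloads", a value picked only because it falls between combine and nom. With them, it reproduces the labels in the survey table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Mainstream,
    Experimental,
    Inactive,
}

/// Cutoffs separating "few" from "many". The real values are still a TODO, so
/// the ones used below are assumptions.
struct Cutoffs {
    many_releases: u32,  // releases in the last 6 months at or above this are "many"
    many_downloads: u64, // downloads in the last 90 days at or above this are "many"
}

/// Many downloads means Mainstream regardless of releases; otherwise many
/// releases means Experimental and few means Inactive.
fn classify(releases_6mo: u32, downloads_90d: u64, cutoffs: &Cutoffs) -> Status {
    if downloads_90d >= cutoffs.many_downloads {
        Status::Mainstream
    } else if releases_6mo >= cutoffs.many_releases {
        Status::Experimental
    } else {
        Status::Inactive
    }
}

fn main() {
    let cutoffs = Cutoffs { many_releases: 1, many_downloads: 10_000 };
    let survey = [
        ("nom", 3, 82_975),
        ("combine", 4, 4_252),
        ("lalrpop", 3, 1_928),
        ("peg", 7, 2_190),
        ("peresil", 0, 1_859),
    ];
    for (name, releases, downloads) in survey {
        println!("{name:<8} {:?}", classify(releases, downloads, &cutoffs));
    }
}
```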

### Overall ordering: Recent downloads

To remove some of the bias towards older crates that may have been replaced with
newer alternatives, we propose that the default ranking of crates be changed
from the all-time number of downloads to the number of downloads in the last 90
days. This is easy to understand and explain, and is being used as a rough
measure of evaluation today. This should be enough to get the most suitable
crates on the first page of results.
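
A Rust sketch of the proposed ordering, assuming per-day download counts are available for each crate (days are plain integers here, a hypothetical representation). The two example crates are made up to show how an older crate with a large early download count can fall below a newer one once only the last 90 days are counted.

```rust
/// Per-day download counts for one crate (hypothetical representation).
struct CrateDownloads {
    name: &'static str,
    daily: Vec<(u32, u64)>, // (day number, downloads on that day)
}

/// Downloads during the 90 days ending on `today`, inclusive.
fn recent_downloads(c: &CrateDownloads, today: u32) -> u64 {
    c.daily
        .iter()
        .filter(|&&(day, _)| day <= today && today - day < 90)
        .map(|&(_, n)| n)
        .sum()
}

/// All-time downloads, the current default ordering.
fn all_time_downloads(c: &CrateDownloads) -> u64 {
    c.daily.iter().map(|&(_, n)| n).sum()
}

/// Crate names in the proposed default order: most recent downloads first,
/// ties broken alphabetically.
fn default_order(crates: &[CrateDownloads], today: u32) -> Vec<&'static str> {
    let mut ranked: Vec<(u64, &'static str)> = crates
        .iter()
        .map(|c| (recent_downloads(c, today), c.name))
        .collect();
    ranked.sort_by(|a, b| b.0.cmp(&a.0).then(a.1.cmp(b.1)));
    ranked.into_iter().map(|(_, name)| name).collect()
}

/// Made-up data: an old crate with a large early spike, and a newer replacement.
fn example() -> Vec<CrateDownloads> {
    vec![
        CrateDownloads { name: "old-crate", daily: vec![(0, 50_000), (200, 10)] },
        CrateDownloads { name: "new-crate", daily: vec![(150, 300), (250, 400)] },
    ]
}

fn main() {
    let crates = example();
    for c in &crates {
        println!(
            "{}: all-time {}, last 90 days {}",
            c.name,
            all_time_downloads(c),
            recent_downloads(c, 260)
        );
    }
    println!("default order: {:?}", default_order(&crates, 260));
}
```

Sorting by all-time downloads would put `old-crate` first (50,010 vs. 700), while the 90-day window ending on day 260 puts `new-crate` first (400 vs. 10).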

Unchanged (original lines 241-243, new lines 199-201):

## Out of scope