Description
Based on information received from the team behind npm, the spam attackers involved in our latest flurry are sophisticated and relentless.
Indeed, our initial round of cleanup included 78 spam user accounts, each operating from its own IP address.
We've added some functionality on the Admin side to stop these in their tracks and give us time to assess, but we should develop more operational processes moving forward.
I propose the following approach:
Automated Spam classification for all incoming Projects and Releases
Feed the relevant parts of the uploaded metadata to a spam classification model. Classification should NOT happen synchronously during the upload; instead it should run out of band, with its results stored for review by administrators.
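A minimal sketch of what that out-of-band step might look like, assuming a background task queue and a text-based model. All names here (SpamClassification, classify_release, the spam_probability callable) are illustrative, not existing Warehouse code:

```python
# Hypothetical sketch: classify release metadata after upload and persist the
# result for the admin review queue, never blocking the upload request itself.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class SpamClassification:
    project_name: str
    version: str
    score: float             # model's confidence that the upload is spam
    verdict: str             # "spam" or "ham"
    reviewed: bool = False   # flipped once an administrator confirms or rejects
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def extract_text(metadata: dict) -> str:
    # Only the free-text fields are interesting to a text classifier.
    fields = ("summary", "description", "author", "home_page")
    return "\n".join(metadata.get(name, "") or "" for name in fields)


def classify_release(
    metadata: dict,
    spam_probability: Callable[[str], float],
    threshold: float = 0.8,
) -> SpamClassification:
    """Intended to run from a background task once the upload has been accepted."""
    score = spam_probability(extract_text(metadata))
    return SpamClassification(
        project_name=metadata["name"],
        version=metadata["version"],
        score=score,
        verdict="spam" if score >= threshold else "ham",
    )


# Stand-in model for illustration; a real deployment would plug in a trained classifier.
result = classify_release(
    {"name": "example-pkg", "version": "1.0", "summary": "CHEAP PILLS", "description": "..."},
    spam_probability=lambda text: 0.97 if "CHEAP" in text else 0.02,
)
# `result` would then be stored for administrators to review, rather than
# rejecting or delaying the upload.
```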
Admin interface for review and training of Spam classification results
PyPI administrators should have a place to review uploads classified as spam. It should allow administrators to report back to the model when a given upload was a false positive, and to quickly delete true spam.
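A rough sketch of the two admin actions on a review item; ReviewItem, retrain_queue, and remove_project are hypothetical stand-ins for whatever the actual admin views, retraining pipeline, and deletion machinery end up being:

```python
# Hypothetical admin-side actions: correcting a false positive (which doubles
# as a training signal) and deleting confirmed spam.
from dataclasses import dataclass


@dataclass
class ReviewItem:
    project_name: str
    verdict: str           # model's verdict: "spam" or "ham"
    admin_label: str = ""  # empty until an administrator reviews it


def mark_false_positive(item: ReviewItem, retrain_queue: list) -> None:
    """Administrator says the upload is legitimate; record the corrected label
    so the model can later be retrained on it."""
    item.admin_label = "ham"
    retrain_queue.append((item.project_name, "ham"))


def delete_spam(item: ReviewItem, remove_project) -> None:
    """Administrator confirms the upload is spam; record the label and remove it."""
    item.admin_label = "spam"
    remove_project(item.project_name)


# Example review pass over items the model flagged as spam.
queue = [ReviewItem("totally-legit-lib", "spam"), ReviewItem("cheap-pills", "spam")]
retrain: list = []
mark_false_positive(queue[0], retrain)                   # false positive
delete_spam(queue[1], remove_project=lambda name: None)  # true spam
```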
Community crowdsourced classification of spam
Allow logged-in users to report spam they find on PyPI. This gives us visibility into false negatives. Reports should be rate-limited to prevent abuse.
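One possible shape for that rate limit, sketched as a per-user sliding window. The limit of 5 reports per hour is an arbitrary assumption, and a real deployment would back this with Redis or the database rather than process memory:

```python
# Hypothetical sketch: allow at most REPORT_LIMIT spam reports per user per
# WINDOW_SECONDS, using an in-memory sliding window for illustration.
import time
from collections import defaultdict, deque

REPORT_LIMIT = 5       # assumed limit: reports allowed per window
WINDOW_SECONDS = 3600  # assumed window: one hour

_recent_reports: dict[str, deque] = defaultdict(deque)


def allow_report(username: str, now: float | None = None) -> bool:
    """Return True if this user may file another spam report right now."""
    now = time.monotonic() if now is None else now
    window = _recent_reports[username]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= REPORT_LIMIT:
        return False
    window.append(now)
    return True
```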
Admin interface for review of User Spam reports
PyPI administrators should have a place to review user reports of spam. It should allow administrators to report back to the model when a given upload was a false negative, and to quickly delete true spam.
Additionally, it should allow administrators to mark reports as invalid. We may also want to track a "reputation" for reporters: reports from users with consistently high or consistently low reputations can then be weighted accordingly.
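As a starting point for the reputation idea, a smoothed ratio of confirmed-to-invalid reports could serve as a per-reporter weight. The smoothing prior and the idea of summing weights per reported upload are assumptions, not a settled design:

```python
# Hypothetical sketch: weight reporters by how often admins have confirmed
# their past reports, smoothed so brand-new reporters start near neutral.


def reporter_reputation(confirmed: int, invalid: int, prior: float = 2.0) -> float:
    """Return a weight in (0, 1); 0.5 is neutral for a reporter with no history."""
    # Laplace-style smoothing: (confirmed + prior/2) / (total + prior)
    return (confirmed + prior / 2) / (confirmed + invalid + prior)


def weighted_report_score(reports: list[tuple[int, int]]) -> float:
    """Sum the reputations of everyone who reported a given upload.

    Each entry is (confirmed, invalid) counts for that reporter; an upload whose
    weighted score crosses some threshold could be surfaced to admins first.
    """
    return sum(reporter_reputation(c, i) for c, i in reports)


# Example: two reliable reporters outweigh three users whose reports are mostly invalid.
assert weighted_report_score([(10, 0), (8, 1)]) > weighted_report_score([(0, 5), (1, 6), (0, 4)])
```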