- Related GSoC 2025: Start here #4712
cve-bin-tool: improved component identification
Project description
Thanks to GSoC 2025 we've got PURL support to help improve component identification, plus a mismatch database that helps us identify and avoid a common mistake: a basic text search for a component can accidentally match a different component that has the same name but is written in another language. But there's room for more improvements here!
- Extend the mismatch database to handle certain types of mismatches, such as "false positive: name collision for python arrow vs rust arrow" #3193
- Add additional data to our mismatch database
- Improve handling of PURL data from OSV
- Add a framework for better handling cases where vulnerability data is wrong but will take some time to update. (Especially important as NVD fixes may take months instead of days thanks to their staffing challenges.)
- Stretch goal: look at other sources of data we can use to refine results. See "feat: Adding alternative vulnerability data sources" #4100
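To make the mismatch idea concrete, here is a minimal sketch of how a PURL-keyed mismatch lookup could filter out name collisions. It is not cve-bin-tool's actual schema or API: the `mismatches` table, the helper names (`build_demo_db`, `strip_purl_version`, `is_known_mismatch`, `filter_candidates`) and the vendor strings are all hypothetical and exist only for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical mismatch table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE mismatches (
            purl    TEXT NOT NULL,  -- versionless PURL of the scanned component
            vendor  TEXT NOT NULL,  -- vendor that must NOT be matched to it
            product TEXT NOT NULL,  -- product that must NOT be matched to it
            PRIMARY KEY (purl, vendor, product)
        )
        """
    )
    # Illustrative row only: the Python "arrow" package should not be
    # matched against a same-named product from an unrelated vendor.
    conn.execute(
        "INSERT INTO mismatches VALUES (?, ?, ?)",
        ("pkg:pypi/arrow", "example_rust_vendor", "arrow"),
    )
    conn.commit()
    return conn


def strip_purl_version(purl: str) -> str:
    """Drop the subpath, qualifiers and version from a PURL string."""
    base = purl.split("#", 1)[0].split("?", 1)[0]
    head, sep, tail = base.rpartition("/")
    name = tail.split("@", 1)[0]
    return f"{head}{sep}{name}"


def is_known_mismatch(
    conn: sqlite3.Connection, purl: str, vendor: str, product: str
) -> bool:
    """Return True if this (vendor, product) pair is a recorded mismatch."""
    row = conn.execute(
        "SELECT 1 FROM mismatches WHERE purl = ? AND vendor = ? AND product = ?",
        (strip_purl_version(purl), vendor, product),
    ).fetchone()
    return row is not None


def filter_candidates(conn, purl, candidates):
    """Keep only the (vendor, product) candidates not recorded as mismatches."""
    return [
        (vendor, product)
        for vendor, product in candidates
        if not is_known_mismatch(conn, purl, vendor, product)
    ]


if __name__ == "__main__":
    conn = build_demo_db()
    candidates = [("example_rust_vendor", "arrow"), ("example_python_vendor", "arrow")]
    print(filter_candidates(conn, "pkg:pypi/arrow@1.3.0", candidates))
```

Keying the table on a versionless PURL means one entry covers every release of a package. A real design would also need to decide how to handle namespaces, qualifiers, and mismatches that only apply to some versions.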
Related reading
Skills
- python
- sqlite
- software security: knowledge of how software vulnerabilities are triaged, mitigated and solved would be very helpful here. (You can learn some of this as you go, but it's worth doing some background reading to help inform your design choices.)
Difficulty level
- medium
Project Length
- 350 hours (e.g. full-time for 10 weeks or part-time for longer)
- It would be possible to do part of this project as a 175 hour project, but we may prefer candidates who have the time to do more, assuming similar levels of ability
Mentor
- The primary mentor for this project will depend on what other projects we accept. Please ask all questions on this issue rather than sending email so you can benefit from the expertise of other contributors and mentors. (Terri's email gets swamped regularly by other work concerns and it's likely she will miss emails sent during the GSoC period, but she will answer questions asked in public on this issue or in our gitter chat.)
GSoC Participants Only
This issue is a potential project idea for GSoC 2025, and is reserved for completion by a selected GSoC contributor. Please do not work on it outside of that program. If you'd like to apply to do it through GSoC, please start by reading #4712.