GSoC 2025 project idea: improved component identification #4807

Open
@terriko

cve-bin-tool: improved component identification

Project description

Thanks to GSoC 2024, we've got PURL support to help improve component identification, and a mismatch database that helps us identify and avoid common mistakes made when we use a basic text search to find components. A typical mistake is accidentally winding up with a component that has the same name but is written in a different language. But there's room for more improvements here!

  1. Extending the mismatch database to handle certain types of mismatches, such as the one in "false positive: name collision for python arrow vs rust arrow" #3193
  2. Adding additional data to our mismatch database
  3. Improving the handling of PURL data from OSV
  4. Adding a framework for better handling of cases where vulnerability data is wrong but will take some time to update. (This is especially important as NVD fixes may take months instead of days thanks to their staffing challenges.)
  5. Stretch goal: looking at other sources of data we can use to refine results. See "feat: Adding alternative vulnerability data sources" #4100
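
To make the mismatch idea more concrete, here is a minimal sketch of how a PURL-keyed mismatch table could be used to filter out same-name components from the wrong ecosystem. Everything in it is an assumption for illustration: the `mismatch` table layout, the `build_mismatch_db` and `filter_candidates` helpers, and the vendor names are hypothetical and are not cve-bin-tool's actual schema or API.

```python
import sqlite3


def build_mismatch_db(rows):
    """Create an in-memory table of (purl, invalid_vendor) pairs.

    Hypothetical schema: each row says "for this PURL, matches from
    this vendor are known to be wrong".
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mismatch ("
        "purl TEXT NOT NULL, "
        "invalid_vendor TEXT NOT NULL, "
        "PRIMARY KEY (purl, invalid_vendor))"
    )
    conn.executemany("INSERT INTO mismatch VALUES (?, ?)", rows)
    conn.commit()
    return conn


def filter_candidates(conn, purl, candidates):
    """Drop (vendor, product) candidates whose vendor is marked invalid for this PURL."""
    invalid = {
        row[0]
        for row in conn.execute(
            "SELECT invalid_vendor FROM mismatch WHERE purl = ?", (purl,)
        )
    }
    return [(vendor, product) for (vendor, product) in candidates if vendor not in invalid]


# Hypothetical data: a text search for "arrow" returned two vendors,
# but only one of them belongs to the Python package.
conn = build_mismatch_db([("pkg:pypi/arrow", "example_rust_vendor")])
candidates = [("example_python_vendor", "arrow"), ("example_rust_vendor", "arrow")]
kept = filter_candidates(conn, "pkg:pypi/arrow", candidates)
print(kept)
```

Keying the table on the PURL rather than on the bare product name is what lets the same name ("arrow") be resolved differently per ecosystem; a PURL with no mismatch entries passes all candidates through unchanged.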

Related reading

Skills

  • python
  • sqlite
  • software security: knowledge of how software vulnerabilities are triaged, mitigated and solved would be very helpful here. (You can learn some of this as you go, but it's worth doing some background reading to help inform your design choices.)

Difficulty level

  • medium

Project Length

  • 350 hours (e.g. full-time for 10 weeks or part-time for longer)
  • It would be possible to do part of this project as a 175-hour project, but we may prefer candidates who have the time to do more, assuming similar levels of ability.

Mentor

  • The primary mentor for this project will depend on what other projects we accept. Please ask all questions on this issue rather than sending email so you can benefit from the expertise of other contributors and mentors. (Terri's email gets swamped regularly by other work concerns and it's likely she will miss emails sent during the GSoC period, but she will answer questions asked in public on this issue or in our gitter chat.)

GSoC Participants Only

This issue is a potential project idea for GSoC 2025, and is reserved for completion by a selected GSoC contributor. Please do not work on it outside of that program. If you'd like to apply to do it through GSoC, please start by reading #4712.

Labels

  • gsoc: Tasks related to our participation in Google Summer of Code
