GSOC 2017
Welcome to AboutCode! This year AboutCode is a mentoring organization for Google Summer of Code 2017.
AboutCode is a project to uncover data ... about software code:
- where does it come from?
- what is its license? copyright?
- is it secure, maintained, well coded?
These are all important questions to answer when there are millions of free and open source software components available on the web.
Where software comes from and what its license is should be a problem of the past, so that everyone can safely consume more free and open source software. Come and join us to make it so!
Our tools are used to help detect and report the origin and license of source code, packages and binaries, as well as discover software and package dependencies, track vulnerabilities, bugs and other important software component attributes.
Subscribe to the mailing list at https://lists.sourceforge.net/lists/listinfo/aboutcode-discuss or join the #aboutcode IRC channel on Freenode and introduce yourself and start the discussion!
Or contact the org admin: @pombredanne and [email protected]
Discovering the origin of code is a vast topic. We primarily use Python for this and some C/C++ and JavaScript, but we are open to using any other language within reason.
Our domain includes text analysis and processing (for instance for copyrights and licenses), parsing (for package manifest formats), binary analysis (to detect the origin and license of binaries, which source code they come from, etc.) as well as web-based tools and APIs (to expose the tools and libraries as web services).
- ScanCode live scan server: This project is to use ScanCode as a library in a web and REST API application that lets users scan code on demand by entering a URL and then stores the scan results. It could also be made available as a Travis or GitHub integration to scan on each commit.
  - Mentors:
    - @jdaguil https://github.com/jdaguil
    - @majurg https://github.com/majurg
    - @tdruez https://github.com/tdruez
- Create a package security vulnerability scanner. The goal is to build on existing projects (such as https://github.com/cve-search/cve-search ) to match scanned and detected packages to known vulnerabilities. This is not trivial, as there are several gaps in the CVE data and in how CVE entries relate to the packages that ScanCode can detect.
  - Mentors:
    - @majurg https://github.com/majurg
    - @JonoYang https://github.com/JonoYang
- Port the Python license expression library to JavaScript, then prepare and publish an npm package for it. Possibly use automated code translation (to JavaScript) for the port. Add license expression support to AboutCodeMgr with this library. As a bonus, create a web server app and API service to parse and normalize ScanCode and SPDX license expressions, in either Python or JavaScript.
  - https://github.com/nexB/license-expression and https://github.com/bastikr/boolean.py
  - Mentors:
    - @JonoYang https://github.com/JonoYang
    - @majurg https://github.com/majurg
- ScanCode scan deduction: the goal of this project is to take existing scan and match results and infer summaries and deductions at a higher level, such as the licensing of a whole directory tree.
  - Mentors:
    - @pombredanne https://github.com/pombredanne
    - @JonoYang https://github.com/JonoYang
- DeltaCode: this would be a new tool to help determine at a high level whether two codebases, or two versions of a codebase, are mostly similar and, if they differ, how they differ (this is NOT a diff tool: it cares about whole-codebase differences in the large, not file differences in the small).
  - Mentor: @majurg https://github.com/majurg
- Create a license and copyright detection benchmark between ScanCode, Fossology, licensee, LicenseFinder, license-check, ninka, slic, LiD and others. This project is to create a comprehensive test suite and a benchmark for several existing FOSS license and copyright detection engines, establish mappings between the conventions they use for license identification, and evaluate and publish their detection accuracy and precision.
  - Mentors:
    - @mjherzog https://github.com/mjherzog
    - @pombredanne https://github.com/pombredanne
- Improve copyright parsing in ScanCode by keeping track of the line numbers and offsets where copyrights are found. This would likely require either replacing or enhancing NLTK, which is used as a natural language parser, to add support for tracking where a copyright has been detected in a scanned text.
  - https://github.com/nexB/scancode-toolkit/tree/develop/src/cluecode
  - Mentor: @JonoYang https://github.com/JonoYang
- Support full JSON and ABCD formats in AttributeCode
  - Mentor: @chinyeungli https://github.com/chinyeungli
- AboutCode Manager server web app: Build a Node.js web-based application that reuses as much as possible of the code and look and feel of the Electron app, and eventually supports multiple projects for a team.
  - Mentors:
    - @jdaguil https://github.com/jdaguil
    - @tdruez https://github.com/tdruez
- Transparent archive extraction in ScanCode. Today, archive extraction is done with a separate command-line invocation. The goal of this project is to integrate archive extraction transparently into the scancode scan loop.
  - Mentor: @pombredanne https://github.com/pombredanne
- Add automated static analysis of which packages are installed in Docker layers for RPMs, Debian and Alpine Linux. This is for conan, the Docker image analysis tool.
  - Mentor: @pombredanne https://github.com/pombredanne
- Create a plugin architecture for ScanCode output in multiple formats (CSV, JSON, SPDX, Debian Copyright)
  - Mentor: @pombredanne https://github.com/pombredanne
- Static analysis of binaries for build tracing in TraceCode: TraceCode does system call tracing. The goal of this idea is to use symbols, debug symbols or string matching to accomplish statically something similar to what the dynamic tracing does.
  - Mentor: @pombredanne https://github.com/pombredanne
- Better support for tracing the lifecycle of file descriptors in TraceCode: TraceCode does system call tracing. The goal of this project is to improve the way we track the opening and closing of file descriptors in the trace to reconstruct the life of a file.
  - Mentor: @pombredanne https://github.com/pombredanne
- Create Debian and RPM packages for ScanCode, AttributeCode and TraceCode.
  - Mentor: @pombredanne https://github.com/pombredanne
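To give a feel for the kind of problem involved, here is a minimal sketch of the file descriptor lifecycle idea listed above: pairing open and close calls in a trace to reconstruct what happened to each file. It assumes the trace has already been parsed into simple events. The `Event` tuple and the `file_lifecycles` function are hypothetical names for illustration only; they are not part of TraceCode, and real system call traces carry much more detail.

```python
from collections import namedtuple

# Hypothetical, simplified trace event (not TraceCode's actual data model).
Event = namedtuple('Event', 'syscall path fd')


def file_lifecycles(events):
    """Return a list of (path, [syscalls]) pairs, one per open..close span."""
    open_fds = {}  # fd number -> (path, syscalls seen so far)
    lifecycles = []
    for ev in events:
        if ev.syscall == 'open':
            if ev.fd in open_fds:
                # The close was missing from the trace: flush the old record.
                lifecycles.append(open_fds.pop(ev.fd))
            open_fds[ev.fd] = (ev.path, ['open'])
        elif ev.fd in open_fds:
            path, ops = open_fds[ev.fd]
            ops.append(ev.syscall)
            if ev.syscall == 'close':
                lifecycles.append(open_fds.pop(ev.fd))
    # Descriptors still open when the trace ends are reported as-is.
    lifecycles.extend(open_fds.values())
    return lifecycles


events = [
    Event('open', '/src/a.c', 3),
    Event('read', None, 3),
    Event('close', None, 3),
    Event('open', '/build/a.o', 3),  # fd 3 is reused for a different file
    Event('write', None, 3),
    Event('close', None, 3),
]
lifecycles = file_lifecycles(events)
for path, ops in lifecycles:
    print(path, ops)
```

The point of the sketch is that a descriptor number alone does not identify a file: the same number is reused after a close, so each open has to start a new record. A real implementation would also need to handle calls such as `dup` and `fork`, which this sketch ignores.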
We expect your application to be in the range of 1000 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job. Your proposal should contain at least the following information, plus anything you think is relevant:
- Your name
- Title of your proposal
- Abstract of your proposal
- Detailed description of your idea, including an explanation of why it is innovative and what it will contribute
- Description of previous work and existing solutions (links to prototypes and a bibliography are more than welcome)
- Details of your academic studies, any previous work and internships
- Relevant skills that will help you to achieve the goal (programming languages, frameworks)
- Any previous open-source projects (or even previous GSoC) you have contributed to, with links
- Do you plan to have any other commitments during GSoC that may affect your work? Any vacations/holidays? Will you be available full time to work on your project? (Hint: do not bother applying if this is not a serious full-time commitment)
You need to understand something about open source licensing, package managers, or static analysis of code and binaries. The best way to demonstrate your capability is to submit a small patch, ahead of the project selection, for an existing issue or a new issue.