@Logicmn Logicmn commented Jul 11, 2025

Description

Hello! The current parser implementation for GitHub code scanning results is baked into the "Github Vulnerability Scan" scan type, a parser originally meant for GitHub SCA (Dependabot) vulnerabilities. Since these two scan types are fundamentally different, issues can arise, especially around the fields used for deduplication in the hash code. This PR splits GitHub code scanning out into its own GithubSASTParser, with a scan-type string of "Github SAST Scan". I have included documentation, unit tests, and a new list of fields for hash code deduplication.

I also included several improvements for the original Github Vulnerability Scan parser. These improvements include:

  • Add support for the cvssSeverities field, which will replace the cvss field in GitHub's GraphQL response in October 2025.
  • Add the permalink from the dependabotUpdate field to the finding description.
  • Map GitHub's newly supported EPSS percentage and percentile to the finding.epss_score and finding.epss_percentile fields.
  • Set finding.url to the GitHub Dependabot alert hyperlink for convenience.
  • Improve vulnerability ID handling (now explicitly sets the finding.cve and finding.vuln_id_from_tool fields before falling back to unsaved_vulnerability_ids).
  • Fix a bug where finding.component_version was only set when the vulnerableRequirements string started with "=".
  • Improve defensive coding where applicable, e.g. using .get() to access fields.
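The component_version fix in the list above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's exact code; the vulnerableRequirements field name comes from GitHub's GraphQL response, but the helper name and stripping logic here are assumptions:

```python
def parse_component_version(vulnerable_requirements):
    """Extract a version string from a requirement like "= 1.2.3" or ">= 2.0.1".

    The old behavior only handled strings starting with "="; stripping the
    full set of comparison operators handles the other requirement forms too.
    """
    if not vulnerable_requirements:
        return None
    # Strip any leading comparison operator instead of requiring "= ".
    return vulnerable_requirements.lstrip("=<>~^ ").strip() or None
```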

Backward compatibility: existing users of the “Github Vulnerability Scan” scan type (driven by GithubVulnerabilityParser) for SCA imports will see no change. If you’d been using it to ingest SAST/code-scanning JSON, you’ll need to switch your import to the new “Github SAST Scan” scan type (driven by GithubSASTParser).
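For context on the deduplication change mentioned above: per-scanner hash-code fields live in DefectDojo's settings.dist.py under HASHCODE_FIELDS_PER_SCANNER. A hypothetical entry for the new scan type might look like the following; the actual field list is whatever the PR's settings change defines, and the fields shown here are assumptions:

```python
# Hypothetical hash-code configuration for the new scan type; the real
# field list is defined in this PR's settings.dist.py changes.
HASHCODE_FIELDS_PER_SCANNER = {
    "Github SAST Scan": ["title", "file_path", "line"],
}
```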

Ref links:

@github-actions github-actions bot added the settings_changes ("Needs changes to settings.py based on changes in settings.dist.py included in this PR"), docs, unittests, and parser labels Jul 11, 2025

dryrunsecurity bot commented Jul 11, 2025

DryRun Security

This pull request contains an open redirect vulnerability in the GithubSASTParser where a maliciously crafted SAST report could generate a file link pointing to an attacker-controlled domain, potentially leading to an open redirect if the DefectDojo UI renders the description as HTML.

Open Redirect in dojo/tools/github_sast/parser.py
Vulnerability: Open Redirect
Description: The GithubSASTParser constructs a file_link URL using the scheme and network location taken directly from the html_url field in the input SAST report. If a malicious SAST report is uploaded with a crafted html_url (e.g., https://attacker.com/path), the generated file_link will point to the attacker's domain. This file_link is then embedded into the description field of the Finding object as a Markdown-formatted link. If the DefectDojo UI renders this description as HTML, it would create a clickable link to the attacker-controlled domain, leading to an open redirect.
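One possible mitigation (not part of this PR) is to allowlist the host before reusing it to build links. This is a minimal sketch; the ALLOWED_HOSTS set is an assumption and would need to include any GitHub Enterprise hostnames a deployment uses:

```python
from urllib.parse import urlparse

# Assumption: only github.com links are trusted; extend for GitHub Enterprise.
ALLOWED_HOSTS = {"github.com"}

def safe_base_url(html_url):
    """Return "scheme://netloc" only when the host is on the allowlist,
    so a crafted html_url cannot yield an attacker-controlled file_link."""
    parsed = urlparse(html_url or "")
    if parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS:
        return f"{parsed.scheme}://{parsed.netloc}"
    return None
```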

import json

from dojo.models import Finding


class GithubSASTParser:
    def get_scan_types(self):
        return ["Github SAST Scan"]

    def get_label_for_scan_types(self, scan_type):
        return scan_type

    def get_description_for_scan_types(self, scan_type):
        return "GitHub SAST report file can be imported in JSON format."

    def get_findings(self, filename, test):
        data = json.load(filename)
        if not isinstance(data, list):
            error_msg = "Invalid SAST report format, expected a JSON list of alerts."
            raise TypeError(error_msg)
        findings = []
        for vuln in data:
            rule = vuln.get("rule", {})
            inst = vuln.get("most_recent_instance", {})
            loc = inst.get("location", {})
            html_url = vuln.get("html_url")
            rule_id = rule.get("id")
            title = f"{rule.get('description')} ({rule_id})"
            severity = rule.get("security_severity_level", "Info").title()
            active = vuln.get("state") == "open"

            # Build description with context
            desc_lines = []
            if html_url:
                desc_lines.append(f"GitHub Alert: [{html_url}]({html_url})")

            owner = repo = None
            commit_sha = inst.get("commit_sha")
            if html_url:
                from urllib.parse import urlparse
                parsed = urlparse(html_url)
                parts = parsed.path.strip("/").split("/")
                # URL is /<owner>/<repo>/security/... so parts[0]=owner, parts[1]=repo
                if len(parts) >= 2:
                    owner, repo = parts[0], parts[1]

            if owner and repo and commit_sha and loc.get("path") and loc.get("start_line"):
                file_link = (
                    f"{parsed.scheme}://{parsed.netloc}/"
                    f"{owner}/{repo}/blob/{commit_sha}/"
                    f"{loc['path']}#L{loc['start_line']}"
                )
                desc_lines.append(f"Location: [{loc['path']}:{loc['start_line']}]({file_link})")
            elif loc.get("path") and loc.get("start_line"):
                # Fallback if something is missing
                desc_lines.append(f"Location: {loc['path']}:{loc['start_line']}")

            msg = inst.get("message", {}).get("text")
            if msg:
                desc_lines.append(f"Message: {msg}")
            if severity:
                desc_lines.append(f"Rule Severity: {severity}")
            if rule.get("full_description"):
                desc_lines.append(f"Description: {rule.get('full_description')}")
            description = "\n".join(desc_lines)

            finding = Finding(
                title=title,
                test=test,
                description=description,
                severity=severity,
                active=active,
                static_finding=True,
                dynamic_finding=False,
                vuln_id_from_tool=rule_id,
            )
            # File path & line
            finding.file_path = loc.get("path")
            finding.line = loc.get("start_line")
            if html_url:
                finding.url = html_url
            findings.append(finding)
        return findings
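For reference, the input the parser expects is a JSON list of code scanning alerts, each with rule, most_recent_instance, state, and html_url keys, mirroring GitHub's code scanning alerts API. The sample below is illustrative (values are made up) and only demonstrates the shape and the title construction, without importing DefectDojo models:

```python
import io
import json

# Illustrative alert in the shape the parser above expects (a JSON list).
sample = [{
    "rule": {
        "id": "py/sql-injection",
        "description": "SQL injection",
        "security_severity_level": "high",
    },
    "most_recent_instance": {
        "commit_sha": "deadbeef",
        "location": {"path": "app/db.py", "start_line": 42},
        "message": {"text": "User input flows into a SQL query."},
    },
    "state": "open",
    "html_url": "https://github.com/org/repo/security/code-scanning/1",
}]

# The parser receives a file-like object, as in get_findings(filename, test).
data = json.load(io.StringIO(json.dumps(sample)))
assert isinstance(data, list)  # non-list input raises TypeError in the parser
rule = data[0]["rule"]
title = f"{rule['description']} ({rule['id']})"  # → "SQL injection (py/sql-injection)"
```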


All finding details can be found in the DryRun Security Dashboard.

Logicmn commented Jul 14, 2025

@Maffooch All linting errors should be fixed now, thanks for bearing with. :)

@valentijnscholten valentijnscholten added this to the 2.49.0 milestone Jul 15, 2025

@valentijnscholten valentijnscholten left a comment

comment posted above

@valentijnscholten valentijnscholten modified the milestones: 2.49.0, 2.50.0 Aug 4, 2025

@dogboat dogboat left a comment


Just two nits about import placement, but otherwise looks great; approving because they're not blockers imho.

            owner = repo = None
            commit_sha = inst.get("commit_sha")
            if html_url:
                from urllib.parse import urlparse

Any reason to have this here rather than at the top?


    def test_parse_file_invalid_format_raises(self):
        """Non-list JSON should raise"""
        import io

Same nit about imports.

@valentijnscholten valentijnscholten modified the milestones: 2.50.0, 2.51.0 Sep 2, 2025