Overview
Establish a comprehensive security testing and fuzzing strategy for the PowerPoint skill Python codebase (~4,700 lines across 14 modules under .github/skills/experimental/powerpoint/scripts/).
This initiative implements Scenario D (Hypothesis + pip-audit) from the ClusterFuzzLite research evaluation and relies on the existing CodeQL Python analysis for SAST coverage. Scenario D was selected because it offers the highest value-to-effort ratio for a codebase of this size that consumes structured input.
Background
A thorough security-focused codebase analysis identified 5 security findings (1 CRITICAL, 3 HIGH, 1 MODERATE), a complete absence of property-based or fuzz testing (all 300+ tests are deterministic), and no dependency CVE scanning. ClusterFuzzLite was evaluated but rejected as the primary approach because of a structured-input mismatch, an incompatibility between the project's Python >=3.11 requirement and the default base image, and disproportionate setup complexity.
Three-Phase Implementation
Phase 1: Hypothesis Property Tests (High Priority)
Add hypothesis>=6.100 to dev dependencies and write property tests targeting priority modules:
- validate_slides.py / validate_deck.py: input validation robustness
- build_deck.py: element builder dispatch with arbitrary element definitions
- pptx_colors.py: hex color parsing edge cases
- pptx_tables.py: merge bounds and out-of-range handling
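As a rough illustration of what a Phase 1 property test could look like, the sketch below targets hex color parsing. The `parse_hex_color` function is a hypothetical stand-in, not the actual API of pptx_colors.py; only the Hypothesis calls (`given`, `strategies.text`, `strategies.integers`, `strategies.tuples`) are real.

```python
import re

from hypothesis import given, strategies as st

HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{6})")


def parse_hex_color(value: str) -> tuple[int, int, int]:
    """Hypothetical stand-in for the parser in pptx_colors.py."""
    match = HEX_COLOR.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid hex color: {value!r}")
    digits = match.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


@given(st.text())
def test_arbitrary_text_never_crashes(value):
    # Property: any input either parses to three bytes or raises ValueError.
    try:
        rgb = parse_hex_color(value)
    except ValueError:
        return
    assert len(rgb) == 3 and all(0 <= c <= 255 for c in rgb)


@given(st.tuples(st.integers(0, 255), st.integers(0, 255), st.integers(0, 255)))
def test_round_trip(rgb):
    # Property: formatting a color and parsing it back is the identity.
    assert parse_hex_color("#{:02X}{:02X}{:02X}".format(*rgb)) == rgb
```

The first test expresses robustness (no exception other than the documented one), the second a round-trip invariant. The same two shapes would likely carry over to the validators and the table merge bounds.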
Phase 2: pip-audit Dependency CVE Scanning (High Priority)
Add a pip-audit CI step to scan the pyproject.toml dependencies (python-pptx, pyyaml, pymupdf, lxml) for known CVEs using open vulnerability databases (PyPI Advisory Database, OSV).
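One possible shape for the CI step is a small wrapper that runs pip-audit and turns its JSON report into a pass/fail result. The sketch assumes the project dependencies are already installed in the CI environment and that the report follows pip-audit's JSON layout (a top-level `dependencies` list whose entries carry `name`, `version`, and `vulns`). The function names and the sample package in the usage note are hypothetical.

```python
import json
import subprocess


def vulnerable_packages(report: dict) -> list[str]:
    """Return one 'name version: id, id' line per dependency with known vulnerabilities."""
    lines = []
    for dep in report.get("dependencies", []):
        # Skipped dependencies carry no "vulns" key, so default to an empty list.
        ids = [vuln["id"] for vuln in dep.get("vulns", [])]
        if ids:
            lines.append(f"{dep['name']} {dep.get('version', '?')}: {', '.join(ids)}")
    return lines


def run_pip_audit() -> int:
    """Audit the current environment and return an exit code for CI."""
    # pip-audit exits non-zero when it finds vulnerabilities, so check=True is not used.
    proc = subprocess.run(
        ["pip-audit", "--format", "json"],
        capture_output=True,
        text=True,
    )
    if not proc.stdout.strip():
        # No report was produced (for example a resolver error): fail the step.
        return proc.returncode or 1
    findings = vulnerable_packages(json.loads(proc.stdout))
    for line in findings:
        print(line)
    return 1 if findings else 0
```

For a report such as `{"dependencies": [{"name": "example-pkg", "version": "1.0", "vulns": [{"id": "EXAMPLE-0001"}]}]}`, `vulnerable_packages` yields a single line naming the package and the advisory ID. A CI job would call `run_pip_audit()` and pass its return value to `sys.exit`.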
Phase 3: OSSF Scorecard Fuzzing Compliance (Medium Priority)
Add a thin Atheris wrapper using the polyglot pattern so that import atheris is detectable by OSSF Scorecard's Fuzzing check. Hypothesis alone scores 0/10 since Scorecard only recognizes import atheris for Python.
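A minimal sketch of such a wrapper follows. It reads "polyglot" as "one file that imports cleanly with or without Atheris installed", which is an assumption about the intended pattern. `parse_hex_color` and `fuzz_hex_color` are hypothetical names, while `atheris.Setup` and `atheris.Fuzz` are the library's entry points.

```python
import re
import sys

try:
    import atheris  # literal import kept in the file for Scorecard's Fuzzing check
except ImportError:  # Atheris is optional; the target stays callable without it
    atheris = None

HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{6})")


def parse_hex_color(value: str) -> tuple[int, int, int]:
    """Hypothetical stand-in for the code under test."""
    match = HEX_COLOR.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid hex color: {value!r}")
    digits = match.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def fuzz_hex_color(data: bytes) -> None:
    """Fuzz target: ValueError is an expected rejection, anything else is a bug."""
    text = data.decode("utf-8", errors="replace")
    try:
        parse_hex_color(text)
    except ValueError:
        pass


def main() -> None:
    if atheris is None:
        raise SystemExit("atheris is not installed")
    atheris.Setup(sys.argv, fuzz_hex_color)
    atheris.Fuzz()
```

Because the target is a plain function over bytes, the regular test suite can also call `fuzz_hex_color` directly with a few seed inputs, and `main()` only matters when the file is run as a fuzzing entry point.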
Security Findings to Address
| Severity | Finding | Location |
| --- | --- | --- |
| CRITICAL | Arbitrary code execution via importlib | build_deck.py |
| HIGH | XML parsing (XXE vector) via lxml.etree.fromstring() | extract_content.py |
| HIGH | Untrusted binary blob writes | extract_content.py |
| HIGH | PyMuPDF C extension attack surface | export_slides.py, render_pdf_images.py |
| MODERATE | Recursive processing without depth limits | Multiple modules |
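For the CRITICAL finding, one common mitigation is to gate dynamic imports behind an allowlist. This issue does not describe how build_deck.py calls importlib, so the sketch below assumes the module name comes from untrusted input. The `ALLOWED_MODULES` set and the `load_allowed_module` name are hypothetical, and standard-library modules are used only as placeholders.

```python
import importlib
from types import ModuleType

# Hypothetical allowlist; a real one would name the builder modules that may be loaded.
ALLOWED_MODULES = frozenset({"json", "colorsys"})


def load_allowed_module(name: str) -> ModuleType:
    """Import `name` only if it appears on the allowlist."""
    if name not in ALLOWED_MODULES:
        # Reject before importlib runs, so no code from the module executes.
        raise PermissionError(f"module not allowed: {name!r}")
    return importlib.import_module(name)
```

Checking the name before calling `importlib.import_module` matters because importing a module runs its top-level code. A check performed after the import would come too late.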
Existing Security Coverage
- CodeQL: security-extended and security-and-quality query suites for actions and python; runs on every PR, on demand, and weekly
- OpenSSF Scorecard: Weekly runs on Sundays + push to main
- gitleaks: Secret scanning (devcontainer-only)
Sub-Issues
This epic tracks the following work items:
Acceptance Criteria