Skip to content

Changes by Jules#104

Open
akutuva21 wants to merge 796 commits into
RuleWorld:mainfrom
akutuva21:main
Open

Changes by Jules#104
akutuva21 wants to merge 796 commits into
RuleWorld:mainfrom
akutuva21:main

Conversation

@akutuva21

Copy link
Copy Markdown
Member

PyBioNetGen: Security, Performance, Code Health, and Test Coverage Overhaul

Summary

This PR rolls up 419 commits spanning April 13 – June 4, 2026 into a single contribution against PyBioNetGen. It touches 151 unique files with roughly +10,800 / −9,100 lines changed across the run. The work is a broad maintenance and hardening pass over the codebase — closing security holes, eliminating long-standing TODO/FIXME/assert debt, optimizing hot paths, and substantially expanding test coverage — alongside a set of targeted bug fixes in the SBML→BNGL atomizer and the simulator/runner stack.

Work was developed with assistance from AI coding agents (Jules), with all commits co-authored and reviewed; Co-authored-by trailers credit google-labs-jules[bot] on ~115 commits.

Commit breakdown by category

Category Count
Bug fixes 113
Code health / refactor 101
Test coverage 88
Performance 33
Security 18
Features 11
PR merges / integration 14
Docs 1
Misc / formatting 40

Most-affected files

atomizer/sbml2bngl.py, core/tools/visualize.py, modelapi/runner.py, atomizer/bngModel.py, simulator/csimulator.py, atomizer/libsbml2bngl.py, modelapi/bngfile.py, and merging/namingDatabase.py, plus a large number of new and expanded test modules under tests/.


1. Security fixes

Hardening focused on unsafe deserialization, code injection vectors, XML parsing, path traversal, and leaked secrets.

  • Insecure deserialization removed in multiple modules: replaced unsafe pickle.load with json.load in atomizer/contactMap.py and annotationComparison.py, and fixed insecure deserialization in detectOntology.py.
  • eval()ast.literal_eval() across the atomizer and postAnalysis.py to remove arbitrary-code-execution risk.
  • ast.literal_eval DoS vulnerabilities fixed in postAnalysis.py and analyzeTrends (untrusted assumption strings), with one path further hardened by switching to json.loads.
  • XXE (XML External Entity) mitigation in readBNGXML.py and a full migration to defusedxml for safe XML parsing.
  • Path traversal in tarfile.extractall resolved.
  • Insecure YAML loading replaced (yaml.loadyaml.safe_load).
  • Hardcoded BioGRID API key removed from pathwaycommons.py; the key is now read from the BIOGRID_API_KEY environment variable with a graceful warning-and-skip fallback when unset.
  • Secure handling of missing molec.name in the atomizer.

2. Performance optimizations

  • NamingDatabase SQLite layer: persistent connection handling and reduced per-query overhead, reported as a ~3.7x query speedup; fixed an N+1 query in annotation insertion; batched annotation-ID caching to avoid full-table refetches; batched SQL in namespace detection.
  • String concatenation hot paths converted to join/buffered builds in bnglReaction, bnglWriter.py, BNGResult.__repr__, bngmodel.__str__, and side_string.
  • Membership checks converted from list/tuple scans to set lookups in tight loops (moleculeCreation, smallStructures, classifyReactions, atomizer constant lists).
  • Regex compilation/substitution caching in bnglWriter.py and distanceToModification; Levenshtein distance memoized in analyzeSBML.
  • Replaced type() == list checks with isinstance(), removed redundant .keys() lookups and len() == 0 checks, and added early-exit traversal in deleteMolecule.
  • Cached queryActiveSite within the resolveSCT loop.

3. Code health & refactoring

  • Error-handling modernization: transitioned assert statements to structured BNGError/BNGParseError plus BNGLogger across reactions parsing, csimulator init, network.py, and model parsing; replaced bare except clauses with specific exceptions; converted silent assignment failures to logged warnings.
  • os.chdir / CWD discipline: a large sustained effort to eliminate working-directory leaks — refactored VisResult and visualize.py to avoid os.chdir, wrapped directory changes in try/finally, replaced a module-level app instantiation that corrupted CWD, and used TemporaryDirectory contexts correctly. This resolved widespread Windows PermissionError/WinError 32 and FileNotFoundError failures in CI.
  • Topological sorting: function/dependency ordering in libsbml2bngl.py and bngModel.py refactored to use networkx topological sort (Kahn's algorithm).
  • Resolved numerous TODO/FIXME items and removed dead code throughout the atomizer, network structs, and model API; standardized shutil import (dropped the import shutil as spawn alias); refactored oversized files (e.g. splitting sbml2bngl.py helpers into utility modules) and long methods (ActionList.__init__).
  • Repo-wide black formatting and removal of stray test/build artifacts.

4. Bug fixes

  • SBML→BNGL converter correctness: fixed reaction-rate stoichiometry symmetry factors (removing an incorrect mathematical hack), mass-action kinetics detection, compartment removal in rate expressions, constant species-name parsing, parameter-namespace handling for observables and assignment rules, and correct updating of observablesDict when overridden by assignment rules. Fixed nl/nr assignment when reactants/products appear in rate expressions.
  • Atomizer: outer-compartment logic in Pattern, dimer component classification, volume adjustment for multiple species in rate functions, double-modification queueing in SCTSolver, and parameter namespace collisions in bngModel assignment rules.
  • Graph diff (gdiff): fixed single-node-to-list conversion and nested node recoloring/renaming.
  • Runner/simulator: bngl2xml now runs with a multiprocessing/subprocess timeout; fixed runner.py SectionProxy attribute error; restored CWD in finally blocks across visualize/runner/atomizer/simulator.
  • CLI: BioNetGen version is auto-loaded via module metadata for the --version flag (after one revert/re-land cycle); fixed BNGPATH resolution when BNG2.pl is on PATH.
  • Empty ListOfBonds tag handling in xmltodict parsing; invalid escape sequences in parameter rate-rule regexes; setup.py duplicate manifest inclusions.

5. New features

  • Parsing support for Include/Exclude Reactants/Products rule modifiers.
  • PSA simulation arguments added (poplevel, check_product_scale) and a psa simulation method.
  • Mathematical parameter evaluation via sympy in ParameterBlock / NetworkBlock.add_item.
  • BNGResult extension-based file filtering; standalone find_dat_files / load_results methods; consolidated path argument in the constructor.
  • A verbosity option for the parser and XML parsing logging.
  • add_bngl_function helper migrated into the bngModel class.

6. Test coverage

~88 commits add or expand tests, with new suites for pathwaycommons (BioGRID/Reactome/Uniprot query paths and HTTPError fallbacks), detectOntology.levenshtein, sbml2json utilities, the csimulator/libRRSimulator wrappers, BNGSimulator properties, gdiff, sympy_odes, Pattern equality, ModelObj/ActionBlock operations, the runner, and the notebook/graphdiff CLI commands. Several commits specifically add exception-path and edge-case coverage (e.g. add_block, _safe_rmtree OS errors, _get_color_id).


Notes for reviewers

  • The history includes a number of CI-stabilization commits (Windows file-locking/CWD fixes, black formatting, pytest-mockunittest.mock, Python 3.8 compatibility for parenthesized with) that account for some apparent duplication in the log; the net effect is a green, cross-platform test suite.
  • Commits 285–297 (Merging PR #340#367) and a few others fold in earlier sub-PRs from the same branch series.
  • Given the size, this is best reviewed by category (security → performance → atomizer correctness → tests) rather than commit-by-commit. Happy to split into smaller PRs along those lines if preferred.

google-labs-jules Bot and others added 30 commits June 2, 2026 14:42
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
…710193914332157

⚡ perf: fix N+1 query bottleneck in naming database annotation insertion
…6102342659150753

🧪 Test value/type error handling in structs
…-func-body-5109138640418052702

🧪 Improve testing for _extract_function_body in sympy_odes.py
…7189199558942

🧹 Refactor to add warning for renaming reserved keyword e in Atomizer
…3611885534457943203

🧪 test: add test coverage for Pattern.canonicalize ImportError
…17997413746

🧹 Refactor rawSpeciesName sorting to fix TODO
…2657053277532450

Fix detection of mass action kinetics in sbml2bngl
…2775437548583

🧹 Refactor BNGResult initialization to consolidate file and folder paths
Adds tests to verify that the `graphdiff` command in the `bionetgen` CLI correctly parses arguments and routes them to the underlying `graphDiff` tool.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
The reduceComponentSymmetryFactors method was entirely broken and commented out.
This commit fixes the following issues:
1. Removes the `rReactant` and `rProduct` block which called `append` with 2 arguments instead of 1, because non-constant stoichiometry errors are already checked correctly inside `__getRawRules`.
2. Fixes the broken dictionary comparison logic `rcomponent[key] == 1` by properly indexing the inner element (`rcomponent[key][element] == 1`).
3. Integrates the method into `sbml2bngl.py` by multiplying the computed component-level symmetry factors (`sl_comp, sr_comp`) with the species-level symmetry factors (`sl_spec, sr_spec`) to properly account for both symmetries in the BNGL output.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
Added unit tests `test_extract_odes_from_cvode_mex_direct` and `test_extract_odes_from_cvode_mex_inference` to `tests/test_sympy_odes.py` to cover the happy path and inference fallback of cvode mex parsing.

Co-authored-by: akutuva21 <44119804+akutuva21@users.noreply.github.com>
…2893446415596560176

Fix assignment rule renaming breakage
…-11490878634659662031

🧹 [code health improvement] Fix broken logic in assignment rule substitution
…tabase-15527366226394001331

⚡ Optimize dictionary comprehensions in namingDatabase.py
…9383128

🧹 Remove commented out adjust_concentrations method
…95465865337956691

⚡ Optimize regular expression performance in bnglWriter.py
…47221959

🧪 test: add cli argument tests for graphdiff command
akutuva21 added 30 commits June 10, 2026 13:55
…13061356770607576383'

# Conflicts:
#	bionetgen/atomizer/writer/bnglWriter.py
…on-5986959215111163739'

# Conflicts:
#	bionetgen/atomizer/writer/bnglWriter.py
…5030303034692340617'

# Conflicts:
#	bionetgen/atomizer/writer/bnglWriter.py
…7027425954342234185'

# Conflicts:
#	bionetgen/atomizer/writer/bnglWriter.py
…graph-10639770207778011296'

# Conflicts:
#	bionetgen/atomizer/writer/bnglWriter.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants