refactor(tools): Improve patch parser logic and add unit tests #375
sahsagar-google wants to merge 5 commits into master from
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: sahsagar-google. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
Here is a sample output from a run:
mfielding left a comment:
This change almost doubles the size of the file and I have trouble understanding what it's doing. Can the change be made while maintaining the existing structure?
Alternatively, the whole thing could be reimplemented, but I'd hope to see unit tests to prove it works under the various if cases. Maybe actually modify the files in that case?
(Separately, we'll need all the patch categories, but I think you're already working on that)
@mfielding do you think the sample output below covers patch categories that work for us?
Fixing the URL regex and lower-casing boolean values in the output.
/retest
/test oracle-toolkit-install-data-guard-on-gcp
Context
Maintainers rely on the tools/gen_patch_metadata.py script to generate the YAML metadata for new patches.
This PR:
Refactors gen_patch_metadata.py so that patch parsing is more reliable and its output gives maintainers more guidance.
Adds a new unit test script, test_patch_parser.py, that validates the new logic against the existing patch definitions.
Updates tools/README.md to document the new test script and the more detailed output of gen_patch_metadata.py.
Summary of Changes
1. gen_patch_metadata.py (Major Refactor)
This script was almost completely rewritten to improve parsing reliability and provide better maintainer guidance.
Smarter Parsing (parse_patch):
Old: Relied only on <title> tags in README.html. It would fail if a README was missing or the title was ambiguous.
New: parse_patch is now split into helpers that:
Read PatchSearch.xml first, the definitive source for the base release, patch release, and patch abstract.
Find all numeric subdirectories within the patch.
Analyze the content of both README.html and README.txt for keywords (like ojvm, database, gi, etc.) to identify the component type.
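As a rough illustration of the subdirectory and keyword steps, here is a minimal sketch. It assumes a combo patch zip laid out as <top>/<component>/..., and the keyword table and the function names find_numeric_subdirs, subdirs_in_zip, and classify_readme are hypothetical, not taken from gen_patch_metadata.py:

```python
from __future__ import annotations

import zipfile
from typing import IO, Iterable, Union

# Hypothetical keyword table; the script's real keyword list may differ.
COMPONENT_KEYWORDS = {
    "ojvm": ("ojvm", "javavm"),
    "gi": ("grid infrastructure",),
    "database": ("database release update",),
}


def find_numeric_subdirs(names: Iterable[str]) -> set[str]:
    """Return numeric directory names one level below the patch's top directory.

    Assumes archive paths of the form <top>/<component>/..., where <component>
    is an all-digit patch number.
    """
    subdirs = set()
    for name in names:
        parts = name.split("/")
        # Directory entries end in "/", leaving an empty last part; they still count.
        if len(parts) >= 3 and parts[1].isdigit():
            subdirs.add(parts[1])
    return subdirs


def subdirs_in_zip(zip_source: Union[str, IO[bytes]]) -> set[str]:
    """Open a patch zip (path or file object) and return its numeric component subdirectories."""
    with zipfile.ZipFile(zip_source) as zf:
        return find_numeric_subdirs(zf.namelist())


def classify_readme(text: str) -> list[str]:
    """Return every component type whose keywords appear in a README's text."""
    lowered = text.lower()
    return [
        kind
        for kind, words in COMPONENT_KEYWORDS.items()
        if any(word in lowered for word in words)
    ]
```

In this sketch, a classify_readme result that is empty or has more than one entry is the "ambiguous" case.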
Ambiguity Handling:
Old: Would crash with an assert error if it couldn't find a GI or OJVM component.
New: If the README analysis is ambiguous, the script now logs an ERROR, makes an educated guess, and continues. This lets the script run to completion, which the unit test depends on.
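A minimal sketch of the log-and-guess behavior, assuming the keyword analysis yields a list of candidate component types per subdirectory. The function name, logger name, and message wording are hypothetical and only approximate the "ambiguous... GUESSING..." messages described in the README:

```python
import logging

# Hypothetical logger name; the real script's logger may differ.
logger = logging.getLogger("gen_patch_metadata")


def resolve_component(subdir: str, matches: list, fallback: str) -> str:
    """Choose a component type for one subdirectory without aborting the run.

    matches holds the component types whose keywords were found in that
    subdirectory's README; fallback is the guess used when there is not
    exactly one match.
    """
    if len(matches) == 1:
        return matches[0]
    # An assert here would stop the run; logging and guessing lets it finish.
    logger.error(
        "Component type for %s is ambiguous (matches: %s); GUESSING %s",
        subdir,
        matches or "none",
        fallback,
    )
    return fallback
```

Because a guess can be wrong, the ERROR line is the maintainer's cue to check the patch abstract before choosing a YAML definition.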
Improved Output:
Old: Printed a single, rigid YAML block assuming a RU and RU_Combo pairing.
New: The script prints the patch's abstract (for context) and then all four candidate YAML snippets (GI_RU, DB_RU, RU_Combo, DB_OJVM_RU). The maintainer uses the abstract to decide which YAML definitions to keep.
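The output step can be sketched as follows. This is a minimal illustration, not the script's code: the four category names come from this PR, but the field names and values are hypothetical placeholders. Booleans are written in lowercase, in line with the "lower-casing boolean values in the output" fix mentioned in the conversation:

```python
# Category names from the PR description; everything else is illustrative.
CATEGORIES = ("GI_RU", "DB_RU", "RU_Combo", "DB_OJVM_RU")


def yaml_value(value) -> str:
    """Render a scalar for YAML, writing booleans as lowercase true/false."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def render_snippets(abstract: str, fields: dict) -> str:
    """Return the abstract as a comment followed by one YAML snippet per category."""
    lines = [f"# Abstract: {abstract}"]
    for category in CATEGORIES:
        lines.append(f"{category}:")
        for i, (key, value) in enumerate(fields.items()):
            # The first key opens a list item; the remaining keys align under it.
            prefix = "  - " if i == 0 else "    "
            lines.append(f"{prefix}{key}: {yaml_value(value)}")
    return "\n".join(lines)
```

Printing every candidate, rather than guessing one category, keeps the final decision with the maintainer.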
OPatch Download:
The download logic was extracted into its own download_opatch function.
It now tries to find the OPatch version that matches the specific database release (e.g., "19c") instead of using a hardcoded version.
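The PR does not say how download_opatch matches a release to an OPatch build, so the following is only a sketch of one possible approach: normalize the release string to a lookup key, then pick a file from a caller-supplied mapping and fall back to a default. The helper names, the key scheme (major.minor before 18c, major only from 18c onward), and the file names in the mapping are all assumptions:

```python
import re
from typing import Mapping, Optional


def release_key(release: str) -> str:
    """Normalize '19c', '19.3.0.0.0' or '12.1.0.2' to a lookup key.

    Assumption: pre-18c releases are keyed by major.minor ('12.1', '12.2'),
    later ones by major version only ('19', '21').
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized release: {release!r}")
    major, minor = int(match.group(1)), match.group(2)
    if major < 18 and minor is not None:
        return f"{major}.{minor}"
    return str(major)


def pick_opatch(
    release: str, available: Mapping[str, str], default: Optional[str] = None
) -> Optional[str]:
    """Return the OPatch file registered for this release's key, else default."""
    return available.get(release_key(release), default)
```

Returning a default instead of raising keeps an unmapped release from stopping the run, which matches the script's new preference for finishing with a warning over crashing.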
2. test_patch_parser.py (New File)
This new unit test script validates the new parsing logic in gen_patch_metadata.py.
It loads all patch definitions from the production gi_patches.yml and rdbms_patches.yml.
For every 2-component combo patch, it:
Downloads the patch .zip from the gcp-oracle-software GCS bucket.
Calls the new gen_patch_metadata.parse_patch function.
Asserts that the parsed base_release and patch_release match the YAML.
Uses assertSetEqual to compare the sets of subdirectories. Because only the set of directories is compared, the test passes even if the parser's guess (OJVM vs. other) differs from the YAML, as long as both correct directories are found.
It includes a skip-list for known obsolete/unavailable 12.1.0.2 patches (per team feedback) so the test suite can run to completion.
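The test's structure can be sketched with the standard unittest module. In this self-contained version, the patch list, its field names, the skip-list entry, and the parse_patch stub are hypothetical stand-ins. The real test loads gi_patches.yml and rdbms_patches.yml, downloads each zip from the gcp-oracle-software bucket, and calls gen_patch_metadata.parse_patch, whose exact signature and return shape are not shown here:

```python
import unittest

# Hypothetical stand-ins for entries loaded from gi_patches.yml/rdbms_patches.yml.
PATCHES = [
    {"patchnum": "1001", "base": "19.3.0.0.0", "release": "19.21.0.0.0",
     "subdirs": {"2001", "2002"}},
    {"patchnum": "1002", "base": "12.1.0.2.0", "release": "12.1.0.2.190115",
     "subdirs": {"2003", "2004"}},
]
# Stands in for the skip-list of obsolete/unavailable 12.1.0.2 patches.
SKIP = {"1002"}


def parse_patch(patch: dict) -> dict:
    """Stub for gen_patch_metadata.parse_patch (the real one parses a downloaded zip).

    It labels the two directories in sorted order, which need not match the
    YAML's labels, to mimic a wrong OJVM-vs-other guess.
    """
    first, second = sorted(patch["subdirs"])
    return {
        "base_release": patch["base"],
        "patch_release": patch["release"],
        "components": {"ojvm": second, "database": first},
    }


class PatchParserTest(unittest.TestCase):
    def test_combo_patches(self):
        for patch in PATCHES:
            # subTest reports each patch separately and keeps looping after a skip.
            with self.subTest(patch=patch["patchnum"]):
                if patch["patchnum"] in SKIP:
                    self.skipTest("known obsolete/unavailable patch")
                parsed = parse_patch(patch)
                self.assertEqual(parsed["base_release"], patch["base"])
                self.assertEqual(parsed["patch_release"], patch["release"])
                # Compare which directories were found, not how they were labeled.
                self.assertSetEqual(
                    set(parsed["components"].values()), patch["subdirs"]
                )
```

Running this with python -m unittest reports one skipped subtest and no failures, even though the stub's labels are arbitrary.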
3. tools/README.md (Updated)
Updated the gen_patch_metadata.py sample output to show its new, more informative block (with the abstract and multiple YAML options).
Added a new section for test_patch_parser.py, covering:
How the test works.
Step-by-step instructions for installing dependencies (pip install ..., gcloud auth ...) and running the test.
A guide to "Understanding the Test Output," explaining why Skipping... (for 21c patches) and ambiguous... GUESSING... messages are normal and expected.