-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Module crawl_n_mask to mask secrets in yaml files #2776
base: main
Are you sure you want to change the base?
Module crawl_n_mask to mask secrets in yaml files #2776
Conversation
Skipping CI for Draft Pull Request. |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/51ec583fe1f8492783d3d4c49b169388 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 39m 20s |
a751928
to
71fb47b
Compare
This comment was marked as outdated.
This comment was marked as outdated.
scripts/crawl_n_mask.py
Outdated
or os.path.basename(self.path).split(".")[0] in ALLOWED_YAML_FILES | ||
): | ||
self._mask_yaml() | ||
# elif self.path.endswith("log"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(non-blocking) Suggestion: I'd remove this and add it back whenever necessary. Up to you.
scripts/crawl_n_mask.py
Outdated
def __init__(self, path: Optional[Any] = None) -> None: | ||
self.path: Union[str, None] = path | ||
|
||
def mask(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) question: I'm unsure why we would need to return here a boolean.
Could you add more context?
Might be good to add docStrings like: https://github.com/openstack-k8s-operators/ci-framework/blob/main/scripts/create_role_molecule.py#L24
scripts/crawl_n_mask.py
Outdated
|
||
class SecretMask: | ||
|
||
def __init__(self, path: Optional[Any] = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) suggestion: I'd suggest changing path type to Optional[str] = None
If there's the possibility to be another type, then you should change L110 to something like Union[Any, None] = path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done changes
scripts/crawl_n_mask.py
Outdated
# self._maskLogFile() | ||
return True | ||
|
||
def _mask_yaml(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) let's add some docstrings!
scripts/crawl_n_mask.py
Outdated
regexes = [gen_regex, con_regex] | ||
|
||
|
||
class SecretMask: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) question: how is this going to be used? Is it something we're going to execute at the end with all the list of generated yaml files?
If this script goes wrong, do we want to stop execution so nothing gets leaked?
I'm sure, but returning None doesn't seem correct to me.
I guess we can hold on until a functionality expert for the project chimes in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the yaml files are logged somehow by saving in any fact, might be good to have in mind a filter: https://www.dasblinkenlichten.com/creating-ansible-filter-plugins/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, we are going to have a script handy. The initial idea was to run the script against the generated logs.
scripts/crawl_n_mask.py
Outdated
regexes and mask any potential sensitive info. | ||
""" | ||
for pattern in regexes: | ||
value = re.sub(pattern, r"\1{}".format(MASK_STR), value, flags=re.I) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) suggestion: I see here we're going to loop over all the regexes even though, if the first element matched.
What about checking the returned value if it's the same as the original, so we stop the for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A large string might have multiple secrets. That's why it is checking for all regex patterns using for loop.
scripts/crawl_n_mask.py
Outdated
elif isinstance(v, list): | ||
for i, item in enumerate(v): | ||
if isinstance(item, dict): | ||
self._applyMask(item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Great!
scripts/crawl_n_mask.py
Outdated
""" | ||
try: | ||
assert self.path is not None | ||
original_content = self._readYaml() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(non-blocking) suggestion: WDYT about instead of executing twice _readYaml, we deep copy in the first execution in self.original_content?
scripts/crawl_n_mask.py
Outdated
crawl(OPTS.dir) | ||
|
||
if OPTS.path is not None and os.path.exists(OPTS.path): | ||
SecretMask(OPTS.path).mask() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(non-blocking) question: Don't we need to check if this file is not in excluded_file_ext_regex ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(non-blocking) question: Don't we need to check if this file is not in excluded_file_ext_regex ?
Yes, that's something I missed. I'll add it.
71fb47b
to
574f03b
Compare
cabf4d0
to
8b1f9fd
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/000a7dff92cc45f4b7f318e26d541343 ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 24m 24s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking the idea of moving this to an ansible module like: krb_request module. check the file commit history: https://github.com/openstack-k8s-operators/ci-framework/commits/main/plugins/modules/krb_request.py
scripts/tests/test_crawl_n_mask.py
Outdated
""" | ||
for root, _, files in os.walk(SAMPLE_DIR): | ||
for f in files: | ||
print("Processing file %s" % f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) suggestion: I'd rather remove this print. This would make unnecessary noise imho.
scripts/tests/test_crawl_n_mask.py
Outdated
return list(yaml.safe_load_all(f)) | ||
except (FileNotFoundError, yaml.YAMLError) as e: | ||
print(f"Error while reading YAML: {e}") | ||
# sys.exit(-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) suggestion: I'd remove this leftover.
scripts/tests/test_crawl_n_mask.py
Outdated
SecretMask. | ||
""" | ||
|
||
def _read_yaml_sample(self, path) -> Optional[Union[list, None]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) suggestion: Here it's returning None, but in the caller method we don't check if it's list or None.
I'd suggest just returning List, and if something goes wrong, catching that from the error.
In any case, this is a test case, so I'd not expect to check if the file is there, cause if not, the test would fail in any case.
Could you check the part you're trying to cover, but you don't need to test the actual test?
So here I'd just return the list object without any try except clause or the possibility to return None.
scripts/crawl_n_mask.py
Outdated
SecretMask(os.path.join(root, f)).mask() | ||
|
||
|
||
def handle_error(e): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(blocking) suggestion: Somehow we need to handle better the errors.
My suggestion is to groom all the try except blocks, avoid returning None if something went wrong and directly in the except, after logging the problem (or logging the problem here) sys exist 1. So ansible caller could check the return status by checking the return code.
Also, I'm thinking if it'd be good to move this as ansible module. So you install it and you can use it directly instead of ansible.comand: /bin/bash crawl_n_mask.py
This module would receive path and/or and/or folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have some yaml files in logs which are not yaml. If we do sys.exit(-1), then the script will stop when it find such files. Unless those files are not fixed, it is better to return None instead of exiting the script.
scripts/tests/test_crawl_n_mask.py
Outdated
hence no masking is applied and we expect the original data | ||
content being the same as the processed one. | ||
""" | ||
if "nochange" in f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd create two tests here, one when there's something to mask and another when there's no something.
Also I'm thinking the idea to test a bit more in detail for each of the tests in the script.
But first of all, let's check if we want to create an ansible module instead of a script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've borrowed the script and test cases from openstack-must-gather. The idea was to keep things simple. We can keep improving test cases and the script once we decide how we will run it over our logs dir.
8b1f9fd
to
b30ab44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worked on the review of @evallesp
scripts/crawl_n_mask.py
Outdated
regexes = [gen_regex, con_regex] | ||
|
||
|
||
class SecretMask: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now, we are going to have a script handy. The initial idea was to run the script against the generated logs.
scripts/crawl_n_mask.py
Outdated
|
||
class SecretMask: | ||
|
||
def __init__(self, path: Optional[Any] = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done changes
scripts/crawl_n_mask.py
Outdated
regexes and mask any potential sensitive info. | ||
""" | ||
for pattern in regexes: | ||
value = re.sub(pattern, r"\1{}".format(MASK_STR), value, flags=re.I) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A large string might have multiple secrets. That's why it is checking for all regex patterns using for loop.
scripts/crawl_n_mask.py
Outdated
for processing. | ||
""" | ||
try: | ||
assert self.path is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, cleaned it up.
scripts/crawl_n_mask.py
Outdated
SecretMask(os.path.join(root, f)).mask() | ||
|
||
|
||
def handle_error(e): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have some yaml files in logs which are not yaml. If we do sys.exit(-1), then the script will stop when it find such files. Unless those files are not fixed, it is better to return None instead of exiting the script.
scripts/tests/test_crawl_n_mask.py
Outdated
hence no masking is applied and we expect the original data | ||
content being the same as the processed one. | ||
""" | ||
if "nochange" in f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've borrowed the script and test cases from openstack-must-gather. The idea was to keep things simple. We can keep improving test cases and the script once we decide how we will run it over our logs dir.
7dc1081
to
0cbdb30
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9bef9ebb6e9f4b3db5fc8ccfabeedd9c ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 32m 48s |
recheck |
recheck |
0cbdb30
to
bc35d6b
Compare
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ca7cbf5355de47869f7119537fb86bde ❌ openstack-k8s-operators-content-provider FAILURE in 15m 28s |
recheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, but someone more involved in that issue should approve it.
|
||
params = module.params | ||
path = params["path"] | ||
isdir = params["isdir"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if someone puts a wrong value of isdir
, like passes a yaml file and sets isdir: true or passes a dir patch and sets isdir false by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
When a file is passed and
isdir
is set to true, thencrawl()
will be called with that file as parameter, and withincrawl()
,os.walk()
will fail, and will raise an exception throughhandle_walk_errors()
. -
If a dir is passed and
isdir
is set to false, thenmask()
will be called, and it will return when if condition does not match. Since there is no dedicated validation ofisdir
andpath
, it may be confusing.
bc35d6b
to
d49d36e
Compare
Maybe we use this module as we already have it to prevent immediate leaks and as a follow up patch look into gitleaks? |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9285693a67124896a8cdff2910df5281 ❌ openstack-k8s-operators-content-provider FAILURE in 12m 50s |
recheck |
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0ad00280813d4a0e80b0ba12bb834d4d ✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 29m 24s |
recheck |
This PR aims to fix https://issues.redhat.com/browse/OSPRH-14524
path
(type: path) andisdir
(type: bool, default: False).PR tested: Secrets in yaml files are getting masked through this PR.