Module crawl_n_mask to mask secrets in yaml files #2776

Open: wants to merge 4 commits into base: main

amartyasinha
Contributor

@amartyasinha amartyasinha commented Mar 5, 2025

This PR aims to fix https://issues.redhat.com/browse/OSPRH-14524

  • crawl_n_mask is a new module being introduced in CIFMW.
  • The module takes only two arguments: path (type: path) and isdir (type: bool, default: False).
  • The module can process a yaml file directly, or crawl a provided directory and find the yaml files within it.
  • Masking of the remaining log files is yet to be done, as part of a separate PR.
  • Instead of reinventing the wheel, I borrowed the mask script from openstack-must-gather and modified it according to our requirements.

PR tested: secrets in yaml files are masked by this PR.
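The sketch below illustrates the two modes described above. When isdir is false, the module handles a single yaml file. When isdir is true, it crawls the directory recursively. collect_yaml_files and YAML_EXTENSIONS are hypothetical names, not the module's actual code. The real file selection (for example the ALLOWED_YAML_FILES basename list) is not reproduced here.

```python
import os
from typing import List

# Assumption: which extensions count as yaml. The real module's selection
# logic (e.g. its ALLOWED_YAML_FILES list) is not reproduced here.
YAML_EXTENSIONS = (".yaml", ".yml")


def collect_yaml_files(path: str, isdir: bool = False) -> List[str]:
    """Return the yaml files to mask for the given path/isdir arguments."""
    if not isdir:
        # Single-file mode: use the path directly if it looks like yaml.
        return [path] if path.endswith(YAML_EXTENSIONS) else []

    # Directory mode: crawl recursively and keep only yaml files.
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.endswith(YAML_EXTENSIONS):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Each returned path would then go to the masking step, which is SecretMask(path).mask() in the PR.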

Contributor

openshift-ci bot commented Mar 5, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/51ec583fe1f8492783d3d4c49b169388

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 39m 20s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 16m 19s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 22m 50s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 25s
cifmw-pod-pre-commit FAILURE in 7m 53s
✔️ build-push-container-cifmw-client SUCCESS in 18m 06s

@amartyasinha amartyasinha changed the title WIP: Mask Secrets from logs WIP: Mask Secrets from yaml log files Mar 6, 2025
@amartyasinha amartyasinha force-pushed the mask_secrets branch 5 times, most recently from a751928 to 71fb47b Compare March 6, 2025 10:06
@evallesp

This comment was marked as outdated.

    or os.path.basename(self.path).split(".")[0] in ALLOWED_YAML_FILES
):
    self._mask_yaml()
# elif self.path.endswith("log"):

(non-blocking) Suggestion: I'd remove this and add it back whenever necessary. Up to you.

def __init__(self, path: Optional[Any] = None) -> None:
    self.path: Union[str, None] = path

def mask(self) -> bool:

(blocking) question: I'm unsure why we need to return a boolean here.
Could you add more context?
It might be good to add docstrings like: https://github.com/openstack-k8s-operators/ci-framework/blob/main/scripts/create_role_molecule.py#L24


class SecretMask:

    def __init__(self, path: Optional[Any] = None) -> None:

(blocking) suggestion: I'd suggest changing the path type to Optional[str] = None.
If it can be another type, then you should change L110 to something like Union[Any, None] = path

Contributor Author

Changes done.

# self._maskLogFile()
return True

def _mask_yaml(self) -> None:

(blocking) let's add some docstrings!

regexes = [gen_regex, con_regex]


class SecretMask:

(blocking) question: how is this going to be used? Is it something we're going to execute at the end over the whole list of generated yaml files?
If this script goes wrong, do we want to stop execution so nothing gets leaked?

I'm not sure, but returning None doesn't seem correct to me.

I guess we can hold on until a functionality expert for the project chimes in.


If the yaml files end up logged somehow, e.g. by being saved in a fact, it might be good to keep a filter plugin in mind: https://www.dasblinkenlichten.com/creating-ansible-filter-plugins/

Contributor Author

As of now, we are going to have a script handy. The initial idea was to run the script against the generated logs.
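The filter plugin idea above could apply the same masking to values saved in facts. The sketch below is a hypothetical mask_secrets filter built on a single stand-in regex. Only the FilterModule/filters() convention is Ansible's; everything else is assumed for illustration.

```python
import re

# Hypothetical mask string and pattern: group 1 keeps the key, the value is hidden.
MASK_STR = "**********"
_SECRET_RE = re.compile(r"((?:password|secret|token)\s*[:=]\s*)\S+", re.I)


def mask_secrets(value):
    """Filter body: mask secret-looking values inside a string."""
    if not isinstance(value, str):
        return value  # leave non-string facts untouched
    return _SECRET_RE.sub(r"\1" + MASK_STR, value)


class FilterModule(object):
    """Ansible discovers filters through FilterModule.filters()."""

    def filters(self):
        return {"mask_secrets": mask_secrets}
```

You could save this as a file under a filter_plugins/ directory next to the playbook. It would then be used as {{ some_fact | mask_secrets }}.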

regexes and mask any potential sensitive info.
"""
for pattern in regexes:
    value = re.sub(pattern, r"\1{}".format(MASK_STR), value, flags=re.I)

(blocking) suggestion: I see we're going to loop over all the regexes even if the first one already matched.
What about comparing the returned value with the original, and stopping the for loop once they differ?

Contributor Author

A large string might contain multiple secrets. That's why it checks all the regex patterns in a for loop.
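Here is a short demonstration of the reply above. It uses two hypothetical patterns in the style of the PR's regexes list, where group 1 is kept and the rest of the match is masked. The test line contains two different kinds of secret. Stopping after the first pattern that changes the string would leave the second secret in place.

```python
import re

MASK_STR = "**********"
# Two hypothetical patterns; group 1 is kept, the rest of the match is masked.
gen_regex = r"(password\s*[:=]\s*)\S+"
con_regex = r"(://[^:/\s]+:)[^@\s]+"  # the password in user:password@host URLs
regexes = [gen_regex, con_regex]


def mask_all(value: str) -> str:
    """Apply every pattern, as the PR does."""
    for pattern in regexes:
        value = re.sub(pattern, r"\1{}".format(MASK_STR), value, flags=re.I)
    return value


def mask_until_first_change(value: str) -> str:
    """Early-exit variant: stop as soon as one pattern changed the string."""
    for pattern in regexes:
        masked = re.sub(pattern, r"\1{}".format(MASK_STR), value, flags=re.I)
        if masked != value:
            return masked
    return value
```

Try both on "password: hunter2 connection=mysql://nova:s3cr3t@10.0.0.1/nova". mask_all hides both secrets. mask_until_first_change leaves s3cr3t visible.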

elif isinstance(v, list):
    for i, item in enumerate(v):
        if isinstance(item, dict):
            self._applyMask(item)

praise: Great!
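The recursive walk shown in the hunk above can be sketched standalone as follows. This version decides what to mask by key name, using a hypothetical SENSITIVE_KEYS list. That choice is an assumption made for brevity; the PR's code appears to apply its regexes to string values instead. Unlike the snippet above, this sketch also descends into lists nested inside lists.

```python
from typing import Any

MASK_STR = "**********"
# Hypothetical key fragments that mark a value as sensitive.
SENSITIVE_KEYS = ("password", "secret", "token")


def apply_mask(node: Any) -> Any:
    """Recursively walk parsed YAML (dicts and lists), masking in place."""
    if isinstance(node, dict):
        for key, value in node.items():
            sensitive = isinstance(key, str) and any(
                marker in key.lower() for marker in SENSITIVE_KEYS
            )
            if sensitive and isinstance(value, str):
                # Replacing the value of an existing key is safe while iterating.
                node[key] = MASK_STR
            else:
                apply_mask(value)
    elif isinstance(node, list):
        for item in node:
            apply_mask(item)  # handles dicts and nested lists alike
    return node
```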

"""
try:
    assert self.path is not None
    original_content = self._readYaml()

(non-blocking) suggestion: WDYT about deep-copying the result of the first _readYaml call into self.original_content, instead of calling _readYaml twice?
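Here is a minimal sketch of the suggestion above. It assumes the parsed yaml is already in memory from a single read. copy.deepcopy produces an independent copy that can be masked in place, so the original is still available for comparison afterwards. mask_once is a hypothetical helper. A shallow copy.copy would not be enough, because nested dicts and lists would be shared, and masking them would also change the original.

```python
import copy
from typing import Any, Callable, Tuple


def mask_once(original: Any, mask_in_place: Callable[[Any], None]) -> Tuple[Any, bool]:
    """Mask a deep copy of already-parsed YAML instead of re-reading the file.

    Returns the masked copy and whether anything changed.
    """
    masked = copy.deepcopy(original)  # nested containers are not shared
    mask_in_place(masked)
    return masked, masked != original
```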

crawl(OPTS.dir)

if OPTS.path is not None and os.path.exists(OPTS.path):
    SecretMask(OPTS.path).mask()

(non-blocking) question: Don't we need to check if this file is not in excluded_file_ext_regex ?

Contributor Author

Yes, that's something I missed. I'll add it.
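Here is a sketch of the missing check for the single-path branch. The real excluded_file_ext_regex is not shown in this thread. The pattern below is a hypothetical stand-in, assumed to be matched against the end of the file name.

```python
import os
import re
from typing import Optional

# Hypothetical stand-in: the actual excluded_file_ext_regex is not shown here.
excluded_file_ext_regex = re.compile(r"\.(gz|tar|tgz|zip|png|jpg)$", re.I)


def should_mask(path: Optional[str]) -> bool:
    """Apply to a single path the same exclusion the directory crawl uses."""
    return (
        path is not None
        and os.path.exists(path)
        and excluded_file_ext_regex.search(path) is None
    )
```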

@amartyasinha amartyasinha force-pushed the mask_secrets branch 3 times, most recently from cabf4d0 to 8b1f9fd Compare March 11, 2025 10:48
Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/000a7dff92cc45f4b7f318e26d541343

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 24m 24s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 08m 33s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 11m 46s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 7m 50s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 31s
✔️ build-push-container-cifmw-client SUCCESS in 17m 33s

@evallesp evallesp left a comment

I'm thinking about moving this to an Ansible module, like the krb_request module. Check the file's commit history: https://github.com/openstack-k8s-operators/ci-framework/commits/main/plugins/modules/krb_request.py

"""
for root, _, files in os.walk(SAMPLE_DIR):
    for f in files:
        print("Processing file %s" % f)

(blocking) suggestion: I'd rather remove this print. This would make unnecessary noise imho.

        return list(yaml.safe_load_all(f))
except (FileNotFoundError, yaml.YAMLError) as e:
    print(f"Error while reading YAML: {e}")
    # sys.exit(-1)

(blocking) suggestion: I'd remove this leftover.

SecretMask.
"""

def _read_yaml_sample(self, path) -> Optional[Union[list, None]]:

(blocking) suggestion: Here it returns None, but the caller doesn't check whether it got a list or None.
I'd suggest always returning a list, and if something goes wrong, handling it through the raised exception.
In any case, this is a test case, so I wouldn't expect it to check whether the file is there: if it isn't, the test fails anyway.

Could you focus on checking the part you're trying to cover? You don't need to test the test itself.

So here I'd just return the list object, without any try/except clause or the possibility of returning None.

SecretMask(os.path.join(root, f)).mask()


def handle_error(e):

(blocking) suggestion: We need to handle errors better somehow.

My suggestion is to groom all the try/except blocks and stop returning None when something goes wrong. Instead, directly in the except block, after logging the problem (or logging it here), call sys.exit(1), so the Ansible caller can check the return code.

Also, I'm wondering if it'd be good to move this to an Ansible module, so you install it and use it directly instead of ansible.command: /bin/bash crawl_n_mask.py

This module would receive a path and/or a folder.

Contributor Author

We have some files in the logs with a yaml extension that are not valid yaml. If we call sys.exit(-1), the script will stop as soon as it finds such a file. Until those files are fixed, it is better to return None than to exit the script.
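One possible way to reconcile both positions is sketched below, under assumptions. Each file that cannot be masked is logged, the rest are still processed, and a non-zero exit code is returned at the end so the Ansible caller can detect the problem. run and the logger name are hypothetical.

```python
import logging
from typing import Callable, Iterable

LOG = logging.getLogger("crawl_n_mask")  # hypothetical logger name


def run(paths: Iterable[str], mask_file: Callable[[str], object]) -> int:
    """Mask every path and return a process exit code instead of None.

    A file that cannot be processed is logged and skipped, so one bad file
    does not stop the others from being masked, but the result is still 1.
    """
    failures = 0
    for path in paths:
        try:
            mask_file(path)
        except Exception as exc:  # e.g. a parse error from a non-yaml file
            LOG.error("could not mask %s: %s", path, exc)
            failures += 1
    return 1 if failures else 0
```

A script entry point would end with sys.exit(run(paths, mask_file)). In the Ansible-module form, the same outcome could be reported through module.fail_json() or module.exit_json(). A caller that wants to tolerate known non-yaml files could still relax the task with failed_when.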

hence no masking is applied and we expect the original data
content being the same as the processed one.
"""
if "nochange" in f:

I'd create two tests here: one where there's something to mask and another where there's nothing to mask.
Also, I'm thinking about testing each part of the script in a bit more detail.

But first of all, let's decide whether we want to create an Ansible module instead of a script.

Contributor Author

I've borrowed the script and test cases from openstack-must-gather. The idea was to keep things simple. We can keep improving test cases and the script once we decide how we will run it over our logs dir.
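The two-test layout suggested above could look like this with the standard unittest module. mask_value and its single pattern are hypothetical stand-ins for the script's masking code. One test has a secret to mask, and the other checks that content without secrets comes back unchanged.

```python
import re
import unittest

MASK_STR = "**********"
# Hypothetical stand-in for the script's regexes: group 1 is kept.
PATTERN = r"(password\s*[:=]\s*)\S+"


def mask_value(value: str) -> str:
    return re.sub(PATTERN, r"\1{}".format(MASK_STR), value, flags=re.I)


class TestMaskValue(unittest.TestCase):
    def test_secret_is_masked(self):
        self.assertEqual(mask_value("password: hunter2"), "password: " + MASK_STR)

    def test_nothing_to_mask_is_unchanged(self):
        text = "hostname: compute-0\nport: 5000\n"
        self.assertEqual(mask_value(text), text)
```

These tests would run with python -m unittest like any other unittest module.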

@amartyasinha amartyasinha changed the title WIP: Mask Secrets from yaml log files Mask Secrets from yaml log files Mar 17, 2025
@amartyasinha amartyasinha marked this pull request as ready for review March 17, 2025 05:39
@amartyasinha amartyasinha requested a review from a team as a code owner March 17, 2025 05:39
Contributor Author

@amartyasinha amartyasinha left a comment

Addressed the review from @evallesp.

for processing.
"""
try:
    assert self.path is not None
Contributor Author

yes, cleaned it up.

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9bef9ebb6e9f4b3db5fc8ccfabeedd9c

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 32m 48s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 16m 41s
cifmw-crc-podified-edpm-baremetal FAILURE in 26m 06s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 09s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 19s
✔️ build-push-container-cifmw-client SUCCESS in 16m 55s
✔️ cifmw-molecule-artifacts SUCCESS in 4m 20s

@amartyasinha
Contributor Author

recheck

@amartyasinha
Contributor Author

recheck

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ca7cbf5355de47869f7119537fb86bde

openstack-k8s-operators-content-provider FAILURE in 15m 28s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 10s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 12s
✔️ build-push-container-cifmw-client SUCCESS in 18m 09s
✔️ cifmw-molecule-artifacts SUCCESS in 4m 29s

@amartyasinha
Contributor Author

recheck

danpawlik
danpawlik previously approved these changes Apr 2, 2025
Contributor

@danpawlik danpawlik left a comment

looks good, but someone more involved in that issue should approve it.


params = module.params
path = params["path"]
isdir = params["isdir"]
Collaborator

What happens if someone passes a wrong isdir value by mistake, e.g. a yaml file with isdir: true, or a dir path with isdir: false?

Contributor Author

@amartyasinha amartyasinha Apr 2, 2025

  • When a file is passed and isdir is set to true, crawl() is called with that file as its parameter. Within crawl(), os.walk() fails, and the error is raised as an exception through handle_walk_errors().

  • When a dir is passed and isdir is set to false, mask() is called and simply returns when the if condition does not match. Since there is no dedicated validation of isdir against path, this may be confusing.
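An explicit check would remove the confusing second case. Here is a minimal sketch, with check_args as a hypothetical helper. It returns an error message for an inconsistent path/isdir pair, and in the module that message could be passed to module.fail_json(msg=...).

```python
import os
from typing import Optional


def check_args(path: str, isdir: bool) -> Optional[str]:
    """Return an error message if path and isdir disagree, else None."""
    if not os.path.exists(path):
        return "path does not exist: %s" % path
    if isdir and not os.path.isdir(path):
        return "isdir is true but path is not a directory: %s" % path
    if not isdir and os.path.isdir(path):
        return "isdir is false but path is a directory: %s" % path
    return None
```

An alternative design would drop isdir altogether and derive the mode from os.path.isdir(path).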

@frenzyfriday
Collaborator

Hey @amartyasinha, just wondering whether you considered using a tool like https://github.com/gitleaks/gitleaks
It's already in use in Prow, implemented by the openshift-ci team. Maybe you know a reason we can't use this project, e.g. it won't detect the types of secrets we have?

@lewisdenny Thanks for suggesting gitleaks. I just checked it out and it's amazing. I scanned a log directory and it was able to find most of the secrets. We can modify its config to add our own secret detection rules (to cover the remaining secrets).
We have two options now:

  • First is to go with the basic Ansible module which will only scan yaml files.
  • Another option is to look into gitleaks, create a config according to our requirement, and add a way to mask the secrets (that's something gitleaks doesn't provide).

What's your opinion on that?

My recommendation when it comes to security is to not roll your own. If gitleaks serves our purpose, it may be the safer route, but let's hear from other maintainers as well.

Maybe we use this module as it is to prevent immediate leaks, and look into gitleaks in a follow-up patch?
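For the second option, gitleaks reports findings but does not rewrite files, so the masking step would have to be added. Here is a rough sketch of that step. It assumes a JSON report whose findings carry File and Secret keys; check this against the gitleaks version in use. mask_from_report is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, Set

MASK_STR = "**********"  # hypothetical placeholder text


def mask_from_report(report_path: str) -> int:
    """Mask, in place, every secret listed in a gitleaks-style JSON report.

    Returns the number of files that were rewritten.
    """
    with open(report_path) as f:
        findings = json.load(f)

    # Group secrets per file so each file is read and written only once.
    secrets_by_file: Dict[str, Set[str]] = defaultdict(set)
    for finding in findings:
        # Assumption: each finding has "File" and "Secret" keys.
        if finding.get("Secret"):
            secrets_by_file[finding["File"]].add(finding["Secret"])

    for path, secrets in secrets_by_file.items():
        with open(path) as f:
            content = f.read()
        # Longest first, so a secret containing a shorter one is masked whole.
        for secret in sorted(secrets, key=len, reverse=True):
            content = content.replace(secret, MASK_STR)
        with open(path, "w") as f:
            f.write(content)
    return len(secrets_by_file)
```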

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9285693a67124896a8cdff2910df5281

openstack-k8s-operators-content-provider FAILURE in 12m 50s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 26s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 12s
✔️ build-push-container-cifmw-client SUCCESS in 17m 39s
✔️ cifmw-molecule-artifacts SUCCESS in 4m 45s

@amartyasinha
Contributor Author

recheck

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0ad00280813d4a0e80b0ba12bb834d4d

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 29m 24s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 12m 47s
cifmw-crc-podified-edpm-baremetal FAILURE in 58m 08s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 49s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 13s
✔️ build-push-container-cifmw-client SUCCESS in 21m 29s
✔️ cifmw-molecule-artifacts SUCCESS in 5m 01s

@amartyasinha
Contributor Author

recheck

5 participants