Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial GitHub integration #422

Open
wants to merge 71 commits into
base: main
Choose a base branch
from
Open

Conversation

hwine
Copy link
Contributor

@hwine hwine commented Dec 3, 2020

Propose a squash merge on this -- or I can reorder & clean up some for a real review.

This branch is already running in jenkins staging without issues. All changes (except as noted) are in the github/ directory, so do not impact other code.

Changes outside of github/ are:

  • Makefile - re-introduction of python version cross check (can be reverted), and some refactoring of finding all python files (could be improved)
  • conftest.py - added in GitHub client, enhanced metadata handling as discussed
  • setup.py - fix for document generation error
  • tox.ini - added, but not used in Makefile yet, so can be reverted
  • cache.py, service_report_generator.py, aws/* (2 files), gcp/* (1 file) - impact of running pyupgrade in my tree, single line syntax change, can be reverted
  • adding bandit to pre-commit checks, with baseline for the one false positive it found
  • adding other hygiene checks to pre-commit which all pass
  • updated secrets baseline, as we no longer have false positives under docs/_build

hwine added 30 commits August 27, 2020 10:18
Needed for pytest to find local `conftest.py` file.
needs frost integration
now can run from any directory
The deleted filew were for testing in dev. Also a few changes for
python3.6 support.
Also added the github/orgs code
Fortunately, fat-fingering a token caused the error paths to be
executed, pointing to some scoping issues. Resolved.
In general works through getting minimal set of JSON to process.
About to move standard being tested into the parametrization id
This now does not break `make doctest`

Also cleaned up comments, added todo's and license
Python >3.6 doesn't work with dataclasses 0.7, which made travis runs
fail. dataclasses 0.6 doesn't fail.

Added tox.ini to easily test this.

N.B. `tox` is expected to be installed globally
Just picked this up in a rebase -- no idea why it's there.
This tests the `foo.retrieve_github_data` calls as a standalone script.
Goal is to have the script + GitHub data importable as a package.

First priority is to have working in frost, though.
Also sets stage to have cli generate csv files for upload to S3 and
possible input to frost.
Good enough output now to verify using frost from Jenkins
The condition of "no protection rules" was not being tested properly.
The hack wasn't required at this location, only in main(). But we're
still called during doctests with token=None. Handle that.
the v4 & v3 id's are immutable, names are not
And some minor code tweaks to pass black & dectect-secrets

Also fixed some potential shell issues
Work done by pyupgrade pre-commit plugin. Included:
- conversion to f-strings where possible
- removal of redundant syntax
- application of Exception hierarchy changes in py 3.3
  (IOError is now a subclass of OSError, and need not be listed
  separately)

This was done to get a baseline before updating to py38 syntax
Commit dcac1cd broke this by performing a dereference at an earlier
stage, and not removing the (now incorrect) later dereference.
It was having trouble with "nbhooks", but we don't need that
@hwine hwine requested a review from ajvb December 3, 2020 20:33
@hwine hwine self-assigned this Dec 3, 2020
Copy link
Contributor

@ajvb ajvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments from my first quick pass.

I really think that some tests should be added, as there is a lot of code here. As well, there are a lot of TODO comments. Maybe not all of them need to be addressed before merging, but if they don't need to be fixed now then maybe we can convert them into follow-up issues and remove the TODO comment?

Also, what exactly is the github/github_schema.py file?

Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
cache.py Show resolved Hide resolved
conftest.py Outdated Show resolved Hide resolved
conftest.py Show resolved Hide resolved
github/.gitignore Outdated Show resolved Hide resolved
@@ -0,0 +1,10 @@
exemptions:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be moved to foxsec-automation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but then we're saying this can't land until I finish the plumbing in foxsec-automation. A bit of chicken and egg. An option would be to make an issue to move this. ATM, it just has a "test exemption" -- it does have to be moved before any real data is entered.

Another discussion point -- the modern solution is a mono-repo, but we can't do that. :)

github/run.sh Outdated Show resolved Hide resolved
@hwine
Copy link
Contributor Author

hwine commented Dec 4, 2020

I really think that some tests should be added, as there is a lot of code here. As well, there are a lot of TODO

So, this the tension between "land early incomplete, but not buggy" and "wait until it's perfect". The longer it live outside of the production branch, the more cruft may show up. We can discuss.

Also, what exactly is the github/github_schema.py file?

Ah, yes -- that is an autogenerated file by a tool in the graphQL library. Anytime we want to use a newer feature of the GitHub GraphQL API, we need to regenerate this file. That clearly needs some docs, but we don't want to regenerate it at packaging time, as we need a full test suite run.

To clarify the status of the code - it runs, and produces correct output which is stored for metrics usage. It does not yet consume the frost results and take actions. Think of it as the MVP, with more sprints to come.

We can discuss how we handle all this at the frost meetup

@hwine
Copy link
Contributor Author

hwine commented Dec 4, 2020

Also, what exactly is the github/github_schema.py file?

That clearly needs some docs, ...

Updated github/README.md in next commit

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.
"""Perform actions based on frost results."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving to foxsec-automation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed for now

@@ -0,0 +1,209 @@
# YAML file for GitHub issue templates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving to foxsec-automation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Copy link
Contributor

@ajvb ajvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second pass through.

github/branches/test_branch_protection.py Outdated Show resolved Hide resolved
import pytest


def idfn(val: Any) -> Optional[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: Should this be alongside repos_to_check()? And called get_repo_id or similar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking had been "repos_to_check()" is a pure GitHub thing. "idfns=" is a pytest thing.

I think I can move it to the "BranchOfInterest" class, as I did for "Criteria", unless that feels "too different".

f"ERROR:SOGH001:{data.name_with_owner}:{branch} has no {criteria} rule"
)
return meets_criteria(active_rules, criteria), message
# vscode correctly tells me that all code below here is unreachable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why keep unreachable code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I had an inkling that this would need refactoring -- working on that on another branch


if not active_rules:
# results.append(
assert (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be returning rather than asserting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe -- part of the refactor, leaving open

return met


def validate_branch_protections(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to see a doctest for this func

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming soon in (another) branch near you. This is the part I may have to refactor heavily as I automate issue generation.

Leaving this open for now, I'll (hopefully) be able to pull in a solution.

## csv_out.writerow(OrgInfo.csv_header())
## for org in args.orgs:
## row_data = get_org_info(endpoint, org)
## csv_output(row_data, csv_writer=csv_out)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be removed or added behind a new cli opt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question -- probably the cli prompt (TIL).

The intent is to display a more human readable format when run from the cli

*in_files,
]
# python 3.6 doesn't support capture_output
status = subprocess.run(cmd, capture_output=True) # nosec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just read the json files and parse them within Python?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because 'jq' is more expressive, and easier to change? (<== my reasons during development)

I'd prefer to move to "jq bindings", and get rid of the "subprocess" ugliness, unless that still raises the same concerns.

@ajvb thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd definitely prefer to use the python std lib over requiring another dependency, especially an additional system package. As well, though it is more expressive, it requires any new developer to learn how to use jq without providing what seems to me to be enough value to justify that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - there seems to be a great debate on whether "reading jq" is required skill, and whether "jq" should be a standard tool in images. I vote "yes" on both, but ulfr switched ("yes" when I first started, then "meh" later)

I'll rewrite

github/orgs/conftest.py Outdated Show resolved Hide resolved
# real work
# DEV_HACK: find better way to insert dev default
path_to_metadata = os.environ.setdefault(
"PATH_TO_METADATA", "~/repos/foxsec/master/services/metadata"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably provide this path as an option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, I think an option to the groovy pipeline.

github/branches/retrieve_github_data.py Outdated Show resolved Hide resolved
hwine added 3 commits December 7, 2020 10:32
Mostly cleanup. Remaining TODO items will be done prior to production.
@ajvb ajvb added this to the 0.5.0 milestone Jan 6, 2021
Base automatically changed from master to main January 19, 2021 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants