Operator refactor - validation and import cleanup #1356

Merged
merged 5 commits into from
Aug 13, 2020

Conversation

pweil-
Copy link

@pweil- pweil- commented Aug 3, 2020

Round 1 - go!

Let's see what clean up we can do here @timflannagan1.

This has the following commits that can be reviewed independently

  1. move the type declarations to their own file and try to comment them up for future readers
  2. since the reporting operator was not exporting anything, introduce an interface and unexport the type. Consumers deal with the interface now
  3. move all the smaller validation blurbs from the various sections that are called during New, newOperator, and Run methods and consolidate on a single validation call to ensure the config is good during New. No good config == no operator to run. Bonus: Add tests for all validations
  4. rename cb* imports to remove references to chargeback
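Commits 2 and 3 can be pictured with a minimal sketch: an exported interface, an unexported concrete type, and a constructor that refuses to build anything from a bad config. The names `Config`, `validateConfig`, `joinErrors`, and the two fields are hypothetical stand-ins, not the types from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Config is a hypothetical, trimmed-down operator configuration.
type Config struct {
	OwnNamespace string
	APIListen    string
}

// Operator is the only thing consumers see; the concrete type stays unexported.
type Operator interface {
	Run(stop <-chan struct{}) error
}

// operator is the unexported implementation behind the interface.
type operator struct {
	cfg Config
}

// joinErrors folds a slice of errors into one, or returns nil when it is empty.
func joinErrors(errs []error) error {
	if len(errs) == 0 {
		return nil
	}
	msgs := make([]string, 0, len(errs))
	for _, e := range errs {
		msgs = append(msgs, e.Error())
	}
	return errors.New(strings.Join(msgs, "; "))
}

// validateConfig collects every problem instead of stopping at the first one.
func validateConfig(cfg *Config) error {
	var errs []error
	if cfg.OwnNamespace == "" {
		errs = append(errs, errors.New("ownNamespace must be set"))
	}
	if cfg.APIListen == "" {
		errs = append(errs, errors.New("apiListen must be set"))
	}
	return joinErrors(errs)
}

// New validates first: no good config, no operator to run.
func New(cfg Config) (Operator, error) {
	if err := validateConfig(&cfg); err != nil {
		return nil, fmt.Errorf("invalid config: %v", err)
	}
	return &operator{cfg: cfg}, nil
}

// Run blocks until the stop channel is closed.
func (o *operator) Run(stop <-chan struct{}) error {
	<-stop
	return nil
}

func main() {
	if _, err := New(Config{}); err != nil {
		fmt.Println(err)
	}
}
```

Because callers only hold the interface, the struct's fields can change without touching any consumer.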

There are a couple questions sprinkled in where things weren't obvious.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 3, 2020
@pweil-
Copy link
Author

pweil- commented Aug 3, 2020

/assign @timflannagan1

log "github.com/sirupsen/logrus"
"github.com/taozle/go-hive-driver"
"github.com/kube-reporting/metering-operator/pkg/db"
cbClientset "github.com/kube-reporting/metering-operator/pkg/generated/clientset/versioned"
Copy link
Contributor

If we're tackling this kind of work now, I would expect to also rename this import and all of its occurrences too, to something like meteringClientset or anything that doesn't involve the old "chargeback" name.

Copy link
Author

If we're tackling this kind of work now

We don't have to merge this now. I think it should be a no-op, so it shouldn't be too risky, but bugs come first and we can merge these when master opens if that's the right thing to do.

I would expect to also rename this import and all of its occurrences too, to something like meteringClientset or anything that doesn't involve the old "chargeback" name.

Easy enough, I'll add a new commit for this refactor

Copy link
Contributor

@timflannagan timflannagan left a comment

I can take a better look tomorrow when my brain is probably fresher, but I think this is awesome so far. A lot of this part of the codebase has been somewhat neglected for a while now and could use some general maintenance work.

@timflannagan
Copy link
Contributor

Something else that I would like to tackle, but is probably out of the scope of this kind of work, is improving (or adding) comments on types/methods/functions in the various reporting-operator components, e.g. what New() ... or op.Run does, so that a person not familiar with controllers can see what we're trying to achieve with those functions.

Comment on lines +107 to 110
	errs := IsValidConfig(&cfg)
	if errs != nil {
		return nil, errs
	}
Copy link
Contributor

Building off my comment from above, I wonder if we should be doing all of the reporting configuration validation in the cmd/reporting-operator driver function that provides the entry point for this operator, rather than in the constructor.

Copy link
Author

It can be called from the cmd package but it should not be done there. Commands in Kube/OpenShift follow a pattern of Complete, Validate, Run. We should refactor ours to match and call the IsValidConfig method in the command's Validate method. However, we should keep it in New as well, to protect against programmatic access to the operator no matter how it's started in the future.
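A minimal sketch of the Complete, Validate, Run shape, using only the standard library. The `startOptions` type, its fields, the `execute` helper, and the `:8080` default are hypothetical and chosen for illustration; they are not taken from this repository.

```go
package main

import (
	"errors"
	"fmt"
)

// startOptions is a hypothetical options struct for a "start" command.
type startOptions struct {
	Namespace string
	APIListen string
}

// Complete fills in anything the user left unset.
func (o *startOptions) Complete() error {
	if o.APIListen == "" {
		o.APIListen = ":8080" // assumed default, for illustration only
	}
	return nil
}

// Validate rejects options that cannot work before anything is started.
func (o *startOptions) Validate() error {
	if o.Namespace == "" {
		return errors.New("namespace must be set")
	}
	return nil
}

// Run does the actual work; here it only reports what it would start.
func (o *startOptions) Run() error {
	fmt.Printf("starting in %q, listening on %q\n", o.Namespace, o.APIListen)
	return nil
}

// execute runs the three phases in order and stops at the first failure.
func execute(o *startOptions) error {
	if err := o.Complete(); err != nil {
		return err
	}
	if err := o.Validate(); err != nil {
		return err
	}
	return o.Run()
}

func main() {
	if err := execute(&startOptions{Namespace: "metering"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

In this shape the command's Validate would delegate to the shared config check, while the constructor keeps its own call for callers that bypass the command.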

@pweil-
Copy link
Author

pweil- commented Aug 4, 2020

improving (or adding) comments to types/methods/functions to the various reporting-operator components,

this is 100% a goal here. Anything we touch in a refactor should get comments IMO, hence the API comments.

@pweil-
Copy link
Author

pweil- commented Aug 4, 2020

Some other observations for the discussion (perhaps this PR, perhaps future)

  1. figure out why we have so many threads in a single operator pod and if we can reduce and maintain performance (possibly removing the need for any mutex guarded code depending on the answer)

  2. reduce the configuration complexity where possible. Example is the namespace configuration where we require AllNamespaces to be set with TargetNamespaces when we could just key on TargetNamespaces and do the right thing. Same for the TLS configs with a boolean that turns it off or on when we could just have the existence of the certs enable it.

  3. move items that are ancillary to the operator into their own packages
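One way item 2 could look, as a sketch only: derive the behavior from the values that are present instead of pairing them with a boolean. The types and field names are hypothetical, and treating an empty namespace list as "all namespaces" is an assumption about what "the right thing" would be.

```go
package main

import "fmt"

// namespaceConfig is a hypothetical replacement for an AllNamespaces flag
// paired with a TargetNamespaces list.
type namespaceConfig struct {
	TargetNamespaces []string
}

// watchesAllNamespaces assumes that an empty list means "watch everything".
func (c namespaceConfig) watchesAllNamespaces() bool {
	return len(c.TargetNamespaces) == 0
}

// tlsConfig is a hypothetical TLS section without an on/off boolean.
type tlsConfig struct {
	CertFile string
	KeyFile  string
}

// enabled derives the TLS state from the presence of both files.
func (c tlsConfig) enabled() bool {
	return c.CertFile != "" && c.KeyFile != ""
}

// validate rejects a half-specified pair, which would otherwise
// silently leave TLS turned off.
func (c tlsConfig) validate() error {
	if (c.CertFile == "") != (c.KeyFile == "") {
		return fmt.Errorf("certFile and keyFile must be set together")
	}
	return nil
}

func main() {
	ns := namespaceConfig{TargetNamespaces: []string{"metering"}}
	tls := tlsConfig{CertFile: "tls.crt"}
	fmt.Println(ns.watchesAllNamespaces(), tls.enabled(), tls.validate())
}
```

The trade-off is that a missing key file now has to be caught by validation, since there is no explicit flag stating the user's intent.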

pkg/operator/operator.go Outdated Show resolved Hide resolved
@pweil- pweil- mentioned this pull request Aug 4, 2020
16 tasks
@pweil-
Copy link
Author

pweil- commented Aug 4, 2020

time="08-04-2020 13:35:39" level=debug msg="Created the mysql namespace" context=metering-validhdfs-mysqldatabase
error: multiple images or templates matched "mysql:5.7"

The argument "mysql:5.7" could apply to the following Docker images, OpenShift image streams, or templates:

* Image stream "mysql" (tag "8.0") in project "openshift"
  Use --image-stream="openshift/mysql:8.0" to specify this image or template

* Image stream "mysql" (tag "latest") in project "openshift"
  Use --image-stream="openshift/mysql:latest" to specify this image or template

@timflannagan
Copy link
Contributor

time="08-04-2020 13:35:39" level=debug msg="Created the mysql namespace" context=metering-validhdfs-mysqldatabase
error: multiple images or templates matched "mysql:5.7"

The argument "mysql:5.7" could apply to the following Docker images, OpenShift image streams, or templates:

* Image stream "mysql" (tag "8.0") in project "openshift"
  Use --image-stream="openshift/mysql:8.0" to specify this image or template

* Image stream "mysql" (tag "latest") in project "openshift"
  Use --image-stream="openshift/mysql:latest" to specify this image or template

#1353

@pweil-
Copy link
Author

pweil- commented Aug 4, 2020

/retest

2 similar comments
@pweil-
Copy link
Author

pweil- commented Aug 5, 2020

/retest

@pweil-
Copy link
Author

pweil- commented Aug 6, 2020

/retest

@pweil- pweil- changed the title WIP: Operator refactor Operator refactor - validation and import cleanup Aug 7, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 7, 2020
@pweil-
Copy link
Author

pweil- commented Aug 7, 2020

Removing WIP from this. It's probably big enough that we shouldn't add anything else to it yet if we want to merge. I am going to call out a few pieces (in review comments) that I think we need to be aware of before we merge, though.

@pweil-
Copy link
Author

pweil- commented Aug 7, 2020

/retest

// TODO this requires a port to be specified, is that one of our requirements?
func isValidHostPort(hp string, name string) error {
	if len(hp) > 0 {
		if _, _, err := net.SplitHostPort(hp); err != nil {
Copy link
Author

this will fail if a port is not given - can someone confirm that that is a correct behavior?

Copy link
Member

@bentito bentito Aug 10, 2020

This gets called on api, metrics, pprof, presto, hive

I think it's valid to make the admin config the port for our API.
Metrics = Prom, has a default, might be ok to not spec.
pprof seems like it has a default port as well, so maybe okay to not spec.
Presto has a default port, might be ok to not spec.
Hive has a default thrift service port as well, so it might not need to be specified either.

Copy link
Author

Ok, I will remove the calls/tests for apis that have default ports.
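The behavior in question is easy to reproduce with the standard library: `net.SplitHostPort` returns an error for a bare host name. The sketch below mirrors the shape of the check under review; `checkHostPort` and the sample addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// checkHostPort skips empty values and requires everything else to
// parse as host:port.
func checkHostPort(hp, name string) error {
	if hp == "" {
		return nil
	}
	if _, _, err := net.SplitHostPort(hp); err != nil {
		return fmt.Errorf("invalid %s: %v", name, err)
	}
	return nil
}

func main() {
	// A bare host name is rejected; the others pass.
	for _, hp := range []string{"presto:8080", "presto", "[::1]:9000", ""} {
		fmt.Printf("%q -> %v\n", hp, checkHostPort(hp, "prestoHost"))
	}
}
```

This is why the check only makes sense for settings where the admin is expected to choose the port.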

	// PrometheusDataSourceMaxBackfillImportDuration overrides PrometheusDataSourceGlobalImportFromTime
	// don't set both.
	if cfg.PrometheusDataSourceGlobalImportFromTime != nil && cfg.PrometheusDataSourceMaxBackfillImportDuration > 0 {
		errs = append(errs, fmt.Errorf("prometheusDataSourceGlobalImportFromTime and prometheusDataSourceMaxBackfillImportDuration cannot both be set"))
Copy link
Author

previously we stated that one overrides the other. Introducing this will cause explicit failure. Acceptable?

Copy link
Contributor

I'm not sure where I land on this validation check. I think it sounds reasonable looking at the codebase and making sure we reject configurations that won't be respected. @bentito any opinion on this validation check?

Copy link
Author

@bentito - thoughts on this one?

Copy link
Member

Reading the usage in start.go, it seems like this validates the stated usage. If it's Prom import question, they're both just ways of getting at how far back to try to grab Prom data when metering somehow gets out of sync. I think PrometheusDataSourceGlobalImportFromTime has more potential to be pretty awful for cluster workload, but I don't think that's being asked here.

Copy link
Author

just validation of input. Sounds like I can leave it. Thanks


errs = append(errs, isValidHostPort(cfg.HiveHost, "hiveHost"))

if !cfg.HiveUseTLS {
Copy link
Author

previously we allowed config to be set even if this was set to false. Should we continue?

Copy link
Contributor

Same as my comment below. I think this is fine UX, reject any configurations like this if you're disabling TLS on the reporting-operator side for that component.


errs = append(errs, isValidHostPort(cfg.PrestoHost, "prestoHost"))

if !cfg.PrestoUseTLS {
Copy link
Author

previously we allowed config to be set even if this was set to false. Should we continue?

Copy link
Contributor

I would imagine that makes sense, i.e. rejecting a configuration where you explicitly set TLS-related flags yet disable TLS entirely for the reporting-operator -> presto communications.
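The rule agreed on for both the Hive and Presto checks, rejecting TLS-related settings when TLS itself is disabled, might look roughly like this. The `clientTLS` type and its field names are hypothetical and do not match the real config fields.

```go
package main

import "fmt"

// clientTLS is a hypothetical view of one component's TLS settings.
type clientTLS struct {
	UseTLS     bool
	CABundle   string
	ClientCert string
	ClientKey  string
}

// validateClientTLS rejects TLS options that would be ignored because
// TLS is switched off for that component.
func validateClientTLS(c clientTLS, name string) error {
	if c.UseTLS {
		return nil
	}
	if c.CABundle != "" || c.ClientCert != "" || c.ClientKey != "" {
		return fmt.Errorf("%s: TLS options are set but TLS is disabled", name)
	}
	return nil
}

func main() {
	fmt.Println(validateClientTLS(clientTLS{CABundle: "ca.crt"}, "presto"))
	fmt.Println(validateClientTLS(clientTLS{UseTLS: true, CABundle: "ca.crt"}, "hive"))
}
```

Failing early here surfaces a likely misconfiguration instead of starting with unencrypted connections the admin did not intend.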

@pweil-
Copy link
Author

pweil- commented Aug 10, 2020

/retest

@pweil-
Copy link
Author

pweil- commented Aug 11, 2020

@bentito @timflannagan1 updates from the review in the last commit. Thanks.

@pweil-
Copy link
Author

pweil- commented Aug 11, 2020

/retest

5 similar comments
@pweil-
Copy link
Author

pweil- commented Aug 11, 2020

/retest

@pweil-
Copy link
Author

pweil- commented Aug 11, 2020

/retest

@pweil-
Copy link
Author

pweil- commented Aug 11, 2020

/retest

@pweil-
Copy link
Author

pweil- commented Aug 11, 2020

/retest

@timflannagan
Copy link
Contributor

/retest

@timflannagan
Copy link
Contributor

@pweil- @bentito where are we at with this? There's nothing in here that I think we should hold off on until 4.7 master opens up, so merging this sooner rather than later is probably best.

@pweil-
Copy link
Author

pweil- commented Aug 12, 2020

@timflannagan1 all review comments from @bentito have been addressed (kept the prometheusDataSourceGlobalImportFromTime/prometheusDataSourceMaxBackfillImportDuration check and removed the non-API host/port validations and tests).

Unless there is anything else I think this can merge.

@timflannagan
Copy link
Contributor

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 12, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pweil-, timflannagan1

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 12, 2020
@pweil-
Copy link
Author

pweil- commented Aug 13, 2020

/retest

@openshift-merge-robot openshift-merge-robot merged commit 9b659da into kube-reporting:master Aug 13, 2020
@timflannagan
Copy link
Contributor

🎉

@pweil- pweil- deleted the operator-refactor branch August 13, 2020 15:11
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
5 participants