-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TSPS-417 Support environment-specific pipeline input values #188
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is exactly what i was thinking as a solution for this so looks good to me. I did have a question about the logic not related to these changes (sorry!) but I could def be wrong about it.
if (keysToPrependWithStorageURL.contains(keyName)) { | ||
rawValue = storageWorkspaceStorageContainerUrl + allPipelineInputs.get(keyName).toString(); | ||
processedValue = storageWorkspaceStorageContainerUrl + rawValue; | ||
} else if (userProvidedInputFileKeys.contains(inputDefinition.getName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this isnt apart of what you added but I looked at the code a bit and i think if someone provided an input with a key that matched one of inputs not expected from the user, this would overwrite it. I think for the imputation pipeline the only viable input would be contigs
because the rest are either requried by the user or need to be prepended with teh storage URL (so they get caught in the above if block). But if someone passed in this value to the prepare API i think their contigs value would overwrite the default one here
"pipelineInputs": {
"outputBasename": "min_quota",
"multiSampleVcf": "palantir_merged_input_samples.liftedover.vcf.gz",
"contigs": "my value now!"
},
I think this is becuase nowhere do we exclude any extra inputs from the user provided inputs, we just make sure they have the required entries.
i think https://github.com/DataBiosphere/terra-scientific-pipelines-service/blob/main/service/src/main/java/bio/terra/pipelines/service/PipelinesService.java#L179 coudl be changed to not only log but also remove the extra inputs from the Map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does not - in pipelinesService.constructRawInputs (https://github.com/DataBiosphere/terra-scientific-pipelines-service/blob/main/service/src/main/java/bio/terra/pipelines/service/PipelinesService.java#L282), we do start with the user-provided inputs, but then we get all the service-provided inputs and add them, which will overwrite anything that the user provides. (we tested this and confirmed that a user can't override a service-provided input value.)
I personally don't mind that we store all the user inputs, even if they provide extra inputs (which they can't do via CLI but can do via API).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup yup makes sense. created https://broadworkbench.atlassian.net/browse/TSPS-427 to discuss if we want to deal with extra inputs in the service vs the cli (or some other option)
66e8df5
to
73911e1
Compare
if (!inputsWithCustomValues.containsKey(keyName) | ||
&& !allPipelineInputs.containsKey(keyName) | ||
&& !inputDefinition.isRequired()) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently we don't add default values for optional user-provided inputs to the pipeline inputs. (we don't have optional inputs in the imputation pipeline.) should we add those if they're present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea i think that makes sense, every optional user-provided input should have a default in our database and that should be used here.
if (!inputsWithCustomValues.containsKey(keyName) | ||
&& !allPipelineInputs.containsKey(keyName) | ||
&& !inputDefinition.isRequired()) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea i think that makes sense, every optional user-provided input should have a default in our database and that should be used here.
@@ -18,10 +18,10 @@ | |||
@SuppressWarnings("java:S107") // Disable "Methods should not have too many parameters" | |||
public class PipelineInputDefinition extends BasePipelineVariableDefinition { | |||
@Column(name = "is_required", nullable = false) | |||
private Boolean isRequired; | |||
private boolean isRequired; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the change to primitive boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the new method in PipelinesService, when using inputDefinition.isRequired()
when it was a Boolean
, Sonar wanted me to use that annoying Boolean.equals()
syntax to account for possible null values - but since both isRequired
and userProvided
are non-nullable, it seemed safe to convert to primitive boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
|
||
@Column(name = "user_provided", nullable = false) | ||
private Boolean userProvided; | ||
private boolean userProvided; | ||
|
||
@Column(name = "default_value") | ||
private String defaultValue; // must be a String representation of the value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsotobroad do you think we should remove the default value in the db for any custom inputs (i.e. refPanelPathPrefix)? either remove or put in "OVERRIDDEN BY CONFIG" as the default value or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would rather have the value it has now over "OVERRIDEN BY CONFIG" and i would rather have no value than the other options. If we do remove the value lets make sure we add it to the different environments (including bees/qa) before we merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah regardless we need to add to GSM before merge (and that's noted in the PR description, though not as a checklist item).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i lied, i thought it was in the PR description, but it's only in the terra-helmfile PR description. adding to checklist!
String processedValue; | ||
|
||
if (keysToPrependWithStorageWorkspaceContainerUrl.contains(keyName)) { | ||
// the rawValue for this field should start with a / so we don't need to add one here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't really check this anywhere, not sure if we should add more logic for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the /
could come from the raw value or the storage workspace container url (probably safer to have it in the storage container URL value since its just one place per environment and is easier to update than a database migration change. I probably wouldnt put a check here but maybe a comment on where we define storage workspace container url in terra helmfile? and both having a /
shouldnt produce an error AFAIK. In fact we could change it to processedValue = storageWorkspaceContainerUrl + "/" + rawValue;
just to cover all bases if we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does GCP not care if you have an extra slash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope you are totally right it does, i was thinking of http requests not stupid object storage paths. Hmm i kind of like having the /
exist in either the code or the storageWorkspaceContainerUrl (basically in one place) instead of all the values in the database where it'll be added to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a FileUtils method to take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments of the semantic nature 😂 . The logic all looks good and this is definitely easier to understand now.
|
||
/** | ||
* Format the pipeline inputs for a pipeline. Apply the following manipulations: - use custom | ||
* (environment-specific) values for certain service-provided inputs - prepend the storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get these manipulations into a list format, like one line per manipulation. makes it easier to read i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh yes this keeps getting reformatted poorly, will fix
String processedValue; | ||
|
||
if (keysToPrependWithStorageWorkspaceContainerUrl.contains(keyName)) { | ||
// the rawValue for this field should start with a / so we don't need to add one here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the /
could come from the raw value or the storage workspace container url (probably safer to have it in the storage container URL value since its just one place per environment and is easier to update than a database migration change. I probably wouldnt put a check here but maybe a comment on where we define storage workspace container url in terra helmfile? and both having a /
shouldnt produce an error AFAIK. In fact we could change it to processedValue = storageWorkspaceContainerUrl + "/" + rawValue;
just to cover all bases if we want.
service/src/main/java/bio/terra/pipelines/service/PipelineInputsOutputsService.java
Show resolved
Hide resolved
private String storageWorkspaceStorageUrl; | ||
private List<String> inputKeysToPrependWithStorageWorkspaceContainerUrl; | ||
private String storageWorkspaceContainerUrl; | ||
private Map<String, Object> inputsWithCustomValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to have this be Map<String, String>
for now since we thats all we have so far and then if we want to have like a list we can make the necessary changes and test properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also was thinking we may want to put some validation for these config values so we dont have to have a bunch of checks in the actual logic code. Particularly values that are coming from GSM/helmfile. It'll also give us better messages if we catch things here as opposed to the workflow fails cuz of some weird path thing. We can make a constructor similar to the one in LeonardoServerConfiguration and do some assertions there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this actually doesnt make sense after talkign about it, ignore me!
@@ -37,7 +33,7 @@ | |||
|
|||
class PrepareImputationInputsStepTest extends BaseEmbeddedDbTest { | |||
|
|||
@Autowired private PipelinesService pipelinesService; | |||
@Mock PipelineInputsOutputsService pipelineInputsOutputsService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to be mocked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer it to be mocked since we test the logic of the gathering and formatting of inputs in the service test; for testing the step we just want to make sure it calls the method and puts the output into the working map.
// arguments: user-provided inputs, all pipeline input definitions, expected populated | ||
// inputs | ||
arguments( // one required user input | ||
new HashMap<String, Object>(Map.of("inputName", "user provided value")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for these arguments does just Map.of()
not work? does it have to be wrapped in a HashMap<>()
? Same question about the List.of
arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah it does here. in the stairway step, they need to be HashMaps in order to be serialized/deserialized properly from the working map, but here just Map.of and List.of does work. I'll change that to simplify, thanks!
* optional user-provided inputs that weren't specified by the user with default values. Extra | ||
* user inputs that are not defined in the pipeline are not included. | ||
* | ||
* <p>Note that this method does not perform any validation on the inputs. We expect the inputs to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for adding these expectations. I wish we could make these functions private or something to signify you shouldnt probably be calling these willy nilly but i dont think we can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we can make them package-private and still have them testable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice i like it!
PipelineVariableTypesEnum pipelineInputType = inputDefinition.getType(); | ||
|
||
// use custom value if present, otherwise use the value from raw inputs (allRawInputs) | ||
String rawValue = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rawOrCustomValue
?
new ArrayList<>(List.of("inputName")), // prepend this key with storage workspace url | ||
new HashMap<String, Object>( | ||
Map.of("input_name", "gs://storage-workspace-bucket/value"))), | ||
arguments( // test casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be nice to have one test where there is more than one item in some of the parameters just to get a more
"realistic" test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do when we test the parent method, but I can add some in these ones too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a two-input-definition test case here and in the previous set
|
||
@Column(name = "user_provided", nullable = false) | ||
private Boolean userProvided; | ||
private boolean userProvided; | ||
|
||
@Column(name = "default_value") | ||
private String defaultValue; // must be a String representation of the value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would rather have the value it has now over "OVERRIDEN BY CONFIG" and i would rather have no value than the other options. If we do remove the value lets make sure we add it to the different environments (including bees/qa) before we merge.
* @param userProvidedPipelineInputs - the user-provided inputs | ||
* @return Map<String, Object> allPipelineInputs - the combined inputs | ||
*/ | ||
Map<String, Object> gatherRawInputs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apparently not specifying a visibility defaults to package-private: https://docs.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup yup makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple a questiosn around the config assertions and the path fixing method and the liquibase naming (you know i love to nit pick that :D). This looks really good though
this.storageWorkspaceContainerUrl = storageWorkspaceContainerUrl; | ||
|
||
for (Map.Entry<String, String> entry : inputsWithCustomValues.entrySet()) { | ||
if (entry.getValue() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should also check if they it is empty - can prob use .isBlank()
*/ | ||
public static String constructFilePath(String basePath, String fileName) { | ||
String pathDelimiter = "/"; | ||
if (basePath.endsWith(pathDelimiter) && fileName.startsWith(pathDelimiter)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we can use the stripTrailing and stripLeading functions in stringutils to just get rid of all leading/trailing /
then just add one ourselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh that's nice! thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those methods don't take parameters and only remove whitespace, but we can achieve the same thing using replace.
this.inputKeysToPrependWithStorageWorkspaceContainerUrl = | ||
inputKeysToPrependWithStorageWorkspaceContainerUrl; | ||
|
||
if (storageWorkspaceContainerUrl.endsWith("/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why we're checking this? are there other places this could mess with us that we havent accounted for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there shouldn't be other places where this matters, but now that we're constructing paths more smartly this shouldn't be necessary. happy to remove it.
* optional user-provided inputs that weren't specified by the user with default values. Extra | ||
* user inputs that are not defined in the pipeline are not included. | ||
* | ||
* <p>Note that this method does not perform any validation on the inputs. We expect the inputs to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice i like it!
* @param userProvidedPipelineInputs - the user-provided inputs | ||
* @return Map<String, Object> allPipelineInputs - the combined inputs | ||
*/ | ||
Map<String, Object> gatherRawInputs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup yup makes sense
tableName: pipeline_input_definitions | ||
columns: | ||
- column: | ||
name: is_custom_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to have this be expects_custom_value
or is_environment_specific
or maybe something else. I feel like i immediately misintepretted this as the value it has is a custom value (which doesnt make sense so maybe not). I dont feel super strongly about it though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i like expects_custom_value
! i shied away from is_environment_specific
in case there are future cases that warrant custom values that aren't necessarily env-specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome thanks for all of this!
Description
Here we add support for service-provided pipeline inputs that have environment-specific values (e.g. the reference panel path prefix for imputation). We achieve this via config values that pull from secrets defined in terra-helmfile.
The logic for formatting the pipeline inputs was largely refactored out of the
PrepareImputationInputsStep
into a method inPipelineInputsOutputsService
. Also refactored some other stuff inPipelinesService
toPipelineInputsOutputsService
.One functional change is that previously we were not adding in the default value of optional inputs, if the user did not provide a value. (Imputation doesn't have any optional inputs.) With this PR, we now add the default value of non-provided optional inputs to the formatted pipeline inputs.
terra-helmfile PR to define the IMPUTATION_REFERENCE_PANEL_PATH_PREFIX_CUSTOM_VALUE secret: https://github.com/broadinstitute/terra-helmfile/pull/6014
Logs when running locally showing the gathered and formatted inputs:
Jira Ticket
https://broadworkbench.atlassian.net/browse/TSPS-417
Checklist (provide links to changes)