SecretService and Provider, filter out secret environment variables#7671
Conversation
environment variable implementation startup ordering
|
WARNING: This PR appears to have the default title generated by GitHub. Please use something more descriptive. |
There was a problem hiding this comment.
Below is the Gemini provided feedback. I am positive that you'll find some of it useful (see #2 and expanded block list below) at least.
Thanks,
-Gokhan
It is great to see a proactive approach to preventing secret leakage, especially when dealing with extensibility features that execute customer-provided code. Untrusted scripts (R/Python) are notorious for being "too helpful" and dumping their entire environment to logs or error outputs.
However, relying on a blocklist (deny-list) approach—searching for keywords like "password" or "secret"—is a bit like playing a game of whack-a-mole where the mole is invisible.
Here is an analysis of your current implementation and how to make it more robust.
🚩 Critical Security Concerns
1. The "Allowlist" vs. "Blocklist" Problem
Your current code uses a blocklist. If a developer stores a secret in an environment variable named AUTH_CREDENTIAL or AWS_ACCESS_KEY_ID, your current filter will likely miss it.
- The Risk: In security, blocklists are never exhaustive.
- The Fix: Use an Allowlist. Instead of trying to remove "bad" variables, only pass the "good" ones that the process actually needs (e.g.,
PATH,HOME,LANG).
2. Mutability After Instantiation
In your current code, sanitizeEnvironment() is called in the constructor. However, the ProcessBuilder.environment() map is mutable.
- The Leak: A developer could do this:
LabKeyProcessBuilder lkpb = new LabKeyProcessBuilder("python", "script.py");
lkpb.environment().put("SUPER_SECRET_KEY", "12345"); // This will NOT be sanitized!
lkpb.start();- The Fix: Call
sanitizeEnvironment()inside thestart()method, right before_pb.start()is executed.
🛠 Suggested Improvements
Enhanced Keyword List
If you must stick to a blocklist approach, you are missing several high-probability keywords. Consider adding:
CREDENTIAL,AUTH,PRIVATE,SECRET,PWD,SIGNATURE,CERT,PASSPHRASE.
Defense in Depth: AWS IAM
Since you are using AWS SSM Parameter Store, the most effective protection isn't just cleaning the environment; it’s restricting the process's IAM identity.
- Ensure the process runs under a specific IAM Role that does not have
ssm:GetParameterorssm:GetParametersByPathpermissions. - Even if the script discovers the parameter name, it won't have the authority to fetch the value.
💻 Refined Implementation
Here is a version that addresses the timing issue and adds a more robust keyword check.
public class LabKeyProcessBuilder {
private final ProcessBuilder _pb;
// ... (constructors remain similar)
public Process start() throws IOException {
// SANITIZE RIGHT BEFORE STARTING
// This catches variables added after constructor calls
sanitizeEnvironment();
return _pb.start();
}
private void sanitizeEnvironment() {
_pb.environment().keySet().removeIf(LabKeyProcessBuilder::isSecret);
}
public static boolean isSecret(String propertyName) {
if (propertyName == null) return false;
String lc = propertyName.toLowerCase();
SecretService secrets = ServiceRegistry.get().getService(SecretService.class);
// Expanded blocklist
boolean matchesKeyword = lc.contains("secret") ||
lc.contains("password") ||
lc.contains("apikey") ||
lc.contains("_key") ||
lc.contains("token") ||
lc.contains("auth") ||
lc.contains("credential") ||
lc.contains("signature") ||
lc.contains("private");
return matchesKeyword || (secrets != null && secrets.isRegisteredSecret(propertyName));
}
}🧱 The "Nuclear" Option (Best Practice)
For high-security environments, you should isolate these customer scripts entirely.
- Docker/Podman Containers: Run the R/Python scripts inside a container with zero environment variables passed in except for a specific
INPUT_FILEpath. - Environment Scrubbing: Use
_pb.environment().clear()to wipe everything, then selectively add only the bare minimum required for the runtime to function (likePATHandTMPDIR).
Have you considered using an allowlist of "known safe" variables instead of trying to filter out the "known bad" ones?
|
The immediate need is to ensure that we don't start leaking LLM API keys to launched processes. This approach ensures that the known variables for setting those settings, and anything in the future that's registered as a secret, are not propagated. An allowlist would be more robust but I don't have a viable pathway for generating the list, and certainly not for on-premise customers that might have their own Python or R environment requirements. As you know, we are working towards moving those script executions out of forked processes and into containers, but that's well beyond the scope of this PR. I'd rather invest in that effort instead of refining the allowed/blocked variables. If you have concerns about propagating specific environment variables on any of our cloud servers I'm happy to adjust the blocked list at any point. FWIW, I've reviewed the current list and didn't see anything troubling. |
|
I agree with your Josh's assessment of the Gemini feedback. I think one nice to have fix would be for issue#2 in the Gemini feedback: Fix:** Call |
I looked at that earlier but didn't comment on it. While I think it would be fine in practice, I'm not sure what value it's providing. What specifically are we trying to protect against? I find it hard to imagine code accidentally injecting secrets like this. |
Also, we do intentionally inject some secrets. E.g. we may add an apikey or sessionkey so the report/transform can call back into labkey with the users credentials. (Or I think we may). |
|
@labkey-gokhano I went ahead and merged this PR because I believe it's a strict improvement over the status quo. I will be shortly opening another batch of PRs to move remaining |
Rationale
We want to make it easy to identify configuration that's considered secret, like API keys or other credentials.
Changes
ProcessBuildervariant that does the scrubbing