Skip to content

SecretService and Provider, filter out secret environment variables#7671

Merged
labkey-jeckels merged 15 commits into
developfrom
fb_secretservice
May 29, 2026
Merged

SecretService and Provider, filter out secret environment variables#7671
labkey-jeckels merged 15 commits into
developfrom
fb_secretservice

Conversation

@labkey-matthewb
Copy link
Copy Markdown
Contributor

@labkey-matthewb labkey-matthewb commented May 15, 2026

Rationale

We want to make it easy to identify configuration that's considered secret, like API keys or other credentials.

Changes

  • Scrub sensitive environment variables for reporting in Admin Console and forking external processes
  • Pull secrets from startup properties or environment variables
  • Adopt ProcessBuilder variant that does the scrubbing
  • Admin Console reporting for secrets and their sources (not their values)

@github-actions
Copy link
Copy Markdown

WARNING: This PR appears to have the default title generated by GitHub. Please use something more descriptive.

@labkey-jeckels labkey-jeckels changed the title Fb secretservice SecretService and Provider, filter out secret environment variables May 22, 2026
@labkey-jeckels labkey-jeckels marked this pull request as ready for review May 22, 2026 21:12
Copy link
Copy Markdown

@labkey-gokhano labkey-gokhano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is the Gemini provided feedback. I am positive that you'll find some of it useful (see #2 and expanded block list below) at least.

Thanks,

-Gokhan

It is great to see a proactive approach to preventing secret leakage, especially when dealing with extensibility features that execute customer-provided code. Untrusted scripts (R/Python) are notorious for being "too helpful" and dumping their entire environment to logs or error outputs.

However, relying on a blocklist (deny-list) approach—searching for keywords like "password" or "secret"—is a bit like playing a game of whack-a-mole where the mole is invisible.

Here is an analysis of your current implementation and how to make it more robust.


🚩 Critical Security Concerns

1. The "Allowlist" vs. "Blocklist" Problem

Your current code uses a blocklist. If a developer stores a secret in an environment variable named AUTH_CREDENTIAL or AWS_ACCESS_KEY_ID, your current filter will likely miss it.

  • The Risk: In security, blocklists are never exhaustive.
  • The Fix: Use an Allowlist. Instead of trying to remove "bad" variables, only pass the "good" ones that the process actually needs (e.g., PATH, HOME, LANG).

2. Mutability After Instantiation

In your current code, sanitizeEnvironment() is called in the constructor. However, the ProcessBuilder.environment() map is mutable.

  • The Leak: A developer could do this:
LabKeyProcessBuilder lkpb = new LabKeyProcessBuilder("python", "script.py");
lkpb.environment().put("SUPER_SECRET_KEY", "12345"); // This will NOT be sanitized!
lkpb.start();
  • The Fix: Call sanitizeEnvironment() inside the start() method, right before _pb.start() is executed.

🛠 Suggested Improvements

Enhanced Keyword List

If you must stick to a blocklist approach, you are missing several high-probability keywords. Consider adding:

  • CREDENTIAL, AUTH, PRIVATE, SECRET, PWD, SIGNATURE, CERT, PASSPHRASE.

Defense in Depth: AWS IAM

Since you are using AWS SSM Parameter Store, the most effective protection isn't just cleaning the environment; it’s restricting the process's IAM identity.

  • Ensure the process runs under a specific IAM Role that does not have ssm:GetParameter or ssm:GetParametersByPath permissions.
  • Even if the script discovers the parameter name, it won't have the authority to fetch the value.

💻 Refined Implementation

Here is a version that addresses the timing issue and adds a more robust keyword check.

public class LabKeyProcessBuilder {
    private final ProcessBuilder _pb;

    // ... (constructors remain similar)

    public Process start() throws IOException {
        // SANITIZE RIGHT BEFORE STARTING
        // This catches variables added after constructor calls
        sanitizeEnvironment();
        return _pb.start();
    }

    private void sanitizeEnvironment() {
        _pb.environment().keySet().removeIf(LabKeyProcessBuilder::isSecret);
    }

    public static boolean isSecret(String propertyName) {
        if (propertyName == null) return false;
        
        String lc = propertyName.toLowerCase();
        SecretService secrets = ServiceRegistry.get().getService(SecretService.class);

        // Expanded blocklist
        boolean matchesKeyword = lc.contains("secret") || 
                                 lc.contains("password") || 
                                 lc.contains("apikey") || 
                                 lc.contains("_key") || 
                                 lc.contains("token") ||
                                 lc.contains("auth") ||
                                 lc.contains("credential") ||
                                 lc.contains("signature") ||
                                 lc.contains("private");

        return matchesKeyword || (secrets != null && secrets.isRegisteredSecret(propertyName));
    }
}

🧱 The "Nuclear" Option (Best Practice)

For high-security environments, you should isolate these customer scripts entirely.

  1. Docker/Podman Containers: Run the R/Python scripts inside a container with zero environment variables passed in except for a specific INPUT_FILE path.
  2. Environment Scrubbing: Use _pb.environment().clear() to wipe everything, then selectively add only the bare minimum required for the runtime to function (like PATH and TMPDIR).

Have you considered using an allowlist of "known safe" variables instead of trying to filter out the "known bad" ones?

@labkey-jeckels
Copy link
Copy Markdown
Contributor

The immediate need is to ensure that we don't start leaking LLM API keys to launched processes. This approach ensures that the known variables for setting those settings, and anything in the future that's registered as a secret, are not propagated.

An allowlist would be more robust but I don't have a viable pathway for generating the list, and certainly not for on-premise customers that might have their own Python or R environment requirements. As you know, we are working towards moving those script executions out of forked processes and into containers, but that's well beyond the scope of this PR. I'd rather invest in that effort instead of refining the allowed/blocked variables.

If you have concerns about propagating specific environment variables on any of our cloud servers I'm happy to adjust the blocked list at any point. FWIW, I've reviewed the current list and didn't see anything troubling.

@labkey-gokhano
Copy link
Copy Markdown

I agree with your Josh's assessment of the Gemini feedback. I think one nice to have fix would be for issue#2 in the Gemini feedback:

Fix:** Call sanitizeEnvironment() inside the start() method, right before _pb.start() is executed.

@labkey-jeckels
Copy link
Copy Markdown
Contributor

I agree with your Josh's assessment of the Gemini feedback. I think one nice to have fix would be for issue#2 in the Gemini feedback:

Fix:** Call sanitizeEnvironment() inside the start() method, right before _pb.start() is executed.

I looked at that earlier but didn't comment on it.

While I think it would be fine in practice, I'm not sure what value it's providing. What specifically are we trying to protect against? I find it hard to imagine code accidentally injecting secrets like this.

@labkey-matthewb
Copy link
Copy Markdown
Contributor Author

labkey-matthewb commented May 29, 2026

I agree with your Josh's assessment of the Gemini feedback. I think one nice to have fix would be for issue#2 in the Gemini feedback:
Fix:** Call sanitizeEnvironment() inside the start() method, right before _pb.start() is executed.

I looked at that earlier but didn't comment on it.

While I think it would be fine in practice, I'm not sure what value it's providing. What specifically are we trying to protect against? I find it hard to imagine code accidentally injecting secrets like this.

Also, we do intentionally inject some secrets. E.g. we may add an apikey or sessionkey so the report/transform can call back into labkey with the users credentials. (Or I think we may).

@labkey-jeckels labkey-jeckels merged commit 459fdff into develop May 29, 2026
14 checks passed
@labkey-jeckels labkey-jeckels deleted the fb_secretservice branch May 29, 2026 20:10
@labkey-jeckels
Copy link
Copy Markdown
Contributor

@labkey-gokhano I went ahead and merged this PR because I believe it's a strict improvement over the status quo. I will be shortly opening another batch of PRs to move remaining ProcessBuilder uses to LabKeyProcessBuilder. Open to further changes here but I'd like to understand if we're trying to protect against accidental leaks, malicious developers, or something else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants