SecretService and Provider, filter out secret environment variables by labkey-matthewb · Pull Request #7671 · LabKey/platform

labkey-matthewb · 2026-05-15T19:20:18Z

Rationale

We want to make it easy to identify configuration that's considered secret, like API keys or other credentials.

Changes

Scrub sensitive environment variables for reporting in Admin Console and forking external processes
Pull secrets from startup properties or environment variables
Adopt ProcessBuilder variant that does the scrubbing
Admin Console reporting for secrets and their sources (not their values)
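
As an illustration of the reporting change, here is a minimal sketch of name-based redaction before display. The class name `EnvReportSketch`, the keyword list, and the `[redacted]` placeholder are assumptions made for this example; the PR's actual implementation is not shown in this conversation.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class EnvReportSketch {
    // Hypothetical keyword list, not the list used by the PR.
    private static final Set<String> KEYWORDS = Set.of("secret", "password", "apikey", "token");

    /** Returns true when the variable name contains one of the keywords, ignoring case. */
    public static boolean looksSecret(String name) {
        String lc = name.toLowerCase(Locale.ROOT);
        return KEYWORDS.stream().anyMatch(lc::contains);
    }

    /** Returns a copy of env, sorted by name, with secret-looking values replaced by a placeholder. */
    public static Map<String, String> redactForDisplay(Map<String, String> env) {
        Map<String, String> result = new TreeMap<>();
        env.forEach((name, value) -> result.put(name, looksSecret(name) ? "[redacted]" : value));
        return result;
    }

    public static void main(String[] args) {
        // Print the current environment with secret-looking values hidden.
        redactForDisplay(System.getenv()).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The sketch returns a copy rather than mutating its input, so the same map can still be used elsewhere unredacted.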

environment variable implementation startup ordering

github-actions · 2026-05-15T19:20:31Z

WARNING: This PR appears to have the default title generated by GitHub. Please use something more descriptive.

labkey-gokhano

Below is the feedback Gemini provided. I'm confident you'll find at least some of it useful (see #2 and the expanded block list below).

Thanks,

-Gokhan

It is great to see a proactive approach to preventing secret leakage, especially when dealing with extensibility features that execute customer-provided code. Untrusted scripts (R/Python) are notorious for being "too helpful" and dumping their entire environment to logs or error outputs.

However, relying on a blocklist (deny-list) approach—searching for keywords like "password" or "secret"—is a bit like playing a game of whack-a-mole where the mole is invisible.

Here is an analysis of your current implementation and how to make it more robust.

🚩 Critical Security Concerns

1. The "Allowlist" vs. "Blocklist" Problem

Your current code uses a blocklist. If a developer stores a secret in an environment variable named AUTH_CREDENTIAL or AWS_ACCESS_KEY_ID, your current filter will likely miss it.

The Risk: In security, blocklists are never exhaustive.
The Fix: Use an Allowlist. Instead of trying to remove "bad" variables, only pass the "good" ones that the process actually needs (e.g., PATH, HOME, LANG).
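
A minimal sketch of the allowlist idea, assuming the three variable names given as examples above are the whole list. `EnvAllowlistSketch` and `retainAllowed` are hypothetical names; a real deployment would need a longer list, and Windows variable names such as `Path` differ in case.

```java
import java.util.Map;
import java.util.Set;

public class EnvAllowlistSketch {
    // Hypothetical allowlist, taken from the examples in the comment above.
    public static final Set<String> ALLOWED = Set.of("PATH", "HOME", "LANG");

    /** Removes every variable whose name is not on the allowlist. Mutates the given map. */
    public static void retainAllowed(Map<String, String> env) {
        env.keySet().retainAll(ALLOWED);
    }

    public static void main(String[] args) {
        // The command is only a placeholder; the process is never started here.
        ProcessBuilder pb = new ProcessBuilder("python", "script.py");
        retainAllowed(pb.environment());
        System.out.println(pb.environment().keySet());
    }
}
```

Anything not named in `ALLOWED` is dropped, including variables a customer's Python or R setup may rely on.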

2. Mutability After Instantiation

In your current code, sanitizeEnvironment() is called in the constructor. However, the ProcessBuilder.environment() map is mutable.

The Leak: A developer could do this:

LabKeyProcessBuilder lkpb = new LabKeyProcessBuilder("python", "script.py");
lkpb.environment().put("SUPER_SECRET_KEY", "12345"); // This will NOT be sanitized!
lkpb.start();

The Fix: Call sanitizeEnvironment() inside the start() method, right before _pb.start() is executed.

🛠 Suggested Improvements

Enhanced Keyword List

If you must stick to a blocklist approach, you are missing several high-probability keywords. Consider adding:

CREDENTIAL, AUTH, PRIVATE, SECRET, PWD, SIGNATURE, CERT, PASSPHRASE.

Defense in Depth: AWS IAM

Since you are using AWS SSM Parameter Store, the most effective protection isn't just cleaning the environment; it’s restricting the process's IAM identity.

Ensure the process runs under a specific IAM Role that does not have ssm:GetParameter or ssm:GetParametersByPath permissions.
Even if the script discovers the parameter name, it won't have the authority to fetch the value.

💻 Refined Implementation

Here is a version that addresses the timing issue and adds a more robust keyword check.

import java.io.IOException;
import java.util.Locale;

// SecretService and ServiceRegistry are LabKey classes; their imports are omitted here.
public class LabKeyProcessBuilder {
    private final ProcessBuilder _pb;

    // ... (constructors remain similar)

    public Process start() throws IOException {
        // SANITIZE RIGHT BEFORE STARTING
        // This catches variables added after constructor calls
        sanitizeEnvironment();
        return _pb.start();
    }

    // Drop every variable whose name is considered secret
    private void sanitizeEnvironment() {
        _pb.environment().keySet().removeIf(LabKeyProcessBuilder::isSecret);
    }

    public static boolean isSecret(String propertyName) {
        if (propertyName == null) return false;

        // Locale.ROOT keeps the lowercasing independent of the server's default locale
        String lc = propertyName.toLowerCase(Locale.ROOT);
        SecretService secrets = ServiceRegistry.get().getService(SecretService.class);

        // Expanded blocklist
        boolean matchesKeyword = lc.contains("secret") ||
                                 lc.contains("password") ||
                                 lc.contains("apikey") ||
                                 lc.contains("_key") ||
                                 lc.contains("token") ||
                                 lc.contains("auth") ||
                                 lc.contains("credential") ||
                                 lc.contains("signature") ||
                                 lc.contains("private");

        // A name is secret if it matches a keyword or is registered with the SecretService
        return matchesKeyword || (secrets != null && secrets.isRegisteredSecret(propertyName));
    }
}

🧱 The "Nuclear" Option (Best Practice)

For high-security environments, you should isolate these customer scripts entirely.

Docker/Podman Containers: Run the R/Python scripts inside a container with zero environment variables passed in except for a specific INPUT_FILE path.
Environment Scrubbing: Use _pb.environment().clear() to wipe everything, then selectively add only the bare minimum required for the runtime to function (like PATH and TMPDIR).

Have you considered using an allowlist of "known safe" variables instead of trying to filter out the "known bad" ones?

labkey-jeckels · 2026-05-29T16:12:09Z

The immediate need is to ensure that we don't start leaking LLM API keys to launched processes. This approach ensures that the known variables for setting those settings, and anything in the future that's registered as a secret, are not propagated.

An allowlist would be more robust but I don't have a viable pathway for generating the list, and certainly not for on-premise customers that might have their own Python or R environment requirements. As you know, we are working towards moving those script executions out of forked processes and into containers, but that's well beyond the scope of this PR. I'd rather invest in that effort instead of refining the allowed/blocked variables.

If you have concerns about propagating specific environment variables on any of our cloud servers I'm happy to adjust the blocked list at any point. FWIW, I've reviewed the current list and didn't see anything troubling.

labkey-gokhano · 2026-05-29T17:16:52Z

I agree with Josh's assessment of the Gemini feedback. I think one nice-to-have fix would be for issue #2 in the Gemini feedback:

The Fix: Call sanitizeEnvironment() inside the start() method, right before _pb.start() is executed.

labkey-jeckels · 2026-05-29T18:36:10Z

> I agree with Josh's assessment of the Gemini feedback. I think one nice-to-have fix would be for issue #2 in the Gemini feedback:
>
> The Fix: Call sanitizeEnvironment() inside the start() method, right before _pb.start() is executed.

I looked at that earlier but didn't comment on it.

While I think it would be fine in practice, I'm not sure what value it's providing. What specifically are we trying to protect against? I find it hard to imagine code accidentally injecting secrets like this.

labkey-matthewb · 2026-05-29T18:43:03Z

> > I agree with Josh's assessment of the Gemini feedback. I think one nice-to-have fix would be for issue #2 in the Gemini feedback:
> > The Fix: Call sanitizeEnvironment() inside the start() method, right before _pb.start() is executed.
>
> I looked at that earlier but didn't comment on it.
>
> While I think it would be fine in practice, I'm not sure what value it's providing. What specifically are we trying to protect against? I find it hard to imagine code accidentally injecting secrets like this.

Also, we do intentionally inject some secrets. For example, we may add an apikey or sessionkey so the report/transform can call back into LabKey with the user's credentials. (Or at least I think we may.)
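
One way to reconcile start-time scrubbing with intentional injection is sketched below, purely as an illustration. `ScrubbingLauncherSketch`, `putIntentional`, the stand-in `isSecret` check, and the variable name `LABKEY_APIKEY` are all hypothetical and not part of this PR.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class ScrubbingLauncherSketch {
    private final ProcessBuilder pb;
    // Names the caller has deliberately chosen to pass through (hypothetical mechanism).
    private final Set<String> intentional = new HashSet<>();

    public ScrubbingLauncherSketch(String... command) {
        pb = new ProcessBuilder(command);
    }

    /** Stand-in for the real secret check, which is not shown in full in this conversation. */
    static boolean isSecret(String name) {
        String lc = name.toLowerCase(Locale.ROOT);
        return lc.contains("secret") || lc.contains("apikey") || lc.contains("token");
    }

    /** Deliberately passes a secret-looking variable to the child process. */
    public void putIntentional(String name, String value) {
        intentional.add(name);
        pb.environment().put(name, value);
    }

    /** Removes secret-looking variables unless marked intentional, then returns the live map. */
    public Map<String, String> scrubbedEnvironment() {
        pb.environment().keySet().removeIf(n -> isSecret(n) && !intentional.contains(n));
        return pb.environment();
    }

    /** Scrubs immediately before launching, so late additions are also checked. */
    public Process start() throws IOException {
        scrubbedEnvironment();
        return pb.start();
    }
}
```

With this shape, a callback key added through `putIntentional` survives the scrub, while a secret-looking variable added directly to the environment map is removed before launch.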

labkey-jeckels · 2026-05-29T20:12:20Z

@labkey-gokhano I went ahead and merged this PR because I believe it's a strict improvement over the status quo. I will shortly be opening another batch of PRs to move the remaining ProcessBuilder uses to LabKeyProcessBuilder. I'm open to further changes here, but I'd like to understand whether we're trying to protect against accidental leaks, malicious developers, or something else.

labkey-matthewb added 3 commits May 14, 2026 18:51

checkpoint initial implementation of service interface SecretService

c59da11

comment out the time stuff for now

e49b086

environment variable implementation startup ordering

duplicate test

d612904

labkey-matthewb added 2 commits May 15, 2026 13:15

fix mcp init

1dd9bf2

Merge remote-tracking branch 'origin/develop' into fb_secretservice

7d3aab5

labkey-matthewb requested a review from labkey-jeckels May 18, 2026 17:49

labkey-jeckels added 4 commits May 20, 2026 21:56

Merge branch 'refs/heads/develop' into fb_secretservice

4b7905c

Support SSM-backed properties in application.properties and secrets

6c57b12

Assorted fixes

2de6ddb

Misc improvements

be25967

labkey-jeckels changed the title ~~Fb secretservice~~ SecretService and Provider, filter out secret environment variables May 22, 2026

labkey-jeckels mentioned this pull request May 22, 2026

Support pulling application.properties configuration and secrets from AWS SSM LabKey/server#1388

Merged

labkey-jeckels marked this pull request as ready for review May 22, 2026 21:12

labkey-jeckels added 3 commits May 22, 2026 16:05

Consolidate secret redaction logic; revert MCP embedding/model change

2631e37

Minor tweaks

6ed0b80

Merge branch 'develop' into fb_secretservice

b702237

labkey-jeckels assigned labkey-jeckels and labkey-matthewb and unassigned labkey-jeckels May 23, 2026

labkey-jeckels requested a review from labkey-gokhano May 23, 2026 01:48

labkey-jeckels added 2 commits May 27, 2026 11:12

Merge branch 'develop' into fb_secretservice

9b95d8a

Merge branch 'refs/heads/develop' into fb_secretservice

123afe5

labkey-jeckels approved these changes May 28, 2026

Merge branch 'refs/heads/develop' into fb_secretservice

dbde3bf

labkey-gokhano reviewed May 29, 2026

labkey-gokhano approved these changes May 29, 2026

labkey-jeckels merged commit 459fdff into develop May 29, 2026
14 checks passed

labkey-jeckels deleted the fb_secretservice branch May 29, 2026 20:10

labkey-jeckels merged 15 commits into develop from fb_secretservice

3 participants
