-
Notifications
You must be signed in to change notification settings - Fork 849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Blog post] - How Hugging Face Scaled Secrets Management for AI Infrastructure #2657
base: main
Are you sure you want to change the base?
Changes from 21 commits
363ebd2
96d080a
7433c6e
1bc44ba
094e768
4e8717e
06905ff
0bbcdc0
1ece503
a59157b
fb8e6e5
93b8b6b
3a73f31
f407b95
24bbe8a
6d85470
99172bb
666cd4a
99022a2
8be43a5
a73b983
413d78b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5638,7 +5638,6 @@ | |
- multimodal | ||
- vision | ||
- vlm | ||
|
||
- local: jfrog | ||
title: "Hugging Face and JFrog partner to make AI Security more transparent" | ||
author: mcpotato | ||
|
@@ -5683,4 +5682,15 @@ | |
- llm | ||
- vlm | ||
- community | ||
- research | ||
- research | ||
|
||
- local: scaling-secrets-management | ||
title: "How Hugging Face Scaled Secrets Management for AI Infrastructure" | ||
thumbnail: /blog/assets/infisical/thumbnail.png | ||
author: segudev | ||
guest: true | ||
date: Mar 13, 2025 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Reminder to update |
||
tags: | ||
- security | ||
- infra | ||
- research |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,110 @@ | ||||||
--- | ||||||
title: "How Hugging Face Scaled Secrets Management for AI Infrastructure" | ||||||
thumbnail: /blog/assets/infisical/thumbnail.png | ||||||
authors: | ||||||
- user: segudev | ||||||
guest: true | ||||||
org: Infisical | ||||||
thomas-infisical marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
--- | ||||||
|
||||||
# How Hugging Face Scaled Secrets Management for AI Infrastructure | ||||||
|
||||||
Hugging Face has become synonymous with advancing AI at scale. With over 4 million builders deploying models on the Hub, the rapid growth of the platform necessitated a rethinking of how sensitive configuration data—secrets—are managed. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
Last year, the engineering teams set out to improve the handling of their secrets and credentials. After evaluating tools like HashiCorp Vault, they ultimately chose [Infisical](https://infisical.com/). | ||||||
|
||||||
This case study details their migration to Infisical, explains how they integrated its powerful features, and highlights how it enabled engineers to work more efficiently and securely. | ||||||
|
||||||
## Background | ||||||
|
||||||
As Hugging Face's infrastructure evolved from an AWS-only setup to a multi-cloud environment that includes Azure and GCP, the engineering team needed a more agile, secure, and centralized way to manage secrets. Instead of reworking legacy systems or paying for heavyweight solutions like HashiCorp Vault, they turned to Infisical due to its developer-friendly workflows, multi-cloud abstraction, and robust security capabilities. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
The key challenges they faced were: | ||||||
|
||||||
- An increased risk of “[secret sprawl](https://infisical.com/blog/what-is-secret-sprawl)” due to inconsistent management across environments. | ||||||
- Complex permission management as the team scaled, requiring tight, role-based access controls (RBAC) integrated with the organization’s SSO (Okta). | ||||||
- Difficulties with local development where traditional [.env files](https://infisical.com/blog/stop-using-env-files) compromised both security and developer productivity. | ||||||
- The burden of manual secret rotation, which became painfully evident after a security incident involving exposed credentials. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
In addition, the team needed a solution that adhered to infrastructure-as-code practices, supported project-by-project secret management, and provided a smooth balance between automation and manual control during deployments. | ||||||
|
||||||
## Implementation | ||||||
|
||||||
Infisical’s flexible architecture was an ideal solution. The engineering team seized the opportunity to re-examine their internal project structure, splitting projects into distinct infrastructure and application domains. This allowed them to implement a clearer separation of concerns and standardize secret rotation practices—a priority in the wake of a recent security incident. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
By leveraging Terraform, which was previously used to create Kubernetes secrets from AWS configurations, they found the transition to the Infisical Kubernetes Operator exceptionally smooth. This integration enabled security improvements while standardizing secrets management across all environments. | ||||||
|
||||||
### Kubernetes Integration | ||||||
|
||||||
Kubernetes is at the heart of Hugging Face’s production environment, and Infisical's [Kubernetes Operator](https://infisical.com/docs/integrations/platforms/kubernetes) has been instrumental in automating secret updates. The Operator continuously monitors for changes to any secret in Infisical and ensures that these updates are propagated to the corresponding Kubernetes objects. Whenever a change is detected, it can automatically reload dependent Deployments, ensuring that containers always run with the most recent secrets. | ||||||
|
||||||
**Example:** | ||||||
|
||||||
A new secret is required by an application running in Kubernetes. The secret can be created via the Infisical's CLI or the web UI, then the developer creates an `InfisicalSecret` resource in Kubernetes that specifies which secret from Infisical should be synced: | ||||||
|
||||||
```yaml | ||||||
apiVersion: infisical.com/v1alpha1 | ||||||
kind: InfisicalSecret | ||||||
metadata: | ||||||
name: my-app-secret | ||||||
namespace: production | ||||||
spec: | ||||||
infisicalSecretId: "123e4567-e89b-12d3-a456-426614174000" | ||||||
targetSecretName: "my-app-k8s-secret" | ||||||
``` | ||||||
Once the CRD is applied, the Infisical Operator continuously watches for updates. When changes are detected in Infisical, the Operator automatically updates the Kubernetes secret (`my-app-k8s-secret`). | ||||||
|
||||||
```mermaid | ||||||
graph TD | ||||||
A[Infisical Platform] -->|Push Update| B[Infisical Operator] | ||||||
B -->|Sync Secret| C[Kubernetes Secret (my-app-k8s-secret)] | ||||||
C -->|Mounted in| D[Application Pod] | ||||||
D -->|Reads| E[Environment Variables / Volumes] | ||||||
``` | ||||||
Comment on lines
+57
to
+63
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this will be rendered, perhaps we need a link instead. |
||||||
Better yet, since the application's Deployment references `my-app-k8s-secret` as an environment variable source or mounted volume, the operator can automatically trigger a container reload when the the secret changes. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
In practice, Hugging Face engineers favor waiting for manual redeployments despite the operator’s ability to trigger container restarts automatically. This decision was driven by the need for precise control over deployments, particularly when high traffic (over 10 million requests per minute) and numerous replicas are involved. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
### Local Development | ||||||
|
||||||
For local development, [Infisical’s CLI](https://infisical.com/docs/cli/usage) streamlines workflows by injecting secrets directly into development environments. This removes the need for insecure local .env files, aligning local configurations with production standards and reducing onboarding friction. | ||||||
|
||||||
## Security and Access Management | ||||||
|
||||||
Security improvements form the backbone of this migration. By integrating Infisical with existing identity providers such as Okta, Hugging Face established a fine-grained RBAC system. Permissions are automatically mapped from Okta groups, ensuring that developers retain administrative rights over their projects, while frontend and backend teams receive appropriately restricted read or write access. | ||||||
|
||||||
Additionally, the [secret sharing](https://infisical.com/docs/documentation/platform/secret-sharing) functionality allows secure credentials sharing among ML/AI researchers at Hugging Face. The centralized Infisical platform also simplifies auditing and managing secret rotations—a necessity highlighted by previous security incidents. | ||||||
|
||||||
## CI/CD and Infrastructure Integration | ||||||
|
||||||
Seamless integration with CI/CD pipelines further enhanced the overall security posture. Infisical was embedded into the deployment pipeline via GitHub Actions using [OIDC authentication](https://infisical.com/docs/documentation/platform/identities/oidc-auth/github) and Terraform integration. By operating self-hosted runners within a secure environment, every deployment adhered to production-grade security standards. This integrated approach minimized risks and ensured a uniform experience from local development to cloud deployment. | ||||||
|
||||||
## Technical Outcomes & Insights | ||||||
|
||||||
Centralizing secrets management with Infisical brought tangible improvements: | ||||||
|
||||||
- Engineers no longer need to spend valuable time manually configuring environment secrets. Self-serve workflows accelerated onboarding and daily development cycles. | ||||||
- Automated audits and fine-grained access controls enabled rapid incident response and promoted a “shift left” approach to security. | ||||||
- Consistent integration across cloud providers, Kubernetes clusters, and CI/CD pipelines eliminated discrepancies in secret management, thus reinforcing the infrastructure's security and reliability. | ||||||
|
||||||
As noted by Adrien Carreira, Head of Infrastructure at Hugging Face, | ||||||
|
||||||
>"Infisical provided all the functionality and security settings we needed to boost our security posture and save engineering time. Whether you're working locally, running kubernetes clusters in production, or operating secrets within CI/CD pipelines, Infisical has a seamless prebuilt workflow." | ||||||
|
||||||
## Conclusion | ||||||
|
||||||
Hugging Face's migration to Infisical demonstrates how a technically driven, engineering-centric approach to managing secrets across multiple cloud platforms delivers significants benefits. For tackling similar challenges, using Infisical is a practical way to work more efficiently while keeping security strong. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
When the secure path is made the easiest path, teams can focus on building innovative products instead of of worrying about managing secrets. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
## Resources | ||||||
|
||||||
For teams interested in adopting a similar approach: | ||||||
- [Secure GitOps Workflows: A Practical Guide to Secrets Management](https://infisical.com/blog/gitops-secrets-management) | ||||||
- [Kubernetes Secrets Management in 2025 - A Complete Guide](https://infisical.com/blog/kubernetes-secrets-management-2025) | ||||||
- [Platform Documentation](https://infisical.com/docs/documentation/platform/organization) | ||||||
- [CLI Reference](https://infisical.com/docs/cli/overview) | ||||||
|
||||||
--- | ||||||
|
||||||
*This technical case study was adapted from the original case study published at [infisical.com/customers/hugging-face](https://infisical.com/customers/hugging-face)* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please restore this line?