---
title: "How Hugging Face Scaled Secrets Management for AI Infrastructure"
thumbnail: /blog/assets/infisical/thumbnail.png
authors:
- user: segudev
  guest: true
  org: Infisical
---
# How Hugging Face Scaled Secrets Management for AI Infrastructure
Hugging Face has become synonymous with advancing AI at scale. With over 4 million builders deploying models on the Hub, the platform's rapid growth forced a rethink of how sensitive configuration data (secrets) is managed.

Last year, the engineering teams set out to improve how they handle secrets and credentials. After evaluating tools such as HashiCorp Vault, they ultimately chose [Infisical](https://infisical.com/).

This case study details their migration to Infisical, explains how they integrated its features, and shows how it helped engineers work more efficiently and securely.
## Background
As Hugging Face's infrastructure evolved from an AWS-only setup to a multi-cloud environment that includes Azure and GCP, the engineering team needed a more agile, secure, and centralized way to manage secrets. Rather than rework legacy systems or pay for a heavyweight solution like HashiCorp Vault, they turned to Infisical for its developer-friendly workflows, multi-cloud abstraction, and robust security capabilities.

The key challenges they faced were:

- An increased risk of “[secret sprawl](https://infisical.com/blog/what-is-secret-sprawl)” due to inconsistent management across environments.
- Complex permission management as the team scaled, requiring tight role-based access controls (RBAC) integrated with the organization’s SSO provider (Okta).
- Friction in local development, where traditional [.env files](https://infisical.com/blog/stop-using-env-files) compromised both security and developer productivity.
- The burden of manual secret rotation, which became painfully evident after a security incident involving exposed credentials.

In addition, the team needed a solution that followed infrastructure-as-code practices, supported per-project secret management, and struck a balance between automation and manual control during deployments.
## Implementation
Infisical’s flexible architecture fit these requirements. The engineering team used the migration to re-examine their internal project structure, splitting projects into distinct infrastructure and application domains. This gave them a clearer separation of concerns and let them standardize secret rotation practices, a priority in the wake of a recent security incident.

Building on Terraform, which they had previously used to create Kubernetes secrets from AWS configurations, the team found the move to the Infisical Kubernetes Operator exceptionally smooth. The integration improved security while standardizing secrets management across all environments.
### Kubernetes Integration
Kubernetes is at the heart of Hugging Face’s production environment, and Infisical's [Kubernetes Operator](https://infisical.com/docs/integrations/platforms/kubernetes) has been instrumental in automating secret updates. The Operator continuously monitors secrets in Infisical for changes and propagates updates to the corresponding Kubernetes objects. When a change is detected, it can automatically reload dependent Deployments, so containers always run with the most recent secrets.

**Example:**

Suppose an application running in Kubernetes needs a new secret. The secret can be created with the Infisical CLI or the web UI. The developer then creates an `InfisicalSecret` resource in Kubernetes that specifies which secret from Infisical should be synced.

Once the resource is applied, the Infisical Operator continuously watches for updates. When changes are detected in Infisical, the Operator automatically updates the Kubernetes secret (`my-app-k8s-secret`).
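The watch-and-sync behaviour described above follows the usual Kubernetes reconcile pattern: compare the desired state (the secrets in Infisical) with the current state (the Kubernetes Secret) and write only when they differ. The sketch below illustrates that pattern in memory. It is not the Operator's code or the Infisical API. `fetch_source` is a hypothetical stand-in for reading secrets from Infisical, and the plain dictionary stands in for Secret objects in the cluster.

```python
import time
from typing import Callable, Dict

Secrets = Dict[str, str]


def reconcile_once(
    fetch_source: Callable[[], Secrets],  # hypothetical: reads the desired secrets from Infisical
    cluster_secrets: Dict[str, Secrets],  # stand-in for Kubernetes Secret objects, keyed by name
    target_name: str,                     # e.g. "my-app-k8s-secret"
) -> bool:
    """Make the target Secret match the source. Returns True if it was written."""
    desired = fetch_source()
    if cluster_secrets.get(target_name) == desired:
        return False  # already in sync: skip the write
    cluster_secrets[target_name] = dict(desired)  # create or overwrite the target
    return True


def watch(
    fetch_source: Callable[[], Secrets],
    cluster_secrets: Dict[str, Secrets],
    target_name: str,
    interval_seconds: float = 60.0,
    iterations: int = 3,
) -> int:
    """Poll the source a fixed number of times and return how many writes happened."""
    writes = 0
    for i in range(iterations):
        if reconcile_once(fetch_source, cluster_secrets, target_name):
            writes += 1
        if i < iterations - 1:
            time.sleep(interval_seconds)  # wait for the next resync
    return writes
```

A real operator would run this loop indefinitely on its resync interval (or react to change events) and would write through the Kubernetes API rather than into a dictionary.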
```mermaid
graph LR
    A[Infisical] --> B[Infisical Operator]
    B -->|Sync Secret| C["Kubernetes Secret (my-app-k8s-secret)"]
    C -->|Mounted in| D[Application Pod]
    D -->|Reads| E[Environment Variables / Volumes]
```
Better yet, because the application's Deployment references `my-app-k8s-secret` as an environment variable source or a mounted volume, the Operator can automatically trigger a container reload when the secret changes.

In practice, Hugging Face engineers prefer to wait for manual redeployments rather than let the Operator restart containers automatically. Manual rollouts give them precise control over deployments, which matters when traffic is high (over 10 million requests per minute) and many replicas are involved.
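One common way to make a Deployment roll when a Secret changes is to store a hash of the Secret's data in an annotation on the pod template. Any change under `spec.template` causes Kubernetes to start a rolling update. The sketch below shows that checksum technique combined with a manual gate like the one Hugging Face prefers. It is an illustration under assumptions, not how the Infisical Operator implements reloads. The annotation key `example.com/secret-checksum` is hypothetical, and the Deployment is modelled as a plain dictionary.

```python
import hashlib
import json
from typing import Any, Dict

# Hypothetical annotation key; any key under the pod template behaves the same way.
CHECKSUM_KEY = "example.com/secret-checksum"


def secret_checksum(data: Dict[str, str]) -> str:
    """Order-independent SHA-256 over the Secret's key/value pairs."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_reload(deployment: Dict[str, Any], secret_data: Dict[str, str], auto_reload: bool) -> str:
    """Decide what to do after the referenced Secret may have changed.

    Returns "in-sync", "rolled" (pod template patched, so Kubernetes would start
    a rolling update) or "pending-manual" (drift detected, rollout left to a human).
    """
    annotations = (
        deployment.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("metadata", {})
        .setdefault("annotations", {})
    )
    new_sum = secret_checksum(secret_data)
    if annotations.get(CHECKSUM_KEY) == new_sum:
        return "in-sync"
    if not auto_reload:
        return "pending-manual"  # report drift but leave the running pods alone
    annotations[CHECKSUM_KEY] = new_sum  # changing the pod template triggers a rollout
    return "rolled"
```

With `auto_reload=False`, the drift stays visible as `"pending-manual"` until an engineer, or a release pipeline, applies it by calling the function with `auto_reload=True`.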
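A rough calculation shows why rollout control matters at this traffic level. Assume a hypothetical fleet of 200 replicas with evenly balanced traffic (the replica count is not from the case study). At that size, 10 million requests per minute is about 166,667 requests per second, or roughly 833 per pod. If a rolling update takes Kubernetes' default `maxUnavailable` of 25% (50 pods) out at once, each remaining pod must absorb about 1,111 requests per second. The sketch ignores `maxSurge` and readiness delays.

```python
import math


def unavailable_from_percent(replicas: int, percent: float) -> int:
    """Pods a rolling update may take down at once (a percentage maxUnavailable rounds down)."""
    return math.floor(replicas * percent / 100)


def per_pod_rps(requests_per_minute: float, replicas: int, unavailable: int = 0) -> float:
    """Requests per second each serving pod handles, assuming evenly balanced traffic."""
    if not 0 <= unavailable < replicas:
        raise ValueError("unavailable must be in [0, replicas)")
    return requests_per_minute / 60 / (replicas - unavailable)


if __name__ == "__main__":
    replicas = 200  # hypothetical fleet size, not from the case study
    down = unavailable_from_percent(replicas, 25)  # Kubernetes' default maxUnavailable
    steady = per_pod_rps(10_000_000, replicas)
    during = per_pod_rps(10_000_000, replicas, down)
    print(down, round(steady), round(during))
```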
### Local Development
For local development, [Infisical’s CLI](https://infisical.com/docs/cli/usage) streamlines workflows by injecting secrets directly into development environments. This removes the need for insecure local `.env` files, aligns local configuration with production standards, and reduces onboarding friction.
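The CLI's `run` command (`infisical run -- <your start command>`) starts the application as a child process with the secrets present in its environment, so nothing has to be written to disk. The sketch below imitates that injection pattern in Python. `run_with_secrets` is a hypothetical helper, and the secrets are passed in as a plain dictionary rather than fetched from Infisical.

```python
import os
import subprocess
import sys
from typing import Dict, List


def run_with_secrets(command: List[str], secrets: Dict[str, str]) -> subprocess.CompletedProcess:
    """Run `command` with `secrets` added to a copy of the current environment.

    The secrets exist only in the child's environment: no .env file is written,
    and the parent process's own environment is left untouched.
    """
    env = dict(os.environ)
    env.update(secrets)  # injected values override existing variables of the same name
    return subprocess.run(command, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Hypothetical secret; in practice the values would come from Infisical.
    result = run_with_secrets(
        [sys.executable, "-c", "import os; print(os.environ['DB_PASSWORD'])"],
        {"DB_PASSWORD": "local-dev-only"},
    )
    print(result.stdout.strip())  # local-dev-only
```

Because the application only ever reads `os.environ`, the same code runs unchanged locally and in Kubernetes, where the variables come from the synced Secret instead.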
## Security and Access Management
Security improvements form the backbone of this migration. By integrating Infisical with existing identity providers such as Okta, Hugging Face established a fine-grained RBAC system. Permissions are mapped automatically from Okta groups: developers retain administrative rights over their projects, while frontend and backend teams receive appropriately restricted read or write access.
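Group-based mapping of this kind can be expressed as a small lookup. A user's effective role on a project is the most permissive role granted by any of their groups, and users with no matching group get no access. The sketch below assumes hypothetical group names, project names, and a three-level role ladder (`read` < `write` < `admin`). It is not Infisical's permission model or API.

```python
from typing import Dict, Iterable, Optional

# Roles ordered from least to most permissive (assumed ladder).
ROLE_RANK = {"read": 1, "write": 2, "admin": 3}

# Hypothetical mapping: Okta group -> {project: role}.
GROUP_ROLES: Dict[str, Dict[str, str]] = {
    "eng-hub-developers": {"hub-api": "admin"},
    "eng-frontend": {"hub-api": "read", "hub-web": "write"},
    "eng-backend": {"hub-api": "write"},
}


def effective_role(groups: Iterable[str], project: str) -> Optional[str]:
    """Return the most permissive role any of the user's groups grants on `project`."""
    best: Optional[str] = None
    for group in groups:
        role = GROUP_ROLES.get(group, {}).get(project)
        if role is not None and (best is None or ROLE_RANK[role] > ROLE_RANK[best]):
            best = role
    return best  # None means deny by default


def can_write(groups: Iterable[str], project: str) -> bool:
    """True if the user's effective role allows changing secrets in `project`."""
    role = effective_role(groups, project)
    return role is not None and ROLE_RANK[role] >= ROLE_RANK["write"]
```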
Additionally, the [secret sharing](https://infisical.com/docs/documentation/platform/secret-sharing) functionality lets ML/AI researchers at Hugging Face share credentials securely. The centralized Infisical platform also simplifies auditing and managing secret rotations, a necessity highlighted by previous security incidents.
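Managing rotations centrally largely comes down to knowing when each credential was last rotated and how long it may live. The sketch below assumes a hypothetical inventory with a per-secret maximum age and reports which secrets are overdue. The rotation itself is out of scope: issuing a new credential, updating it in Infisical, and revoking the old one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class SecretRecord:
    name: str
    last_rotated: datetime  # timezone-aware timestamp of the last rotation
    max_age: timedelta      # assumed rotation policy for this secret


def due_for_rotation(records: List[SecretRecord], now: Optional[datetime] = None) -> List[str]:
    """Names of secrets whose age has reached their max_age, least recently rotated first."""
    now = now or datetime.now(timezone.utc)
    overdue = [r for r in records if now - r.last_rotated >= r.max_age]
    overdue.sort(key=lambda r: r.last_rotated)
    return [r.name for r in overdue]
```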
## CI/CD and Infrastructure Integration
Integration with CI/CD pipelines further strengthened the overall security posture. Infisical was embedded into the deployment pipeline through GitHub Actions, using [OIDC authentication](https://infisical.com/docs/documentation/platform/identities/oidc-auth/github), and through its Terraform integration. By running self-hosted runners within a secure environment, the team ensured that every deployment adhered to production-grade security standards. This integrated approach minimized risk and gave a uniform experience from local development to cloud deployment.
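With OIDC authentication, a GitHub Actions job presents a short-lived token signed by GitHub. The secrets platform decides whether to trust that token by checking its claims against what the machine identity was configured to accept. The sketch below shows only this claim-matching step, applied to an already-decoded payload. It assumes a hypothetical trust policy (organization `my-org`, repository `my-org/deploy-infra`, branch `main`). It also deliberately omits verifying the token's signature against GitHub's published keys, which any real implementation must do first.

```python
import time
from typing import Any, Dict, Optional

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"

# Hypothetical trust policy for one machine identity.
TRUST_POLICY = {
    "aud": "https://github.com/my-org",
    "sub": "repo:my-org/deploy-infra:ref:refs/heads/main",
    "repository": "my-org/deploy-infra",
}


def claims_allowed(claims: Dict[str, Any], policy: Dict[str, str], now: Optional[float] = None) -> bool:
    """Check decoded (already signature-verified) token claims against a policy."""
    now = time.time() if now is None else now
    if claims.get("iss") != GITHUB_ISSUER:
        return False
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False  # missing or expired token
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # "aud" may be a string or a list
    if policy["aud"] not in audiences:
        return False
    # Every other policy entry must match the corresponding claim exactly.
    return all(claims.get(key) == value for key, value in policy.items() if key != "aud")
```

Binding on `sub` and `repository` means a workflow from a fork, or from another branch, cannot obtain production secrets even though its token is validly signed by GitHub.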
## Technical Outcomes & Insights
Centralizing secrets management with Infisical brought tangible improvements:

- Engineers no longer need to spend valuable time manually configuring environment secrets. Self-serve workflows accelerated onboarding and daily development cycles.
- Automated audits and fine-grained access controls enabled rapid incident response and promoted a “shift left” approach to security.
- Consistent integration across cloud providers, Kubernetes clusters, and CI/CD pipelines eliminated discrepancies in secret management, reinforcing the infrastructure's security and reliability.

As noted by Adrien Carreira, Head of Infrastructure at Hugging Face:

> "Infisical provided all the functionality and security settings we needed to boost our security posture and save engineering time. Whether you're working locally, running kubernetes clusters in production, or operating secrets within CI/CD pipelines, Infisical has a seamless prebuilt workflow."
## Conclusion
Hugging Face's migration to Infisical shows how a technically driven, engineering-centric approach to managing secrets across multiple cloud platforms delivers significant benefits. For teams tackling similar challenges, Infisical offers a practical way to work more efficiently while keeping security strong.

When the secure path is made the easiest path, teams can focus on building innovative products instead of worrying about managing secrets.
## Resources
For teams interested in adopting a similar approach:

- [Secure GitOps Workflows: A Practical Guide to Secrets Management](https://infisical.com/blog/gitops-secrets-management)
- [Kubernetes Secrets Management in 2025 - A Complete Guide](https://infisical.com/blog/kubernetes-secrets-management-2025)

*This technical case study was adapted from the original case study published at [infisical.com/customers/hugging-face](https://infisical.com/customers/hugging-face)*