[Blog post] - How Hugging Face Scaled Secrets Management for AI Infrastructure #2657

Open
wants to merge 22 commits into base: main

thomas-infisical

Congratulations! You've made it this far! Once merged, the article will appear at https://huggingface.co/blog. Official articles
require additional reviews. Alternatively, you can write a community article following the process here.

Preparing the Article

You're not quite done yet, though. Please make sure to follow this process (as documented here):

  • Add an entry to _blog.yml.
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content in https://huggingface.co/new-blog. Do not click publish, this is just a way to do an early check.

Here is an example of a complete PR: #2382

Getting a Review

Please make sure to get a review from someone on your team or a co-author.
Once this is done and once all the steps above are completed, you should be able to merge.
There is no need for additional reviews if you and your co-authors are happy and meet all of the above.

Feel free to add @pcuenca as a reviewer if you want a final check. Keep in mind he'll be biased toward light reviews
(e.g., check for proper metadata) rather than content reviews unless explicitly asked.

Member

@julien-c julien-c left a comment

on our main blog we usually publish stuff that's more technical and less "marketing", can you make the content a bit more technical/concrete?

Feel free to add a few screenshots too (you can host them inside a HF dataset repo)

@thomas-infisical
Author

Hello @julien-c,
Following your advice, I updated the blog post with more technical details and a practical example.
Wdyt?

Collaborator

@rtrompier rtrompier left a comment

LGTM

@vmatsiiako

Hi @julien-c! Let us know if we can merge this.

@thomas-infisical
Author

Hello @pcuenca, could you review this, please?

@thomas-infisical
Author

@pcuenca ?

@pcuenca
Member

pcuenca commented Mar 18, 2025

@thomas-infisical taking a look tonight

_blog.yml Outdated
author: segudev
guest: true
date: Mar 13, 2025
Member

Reminder to update

@@ -5638,7 +5638,6 @@
- multimodal
- vision
- vlm

Member

Can you please restore this line?


# How Hugging Face Scaled Secrets Management for AI Infrastructure

Hugging Face has become synonymous with advancing AI at scale. With over 4 million builders deploying models on the Hub, the rapid growth of the platform necessitated a rethinking of how sensitive configuration data—secrets—are managed.
Member

Suggested change
Hugging Face has become synonymous with advancing AI at scale. With over 4 million builders deploying models on the Hub, the rapid growth of the platform necessitated a rethinking of how sensitive configuration data—secrets—are managed.
Hugging Face has become synonymous with advancing AI at scale. With over 4 million builders deploying models on the Hub, the rapid growth of the platform necessitated a rethinking of how sensitive configuration data —secrets— are managed.


## Background

As Hugging Face's infrastructure evolved from an AWS-only setup to a multi-cloud environment that includes Azure and GCP, the engineering team needed a more agile, secure, and centralized way to manage secrets. Instead of reworking legacy systems or paying for heavyweight solutions like HashiCorp Vault, they turned to Infisical due to its developer-friendly workflows, multi-cloud abstraction, and robust security capabilities.
Member

Suggested change
As Hugging Face's infrastructure evolved from an AWS-only setup to a multi-cloud environment that includes Azure and GCP, the engineering team needed a more agile, secure, and centralized way to manage secrets. Instead of reworking legacy systems or paying for heavyweight solutions like HashiCorp Vault, they turned to Infisical due to its developer-friendly workflows, multi-cloud abstraction, and robust security capabilities.
As Hugging Face's infrastructure evolved from an AWS-only setup to a multi-cloud environment that includes Azure and GCP, the engineering team needed a more agile, secure, and centralized way to manage secrets. Instead of reworking legacy systems or adopting heavyweight solutions like HashiCorp Vault, they turned to Infisical due to its developer-friendly workflows, multi-cloud abstraction, and robust security capabilities.

- An increased risk of “[secret sprawl](https://infisical.com/blog/what-is-secret-sprawl)” due to inconsistent management across environments.
- Complex permission management as the team scaled, requiring tight, role-based access controls (RBAC) integrated with the organization’s SSO (Okta).
- Difficulties with local development where traditional [.env files](https://infisical.com/blog/stop-using-env-files) compromised both security and developer productivity.
- The burden of manual secret rotation, which became painfully evident after a security incident involving exposed credentials.
Member

Suggested change
- The burden of manual secret rotation, which became painfully evident after a security incident involving exposed credentials.
- The burden of manual secret rotation, which became painfully evident after a security incident that involved exposed credentials.

Comment on lines +57 to +63
```mermaid
graph TD
A[Infisical Platform] -->|Push Update| B[Infisical Operator]
B -->|Sync Secret| C["Kubernetes Secret (my-app-k8s-secret)"]
C -->|Mounted in| D[Application Pod]
D -->|Reads| E[Environment Variables / Volumes]
```
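The last hop in the diagram, where the pod reads the synced secret, could look like the following on the application side. This is a minimal sketch rather than Hugging Face's actual code: the helper `read_secret` and the mount path `/etc/secrets` are hypothetical, and it assumes the Kubernetes Secret is exposed either as environment variables or as one file per key in a mounted volume.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical mount path for the Kubernetes Secret volume (assumption).
DEFAULT_SECRETS_DIR = "/etc/secrets"


def read_secret(name: str, secrets_dir: str = DEFAULT_SECRETS_DIR) -> Optional[str]:
    """Return a secret from the environment, else from a mounted file.

    A mounted Secret exposes each key as a file named after the key,
    so the fallback reads <secrets_dir>/<name>.
    """
    value = os.environ.get(name)
    if value is not None:
        return value
    path = Path(secrets_dir) / name
    if path.is_file():
        # Strip surrounding whitespace such as a trailing newline.
        return path.read_text(encoding="utf-8").strip()
    return None
```

Checking the environment first and the volume second lets the same code run in pods that use either delivery mechanism.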
Member

I don't think this will be rendered, perhaps we need a link instead.

C -->|Mounted in| D[Application Pod]
D -->|Reads| E[Environment Variables / Volumes]
```
Better yet, since the application's Deployment references `my-app-k8s-secret` as an environment variable source or mounted volume, the operator can automatically trigger a container reload when the the secret changes.
Member

Suggested change
Better yet, since the application's Deployment references `my-app-k8s-secret` as an environment variable source or mounted volume, the operator can automatically trigger a container reload when the the secret changes.
Better yet, since the application's Deployment references `my-app-k8s-secret` as an environment variable source or mounted volume, the Operator can automatically trigger a container reload when the secret changes.

```
Better yet, since the application's Deployment references `my-app-k8s-secret` as an environment variable source or mounted volume, the operator can automatically trigger a container reload when the the secret changes.
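
A Deployment wired this way might be sketched as below. The manifest is built as a plain Python dict to keep the example self-contained; the app name `my-app`, the image, and the helper `build_deployment` are hypothetical. The `secrets.infisical.com/auto-reload` annotation is assumed here to be the opt-in for operator-triggered redeployments, so check the exact key against the documentation for the operator version in use.

```python
import json


def build_deployment(secret_name: str = "my-app-k8s-secret", auto_reload: bool = True) -> dict:
    """Build a minimal apps/v1 Deployment that loads env vars from a Secret."""
    annotations = {}
    if auto_reload:
        # Assumed opt-in annotation for operator-triggered restarts.
        annotations["secrets.infisical.com/auto-reload"] = "true"
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "my-app", "annotations": annotations},
        "spec": {
            "replicas": 2,
            "selector": {"matchLabels": {"app": "my-app"}},
            "template": {
                "metadata": {"labels": {"app": "my-app"}},
                "spec": {
                    "containers": [
                        {
                            "name": "my-app",
                            "image": "example.com/my-app:latest",  # hypothetical image
                            # Every key in the Secret becomes an environment variable.
                            "envFrom": [{"secretRef": {"name": secret_name}}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(build_deployment(), indent=2))
```

Passing `auto_reload=False` omits the annotation, which corresponds to a setup where redeployments are triggered manually.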

In practice, Hugging Face engineers favor waiting for manual redeployments despite the operator’s ability to trigger container restarts automatically. This decision was driven by the need for precise control over deployments, particularly when high traffic (over 10 million requests per minute) and numerous replicas are involved.
Member

Suggested change
In practice, Hugging Face engineers favor waiting for manual redeployments despite the operator’s ability to trigger container restarts automatically. This decision was driven by the need for precise control over deployments, particularly when high traffic (over 10 million requests per minute) and numerous replicas are involved.
In practice, Hugging Face engineers favor waiting for manual redeployments despite the Operator’s ability to automatically trigger container restarts. This decision was driven by the need for precise control over deployments, particularly when high traffic (over 10 million requests per minute) and numerous replicas are involved.


## Conclusion

Hugging Face's migration to Infisical demonstrates how a technically driven, engineering-centric approach to managing secrets across multiple cloud platforms delivers significants benefits. For tackling similar challenges, using Infisical is a practical way to work more efficiently while keeping security strong.
Member

Suggested change
Hugging Face's migration to Infisical demonstrates how a technically driven, engineering-centric approach to managing secrets across multiple cloud platforms delivers significants benefits. For tackling similar challenges, using Infisical is a practical way to work more efficiently while keeping security strong.
Hugging Face's migration to Infisical demonstrates how a technically driven, engineering-centric approach to managing secrets across multiple cloud platforms delivers significant benefits. For tackling similar challenges, using Infisical is a practical way to work more efficiently while keeping security strong.


Hugging Face's migration to Infisical demonstrates how a technically driven, engineering-centric approach to managing secrets across multiple cloud platforms delivers significants benefits. For tackling similar challenges, using Infisical is a practical way to work more efficiently while keeping security strong.

When the secure path is made the easiest path, teams can focus on building innovative products instead of of worrying about managing secrets.
Member

Suggested change
When the secure path is made the easiest path, teams can focus on building innovative products instead of of worrying about managing secrets.
When the secure path is made the easiest path, teams can focus on building innovative products instead of worrying about managing secrets.
