What owning Vault for years actually teaches you

Standing up HashiCorp Vault is a good afternoon. There is a demo, a token comes back, everyone nods. Then you put it in the critical path of every deploy and every running service, and the afternoon becomes the next several years. That second part is the actual job, and it is the part nobody screenshots.

My team owned and operated self-hosted Vault on Kubernetes for a globally distributed platform of roughly 10,000 servers. The brief was simple to say and hard to do: machine identity without static secrets, and one trust model that held across CI/CD and every workload. Here is what those years taught me that the quickstart does not.

Static secrets are a habit, not a feature

Nobody decides to scatter long-lived credentials across a platform. It happens one reasonable exception at a time. A pipeline needs to talk to a registry, so someone drops in a token. A service needs a database, so someone adds an environment variable. Each one is defensible on its own, and together they are a map of everywhere an attacker would like to be.

The fix is not a better vault for the same habit. It is removing the reason the habit exists. We moved CI/CD pipelines and workloads onto Vault's Kubernetes auth method, so identity came from what the workload actually was (its Kubernetes service account), not from a string someone had pasted into it. When the credential is short-lived and tied to identity, there is nothing durable to leak, and the exceptions stop accumulating because there is nothing to except.
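For readers who have not seen the exchange, here is a minimal sketch of a Kubernetes auth login against Vault's HTTP API, using only the Python standard library. It is an illustration under assumptions, not any particular production client: the auth method is assumed to be mounted at the default `kubernetes` path, the role name and Vault address are placeholders, and `build_login_request` and `login` are hypothetical helper names.

```python
import json
import urllib.request

# Default location where Kubernetes projects the pod's service account token.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(vault_addr, role, jwt, mount="kubernetes"):
    """Build the POST request for Vault's Kubernetes auth login endpoint.

    `mount` is assumed to be the default mount path; adjust if yours differs.
    """
    body = json.dumps({"role": role, "jwt": jwt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/{mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def login(vault_addr, role, token_path=SA_TOKEN_PATH):
    """Exchange the pod's service account JWT for a short-lived Vault token."""
    with open(token_path, encoding="utf-8") as f:
        jwt = f.read().strip()
    req = build_login_request(vault_addr, role, jwt)
    with urllib.request.urlopen(req, timeout=5) as resp:
        auth = json.load(resp)["auth"]
    # The token expires after lease_duration seconds unless renewed.
    return auth["client_token"], auth["lease_duration"]
```

Inside a pod, something like `login("https://vault.example.internal:8200", "ci-deployer")` would return a token and its lifetime in seconds. The workload never stores anything a human handed it; the only input is the token Kubernetes already issued to its service account.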

RBAC is an organizational problem wearing a technical costume

Least-privilege reads like a configuration task. It is really a series of conversations. Every team believes its access is the load-bearing one, and most of the time the honest answer is that nobody has audited it in years. Writing the policy is the small part. Agreeing on who should be able to do what, and then holding that line as teams reorganize and services move, is the work.

We enforced RBAC and least-privilege as a default rather than a cleanup project, which mostly meant being the team that asked “why does this need that” early and often. It is less popular than handing out wildcards. It is also the difference between an access model and a pile of grants.
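The "why does this need that" question can be partly mechanized. What follows is a small sketch, not a description of our tooling: it assumes policy rules have already been parsed into a simplified list of path and capability pairs (Vault's own HCL and JSON policy formats are shaped differently), and `review_policy` and its three checks are hypothetical choices made for illustration.

```python
# Capabilities that change state or bypass normal restrictions.
WRITE_CAPABILITIES = {"create", "update", "patch", "delete", "sudo"}


def review_policy(rules):
    """Return (path, reason) pairs for rules that deserve a conversation.

    `rules` is an assumed, simplified shape:
    [{"path": "secret/data/app/*", "capabilities": ["read"]}, ...]
    """
    findings = []
    for rule in rules:
        path = rule["path"]
        caps = set(rule["capabilities"])
        if path == "*":
            findings.append((path, "matches every path"))
        if "sudo" in caps:
            findings.append((path, "grants sudo"))
        if path.endswith("*") and caps & WRITE_CAPABILITIES:
            findings.append((path, "wildcard path with write capabilities"))
    return findings


example_rules = [
    {"path": "secret/data/payments/*", "capabilities": ["read"]},
    {"path": "secret/data/*", "capabilities": ["create", "update", "delete"]},
    {"path": "*", "capabilities": ["read", "list"]},
]

for path, reason in review_policy(example_rules):
    print(f"{path}: {reason}")
```

A check like this only surfaces candidates. Whether a flagged grant is justified is still a conversation with the team that holds it.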

Identity is an operating discipline, not an integration

The trap with a secrets platform is treating it as a thing you integrate once. It is closer to a service you run, with all the unglamorous obligations that implies: upgrades that cannot drop the deploy path, failure modes you have rehearsed instead of discovered, and an audit trail that holds up when someone actually asks. The patterns we built outlived their original use, expanding across developer tooling and platform access control, because operating them well made them worth reaching for.

The payoff is boring, and that is the point

The win was not a clever architecture diagram. It was a single, auditable identity layer for pipelines and workloads that replaced credentials which used to live in a dozen places, and that we carried across years of platform evolution without it becoming the thing everyone routed around. Good identity infrastructure is invisible when it works. The measure of it is how rarely anyone has to think about it, and how confidently you can answer the question of who could do what, and when.

That is the part worth getting right. Everything before it is just the afternoon.