LiteLLM Got Compromised. Your Firewall Would Have Watched It Happen.
Most security conversations in early-stage companies eventually land on the same set of controls: lock down the VPC, tighten the firewall rules, make sure nobody can get in from the outside. It’s an intuitive mental model; you build a moat, you raise the drawbridge, the castle is safe.
The problem is that the model assumes the threat is always outside. The recent LiteLLM attack is a useful reminder of what happens when it isn’t.
On March 24, a threat actor group called TeamPCP published two malicious versions of LiteLLM to PyPI. They didn’t breach a perimeter. They didn’t scan for open ports. They compromised Trivy, a security scanner that LiteLLM’s CI/CD pipeline was pulling without a pinned version; the compromised scanner executed inside GitHub Actions (behind whatever firewall you have) and read the PYPI_PUBLISH token directly from the runner’s environment. With that token, they published versions 1.82.7 and 1.82.8 carrying a credential stealer that harvested SSH keys, .env files, AWS/GCP/Azure credentials, Kubernetes configs, and anything else it could find; encrypted the bundle; and sent it out to a domain they controlled.
Your firewall saw none of this. It was all legitimate traffic, from a trusted tool, inside your perimeter, doing what build pipelines do.
To be fair: a well-tuned egress filter monitoring for new domain destinations might have flagged that outbound POST to an unknown domain. But that’s catching the exfiltration after the credentials are already assembled and on their way out; and only if the attacker didn’t route through a trusted domain or protocol (the same threat group uses Internet Computer Protocol canisters as a C2 channel in other operations, specifically because there’s no domain to block). Your firewall, at best, is telling you what already happened. It’s like a security guard patting someone down after they’ve already walked out with the vault keys.
The attack worked because there were credentials worth stealing once it was inside. That’s the problem worth solving.
What the attacker was actually looking for
The malware wasn’t subtle about its targets. It specifically went after:
Long-lived cloud provider credentials (AWS access keys, GCP service account key files, Azure tokens)
.env files and shell history
SSH private keys
Kubernetes service account tokens (and if it found one, it attempted to read all cluster secrets and spin up privileged pods in kube-system)
This is a shopping list, not a credential dump. Whoever designed this payload understood that the most valuable thing sitting on a developer’s machine or a CI runner isn’t the code; it’s the keys that let you act as that developer’s identity inside their cloud environment, potentially for a long time.
The attack worked well wherever teams store credentials the way most teams do: as static keys in environment variables, .env files, and config files guarded by well-intentioned but imperfect practices.
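The shopping list above maps to a handful of well-known filesystem locations. A minimal sketch of the kind of audit you can run against your own CI runners to see what a stealer would find (the path list is illustrative, not exhaustive):

```python
import tempfile
from pathlib import Path

# Common homes for long-lived credential material -- the same categories
# of file the LiteLLM payload harvested. Illustrative, not exhaustive.
CANDIDATE_PATHS = [
    ".env",
    ".aws/credentials",
    ".config/gcloud/application_default_credentials.json",
    ".kube/config",
    ".ssh/id_rsa",
    ".ssh/id_ed25519",
    ".bash_history",
]

def find_static_credentials(home: Path) -> list[Path]:
    """Return any candidate credential files that exist under `home`."""
    return [home / p for p in CANDIDATE_PATHS if (home / p).is_file()]

# Demo against a throwaway directory standing in for a runner's $HOME:
home = Path(tempfile.mkdtemp())
(home / ".env").write_text("AWS_SECRET_ACCESS_KEY=AKIA-example")
assert find_static_credentials(home) == [home / ".env"]
```

If this returns anything on a CI runner, that is exactly the material a compromised dependency walks away with.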
The elephant in the room
Here’s what I keep coming back to: if a compromised package runs on your CI runner and reads your environment, what does it find?
If the answer is “a cloud provider key that never expires and has broad permissions,” the blast radius of this attack is severe. That key gets sent to an attacker-controlled server, and they can now act as your CI identity inside your GCP or AWS environment until someone notices, which in practice means until something breaks or a security review catches it.
If the answer is “a Workload Identity Federation token that was issued for this specific build job and expires in fifteen minutes,” the attacker got something worthless. They can’t replay it. They can’t use it from outside the expected execution context. By the time they try, it’s gone.
That’s not a philosophical difference. That’s the difference between a breach that requires a full credential rotation across your environment and an incident that generates a log entry.
Perimeter controls are a receipt, not a lock
Here’s the principle the LiteLLM attack illustrates clearly: network perimeter controls do not prevent data exfiltration. They tell you about it afterward; and only if you’re monitoring the right signals, and only if the attacker isn’t routing through a channel you’ve already decided to trust.
A firewall operates on traffic. By the time there’s suspicious traffic to inspect, the malware has already run, the credentials are already harvested, and the only question left is whether the outbound call succeeds. Block it and you’ve interrupted the delivery mechanism; the attacker still has what they came for, assembled in memory, and they’ll find another way out. Catch it in your logs and you’ve got an incident to respond to, a credential rotation to execute, and a postmortem to write.
That’s detection. It’s valuable. It’s not the same thing as not having a problem.
Short-lived credentials operate at a different layer entirely. They don’t try to catch the exfiltration; they ensure that even if credentials are exfiltrated, they are rendered pointless. If the credentials in your build environment expire in fifteen minutes and are cryptographically bound to a specific workload context, an attacker who successfully exfiltrates them has a bundle of keys that don’t open anything. The attack succeeded technically but was unable to accomplish its ultimate objective.
The distinction matters because it changes where you invest. A perimeter-first security posture keeps asking “how do we stop things from getting out?” An identity-first posture asks “if something does get out, how bad is it?” Those are different questions with different architectural answers; and for supply chain attacks specifically, the second question is the one worth optimizing for.
What a zero trust architecture actually changes here
Zero trust gets thrown around as a marketing term so often that it’s easy to forget it describes a concrete set of architectural decisions. In the context of this attack, the relevant decisions are:
No long-lived credentials in the build environment. If your CI/CD pipeline authenticates to cloud resources via Workload Identity Federation (on GCP) or OIDC-based role assumption (on AWS), there are no static keys to steal. The token is ephemeral, scoped to the specific workload, and useless outside the expected execution context. A credential stealer running in that environment reads something that expires before the attacker can use it.
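What the cloud side of that exchange evaluates before minting a short-lived credential can be sketched as a trust-policy check against the OIDC token's claims. The repository and policy values below are hypothetical; the issuer and claim names follow GitHub Actions' OIDC token format.

```python
# Hypothetical federation trust policy: only main-branch builds of one
# repository, presenting a token minted for this audience, may exchange
# their OIDC token for cloud credentials.
TRUST_POLICY = {
    "issuer": "https://token.actions.githubusercontent.com",
    "audience": "sts.amazonaws.com",
    "allowed_subjects": {"repo:acme/platform:ref:refs/heads/main"},
}

def may_assume_role(claims: dict, policy: dict) -> bool:
    return (
        claims.get("iss") == policy["issuer"]
        and claims.get("aud") == policy["audience"]
        and claims.get("sub") in policy["allowed_subjects"]
    )

# A token minted for the expected repo and branch passes:
good = {"iss": "https://token.actions.githubusercontent.com",
        "aud": "sts.amazonaws.com",
        "sub": "repo:acme/platform:ref:refs/heads/main"}
# A token from a fork (or replayed with the wrong audience) does not:
bad = dict(good, sub="repo:attacker/fork:ref:refs/heads/main")
assert may_assume_role(good, TRUST_POLICY)
assert not may_assume_role(bad, TRUST_POLICY)
```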
No secrets in environment variables or flat files. If your application and pipeline retrieve secrets at runtime from a secrets manager with fine-grained IAM controls, there’s nothing in the .env file. The malware scans the filesystem and finds nothing; the actual secrets never land on disk in a form that can be harvested.
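The pattern is simple: fetch at startup, hold in process memory, never write to disk or export into the environment. A minimal sketch, with an in-memory store standing in for a real backend like GCP Secret Manager or Vault (the client API shown is illustrative, not any vendor's SDK):

```python
class InMemoryStore:
    """Stand-in for a secrets manager; a real backend also enforces IAM here."""
    def __init__(self, secrets: dict[str, str]):
        self._secrets = secrets

    def access(self, name: str) -> str:
        return self._secrets[name]

class Database:
    def __init__(self, store: InMemoryStore):
        # Fetched at construction time, held only in process memory --
        # nothing lands in a .env file or os.environ for a scanner to read.
        self._password = store.access("db-password")

    def connection_string(self) -> str:
        return f"postgres://app:{self._password}@db:5432/app"

store = InMemoryStore({"db-password": "s3cret"})
db = Database(store)
assert "s3cret" in db.connection_string()
```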
Identity-aware perimeter controls. Even if a credential were somehow stolen and replayed, a well-configured VPC Service Controls perimeter evaluates not just the credential but the access context: is this request coming from an expected IP range, device, or service identity? A valid credential used from an attacker-controlled server hits a policy that says “this identity is not expected to authenticate from that context” and gets denied before it touches your data.
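The decision logic is worth seeing in miniature: a valid credential is necessary but not sufficient, because the request's source context must also match what the policy expects for that identity. The identity name and network ranges below are hypothetical.

```python
import ipaddress

# Hypothetical perimeter policy: this CI identity is only expected to
# authenticate from inside the VPC's private range.
PERIMETER = {
    "ci-builder@project.iam": {"allowed_networks": {"10.0.0.0/8"}},
}

def allowed(identity: str, source_ip: str, perimeter: dict) -> bool:
    rule = perimeter.get(identity)
    if rule is None:
        return False
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(net) for net in rule["allowed_networks"])

# The same valid identity, two different access contexts:
assert allowed("ci-builder@project.iam", "10.1.2.3", PERIMETER)        # expected context
assert not allowed("ci-builder@project.iam", "203.0.113.7", PERIMETER) # attacker replay
```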
Audit logs that are actually auditable. This one is underrated. When the malware queries the cloud metadata endpoint to pull instance credentials, that’s a logged event. When it attempts to create a privileged pod in kube-system, that’s a Kubernetes audit log event. If your logging architecture surfaces these signals in a place your team actually reviews, the attack becomes detectable; not just in retrospect but potentially while it’s still happening.
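Surfacing those two named signals from an event stream is a small amount of code. The event shapes below are illustrative, not any provider's real log schema:

```python
# Rules for the two signals described above: a metadata-endpoint query
# for instance credentials, and a pod creation in kube-system.
SUSPICIOUS = [
    lambda e: e.get("type") == "metadata_access"
              and e.get("path", "").startswith("/computeMetadata/"),
    lambda e: e.get("type") == "k8s_audit"
              and e.get("verb") == "create"
              and e.get("resource") == "pods"
              and e.get("namespace") == "kube-system",
]

def alerts(events: list[dict]) -> list[dict]:
    return [e for e in events if any(rule(e) for rule in SUSPICIOUS)]

events = [
    {"type": "metadata_access", "path": "/computeMetadata/v1/instance/service-accounts/"},
    {"type": "k8s_audit", "verb": "get", "resource": "pods", "namespace": "default"},
    {"type": "k8s_audit", "verb": "create", "resource": "pods", "namespace": "kube-system"},
]
assert len(alerts(events)) == 2  # routine pod read in `default` is ignored
```

The hard part isn't the filter; it's making sure the filtered output lands somewhere your team actually looks.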
None of these controls are exotic. They’re the standard output of building an identity-first infrastructure from the start; which is exactly the kind of decision that feels like overhead at a Series A and feels like foresight at an incident review.
The vulnerability of the supply chain
I’d be overstating things if I said a zero trust architecture would have prevented this attack entirely. The initial infection vector (an unpinned package pulled from a public registry into a CI runner) lives upstream of your identity layer. Short-lived credentials don’t stop malicious code from executing; they just ensure that what it finds isn’t useful.
The complementary control is supply chain hygiene: pinning dependency versions, verifying SHA checksums, and ideally routing your build environment’s package installs through a private artifact registry that lets you vet packages before they enter your pipeline. That’s a separate layer; but it’s also the layer that, combined with short-lived credentials and a secrets manager, turns this attack from a significant breach into a contained incident with a boring postmortem.
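The verification step is mechanical: hash the artifact you downloaded and compare it to the hash recorded when that version was vetted. A minimal sketch (pip supports this natively via its hash-checking mode with `--require-hashes`; the bytes below are stand-ins for a real wheel):

```python
import hashlib

def verify(artifact: bytes, expected_sha256: str) -> bool:
    """Reject any artifact whose contents differ from the vetted version."""
    return hashlib.sha256(artifact).hexdigest() == expected_sha256

# At vetting time you record the hash of the exact artifact you reviewed:
blob = b"fake wheel contents"
pinned = hashlib.sha256(blob).hexdigest()

# At install time, the same bytes pass and a swapped artifact fails --
# even if it carries the same version number, as the LiteLLM payload did.
assert verify(blob, pinned)
assert not verify(blob + b"tampered", pinned)
```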
It’s important not to turn supply chain hygiene into a process-heavy system. You don’t need to stand up an artifact repository and require approval of every library before use. What you do need is to verify that you trust the current version, stay on it, and lean on tools like Renovate or Dependabot to tell you when it’s time to upgrade.
The goal isn’t to build a system that can’t be attacked. The goal is to build a system where a successful attack has a small blast radius and a fast recovery time.
Why this matters for healthcare specifically
Healthcare teams carry an extra dimension of risk here: the credentials being targeted aren’t just cloud infrastructure keys. They’re also the identities that can reach PHI. An attacker who exfiltrates your GCP service account key and your application’s database credentials doesn’t just have your infrastructure; they potentially have patient data, which brings HIPAA breach notification timelines, OCR investigations, and the reputational weight that comes with all of that into the picture.
The compliance argument for short-lived credentials isn’t that they’re required by HIPAA (they’re not, explicitly). It’s that they dramatically reduce the probability that a compromised build environment turns into a notifiable event. That’s the kind of architectural property worth building in from the start; not because an auditor is going to check for it, but because the alternative is explaining to your customers why their data was exposed because a CI runner pulled a security scanner without a pinned version.
What to do if you’re not there yet
If your CI/CD pipeline is still using long-lived service account keys, start there. Migrating to Workload Identity Federation on GCP or OIDC-based IAM roles on AWS is a well-documented path; it’s not a multi-month project, and the operational overhead once it’s in place is close to zero.
If secrets still live in .env files or environment variables, the next step is moving them to a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) and updating your services to fetch at runtime. This is the kind of change that feels disruptive to schedule and unremarkable once it’s done; which is exactly how good infrastructure changes should feel.
The LiteLLM attack was sophisticated in how it got in. What it found when it got there didn’t have to be that valuable.
Bryan Knight is the founder of Ressik Labs, a health tech infrastructure and security consultancy. He works with early-stage healthcare companies building compliant data platforms.

