Understanding the security benefits of eBPF-based vs. traditional service meshes

Date: 2025-03-04

Inspired by How eBPF will solve Service Mesh - Goodbye Sidecars | Isovalent Blog

Service meshes are an infrastructure layer designed for the microservices era and typically provide the following features on top of microservices-oriented platforms such as Kubernetes:

(source: “How eBPF will solve Service Mesh - Goodbye Sidecars | Isovalent Blog”)

N.B. by microservices-oriented platforms such as Kubernetes, we really mean it’s the only such type of platform that is still relevant in 2025 and beyond. See LFS158x: Introduction to Kubernetes for a quick primer on Kubernetes and LFS144x: Introduction to Istio for a quick primer on the Istio service mesh.

There are currently 2 main categories of service meshes with widespread adoption across the cloud-native industry:

  1. Traditional (user-space) service meshes such as Istio and Linkerd. They are most commonly known for their sidecar-oriented architecture, though we’ll see that other forms of traditional service mesh exist, e.g. Istio ambient mesh
  2. eBPF-based (kernel-space) service meshes such as Cilium. They are implemented as eBPF programs which dynamically extend the kernel at runtime with cloud-native features such as L7-aware processing and workload identity. More importantly, eBPF-based service meshes are fully integrated into the Kubernetes networking fabric (CNI), providing advanced networking, security and observability features in a fully transparent, performant and elegant manner

Evolution of the service mesh

(source: “How eBPF will solve Service Mesh - Goodbye Sidecars | Isovalent Blog”)

The article “How eBPF will solve Service Mesh - Goodbye Sidecars” by Isovalent does an excellent job of explaining why and how eBPF-based service meshes outperform traditional ones by orders of magnitude, especially when deploying and operating workloads at scale (think 10,000 microservices and above). In this article, we’ll focus instead on the security aspects of traditional and eBPF-based service meshes: how the former compromise on cluster, workload and host-level security by design, while the latter naturally solve these problems.

Istio and Linkerd - your classic sidecar-oriented service mesh

Istio architecture

(source: Istio / Architecture)

Istio and Linkerd both employ the sidecar-oriented architecture as depicted above with the default installation. With the sidecar-oriented architecture, communications between a pair of microservices A and B are routed through sidecar proxies sitting in front of each respective microservice. The routing is performed at the infrastructure layer; in other words, it is transparent to the application containers, requiring no changes to the application code nor the application (container) image. The per-microservice sidecar proxies (data plane) combined with the control plane provide advanced L7 observability, security and traffic management features such as HTTP metrics, mTLS mutual authentication and in-transit encryption as well as traffic splitting, load-balancing and circuit breaking etc.

In their default configurations, both Istio and Linkerd achieve this infrastructure-level routing by injecting an init container with the NET_ADMIN and NET_RAW Linux kernel capabilities into each meshed Pod. The init container configures the iptables / nftables rules on the Node on which the Pod is scheduled, before the application container(s) and sidecar proxy are started.

This operation naturally requires elevated privileges on the host and is therefore insecure. Using Istio or Linkerd in this default mode requires each meshed namespace to enforce the privileged Pod Security Standard (PSS) or to disable Pod Security Admission (PSA) altogether. This means that once an attacker gains the privileges to deploy workloads to any meshed namespace, they can trivially carry out host-level attacks, for example by deploying privileged Pods that use the host network and host PID namespace. From there they can deploy crypto-miners or retain persistent access to the compromised cluster for subsequent malicious activities further up the attack chain.
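To make the link between the init container's capabilities and the PSS level concrete, here is a minimal Python sketch. It looks only at the capabilities control of the Pod Security Standards and ignores every other control (host namespaces, privilege escalation, seccomp and so on), so it is an illustration rather than a replacement for Pod Security Admission. The function name and the two example security contexts are hypothetical.

```python
# Capabilities that the baseline Pod Security Standard allows a container to add.
BASELINE_ALLOWED = frozenset({
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL",
    "MKNOD", "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID",
    "SYS_CHROOT",
})
# The restricted standard only allows adding NET_BIND_SERVICE back.
RESTRICTED_ALLOWED = frozenset({"NET_BIND_SERVICE"})


def strictest_level_for_capabilities(security_context: dict) -> str:
    """Return the strictest PSS level whose capabilities rule this context passes.

    Simplification (assumption): only the capabilities control is evaluated.
    """
    caps = security_context.get("capabilities", {})
    added = set(caps.get("add", []))
    dropped = set(caps.get("drop", []))
    if added - BASELINE_ALLOWED:
        # Adding anything outside the baseline list needs the privileged level.
        return "privileged"
    if "ALL" in dropped and added <= RESTRICTED_ALLOWED:
        return "restricted"
    return "baseline"


# Hypothetical security context modeled on the mesh init container.
mesh_init = {"capabilities": {"add": ["NET_ADMIN", "NET_RAW"], "drop": ["ALL"]}}
# Hypothetical application container that only needs to bind a low port.
app = {"capabilities": {"add": ["NET_BIND_SERVICE"], "drop": ["ALL"]}}

print(strictest_level_for_capabilities(mesh_init))  # privileged
print(strictest_level_for_capabilities(app))        # restricted
```

Because a namespace-level policy applies to every Pod in the namespace, the one init container that needs NET_ADMIN drags the whole meshed namespace down to the privileged level.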

These fundamental security design flaws were quickly uncovered in the early days of the sidecar-oriented service mesh, and CNI mode was soon introduced to address them.

Istio and Linkerd CNI mode - not as insecure but still vulnerable

The CNI mode of Istio and Linkerd retains the sidecar architecture but replaces the privileged init container with a DaemonSet which integrates with the underlying CNI. The DaemonSet runs the equivalent of the former privileged init container on each Node. It is responsible for configuring the iptables / nftables rules on the Node for each meshed Pod to route traffic through the sidecar proxy, before the sidecar proxy or application container starts. Since meshed Pods no longer require the privileged init container, meshed namespaces can lock down host and workload-level security by enforcing the restricted PSS, which disallows any Linux capability except NET_BIND_SERVICE and forbids running containers as root, among other measures.
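As a sketch of what enforcing the restricted PSS on a meshed namespace could look like, the Python snippet below builds a Namespace manifest carrying the standard Pod Security Admission labels. The helper name and the namespace name "payments" are hypothetical; the label keys are the ones Kubernetes defines for PSA.

```python
import json

PSS_LEVELS = ("privileged", "baseline", "restricted")


def namespace_with_pss(name: str, level: str) -> dict:
    """Build a Namespace manifest that asks PSA to enforce the given PSS level."""
    if level not in PSS_LEVELS:
        raise ValueError(f"unknown Pod Security Standard level: {level!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                # Reject Pods that violate the level.
                "pod-security.kubernetes.io/enforce": level,
                # Also surface violations as client warnings and audit annotations.
                "pod-security.kubernetes.io/warn": level,
                "pod-security.kubernetes.io/audit": level,
            },
        },
    }


# "payments" is a made-up namespace name used for illustration only.
manifest = namespace_with_pss("payments", "restricted")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied with kubectl as-is, since kubectl accepts JSON manifests as well as YAML.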

A limitation of this approach is that init containers defined by the application owner lose network connectivity, since their traffic is redirected to the sidecar proxy, which starts only after all init containers have run to completion. A common workaround is to run init containers with the UID reserved by the service mesh: 1337 for Istio and 2102 for Linkerd. By design, this special UID bypasses all routing rules imposed by the service mesh.

There’s just one problem - any container in any Pod can freely declare its own process UID, regardless of whether or not it is an init container. This means that Pods and containers assuming the service mesh reserved UID can bypass all traffic restrictions imposed by the service mesh and potentially intercept sensitive network traffic within the cluster. In fact, SAPwned is a real-world example: the SAP AI platform was powered by Istio and allowed its users to submit arbitrary Argo Workflows manifests, and white-hat hackers at Wiz Research managed to intercept sensitive traffic by assuming the Istio-reserved 1337 UID and eventually exfiltrate privileged credentials of SAP AI customers.
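One way to reason about this risk is to scan Pod manifests for containers that declare a mesh-reserved UID. The Python sketch below does this for a Pod expressed as a plain dictionary. It is an illustration under stated assumptions: the function name and the example Pod are hypothetical, only runAsUser in the manifest is inspected (a UID baked into the container image is not visible here), ephemeral containers are ignored, and the mesh's own sidecar would be reported as well, so a real admission policy would need a trustworthy way to tell it apart.

```python
# UIDs reserved by the meshes discussed above.
RESERVED_UIDS = {1337: "Istio", 2102: "Linkerd"}


def containers_with_reserved_uid(pod: dict) -> list[tuple[str, int]]:
    """List (container name, UID) pairs whose effective runAsUser is mesh-reserved."""
    spec = pod.get("spec", {})
    # Pod-level runAsUser is the default for containers that do not override it.
    pod_uid = spec.get("securityContext", {}).get("runAsUser")
    findings = []
    for key in ("initContainers", "containers"):
        for container in spec.get(key, []):
            uid = container.get("securityContext", {}).get("runAsUser", pod_uid)
            if uid in RESERVED_UIDS:
                findings.append((container["name"], uid))
    return findings


# Hypothetical Pod: the init container assumes the Istio UID, the app does not.
pod = {
    "spec": {
        "securityContext": {"runAsUser": 1000},
        "initContainers": [
            {"name": "fetch-config", "securityContext": {"runAsUser": 1337}},
        ],
        "containers": [{"name": "app"}],
    }
}

print(containers_with_reserved_uid(pod))  # [('fetch-config', 1337)]
```

Note that the check treats init containers and regular containers identically, which mirrors the point above: nothing in the Pod spec ties the reserved UID to init containers only.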

To address the growing security concerns around sidecar-based service mesh architectures, the Istio project announced Istio Ambient Mesh in 2022.

Istio Ambient Mesh - sidecar-free, but still with a greater attack surface than eBPF-based service mesh

Istio Ambient Mesh has popularized the concept of a sidecar-free service mesh since its debut in 2022. Instead of a per-Pod sidecar proxy, a lightweight node proxy (ztunnel) is deployed as a DaemonSet which runs one copy on each node, with optional Envoy-based waypoint proxies handling L7 processing. The per-node proxy is then responsible for routing requests from and to each meshed Pod, forwarding the request to the copy running on the destination node when the source and destination Pods reside on different nodes. With this innovative architecture, the Istio 1337 UID is no longer reserved, so workloads assuming this UID can no longer trivially escape the mesh, among many other improvements.

Nevertheless, the Istio components themselves require elevated host-level privileges and typically run in a dedicated namespace other than kube-system (istio-system by default). Furthermore, Istio must be deployed on top of an existing CNI such as Flannel, Calico or Cilium. The additional privileged Istio namespace and components on top of the existing CNI increase the attack surface of the Kubernetes cluster running Istio.

In contrast, eBPF-based service meshes embedded within the CNI, such as Cilium, require no additional components or privileged namespaces other than kube-system, thus minimizing the attack surface of the Kubernetes cluster.

Cilium CNI - unified Kubernetes networking fabric and eBPF service mesh

Cilium - Unified CNI and eBPF-based service mesh

(source: “How eBPF will solve Service Mesh - Goodbye Sidecars | Isovalent Blog”)

Cilium CNI includes advanced L7 networking, security and observability features out of the box, comparable to those offered by a traditional service mesh. The Cilium datapath is implemented as eBPF programs which run in the kernel for performance at scale. Furthermore, to ensure that eBPF programs are safe and cannot crash or hang the kernel, every eBPF program is compiled to bytecode and must pass the in-kernel verifier when it is loaded into the kernel at runtime. All of this means that you get the features expected of a service mesh for free, embedded right within the CNI itself.

Conclusion

Traditional service meshes running in user-space are not only inefficient but also pose some serious security risks. We saw that:

  1. With the classic sidecar architecture, Pods meshed by both Istio and Linkerd require the cluster administrator to forgo workload and host-level security entirely for all meshed namespaces, allowing attackers to escalate their privileges trivially once they get hold of privileges to deploy workloads in any meshed namespace
  2. With CNI mode enabled, meshed namespaces can lock down host-level security with the restricted PSS but attackers can still leverage the service mesh reserved UID to intercept sensitive traffic and perform data exfiltration
  3. With Istio ambient mesh, the service mesh reserved UID is no longer a concern but nevertheless additional components imply a greater attack surface compared to a fully embedded service mesh
  4. With Cilium CNI, the service mesh is embedded in the CNI implementation itself, so no additional privileged components or namespaces are needed and the datapath runs as verified eBPF programs in the kernel

I hope you enjoyed this article and stay tuned for updates ;-)
