eBPF Redirection
FSM ships with eBPF support and gives users the option to use eBPF instead of the default iptables.
The minimum kernel version is 5.4.
This guide shows how to start using this functionality and take advantage of the benefits of eBPF. To jump straight into a quick start, refer to the eBPF setup quickstart guide.
For a more detailed comparison between iptables and eBPF, refer to Traffic Redirection.
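The kernel requirement above can be checked on each node before enabling eBPF mode. The following is a minimal sketch, not part of FSM; the helper names are hypothetical and the only value taken from this guide is the 5.4 minimum.

```python
import platform
import re
from typing import Tuple

MIN_KERNEL = (5, 4)  # minimum kernel version stated in this guide


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-1034-aws'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_ebpf_mode(release: str) -> bool:
    """Return True if the release is at least MIN_KERNEL."""
    return parse_kernel_release(release) >= MIN_KERNEL


def current_kernel_ok() -> bool:
    """Check the kernel of the machine this script runs on."""
    return kernel_supports_ebpf_mode(platform.release())
```

Calling `current_kernel_ok()` on a node reports whether its running kernel meets the minimum; it checks only the version number, not which BPF features the kernel was built with.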
Architecture
To provide eBPF features, Flomesh Service Mesh (FSM) runs two components on each node: fsm-cni, a CNI implementation that is compatible with mainstream CNI plugins, and fsm-interceptor.
When kubelet creates a pod on a node, it calls the CNI interface through the container runtime CRI to create the pod’s network namespace. After the pod’s network namespace is created, fsm-cni calls the interface of fsm-interceptor to load the BPF program and attach it to the hook point. In addition, fsm-interceptor also maintains pod information in eBPF Maps.
Implementation Principles
This section introduces the implementation principles of the two features that eBPF brings: traffic interception and network communication acceleration. Note that many processing details are omitted here.
Traffic interception
Outbound traffic
The figure below shows the interception of outbound traffic. A BPF program is attached to the socket operation connect. The program determines whether the current pod is managed by the service mesh, that is, whether it has a sidecar injected. If so, it modifies the destination address to 127.0.0.1 and the destination port to the sidecar’s outbound port 15003. Modifying the destination alone is not enough: the original destination address and port are also saved in a map, using the socket’s cookie as the key.
After the connection with the sidecar is established, a program attached to the sock_ops hook point saves the original destination in another map, using local address + port and remote address + port as the key. When the sidecar later accesses the target application, it obtains the original destination through the getsockopt operation on the socket. An eBPF program is also attached to getsockopt; it retrieves the original destination address from the map and returns it.
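The bookkeeping across the three hooks can be illustrated with a small user-space model. This is a sketch in Python, not the actual BPF programs: the class and method names are hypothetical, plain dictionaries stand in for eBPF maps, and the way the address-pair key is oriented for the sidecar's lookup is an assumption of this model.

```python
from typing import Dict, Optional, Set, Tuple

Addr = Tuple[str, int]    # (ip, port)
Pair = Tuple[Addr, Addr]  # (local, remote)

SIDECAR_ADDR = "127.0.0.1"
SIDECAR_OUTBOUND_PORT = 15003  # port as given in the text above


class OutboundInterceptor:
    """Toy model of the connect, sock_ops and getsockopt programs."""

    def __init__(self, managed_pods: Set[str]) -> None:
        self.managed_pods = managed_pods          # pods with a sidecar injected
        self.dst_by_cookie: Dict[int, Addr] = {}  # first map: socket cookie -> original dst
        self.dst_by_pair: Dict[Pair, Addr] = {}   # second map: address pair -> original dst

    def on_connect(self, pod: str, cookie: int, dst: Addr) -> Addr:
        """connect hook: redirect to the sidecar and remember the original destination."""
        if pod not in self.managed_pods:
            return dst  # unmanaged pod: leave the destination untouched
        self.dst_by_cookie[cookie] = dst
        return (SIDECAR_ADDR, SIDECAR_OUTBOUND_PORT)

    def on_established(self, cookie: int, local: Addr, remote: Addr) -> None:
        """sock_ops hook: re-key the original destination by the address pair."""
        dst = self.dst_by_cookie.get(cookie)
        if dst is not None:
            self.dst_by_pair[(local, remote)] = dst

    def on_getsockopt(self, local: Addr, remote: Addr) -> Optional[Addr]:
        """getsockopt hook: return the original destination to the sidecar.

        Assumption of this sketch: the sidecar's accepted socket sees the
        pair mirrored, so the lookup swaps local and remote.
        """
        return self.dst_by_pair.get((remote, local))


interceptor = OutboundInterceptor({"curl"})
redirected = interceptor.on_connect("curl", cookie=42, dst=("10.42.1.5", 8080))
interceptor.on_established(42, local=("127.0.0.1", 40000), remote=redirected)
original = interceptor.on_getsockopt(local=redirected, remote=("127.0.0.1", 40000))
```

The cookie is used first because it is the only stable identifier at connect time, before the socket has a local port; the address pair takes over once the connection is established.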
Inbound traffic
For the interception of inbound traffic, the traffic originally intended for the application port is forwarded to the sidecar’s inbound port 15003. There are two cases:
- In the first case, the requester and the service are located on the same node. After the connect operation of the requester’s sidecar is intercepted, the destination port is changed to 15003.
- In the second case, the requester and the service are located on different nodes. When the handshake packet reaches the service’s network namespace, it is intercepted by the BPF program attached to the tc (traffic control) ingress, and the port is modified to 15003, achieving a functionality similar to DNAT.
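The two cases differ in where the rewrite happens, not in its result. The sketch below models both hooks as plain Python functions; the function names and parameters are hypothetical, and details such as how later packets of the connection are handled are deliberately left out.

```python
from typing import Set, Tuple

Addr = Tuple[str, int]  # (ip, port)

SIDECAR_INBOUND_PORT = 15003  # inbound port as given in the text above


def connect_hook(dst: Addr, local_managed_pod_ips: Set[str]) -> Addr:
    """Case 1: the target pod is on the same node, so rewrite at connect time."""
    ip, _port = dst
    if ip in local_managed_pod_ips:
        return (ip, SIDECAR_INBOUND_PORT)
    return dst


def tc_ingress_hook(dst: Addr, pod_ip: str, pod_managed: bool) -> Addr:
    """Case 2: the packet arrived from another node at the pod's tc ingress."""
    ip, _port = dst
    if pod_managed and ip == pod_ip:
        return (ip, SIDECAR_INBOUND_PORT)  # DNAT-like port rewrite
    return dst


same_node = connect_hook(("10.42.1.5", 8080), {"10.42.1.5"})
cross_node = tc_ingress_hook(("10.42.2.7", 8080), pod_ip="10.42.2.7", pod_managed=True)
```

In both cases only the port changes; the destination address remains the pod's own IP.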
Network communication acceleration
In Kubernetes networks, packets unavoidably pass through the kernel network protocol stack multiple times. eBPF accelerates network communication by bypassing the unnecessary passes and exchanging data directly between two sockets that are peers.
The figure in the traffic interception section shows the sending and receiving paths of messages. When the program attached to sock_ops discovers that a connection has been successfully established, it saves the socket in a map, using local address + port and remote address + port as the key. Because the two sockets are peers, their local and remote information mirror each other, so when a socket sends a message, it can look up the peer socket in the map directly.
This solution also applies to communication between two pods on the same node.
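The peer lookup described above comes down to swapping the two halves of the key. Here is a minimal Python model of that idea, assuming a dictionary in place of the eBPF socket map; the `Sock` and `SockMap` names are hypothetical, and delivery is modelled as appending to the peer's inbox.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Addr = Tuple[str, int]   # (ip, port)
Key = Tuple[Addr, Addr]  # (local, remote)


@dataclass
class Sock:
    local: Addr
    remote: Addr
    inbox: List[bytes] = field(default_factory=list)

    def key(self) -> Key:
        return (self.local, self.remote)

    def peer_key(self) -> Key:
        # The peer sees the same connection with local and remote swapped.
        return (self.remote, self.local)


class SockMap:
    """Toy stand-in for the map filled by the sock_ops program."""

    def __init__(self) -> None:
        self._socks: Dict[Key, Sock] = {}

    def on_established(self, sock: Sock) -> None:
        self._socks[sock.key()] = sock

    def send(self, sender: Sock, data: bytes) -> bool:
        """Deliver directly to the peer if it is in the map.

        Returns False when no peer is known, in which case the message
        would take the normal path through the network stack.
        """
        peer = self._socks.get(sender.peer_key())
        if peer is None:
            return False
        peer.inbox.append(data)
        return True


app = Sock(local=("127.0.0.1", 40000), remote=("127.0.0.1", 15003))
sidecar = Sock(local=("127.0.0.1", 15003), remote=("127.0.0.1", 40000))
sock_map = SockMap()
sock_map.on_established(app)
sock_map.on_established(sidecar)
delivered = sock_map.send(app, b"GET / HTTP/1.1")
```

A socket whose peer is not in the map simply gets `False` back, which in this model stands for falling back to the regular path.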
Prerequisites
- Ubuntu 20.04
- Kernel 5.15.0-1034
- 3 VMs with 2 CPU cores and 4 GB of memory each: master, node1, node2
Install CNI Plugin
Execute the following command on all nodes to download the CNI plugin.
Master Node
Get the IP address of the master node. (Your machine IP might be different)
The Kubernetes cluster uses the k3s distribution. When installing the cluster, you need to disable the Flannel integrated into k3s and use an independently installed Flannel for validation. This is because k3s doesn’t follow the Flannel directory structure /opt/cni/bin and instead stores its CNI bin directory at /var/lib/rancher/k3s/data/xxx/bin, where xxx is some randomly generated text.
Install Flannel. Note that the default Pod CIDR of Flannel is 10.244.0.0/16, and we will modify it to k3s’s default 10.42.0.0/16.
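The CIDR change amounts to a text substitution in the Flannel manifest before it is applied. The following Python sketch shows one way to do that substitution; it is not the command used in this guide, the function name is hypothetical, and the sample fragment is only an illustration of a Flannel network configuration.

```python
import ipaddress

FLANNEL_DEFAULT_CIDR = "10.244.0.0/16"
K3S_DEFAULT_CIDR = "10.42.0.0/16"


def set_pod_cidr(manifest: str,
                 new_cidr: str = K3S_DEFAULT_CIDR,
                 old_cidr: str = FLANNEL_DEFAULT_CIDR) -> str:
    """Return the manifest text with the Pod CIDR replaced."""
    ipaddress.ip_network(new_cidr)  # raises ValueError if the CIDR is malformed
    if old_cidr not in manifest:
        raise ValueError(f"{old_cidr} not found in manifest")
    return manifest.replace(old_cidr, new_cidr)


# Illustrative fragment only; a real manifest contains much more.
sample = '{"Network": "10.244.0.0/16", "Backend": {"Type": "vxlan"}}'
patched = set_pod_cidr(sample)
```

Failing loudly when the old CIDR is absent avoids silently applying an unmodified manifest.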
Get the access token of the API server for initializing worker nodes.
Worker Node
Use the IP address of the master node and the token obtained earlier to initialize the node.
Download FSM
CLI
Install FSM
Deploy Sample Application
Testing
During testing, you can inspect the debug logs of BPF program execution in the kernel tracing logs on the worker nodes using the following command. To avoid interference from sidecar communication with the control plane, first obtain the IP address of the control plane.
Execute the following command on both worker nodes.
Execute the following command on both worker nodes.
You should receive results similar to the following, and the kernel tracing logs should also output the debug logs of the BPF program accordingly (the content is quite long, so it will not be shown here).