Circuit breaking for destinations within the mesh

Configuring circuit breaking for destinations within the mesh

This guide demonstrates how to configure circuit breaking for destinations that are a part of an FSM managed service mesh.

Prerequisites

  • Kubernetes cluster running Kubernetes v1.19.0 or greater.
  • Have FSM installed.
  • Have kubectl available to interact with the API server.
  • Have fsm CLI available for managing the service mesh.
  • FSM version >= v1.0.0.

Demo

The following demo shows a load-testing client fortio sending traffic to the httpbin service. We will see how applying circuit breakers for traffic to the httpbin service impacts the fortio client when the configured circuit breaking limits trip.

For simplicity, enable permissive traffic policy mode so that explicit SMI traffic access policies are not required for application connectivity within the mesh.

export FSM_NAMESPACE=fsm-system # Replace fsm-system with the namespace where FSM is installed
kubectl patch meshconfig fsm-mesh-config -n "$FSM_NAMESPACE" -p '{"spec":{"traffic":{"enablePermissiveTrafficPolicyMode":true}}}'  --type=merge

Deploy services

Deploy server service.

kubectl create namespace server
fsm namespace add server
kubectl apply -n server -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: fortio
  labels:
    app: fortio
    service: fortio
spec:
  ports:
  - port: 8080
    name: http-8080
  - port: 8078
    name: tcp-8078
  - port: 8079
    name: grpc-8079
  selector:
    app: fortio
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fortio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fortio
  template:
    metadata:
      labels:
        app: fortio
    spec:
      containers:
      - name: fortio
        image: fortio/fortio:latest_release
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 8078
          name: tcp
        - containerPort: 8079
          name: grpc
EOF

Deploy client service.

kubectl create namespace client
fsm namespace add client

kubectl apply -n client -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fortio-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fortio-client
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fortio-client
  template:
    metadata:
      labels:
        app: fortio-client
    spec:
      serviceAccountName: fortio-client
      containers:
      - name: fortio-client
        image: fortio/fortio:latest_release
        imagePullPolicy: Always
EOF

Test

Confirm the fortio-client is able to successfully make HTTP requests to the fortio-server service on port 8080. We call the service with 10 concurrent connections (-c 10) and send 1000 requests (-n 1000). Service is configured to return HTTP status code of 511 for 20% requests.

fortio_client=`kubectl get pod -n client -l app=fortio-client -o jsonpath='{.items[0].metadata.name}'`
kubectl exec "$fortio_client" -n client -c fortio-client -- fortio load -quiet -c 10 -n 1000 -qps 200 -p 99.99 http://fortio.server.svc.cluster.local:8080/echo?status=511:20

Returned result might look something like below:

Sockets used: 205 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.43.151:8080: 205
Code 200 : 804 (80.4 %)
Code 511 : 196 (19.6 %)
All done 1000 calls (plus 0 warmup) 2.845 ms avg, 199.8 qps

Next, apply a circuit breaker configuration using the UpstreamTrafficSetting resource for traffic directed to the fortio-server service.

Policy: Error Request Count Triggers Circuit Breaker

The error request count trigger threshold is set to errorAmountThreshold=100, the circuit breaker is triggered when the error request count reaches 100, returning 503 Service Unavailable!, the circuit breaker lasts for 10s

kubectl apply -f - <<EOF
apiVersion: policy.flomesh.io/v1alpha1
kind: UpstreamTrafficSetting
metadata:
  name: http-circuit-breaking
  namespace: server
spec:
  host: fortio.server.svc.cluster.local
  connectionSettings:
    http:
      circuitBreaking:
        statTimeWindow: 1m
        minRequestAmount: 200
        errorAmountThreshold: 100
        degradedTimeWindow: 10s
        degradedStatusCode: 503
        degradedResponseContent: 'Service Unavailable!'
EOF

Set 20% of the server’s responses to return error code 511. When the error request count reaches 100, the number of successful requests should be around 400.

kubectl exec "$fortio_client" -n client -c fortio-client -- fortio load -quiet -c 10 -n 1000 -qps 200 -p 99.99 http://fortio.server.svc.cluster.local:8080/echo\?status\=511:20

From the results, it meets the expectations. After circuit breaker, the requests return the 503 error code.

Sockets used: 570 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.43.151:8080: 570
Code 200 : 430 (43.0 %)
Code 503 : 470 (47.0 %)
Code 511 : 100 (10.0 %)
All done 1000 calls (plus 0 warmup) 3.376 ms avg, 199.8 qps

Checking the sidecar logs, the error count reached 100 at the 530th request, triggering the circuit breaker.

2023-02-08 01:08:01.456 [INF] [circuit_breaker] total/slowAmount/errorAmount (open) server/fortio|8080 530 0 100

Policy: Error Rate Triggers Circuit Breaker

Here we change the trigger from error count to error rate: errorRatioThreshold=0.10, the circuit breaker is triggered when the error rate reaches 10%, it lasts for 10s and returns 503 Service Unavailable!. Note that the minimum number of requests is still 200.

kubectl apply -f - <<EOF
apiVersion: policy.flomesh.io/v1alpha1
kind: UpstreamTrafficSetting
metadata:
  name: http-circuit-breaking
  namespace: server
spec:
  host: fortio.server.svc.cluster.local
  connectionSettings:
    http:
      circuitBreaking:
        statTimeWindow: 1m
        minRequestAmount: 200
        errorRatioThreshold: 0.10
        degradedTimeWindow: 10s
        degradedStatusCode: 503
        degradedResponseContent: 'Service Unavailable!'
EOF

Set 20% of the server’s responses to return error code 511.

kubectl exec "$fortio_client" -n client -c fortio-client -- fortio load -quiet -c 10 -n 1000 -qps 200 -p 99.99 http://fortio.server.svc.cluster.local:8080/echo\?status\=511:20

From the output results, it can be seen that after 200 requests, the circuit breaker was triggered, and the number of circuit breaker requests was 800.

Sockets used: 836 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.43.151:8080: 836
Code 200 : 164 (16.4 %)
Code 503 : 800 (80.0 %)
Code 511 : 36 (3.6 %)
All done 1000 calls (plus 0 warmup) 3.605 ms avg, 199.8 qps

Checking the sidecar logs, after satisfying the minimum number of requests to trigger the circuit breaker (200), the error rate also reached the threshold, triggering the circuit breaker.

2023-02-08 01:19:25.874 [INF] [circuit_breaker] total/slowAmount/errorAmount (close) server/fortio|8080 200 0 36

Policy: Slow Call Request Count Triggers Circuit Breaker

Testing slow calls, add a 200ms delay to 20% of the requests.

kubectl exec "$fortio_client" -n client -c fortio-client -- fortio load -quiet -c 10 -n 1000 -qps 200 -p 50,78,79,80,81,82,90,95 http://fortio.server.svc.cluster.local:8080/echo\?delay\=200ms:20

A similar effect is achieved, with nearly 80% of requests taking less than 200ms.

# target 50% 0.000999031
# target 78% 0.0095
# target 79% 0.200175
# target 80% 0.200467
# target 81% 0.200759
# target 82% 0.20105
# target 90% 0.203385
# target 95% 0.204844

285103 max 0.000285103 sum 0.000285103
Sockets used: 10 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.43.151:8080: 10
Code 200 : 1000 (100.0 %)
All done 1000 calls (plus 0 warmup) 44.405 ms avg, 149.6 qps

Setting strategy:

  • Set the slow call request time threshold to 200ms
  • Set the number of slow call requests to 100
kubectl apply -f - <<EOF
apiVersion: policy.flomesh.io/v1alpha1
kind: UpstreamTrafficSetting
metadata:
  name: http-circuit-breaking
  namespace: server
spec:
  host: fortio.server.svc.cluster.local
  connectionSettings:
    http:
      circuitBreaking:
        statTimeWindow: 1m
        minRequestAmount: 200
        slowTimeThreshold: 200ms
        slowAmountThreshold: 100
        degradedTimeWindow: 10s
        degradedStatusCode: 503
        degradedResponseContent: 'Service Unavailable!'
EOF

Inject a 200ms delay for 20% of requests.

kubectl exec "$fortio_client" -n client -c fortio-client -- fortio load -quiet -c 10 -n 1000 -qps 200 -p 50,78,79,80,81,82,90,95 http://fortio.server.svc.cluster.local:8080/echo\?delay\=200ms:20

The number of successful requests is 504, with 20% taking 200ms, and the number of slow requests has reached the threshold to trigger a circuit breaker.

# target 50% 0.00246111
# target 78% 0.00393846
# target 79% 0.00398974
# target 80% 0.00409756
# target 81% 0.00421951
# target 82% 0.00434146
# target 90% 0.202764
# target 95% 0.220036

Sockets used: 496 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.43.151:8080: 496
Code 200 : 504 (50.4 %)
Code 503 : 496 (49.6 %)
All done 1000 calls (plus 0 warmup) 24.086 ms avg, 199.8 qps

Checking the logs of the sidecar, the number of slow calls reached 100 on the 504th request, triggering the circuit breaker.

2023-02-08 07:27:01.106 [INF] [circuit_breaker] total/slowAmount/errorAmount (open)  server/fortio|8080 504 100 0

Policy: Slow Call Rate Triggers Circuit Breaking

The slow call rate is set to 0.10, meaning that the circuit breaker will be triggered when it is expected that 10% of the requests within the statistical window will take more than 200ms.

kubectl apply -f - <<EOF
apiVersion: policy.flomesh.io/v1alpha1
kind: UpstreamTrafficSetting
metadata:
  name: http-circuit-breaking
  namespace: server
spec:
  host: fortio.server.svc.cluster.local
  connectionSettings:
    http:
      circuitBreaking:
        statTimeWindow: 1m
        minRequestAmount: 200
        slowTimeThreshold: 200ms
        slowRatioThreshold: 0.1
        degradedTimeWindow: 10s
        degradedStatusCode: 503
        degradedResponseContent: 'Service Unavailable!'
EOF

Add 200ms delay to 20% of requests.

kubectl exec "$fortio_client" -n client -c fortio-client -- fortio load -quiet -c 10 -n 1000 -qps 200 -p 50,78,79,80,81,82,90,95 http://fortio.server.svc.cluster.local:8080/echo\?delay\=200ms:20

From the output results, there are 202 successful requests which meets the minimum number of requests required for the configuration to take effect. Among them, 20% are slow calls, reaching the threshold for triggering circuit breaker.

# target 50% 0.00305539
# target 78% 0.00387172
# target 79% 0.00390087
# target 80% 0.00393003
# target 81% 0.00395918
# target 82% 0.00398834
# target 90% 0.00458915
# target 95% 0.00497674

Sockets used: 798 (for perfect keepalive, would be 10)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.43.43.151:8080: 798
Code 200 : 202 (20.2 %)
Code 503 : 798 (79.8 %)
All done 1000 calls (plus 0 warmup) 10.133 ms avg, 199.8 qps

Check the logs of the sidecar, at the 202nd request, the number of slow requests was 28, which triggered the circuit breaker.

2023-02-08 07:38:25.284 [INF] [circuit_breaker] total/slowAmount/errorAmount (open)  server/fortio|8080 202 28 0

Feedback

Was this page helpful?


Last modified June 18, 2024: fix workflow issue (c83135d)