
Resilience to TCP SYN Node Loss #1860

@anupamdialpad

Description


Is your feature request related to a problem? Please describe

For a long-lived TCP connection, if the node that received the TCP SYN packet goes down while the connection is still open, traffic gets routed to a different node, but that node does not forward it to the backend pod. This is a problem because the pod is still around to serve the traffic but cannot do so once the node that held the connection goes down.
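To make the failure mode concrete, here is a minimal, hypothetical simulation (not kube-router or IPVS code; node and pod names are illustrative): each node keeps its own connection table, and that table is not replicated, so the node that inherits the traffic has no record of where an established flow belongs.

```python
# Hypothetical sketch: per-node connection tables are not replicated,
# so a failover node has no entry for an already-established flow.

class Node:
    def __init__(self, name):
        self.name = name
        self.conn_table = {}  # (client, vip) -> backend pod, local to this node

    def handle_syn(self, flow, backend):
        # The node that sees the SYN records the flow -> backend mapping.
        self.conn_table[flow] = backend

    def forward(self, flow):
        # Later packets are forwarded using the local table only.
        return self.conn_table.get(flow)  # None = no state, packet goes nowhere

node1 = Node("tlx-dal-kubenode1-staging")
node2 = Node("eqx-sjc-kubenode1-staging")

flow = ("203.0.113.7:51000", "192.168.97.188:8099")
node1.handle_syn(flow, "10.36.0.85:8099")   # connection established via node1

# node1 stops receiving traffic; node2 now gets the packets.
print(node1.forward(flow))  # -> 10.36.0.85:8099
print(node2.forward(flow))  # -> None: node2 never saw the SYN
```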

Describe the solution you'd like

The new node that takes over the traffic should successfully route it to the backend pod.

Additional context

  1. There is a k8s cluster with 2 nodes.
  2. The service has DSR and Maglev (`mh`) enabled:

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kube-router.io/service.dsr: "tunnel"
    kube-router.io/service.scheduler: "mh"
    kube-router.io/service.schedflags: "flag-1,flag-2"
```
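Since the service uses the `mh` (Maglev) scheduler, every node that sees the same backend list should compute the same lookup table, which is what makes resilience to SYN-node loss plausible in the first place. A simplified, illustrative Maglev sketch (not the kernel's `ip_vs_mh` implementation; the hash function and table size are assumptions):

```python
import hashlib

def _h(data: str, seed: int) -> int:
    # Stable hash so every node computes identical values (stand-in choice).
    return int.from_bytes(hashlib.sha256(f"{seed}:{data}".encode()).digest()[:8], "big")

def build_maglev_table(backends, size=251):
    # size must be prime; each backend walks its own permutation of slots.
    offset = {b: _h(b, 0) % size for b in backends}
    skip = {b: _h(b, 1) % (size - 1) + 1 for b in backends}
    nxt = {b: 0 for b in backends}
    table = [None] * size
    filled = 0
    while filled < size:
        for b in backends:
            # Claim the next empty slot in this backend's permutation.
            while True:
                slot = (offset[b] + nxt[b] * skip[b]) % size
                nxt[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == size:
                break
    return table

def pick_backend(table, flow: str):
    return table[_h(flow, 2) % len(table)]

backends = ["10.36.0.84:8099", "10.36.0.85:8099", "10.36.0.86:8099"]
# Two nodes building the table independently agree slot-for-slot...
assert build_maglev_table(backends) == build_maglev_table(backends)
# ...so the same flow maps to the same pod no matter which node computes it.
```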
  3. There are 3 pods behind this service, all running on eqx-sjc-kubenode1-staging:
```shell
root@gce-del-km-staging-anupam:~/anupam/manifests $ kubectl get svc,endpoints
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP     PORT(S)    AGE
service/debian-server-lb   ClusterIP   192.168.97.188   199.27.151.10   8099/TCP   5h34m

NAME                         ENDPOINTS                                         AGE
endpoints/debian-server-lb   10.36.0.84:8099,10.36.0.85:8099,10.36.0.86:8099   5h34m

root@gce-del-km-staging-anupam:~/anupam/manifests $ kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP              NODE
debian-server-8b5467777-2shpz   1/1     Running   0          49m   10.36.0.85      eqx-sjc-kubenode1-staging
debian-server-8b5467777-hbw29   1/1     Running   0          49m   10.36.0.86      eqx-sjc-kubenode1-staging
debian-server-8b5467777-pv9sr   1/1     Running   0          49m   10.36.0.84      eqx-sjc-kubenode1-staging
```
  4. IPVS entries are successfully applied by kube-router on both nodes:
```shell
root@eqx-sjc-kubenode1-staging:~ $ ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.97.188:8099 mh (mh-fallback,mh-port)
  -> 10.36.0.84:8099              Masq    1      0          0
  -> 10.36.0.85:8099              Masq    1      0          0
  -> 10.36.0.86:8099              Masq    1      0          0
FWM  3754 mh (mh-fallback,mh-port)
  -> 10.36.0.84:8099              Tunnel  1      0          0
  -> 10.36.0.85:8099              Tunnel  1      0          0
  -> 10.36.0.86:8099              Tunnel  1      0          0
```

```shell
root@tlx-dal-kubenode1-staging:~ $ ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.97.188:8099 mh (mh-fallback,mh-port)
  -> 10.36.0.84:8099              Masq    1      0          0
  -> 10.36.0.85:8099              Masq    1      0          0
  -> 10.36.0.86:8099              Masq    1      0          0
FWM  3754 mh (mh-fallback,mh-port)
  -> 10.36.0.84:8099              Tunnel  1      0          0
  -> 10.36.0.85:8099              Tunnel  1      0          0
  -> 10.36.0.86:8099              Tunnel  1      1          0
```
  5. In each of the 3 pods, start a TCP server on port 8099 using `nc -lv 0.0.0.0 8099`.
  6. From a client that is closer to tlx-dal-kubenode1-staging, create a session using `nc <service-ip> 8099`.
  7. A connection is established, and the NAT translation happens on tlx-dal-kubenode1-staging. Now stop the external IP advertisement by removing `--advertise-external-ip` from the kube-router arguments.
  8. Send a message from the nc client. The traffic gets routed to eqx-sjc-kubenode1-staging (verified by running tcpdump), but the message is not visible on the backend pod's server.
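Roughly, what the request amounts to: when the failover node has no connection entry for an established flow, it could fall back to the scheduler's consistent hash instead of dropping the packet, since `mh` would steer the flow to the same pod the original node chose. A hypothetical sketch of that desired behavior (the function names and the CRC32 hash are illustrative stand-ins, not kube-router or IPVS code):

```python
import zlib

BACKENDS = ["10.36.0.84:8099", "10.36.0.85:8099", "10.36.0.86:8099"]

def schedule(flow: str) -> str:
    # Stand-in for the mh scheduler: a stable hash over the shared backend
    # list gives the same answer on every node (real mh uses a Maglev table).
    return BACKENDS[zlib.crc32(flow.encode()) % len(BACKENDS)]

def forward(conn_table: dict, flow: str) -> str:
    # Desired behavior: use existing connection state when present,
    # otherwise fall back to scheduling instead of dropping the packet.
    return conn_table.get(flow) or schedule(flow)

flow = "203.0.113.7:51000->192.168.97.188:8099"
backend = schedule(flow)

# The original node established the connection and has an entry:
assert forward({flow: backend}, flow) == backend
# The failover node has an empty table but still reaches the same pod:
assert forward({}, flow) == backend
```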


kube-router version: version 2.5.0, built on 2025-02-14T20:20:43Z, go1.23.6
kubernetes version: 1.29.14
kernel version: 5.10.0-34-amd64
