Is your feature request related to a problem? Please describe
For a long-lived TCP connection, if the node that received the TCP SYN packet goes down while the connection is still open, traffic gets routed to a different node but is not forwarded to the backend pod. This is a problem because the pod is still around to serve the traffic, yet it cannot do so since the node that held the connection went down.
Describe the solution you'd like
The new node that handles the traffic should successfully route it to the backend pod.
Additional context
- There is a k8s cluster with 2 nodes
- The service has DSR and maglev enabled:
apiVersion: v1
kind: Service
metadata:
  annotations:
    kube-router.io/service.dsr: "tunnel"
    kube-router.io/service.scheduler: "mh"
    kube-router.io/service.schedflags: "flag-1,flag-2"
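For reference, the same annotations can also be applied to an existing Service with kubectl annotate (just an equivalent sketch; the service name is taken from the kubectl output below, and --overwrite is needed if the annotations are already set):
kubectl annotate service debian-server-lb \
  kube-router.io/service.dsr="tunnel" \
  kube-router.io/service.scheduler="mh" \
  kube-router.io/service.schedflags="flag-1,flag-2"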
- There are 3 pods behind this service. All the pods are running on eqx-sjc-kubenode1-staging.
root@gce-del-km-staging-anupam:~/anupam/manifests $ kubectl get svc,endpoints
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP     PORT(S)    AGE
service/debian-server-lb   ClusterIP   192.168.97.188   199.27.151.10   8099/TCP   5h34m

NAME                         ENDPOINTS                                          AGE
endpoints/debian-server-lb   10.36.0.84:8099,10.36.0.85:8099,10.36.0.86:8099    5h34m

root@gce-del-km-staging-anupam:~/anupam/manifests $ kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE   IP           NODE
debian-server-8b5467777-2shpz   1/1     Running   0          49m   10.36.0.85   eqx-sjc-kubenode1-staging
debian-server-8b5467777-hbw29   1/1     Running   0          49m   10.36.0.86   eqx-sjc-kubenode1-staging
debian-server-8b5467777-pv9sr   1/1     Running   0          49m   10.36.0.84   eqx-sjc-kubenode1-staging
- IPVS entries are successfully applied by kube-router
root@eqx-sjc-kubenode1-staging:~ $ ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.97.188:8099 mh (mh-fallback,mh-port)
-> 10.36.0.84:8099 Masq 1 0 0
-> 10.36.0.85:8099 Masq 1 0 0
-> 10.36.0.86:8099 Masq 1 0 0
FWM 3754 mh (mh-fallback,mh-port)
-> 10.36.0.84:8099 Tunnel 1 0 0
-> 10.36.0.85:8099 Tunnel 1 0 0
-> 10.36.0.86:8099 Tunnel 1 0 0
root@tlx-dal-kubenode1-staging:~ $ ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.97.188:8099 mh (mh-fallback,mh-port)
-> 10.36.0.84:8099 Masq 1 0 0
-> 10.36.0.85:8099 Masq 1 0 0
-> 10.36.0.86:8099 Masq 1 0 0
FWM 3754 mh (mh-fallback,mh-port)
-> 10.36.0.84:8099 Tunnel 1 0 0
-> 10.36.0.85:8099 Tunnel 1 0 0
-> 10.36.0.86:8099 Tunnel 1 1 0
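In addition to the virtual server table above, ipvsadm can dump per-connection state, which may help show whether the node that takes over holds any entry for the existing session (a diagnostic sketch; no connection-table output from this cluster is shown here):
# dump the IPVS connection table on a node and filter for the service port
ipvsadm -L -n -c | grep 8099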
- In all 3 pods, start a TCP server on port 8099 using
nc -lv 0.0.0.0 8099
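For example (a sketch, assuming nc is available in the pod image; pod names are taken from the kubectl output above), one listener per pod in separate terminals:
kubectl exec -it debian-server-8b5467777-2shpz -- nc -lv 0.0.0.0 8099
kubectl exec -it debian-server-8b5467777-hbw29 -- nc -lv 0.0.0.0 8099
kubectl exec -it debian-server-8b5467777-pv9sr -- nc -lv 0.0.0.0 8099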
- Create a session from a client that is closer to tlx-dal-kubenode1-staging using
nc <service-ip> 8099
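For example, using the external IP from the kubectl output above (the client is assumed to have a route to that IP via tlx-dal-kubenode1-staging):
# connect to the service's external IP on the service port
nc 199.27.151.10 8099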
- A connection is established, with the NAT translation happening on tlx-dal-kubenode1-staging. Now stop the external IP advertisement by removing --advertise-external-ip from the kube-router arguments.
- Now send a message from the nc client. The traffic gets routed to eqx-sjc-kubenode1-staging (verified by running tcpdump), but the message is not visible on the backend pod server.
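One possible capture filter to confirm where the client's packets land (a sketch; run on each node, and adjust the interface if needed):
# external IP and port taken from the service output above
tcpdump -ni any host 199.27.151.10 and port 8099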
kube-router version: 2.5.0, built on 2025-02-14T20:20:43Z, go1.23.6
kubernetes version: 1.29.14
kernel version: 5.10.0-34-amd64