Gateway keep-alive mechanism stops sending packets after ~30 seconds #1669

Open
@sanity

Summary

When connecting through the vega gateway (an EC2 instance), the keep-alive mechanism sends NoOp packets correctly for approximately 30 seconds, then stops sending them entirely. This causes connections to time out and reconnect in an endless loop. The technic gateway (local LAN) works correctly, with keep-alive packets flowing in both directions.

Environment

  • Freenet version: 0.1.10 (latest release with WASM polling fix)
  • Affected gateway: vega.locut.us (100.27.151.80) - EC2 instance
  • Working gateway: technic.locut.us (136.62.52.28:31337) - Local LAN
  • Client: Local freenet node behind NAT (Google Fiber residential connection)

Confirmed Facts

  1. NAT traversal is working correctly

    • Client successfully receives packets from vega gateway
    • No firewall/routing issues - packets flow in both directions
    • tcpdump confirms bidirectional UDP traffic
  2. Vega gateway behavior pattern

    • Sends keep-alive packets (NoOp with no receipts) every 10 seconds initially
    • Keep-alives arrive at: 06:10:00, 06:10:10, 06:10:20, 06:10:30
    • After ~30 seconds, keep-alive packets stop completely
    • Receipt acknowledgment packets (NoOp with receipts) continue working
    • Connection times out after 30 seconds of no keep-alives
    • Reconnection succeeds but pattern repeats
  3. Technic gateway behavior

    • Sends keep-alive packets consistently every 10 seconds
    • Never stops sending keep-alives
    • Connection remains stable indefinitely
  4. Packet types observed

    // From vega - only 4 keep-alives then stops:
    Received NoOp keep-alive packet (no receipts), remote: 100.27.151.80:31337
    
    // From vega - receipts continue working:
    Received NoOp receipt packet, remote: 100.27.151.80:31337, packet_id: 65, receipt_count: 1
    
    // From technic - continuous keep-alives:
    Received NoOp keep-alive packet (no receipts), remote: 136.62.52.28:31337
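
To tally the two packet types per gateway from a captured log, something like the following sketch could be used. The regular expressions are derived only from the three log lines quoted above; the `classify` helper and the `sample` list are illustrative names, and the log wording in other freenet versions may differ.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this issue (assumption:
# the wording is stable across the run being analysed).
KEEPALIVE = re.compile(
    r"Received NoOp keep-alive packet \(no receipts\), remote: (?P<remote>[\d.]+:\d+)"
)
RECEIPT = re.compile(
    r"Received NoOp receipt packet, remote: (?P<remote>[\d.]+:\d+)"
)

def classify(lines):
    """Count keep-alive and receipt NoOp packets per remote address."""
    counts = Counter()
    for line in lines:
        if m := KEEPALIVE.search(line):
            counts[(m["remote"], "keep-alive")] += 1
        elif m := RECEIPT.search(line):
            counts[(m["remote"], "receipt")] += 1
    return counts

# The three example lines from this issue.
sample = [
    "Received NoOp keep-alive packet (no receipts), remote: 100.27.151.80:31337",
    "Received NoOp receipt packet, remote: 100.27.151.80:31337, packet_id: 65, receipt_count: 1",
    "Received NoOp keep-alive packet (no receipts), remote: 136.62.52.28:31337",
]
counts = classify(sample)
for (remote, kind), n in sorted(counts.items()):
    print(remote, kind, n)
```

Run over a full log, a vega count that stays at 4 keep-alives per connection while the receipt count keeps growing would match the pattern described above.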
    

Hypothesis

The keep-alive task on the vega gateway appears to crash, panic, or otherwise stop executing after ~30 seconds. Since receipt packets continue to work, the connection itself remains functional - only the keep-alive timer task fails.

Possible causes:

  • Resource exhaustion on EC2 instance
  • Task panic that's being silently caught
  • Race condition in keep-alive timer
  • Different behavior between local and gateway mode
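
The "silently caught panic" possibility can be illustrated with a small analogy. This is a Python asyncio sketch, not freenet's Rust code, and whether the gateway spawns its keep-alive task in a comparable fire-and-forget way is an assumption. The names `keep_alive`, `receipts`, and `fail_after` are hypothetical. The sketch shows a timer task that dies after 4 ticks while an independent receipt path keeps running and nothing reports the failure.

```python
import asyncio

async def keep_alive(sent, interval, fail_after):
    """Timer task: records one tick per interval, then fails."""
    while True:
        await asyncio.sleep(interval)
        if len(sent) >= fail_after:
            # Stands in for a panic inside the timer task.
            raise RuntimeError("keep-alive task died")
        sent.append("keep-alive")

async def receipts(acks, interval, count):
    """Independent receipt path; unaffected by the timer task."""
    for _ in range(count):
        await asyncio.sleep(interval)
        acks.append("receipt")

async def main():
    sent, acks = [], []
    # Fire-and-forget: nothing awaits this task, so its exception never
    # surfaces in the connection logic.
    task = asyncio.create_task(keep_alive(sent, 0.01, fail_after=4))
    await receipts(acks, 0.01, count=10)
    # Only an explicit check reveals that the timer task has failed.
    died = task.done() and task.exception() is not None
    return sent, acks, died

sent, acks, died = asyncio.run(main())
print(len(sent), len(acks), died)
```

If something similar is happening on vega, checking the result of the spawned keep-alive task (or logging when it exits) on the gateway side should show whether it terminated.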

Reproduction Steps

  1. Start local freenet node: freenet network --ws-api-port 55509
  2. Monitor keep-alive packets by setting this in the node's environment: RUST_LOG=freenet_core::transport::keepalive=info
  3. Observe connection to vega gateway
  4. Note that keep-alives stop after ~30 seconds
  5. Connection times out and reconnects, pattern repeats

Debug Logs

Keep-alive packets from vega (note they stop after 30s):

2025-06-16T06:10:00.549737Z INFO Received NoOp keep-alive packet (no receipts), remote: 100.27.151.80:31337, packet_id: 17
2025-06-16T06:10:10.549347Z INFO Received NoOp keep-alive packet (no receipts), remote: 100.27.151.80:31337, packet_id: 34
2025-06-16T06:10:20.549744Z INFO Received NoOp keep-alive packet (no receipts), remote: 100.27.151.80:31337, packet_id: 50
2025-06-16T06:10:30.549961Z INFO Received NoOp keep-alive packet (no receipts), remote: 100.27.151.80:31337, packet_id: 69
[No more keep-alives received, connection times out at 06:11:00]
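
The timing in the excerpt above can be checked with a short calculation. This sketch uses only the four timestamps and packet IDs from the log, plus the 06:11:00 timeout, which the log gives only to the second. Reading `packet_id` as a per-connection counter that increases with every packet the gateway sends is an assumption.

```python
from datetime import datetime

# (timestamp, packet_id) pairs copied from the log excerpt; the trailing
# "Z" is dropped so datetime.fromisoformat accepts them on older Pythons.
keepalives = [
    ("2025-06-16T06:10:00.549737", 17),
    ("2025-06-16T06:10:10.549347", 34),
    ("2025-06-16T06:10:20.549744", 50),
    ("2025-06-16T06:10:30.549961", 69),
]
# The log only states the timeout to the second.
timeout_at = datetime.fromisoformat("2025-06-16T06:11:00")

times = [datetime.fromisoformat(ts) for ts, _ in keepalives]
ids = [pid for _, pid in keepalives]

# Seconds between consecutive keep-alives.
intervals = [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
# packet_id growth between consecutive keep-alives.
id_deltas = [b - a for a, b in zip(ids, ids[1:])]
# Silence between the last keep-alive and the timeout.
silence = (timeout_at - times[-1]).total_seconds()

print(intervals)          # [10.0, 10.0, 10.0]
print(id_deltas)          # [17, 16, 19]
print(round(silence, 1))  # 29.5
```

The keep-alives arrive on a steady 10-second cadence, and the roughly 29.5 seconds of silence before the timeout is consistent with a 30-second timeout measured from the last keep-alive. If the counter assumption holds, the ID deltas also suggest the gateway was sending other packets between keep-alives.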

Impact

This issue prevents stable connections through EC2-hosted gateways, affecting:

  • River chat application (gets stuck on "Subscribing to room...")
  • Any long-lived connections through cloud-hosted gateways
  • Network reliability for users connecting through affected gateways

Related Information
