Race condition in connection manager - handshake completion not properly synchronized #1654

Open
@sanity


Problem

The test test_gw_to_peer_outbound_conn_forwarded was failing in CI because of a race condition: connection establishment events are not properly synchronized. The test needed a 500ms delay to work around the timing issue.

Root Cause

  1. Non-atomic connection management: add_connection() may return before the connection is fully available for forwarding operations
  2. Asynchronous event handling without proper synchronization: The handshake handler processes events sequentially but does not guarantee their ordering or completion
  3. Event emission timing: Events may be emitted before the underlying operations are fully complete

Current Workaround

A 500ms delay was added to the test between establishing the peer connection and the joiner connection. This is not acceptable for a production system focused on efficiency.
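
As a rough illustration of what could replace the sleep, the test could wait on an explicit completion signal. This is a minimal sketch using only the standard library. `HandshakeEvent`, `establish_peer`, and `wait_for_peer` are hypothetical names made up for this example and are not the actual Freenet handshake API. The real code is async, so the equivalent would presumably use an async channel rather than `std::sync::mpsc`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical completion event; not the real Freenet event type.
#[derive(Debug, PartialEq, Eq)]
pub enum HandshakeEvent {
    Established { peer: u64 },
}

/// Simulates establishing a peer connection on a background thread.
/// The event is sent only after all setup work has finished.
pub fn establish_peer(
    peer: u64,
    events: mpsc::Sender<HandshakeEvent>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Handshake work and connection registration would happen here.
        // Emit the event last, so receivers can rely on the setup being done.
        let _ = events.send(HandshakeEvent::Established { peer });
    })
}

/// Waits for the completion event instead of sleeping for a fixed time.
/// The deadline only bounds the failure case; the success path returns
/// as soon as the event arrives.
pub fn wait_for_peer(
    events: &mpsc::Receiver<HandshakeEvent>,
    peer: u64,
    deadline: Duration,
) -> bool {
    matches!(
        events.recv_timeout(deadline),
        Ok(HandshakeEvent::Established { peer: p }) if p == peer
    )
}
```

With this shape, the test would call `wait_for_peer` before starting the joiner connection, so it proceeds as soon as the peer connection is reported complete rather than after a fixed 500ms.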

Expected Behavior

  • Connection establishment should be atomic and deterministic
  • Once add_connection() returns, the connection should be immediately available for forwarding
  • No time-based delays should be required for proper operation
  • Connection establishment latency should be minimal and predictable
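
The second expectation above can be sketched as follows: if registration and lookup share one lock, a connection is visible to forwarding as soon as `add_connection()` returns. The `ConnectionManager` struct, its field, the signatures, and the `forwarding_target` helper are assumptions for illustration only and do not reflect the actual contents of connection_manager.rs.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical peer identifier used only in this sketch.
pub type PeerId = u64;

/// Hypothetical stand-in for the real connection manager.
#[derive(Default)]
pub struct ConnectionManager {
    open: RwLock<HashMap<PeerId, String>>,
}

impl ConnectionManager {
    /// Inserts the connection while holding the write lock.
    /// When this returns true, every later reader observes the entry.
    /// Returns false if the peer was already registered.
    pub fn add_connection(&self, peer: PeerId, addr: &str) -> bool {
        let mut open = self.open.write().unwrap();
        if open.contains_key(&peer) {
            return false;
        }
        open.insert(peer, addr.to_string());
        true
    }

    /// Picks a forwarding target other than `exclude`, if any exists.
    /// The lowest peer id is chosen only to keep the sketch deterministic.
    pub fn forwarding_target(&self, exclude: PeerId) -> Option<PeerId> {
        let open = self.open.read().unwrap();
        open.keys().copied().filter(|p| *p != exclude).min()
    }
}
```

The design point is that there is no window between "connection added" and "connection usable for forwarding": both are defined by the same map update under the same lock.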

Proposed Solutions

  1. Synchronous connection management: Make add_connection() atomic and blocking until the connection is fully ready
  2. Event completion signals: Only emit events when operations are fully complete
  3. Proper async coordination: Use async synchronization primitives (channels, barriers, etc.) instead of time delays
  4. Connection state tracking: Implement proper state machine for connection lifecycle
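
Solutions 3 and 4 could be combined roughly as below: an explicit lifecycle state per connection, plus a condition variable so callers block until the state is Ready instead of sleeping. All type and method names here (`ConnState`, `ConnectionTable`, `transition`, `wait_ready`) are hypothetical, and the sketch uses blocking std primitives. An async implementation would likely use an async notification primitive instead.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical lifecycle states; not the actual Freenet types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConnState {
    Handshaking,
    Ready,
    Closed,
}

/// Hypothetical peer identifier used only in this sketch.
pub type PeerId = u64;

#[derive(Default)]
pub struct ConnectionTable {
    states: Mutex<HashMap<PeerId, ConnState>>,
    changed: Condvar,
}

impl ConnectionTable {
    /// Applies a transition if it is legal and wakes all waiters.
    /// Returns false, leaving the state unchanged, if it is not allowed.
    pub fn transition(&self, peer: PeerId, next: ConnState) -> bool {
        let mut states = self.states.lock().unwrap();
        let allowed = matches!(
            (states.get(&peer).copied(), next),
            (None, ConnState::Handshaking)
                | (Some(ConnState::Handshaking), ConnState::Ready)
                | (Some(ConnState::Handshaking), ConnState::Closed)
                | (Some(ConnState::Ready), ConnState::Closed)
        );
        if allowed {
            states.insert(peer, next);
            self.changed.notify_all();
        }
        allowed
    }

    /// Blocks until `peer` is Ready. Returns false if the connection
    /// was closed or the timeout elapsed first.
    pub fn wait_ready(&self, peer: PeerId, timeout: Duration) -> bool {
        let guard = self.states.lock().unwrap();
        let (guard, _timed_out) = self
            .changed
            .wait_timeout_while(guard, timeout, |states| {
                !matches!(
                    states.get(&peer).copied(),
                    Some(ConnState::Ready) | Some(ConnState::Closed)
                )
            })
            .unwrap();
        guard.get(&peer).copied() == Some(ConnState::Ready)
    }
}

/// Usage example: a worker completes the handshake while the caller waits.
pub fn demo() -> bool {
    let table = Arc::new(ConnectionTable::default());
    table.transition(1, ConnState::Handshaking);
    let worker = {
        let table = Arc::clone(&table);
        thread::spawn(move || table.transition(1, ConnState::Ready))
    };
    let ready = table.wait_ready(1, Duration::from_secs(5));
    worker.join().unwrap() && ready
}
```

Here the timeout is a safety bound for the failure case only. In the normal case `wait_ready` returns as soon as the Ready transition is recorded.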

Impact

  • Performance: 500ms delays are unacceptable for a distributed system
  • Reliability: Race conditions can cause intermittent failures
  • Scalability: Poor connection management affects network topology building
  • Developer experience: Flaky tests waste CI resources and developer time

Files Involved

  • crates/core/src/node/network_bridge/handshake.rs (test with workaround)
  • crates/core/src/ring/connection_manager.rs (connection management)
  • crates/core/src/operations/connect.rs (forwarding logic)

This issue should be prioritized as it affects core networking functionality and violates Freenet's efficiency design goals.
