Extremely slow startup after restarting node during syncing from scratch #6802

Open
poszu opened this issue Mar 24, 2025 · 1 comment
@poszu
Contributor

poszu commented Mar 24, 2025

Description

The node starts very slowly when it is restarted while syncing from genesis. It hangs while warming up the in-memory ATX cache because it loads all ATXs since genesis into memory.

💡 The cache warmup code loads all ATXs starting from the last applied epoch. When syncing from genesis, that is epoch 0, because we don't sync and apply layers until ATX sync is completed.

💡 To fix it, the code could sync layers in epoch X immediately once all ATXs for epochs 0 - X are synced.

Steps to reproduce

  1. start a fresh node to start syncing from genesis
  2. wait until it syncs a few epochs (e.g. 15)
  3. stop it
  4. start it again

Actual Behavior

Node startup hangs on the ATX cache warmup

Expected Behavior

Node should warm up quickly and continue to sync

Environment

irrelevant

Additional Resources

none

@poszu poszu added the bug label Mar 24, 2025
@fasmat
Member

fasmat commented Mar 24, 2025

The problem doesn't only manifest when restarting a node that is syncing from genesis. The underlying problem is that layers aren't applied until the node considers itself "ATX synced". On restart, the warmup then loads all ATXs published in epochs since the epoch of the last applied layer (epoch 0 in the case of sync from genesis).

I believe the code that needs to be changed is this section here:

func (s *Syncer) processLayers(ctx context.Context) error {
	ctx = log.WithNewSessionID(ctx)
	if !s.ticker.CurrentLayer().After(types.GetEffectiveGenesis()) {
		return nil
	}
	if !s.ListenToATXGossip() {
		return errATXsNotSynced
	}

Instead of waiting until all ATXs are synced and only then processing all layers, the code could wait until ATXs for at least one more epoch than the epoch of s.getLastSyncedLayer() are synced, and then process the layers of that epoch.

Possibly other places in the code need to be adjusted as well.
