Add zebrad migrate-from-zcashd command #9472

Draft
str4d wants to merge 1 commit into main

Conversation

str4d
Contributor

@str4d str4d commented Apr 29, 2025

Motivation

If a user already has a zcashd node, they should be able to use its local block data to bootstrap a Zebra node, instead of requiring a fresh download.

Solution

This PR adds a command that parses the blkNNNNN.dat files from a zcashd datadir and replays their blocks into the Zebra state as if Zebra had just received them from the network.
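
For reference, zcashd block files use the Bitcoin Core on-disk layout: each blkNNNNN.dat file is a sequence of records, where a record is the 4-byte network magic, a 4-byte little-endian payload length, and the serialized block, with zero padding filling the unused tail of a file. Below is a minimal synchronous sketch of reading one such file under those assumptions; read_blk_file and expected_magic are illustrative names rather than code from this PR, which uses tokio's async File/BufReader and Zebra's Magic/Network types instead.

    use std::fs::File;
    use std::io::{self, BufReader, Read};

    /// Reads the raw block payloads out of a single blkNNNNN.dat file.
    ///
    /// Each record is: 4-byte network magic, 4-byte little-endian payload length,
    /// then the serialized block; partially filled files end in zero padding.
    /// `expected_magic` stands in for the per-network magic that the real command
    /// takes from Zebra's network parameters.
    fn read_blk_file(path: &str, expected_magic: [u8; 4]) -> io::Result<Vec<Vec<u8>>> {
        let mut reader = BufReader::new(File::open(path)?);
        let mut blocks = Vec::new();

        loop {
            let mut magic = [0u8; 4];
            match reader.read_exact(&mut magic) {
                Ok(()) => {}
                // Clean end of file: no more records.
                Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
                Err(e) => return Err(e),
            }
            // Zero bytes mark the unused, padded tail of the file.
            if magic == [0u8; 4] {
                break;
            }
            if magic != expected_magic {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "unexpected network magic in block file",
                ));
            }

            let mut len_bytes = [0u8; 4];
            reader.read_exact(&mut len_bytes)?;
            let len = u32::from_le_bytes(len_bytes) as usize;

            // Raw block bytes; the command itself deserializes these into a
            // zebra_chain `Block` with `ZcashDeserialize`.
            let mut block_bytes = vec![0u8; len];
            reader.read_exact(&mut block_bytes)?;
            blocks.push(block_bytes);
        }

        Ok(blocks)
    }

Checking the magic up front also catches pointing the command at a datadir for the wrong network.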

Tests

Currently WIP. Tested with my local zcashd node; it doesn't work yet because Zebra doesn't implement large parts of the historic Zcash protocol spec.

Specifications & References

Follow-up Work

Needs advice on how to alter Zebra to enable this. We need either:

  • some way to alter Zebra so we can leverage its most-work chain-following logic while skipping all validation for old blocks, in a way that also ensures it never persists orphan blocks (zcashd saves all blocks as-received, even if they are later orphaned) or invalid blocks (which can exist in the block files on disk if they were individually valid but contextually invalid) to the state; a rough sketch of this idea follows this list; or
  • a LevelDB dependency, which would enable Zebra to read the zcashd block index and directly find the main chain blocks, ignoring orphans.
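
To make the first option more concrete, here is a rough standalone sketch of most-work chain-following over blocks loaded from disk: index the loaded blocks by parent hash, walk forward from the genesis hash while accumulating work, and keep only the heaviest path, so orphaned side-chain blocks never reach the state. Hash, LoadedBlock, work, and best_chain are hypothetical stand-ins rather than Zebra types, and a real implementation would use Zebra's difficulty-to-work conversion.

    use std::collections::HashMap;

    /// Hypothetical stand-ins for the real Zebra types (illustration only).
    type Hash = [u8; 32];

    struct LoadedBlock {
        hash: Hash,
        prev_hash: Hash,
        /// Work implied by the header's difficulty target.
        work: u128,
    }

    /// Returns the chain with the most cumulative work, as hashes from genesis
    /// to tip, ignoring any block that doesn't connect back to genesis.
    fn best_chain(genesis: Hash, blocks: &[LoadedBlock]) -> Vec<Hash> {
        // Index the loaded blocks by their parent hash.
        let mut children: HashMap<Hash, Vec<&LoadedBlock>> = HashMap::new();
        for block in blocks {
            children.entry(block.prev_hash).or_default().push(block);
        }

        // Forward pass from genesis: cumulative work and parent links for every
        // block that connects to genesis; orphans with unknown parents are never visited.
        let mut total_work: HashMap<Hash, u128> = HashMap::from([(genesis, 0)]);
        let mut parent: HashMap<Hash, Hash> = HashMap::new();
        let mut queue = vec![genesis];
        let mut best_tip = (0u128, genesis);

        while let Some(hash) = queue.pop() {
            let work_so_far = total_work[&hash];
            if work_so_far >= best_tip.0 {
                best_tip = (work_so_far, hash);
            }
            for child in children.get(&hash).into_iter().flatten() {
                total_work.insert(child.hash, work_so_far + child.work);
                parent.insert(child.hash, hash);
                queue.push(child.hash);
            }
        }

        // Walk back from the heaviest tip to genesis to get the main-chain order.
        let mut chain = vec![best_tip.1];
        while let Some(&prev) = parent.get(chain.last().unwrap()) {
            chain.push(prev);
        }
        chain.reverse();
        chain
    }

This only covers chain selection; the open question above is how to wire something like it into Zebra without persisting the losing side-chains.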

PR Checklist

  • The PR name is suitable for the release notes.
  • The solution is tested.
  • The documentation is up to date.

Contributor

@arya2 arya2 left a comment


some way to alter Zebra so we can leverage its most-work chain-following logic while skipping all validation for old blocks in a way that also ensures it never persists orphan blocks (because zcashd saves all blocks as-received, even if they are later orphaned) or invalid blocks to state (which can exist in the block files on disk, if they were individually valid but contextually invalid).

I would suggest using the block verifier router; Zebra will skip validation for checkpointed blocks.

Comment on lines +21 to +22
//! * TODO: We currently perform full validaton because we don't read from the
//! `zcashd` block index.
Contributor


Suggested change
- //! * TODO: We currently perform full validaton because we don't read from the
- //! `zcashd` block index.

use color_eyre::eyre::{eyre, Report};
use tokio::{
fs::File,
io::{AsyncReadExt, BufReader},
Contributor


Suggested change
- io::{AsyncReadExt, BufReader},
+ io::{AsyncReadExt, BufReader},
+ sync::oneshot,

io::{AsyncReadExt, BufReader},
time::Instant,
};
use tower::{Service, ServiceExt};
Contributor


Suggested change
- use tower::{Service, ServiceExt};
+ use tower::{buffer::Buffer, util::BoxService, Service, ServiceBuilder, ServiceExt};

block::{Block, Height},
parameters::{Magic, Network},
serialization::ZcashDeserialize,
};
Contributor

@arya2 arya2 May 1, 2025


Suggested change
- };
+ chain_tip::ChainTip,
+ };
+ use zebra_node_services::mempool;


/// How often we log info-level progress messages
const PROGRESS_HEIGHT_INTERVAL: u32 = 5_000;

Contributor


Suggested change
+ /// The maximum number of unprocessed messages to buffer for
+ /// the state service when migrating from zcashd.
+ const STATE_BUFFER_BOUND: usize = 100;

Comment on lines +77 to +79
self.migrate(app_config.network.network.clone(), app_config.state.clone())
.await
.map_err(|e| eyre!(e))
Contributor


Suggested change
- self.migrate(app_config.network.network.clone(), app_config.state.clone())
- .await
- .map_err(|e| eyre!(e))
+ self.migrate(
+ app_config.network.network.clone(),
+ app_config.state.clone(),
+ app_config.consensus.clone(),
+ )
+ .await
+ .map_err(|e| eyre!(e))

async fn migrate(
&self,
network: Network,
target_config: zebra_state::Config,
Contributor


Suggested change
- target_config: zebra_state::Config,
+ target_state_config: zebra_state::Config,
+ target_consensus_config: zebra_consensus::Config,

network: Network,
target_config: zebra_state::Config,
) -> Result<(), BoxError> {
info!(?target_config, "initializing target state service");
Contributor


Suggested change
- info!(?target_config, "initializing target state service");
+ info!(
+ ?target_state_config,
+ ?target_consensus_config,
+ "initializing target state service"
+ );

Comment on lines +97 to +102
let (
mut target_state,
_target_read_only_state_service,
_target_latest_chain_tip,
_target_chain_tip_change,
) = zebra_state::spawn_init(target_config.clone(), &network, Height::MAX, 0).await?;
Contributor

@arya2 arya2 May 1, 2025


I'm skeptical that there would be any noticeable benefit from using PrepareForBulkLoad() until we switch to shardtrees, but it would be nice to have.

Note: The transaction verifier works with a dummy mempool setup channel receiver.

Suggested change
- let (
- mut target_state,
- _target_read_only_state_service,
- _target_latest_chain_tip,
- _target_chain_tip_change,
- ) = zebra_state::spawn_init(target_config.clone(), &network, Height::MAX, 0).await?;
+ let (
+ target_state_service,
+ _target_read_state,
+ target_latest_chain_tip,
+ _target_chain_tip_change,
+ ) = zebra_state::spawn_init(target_state_config.clone(), &network, Height::MAX, 0).await?;
+ let target_state = ServiceBuilder::new()
+ .buffer(STATE_BUFFER_BOUND)
+ .service(target_state_service);
+ let (
+ mut block_verifier_router,
+ _tx_verifier,
+ _consensus_task_handles,
+ _max_checkpoint_height,
+ ) = zebra_consensus::router::init(
+ target_consensus_config,
+ &network,
+ target_state,
+ oneshot::channel::<
+ Buffer<BoxService<mempool::Request, mempool::Response, BoxError>, mempool::Request>,
+ >()
+ .1,
+ )
+ .await;

Comment on lines +150 to +230
let target_block_commit_hash = target_state
.ready()
.await?
.call(if height == Height::MIN {
// We can always trust the genesis block from a `zcashd` datadir to be
// the only block with height 0 due to how `zcashd` sideloads it into
// new datadirs.
zebra_state::Request::CommitCheckpointVerifiedBlock(source_block.clone().into())
} else {
// We can't use `CommitCheckpointVerifiedBlock` here because `zcashd`
// block files contain the blocks as-received from the network, and
// can include orphaned blocks that aren't within the checkpoint.
// TODO: The only consensus logic we need Zebra to do for historic
// blocks is to find the most-work chain; every other consensus rule
// can be presumed-valid for blocks that end up in the main chain
// (and certainly for blocks that end up in the checkpoint).
zebra_state::Request::CommitSemanticallyVerifiedBlock(
source_block.clone().into(),
)
})
.await?;
let target_block_commit_hash = match target_block_commit_hash {
zebra_state::Response::Committed(target_block_commit_hash) => {
trace!(?target_block_commit_hash, "wrote Zebra block");
target_block_commit_hash
}
response => Err(format!(
"unexpected response to CommitSemanticallyVerifiedBlock request, height: {}\n \
response: {response:?}",
height.0,
))?,
};

// Read written block from target
let target_block = target_state
.ready()
.await?
.call(zebra_state::Request::Block(height.into()))
.await?;
let target_block = match target_block {
zebra_state::Response::Block(Some(target_block)) => {
trace!(?height, %target_block, "read Zebra block");
target_block
}
zebra_state::Response::Block(None) => Err(format!(
"unexpected missing Zebra block, height: {}",
height.0,
))?,

response => Err(format!(
"unexpected response to Block request, height: {},\n \
response: {response:?}",
height.0,
))?,
};
let target_block_data_hash = target_block.hash();

// Check for data errors
//
// These checks make sure that Zebra doesn't corrupt the block data
// when serializing it.
// Zebra currently serializes `Block` structs into bytes while writing,
// then deserializes bytes into new `Block` structs when reading.
// So these checks are sufficient to detect block data corruption.
//
// If Zebra starts reusing cached `Block` structs after writing them,
// we'll also need to check `Block` structs created from the actual database bytes.
if source_block_hash != target_block_commit_hash
|| source_block_hash != target_block_data_hash
|| source_block != target_block
{
Err(format!(
"unexpected mismatch between zcashd and Zebra blocks,\n \
max copy height: {max_copy_height:?},\n \
zcashd hash: {source_block_hash:?},\n \
Zebra commit hash: {target_block_commit_hash:?},\n \
Zebra data hash: {target_block_data_hash:?},\n \
zcashd block: {source_block:?},\n \
Zebra block: {target_block:?}",
))?;
}
Contributor


The first two checks here:

            if source_block_hash != target_block_commit_hash
                || source_block_hash != target_block_data_hash
                || source_block != target_block

are now happening in the block verifier and in the checkpoint verifier.

Is the third check necessary?

Suggested change
- let target_block_commit_hash = target_state
- .ready()
- .await?
- .call(if height == Height::MIN {
- // We can always trust the genesis block from a `zcashd` datadir to be
- // the only block with height 0 due to how `zcashd` sideloads it into
- // new datadirs.
- zebra_state::Request::CommitCheckpointVerifiedBlock(source_block.clone().into())
- } else {
- // We can't use `CommitCheckpointVerifiedBlock` here because `zcashd`
- // block files contain the blocks as-received from the network, and
- // can include orphaned blocks that aren't within the checkpoint.
- // TODO: The only consensus logic we need Zebra to do for historic
- // blocks is to find the most-work chain; every other consensus rule
- // can be presumed-valid for blocks that end up in the main chain
- // (and certainly for blocks that end up in the checkpoint).
- zebra_state::Request::CommitSemanticallyVerifiedBlock(
- source_block.clone().into(),
- )
- })
- .await?;
- let target_block_commit_hash = match target_block_commit_hash {
- zebra_state::Response::Committed(target_block_commit_hash) => {
- trace!(?target_block_commit_hash, "wrote Zebra block");
- target_block_commit_hash
- }
- response => Err(format!(
- "unexpected response to CommitSemanticallyVerifiedBlock request, height: {}\n \
- response: {response:?}",
- height.0,
- ))?,
- };
- // Read written block from target
- let target_block = target_state
- .ready()
- .await?
- .call(zebra_state::Request::Block(height.into()))
- .await?;
- let target_block = match target_block {
- zebra_state::Response::Block(Some(target_block)) => {
- trace!(?height, %target_block, "read Zebra block");
- target_block
- }
- zebra_state::Response::Block(None) => Err(format!(
- "unexpected missing Zebra block, height: {}",
- height.0,
- ))?,
- response => Err(format!(
- "unexpected response to Block request, height: {},\n \
- response: {response:?}",
- height.0,
- ))?,
- };
- let target_block_data_hash = target_block.hash();
- // Check for data errors
- //
- // These checks make sure that Zebra doesn't corrupt the block data
- // when serializing it.
- // Zebra currently serializes `Block` structs into bytes while writing,
- // then deserializes bytes into new `Block` structs when reading.
- // So these checks are sufficient to detect block data corruption.
- //
- // If Zebra starts reusing cached `Block` structs after writing them,
- // we'll also need to check `Block` structs created from the actual database bytes.
- if source_block_hash != target_block_commit_hash
- || source_block_hash != target_block_data_hash
- || source_block != target_block
- {
- Err(format!(
- "unexpected mismatch between zcashd and Zebra blocks,\n \
- max copy height: {max_copy_height:?},\n \
- zcashd hash: {source_block_hash:?},\n \
- Zebra commit hash: {target_block_commit_hash:?},\n \
- Zebra data hash: {target_block_data_hash:?},\n \
- zcashd block: {source_block:?},\n \
- Zebra block: {target_block:?}",
- ))?;
- }
+ block_verifier_router
+ .ready()
+ .await?
+ .call(zebra_consensus::Request::Commit(source_block))
+ .await?;

Comment on lines +114 to +121
let initial_target_tip = match initial_target_tip {
zebra_state::Response::Tip(target_tip) => target_tip,

response => Err(format!("unexpected response to Tip request: {response:?}",))?,
};
let min_target_height = initial_target_tip
.map(|target_tip| target_tip.0 .0 + 1)
.unwrap_or(0);
Contributor


Suggested change
- let initial_target_tip = match initial_target_tip {
- zebra_state::Response::Tip(target_tip) => target_tip,
- response => Err(format!("unexpected response to Tip request: {response:?}",))?,
- };
- let min_target_height = initial_target_tip
- .map(|target_tip| target_tip.0 .0 + 1)
- .unwrap_or(0);
+ let initial_target_tip = target_latest_chain_tip.best_tip_height();
+ let min_target_height = initial_target_tip
+ .map(|Height(target_tip)| target_tip + 1)
+ .unwrap_or(0);

@oxarbitrage
Contributor

Note: During the Arborist call, there was a suggestion to move this code to the zebra-utils crate as a new binary instead of a new zebrad command.

@arya2
Contributor

arya2 commented May 2, 2025

Note: During the Arborist call, there was a suggestion to move this code to the zebra-utils crate as a new binary instead of a new zebrad command.

It would be nice to avoid adding a dependency to zebrad, especially one like LevelDB, but if that dependency can be avoided, having a command in zebrad seems like better UX while needing fewer modifications.
