Add zebrad migrate-from-zcashd command #9472
Conversation
> some way to alter Zebra so we can leverage its most-work chain-following logic while skipping all validation for old blocks in a way that also ensures it never persists orphan blocks (because zcashd saves all blocks as-received, even if they are later orphaned) or invalid blocks to state (which can exist in the block files on disk, if they were individually valid but contextually invalid).

I would suggest using the block verifier router; Zebra will skip validation for checkpointed blocks.
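For illustration, a minimal sketch of what that could look like in this command, assuming the router is initialized as in the suggestions further down and that `source_block` is a block read from the `zcashd` files (error handling simplified):

```rust
// Hedged sketch: send each block read from the zcashd block files through the
// block verifier router instead of committing it to the state directly.
// Heights covered by the checkpoint list only get checkpoint verification,
// so full semantic validation is skipped for historic blocks.
block_verifier_router
    .ready()
    .await?
    .call(zebra_consensus::Request::Commit(source_block.clone()))
    .await?;
```

Since the checkpoint verifier only accepts blocks that chain up to the known checkpoint hashes, this should also keep orphaned or contextually invalid blocks from the `zcashd` files out of the state.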
```rust
//! * TODO: We currently perform full validation because we don't read from the
//! `zcashd` block index.
```
Suggested change:

```diff
-//! * TODO: We currently perform full validation because we don't read from the
-//! `zcashd` block index.
```
```rust
use color_eyre::eyre::{eyre, Report};
use tokio::{
    fs::File,
    io::{AsyncReadExt, BufReader},
```
Suggested change:

```diff
-    io::{AsyncReadExt, BufReader},
+    io::{AsyncReadExt, BufReader},
+    sync::oneshot,
```
```rust
    io::{AsyncReadExt, BufReader},
    time::Instant,
};
use tower::{Service, ServiceExt};
```
Suggested change:

```diff
-use tower::{Service, ServiceExt};
+use tower::{buffer::Buffer, util::BoxService, Service, ServiceBuilder, ServiceExt};
```
```rust
    block::{Block, Height},
    parameters::{Magic, Network},
    serialization::ZcashDeserialize,
};
```
Suggested change:

```diff
-};
+    chain_tip::ChainTip,
+};
+use zebra_node_services::mempool;
```
```rust
/// How often we log info-level progress messages
const PROGRESS_HEIGHT_INTERVAL: u32 = 5_000;
```
Suggested change:

```diff
+/// The maximum number of unprocessed messages to buffer for
+/// the state service when migrating from zcashd.
+const STATE_BUFFER_BOUND: usize = 100;
```
```rust
self.migrate(app_config.network.network.clone(), app_config.state.clone())
    .await
    .map_err(|e| eyre!(e))
```
Suggested change:

```diff
-self.migrate(app_config.network.network.clone(), app_config.state.clone())
-    .await
-    .map_err(|e| eyre!(e))
+self.migrate(
+    app_config.network.network.clone(),
+    app_config.state.clone(),
+    app_config.consensus.clone(),
+)
+.await
+.map_err(|e| eyre!(e))
```
```rust
async fn migrate(
    &self,
    network: Network,
    target_config: zebra_state::Config,
```
Suggested change:

```diff
-    target_config: zebra_state::Config,
+    target_state_config: zebra_state::Config,
+    target_consensus_config: zebra_consensus::Config,
```
```rust
    network: Network,
    target_config: zebra_state::Config,
) -> Result<(), BoxError> {
    info!(?target_config, "initializing target state service");
```
Suggested change:

```diff
-    info!(?target_config, "initializing target state service");
+    info!(
+        ?target_state_config,
+        ?target_consensus_config,
+        "initializing target state service"
+    );
```
```rust
let (
    mut target_state,
    _target_read_only_state_service,
    _target_latest_chain_tip,
    _target_chain_tip_change,
) = zebra_state::spawn_init(target_config.clone(), &network, Height::MAX, 0).await?;
```
I'm skeptical of there being any noticeable benefit from using PrepareForBulkLoad()
until we switch to shardtrees, but it would be nice to have.
Note: The transaction verifier works with a dummy mempool setup channel receiver.
Suggested change:

```diff
-let (
-    mut target_state,
-    _target_read_only_state_service,
-    _target_latest_chain_tip,
-    _target_chain_tip_change,
-) = zebra_state::spawn_init(target_config.clone(), &network, Height::MAX, 0).await?;
+let (
+    target_state_service,
+    _target_read_state,
+    target_latest_chain_tip,
+    _target_chain_tip_change,
+) = zebra_state::spawn_init(target_state_config.clone(), &network, Height::MAX, 0).await?;
+let target_state = ServiceBuilder::new()
+    .buffer(STATE_BUFFER_BOUND)
+    .service(target_state_service);
+let (
+    mut block_verifier_router,
+    _tx_verifier,
+    _consensus_task_handles,
+    _max_checkpoint_height,
+) = zebra_consensus::router::init(
+    target_consensus_config,
+    &network,
+    target_state,
+    oneshot::channel::<
+        Buffer<BoxService<mempool::Request, mempool::Response, BoxError>, mempool::Request>,
+    >()
+    .1,
+)
+.await;
```
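To unpack the dummy mempool argument in that suggestion (an illustrative sketch with hypothetical local names, relying on the imports suggested above, not necessarily how the final code should spell it): only the receiving half of a oneshot setup channel is passed to the router, and the sending half is dropped, so the transaction verifier's mempool is never set up, which is enough for a migration that only verifies blocks.

```rust
// Illustrative sketch of the dummy mempool setup channel used above.
// The sender is dropped immediately, so the transaction verifier never
// receives a real mempool service during the migration.
type MempoolService =
    Buffer<BoxService<mempool::Request, mempool::Response, BoxError>, mempool::Request>;

let (_unused_mempool_setup_tx, dummy_mempool_setup_rx) = oneshot::channel::<MempoolService>();
// `dummy_mempool_setup_rx` is what the suggestion passes to
// `zebra_consensus::router::init` via the inline `oneshot::channel::<...>().1`.
```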
```rust
let target_block_commit_hash = target_state
    .ready()
    .await?
    .call(if height == Height::MIN {
        // We can always trust the genesis block from a `zcashd` datadir to be
        // the only block with height 0 due to how `zcashd` sideloads it into
        // new datadirs.
        zebra_state::Request::CommitCheckpointVerifiedBlock(source_block.clone().into())
    } else {
        // We can't use `CommitCheckpointVerifiedBlock` here because `zcashd`
        // block files contain the blocks as-received from the network, and
        // can include orphaned blocks that aren't within the checkpoint.
        // TODO: The only consensus logic we need Zebra to do for historic
        // blocks is to find the most-work chain; every other consensus rule
        // can be presumed-valid for blocks that end up in the main chain
        // (and certainly for blocks that end up in the checkpoint).
        zebra_state::Request::CommitSemanticallyVerifiedBlock(
            source_block.clone().into(),
        )
    })
    .await?;
let target_block_commit_hash = match target_block_commit_hash {
    zebra_state::Response::Committed(target_block_commit_hash) => {
        trace!(?target_block_commit_hash, "wrote Zebra block");
        target_block_commit_hash
    }
    response => Err(format!(
        "unexpected response to CommitSemanticallyVerifiedBlock request, height: {}\n \
         response: {response:?}",
        height.0,
    ))?,
};

// Read written block from target
let target_block = target_state
    .ready()
    .await?
    .call(zebra_state::Request::Block(height.into()))
    .await?;
let target_block = match target_block {
    zebra_state::Response::Block(Some(target_block)) => {
        trace!(?height, %target_block, "read Zebra block");
        target_block
    }
    zebra_state::Response::Block(None) => Err(format!(
        "unexpected missing Zebra block, height: {}",
        height.0,
    ))?,

    response => Err(format!(
        "unexpected response to Block request, height: {},\n \
         response: {response:?}",
        height.0,
    ))?,
};
let target_block_data_hash = target_block.hash();

// Check for data errors
//
// These checks make sure that Zebra doesn't corrupt the block data
// when serializing it.
// Zebra currently serializes `Block` structs into bytes while writing,
// then deserializes bytes into new `Block` structs when reading.
// So these checks are sufficient to detect block data corruption.
//
// If Zebra starts reusing cached `Block` structs after writing them,
// we'll also need to check `Block` structs created from the actual database bytes.
if source_block_hash != target_block_commit_hash
    || source_block_hash != target_block_data_hash
    || source_block != target_block
{
    Err(format!(
        "unexpected mismatch between zcashd and Zebra blocks,\n \
         max copy height: {max_copy_height:?},\n \
         zcashd hash: {source_block_hash:?},\n \
         Zebra commit hash: {target_block_commit_hash:?},\n \
         Zebra data hash: {target_block_data_hash:?},\n \
         zcashd block: {source_block:?},\n \
         Zebra block: {target_block:?}",
    ))?;
}
```
The first two checks here:

```rust
if source_block_hash != target_block_commit_hash
    || source_block_hash != target_block_data_hash
    || source_block != target_block
```

are now happening in the block verifier and in the checkpoint verifier. Is the third check necessary?
Suggested change (removing the whole commit, read-back, and comparison block quoted above, and replacing it with a single call through the block verifier router):

```rust
block_verifier_router
    .ready()
    .await?
    .call(zebra_consensus::Request::Commit(source_block))
    .await?;
```
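If the round-trip data check is still wanted after this change, a hedged sketch of keeping only the third comparison might look like the following. It reuses the state requests from the removed code and assumes a handle to the buffered state service is still available (tower `Buffer` handles are cloneable) and that ownership of `source_block` is arranged so it can still be compared; those details are elided here.

```rust
// Hypothetical follow-up: read the block back from the target state and
// compare the full structs, since the hash checks now happen in the block
// and checkpoint verifiers.
let target_block = match target_state
    .ready()
    .await?
    .call(zebra_state::Request::Block(height.into()))
    .await?
{
    zebra_state::Response::Block(Some(target_block)) => target_block,
    response => Err(format!("unexpected response to Block request: {response:?}"))?,
};

if source_block != target_block {
    Err(format!(
        "unexpected mismatch between zcashd and Zebra blocks, height: {}",
        height.0,
    ))?;
}
```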
```rust
let initial_target_tip = match initial_target_tip {
    zebra_state::Response::Tip(target_tip) => target_tip,

    response => Err(format!("unexpected response to Tip request: {response:?}",))?,
};
let min_target_height = initial_target_tip
    .map(|target_tip| target_tip.0 .0 + 1)
    .unwrap_or(0);
```
Suggested change:

```diff
-let initial_target_tip = match initial_target_tip {
-    zebra_state::Response::Tip(target_tip) => target_tip,
-    response => Err(format!("unexpected response to Tip request: {response:?}",))?,
-};
-let min_target_height = initial_target_tip
-    .map(|target_tip| target_tip.0 .0 + 1)
-    .unwrap_or(0);
+let initial_target_tip = target_latest_chain_tip.best_tip_height();
+let min_target_height = initial_target_tip
+    .map(|Height(target_tip)| target_tip + 1)
+    .unwrap_or(0);
```
Note: During the Arborist call, there was a suggestion to move this code to the
It would be nice to avoid adding a dependency to zebrad, especially one like leveldb; but if that dependency can be avoided, having a command in zebrad seems like better UX while needing fewer modifications.
Motivation

If a user already has a `zcashd` node, they should be able to use its local block data to bootstrap a Zebra node, instead of requiring a fresh download.

Solution

This PR adds a command that parses the `blkNNNNN.dat` files from a `zcashd` datadir, and replays them to the Zebra state as if it just received them from the network.
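For reviewers unfamiliar with the on-disk layout: `zcashd` inherits Bitcoin's block-file format, where each record is a 4-byte network magic, a 4-byte little-endian length, and then the serialized block, with possible zero padding at the end of a file. The sketch below is illustrative only (the function name and the plain `[u8; 4]` magic parameter are mine, not this PR's code), but it shows the shape of the parsing loop the command performs.

```rust
use color_eyre::eyre::{eyre, Report};
use tokio::{
    fs::File,
    io::{AsyncReadExt, BufReader},
};
use zebra_chain::{block::Block, serialization::ZcashDeserialize};

/// Illustrative sketch: read every block record from a single `blkNNNNN.dat` file.
async fn read_blk_file(
    path: &std::path::Path,
    expected_magic: [u8; 4],
) -> Result<Vec<Block>, Report> {
    let mut reader = BufReader::new(File::open(path).await?);
    let mut blocks = Vec::new();

    loop {
        // Each record starts with the 4-byte network magic; EOF or zero
        // padding marks the end of the useful data in the file.
        let mut magic = [0u8; 4];
        if reader.read_exact(&mut magic).await.is_err() || magic == [0u8; 4] {
            break;
        }
        if magic != expected_magic {
            return Err(eyre!("unexpected network magic in {}", path.display()));
        }

        // A 4-byte little-endian length, then that many bytes of serialized block.
        let size = reader.read_u32_le().await? as usize;
        let mut block_bytes = vec![0u8; size];
        reader.read_exact(&mut block_bytes).await?;

        // Reuse Zebra's consensus serialization to parse the block.
        blocks.push(Block::zcash_deserialize(block_bytes.as_slice())?);
    }

    Ok(blocks)
}
```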
Tests

Currently WIP; tested with my local `zcashd` node, and it doesn't work yet because Zebra doesn't implement large parts of the historic Zcash protocol spec.

Specifications & References

Follow-up Work

Needs advice on how to alter Zebra to enable this. We either need:

- some way to alter Zebra so we can leverage its most-work chain-following logic while skipping all validation for old blocks in a way that also ensures it never persists orphan blocks (because `zcashd` saves all blocks as-received, even if they are later orphaned) or invalid blocks to state (which can exist in the block files on disk, if they were individually valid but contextually invalid), or
- some way to read the `zcashd` block index and directly find the main chain blocks, ignoring orphans.

PR Checklist