Add zebrad migrate-from-zcashd command #9472

Draft
str4d wants to merge 1 commit into main

Conversation

str4d
Contributor

@str4d str4d commented Apr 29, 2025

Motivation

If a user already has a zcashd node, they should be able to use its local block data to bootstrap a Zebra node, instead of requiring a fresh download.

Solution

This PR adds a command that parses the blkNNNNN.dat files from a zcashd datadir and replays their blocks into the Zebra state as if Zebra had just received them from the network.
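
For reference, zcashd block files use the Bitcoin Core on-disk layout: each blkNNNNN.dat file is a sequence of records, where a record is the 4-byte network magic, a 4-byte little-endian payload length, and the serialized block, with zero padding filling the unused tail of a file. Below is a minimal synchronous sketch of reading one such file under those assumptions; read_blk_file and expected_magic are illustrative names rather than code from this PR, which uses tokio's async File/BufReader and Zebra's Magic/Network types instead.

    use std::fs::File;
    use std::io::{self, BufReader, Read};

    /// Reads the raw block payloads out of a single blkNNNNN.dat file.
    ///
    /// Each record is: 4-byte network magic, 4-byte little-endian payload length,
    /// then the serialized block; partially filled files end in zero padding.
    /// `expected_magic` stands in for the per-network magic that the real command
    /// takes from Zebra's network parameters.
    fn read_blk_file(path: &str, expected_magic: [u8; 4]) -> io::Result<Vec<Vec<u8>>> {
        let mut reader = BufReader::new(File::open(path)?);
        let mut blocks = Vec::new();

        loop {
            let mut magic = [0u8; 4];
            match reader.read_exact(&mut magic) {
                Ok(()) => {}
                // Clean end of file: no more records.
                Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
                Err(e) => return Err(e),
            }
            // Zero bytes mark the unused, padded tail of the file.
            if magic == [0u8; 4] {
                break;
            }
            if magic != expected_magic {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "unexpected network magic in block file",
                ));
            }

            let mut len_bytes = [0u8; 4];
            reader.read_exact(&mut len_bytes)?;
            let len = u32::from_le_bytes(len_bytes) as usize;

            // Raw block bytes; the command itself deserializes these into a
            // zebra_chain `Block` with `ZcashDeserialize`.
            let mut block_bytes = vec![0u8; len];
            reader.read_exact(&mut block_bytes)?;
            blocks.push(block_bytes);
        }

        Ok(blocks)
    }

Checking the magic up front also catches pointing the command at a datadir for the wrong network.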

Tests

Currently WIP. Tested with my local zcashd node; it doesn't work yet because Zebra doesn't implement large parts of the historic Zcash protocol spec.

Specifications & References

Follow-up Work

Needs advice on how to alter Zebra to enable this. We need either:

  • some way to alter Zebra so we can leverage its most-work chain-following logic while skipping all validation for old blocks, in a way that also ensures it never persists orphan blocks (zcashd saves all blocks as-received, even if they are later orphaned) or invalid blocks (which can exist in the block files on disk if they were individually valid but contextually invalid) to the state; a rough sketch of this idea follows this list; or
  • a LevelDB dependency, which would enable Zebra to read the zcashd block index and directly find the main chain blocks, ignoring orphans.
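
To make the first option more concrete, here is a rough standalone sketch of most-work chain-following over blocks loaded from disk: index the loaded blocks by parent hash, walk forward from the genesis hash while accumulating work, and keep only the heaviest path, so orphaned side-chain blocks never reach the state. Hash, LoadedBlock, work, and best_chain are hypothetical stand-ins rather than Zebra types, and a real implementation would use Zebra's difficulty-to-work conversion.

    use std::collections::HashMap;

    /// Hypothetical stand-ins for the real Zebra types (illustration only).
    type Hash = [u8; 32];

    struct LoadedBlock {
        hash: Hash,
        prev_hash: Hash,
        /// Work implied by the header's difficulty target.
        work: u128,
    }

    /// Returns the chain with the most cumulative work, as hashes from genesis
    /// to tip, ignoring any block that doesn't connect back to genesis.
    fn best_chain(genesis: Hash, blocks: &[LoadedBlock]) -> Vec<Hash> {
        // Index the loaded blocks by their parent hash.
        let mut children: HashMap<Hash, Vec<&LoadedBlock>> = HashMap::new();
        for block in blocks {
            children.entry(block.prev_hash).or_default().push(block);
        }

        // Forward pass from genesis: cumulative work and parent links for every
        // block that connects to genesis; orphans with unknown parents are never visited.
        let mut total_work: HashMap<Hash, u128> = HashMap::from([(genesis, 0)]);
        let mut parent: HashMap<Hash, Hash> = HashMap::new();
        let mut queue = vec![genesis];
        let mut best_tip = (0u128, genesis);

        while let Some(hash) = queue.pop() {
            let work_so_far = total_work[&hash];
            if work_so_far >= best_tip.0 {
                best_tip = (work_so_far, hash);
            }
            for child in children.get(&hash).into_iter().flatten() {
                total_work.insert(child.hash, work_so_far + child.work);
                parent.insert(child.hash, hash);
                queue.push(child.hash);
            }
        }

        // Walk back from the heaviest tip to genesis to get the main-chain order.
        let mut chain = vec![best_tip.1];
        while let Some(&prev) = parent.get(chain.last().unwrap()) {
            chain.push(prev);
        }
        chain.reverse();
        chain
    }

This only covers chain selection; the open question above is how to wire something like it into Zebra without persisting the losing side-chains.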

PR Checklist

  • The PR name is suitable for the release notes.
  • The solution is tested.
  • The documentation is up to date.

Contributor

@arya2 arya2 left a comment


some way to alter Zebra so we can leverage its most-work chain-following logic while skipping all validation for old blocks in a way that also ensures it never persists orphan blocks (because zcashd saves all blocks as-received, even if they are later orphaned) or invalid blocks to state (which can exist in the block files on disk, if they were individually valid but contextually invalid).

I would suggest using the block verifier router; Zebra will skip validation for checkpointed blocks.

Comment on lines +21 to +22
//! * TODO: We currently perform full validaton because we don't read from the
//! `zcashd` block index.
Contributor


Suggested change
- //! * TODO: We currently perform full validaton because we don't read from the
- //! `zcashd` block index.

use color_eyre::eyre::{eyre, Report};
use tokio::{
fs::File,
io::{AsyncReadExt, BufReader},
Contributor


Suggested change
- io::{AsyncReadExt, BufReader},
+ io::{AsyncReadExt, BufReader},
+ sync::oneshot,

io::{AsyncReadExt, BufReader},
time::Instant,
};
use tower::{Service, ServiceExt};
Contributor


Suggested change
- use tower::{Service, ServiceExt};
+ use tower::{buffer::Buffer, util::BoxService, Service, ServiceBuilder, ServiceExt};

block::{Block, Height},
parameters::{Magic, Network},
serialization::ZcashDeserialize,
};
Contributor

@arya2 arya2 May 1, 2025


Suggested change
- };
+ chain_tip::ChainTip,
+ };
+ use zebra_node_services::mempool;


/// How often we log info-level progress messages
const PROGRESS_HEIGHT_INTERVAL: u32 = 5_000;

Contributor


Suggested change
+ /// The maximum number of unprocessed messages to buffer for
+ /// the state service when migrating from zcashd.
+ const STATE_BUFFER_BOUND: usize = 100;

Comment on lines +77 to +79
self.migrate(app_config.network.network.clone(), app_config.state.clone())
.await
.map_err(|e| eyre!(e))
Contributor


Suggested change
- self.migrate(app_config.network.network.clone(), app_config.state.clone())
- .await
- .map_err(|e| eyre!(e))
+ self.migrate(
+ app_config.network.network.clone(),
+ app_config.state.clone(),
+ app_config.consensus.clone(),
+ )
+ .await
+ .map_err(|e| eyre!(e))

async fn migrate(
&self,
network: Network,
target_config: zebra_state::Config,
Contributor


Suggested change
- target_config: zebra_state::Config,
+ target_state_config: zebra_state::Config,
+ target_consensus_config: zebra_consensus::Config,

network: Network,
target_config: zebra_state::Config,
) -> Result<(), BoxError> {
info!(?target_config, "initializing target state service");
Contributor


Suggested change
- info!(?target_config, "initializing target state service");
+ info!(
+ ?target_state_config,
+ ?target_consensus_config,
+ "initializing target state service"
+ );

Comment on lines +97 to +102
let (
mut target_state,
_target_read_only_state_service,
_target_latest_chain_tip,
_target_chain_tip_change,
) = zebra_state::spawn_init(target_config.clone(), &network, Height::MAX, 0).await?;
Contributor

@arya2 arya2 May 1, 2025


I'm skeptical that there would be any noticeable benefit from using PrepareForBulkLoad() until we switch to shardtrees, but it would be nice to have.

Note: The transaction verifier works with a dummy mempool setup channel receiver.

Suggested change
- let (
- mut target_state,
- _target_read_only_state_service,
- _target_latest_chain_tip,
- _target_chain_tip_change,
- ) = zebra_state::spawn_init(target_config.clone(), &network, Height::MAX, 0).await?;
+ let (
+ target_state_service,
+ _target_read_state,
+ target_latest_chain_tip,
+ _target_chain_tip_change,
+ ) = zebra_state::spawn_init(target_state_config.clone(), &network, Height::MAX, 0).await?;
+ let target_state = ServiceBuilder::new()
+ .buffer(STATE_BUFFER_BOUND)
+ .service(target_state_service);
+ let (
+ mut block_verifier_router,
+ _tx_verifier,
+ _consensus_task_handles,
+ _max_checkpoint_height,
+ ) = zebra_consensus::router::init(
+ target_consensus_config,
+ &network,
+ target_state,
+ oneshot::channel::<
+ Buffer<BoxService<mempool::Request, mempool::Response, BoxError>, mempool::Request>,
+ >()
+ .1,
+ )
+ .await;

Comment on lines +150 to +230
let target_block_commit_hash = target_state
.ready()
.await?
.call(if height == Height::MIN {
// We can always trust the genesis block from a `zcashd` datadir to be
// the only block with height 0 due to how `zcashd` sideloads it into
// new datadirs.
zebra_state::Request::CommitCheckpointVerifiedBlock(source_block.clone().into())
} else {
// We can't use `CommitCheckpointVerifiedBlock` here because `zcashd`
// block files contain the blocks as-received from the network, and
// can include orphaned blocks that aren't within the checkpoint.
// TODO: The only consensus logic we need Zebra to do for historic
// blocks is to find the most-work chain; every other consensus rule
// can be presumed-valid for blocks that end up in the main chain
// (and certainly for blocks that end up in the checkpoint).
zebra_state::Request::CommitSemanticallyVerifiedBlock(
source_block.clone().into(),
)
})
.await?;
let target_block_commit_hash = match target_block_commit_hash {
zebra_state::Response::Committed(target_block_commit_hash) => {
trace!(?target_block_commit_hash, "wrote Zebra block");
target_block_commit_hash
}
response => Err(format!(
"unexpected response to CommitSemanticallyVerifiedBlock request, height: {}\n \
response: {response:?}",
height.0,
))?,
};

// Read written block from target
let target_block = target_state
.ready()
.await?
.call(zebra_state::Request::Block(height.into()))
.await?;
let target_block = match target_block {
zebra_state::Response::Block(Some(target_block)) => {
trace!(?height, %target_block, "read Zebra block");
target_block
}
zebra_state::Response::Block(None) => Err(format!(
"unexpected missing Zebra block, height: {}",
height.0,
))?,

response => Err(format!(
"unexpected response to Block request, height: {},\n \
response: {response:?}",
height.0,
))?,
};
let target_block_data_hash = target_block.hash();

// Check for data errors
//
// These checks make sure that Zebra doesn't corrupt the block data
// when serializing it.
// Zebra currently serializes `Block` structs into bytes while writing,
// then deserializes bytes into new `Block` structs when reading.
// So these checks are sufficient to detect block data corruption.
//
// If Zebra starts reusing cached `Block` structs after writing them,
// we'll also need to check `Block` structs created from the actual database bytes.
if source_block_hash != target_block_commit_hash
|| source_block_hash != target_block_data_hash
|| source_block != target_block
{
Err(format!(
"unexpected mismatch between zcashd and Zebra blocks,\n \
max copy height: {max_copy_height:?},\n \
zcashd hash: {source_block_hash:?},\n \
Zebra commit hash: {target_block_commit_hash:?},\n \
Zebra data hash: {target_block_data_hash:?},\n \
zcashd block: {source_block:?},\n \
Zebra block: {target_block:?}",
))?;
}
Contributor


The first two checks here:

            if source_block_hash != target_block_commit_hash
                || source_block_hash != target_block_data_hash
                || source_block != target_block

are now happening in the block verifier and in the checkpoint verifier.

Is the third check necessary?

Suggested change
- let target_block_commit_hash = target_state
- .ready()
- .await?
- .call(if height == Height::MIN {
- // We can always trust the genesis block from a `zcashd` datadir to be
- // the only block with height 0 due to how `zcashd` sideloads it into
- // new datadirs.
- zebra_state::Request::CommitCheckpointVerifiedBlock(source_block.clone().into())
- } else {
- // We can't use `CommitCheckpointVerifiedBlock` here because `zcashd`
- // block files contain the blocks as-received from the network, and
- // can include orphaned blocks that aren't within the checkpoint.
- // TODO: The only consensus logic we need Zebra to do for historic
- // blocks is to find the most-work chain; every other consensus rule
- // can be presumed-valid for blocks that end up in the main chain
- // (and certainly for blocks that end up in the checkpoint).
- zebra_state::Request::CommitSemanticallyVerifiedBlock(
- source_block.clone().into(),
- )
- })
- .await?;
- let target_block_commit_hash = match target_block_commit_hash {
- zebra_state::Response::Committed(target_block_commit_hash) => {
- trace!(?target_block_commit_hash, "wrote Zebra block");
- target_block_commit_hash
- }
- response => Err(format!(
- "unexpected response to CommitSemanticallyVerifiedBlock request, height: {}\n \
- response: {response:?}",
- height.0,
- ))?,
- };
- // Read written block from target
- let target_block = target_state
- .ready()
- .await?
- .call(zebra_state::Request::Block(height.into()))
- .await?;
- let target_block = match target_block {
- zebra_state::Response::Block(Some(target_block)) => {
- trace!(?height, %target_block, "read Zebra block");
- target_block
- }
- zebra_state::Response::Block(None) => Err(format!(
- "unexpected missing Zebra block, height: {}",
- height.0,
- ))?,
- response => Err(format!(
- "unexpected response to Block request, height: {},\n \
- response: {response:?}",
- height.0,
- ))?,
- };
- let target_block_data_hash = target_block.hash();
- // Check for data errors
- //
- // These checks make sure that Zebra doesn't corrupt the block data
- // when serializing it.
- // Zebra currently serializes `Block` structs into bytes while writing,
- // then deserializes bytes into new `Block` structs when reading.
- // So these checks are sufficient to detect block data corruption.
- //
- // If Zebra starts reusing cached `Block` structs after writing them,
- // we'll also need to check `Block` structs created from the actual database bytes.
- if source_block_hash != target_block_commit_hash
- || source_block_hash != target_block_data_hash
- || source_block != target_block
- {
- Err(format!(
- "unexpected mismatch between zcashd and Zebra blocks,\n \
- max copy height: {max_copy_height:?},\n \
- zcashd hash: {source_block_hash:?},\n \
- Zebra commit hash: {target_block_commit_hash:?},\n \
- Zebra data hash: {target_block_data_hash:?},\n \
- zcashd block: {source_block:?},\n \
- Zebra block: {target_block:?}",
- ))?;
- }
+ block_verifier_router
+ .ready()
+ .await?
+ .call(zebra_consensus::Request::Commit(source_block))
+ .await?;

Comment on lines +114 to +121
let initial_target_tip = match initial_target_tip {
zebra_state::Response::Tip(target_tip) => target_tip,

response => Err(format!("unexpected response to Tip request: {response:?}",))?,
};
let min_target_height = initial_target_tip
.map(|target_tip| target_tip.0 .0 + 1)
.unwrap_or(0);
Contributor


Suggested change
- let initial_target_tip = match initial_target_tip {
- zebra_state::Response::Tip(target_tip) => target_tip,
- response => Err(format!("unexpected response to Tip request: {response:?}",))?,
- };
- let min_target_height = initial_target_tip
- .map(|target_tip| target_tip.0 .0 + 1)
- .unwrap_or(0);
+ let initial_target_tip = target_latest_chain_tip.best_tip_height();
+ let min_target_height = initial_target_tip
+ .map(|Height(target_tip)| target_tip + 1)
+ .unwrap_or(0);

@oxarbitrage
Contributor

Note: During the Arborist call, there was a suggestion to move this code to the zebra-utils crate as a new binary instead of a new zebrad command.

@arya2
Contributor

arya2 commented May 2, 2025

Note: During the Arborist call, there was a suggestion to move this code to the zebra-utils crate as a new binary instead of a new zebrad command.

It would be nice to avoid adding a dependency to zebrad, especially one like LevelDB, but if that dependency can be avoided, having a command in zebrad seems like better UX while needing fewer modifications.
