Failed to migrate metric engine if metadata region uses a stale manifest #6273

Open
WenyXu opened this issue Jun 9, 2025 · 0 comments
WenyXu commented Jun 9, 2025

What type of bug is this?

Unexpected error

What subsystems are affected?

Datanode

Minimal reproduce step

  1. Deploy a GreptimeDB cluster.
  2. Create some metric tables and insert data into them.
  3. Trigger a migration.

This issue is intermittent and may not reproduce on every attempt.

What did you expect to see?

The region migration succeeds.

What did you see instead?

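The metric region failed to migrate. The candidate datanode could not open the region because reading a file under the region's `metadata/` directory returned `NoSuchKey` from object storage (see the log output in this report). CI run: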

https://github.com/GreptimeTeam/greptimedb/actions/runs/15525576493/job/43705080814

What operating system did you use?

doesn't matter

What version of GreptimeDB did you use?

main

Relevant log output and stack trace

2025-06-09T03:08:16.932987214Z stdout F 2025-06-09T03:08:16.932769Z  INFO meta_srv::procedure::region_migration::manager: Starting region migration procedure ae2fd4a2-a94f-4bfb-be00-a50fc1266571 for region: 4458176053248(1038, 0), from_peer: peer-0(my-greptimedb-datanode-0.my-greptimedb-datanode.my-greptimedb:4001), to_peer: peer-1(my-greptimedb-datanode-1.my-greptimedb-datanode.my-greptimedb:4001)
2025-06-09T03:08:16.933586415Z stdout F 2025-06-09T03:08:16.933351Z  INFO LocalManager::submit_root_procedure: common_procedure::local::runner: Runner metasrv-procedure::RegionMigration-ae2fd4a2-a94f-4bfb-be00-a50fc1266571 starts
2025-06-09T03:08:17.934518498Z stdout F 2025-06-09T03:08:17.934353Z  INFO LocalManager::submit_root_procedure: meta_srv::procedure::region_migration::open_candidate_region: Received open region reply: OpenRegion(SimpleReply { result: false, error: Some("0: Failed to handle request for region 4458176053248(1038, 0), at src/datanode/src/region_server.rs:944:14\n1: Failed to collect record batch stream, at src/metric-engine/src/metadata_region.rs:279:70\n2: External error, at src/mito2/src/read/seq_scan.rs:250:26\n3: OpenDAL operator failed, at src/mito2/src/sst/parquet/metadata.rs:91:14\n4: NotFound (persistent) at  => NotFound (permanent) at read, context: { uri: http://minio.minio.svc.cluster.local/default/test-root/data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet, response: Parts { status: 404, version: HTTP/1.1, headers: {\"accept-ranges\": \"bytes\", \"content-length\": \"531\", \"content-type\": \"application/xml\", \"server\": \"MinIO\", \"strict-transport-security\": \"max-age=31536000; includeSubDomains\", \"vary\": \"Origin\", \"vary\": \"Accept-Encoding\", \"x-amz-id-2\": \"dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8\", \"x-amz-request-id\": \"1847410B819E3C06\", \"x-content-type-options\": \"nosniff\", \"x-ratelimit-limit\": \"4141\", \"x-ratelimit-remaining\": \"4141\", \"x-xss-protection\": \"1; mode=block\", \"x-amz-delete-marker\": \"true\", \"x-amz-version-id\": \"null\", \"date\": \"Mon, 09 Jun 2025 03:08:17 GMT\"} }, service: s3, path: data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet, range: 0-3653 } => S3Error { code: \"NoSuchKey\", message: \"The specified key does not exist.\", resource: \"/default/test-root/data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet\", request_id: \"1847410B819E3C06\" }\n") }), region: 4458176053248(1038, 0), elapsed: 994.691477ms
2025-06-09T03:08:17.934822267Z stdout F 2025-06-09T03:08:17.934658Z ERROR LocalManager::submit_root_procedure: common_procedure::local::runner: Failed to execute procedure metasrv-procedure::RegionMigration-ae2fd4a2-a94f-4bfb-be00-a50fc1266571, retry: true, clean_poisons: false err=0: Procedure exec failed
2025-06-09T03:08:17.93482893Z stdout F 1: Expected to retry later, reason: Region 4458176053248(1038, 0) is not opened by datanode Peer { id: 1, addr: "my-greptimedb-datanode-1.my-greptimedb-datanode.my-greptimedb:4001" }, error: Some("0: Failed to handle request for region 4458176053248(1038, 0), at src/datanode/src/region_server.rs:944:14\n1: Failed to collect record batch stream, at src/metric-engine/src/metadata_region.rs:279:70\n2: External error, at src/mito2/src/read/seq_scan.rs:250:26\n3: OpenDAL operator failed, at src/mito2/src/sst/parquet/metadata.rs:91:14\n4: NotFound (persistent) at  => NotFound (permanent) at read, context: { uri: http://minio.minio.svc.cluster.local/default/test-root/data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet, response: Parts { status: 404, version: HTTP/1.1, headers: {\"accept-ranges\": \"bytes\", \"content-length\": \"531\", \"content-type\": \"application/xml\", \"server\": \"MinIO\", \"strict-transport-security\": \"max-age=31536000; includeSubDomains\", \"vary\": \"Origin\", \"vary\": \"Accept-Encoding\", \"x-amz-id-2\": \"dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8\", \"x-amz-request-id\": \"1847410B819E3C06\", \"x-content-type-options\": \"nosniff\", \"x-ratelimit-limit\": \"4141\", \"x-ratelimit-remaining\": \"4141\", \"x-xss-protection\": \"1; mode=block\", \"x-amz-delete-marker\": \"true\", \"x-amz-version-id\": \"null\", \"date\": \"Mon, 09 Jun 2025 03:08:17 GMT\"} }, service: s3, path: data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet, range: 0-3653 } => S3Error { code: \"NoSuchKey\", message: \"The specified key does not exist.\", resource: \"/default/test-root/data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet\", request_id: \"1847410B819E3C06\" }\n"), elapsed: 994.798086ms, at src/meta-srv/src/procedure/region_migration/open_candidate_region.rs:176:22
2025-06-09T03:08:17.935076047Z stdout F 2025-06-09T03:08:17.934862Z  INFO LocalManager::submit_root_procedure: common_procedure::local::runner: Procedure metasrv-procedure::RegionMigration-ae2fd4a2-a94f-4bfb-be00-a50fc1266571 retry for the 1 times after 574 millis
2025-06-09T03:08:18.517190824Z stdout F 2025-06-09T03:08:18.517011Z  INFO LocalManager::submit_root_procedure: meta_srv::procedure::region_migration::open_candidate_region: Received open region reply: OpenRegion(SimpleReply { result: false, error: Some("0: Failed to handle request for region 4458176053248(1038, 0), at src/datanode/src/region_server.rs:944:14\n1: Failed to collect record batch stream, at src/metric-engine/src/metadata_region.rs:279:70\n2: External error, at src/mito2/src/read/seq_scan.rs:250:26\n3: OpenDAL operator failed, at src/mito2/src/sst/parquet/metadata.rs:91:14\n4: NotFound (persistent) at  => NotFound (permanent) at read, context: { uri: http://minio.minio.svc.cluster.local/default/test-root/data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet, response: Parts { status: 404, version: HTTP/1.1, headers: {\"accept-ranges\": \"bytes\", \"content-length\": \"531\", \"content-type\": \"application/xml\", \"server\": \"MinIO\", \"strict-transport-security\": \"max-age=31536000; includeSubDomains\", \"vary\": \"Origin\", \"vary\": \"Accept-Encoding\", \"x-amz-id-2\": \"dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8\", \"x-amz-request-id\": \"1847410BA456FDA5\", \"x-content-type-options\": \"nosniff\", \"x-ratelimit-limit\": \"4141\", \"x-ratelimit-remaining\": \"4141\", \"x-xss-protection\": \"1; mode=block\", \"x-amz-delete-marker\": \"true\", \"x-amz-version-id\": \"null\", \"date\": \"Mon, 09 Jun 2025 03:08:18 GMT\"} }, service: s3, path: data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet, range: 0-3653 } => S3Error { code: \"NoSuchKey\", message: \"The specified key does not exist.\", resource: \"/default/test-root/data/greptime/public/1038/1038_0000000000/metadata/c0f07f21-bd8e-40e8-824c-fc1b8474f0a7.parquet\", request_id: \"1847410BA456FDA5\" }\n") }), region: 4458176053248(1038, 0), elapsed: 7.149487ms
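
When reading the log above, it helps to relate the region identifier `4458176053248(1038, 0)` to the object path `data/greptime/public/1038/1038_0000000000/metadata/...`. The numbers in the log are consistent with the table ID occupying the upper 32 bits of the 64-bit region ID and the region number occupying the lower 32 bits. The following is a minimal sketch under that assumption. The helpers `decode_region_id`, `encode_region_id`, and `region_dir_name` are hypothetical names for illustration and are not GreptimeDB's API. The directory format is inferred only from the path segment seen in this log.

```rust
/// Splits a 64-bit region ID into (table_id, region_number).
/// Assumption: table ID in the upper 32 bits, region number in the lower 32 bits.
fn decode_region_id(region_id: u64) -> (u32, u32) {
    ((region_id >> 32) as u32, region_id as u32)
}

/// Inverse of `decode_region_id` under the same layout assumption.
fn encode_region_id(table_id: u32, region_number: u32) -> u64 {
    ((table_id as u64) << 32) | region_number as u64
}

/// Builds a directory name like the one in the logged object path.
/// Assumption: `<table_id>_<region_number zero-padded to 10 digits>`.
fn region_dir_name(table_id: u32, region_number: u32) -> String {
    format!("{}_{:010}", table_id, region_number)
}

fn main() {
    // Region ID taken from the log output in this report.
    let (table_id, region_number) = decode_region_id(4458176053248);
    println!("table_id={table_id}, region_number={region_number}");
    println!("dir={}", region_dir_name(table_id, region_number));
}
```

With the region ID from the log, this yields table ID 1038 and region number 0, which matches the `1038/1038_0000000000` segment of the missing object's path.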
@WenyXu WenyXu added the C-bug Category Bugs label Jun 9, 2025
@WenyXu WenyXu self-assigned this Jun 10, 2025