📝 Note: As an exception, I include one and only one image dataset, because of its scale (700K scenes) and the remarkable improvement in depth estimation achieved by the Depth Anything V2 ViT-B model fine-tuned on MegaSynth and evaluated on Hypersim. See the results in Table 6.
| | Dataset | Venue | Resolution |
|---|---|---|---|
| 1 | MegaSynth | | 512×512 |
| | Dataset | Venue | Resolution | BoT | C3R | D2U | DP | GC | MoG | POM | RD | UD2 | VDA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Spring | | 1920×1080 | - | T | T | E | T | T | - | - | - | - |
| 2 | HorizonGS | | 1920×1080 | - | - | - | - | - | - | - | - | - | - |
| 3 | MVS-Synth | | 1920×1080 | - | T | - | T | T | T | - | - | - | - |
| 4 | Mid-Air | | 1024×1024 | - | - | - | - | T | T | - | - | - | - |
| 5 | MatrixCity | | 1000×1000 | - | - | - | - | T | T | - | - | T | - |
| 6 | SAIL-VOS 3D | | 1280×800 | - | - | - | T | - | - | - | - | - | - |
| 7 | BEDLAM | | 1280×720 | - | T | - | T | - | - | - | - | T | - |
| 8 | Dynamic Replica | | 1280×720 | - | T | - | T | T | - | T | - | T | - |
| 9 | BlinkVision | | 960×540 | - | - | T | - | - | - | - | - | - | - |
| 10 | PointOdyssey | | 960×540 | - | T | T | - | - | - | T | E | T | T |
| 11 | DyDToF | | 960×540 | - | - | - | - | - | - | - | E | - | - |
| 12 | IRS | (to do) | 960×540 | - | T | - | T | T | T | - | - | - | T |
| 13 | Scene Flow | | 960×540 | - | - | - | - | E | - | - | - | - | - |
| 14 | THUD++ | | 730×530 | - | - | - | - | - | - | - | - | - | - |
| 15 | 3D Ken Burns | | 512×512 | - | T | - | T | T | T | - | - | - | - |
| 16 | TartanAir | (to do) | 640×480 | - | T | T | T | T | T | T | T | T | T |
| 17 | ParallelDomain-4D | | 640×480 | - | - | - | - | - | - | T | - | - | - |
| 18 | GTA-SfM | (to do) | 640×480 | - | - | - | - | T | T | - | - | - | - |
| 19 | InteriorNet | | 640×480 | - | - | - | - | - | - | - | - | - | - |
| 20 | MPI Sintel | | 1024×436 | E | E | E | E | E | E | E | - | E | E |
| 21 | Virtual KITTI 2 | | 1242×375 | - | T | - | T | T | - | - | - | - | T |
| 22 | TartanAir Shibuya | | 640×360 | E | - | - | - | - | - | - | - | - | - |
| | Total: T (training) | | | 0 | 9 | 4 | 8 | 10 | 8 | 4 | 1 | 5 | 4 |
| | Total: E (testing) | | | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 1 |
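The two totals rows above are simple per-column counts of the T and E marks. A minimal sketch of how they can be recomputed from the table body (the `rows` list is a hypothetical one-row excerpt, not the repository's actual tooling):

```python
# Recompute the per-model totals from the markdown table body.
# Cells: 0 = row number, 1 = dataset, 2 = venue, 3 = resolution, 4.. = marks.
models = ["BoT", "C3R", "D2U", "DP", "GC", "MoG", "POM", "RD", "UD2", "VDA"]
rows = [
    "| 1 | Spring | | 1920×1080 | - | T | T | E | T | T | - | - | - | - |",
]

t_total = dict.fromkeys(models, 0)
e_total = dict.fromkeys(models, 0)
for row in rows:
    cells = [c.strip() for c in row.strip("|").split("|")]
    for model, mark in zip(models, cells[4:]):
        if mark == "T":
            t_total[model] += 1
        elif mark == "E":
            e_total[model] += 1

print(t_total)  # per-model count of training uses in this excerpt
print(e_total)  # per-model count of testing uses in this excerpt
```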
- Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel ≤ 0.079
- NYU-Depth V2: AbsRel ≤ 0.0424 (relative depth)
- NYU-Depth V2: AbsRel ≤ 0.051 (metric depth)
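For reference, AbsRel (absolute relative error) is the standard metric behind these thresholds: the mean of |pred − gt| / gt over pixels with valid ground truth. A minimal sketch (the function and mask handling are illustrative, not the benchmarks' official evaluation code):

```python
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative error over pixels with valid ground truth."""
    valid = gt > 0  # exclude missing/invalid depth values
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Toy example: a 2×2 depth map with one invalid ground-truth pixel.
gt = np.array([[2.0, 4.0], [0.0, 5.0]])
pred = np.array([[2.2, 3.8], [1.0, 5.5]])
print(abs_rel(pred, gt))  # (0.1 + 0.05 + 0.1) / 3 ≈ 0.0833
```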
- Appendix 1: Rules for qualifying models for the rankings (to do)
- Appendix 2: Metrics selection for the rankings (to do)
- Appendix 3: List of all research papers from the above rankings
| RK | Model (Links: Venue, Repository) | LPIPS ↓ {Input fr.}<br>Table 1: M2SVid |
|---|---|---|
| 1 | M2SVid | 0.180 {MF} |
| 2 | SVG | 0.217 {MF} |
| 3 | StereoCrafter | 0.242 {MF} |
📝 Note: 1) See Figure 4. 2) The ranking order is determined first by a direct comparison of the two models' scores in the same paper. If no paper provides such a direct comparison, or different papers disagree, the order is determined by the better of the two models' best scores across all papers listed as data sources in the columns. The DepthCrafter rank is based on the latest version, 1.0.1.
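LPIPS in the table above is the learned perceptual similarity metric of Zhang et al. As a rough illustration (not the papers' exact evaluation pipeline), it can be computed with the `lpips` PyPI package, which expects RGB tensors scaled to [-1, 1]:

```python
import torch
import lpips  # pip install lpips

# AlexNet backbone is the authors' recommended default for scoring.
loss_fn = lpips.LPIPS(net='alex')

# Dummy frames standing in for a generated and a ground-truth view,
# shape (N, 3, H, W), values scaled to [-1, 1].
pred = torch.rand(1, 3, 256, 256) * 2 - 1
gt = torch.rand(1, 3, 256, 256) * 2 - 1

with torch.no_grad():
    score = loss_fn(pred, gt)  # lower is better
print(score.item())
```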
📝 Note: The ranking order is determined first by a direct comparison of the two models' scores in the same paper. If no paper provides such a direct comparison, or different papers disagree, the order is determined by the better of the two models' best scores across all papers listed as data sources in the columns. The Metric3D v2 ViT-Large rank does not rely on the 0.134 score, which is probably just an anomaly.
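A minimal sketch of the tie-break logic described in these notes (data structures and names are hypothetical, not the repository's actual tooling): a direct same-paper comparison wins; otherwise each model's best score across all source papers decides.

```python
def rank_pair(a: str, b: str, direct: dict, best: dict,
              lower_is_better: bool = True) -> str:
    """Return the better-ranked of two models.

    1) If some paper compares both models directly (and papers agree),
       direct[(a, b)] records whether `a` won; use that verdict.
    2) Otherwise fall back to each model's best score across all papers
       used as data sources in the ranking columns.
    """
    if (a, b) in direct:
        return a if direct[(a, b)] else b
    if lower_is_better:
        return a if best[a] <= best[b] else b
    return a if best[a] >= best[b] else b

# Example with the LPIPS scores from the table above (lower is better):
print(rank_pair("M2SVid", "SVG", direct={}, best={"M2SVid": 0.180, "SVG": 0.217}))
# -> M2SVid
```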