
Commit be51376

Results from GH action on NVIDIA_RTX4090x2
1 parent e2bfe98 commit be51376

56 files changed: +20712 -20713 lines changed

open/MLCommons/measurements/RTX4090x2-nvidia-gpu-TensorRT-default_config/retinanet/multistream/README.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ pip install -U mlcflow
 
 mlc rm cache -f
 
-mlc pull repo mlcommons@mlperf-automations --checkout=03d9201c1c9305c7c3eaa0262984af76c7f2287f
+mlc pull repo mlcommons@mlperf-automations --checkout=6a917925e946fcf6a1511578ba101067d4a88532
 
 
 ```
@@ -40,4 +40,4 @@ Model Precision: int8
 ### Accuracy Results
 
 ### Performance Results
-`Samples per query`: `5646056.0`
+`Samples per query`: `5645863.0`
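The `Samples per query` delta between the two commits is a quick sanity check; a minimal sketch of that arithmetic, using the two values from the diff above (the noise interpretation is an illustrative assumption, not an MLPerf validity rule):

```python
# Samples-per-query values from the old and new runs in this diff
old_spq = 5646056.0
new_spq = 5645863.0

# Relative change between runs: roughly -0.0034%,
# i.e. ordinary run-to-run variation rather than a regression
rel_change = (new_spq - old_spq) / old_spq
print(f"relative change: {rel_change:+.6%}")
```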
@@ -1,8 +1,8 @@
-[2025-02-02 02:07:19,844 main.py:229 INFO] Detected system ID: KnownSystem.dd805e2fec5f
-[2025-02-02 02:07:19,926 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
-[2025-02-02 02:07:19,927 generate_conf_files.py:107 INFO] Generated measurements/ entries for dd805e2fec5f_TRT/retinanet/MultiStream
-[2025-02-02 02:07:19,927 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_14157262/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=true --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/2daada4f809841509c848f618114c672.conf" --gpu_engines="./build/engines/dd805e2fec5f/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model retinanet --response_postprocess openimageeffnms
-[2025-02-02 02:07:19,927 __init__.py:53 INFO] Overriding Environment
+[2025-02-03 01:47:25,289 main.py:229 INFO] Detected system ID: KnownSystem.Nvidia_9babf6fca247
+[2025-02-03 01:47:25,368 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
+[2025-02-03 01:47:25,368 generate_conf_files.py:107 INFO] Generated measurements/ entries for Nvidia_9babf6fca247_TRT/retinanet/MultiStream
+[2025-02-03 01:47:25,368 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="AccuracyOnly" --gpu_copy_streams=1 --gpu_inference_streams=1 --use_deque_limit=true --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/home/mlcuser/MLC/repos/local/cache/get-git-repo_14157262/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=true --user_conf_path="/home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/9a9fd2e8ec244924a49c0072a84549ac.conf" --gpu_engines="./build/engines/Nvidia_9babf6fca247/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario MultiStream --model retinanet --response_postprocess openimageeffnms
+[2025-02-03 01:47:25,368 __init__.py:53 INFO] Overriding Environment
 benchmark : Benchmark.Retinanet
 buffer_manager_thread_count : 0
 data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_5aab030f/data
@@ -12,7 +12,7 @@ gpu_copy_streams : 1
 gpu_inference_streams : 1
 input_dtype : int8
 input_format : linear
-log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_0ab377fc/repo/closed/NVIDIA/build/logs/2025.02.02-02.07.18
+log_dir : /home/mlcuser/MLC/repos/local/cache/get-git-repo_0ab377fc/repo/closed/NVIDIA/build/logs/2025.02.03-01.47.23
 map_path : data_maps/open-images-v6-mlperf/val_map.txt
 mlperf_conf_path : /home/mlcuser/MLC/repos/local/cache/get-git-repo_14157262/inference/mlperf.conf
 multi_stream_expected_latency_ns : 0
@@ -21,14 +21,14 @@ multi_stream_target_latency_percentile : 99
 precision : int8
 preprocessed_data_dir : /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_5aab030f/preprocessed_data
 scenario : Scenario.MultiStream
-system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.33452799999998, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334528000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='dd805e2fec5f')
+system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='Intel(R) Xeon(R) w7-2495X', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=24, threads_per_core=2): 1}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=197.33452799999998, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=197334528000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=450.0, pci_id='0x268410DE', compute_sm=89): 1, GPU(name='NVIDIA GeForce RTX 4090', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=23.98828125, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=25757220864), max_power_limit=500.0, pci_id='0x268410DE', compute_sm=89): 1})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=1), system_id='Nvidia_9babf6fca247')
 tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
 test_mode : AccuracyOnly
 use_deque_limit : True
 use_graphs : True
-user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/2daada4f809841509c848f618114c672.conf
-system_id : dd805e2fec5f
-config_name : dd805e2fec5f_retinanet_MultiStream
+user_conf_path : /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/9a9fd2e8ec244924a49c0072a84549ac.conf
+system_id : Nvidia_9babf6fca247
+config_name : Nvidia_9babf6fca247_retinanet_MultiStream
 workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
 optimization_level : plugin-enabled
 num_profiles : 1
@@ -40,81 +40,81 @@ power_limit : None
 cpu_freq : None
 &&&& RUNNING Default_Harness # ./build/bin/harness_default
 [I] mlperf.conf path: /home/mlcuser/MLC/repos/local/cache/get-git-repo_14157262/inference/mlperf.conf
-[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/2daada4f809841509c848f618114c672.conf
+[I] user.conf path: /home/mlcuser/MLC/repos/mlcommons@mlperf-automations/script/generate-mlperf-inference-user-conf/tmp/9a9fd2e8ec244924a49c0072a84549ac.conf
 Creating QSL.
 Finished Creating QSL.
 Setting up SUT.
 [I] [TRT] Loaded engine size: 73 MiB
 [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 125, GPU 881 (MiB)
 [I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 127, GPU 891 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
-[I] Device:0.GPU: [0] ./build/engines/dd805e2fec5f/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
+[I] Device:0.GPU: [0] ./build/engines/Nvidia_9babf6fca247/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
 [I] [TRT] Loaded engine size: 73 MiB
 [W] [TRT] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 160, GPU 624 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 161, GPU 634 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 160, GPU 626 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 161, GPU 636 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +69, now: CPU 0, GPU 137 (MiB)
-[I] Device:1.GPU: [0] ./build/engines/dd805e2fec5f/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
+[I] Device:1.GPU: [0] ./build/engines/Nvidia_9babf6fca247/retinanet/MultiStream/retinanet-MultiStream-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
 [E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 89, GPU 893 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 89, GPU 893 (MiB)
 [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 89, GPU 901 (MiB)
 [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +1528, now: CPU 1, GPU 1665 (MiB)
-[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 89, GPU 636 (MiB)
-[I] [TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 90, GPU 644 (MiB)
-[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1527, now: CPU 1, GPU 3192 (MiB)
+[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 90, GPU 638 (MiB)
+[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 90, GPU 646 (MiB)
+[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1528, now: CPU 1, GPU 3193 (MiB)
 [I] Start creating CUDA graphs
 [I] Capture 2 CUDA graphs
 [I] Capture 2 CUDA graphs
 [I] Finish creating CUDA graphs
 [I] Creating batcher thread: 0 EnableBatcherThreadPerDevice: false
 Finished setting up SUT.
 Starting warmup. Running for a minimum of 5 seconds.
-Finished warmup. Ran for 5.14387s.
+Finished warmup. Ran for 5.14302s.
 Starting running actual test.
 
 No warnings encountered during test.
 
 No errors encountered during test.
 Finished running actual test.
 Device Device:0.GPU processed:
-6204 batches of size 2
+6196 batches of size 2
 Memcpy Calls: 0
 PerSampleCudaMemcpy Calls: 0
-BatchedCudaMemcpy Calls: 6204
+BatchedCudaMemcpy Calls: 6196
 Device Device:1.GPU processed:
-6188 batches of size 2
+6196 batches of size 2
 Memcpy Calls: 0
 PerSampleCudaMemcpy Calls: 0
-BatchedCudaMemcpy Calls: 6188
+BatchedCudaMemcpy Calls: 6196
 &&&& PASSED Default_Harness # ./build/bin/harness_default
-[2025-02-02 02:07:58,440 run_harness.py:166 INFO] Result: Accuracy run detected.
-[2025-02-02 02:07:58,440 __init__.py:46 INFO] Running command: python3 /home/mlcuser/MLC/repos/local/cache/get-git-repo_0ab377fc/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-openimages.py --mlperf-accuracy-file /mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy/mlperf_log_accuracy.json --openimages-dir /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_5aab030f/preprocessed_data/open-images-v6-mlperf --output-file build/retinanet-results.json
+[2025-02-03 01:48:01,778 run_harness.py:166 INFO] Result: Accuracy run detected.
+[2025-02-03 01:48:01,778 __init__.py:46 INFO] Running command: python3 /home/mlcuser/MLC/repos/local/cache/get-git-repo_0ab377fc/repo/closed/NVIDIA/build/inference/vision/classification_and_detection/tools/accuracy-openimages.py --mlperf-accuracy-file /mlc-mount/home/arjun/gh_action_results/valid_results/RTX4090x2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/multistream/accuracy/mlperf_log_accuracy.json --openimages-dir /home/mlcuser/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_5aab030f/preprocessed_data/open-images-v6-mlperf --output-file build/retinanet-results.json
 loading annotations into memory...
 Done (t=0.44s)
 creating index...
 index created!
 Loading and preparing results...
-DONE (t=17.83s)
+DONE (t=17.74s)
 creating index...
 index created!
 Running per image evaluation...
 Evaluate annotation type *bbox*
-DONE (t=132.09s).
+DONE (t=131.70s).
 Accumulating evaluation results...
-DONE (t=31.97s).
+DONE (t=32.28s).
 Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.373
-Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.522
+Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.523
 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.403
 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.023
 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.125
 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.413
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.419
-Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.599
+Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.598
 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.628
-Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.083
+Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.082
 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.344
-Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.678
-mAP=37.330%
+Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.677
+mAP=37.323%
 
 ======================== Result summaries: ========================
 
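Logs in this format are compared across commits by eye above; a minimal sketch of pulling the headline accuracy numbers out of such a console log programmatically (the regexes are assumptions based on the log lines shown, not an official MLPerf parser):

```python
import re

def extract_metrics(log_text: str) -> dict:
    """Extract mAP and the headline COCO AP from a harness console log.

    Assumes the log contains lines like those shown above, e.g.
    'mAP=37.323%' and 'Average Precision (AP) @[ ... ] = 0.373'.
    """
    metrics = {}
    # 'mAP=37.323%' -> 37.323
    m = re.search(r"mAP=([\d.]+)%", log_text)
    if m:
        metrics["mAP_percent"] = float(m.group(1))
    # First 'Average Precision (AP) @[ ... ] = 0.373' line is the
    # headline COCO AP (IoU=0.50:0.95, area=all)
    m = re.search(r"Average Precision\s+\(AP\).*?\]\s*=\s*([\d.]+)", log_text)
    if m:
        metrics["AP"] = float(m.group(1))
    return metrics

sample = ("Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.373\n"
          "mAP=37.323%")
print(extract_metrics(sample))  # {'AP': 0.373, 'mAP_percent': 37.323}
```

A helper like this makes diffs such as the 37.330% vs 37.323% mAP change above easy to flag automatically in CI.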