Benchmark Protocol

Old Benchmark Architecture was Intractable

The benchmarking strategy in version 1 was plain brute force: (8 WorkGroups) * (12 ThreadTiles) * (4 NumLoadsCoalescedAs) * (4 NumLoadsCoalescedBs) * (3 LoopUnrolls) * (5 BranchTypes) * ... * (1024 ProblemSizes) = 23,592,960 kernel enqueues. This multiplicative series grows very quickly: adding one more boolean parameter doubles the number of kernel enqueues in the benchmark.
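
The arithmetic behind the brute-force count can be checked with a short sketch. It uses only the factors written out above (the elided "..." terms are left out), and the extra boolean parameter is hypothetical, included just to show the doubling.

```python
from math import prod

# Factors written out in the text; the elided "..." terms are left out.
visible_factors = {
    "WorkGroups": 8,
    "ThreadTiles": 12,
    "NumLoadsCoalescedAs": 4,
    "NumLoadsCoalescedBs": 4,
    "LoopUnrolls": 3,
    "BranchTypes": 5,
    "ProblemSizes": 1024,
}

# Brute force: every value of every parameter is combined with every other.
brute_force = prod(visible_factors.values())
print(f"{brute_force:,}")  # 23,592,960

# A hypothetical extra boolean parameter multiplies the whole series by 2.
with_boolean = brute_force * 2
print(f"{with_boolean:,}")  # 47,185,920
```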

Incremental Benchmark is Faster

Tensile version 2 lets the user manually break up the multiplicative series, replacing some of the multiplications with additions: (8 WorkGroups) * (12 ThreadTiles) + (4 NumLoadsCoalescedAs) * (4 NumLoadsCoalescedBs) * (3 LoopUnrolls) + (5 BranchTypes) * ... + (1024 ProblemSizes) = 1,173, a dramatically smaller number of enqueues. Now, adding one more boolean parameter may add only 2 more enqueues.
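
The sum-of-products structure can be sketched the same way. The grouping below mirrors where the "+" signs fall in the expression above and counts only the factors written out there; the extra boolean parameter is again hypothetical and is assumed to be benchmarked as a group of its own.

```python
from math import prod

# Parameters inside a group are multiplied; the groups are then added.
groups = [
    [8, 12],    # WorkGroups * ThreadTiles
    [4, 4, 3],  # NumLoadsCoalescedAs * NumLoadsCoalescedBs * LoopUnrolls
    [5],        # BranchTypes (elided "..." factors left out)
    [1024],     # ProblemSizes
]

incremental = sum(prod(group) for group in groups)
print(f"{incremental:,}")

# A hypothetical boolean parameter benchmarked as its own group
# contributes its 2 values as an added term instead of a multiplier.
with_boolean = sum(prod(group) for group in groups + [[2]])
print(with_boolean - incremental)  # 2
```

If the boolean were instead placed inside an existing group, it would double only that group's term, which is why the text says it "may" add only 2 enqueues.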

Phases of Benchmark

To make Tensile's programmability more manageable for both users and developers, the benchmarking protocol is split into several phases, encoded in a config.yaml file. The sections below refer to the following config.yaml:

BenchmarkProblems:
  - ProblemType:
      OperationType: GEMM
    
    InitialSolutionParameters:
      - WorkGroupShape: [ 0 ]
      - NumLoadsCoalescedA: [ 1 ]
      - NumLoadsCoalescedB: [ 1 ]
      - WorkGroupEdge: [ 16 ]
      - ThreadTileEdge: [ 4 ]

    BenchmarkCommonParameters:
      - ProblemSizes: [ [512], [512], [512] ]
      - WorkGroupShape: [ -1, 0, 1 ]
        ThreadTileShape: [ -1, 0, 1 ]
    ForkParameters:
      - WorkGroupEdge: [8, 16]
      - ThreadTileEdge: [2, 4, 8 ]
    BenchmarkForkParameters:
      - ProblemSizes: [ [2880], [2880], [2880] ]
      - NumLoadsCoalescedA: [ 1, 2, 3, 4, 6 ]
      - NumLoadsCoalescedB: [ 1, 2, 3, 4, 6 ]
    JoinParameters:
      - MacroTile
    BenchmarkJoinParameters:
      - LoopUnroll: [8, 16]
    BenchmarkFinalParameters:
      - ProblemSizes: [ [16, 128], [16, 128], [256] ]
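
As a rough illustration of how this config keeps the search small, the sketch below counts the candidates each step would generate. It rests on an assumption about how the file is read: each "-" list item is treated as one step, and parameters that share a list item are swept jointly as a Cartesian product. The `phases` dictionary is a hand transcription of part of the config above (ProblemSizes and JoinParameters entries are omitted), and `step_candidates` is a hypothetical helper, not part of Tensile.

```python
from itertools import product

# Hand-transcribed subset of the config.yaml above.
# Each phase is a list of steps; each step maps parameter names to values.
phases = {
    "BenchmarkCommonParameters": [
        {"WorkGroupShape": [-1, 0, 1], "ThreadTileShape": [-1, 0, 1]},
    ],
    "ForkParameters": [
        {"WorkGroupEdge": [8, 16]},
        {"ThreadTileEdge": [2, 4, 8]},
    ],
    "BenchmarkForkParameters": [
        {"NumLoadsCoalescedA": [1, 2, 3, 4, 6]},
        {"NumLoadsCoalescedB": [1, 2, 3, 4, 6]},
    ],
    "BenchmarkJoinParameters": [
        {"LoopUnroll": [8, 16]},
    ],
}


def step_candidates(step):
    """Expand one step into every combination of its parameter values."""
    names = list(step)
    value_lists = (step[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Number of candidates per step, assuming steps run one after another.
candidates_per_step = {
    phase: [len(step_candidates(step)) for step in steps]
    for phase, steps in phases.items()
}

for phase, counts in candidates_per_step.items():
    print(phase, counts)
```

Under this reading, WorkGroupShape and ThreadTileShape share a list item and so expand to 3 * 3 = 9 joint candidates, while NumLoadsCoalescedA and NumLoadsCoalescedB sit in separate list items and contribute 5 candidates each.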
