Benchmark Protocol

David Tanner edited this page Feb 22, 2017 · 12 revisions

Old Benchmark Architecture was Intractable

The benchmarking strategy in version 1 was plain brute force: (8 WorkGroups)* (12 ThreadTiles)* (4 NumLoadsCoalescedAs)* (4 NumLoadsCoalescedBs)* (3 LoopUnrolls)* (5 BranchTypes)* ...*(1024 ProblemSizes)=23,592,960 kernel enqueues. This is a multiplicative series, which grows very quickly: adding just one more boolean parameter doubles the number of kernel enqueues in the benchmark.
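
The arithmetic above can be checked with a short Python sketch. The factor counts are copied from the example; the dictionary keys, the function name, and the extra "HypotheticalBoolean" parameter are illustrative assumptions and are not Tensile identifiers.

```python
from math import prod

# Factor counts copied from the version 1 example above.
# The names are only labels for this sketch, not Tensile API.
v1_factors = {
    "WorkGroups": 8,
    "ThreadTiles": 12,
    "NumLoadsCoalescedAs": 4,
    "NumLoadsCoalescedBs": 4,
    "LoopUnrolls": 3,
    "BranchTypes": 5,
    "ProblemSizes": 1024,
}


def brute_force_enqueues(factors):
    """Brute force benchmarks every combination, so the count is a product."""
    return prod(factors.values())


baseline = brute_force_enqueues(v1_factors)

# Hypothetical extra boolean parameter (2 values), added only for illustration.
with_boolean = brute_force_enqueues({**v1_factors, "HypotheticalBoolean": 2})

print(baseline)      # 23592960
print(with_boolean)  # 47185920, exactly twice the baseline
```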

Incremental Benchmark is Faster

Tensile version 2 lets the user manually break up the multiplicative series by replacing some of the multiplications with additions, i.e., (8 WorkGroups)* (12 ThreadTiles)+ (4 NumLoadsCoalescedAs)* (4 NumLoadsCoalescedBs)* (3 LoopUnrolls)+ (5 BranchTypes)* ...+(1024 ProblemSizes)=1,151, which is a dramatically smaller number of enqueues. Now, adding one more boolean parameter may add only 2 more enqueues.
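
A minimal Python sketch of the sum-of-products idea follows. It uses only the factors printed in the expression above; the elided "..." factors are left out, so the total it prints is illustrative rather than a reproduction of the quoted figure. The function name and the grouping into a list of phases are assumptions made for this sketch.

```python
from math import prod


def incremental_enqueues(phases):
    """Each phase is benchmarked on its own: multiply within a phase, add across phases."""
    return sum(prod(phase) for phase in phases)


# Groups taken from the printed expression; elided factors are omitted.
phases = [
    [8, 12],    # WorkGroups * ThreadTiles
    [4, 4, 3],  # NumLoadsCoalescedAs * NumLoadsCoalescedBs * LoopUnrolls
    [5],        # BranchTypes
    [1024],     # ProblemSizes
]

base = incremental_enqueues(phases)

# A hypothetical boolean parameter benchmarked in its own phase adds 2 enqueues.
own_phase = incremental_enqueues(phases + [[2]])

# The same boolean multiplied into the first phase doubles only that phase.
joined = incremental_enqueues([[8, 12, 2]] + phases[1:])

print(own_phase - base)  # 2
print(joined - base)     # 96
```

Either way the cost of the new parameter stays local to one phase instead of doubling the whole benchmark.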

Phases of Benchmark

To keep Tensile's programmability manageable for both the user and the developer, the benchmarking protocol has been split into several steps, which are encoded in a config.yaml file. The sections below reference the following config.yaml:

BenchmarkProblems:
  - ProblemType:
      OperationType: GEMM
    
    InitialSolutionParameters:
      - WorkGroupShape: [ 0 ]
      - NumLoadsCoalescedA: [ 1 ]
      - NumLoadsCoalescedB: [ 1 ]
      - WorkGroupEdge: [ 16 ]
      - ThreadTileEdge: [ 4 ]

    BenchmarkCommonParameters:
      - ProblemSizes: [ [512], [512], [512] ]
      - WorkGroupShape: [ -1, 0, 1 ]
        ThreadTileShape: [ -1, 0, 1 ]
    ForkParameters:
      - WorkGroupEdge: [8, 16]
      - ThreadTileEdge: [2, 4, 8 ]
    BenchmarkForkParameters:
      - ProblemSizes: [ [2880], [2880], [2880] ]
      - NumLoadsCoalescedA: [ 1, 2, 3, 4, 6 ]
      - NumLoadsCoalescedB: [ 1, 2, 3, 4, 6 ]
    JoinParameters:
      - MacroTile
    BenchmarkJoinParameters:
      - LoopUnroll: [8, 16]
    BenchmarkFinalParameters:
      - ProblemSizes: [ [16, 128], [16, 128], [256] ]
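
As a rough illustration of how the ForkParameters entry above could expand, the Python sketch below enumerates every combination of the listed values. It assumes, as the name suggests but as this page does not spell out, that each combination becomes its own fork; the function name is hypothetical and the values are copied from the config by hand rather than parsed from YAML.

```python
from itertools import product

# Values copied by hand from the ForkParameters entry in the config above.
fork_parameters = {
    "WorkGroupEdge": [8, 16],
    "ThreadTileEdge": [2, 4, 8],
}


def expand_forks(parameters):
    """Assumed behaviour: one fork per combination of the listed values."""
    names = list(parameters)
    return [dict(zip(names, values)) for values in product(*parameters.values())]


forks = expand_forks(fork_parameters)

for fork in forks:
    print(fork)

print(len(forks))  # 6 combinations: 2 WorkGroupEdge values * 3 ThreadTileEdge values
```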

Benchmark Common Parameters

Fork Parameters

Benchmark Fork Parameters

Join Parameters

Benchmark Join Parameters

Benchmark Final Parameters
