
NikhilRout/TheGEMMCoreProject


TheTensorCoreProject

SystemVerilog implementations of the GEMM operations performed by Nvidia's SIMT CUDA cores, Nvidia's Hybrid-Precision Tensor Cores, and the Systolic Array MXU in Google's TPU. These modules do not emulate the actual microarchitecture that executes CUDA or Tensor Core instructions; they perform the same arithmetic operations so that they can be used directly in FPGA designs.
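As a rough software illustration of the systolic-array style of GEMM mentioned above, here is a minimal Python model. It is not taken from the repository: the function name `systolic_matmul` is hypothetical, and the output-stationary dataflow (each processing element keeps its own accumulator while operands arrive skewed by row and column) is chosen only for brevity. It is not claimed to match the repository's MXU module or the TPU's actual dataflow.

```python
def systolic_matmul(a, b):
    """Cycle-by-cycle model of an output-stationary systolic array (assumed dataflow).

    PE(i, j) owns accumulator acc[i][j]. Row i of A enters from the left delayed
    by i cycles and column j of B enters from the top delayed by j cycles, so at
    cycle t PE(i, j) sees the operand pair with inner index k = t - i - j.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    total_cycles = m + n + k_dim - 2  # last pair reaches PE(m-1, n-1) at t = m+n+k_dim-3
    for t in range(total_cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j
                if 0 <= k < k_dim:  # PE is idle until its skewed operands arrive
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, total_cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)  # [[19, 22], [43, 50]] 4
```

The cycle count shows the usual fill-and-drain cost of a skewed array: a 2x2x2 product takes 4 cycles here rather than 2.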

Tensor Core Versions

TensorCore v0: Volta Architecture [FP16MUL FP32ADD]

Volta Tensor Core Architecture Diagram
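The FP16MUL FP32ADD combination can be modelled in a few lines of Python, which may be handy as a golden reference when checking a hardware implementation. This is only a sketch under assumptions: the helper names (`to_fp16`, `to_fp32`, `mma_fp16_fp32`) are hypothetical, rounding is round-to-nearest-even via Python's `struct` module, and the accumulator is rounded to FP32 after every add, which may not match the repository's adder-tree ordering.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mma_fp16_fp32(a, b, c):
    """D = A x B + C with FP16 multiplicands and an FP32 accumulator (assumed order)."""
    rows, k_dim, cols = len(a), len(b), len(b[0])
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = to_fp32(c[i][j])
            for k in range(k_dim):
                # The product of two FP16 values needs at most 22 significand
                # bits, so it is held exactly before the FP32 add.
                prod = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = to_fp32(acc + prod)
            d[i][j] = acc
    return d


# 2048 + 1 is not representable in FP16, but the FP32 accumulator keeps it.
print(to_fp16(2049.0))                                            # 2048.0
print(mma_fp16_fp32([[2048.0, 1.0]], [[1.0], [1.0]], [[0.0]]))    # [[2049.0]]
```

The last two lines show why the accumulator is wider than the multiplicands: the same sum held in FP16 would lose the low-order term.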

TensorCore v1: Ampere Architecture [TF32MUL FP32ADD / BF16MUL FP32ADD] + Fine-Grained Structured Sparsity

Ampere Tensor Core Architecture Diagram
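The two ideas in this version, reduced-mantissa multiplicands and 2:4 structured sparsity, can be sketched in Python as follows. Everything here is illustrative rather than the repository's design: the function names are hypothetical, the BF16/TF32 conversion uses plain truncation (an assumed rounding mode), and pruning keeps the two largest-magnitude values in each group of four along with their positions within the group.

```python
import struct


def truncate_mantissa(x, kept_bits):
    """Keep the top `kept_bits` of FP32's 23 mantissa bits (truncation is an assumption)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - kept_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def to_bf16(x):
    return truncate_mantissa(x, 7)    # 1 sign, 8 exponent, 7 mantissa bits


def to_tf32(x):
    return truncate_mantissa(x, 10)   # 1 sign, 8 exponent, 10 mantissa bits


def prune_2_of_4(row):
    """Compress a row to (values, indices): 2 kept values per group of 4."""
    assert len(row) % 4 == 0
    values, indices = [], []
    for base in range(0, len(row), 4):
        group = row[base:base + 4]
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        for i in sorted(by_magnitude[:2]):  # keep original order inside the group
            values.append(group[i])
            indices.append(i)               # position within the group (fits in 2 bits)
    return values, indices


def sparse_dot(values, indices, dense):
    """Dot product that uses the stored indices to pick the matching dense operands."""
    total = 0.0
    for n, (v, idx) in enumerate(zip(values, indices)):
        total += v * dense[(n // 2) * 4 + idx]
    return total


vals, idxs = prune_2_of_4([0.1, -2.0, 0.0, 3.0, 1.0, 0.0, 0.0, -0.5])
print(vals, idxs)                                    # [-2.0, 3.0, 1.0, -0.5] [1, 3, 0, 3]
print(sparse_dot(vals, idxs, [1, 2, 3, 4, 5, 6, 7, 8]))  # 9.0
print(to_bf16(1.0009765625), to_tf32(1.0009765625))  # 1.0 1.0009765625
```

The compressed row holds half as many values as the dense one, so the sparse dot product performs half the multiplications; the indices play the role of the selection metadata.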

TensorCore v2: Hopper Architecture [FP8(E5M2/E4M3)MUL FP16ADD]

Hopper Tensor Core Architecture Diagram
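For reference, the two FP8 encodings can be decoded with a short Python model. This sketch assumes the commonly used definitions (E5M2 with bias 15 and IEEE-style infinities and NaNs; E4M3 with bias 7, no infinities, and NaN only when all exponent and mantissa bits are set); the repository's modules may treat special values differently. The function names are hypothetical, and the FP16 accumulation rounds after every add and does not handle overflow.

```python
import math
import struct


def decode_fp8(byte, exp_bits, man_bits, bias, ieee_specials):
    """Decode one FP8 byte to a Python float under the assumed encoding."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    if ieee_specials and exp == exp_max:
        return sign * math.inf if man == 0 else math.nan
    if not ieee_specials and exp == exp_max and man == (1 << man_bits) - 1:
        return math.nan
    if exp == 0:  # subnormal: no implicit leading one
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e5m2(byte):
    return decode_fp8(byte, 5, 2, 15, True)


def decode_e4m3(byte):
    return decode_fp8(byte, 4, 3, 7, False)


def to_fp16(x):
    """Round to half precision; struct raises OverflowError if x is out of range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_fp8_fp16(a_bytes, b_bytes, decode):
    """FP8 multiplies feeding an FP16 accumulator (rounding per add is assumed)."""
    acc = 0.0
    for a, b in zip(a_bytes, b_bytes):
        acc = to_fp16(acc + decode(a) * decode(b))
    return acc


print(decode_e4m3(0x7E))   # 448.0, largest finite E4M3 value
print(decode_e5m2(0x7B))   # 57344.0, largest finite E5M2 value
print(dot_fp8_fp16([0x38, 0x40], [0x38, 0x38], decode_e4m3))  # 3.0
```

E4M3 trades range for an extra mantissa bit, while E5M2 shares FP16's exponent width, which is why the two are usually offered side by side.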
