To determine GPU requirements for serving LLMs (like Llama 3 70B), we need to estimate the required GPU memory using this formula:
M = ((P × 4B) / (32/Q)) × 1.2
Where:
- M = GPU memory required (in GB)
- P = Number of model parameters
- 4B = Bytes per parameter (4)
- 32 = Bits in 4 bytes
- Q = Quantization bits (16, 8, or 4)
- 1.2 = 20% overhead factor for additional GPU memory
For a 70B parameter model at 8-bit quantization:
M = ((70B × 4) / (32/8)) × 1.2 = (280 GB / 4) × 1.2 = 84 GB
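As a quick sanity check, here is a minimal Python sketch of the same estimate (the function name and defaults are illustrative, not from any library):

```python
def estimate_gpu_memory_gb(params_billion: float, quant_bits: int = 16,
                           overhead: float = 1.2) -> float:
    """Estimate GPU memory (GB) needed to serve a model.

    params_billion: model size in billions of parameters
    quant_bits:     quantization precision Q (16, 8, or 4)
    overhead:       multiplier for the ~20% extra memory
    """
    fp32_size_gb = params_billion * 4            # 4 bytes per parameter at fp32
    quantized_gb = fp32_size_gb / (32 / quant_bits)  # scale down from 32 bits to Q bits
    return quantized_gb * overhead

# Llama 3 70B:
print(estimate_gpu_memory_gb(70, 16))  # ~168 GB at 16-bit
print(estimate_gpu_memory_gb(70, 8))   # ~84 GB at 8-bit
```

Note that at 16-bit precision the same model needs roughly 168 GB, which is why serving a 70B model unquantized typically means splitting it across multiple GPUs (e.g., two 80 GB cards).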
- `pnpm install` - Install dependencies
- `pnpm run dev` - Start development server
- `pnpm run lint` - Lint source files