Real-time LLM Inference on Standard GPUs: 3k tokens/s per request (blog.kog.ai)

107 points

by NicoConstant

5 hours ago