Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
(blog.kog.ai)
125 points
by NicoConstant
6 hours ago
64 comments