Lossless LLM compression for efficient GPU inference via dynamic-length float
(arxiv.org)
347 points
by CharlesW
16 hours ago
107 comments