AIML Special Presentation: Google Research Visit

Generative LLMs are transforming multiple industries and have proven robust across a multitude of use cases and settings. One of the key impediments to their widespread deployment is the cost of serving them and the difficulty of deploying them across diverse devices and settings. In this talk, Grace and Prateek discussed the key challenges in improving the efficiency of LLM serving and gave an overview of some of the key techniques for addressing them. They also discussed tandem transformers and HIRE, novel methods for speeding up decoding in LLMs.

Grace Chung (left) and Prateek Jain of Google during their AIML presentation, May 2024

Tagged in Artificial Intelligence, Large language models, Google