AIML Research Seminar: Improving Efficiency of Foundation Models

Large deep learning models, or foundation models, such as ChatGPT and GPT-4 have been the key driver of the recent wave of AI breakthroughs, with huge social and economic impact. However, even GPT-3 (the predecessor of ChatGPT) was trained on roughly half a trillion words and has 175 billion parameters, requiring enormous computing resources and energy consumption.

While the scale of large AI models keeps increasing, improving training, inference and deployment efficiency becomes ever more pressing in order to make large models energy-friendly, accessible and deployable across diverse edge devices and deployment scenarios. In this talk, Prof Cai introduced his group's work along this line, particularly on sparse fine-tuning of foundation models and on their recently developed elastic deep learning model, stitchable neural networks, together with its extensions.
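To give a flavour of the elastic-model idea mentioned above, the sketch below illustrates the core concept behind stitching pretrained networks: a small learned stitching layer maps activations from the early blocks of one pretrained model into the later blocks of another, so a single elastic network can trade accuracy for compute at deployment time by choosing where to stitch. This is a minimal PyTorch sketch of the general technique, not the group's released implementation; the module names, dimensions and toy anchors are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class StitchedModel(nn.Module):
    """Joins the front of a small pretrained model to the back of a larger one
    via a learned linear stitching layer (illustrative sketch only)."""

    def __init__(self, small_front: nn.Module, large_back: nn.Module,
                 small_dim: int, large_dim: int):
        super().__init__()
        self.small_front = small_front                  # early blocks of a small anchor model
        self.stitch = nn.Linear(small_dim, large_dim)   # learned stitching layer
        self.large_back = large_back                    # later blocks of a large anchor model

    def forward(self, x):
        x = self.small_front(x)      # cheap early features
        x = self.stitch(x)           # project to the larger model's feature width
        return self.large_back(x)    # finish with the larger model's capacity


# Hypothetical usage: stitch a 192-dim front onto a 384-dim back; in practice the
# two anchors would be frozen pretrained models and only the stitch is tuned.
small_front = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 192), nn.ReLU())
large_back = nn.Sequential(nn.Linear(384, 384), nn.ReLU(), nn.Linear(384, 10))
model = StitchedModel(small_front, large_back, small_dim=192, large_dim=384)

logits = model(torch.randn(8, 1, 28, 28))
print(logits.shape)  # torch.Size([8, 10])
```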

Tagged in Artificial Intelligence, Large language models