What are the latest developments in AI?
By Professor Ian Reid, Head of the School of Computer Science at the University of Adelaide, Professor at the Australian Institute for Machine Learning.
This article is an extract from Artificial intelligence: your questions answered, a report published in partnership with the Australian Strategic Policy Institute (ASPI).
AI offers us unprecedented opportunities to crack old nuts with new tools—that is, to apply new technologies to solve some of the fundamental problems affecting society. In 2022, we’re seeing rapid development in approaches that offer better ways to train AI through machine learning.
Foundation models are a new kind of AI trained with staggering amounts of data. They aren’t stand-alone software products in themselves; rather, they form a backbone capability upon which various software tools can be built (hence the adjective ‘foundation’). One example is Google’s BERT (2018), which is a model for natural language processing and understanding trained on 3.3 billion words of written English.
The original key breakthrough in the recent surge in deep learning came from training deep network architectures from the 1980s and 1990s using what we then considered to be huge datasets. But what we understand to be ‘huge’ is growing quickly. ImageNet (2009) (1) is a database of 14 million labelled photographs that’s used to test and train visual object-recognition software. AlexNet (2012) is a neural network architecture that has 61 million parameters. Just eight years later, another large foundational model, the text processing model GPT-3 (2020), is around 3,000 times that size: it has 175 billion parameters and was trained on 45 terabytes of data.
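As a quick check on that scale comparison, the arithmetic can be sketched in a few lines of Python. The two parameter counts are the figures quoted above; the yearly growth factor is only an illustrative average over the eight years, not a claim about how model sizes actually grew from year to year.

```python
# Parameter counts as quoted in the text.
alexnet_params = 61_000_000        # AlexNet (2012)
gpt3_params = 175_000_000_000      # GPT-3 (2020)
years_apart = 2020 - 2012

# How many times larger GPT-3 is than AlexNet.
ratio = gpt3_params / alexnet_params

# Average yearly growth factor that would produce this ratio
# (an illustrative smoothing, not a measured trend).
yearly_growth = ratio ** (1 / years_apart)

print(f"GPT-3 has about {ratio:,.0f} times as many parameters as AlexNet")
print(f"That is an average growth factor of about {yearly_growth:.1f} per year")
```

The ratio comes out a little under 3,000, which is where the 'around 3,000 times' figure comes from.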
The power of foundation models lies in their flexibility to adapt to new tasks by ‘finetuning’ a network with a much smaller amount of data from a new domain. This is a primitive form of transfer learning—using the ability to perform one task well to transfer that ability to a new task. For example, Dall-E (2021) is an AI that was built on GPT-3’s ability to ‘understand’ text and combines that with the ability to generate images. It can create a picture from just an arbitrary written description of a scene. Ask it for an image of an armchair in the shape of an avocado, and that’s exactly what it will give you.
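The idea of finetuning can be illustrated with a deliberately tiny sketch. Everything here is hypothetical: `pretrained_features` merely stands in for a frozen foundation model, and the seven-example dataset stands in for the 'much smaller amount of data from a new domain'. Only the small 'head' on top is trained, which is one simple form of finetuning and not how any particular model mentioned in this article was built.

```python
import math

def pretrained_features(x):
    """Hypothetical stand-in for a frozen foundation model.
    A real backbone would have millions or billions of learned weights."""
    return [x, x * x, 1.0]

def predict(head, x):
    """Apply the small trainable head to the frozen features."""
    z = sum(w * f for w, f in zip(head, pretrained_features(x)))
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid: probability of class 1

def finetune(data, steps=2000, lr=0.3):
    """Train only the head, by gradient descent on the log-loss."""
    head = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0] * len(head)
        for x, label in data:
            error = predict(head, x) - label   # gradient of log-loss w.r.t. z
            for i, f in enumerate(pretrained_features(x)):
                grad[i] += error * f / len(data)
        head = [w - lr * g for w, g in zip(head, grad)]
    return head

# New task with very little data: label 1 when x is far from zero.
small_dataset = [(-2.0, 1), (-1.5, 1), (-0.5, 0), (0.0, 0),
                 (0.5, 0), (1.5, 1), (2.0, 1)]

head = finetune(small_dataset)
predictions = [round(predict(head, x)) for x, _ in small_dataset]
print("labels:     ", [label for _, label in small_dataset])
print("predictions:", predictions)
```

The backbone never changes during training; the new task is learned entirely by adjusting three numbers in the head. That is why finetuning needs far less data than training a model from scratch.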
Because AI can draw on vast amounts of data, it has already surpassed human capability in some domains.
One example of that is computer vision. For many years, AI has been able to perform numerical or geometrical tasks that humans find frustratingly time-consuming, or simply impossible. Three-dimensional reconstruction techniques in computer vision have shown how it’s possible for an AI to use a single camera moving through a scene to create a detailed and accurate map of the scene and work out exactly where it is geometrically; effectively, it’s GPS from images. Humans are great at the qualitative side of this (we can easily describe our surroundings, and most of us are pretty good at finding our way back to our hotel in an unfamiliar city) but we can’t generate precise, GPS-like coordinates just from our visual understanding of our environment—a bit like how we can understand some deep mathematical concepts, but a calculator will always beat us at arithmetic.
But AI is catching up to humans in the realm of semantic and qualitative understanding of the world. AI can look at one of your digital photographs and infer the geometric shape and depth of your living room. This also means that the detailed geometric maps that we build using computer vision technology can now be created with a level of understanding of what’s actually in the scene, not just the 3D coordinates of the space. AI enables robots to have superhuman abilities in geometry and localisation, while now also developing some of the semantic and spatial reasoning of which humans are so capable.
This kind of spatial AI will enable autonomous systems to operate safely and effectively with humans. In the future, people will work cooperatively with machines, not be replaced by them. Advances in robotic vision research are leading to AI systems that see their surroundings by understanding images and video data in real time. Machines that can learn through spending time in real or virtual environments could help us fight fires, achieve defence goals, and inspect and maintain infrastructure in remote and hazardous environments.
But AI is of course not limited to just exploring the meaning of language and images and our relationship to the 3D world; it’s also playing a vital role in responding to the pressing public-health issues of our time.
One example is an AI called AlphaFold (2020), which combines expertise in structural biology, physics and machine learning to predict the 3D structure of proteins based on genetic sequences. This tool is expected to revolutionise the life sciences by creating improved understanding of basic biology and revealing targeted therapies to treat disease. Researchers have already used it to make predictions about several proteins associated with SARS-CoV-2 (the virus that causes Covid-19), and scientists are right now using various AI tools to accelerate the development of new treatments and vaccines (2).
Arguably the most significant development in AI is the speed of AI development itself. Where we once measured software production cycles in years, we’re now seeing a new generation of AI capability every few months, and that tempo is showing little sign of abating.
These kinds of models can support human endeavours in language, vision, robotics and reasoning in fields including industry, resources, law, health care, the environment and education. The value of our emerging AI capability isn’t in the technology itself, but in where it’s applied and what it can do for the world. However, even though the capability is improving at a rapid pace, learned AI models can fail unexpectedly and harbour biases, so more research is needed to ensure that applications are ethical, explainable and accepted by society at large.
(1) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei, ‘ImageNet: A large-scale hierarchical image database’, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
(2) H Lv, L Shi, J Berkenpas, F Dao, H Zulfiqar, H Ding, Y Zhang, L Yang, R Cao, ‘Application of artificial intelligence and machine learning for COVID-19 drug discovery and vaccine design’, Briefings in Bioinformatics, 2021, 22(6).