Semantic vision
In general, semantics concerns the extraction of meaning from data. Semantic vision seeks to understand not only what objects are present in an image but, perhaps even more importantly, the relationships between those objects.
In semantic vision, an image is typically segmented into regions of interest. Each segment is assigned a class label, so that every pixel in the image belongs to a class (e.g. car, road, footpath, person, tree, sky).
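As a concrete illustration, the short Python sketch below shows the final step of this process: converting per-pixel class scores into a dense label map. The class list is illustrative, and random numbers stand in for the output of a real segmentation network.

    # Minimal sketch: turning per-pixel class scores into a label map.
    # The class names and random "logits" are placeholders for the output
    # of a real segmentation network.
    import numpy as np

    CLASSES = ["road", "footpath", "car", "person", "tree", "sky"]

    H, W = 4, 6                                      # a tiny 4x6 "image"
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(len(CLASSES), H, W))   # placeholder network output

    label_map = logits.argmax(axis=0)                # one class index per pixel
    for row in label_map:
        print(" ".join(f"{CLASSES[c]:>8}" for c in row))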
Contextual relationships provide important cues for understanding a scene. Spatial context can be formulated in terms of the relationship between an object and its neighbouring objects. For example, a car is likely to appear above a road, but is unlikely to be surrounded by sky. Contextual relationships can also be inferred from the relationship between items in an image and the camera. For instance, if a tree is occluded by a car, then the car is closer to the camera than the tree. The ability to infer relationships between objects demonstrates reasoning, an important step towards true “cognition”.
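One simple way such spatial cues can be read off a segmented image is to record which classes border which, and in what vertical order. The sketch below does this for a toy label map (both the map and the class list are invented for illustration), counting how often one class appears directly above another.

    # Hedged sketch: extracting simple spatial-context cues from a label map
    # by counting, over 4-connected vertical neighbours, how often class A
    # appears directly above class B.
    import numpy as np
    from collections import Counter

    CLASSES = ["sky", "tree", "car", "road"]
    # A toy label map: sky over a tree and a car, the car sitting on the road.
    L = np.array([
        [0, 0, 0, 0],
        [1, 1, 2, 0],
        [3, 3, 2, 3],
        [3, 3, 3, 3],
    ])

    above = Counter()
    for y in range(L.shape[0] - 1):
        for x in range(L.shape[1]):
            a, b = L[y, x], L[y + 1, x]
            if a != b:
                above[(CLASSES[a], CLASSES[b])] += 1

    for (a, b), n in above.most_common():
        print(f"{a} appears above {b} ({n} pixel boundaries)")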
Imagery is routinely collected by satellite, aerial surveillance (from both aircraft and drones) and ground-level cameras. By reviewing images of the same area collected over time, it is possible to identify and track changes. Identifying significant changes in a set of images is not a trivial problem: seasonal variations and local weather conditions (such as fog, rain and cloud shadowing) can be falsely interpreted as change. Semantic vision can transform visual images into descriptions of the world, providing a more robust foundation for change tracking.
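The idea can be illustrated on toy data: assume two images of the same area have already been segmented, and flag change only where the semantic class differs, so that appearance-only variation (lighting, season) leaving the class unchanged is ignored by construction.

    # Illustrative sketch of change detection on semantic label maps rather
    # than raw pixel intensities. Class names and label maps are invented.
    import numpy as np

    CLASSES = ["road", "car", "tree", "sky"]
    before = np.array([[3, 3, 3],
                       [2, 2, 3],
                       [0, 0, 0]])
    after = np.array([[3, 3, 3],
                      [2, 2, 3],
                      [0, 1, 0]])       # a car (class 1) has appeared

    changed = before != after
    for y, x in zip(*np.nonzero(changed)):
        print(f"pixel ({y},{x}): {CLASSES[before[y, x]]} -> {CLASSES[after[y, x]]}")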
Applications for semantic vision include autonomous driving, medical image analysis, industrial inspection, classification of terrain from satellite imagery, and data management.
Using the AIML-developed RefineNet to undertake semantic segmentation of scenes from the Cityscapes dataset (video prepared by Anton Milan).
Competitions and achievements
AIML has won numerous global competitions in semantic vision and has made major contributions to the development of the methodology.
2nd place, Robust Reading Challenge on Reading Chinese Text on Signboard (ICDAR 2019)
3rd place, AI Edge Contest (image segmentation) (2019)
1st place, Nuclei image segmentation challenge (MICCAI 2018)
1st place, Retinal fundus glaucoma segmentation challenge (MICCAI 2018)
4th place, JD Fashion Item Search Competition (2018)
1st place, Cityscapes semantic image segmentation challenge (2016)
1st place, ImageNet challenge for the task of Scene Parsing (2016)
1st place, Pascal VOC semantic image segmentation challenge (2016)
1st place, “Focused Scene Text” for the task of End-to-End Scene Text Recognition (ICDAR 2015, Challenge 2)
Featured papers
2019 “Knowledge adaptation for efficient semantic segmentation” Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan We present a novel knowledge distillation framework tailored for semantic segmentation. Our method achieves superior results with significantly reduced computational overhead.
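The paper's exact losses are not reproduced here, but a generic pixel-wise distillation objective of the kind this line of work builds on can be sketched in PyTorch as follows; the shapes, temperature and weighting are illustrative.

    # Generic pixel-wise knowledge distillation for segmentation (not the
    # specific losses of He et al.): a small student is trained to match the
    # softened per-pixel class distribution of a large, frozen teacher, in
    # addition to the usual cross-entropy on ground-truth labels.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """student/teacher logits: (N, C, H, W); labels: (N, H, W)."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * hard + (1 - alpha) * soft

    # Toy shapes: batch of 2, 6 classes, 8x8 output map.
    s = torch.randn(2, 6, 8, 8, requires_grad=True)
    t = torch.randn(2, 6, 8, 8)
    y = torch.randint(0, 6, (2, 8, 8))
    print(distillation_loss(s, t, y).item())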
2019 “One shot segmentation: unifying rigid detection and non-rigid segmentation using elastic regularization” Jacinto C. Nascimento, Gustavo Carneiro We propose a novel deep learning method for object segmentation with low training complexity that requires only small training sets. These two advantages reduce training time compared to recent state-of-the-art approaches. We show that it is possible to build a machine learning-based segmentation system that operates directly on the space combining rigid and non-rigid deformations.
2017 “RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation” Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid We present RefineNet, a novel multi-path refinement network for semantic segmentation and object parsing. The cascaded architecture is able to effectively combine high-level semantics and low-level features to produce high-resolution segmentation maps. Our design choices are inspired by the idea of identity mapping, which facilitates gradient propagation across long-range connections and thus enables effective end-to-end learning. Experiments show that our method sets a new state of the art in semantic labelling.
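The block below is a greatly simplified caricature of the fusion idea, not the actual RefineNet architecture: coarse, semantically rich features are upsampled and summed with projected high-resolution, low-level features, keeping short residual paths for gradient flow.

    # Simplified fusion of high-level (coarse) and low-level (fine) features,
    # loosely in the spirit of multi-path refinement. All sizes are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FusionBlock(nn.Module):
        def __init__(self, low_ch, high_ch, out_ch):
            super().__init__()
            self.low_proj = nn.Conv2d(low_ch, out_ch, 3, padding=1)
            self.high_proj = nn.Conv2d(high_ch, out_ch, 3, padding=1)

        def forward(self, low, high):
            # Upsample the coarse features to the fine resolution, then sum.
            high = F.interpolate(self.high_proj(high), size=low.shape[-2:],
                                 mode="bilinear", align_corners=False)
            return F.relu(self.low_proj(low) + high)

    low = torch.randn(1, 256, 64, 64)    # high-resolution, low-level features
    high = torch.randn(1, 512, 16, 16)   # low-resolution, high-level features
    print(FusionBlock(256, 512, 256)(low, high).shape)  # (1, 256, 64, 64)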
2016 “Efficient piecewise training of deep structured models for semantic segmentation” Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel We propose a method which combines Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs) to exploit complex contextual information for semantic image segmentation. We formulate CNN-based pairwise potentials for modelling semantic relations between image regions. Our method achieves the best performance on several popular datasets, including the PASCAL VOC 2012 dataset.
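As a generic illustration of the energy such models minimise (with a hand-coded Potts pairwise term, not the paper's learned CNN-based potentials), a toy CRF energy over a label map can be written as follows.

    # Toy CRF energy: a unary term from per-pixel class costs plus a Potts
    # pairwise term that penalises neighbouring pixels with different labels,
    # encouraging spatially coherent labellings.
    import numpy as np

    def crf_energy(labels, unary, w=1.0):
        """labels: (H, W) class indices; unary: (C, H, W) per-class costs."""
        H, W = labels.shape
        e = unary[labels, np.arange(H)[:, None], np.arange(W)[None, :]].sum()
        e += w * np.sum(labels[:, 1:] != labels[:, :-1])   # horizontal pairs
        e += w * np.sum(labels[1:, :] != labels[:-1, :])   # vertical pairs
        return e

    rng = np.random.default_rng(1)
    unary = rng.random((3, 5, 5))            # 3 classes on a 5x5 grid
    labels = unary.argmin(axis=0)            # labelling from unaries alone
    print(crf_energy(labels, unary))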
2016 “Wider or Deeper: Revisiting the ResNet Model for Visual Recognition” Zifeng Wu, Chunhua Shen, and Anton van den Hengel The trend towards increasingly deep neural networks has been driven by a general observation that increasing depth increases the performance of a network. We analyse the operation of deep residual network architectures and use our results to derive a new, shallower residual architecture that significantly outperforms much deeper models. Our architecture outperforms previous very deep residual networks not only on the ImageNet classification dataset, but also when applied to semantic image segmentation.
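To ground the depth-versus-width trade-off, the sketch below stacks plain identity-shortcut residual blocks into two networks with roughly equal parameter budgets, one wider and shallower, one narrower and deeper; the block design is generic rather than the paper's.

    # Width vs depth under a similar parameter budget: a generic residual
    # block with an identity shortcut, stacked either wide-and-shallow or
    # narrow-and-deep. Channel counts and depths are illustrative.
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)          # identity shortcut

    wide = nn.Sequential(*[ResidualBlock(128) for _ in range(4)])   # wider, shallower
    deep = nn.Sequential(*[ResidualBlock(64) for _ in range(16)])   # narrower, deeper
    count = lambda m: sum(p.numel() for p in m.parameters())
    print(f"wide: {count(wide):,} params; deep: {count(deep):,} params")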
Projects
Deep Reinforcement Learning for the Active Extraction and Visualisation of Optimal Biomarkers in Medical Images
This project aims to develop novel methods for discovering and visualising optimal biomarkers from chest computed tomography images, based on extensions of recently developed deep reinforcement learning techniques. The extensions proposed in this project will advance medical image analysis by allowing efficient analysis of high-dimensional inputs at their original resolution. In addition, this project will be the first approach capable of discovering previously unknown biomarkers associated with important clinical outcomes. The project will validate the approach on a real-world case study data set concerning the prediction of five-year survival in chronic disease.
A/Prof Gustavo Carneiro; Prof Andrew Bradley; Prof Lyle Palmer; Dr Jacinto Nascimento
ARC Grant ID: DP180103232
Learning the deep structure of images
This project seeks to develop technologies that will help computer vision systems interpret the whole visible scene, rather than just some of the objects therein. Existing automated methods for understanding images perform well at recognising specific objects in canonical poses, but the problem of whole-image interpretation is far more challenging. Convolutional neural networks (CNNs) have underpinned recent progress in object recognition, but whole-image understanding cannot be tackled in the same way because the number of possible combinations of objects is too large. The project thus proposes a graph-based generalisation of the CNN approach which allows scene structure to be learned explicitly (a generic sketch of one such graph-convolution step is given after this project entry). This would represent an important step towards providing computers with robust vision, allowing them to interact with their environment.
Professor Anton van den Hengel; Dr Anthony Dick; Dr Lingqiao Liu
ARC Grant ID: DP160103710
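As a generic illustration of the graph-based direction mentioned above (not the project's actual model), a standard graph-convolution step updates each node, here standing for a scene region, from its neighbours through a normalised adjacency matrix.

    # One standard graph-convolution step (Kipf & Welling style) as a generic
    # illustration: node features are mixed with their neighbours' via a
    # symmetrically normalised adjacency matrix with self-loops.
    import numpy as np

    def gcn_layer(X, A, W):
        """X: (N, F) node features; A: (N, N) adjacency; W: (F, F') weights."""
        A_hat = A + np.eye(A.shape[0])               # add self-loops
        D = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        return np.maximum(D @ A_hat @ D @ X @ W, 0)  # normalise, mix, ReLU

    rng = np.random.default_rng(2)
    A = np.array([[0, 1, 1, 0],                      # 4 regions, toy adjacency
                  [1, 0, 0, 1],
                  [1, 0, 0, 1],
                  [0, 1, 1, 0]], dtype=float)
    X = rng.random((4, 8))                           # 8 features per region
    print(gcn_layer(X, A, rng.random((8, 16))).shape)  # (4, 16)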
Lifelong Computer Vision Systems
The aim of the project is to develop robust computer vision systems that can operate over a wide area and over long periods. This is a major challenge because the geometry and appearance of an environment can change over time, and long-term operation requires robustness to this change. The outcome will be a system that can capture an understanding of a wide area in real time, through building a geometric map endowed with semantic descriptions, and which uses machine learning to continuously improve performance. The significance will lie in turning an inexpensive camera into a high-level sensor of the world, ushering in cognitive robotics and autonomous systems.
Professor Ian Reid
ARC Grant ID: FL130100102
Semantic change detection through large scale learning
Identifying whether there has been a significant change in a scene from a set of images is an important practical task, and has received much attention. The problem has been that, although existing statistical techniques perform reasonably well, it has been impossible to achieve the high levels of accuracy demanded by most real applications. This is because changes in pixel intensity are not a particularly good indicator of significant change in a scene. We propose a semantic change detection approach which aims to classify the content of an image before attempting to identify change. This technology builds upon recent developments in large-scale classification which have dramatically improved both accuracy and speed.
Professor Anton van den Hengel; Professor Chunhua Shen; Dr Anders Eriksson; Dr Qinfeng Shi; BAE Systems
ARC Grant ID: LP130100156
Continuously learning to see
The ultimate goal of computer vision is to enable a machine to understand the world through the analysis of images or videos. The new machine learning techniques developed in this project will enable previously impossible computer vision methods and help strengthen Australian competitiveness in this important area.
ARC Grant ID: FT120100969
Image search for simulator content creation
3D content creation represents one of the most labour-intensive stages of constructing virtual environments such as simulators and games. In many cases it is possible to capture images or video of the environment to be simulated, which may be used to assist the modelling process. This project aims to develop technologies based on search by which such imagery may provide both shape and semantic information to assist in the modelling process. The project builds upon recent developments in bag-of-words methods for image search (a minimal sketch of this retrieval idea is given after this project entry). In particular, we propose a novel method by which information latent in the image database may be identified and used to improve the generative model underpinning this type of image search.
Professor Anton van den Hengel; Dr Anthony Dick; Sydac
ARC Grant ID: LP100100791
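For background, here is a minimal bag-of-visual-words retrieval sketch of the kind this family of methods is built on; random indices stand in for quantised local descriptors, and the project's proposed improvement to the generative model is not reproduced.

    # Generic bag-of-visual-words image search: each image is a normalised
    # histogram over a codebook of "visual words", and retrieval ranks
    # database images by histogram similarity to the query.
    import numpy as np

    rng = np.random.default_rng(3)
    K = 32                                           # codebook size
    # Pretend each image yielded local descriptors already quantised to
    # visual-word indices in [0, K).
    database = [rng.integers(0, K, size=200) for _ in range(5)]
    query = rng.integers(0, K, size=180)

    def histogram(words):
        h = np.bincount(words, minlength=K).astype(float)
        return h / np.linalg.norm(h)

    q = histogram(query)
    scores = [float(histogram(d) @ q) for d in database]
    print("best match: image", int(np.argmax(scores)))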