Semantic vision

In general, semantics concerns the extraction of meaning from data. Semantic vision seeks to understand not only what objects are present in an image but, perhaps even more importantly, the relationships between them.

In semantic vision, an image is typically segmented into regions of interest. Each segment is assigned a class label, so that every pixel in the image belongs to a class (e.g. car, road, footpath, person, tree, sky).
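As a rough illustration of pixel-wise labelling, the sketch below assumes that a model (hypothetical here; no particular network is implied) has already produced a score for each class at every pixel. It then assigns each pixel its highest-scoring class. The class list is taken from the examples above, and the function names are illustrative only.

```python
# Hypothetical class list, mirroring the examples in the text.
CLASSES = ["car", "road", "footpath", "person", "tree", "sky"]


def segment(scores):
    """Assign each pixel the index of its highest-scoring class.

    scores[y][x] is a list of per-class scores (e.g. a model's outputs)
    for the pixel at row y, column x.
    """
    return [
        [max(range(len(CLASSES)), key=lambda c: pixel[c]) for pixel in row]
        for row in scores
    ]


def to_names(label_map):
    """Convert a map of class indices into class names."""
    return [[CLASSES[c] for c in row] for row in label_map]


# Toy 2x2 image: one score per class, in CLASSES order, for every pixel.
example_scores = [
    [[0.1, 0, 0, 0, 0, 0.9], [0, 0, 0, 0, 0.8, 0.2]],
    [[0.7, 0.3, 0, 0, 0, 0], [0.1, 0.9, 0, 0, 0, 0]],
]
print(to_names(segment(example_scores)))  # [['sky', 'tree'], ['car', 'road']]
```

Real systems produce these scores with a trained network and work on dense arrays, but the final labelling step has the same per-pixel form.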

Contextual relationships provide important cues for understanding a scene. Spatial context can be formulated in terms of the relationship between an object and its neighbouring objects. For example, a car is likely to appear above a road, but is unlikely to be surrounded by sky. Contextual relationships can also be inferred from the relationship between items in an image and the camera. For instance, if a tree is occluded by a car, then the car is closer to the camera than the tree. Inferring such relationships between objects requires a form of reasoning, an important step towards true “cognition”.
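One simple way to make spatial context concrete is to count, in an already-labelled image, which classes sit directly above which others. The sketch below is a hypothetical illustration rather than a method described here. Aggregated over many labelled images, counts like these could serve as a prior that favours "car above road" over "car surrounded by sky". The function name is illustrative only.

```python
from collections import Counter


def vertical_relations(label_map):
    """Count (upper, lower) class pairs for vertically adjacent pixels.

    label_map[y][x] is the class name at row y (row 0 is the top of the image).
    Pairs with identical labels are skipped: they carry no relational information.
    """
    counts = Counter()
    for upper_row, lower_row in zip(label_map, label_map[1:]):
        for upper, lower in zip(upper_row, lower_row):
            if upper != lower:
                counts[(upper, lower)] += 1
    return counts


example = [
    ["sky", "sky", "sky"],
    ["car", "car", "tree"],
    ["road", "road", "road"],
]
relations = vertical_relations(example)
print(relations[("car", "road")])  # 2
print(relations[("road", "car")])  # 0
```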

Imagery is routinely collected by satellite, aerial surveillance (from both aircraft and drones) and ground-level cameras. By reviewing images of the same area collected over time, it is possible to identify and track changes. Identifying significant changes in a set of images is not a trivial problem: seasonal variations and local weather conditions (such as fog, rain and cloud shadowing) can be falsely interpreted as change. Semantic vision can transform visual images into descriptions of the world, providing a more robust foundation for change tracking.
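The difference between pixel-level and semantic change detection can be sketched on toy data. This minimal example assumes two co-registered greyscale images and per-pixel class labels for both dates. In practice the labels would come from a segmentation model; here they are written by hand. The 50-grey-level threshold is an arbitrary assumption, and both function names are hypothetical.

```python
def intensity_change(before, after, threshold=50):
    """Naive baseline: flag pixels whose grey level changed by more than threshold."""
    return [
        [abs(b - a) > threshold for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def semantic_change(labels_before, labels_after):
    """Flag pixels whose semantic class differs between the two dates."""
    return [
        [lb != la for lb, la in zip(row_before, row_after)]
        for row_before, row_after in zip(labels_before, labels_after)
    ]


# Toy 2x2 scene: top row is tree, bottom row is road. A cloud shadow darkens
# the bottom-left road pixel (same class), while a car parks on the
# bottom-right pixel (a genuine change).
intensity_before = [[120, 120], [200, 200]]
intensity_after = [[120, 120], [90, 140]]
labels_before = [["tree", "tree"], ["road", "road"]]
labels_after = [["tree", "tree"], ["road", "car"]]

print(intensity_change(intensity_before, intensity_after))
# [[False, False], [True, True]]  (the shadow is a false alarm)
print(semantic_change(labels_before, labels_after))
# [[False, False], [False, True]]  (only the genuine change)
```

Comparing labels rather than intensities moves the burden onto the segmentation model. The comparison is only as robust as the labels are stable under shadow, season and weather.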

Applications for semantic vision include autonomous driving, medical imaging analysis, industrial inspection, classification of terrain from satellite imagery and data management.

Semantic segmentation of scenes from the Cityscapes dataset using the AIML-developed RefineNet (video prepared by Anton Milan).

  • Competitions and achievements

    AIML has won numerous global competitions in semantic vision and has made major contributions to the development of the methodology.

    2nd place Robust Reading Challenge on Reading Chinese Text on Signboard (ICDAR 2019)
    3rd place AI edge contest (image segmentation) (2019)
    1st place Nuclei images segmentation challenge (MICCAI 2018)
    1st place Retinal fundus glaucoma segmentation challenge (MICCAI 2018)
    4th place JD Fashion Item Search Competition (2018)
    1st place Cityscapes semantic image segmentation challenge (2016)
    1st place ImageNet challenge for the task of Scene Parsing (2016)
    1st place Pascal VOC semantic image segmentation challenge (2016)
    1st place “Focused Scene Text” for the task of End-to-End Scene Text Recognition (ICDAR 2015, Challenge 2)
  • Projects

    Deep Reinforcement Learning for the Active Extraction and Visualisation of Optimal Biomarkers in Medical Images

    This project aims to develop novel methods for discovering and visualising optimal biomarkers from chest computed tomography (CT) images, based on extensions of recently developed deep reinforcement learning techniques. The proposed extensions will advance medical image analysis by enabling efficient analysis of high-dimensional inputs at their original high resolution. In addition, the approach will be the first capable of discovering previously unknown biomarkers associated with important clinical outcomes. The project will validate the approach on a real-world case-study data set concerning the prediction of five-year survival in chronic disease.

    A/Prof Gustavo Carneiro; Prof Andrew Bradley; Prof Lyle Palmer; Dr Jacinto Nascimento

    ARC Grant ID: DP180103232

    Learning the deep structure of images

    This project seeks to develop technologies that will help computer vision interpret the whole visible scene, rather than just some of the objects in it. Existing automated methods for understanding images perform well at recognising specific objects in canonical poses, but whole-image interpretation is far more challenging. Convolutional neural networks (CNNs) have underpinned recent progress in object recognition, but whole-image understanding cannot be tackled in the same way because the number of possible combinations of objects is too large. The project therefore proposes a graph-based generalisation of the CNN approach that allows scene structure to be learned explicitly. This would be an important step towards giving computers robust vision, allowing them to interact with their environment.

    Professor Anton van den Hengel; Dr Anthony Dick; Dr Lingqiao Liu

    ARC Grant ID: DP160103710

    Lifelong Computer Vision Systems

    The aim of the project is to develop robust computer vision systems that can operate over a wide area and over long periods. This is a major challenge because the geometry and appearance of an environment can change over time, and long-term operation requires robustness to this change. The outcome will be a system that can capture an understanding of a wide area in real time, through building a geometric map endowed with semantic descriptions, and which uses machine learning to continuously improve performance. The significance will lie in turning an inexpensive camera into a high-level sensor of the world, ushering in cognitive robotics and autonomous systems.

    Ian Reid

    ARC Grant ID: FL130100102

    Semantic change detection through large scale learning

    Identifying whether there has been a significant change in a scene from a set of images is an important practical task and has received much attention. However, although existing statistical techniques perform reasonably well, they have not achieved the high levels of accuracy demanded by most real applications. This is because changes in pixel intensity are not a particularly good indicator of significant change in a scene. We propose a semantic change detection approach that classifies the content of an image before attempting to identify change. This technology builds on recent developments in large-scale classification, which have dramatically improved both accuracy and speed.

    Professor Anton van den Hengel; Professor Chunhua Shen; Dr Anders Eriksson; Dr Qinfeng Shi; BAE Systems

    ARC Grant ID: LP130100156

    Continuously learning to see

    The ultimate goal of computer vision is to enable machines to understand the world through the analysis of images or videos. The new machine learning techniques developed in this project will enable computer vision methods that were previously impossible and help strengthen Australian competitiveness in this important area.

    ARC Grant ID: FT120100969

    Image search for simulator content creation

    3D content creation is one of the most labour-intensive stages in constructing virtual environments such as simulators and games. In many cases it is possible to capture images or video of the environment to be simulated, and these can be used to assist the modelling process. This project aims to develop search-based technologies through which such imagery can provide both shape and semantic information to assist in modelling. The project builds on recent developments in bag-of-words methods for image search. In particular, we propose a novel method by which information latent in the image database can be identified and used to improve the generative model underpinning this type of image search.

    Anton van den Hengel; Anthony Dick; Sydac

    ARC Grant ID: LP100100791