AIML Special Presentation: From Instructional Diagrams to Real-World Assembly
- Date: Thu, 5 Dec 2024, 10:30 am - 11:15 am
- Location: AIML
- Speaker: Jiahao Zhang, PhD student, Research School of Computing, Australian National University
Abstract: Understanding and bridging the gap between diagrammatic instructions and the practical actions they describe is a significant challenge. To address it, we introduce the Ikea Assembly in the Wild (IAW) Dataset. It comprises 183 hours of diverse furniture assembly videos and nearly 8,300 corresponding illustrations from assembly manuals, annotated with their ground-truth alignments. Using this dataset, we tackle three key tasks. First, we align segments of in-the-wild assembly videos with the corresponding instructional diagrams, using a supervised contrastive learning approach that captures subtle diagrammatic details. Second, we predict the precise start and end times, within the video, of the steps outlined in the manuals, enabling accurate temporal grounding. Finally, we demonstrate a furniture assembly pipeline in which furniture parts are selected and their 6D poses are predicted from cues in the instructional diagrams. Our methods advance multimodal alignment and learning for practical assembly, achieving state-of-the-art performance on retrieval, temporal grounding, and 3D part assembly.
Jiahao Zhang is a third-year Ph.D. student in the Research School of Computing at the Australian National University. He is passionate about academic research and interested in many deep learning topics, particularly computer vision and video understanding. He is also an active full-stack web developer. His current research project is supervised by Professor Stephen Gould, Dr. Anoop Cherian, Dr. Yizhak Ben-Shabat, and Dr. Cristian Rodriguez. In 2021, he received his bachelor's degree in Advanced Computing (Honours) and Computer Science and Technology from the Australian National University and Shandong University.