AIML Research Seminar: Category-Level 6D Object Pose Estimation and Tracking using Diffusion
In our daily lives, we constantly estimate the 6D pose of objects—whether it's picking up a coffee mug, catching a football on a Saturday afternoon, or avoiding a toy car rolling toward us. Even if we have never seen these particular objects before, our extensive knowledge of the world and the stereo vision of our eyes enable us to perform these tasks effortlessly. However, for a robot or computer with only a single image, this task is extremely challenging.
Symmetrical shapes such as bowls, occluded parts such as a mug's hidden handle, and the varying scales of cans all introduce ambiguities that must be resolved to successfully estimate the 6D object pose from a single RGB image.
In this presentation, Adam outlined his diffusion-based approach, which models the probability distribution over possible poses of an object and thereby accounts for these pose ambiguities. He discussed how the method predicts a final pose estimate from the multiple pose hypotheses generated by the diffusion model, while eliminating the need for a costly, separately trained likelihood estimator to filter out outlier hypotheses. He also demonstrated the tracking capabilities of the method, achieved without any modifications to the model.
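The aggregation step described above — collapsing many sampled pose hypotheses into a single estimate — can be sketched in outline. The snippet below is illustrative only and is not Adam's implementation: it assumes hypotheses arrive as unit quaternions plus translation vectors, and fuses them with standard quaternion averaging (the principal-eigenvector method) and an arithmetic mean for translation.

```python
import numpy as np

def average_quaternions(quats):
    """Average unit quaternions via the principal eigenvector of the
    accumulated outer-product matrix (a standard averaging method;
    robust to the q / -q sign ambiguity)."""
    A = np.zeros((4, 4))
    for q in quats:
        q = q / np.linalg.norm(q)
        A += np.outer(q, q)
    # np.linalg.eigh returns eigenvalues in ascending order, so the
    # last column is the eigenvector for the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, -1]

def aggregate_pose_hypotheses(rotations, translations):
    """Hypothetical fusion of pose hypotheses: quaternion averaging
    for rotation, arithmetic mean for translation."""
    q_mean = average_quaternions(rotations)
    t_mean = np.mean(translations, axis=0)
    return q_mean, t_mean

# Toy example: ten noisy hypotheses clustered around a single pose.
rng = np.random.default_rng(0)
quats = [np.array([1.0, 0.0, 0.0, 0.0]) + 0.05 * rng.normal(size=4)
         for _ in range(10)]
trans = [np.array([0.1, 0.0, 0.5]) + 0.01 * rng.normal(size=3)
         for _ in range(10)]
q, t = aggregate_pose_hypotheses(quats, trans)
```

In practice a real system would need outlier-aware fusion (e.g. clustering hypotheses before averaging, since a symmetric object yields multi-modal pose distributions where a plain mean is meaningless); this sketch only shows the simplest uni-modal case.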