KeyGen: Unsupervised Keypoint based Object-Centric Representations for Category-Level Generalization

Generalization in robotic manipulation refers to the ability to perform a task successfully across a wide variety of novel object instances beyond those seen during training. However, conventional behavior cloning (BC) methods struggle to generalize across instance-level differences such as object geometry, size, and appearance. We address this challenge with a new general-purpose, 3D keypoint-based, object-centric representation that generalizes semantically across intra-category variations. Our method, KeyGen, incorporates a standalone keypoint detector that canonicalizes object orientation and extracts stable, semantically consistent keypoints from point cloud data. Our visuomotor policy combines the extracted keypoints with object-centric point clouds to construct a robust scene representation. Furthermore, we create a novel synthetic dataset featuring three manipulation tasks, each with a diverse set of object instances, enabling the assessment of category-level generalization. Experimental results demonstrate that our approach improves sample efficiency and generalization, achieving higher success rates than existing methods on both seen and unseen object instances.

Task Videos