Canonical Policy: Learning Canonical 3D Representation for Equivariant Policy

*Equal Contribution ¹Purdue University

Teaser

Canonical Policy enables vision-conditioned policies to generalize across object appearances and viewpoints by learning canonical 3D representations, with improved sample efficiency.

Abstract

We introduce canonical policy, a principled framework for 3D equivariant imitation learning that unifies point cloud observations through a canonical representation. Built upon a rigorous theory of 3D canonical mappings, our method enables end-to-end learning of spatially equivariant policies from demonstrations. By leveraging geometric consistency through canonicalization and the expressiveness of generative policy models, such as diffusion models, canonical policy improves generalization and data efficiency in imitation learning.
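
The role of canonicalization can be illustrated with a minimal Python sketch. It is not the learned canonical mapping described above: it swaps in a classical PCA alignment (centering plus principal axes, with axis signs fixed by the third moment) purely for illustration. The names `canonicalize`, `toy_canonical_policy`, `equivariant_policy`, and `random_se3` are hypothetical, NumPy is assumed to be available, and the sketch assumes a point cloud with distinct principal variances and nonzero skewness along its first two axes.

```python
import numpy as np

def canonicalize(points):
    """Map an (N, 3) point cloud to a canonical pose via centering + PCA.

    Returns (canonical, R, centroid) such that
    points == canonical @ R.T + centroid.
    Illustrative stand-in only, not the learned mapping from the paper.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = centered.T @ centered / len(points)
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues come back in ascending order
    axes = eigvecs[:, ::-1].copy()        # columns sorted by descending variance
    for i in range(2):
        # Resolve the sign ambiguity of each axis using the third moment.
        if np.sum((centered @ axes[:, i]) ** 3) < 0:
            axes[:, i] *= -1.0
    # Build the third axis by a cross product so R is a proper rotation (det = +1).
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])
    return centered @ axes, axes, centroid

def toy_canonical_policy(canonical_points):
    """Hypothetical stand-in for a learned policy: pick the farthest point along axis 0."""
    return canonical_points[np.argmax(canonical_points[:, 0])]

def equivariant_policy(points):
    """Predict in the canonical frame, then map the action back to the input frame."""
    canonical, R, centroid = canonicalize(points)
    action_canonical = toy_canonical_policy(canonical)
    return R @ action_canonical + centroid

def random_se3(rng):
    """Sample a random rotation (via QR) and translation."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0                   # flip one column to get det = +1
    return Q, rng.normal(size=3)

rng = np.random.default_rng(0)
# Asymmetric synthetic cloud: skewed samples with different spread per axis.
cloud = rng.exponential(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
Q, t = random_se3(rng)
moved = cloud @ Q.T + t                   # the same scene under a rigid transform

# The canonical cloud should not depend on the rigid transform ...
same_canonical = np.allclose(canonicalize(cloud)[0], canonicalize(moved)[0], atol=1e-8)
# ... and the predicted action should transform together with the scene.
equivariant = np.allclose(
    equivariant_policy(moved), Q @ equivariant_policy(cloud) + t, atol=1e-8
)
print(same_canonical, equivariant)
```

Because the inner policy only ever sees the canonical cloud, it can be any function, including a generative model. In this sketch equivariance comes entirely from mapping the action back through `R` and `centroid`.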

Simulation Benchmark

We benchmarked Canonical Policy and several point cloud baselines across 12 simulation tasks. Canonical policy consistently outperforms all baselines, achieving an average task success improvement of 18%.

Stack D1¹

Mug Cleanup D1¹

Nut Assembly D0¹

Stack Three D1¹

Threading D2²

Square D2¹

Coffee D2¹

Hammer Cleanup D1¹

Push T²

Cloth Folding³

Object Covering³

Box Closing³

Real Robot Experiments

Effectiveness of Canonical Policy

Block Stacking

In the block stacking task, the robot stacks an I-shaped block onto a T-shaped block. The top video shows the original setup, in which canonical policy performs reliably. In the bottom video, the policy continues to execute the task accurately even when the block colors change, showcasing robustness to appearance shifts.

Shoe Alignment

In the shoe alignment task, the goal is to pick up the right shoe and align it side by side with the left shoe. In the top video, the policy succeeds under the original setup. In the bottom video, it maintains high performance even when the shoe color is completely changed, demonstrating strong appearance invariance.

Can Insertion

The can insertion task demands more precision: the robot must pick up a can on the right and insert it into a hole on the left. In the top video, canonical policy often succeeds under the original setup. In the bottom video, it maintains comparable performance under color-shifted conditions, highlighting its robustness in grasping and precise placement.

Table Organization

Table organization is the most complex task, requiring a sequence of precise operations: object placement and drawer manipulation. Canonical policy handles this long-horizon task effectively in both the original and color-variant setups, highlighting its capacity for robust, multi-step decision-making.

Generalization to Unseen Object Appearances

Beyond color variations, we also test shape generalization in the shoe alignment task by introducing unseen objects, leather shoes and hiking shoes, which bring both appearance and geometric shifts. Despite these challenges, canonical policy achieves the highest alignment accuracy, demonstrating strong generalization to both shape and color shifts.

Robustness to Viewpoint Variations

We further evaluate the policy's SE(3) equivariance using a mobile UR5 platform with camera viewpoint shifts. As the camera gradually rotates from the original angle on the left to a significantly shifted one on the right, canonical policy maintains stable and accurate predictions, confirming its robustness to egocentric view changes.
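
The equivariance property being tested can be written compactly. This is a sketch in assumed notation, not taken from the paper: $X = \{x_i\}$ is the observed point cloud, $\pi$ is the policy (its output is taken to be a 3D position for simplicity), and $g = (R, t) \in SE(3)$ is the rigid transform induced by the viewpoint change.

```latex
% Assumed notation: X = {x_i} point cloud, \pi = policy, g = (R, t) \in SE(3).
g \cdot x_i = R\,x_i + t,
\qquad
\pi(g \cdot X) = g \cdot \pi(X)
\quad \text{for all } g \in SE(3),
\qquad\text{so}\qquad
g^{-1} \cdot \pi(g \cdot X) = \pi(X).
```

Under this property, the action predicted from a shifted view, once expressed in the original frame, coincides with the action predicted from the original view. Stable predictions across the three views below are what such a policy would be expected to show.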

Left-Angled View

Frontal View

Right-Angled View