EquiForm: Noise-Robust SE(3)-Equivariant Policy Learning from 3D Point Clouds

Teaser

EquiForm enables robots to act consistently under both SE(3) transformations and noisy point cloud observations. This robustness leads to reliable manipulation behavior across diverse objects, layouts, and real-world sensing conditions.

Motivation

Equivariance alone is not sufficient in practice. As shown above, non-equivariant policies fail to align actions under transformations, while fragile equivariant methods become unstable when exposed to realistic point cloud noise and partial observations. EquiForm addresses this gap by explicitly modeling how noise disrupts equivariance, enabling robust and consistent behavior across diverse poses, layouts, and sensing conditions.

Abstract

We introduce EquiForm, a framework for noise-robust SE(3)-equivariant policy learning from point clouds. EquiForm combines geometric denoising with equivariant contrastive learning to counter the way realistic perception noise disrupts equivariance, and to stabilize the canonical representations that the policy relies on. This enables consistent action generation under SE(3) transformations and improves robustness and generalization in imitation learning. EquiForm is modular and is designed to integrate with existing policy architectures.
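To make the notion of equivariance concrete, the following is a minimal numerical sketch, not part of EquiForm itself. It checks the property f(R x + t) = R f(x) + t for a hypothetical stand-in policy (`toy_policy`, which simply returns the point cloud centroid) and shows that the property no longer holds exactly once the two observations carry independent sensor noise. The helper names, the point count, and the noise scale are all assumptions chosen for illustration.

```python
import numpy as np


def random_se3(rng):
    """Sample a random rotation (via QR decomposition) and a random translation."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix the sign ambiguity of the QR factors
    if np.linalg.det(q) < 0:     # make it a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return q, rng.normal(size=3)


def toy_policy(points):
    """Hypothetical stand-in policy: the predicted position is the centroid."""
    return points.mean(axis=0)


def equivariance_error(policy, obs_before, obs_after, rot, trans):
    """Distance between policy(obs_after) and the transformed policy(obs_before)."""
    expected = rot @ policy(obs_before) + trans
    return float(np.linalg.norm(policy(obs_after) - expected))


rng = np.random.default_rng(0)
rot, trans = random_se3(rng)
scene = rng.uniform(-1.0, 1.0, size=(512, 3))
moved = scene @ rot.T + trans  # the same scene after an SE(3) transformation

# Clean observations: the centroid policy is equivariant up to float error.
clean_err = equivariance_error(toy_policy, scene, moved, rot, trans)

# Noisy observations: each view gets its own independent noise sample.
sigma = 0.1  # illustrative value only
noisy_before = scene + rng.normal(0.0, sigma, size=scene.shape)
noisy_after = moved + rng.normal(0.0, sigma, size=moved.shape)
noisy_err = equivariance_error(toy_policy, noisy_before, noisy_after, rot, trans)

print(clean_err, noisy_err)
```

Even for this very simple policy, the noisy error is many orders of magnitude larger than the clean one, because the noise does not transform together with the scene.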

Geometric Denoising Module

Qualitative visualization of geometric denoising under increasing Gaussian noise. Point cloud observations are corrupted with isotropic Gaussian noise of increasing standard deviation (left to right). We compare the noisy input, farthest point sampling (FPS), and the proposed geometric denoising. Geometric denoising preserves surface structure and spatial consistency under severe noise, whereas FPS alone fails to recover coherent geometry.
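The corruption and the FPS baseline in this comparison can be sketched as follows. This is an illustrative sketch only: it does not implement the proposed geometric denoising module, and the toy sphere, the point counts, and the value of `sigma` are assumptions rather than the settings used for the figure.

```python
import numpy as np


def add_gaussian_noise(points, sigma, rng):
    """Corrupt an (N, 3) point cloud with isotropic zero-mean Gaussian noise."""
    return points + rng.normal(loc=0.0, scale=sigma, size=points.shape)


def farthest_point_sampling(points, num_samples, rng):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen."""
    n = points.shape[0]
    selected = np.empty(num_samples, dtype=np.int64)
    selected[0] = rng.integers(n)  # random starting point
    # Distance from every point to its nearest selected point so far.
    min_dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for i in range(1, num_samples):
        selected[i] = np.argmax(min_dist)
        dist_to_new = np.linalg.norm(points - points[selected[i]], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


rng = np.random.default_rng(0)
# Toy surface: points on the unit sphere, a stand-in for a real observation.
clean = rng.normal(size=(2048, 3))
clean /= np.linalg.norm(clean, axis=1, keepdims=True)

noisy = add_gaussian_noise(clean, sigma=0.05, rng=rng)
idx = farthest_point_sampling(noisy, num_samples=256, rng=rng)
subsampled = noisy[idx]

# FPS only selects existing points, so per-point noise is carried over unchanged.
radial_error = np.abs(np.linalg.norm(subsampled, axis=1) - 1.0)
print(subsampled.shape, float(radial_error.mean()))
```

Because FPS returns a subset of the input, every selected point keeps its full displacement from the true surface, and the greedy farthest-point criterion tends to favor outliers. This is consistent with the observation that FPS alone does not recover coherent geometry.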

Equivariant Contrastive Alignment

Noise levels. We evaluate robustness under progressively increasing observation noise:

Level 1: Random in-plane rotation only.
Level 2: Random rotation combined with pointwise Gaussian jitter (μ = 0, σ = 0.1) and random cropping, dropout, and insertion, where each operation affects 10% of the points.
Level 3: Same as Level 2, but with larger jitter (σ = 0.2) and with cropping, dropout, and insertion each affecting 20% of the points.
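The three levels can be sketched as a single perturbation function. The σ values and percentages come from the list above; everything else is an assumption made for illustration: the in-plane rotation is taken to be about the z-axis, cropping removes the points nearest a random seed point, insertion draws uniform points inside the bounding box, and each operation affects a fraction of the original point count. The actual evaluation code may differ on all of these.

```python
import numpy as np

# Level parameters taken from the text; Level 1 applies rotation only.
LEVELS = {
    1: {"sigma": 0.0, "fraction": 0.0},
    2: {"sigma": 0.1, "fraction": 0.1},
    3: {"sigma": 0.2, "fraction": 0.2},
}


def random_inplane_rotation(rng):
    """Rotation about the z-axis (assumed here to be the in-plane axis)."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def perturb(points, level, rng):
    """Apply the perturbations of the given noise level to an (N, 3) cloud."""
    cfg = LEVELS[level]
    out = points @ random_inplane_rotation(rng).T
    if level == 1:
        return out
    k = int(round(cfg["fraction"] * points.shape[0]))

    # Jitter: zero-mean Gaussian noise added to every point.
    out = out + rng.normal(0.0, cfg["sigma"], size=out.shape)

    # Cropping (assumed form): remove the k points nearest a random seed point.
    seed = out[rng.integers(out.shape[0])]
    order = np.argsort(np.linalg.norm(out - seed, axis=1))
    out = out[order[k:]]

    # Dropout: remove k points chosen uniformly at random.
    keep = rng.permutation(out.shape[0])[k:]
    out = out[keep]

    # Insertion (assumed form): add k uniform points inside the bounding box.
    lo, hi = out.min(axis=0), out.max(axis=0)
    extra = rng.uniform(lo, hi, size=(k, 3))
    return np.concatenate([out, extra], axis=0)


rng = np.random.default_rng(0)
cloud = rng.uniform(-1.0, 1.0, size=(1000, 3))
for level in (1, 2, 3):
    print(level, perturb(cloud, level, rng).shape)
```

In this sketch cropping and dropout each remove points and insertion adds some back, so the output size differs from the input size at Levels 2 and 3. A pipeline that needs a fixed point count would resample afterwards.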

Robustness of equivariant representations under increasing observation noise. Equivariant feature embeddings are visualized across training epochs and noise levels, comparing models trained with and without contrastive learning.
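As a rough illustration of how a contrastive objective can pull the embeddings of clean and noisy views together, the sketch below implements a generic InfoNCE loss in NumPy. It is not necessarily the loss used by EquiForm: the function name `info_nce`, the temperature, the batch and embedding sizes, and the random stand-in embeddings are all assumptions, and no equivariant encoder is involved.

```python
import numpy as np


def info_nce(anchor, positive, temperature=0.1):
    """Generic InfoNCE loss: row i of `anchor` should match row i of `positive`."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                     # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
clean_emb = rng.normal(size=(32, 64))                   # stand-in embeddings of clean views
aligned = clean_emb + 0.05 * rng.normal(size=(32, 64))  # noisy views that stay close
mismatched = np.roll(aligned, 1, axis=0)                # pairing deliberately broken

loss_aligned = info_nce(clean_emb, aligned)
loss_mismatched = info_nce(clean_emb, mismatched)
print(loss_aligned, loss_mismatched)
```

The loss is low when each noisy embedding stays closest to its own clean counterpart and high when the pairing is broken, which is the kind of pressure a contrastive term places on the features.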

Representative Policy Behaviors under SE(3) Layout Variations

Under SE(3) layout variations, many policies either fail to align their actions or become unstable due to sensing noise. As shown above, EquiForm consistently follows the intended manipulation strategy, producing reliable real-world behavior across diverse scene configurations.

Limitations and Failure Cases

Finally, we analyze representative failure cases to understand the remaining limitations of EquiForm. Failures mainly occur when small objects are under-represented after point cloud downsampling, or when thin deformable objects introduce geometric ambiguity under geometry-only perception. These cases highlight inherent challenges of point cloud-based manipulation and suggest future directions such as adaptive sampling and complementary sensing modalities.

BibTeX


@misc{zhang2026equiformnoiserobustse3equivariantpolicy,
  title={EquiForm: Noise-Robust SE(3)-Equivariant Policy Learning from 3D Point Clouds},
  author={Zhiyuan Zhang and Yu She},
  year={2026},
  eprint={2601.17486},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2601.17486},
}