Learning Locally Editable Virtual Humans


We introduce a novel hybrid representation and end-to-end trainable network architecture to model fully editable and customizable neural avatars. At the core of our work lies a representation that combines the modeling power of neural fields with the ease of use and inherent 3D consistency of skinned meshes. To this end, we construct a trainable feature codebook to store local geometry and texture features on the vertices of a deformable body model, thus exploiting its consistent topology under articulation. This representation is then employed in a generative auto-decoder architecture that admits fitting to unseen scans and sampling of realistic avatars with varied appearances and geometries. Furthermore, our representation allows local editing by swapping local features between 3D assets. To verify our method for avatar creation and editing, we contribute a new high-quality dataset, dubbed CustomHumans, for training and evaluation. Our experiments quantitatively and qualitatively show that our method generates diverse detailed avatars and achieves better model fitting performance compared to state-of-the-art methods.



Controllable Hybrid Human Representation

While neural avatars can be highly realistic, the question of how to edit such avatars remains open. To enable 3D avatars with high-fidelity representational power and local editing capabilities, we propose a novel hybrid representation that combines the advantages of neural fields (flexibility and modeling power) with those of LBS-articulated mesh models (ease of deformation and full explicit control). We construct a trainable feature codebook that stores local texture and high-fidelity geometry features for each vertex. Both training and inference query these features in the same way. A query point x_g is projected onto the nearest triangle of the mesh, and its global coordinates are transformed into local triangle coordinates x_l. The triangle’s vertex indices are then used to retrieve the local texture and geometry features.
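The query described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the released implementation: the function names, the brute-force nearest-triangle search, the choice of local coordinates (barycentric weights plus signed distance along the triangle normal), and the barycentric blending of the three vertex features are all assumptions made for this example.

```python
import numpy as np

def closest_point_barycentric(p, a, b, c):
    """Barycentric weights of the point on triangle (a, b, c) closest to p.
    Standard region-based test (Ericson, Real-Time Collision Detection)."""
    ab, ac, ap = b - a, c - a, p - a
    d1, d2 = ab @ ap, ac @ ap
    if d1 <= 0 and d2 <= 0:
        return np.array([1.0, 0.0, 0.0])          # closest to vertex a
    bp = p - b
    d3, d4 = ab @ bp, ac @ bp
    if d3 >= 0 and d4 <= d3:
        return np.array([0.0, 1.0, 0.0])          # closest to vertex b
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        v = d1 / (d1 - d3)
        return np.array([1.0 - v, v, 0.0])        # closest to edge ab
    cp = p - c
    d5, d6 = ab @ cp, ac @ cp
    if d6 >= 0 and d5 <= d6:
        return np.array([0.0, 0.0, 1.0])          # closest to vertex c
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        w = d2 / (d2 - d6)
        return np.array([1.0 - w, 0.0, w])        # closest to edge ac
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return np.array([0.0, 1.0 - w, w])        # closest to edge bc
    denom = 1.0 / (va + vb + vc)
    v, w = vb * denom, vc * denom
    return np.array([1.0 - v - w, v, w])          # inside the triangle

def query_codebook(x_g, vertices, faces, codebook):
    """Project x_g onto the nearest triangle and fetch its vertex features.
    Returns (x_l, vertex_ids, feature). The layout of x_l is an assumption:
    three barycentric weights followed by a signed distance."""
    best = None
    for f_idx, tri in enumerate(faces):            # brute force, for clarity only
        a, b, c = vertices[tri]
        bary = closest_point_barycentric(x_g, a, b, c)
        closest = bary @ vertices[tri]
        dist = np.linalg.norm(x_g - closest)
        if best is None or dist < best[0]:
            best = (dist, f_idx, bary, closest)
    _, f_idx, bary, closest = best
    tri = faces[f_idx]
    a, b, c = vertices[tri]
    normal = np.cross(b - a, c - a)
    normal = normal / np.linalg.norm(normal)
    signed_dist = float(normal @ (x_g - closest))
    x_l = np.append(bary, signed_dist)
    feature = bary @ codebook[tri]                 # blend the three vertex features
    return x_l, tri, feature

# Toy mesh: two triangles forming a unit square, with a 2-D feature per vertex.
vertices = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [1., 1., 0.]])
faces = np.array([[0, 1, 2], [1, 3, 2]])
codebook = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
x_l, vertex_ids, feature = query_codebook(
    np.array([0.25, 0.25, 0.5]), vertices, faces, codebook)
```

In a full pipeline, x_l and the retrieved features would be passed to decoder networks that predict geometry and texture; a spatial acceleration structure would replace the brute-force loop for meshes of realistic size.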

Avatar Customization by Feature Inversion

Our method allows for creating and personalizing avatars with diverse body shapes, appearances, and local details. We leverage the above representation to train a multi-subject model, which enables the transfer of local features across subjects. Because the mesh topology of the LBS model is identical across subjects, we can learn a shared feature space from multiple posed scans. Given a trained model, our method can invert any unseen 3D scan into a feature codebook. This allows us to locally change either the clothing geometry or the appearance of a neural avatar by editing the features at the corresponding vertex indices. It is worth noting that the resulting avatars support detailed pose control via the SMPL-X parameters without affecting the fitted texture and geometry.
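Because every codebook is indexed by the same mesh vertices, a local edit reduces to copying rows between codebooks. The snippet below is a minimal sketch under assumed conventions: the dictionary layout with "geometry" and "texture" arrays, the function name, and the toy sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def swap_local_features(target, source, vertex_ids, keys=("geometry", "texture")):
    """Return a copy of `target` whose features at `vertex_ids` are taken
    from `source`. Both arguments map a codebook name to an array of shape
    (num_vertices, feature_dim) sharing the same vertex ordering."""
    edited = {name: feats.copy() for name, feats in target.items()}
    for name in keys:
        edited[name][vertex_ids] = source[name][vertex_ids]
    return edited

# Toy example: 6 vertices, 4-D features (real body models have far more vertices).
rng = np.random.default_rng(0)
avatar_a = {"geometry": rng.normal(size=(6, 4)), "texture": rng.normal(size=(6, 4))}
avatar_b = {"geometry": rng.normal(size=(6, 4)), "texture": rng.normal(size=(6, 4))}

# Hypothetical index set standing in for, e.g., the vertices covered by a garment.
garment_ids = np.array([1, 2, 3])

# Transfer only the appearance of that region from avatar B to avatar A.
edited = swap_local_features(avatar_a, avatar_b, garment_ids, keys=("texture",))
```

Passing keys=("geometry",) would instead transfer only the local shape, and the default transfers both. The original codebooks are left untouched, so several edits can be derived from the same fitted avatar.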


CustomHumans Dataset
Application Link
Number of Scans
Number of Subjects
Number of Garments
Number of Faces per Mesh


Avatar Customization

Model Fitting Comparison



    title={Learning Locally Editable Virtual Humans},
    author={Ho, Hsuan-I and Xue, Lixin and Song, Jie and Hilliges, Otmar},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},


We express our gratitude to Stefan Walter and Dean Bakker for infrastructure support, Juan Zarate for managing the capture stage, and Deniz Yildiz and Laura Wülfroth for data capture assistance. We also thank Andrew Searle for supporting the capture system and all the anonymous dataset participants.