Intelligent Systems


2021


Dynamic Surface Function Networks for Clothed Human Bodies

Burov, A., Nießner, M., Thies, J.

In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages: 10734-10744, IEEE/CVF International Conference on Computer Vision (ICCV 2021), October 2021 (inproceedings)

Abstract
We present a novel method for temporal coherent reconstruction and tracking of clothed humans. Given a monocular RGB-D sequence, we learn a person-specific body model which is based on a dynamic surface function network. To this end, we explicitly model the surface of the person using a multi-layer perceptron (MLP) which is embedded into the canonical space of the SMPL body model. With classical forward rendering, the represented surface can be rasterized using the topology of a template mesh. For each surface point of the template mesh, the MLP is evaluated to predict the actual surface location. To handle pose-dependent deformations, the MLP is conditioned on the SMPL pose parameters. We show that this surface representation as well as the pose parameters can be learned in a self-supervised fashion using the principle of analysis-by-synthesis and differentiable rasterization. As a result, we are able to reconstruct a temporally coherent mesh sequence from the input data. The underlying surface representation can be used to synthesize new animations of the reconstructed person including pose-dependent deformations.
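The abstract describes an MLP, embedded in the canonical SMPL space, that maps each template surface point plus the pose parameters to a deformed surface location. A minimal NumPy sketch of that evaluation step (layer sizes, initialization, and function names are illustrative assumptions, not the authors' architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden, out_dim):
    # tiny two-layer MLP; weights are random stand-ins for learned parameters
    return {
        "W1": rng.normal(0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def surface_fn(params, canonical_pts, pose):
    # condition every canonical template point on the shared pose vector
    pose_tiled = np.tile(pose, (canonical_pts.shape[0], 1))
    x = np.concatenate([canonical_pts, pose_tiled], axis=1)
    h = np.tanh(x @ params["W1"] + params["b1"])
    offset = h @ params["W2"] + params["b2"]
    # the MLP predicts a pose-dependent offset from the template surface
    return canonical_pts + offset

template = rng.normal(size=(6890, 3))  # SMPL template has 6890 vertices
pose = rng.normal(size=(72,))          # SMPL pose: 24 joints x 3 axis-angle
mlp = init_mlp(3 + 72, 64, 3)
deformed = surface_fn(mlp, template, pose)
print(deformed.shape)
```

Because the output keeps the template's vertex count and ordering, the deformed points can be rasterized with the template mesh topology, as the abstract notes.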

link (url) DOI [BibTex]



Neural Parametric Models for 3D Deformable Shapes

Palafox, P., Bozic, A., Thies, J., Nießner, M., Dai, A.

In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages: 12675-12685, IEEE, IEEE/CVF International Conference on Computer Vision (ICCV 2021), October 2021 (inproceedings)

Abstract
Parametric 3D models have enabled a wide variety of tasks in computer graphics and vision, such as modeling human bodies, faces, and hands. However, the construction of these parametric models is often tedious, as it requires heavy manual tweaking, and they struggle to represent additional complexity and details such as wrinkles or clothing. To this end, we propose Neural Parametric Models (NPMs), a novel, learned alternative to traditional, parametric 3D models, which does not require hand-crafted, object-specific constraints. In particular, we learn to disentangle 4D dynamics into latent-space representations of shape and pose, leveraging the flexibility of recent developments in learned implicit functions. Crucially, once learned, our neural parametric models of shape and pose enable optimization over the learned spaces to fit to new observations, similar to the fitting of a traditional parametric model, e.g., SMPL. This enables NPMs to achieve a significantly more accurate and detailed representation of observed deformable sequences. We show that NPMs improve notably over both parametric and non-parametric state of the art in reconstruction and tracking of monocular depth sequences of clothed humans and hands. Latent-space interpolation as well as shape/pose transfer experiments further demonstrate the usefulness of NPMs.
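Fitting an NPM to a new observation means optimizing latent shape/pose codes through a frozen decoder. A toy version of that fitting loop, with a linear map standing in for the learned implicit networks (all names and dimensions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
decoder = rng.normal(size=(30, 8))   # frozen toy "decoder": latent code -> geometry
true_code = rng.normal(size=8)
observation = decoder @ true_code    # a new observation to fit

code = np.zeros(8)                   # latent code being optimized
# step size chosen from the decoder's spectral norm for stable gradient descent
step = 0.9 / np.linalg.norm(decoder, 2) ** 2
for _ in range(500):
    residual = decoder @ code - observation
    # gradient of 0.5 * ||decoder @ code - observation||^2 w.r.t. the code
    code -= step * (decoder.T @ residual)

fit_error = np.linalg.norm(decoder @ code - observation)
print(fit_error)
```

The key point mirrored here is that only the low-dimensional code is updated; the decoder stays fixed, just as SMPL's template and blend weights stay fixed when its parameters are fit to data.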

DOI [BibTex]



ID-Reveal: Identity-aware DeepFake Video Detection

Cozzolino, D., Rössler, A., Thies, J., Nießner, M., Verdoliva, L.

In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages: 15088-15097, IEEE/CVF International Conference on Computer Vision (ICCV 2021), October 2021 (inproceedings)

Abstract
State-of-the-art DeepFake forgery detectors are trained in a supervised fashion to answer the question ‘is this video real or fake?’. Given that their training is typically method-specific, these approaches show poor generalization across different types of facial manipulations, e.g., face swapping or facial reenactment. In this work, we look at the problem from a different perspective by focusing on the facial characteristics of a specific identity; i.e., we want to answer the question ‘Is this the person who is claimed to be?’. To this end, we introduce ID-Reveal, a new approach that learns temporal facial features, specific to how each person moves while talking, by means of metric learning coupled with an adversarial training strategy. Our method is independent of the specific type of manipulation since it is trained only on real videos. Moreover, relying on high-level semantic features, it is robust to widespread and disruptive forms of post-processing. We performed a thorough experimental analysis on several publicly available benchmarks, such as FaceForensics++, Google’s DFD, and Celeb-DF. Compared to the state of the art, our method improves generalization and is more robust to the low-quality videos that are usually spread over social networks. In particular, we obtain an average improvement of more than 15% in terms of accuracy for facial reenactment on highly compressed videos.
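At test time, the verification question ‘Is this the person who is claimed to be?’ reduces to comparing an embedding of the probe video against reference embeddings of the claimed identity. A toy illustration using cosine distance, with made-up 3-dimensional embeddings and an assumed threshold `tau` (the real system uses learned temporal features and a learned metric):

```python
import numpy as np

def cosine_dist(a, b):
    # 1 - cosine similarity; 0 means identical direction, 2 means opposite
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

ref = np.array([1.0, 0.0, 0.0])    # reference embedding of the claimed person
probe_real = np.array([0.9, 0.1, 0.0])
probe_fake = np.array([0.0, 1.0, 0.2])
tau = 0.5                          # assumed decision threshold

accept_real = cosine_dist(ref, probe_real) < tau   # small distance -> accept
accept_fake = cosine_dist(ref, probe_fake) < tau   # large distance -> flag as fake
print(accept_real, accept_fake)
```

Because only distances to real reference material are used, no fake videos are needed for training, which is why the method generalizes across manipulation types.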

Paper Video Code link (url) DOI [BibTex]



RetrievalFuse: Neural 3D Scene Reconstruction with a Database

Siddiqui, Y., Thies, J., Ma, F., Shan, Q., Nießner, M., Dai, A.

In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages: 12548-12557, IEEE/CVF International Conference on Computer Vision (ICCV 2021), October 2021 (inproceedings)

Abstract
3D reconstruction of large scenes is a challenging problem due to the high-complexity nature of the solution space, in particular for generative neural networks. In contrast to traditional generative learned models which encode the full generative process into a neural network and can struggle with maintaining local details at the scene level, we introduce a new method that directly leverages scene geometry from the training database. First, we learn to synthesize an initial estimate for a 3D scene, constructed by retrieving a top-k set of volumetric chunks from the scene database. These candidates are then refined to a final scene generation with an attention-based refinement that can effectively select the most consistent set of geometry from the candidates and combine them together to create an output scene, facilitating transfer of coherent structures and local detail from train scene geometry. We demonstrate our neural scene reconstruction with a database for the tasks of 3D super-resolution and surface reconstruction from sparse point clouds, showing that our approach enables generation of more coherent, accurate 3D scenes, improving on average by over 8% in IoU over state-of-the-art scene reconstruction.
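The two stages described above, top-k chunk retrieval followed by attention-based combination, can be sketched with flattened chunk vectors (database size, feature dimension, and the softmax weighting are illustrative assumptions, not the paper's learned attention):

```python
import numpy as np

rng = np.random.default_rng(2)
database = rng.normal(size=(100, 16))  # 100 volumetric chunks, flattened to vectors
# a coarse query chunk that closely resembles database entry 7
query = database[7] + 0.01 * rng.normal(size=16)

# stage 1: retrieve the top-k nearest chunks by L2 distance
k = 3
dist = np.linalg.norm(database - query, axis=1)
topk = np.argsort(dist)[:k]

# stage 2: attention-style refinement -- softmax over negative distances
w = np.exp(-dist[topk])
w /= w.sum()
refined = w @ database[topk]           # weighted blend of retrieved geometry
print(int(topk[0]), refined.shape)
```

The retrieved candidates inject real scene geometry from the training database, which is what lets coherent structures and local detail transfer into the output.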

DOI [BibTex]



SpoC: Spoofing Camera Fingerprints

Cozzolino, D., Thies, J., Rössler, A., Nießner, M., Verdoliva, L.

In Workshop on Media Forensics (CVPR 2021), 2021 (inproceedings)

[BibTex]



Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction

Bozic, A., Palafox, P., Zollhöfer, M., Thies, J., Dai, A., Nießner, M.

In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 1450-1459, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021 (inproceedings)

Abstract
We introduce Neural Deformation Graphs for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects. Specifically, we implicitly model a deformation graph via a deep neural network. This neural deformation graph does not rely on any object-specific structure and, thus, can be applied to general non-rigid deformation tracking. Our method globally optimizes this neural graph on a given sequence of depth camera observations of a non-rigidly moving object. Based on explicit viewpoint consistency as well as inter-frame graph and surface consistency constraints, the underlying network is trained in a self-supervised fashion. We additionally optimize for the geometry of the object with an implicit deformable multi-MLP shape representation. Our approach does not assume sequential input data, thus enabling robust tracking of fast motions or even temporally disconnected recordings. Our experiments demonstrate that our Neural Deformation Graphs outperform state-of-the-art non-rigid reconstruction approaches both qualitatively and quantitatively, with 64% improved reconstruction and 62% improved deformation tracking performance.
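A deformation graph deforms space by blending per-node motions with distance-based weights. A toy version with two nodes and pure translations (node positions, translations, and the Gaussian weighting are illustrative; in the paper these quantities are predicted by the neural network and include rotations):

```python
import numpy as np

nodes = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0]])        # deformation-graph node positions
trans = np.array([[0.0, 0.1, 0.0],
                  [0.0, 0.0, 0.2]])        # per-node translations (toy motion)

def deform(p, sigma=0.5):
    # skinning-style blend: weight each node's motion by proximity to p
    d2 = ((nodes - p) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum()
    return p + w @ trans

moved = deform(np.array([0.0, 0.0, 0.0]))  # point at node 0 follows node 0 most
print(moved)
```

A point near node 0 inherits mostly node 0's motion, which is the smoothness prior that makes graph-based tracking robust to fast or disconnected motion.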

Paper Video link (url) DOI [BibTex]



Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction

Gafni, G., Thies, J., Zollhöfer, M., Nießner, M.

In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), CVPR 2021, 2021 (inproceedings)

Abstract
We present dynamic neural radiance fields for modeling the appearance and dynamics of a human face. To handle the dynamics of the face, we combine our scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions. We use volumetric rendering to generate images from this hybrid representation and demonstrate that such a dynamic neural scene representation can be learned from monocular input data only, without the need of a specialized capture setup.
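The volumetric rendering step combines per-sample densities and colors along a camera ray using the standard transmittance-weighted quadrature. A minimal NumPy version with made-up sample values for a single ray:

```python
import numpy as np

sigma = np.array([0.0, 2.0, 5.0, 0.1])  # densities at ray samples (toy values)
delta = np.full(4, 0.25)                # spacing between consecutive samples
colors = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [1, 1, 1]], dtype=float)

alpha = 1.0 - np.exp(-sigma * delta)    # opacity contributed by each sample
# transmittance: probability the ray reaches sample i unoccluded
T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
w = T * alpha                           # per-sample rendering weights
pixel = w @ colors                      # final composited pixel color
print(pixel, w.sum())
```

In the paper the densities and colors come from the scene network conditioned on the morphable model's pose and expression codes, giving explicit control over the rendered face.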

Project Page Paper Video link (url) [BibTex]



TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

Bozic, A., Palafox, P., Thies, J., Dai, A., Nießner, M.

Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2, pages: 1403-1414, 35th Conference on Neural Information Processing Systems, 2021 (conference)

Abstract
We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach. From an input monocular RGB video, the video frames are processed by a transformer network that fuses the observations into a volumetric feature grid representing the scene; this feature grid is then decoded into an implicit 3D scene representation. Key to our approach is the transformer architecture that enables the network to learn to attend to the most relevant image frames for each 3D location in the scene, supervised only by the scene reconstruction task. Features are fused in a coarse-to-fine fashion, storing fine-level features only where needed, requiring lower memory storage and enabling fusion at interactive rates. The feature grid is then decoded to a higher-resolution scene reconstruction, using an MLP-based surface occupancy prediction from interpolated coarse-to-fine 3D features. Our approach results in an accurate surface reconstruction, outperforming state-of-the-art multi-view stereo depth estimation methods, fully-convolutional 3D reconstruction approaches, and approaches using LSTM- or GRU-based recurrent networks for video sequence fusion.
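The core operation, attending over the input frames independently for each 3D location, is scaled dot-product attention with that location's feature as the query. A minimal sketch (the query construction, dimensions, and pooling are illustrative assumptions):

```python
import numpy as np

def attend(query, frame_feats):
    # scaled dot-product attention: one 3D-location query over per-frame features
    scores = frame_feats @ query / np.sqrt(query.size)
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ frame_feats              # fused feature for this 3D location

rng = np.random.default_rng(4)
feats = rng.normal(size=(10, 32))       # features extracted from 10 video frames
q = feats[3]                            # a query resembling frame 3's observation
fused = attend(q, feats)
print(fused.shape)
```

The attention weights play the role of learned view selection: frames that observe a 3D location well receive high weight, and this selection is supervised only indirectly through the reconstruction loss.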

[BibTex]



SPSG: Self-Supervised Photometric Scene Generation from RGB-D Scans

Dai, A., Siddiqui, Y., Thies, J., Valentin, J., Nießner, M.

In IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), CVPR 2021, 2021 (inproceedings)

Abstract
We present Self-Supervised Photometric Scene Generation (SPSG), a novel approach to generate high-quality, colored 3D models of scenes from RGB-D scan observations by learning to infer unobserved scene geometry and color in a self-supervised fashion. Our self-supervised approach learns to jointly inpaint geometry and color by correlating an incomplete RGB-D scan with a more complete version of that scan. Notably, rather than relying on 3D reconstruction losses to inform our 3D geometry and color reconstruction, we propose adversarial and perceptual losses operating on 2D renderings in order to achieve high-resolution, high-quality colored reconstructions of scenes. This exploits the high-resolution, self-consistent signal from individual raw RGB-D frames, in contrast to fused 3D reconstructions of the frames which exhibit inconsistencies from view-dependent effects, such as color balancing or pose inconsistencies. Thus, by informing our 3D scene generation directly through 2D signal, we produce high-quality colored reconstructions of 3D scenes, outperforming state of the art on both synthetic and real data.
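Supervising in the 2D rendered domain, rather than with 3D reconstruction losses, means comparing a rendering of the predicted scene against a target RGB-D frame. A trivial stand-in using an L1 photometric loss (the paper uses adversarial and perceptual losses; the image arrays here are made up):

```python
import numpy as np

# toy 4x4 RGB "rendering" of the predicted scene and a target frame
pred_render = np.full((4, 4, 3), 0.5)
target_frame = np.full((4, 4, 3), 0.6)

# per-pixel L1 photometric loss, averaged over the image
photometric_l1 = np.abs(pred_render - target_frame).mean()
print(photometric_l1)
```

Comparing against individual raw frames this way sidesteps the view-dependent inconsistencies (color balancing, pose drift) that accumulate in a fused 3D reconstruction.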

Paper Video link (url) [BibTex]



Neural RGB-D Surface Reconstruction

Azinovic, D., Martin-Brualla, R., Goldman, D. B., Nießner, M., Thies, J.

ArXiv, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages: 6280-6291, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2021 (conference)

link (url) DOI [BibTex]

