Chroma-HS: High-Fidelity Industrial Head Swapping with Chroma Keying

1Klleon AI Research, 2Samsung Research, 3Hyperconnect, 4Kyung Hee University
(*equal contribution)

Chroma-HS generates high-fidelity head-swapped results.

Head swapping results from three source images to four target videos.


High-Fidelity Industrial Head Swapping Pipeline

Illustration of our Chroma-HS pipeline: after acquiring the actor's frames (source), Chroma-HS seamlessly inserts the acted scenes into the desired target scenes. Chroma keying ensures high-fidelity backgrounds. Here, both the source and target actors are virtual humans.

Various Industrial Application Examples

By leveraging the chroma keying technique within our proposed Chroma-HS pipeline, we obtain various high-fidelity head-swapped videos in in-the-wild environments. The red boxes indicate the source images.
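Chroma keying itself is standard compositing: a green-screen frame is split into a foreground mask and a background region, and the keyed actor is pasted onto any new background. A minimal illustrative sketch is below; the `green_thresh` ratio is a hypothetical tuning knob, not a value from the paper, and production keyers use far more robust matting.

```python
import numpy as np

def chroma_key_mask(frame, green_thresh=1.3):
    """Binary foreground mask from a green-screen frame (H, W, 3) in [0, 1].

    A pixel is treated as background when its green channel dominates both
    red and blue by `green_thresh` (a hypothetical per-shoot threshold).
    """
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    eps = 1e-6
    is_green = (g / (r + eps) > green_thresh) & (g / (b + eps) > green_thresh)
    return (~is_green).astype(np.float32)  # 1.0 = foreground (actor)

def composite(foreground, background, mask):
    """Paste the keyed foreground onto a new background frame."""
    m = mask[..., None]
    return foreground * m + background * (1.0 - m)
```

Because the background is recovered exactly from the key rather than hallucinated by a generator, the swapped head only ever needs to be blended into the foreground region.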


Effects of Our Contributions

Our novel H2 augmentation and FPAT, combined with chroma keying, achieve high-fidelity head swapping. Click and drag the slider to compare the results!


Abstract

In the film industry, scenarios arise where footage of a stunt performer must later be replaced with the original actor's head. Deep learning-based head swapping is a viable solution in such scenarios. However, we point out that existing head swapping frameworks still produce artifacts, such as blurry results and imperfect separation of foreground and background. To mitigate this problem, we propose Chroma-HS, a new pipeline that generates high-fidelity results by splitting the head swapping task into background and foreground generation. Chroma-HS introduces chroma keying to head swapping for the first time, which enables flawless and diverse background generation. In addition, we introduce two novel methods to generate high-fidelity foregrounds. First, we propose Head shape and long Hair augmentation (H2 augmentation), which mimics diverse head attributes. Second, Chroma-HS incorporates the Foreground Predictive Attention Transformer (FPAT), which generates the foreground region by restricting the attention region with a predicted body mask. Experimental results show that Chroma-HS significantly outperforms the state-of-the-art head swapping model on benchmark datasets both qualitatively and quantitatively.

Motivations

We propose Chroma-HS with real-world applications in mind. As shown in (a), the existing work [2] (HeSer) exhibits severe artifacts in inpainted regions. To inpaint the background flawlessly, we introduce chroma keying into the head swapping framework. However, chroma keying alone still yields low-fidelity results when inpainting the neck and body regions hidden by differences in head shape and hair, as highlighted in the red box of (b). Chroma-HS generates a high-fidelity foreground with H2 augmentation and the Foreground Predictive Attention Transformer (FPAT). Chroma-HS removes these artifacts, as shown in the blue boxes of (b) and (c), and easily swaps in various high-fidelity real-world backgrounds.

Methods

Chroma-HS consists of H2 augmentation, an encoder (E), a head colorizer, a body blender containing Foreground Predictive Attention Transformer (FPAT) modules, and a decoder (D). We visualize our proposed input (X) manipulation method with the overall training (blue) and inference (red) schemes. The head colorizer colorizes the gray head of X, and the body blender inpaints the hidden body and neck with a foreground mask-aware attention mechanism. The foreground-prediction module predicts the foreground mask (M) of the body and neck region, and the attention is reweighted according to M.
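The mask-aware reweighting described above can be sketched as ordinary scaled dot-product attention whose logits are suppressed wherever the predicted foreground probability M is low. This is only an illustrative single-head sketch under assumed shapes; the `alpha` sharpness term and the exact reweighting formula are hypothetical stand-ins, not the paper's FPAT definition.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mask_aware_attention(q, k, v, fg_mask, alpha=10.0):
    """Attention restricted toward predicted-foreground key tokens.

    q: (Nq, d) queries; k, v: (Nk, d) keys/values over image tokens;
    fg_mask: (Nk,) predicted foreground probability M per key token.
    `alpha` (hypothetical) controls how strongly background logits are
    pushed down, proportional to (1 - M), before the softmax.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                        # scaled dot-product
    logits = logits - alpha * (1.0 - fg_mask)[None, :]   # suppress background keys
    attn = softmax(logits, axis=-1)                      # reweighted toward M ≈ 1
    return attn @ v
```

With a hard mask and a large `alpha`, the output for each query is effectively a mixture of foreground values only, which mirrors the idea of inpainting the hidden body and neck from foreground context rather than from background pixels.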

Comparisons

Qualitative comparison with the state-of-the-art method [2]. The third and fourth rows show HeSer's results in the in-the-wild configuration and the green-screen configuration (Green Screen), respectively.

References

We utilized a pre-trained face reenactment network [1] to generate our head swapping results.

[1] Latent Image Animator: Learning to Animate Images via Latent Space Navigation.

[2] Few-Shot Head Swapping in the Wild.