Chroma-HS: High-Fidelity Industrial Head Swapping with Chroma Keying

¹Klleon AI Research, ²Samsung Research, ³Hyperconnect, ⁴Kyung Hee University

(*equal contribution)

Abstract

In the film industry, footage of a stunt performer often requires the performer's head to be replaced with that of the original actor. Deep learning-based head swapping may be a viable solution in such scenarios. However, existing head swapping frameworks still produce artifacts, such as blurry results and imperfect separation of foreground and background. To mitigate this problem, we propose Chroma-HS, a new pipeline that generates high-fidelity results by splitting the head swapping task into background generation and foreground generation. Chroma-HS introduces chroma keying to head swapping for the first time, which enables flawless and diverse background generation. In addition, we introduce two novel methods to generate high-fidelity foregrounds. First, we propose Head shape and long Hair augmentation (H² augmentation), which mimics diverse head attributes. Second, Chroma-HS incorporates the Foreground Predictive Attention Transformer (FPAT), which generates the foreground region by restricting the attention region with the predicted body mask. Experimental results show that Chroma-HS significantly outperforms the state-of-the-art head swapping model on benchmark datasets, both qualitatively and quantitatively.
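To illustrate the general idea of restricting attention with a mask, the following is a minimal NumPy sketch of scaled dot-product attention in which keys outside a binary mask receive zero weight. It is not the FPAT architecture itself: the function name `masked_attention`, the tensor shapes, and the hand-written toy mask are assumptions made for illustration, and in the actual method the body mask is predicted by the model.

```python
import numpy as np


def masked_attention(q, k, v, key_mask):
    """Scaled dot-product attention restricted to keys where key_mask is True.

    q: (n_q, d) queries, k: (n_k, d) keys, v: (n_k, d_v) values,
    key_mask: (n_k,) boolean array; at least one entry must be True.
    This is a generic illustration, not the FPAT module from the paper.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Keys outside the mask get -inf so that softmax assigns them zero weight.
    scores = np.where(key_mask[None, :], scores, -np.inf)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy example: 3 queries attend over 5 key positions, 3 of which are "body".
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 2))
key_mask = np.array([True, True, False, False, True])  # hypothetical mask

out, weights = masked_attention(q, k, v, key_mask)
```

In this sketch, positions excluded by the mask contribute nothing to the output, so the attended features come only from the masked region.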



Chroma-HS generates high-fidelity head-swapped results.

Head swapping results from three source images to four target videos.

High-Fidelity Industrial Head Swapping Pipeline

Illustration of our Chroma-HS pipeline: After acquiring the actor's frames (source), we can seamlessly insert the acting into the desired scenes with Chroma-HS. Chroma keying ensures high-fidelity backgrounds. Here, both the source and the target actors are virtual humans.
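As a rough illustration of why chroma keying helps with backgrounds, the sketch below shows classic green-screen compositing in NumPy: pixels close to the key color are treated as background and replaced by an arbitrary scene. This is a generic chroma key example and not the Chroma-HS pipeline; the function names, the pure-green key color, and the distance threshold `tol` are all assumptions chosen for the toy data.

```python
import numpy as np


def chroma_key_mask(frame, key_rgb=(0, 255, 0), tol=80.0):
    """Return a mask that is 1.0 for foreground pixels and 0.0 for key-colored pixels.

    frame: (H, W, 3) uint8 RGB image shot against a uniform key color.
    key_rgb and tol are illustrative assumptions, not values from the paper.
    """
    diff = frame.astype(np.float32) - np.asarray(key_rgb, dtype=np.float32)
    dist = np.linalg.norm(diff, axis=-1)  # per-pixel distance to the key color
    return (dist > tol).astype(np.float32)


def composite(frame, background, mask):
    """Keep the frame where mask is 1 and the new background where mask is 0."""
    m = mask[..., None]  # broadcast the mask over the RGB channels
    out = m * frame.astype(np.float32) + (1.0 - m) * background.astype(np.float32)
    return out.astype(np.uint8)


# Toy 2x2 frame: one skin-toned foreground pixel on a pure green screen.
frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[...] = (0, 255, 0)
frame[0, 0] = (200, 150, 120)
background = np.full((2, 2, 3), 30, dtype=np.uint8)  # stand-in for the desired scene

mask = chroma_key_mask(frame)
result = composite(frame, background, mask)
```

Because the background is substituted directly rather than synthesized, any scene can be placed behind the foreground without the network having to generate it.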

Various Industrial Application Examples

By combining the chroma key technique with our proposed Chroma-HS pipeline, we can obtain a variety of high-fidelity head-swapped videos in in-the-wild environments. The red boxes mark the source images.

Effects of Our Contributions

Abstract

Motivations

Methods

Comparisons

Qualitative comparison with the state-of-the-art method [2]. The third and fourth rows show the results of HeSer run in the in-the-wild configuration and the green screen configuration (Green Screen), respectively.

References