Diffusion · Course Project

Simpsonization — Identity-Preserving Face Stylization

A diffusion-based pipeline that turns real human faces into Simpson characters while preserving both personal identity and the original scene context — the photograph still feels like you, just yellow.

Year: 2025
Course: Diffusion & Flow Models · Team 1
Stack: Stable Diffusion · LoRA · ControlNet · PyTorch
Code: GitHub

Problem

Off-the-shelf "cartoonify" filters tend to fail in two distinct ways: either they erase the subject's identity, or the flat 2D style clashes badly with the photographic 3D scene around it. The goal of this project was to bridge that gap — apply a strongly stylized look without losing identity, and without leaving the cartoon "floating" on top of the photo.

Inference pipeline: the noisy latent (4 channels), the downsampled inpainting mask (1 channel), and the VAE-encoded masked image (4 channels) are concatenated into a 9-channel UNet input; a Canny ControlNet conditions on the original face structure; the LoRA-adapted UNet denoises across N timesteps.
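As a minimal sketch of how a 9-channel input like the one in the caption is typically assembled for a Stable Diffusion inpainting UNet, the PyTorch snippet below concatenates the three parts along the channel axis. The channel order (noisy latent, mask, masked-image latent) follows the convention of the diffusers inpainting pipeline. The random tensors stand in for real VAE latents, and the function name `build_inpaint_unet_input` and the square face mask are hypothetical, not taken from the project code.

```python
import torch
import torch.nn.functional as F


def build_inpaint_unet_input(noisy_latents, mask, masked_image_latents):
    """Concatenate along channels: 4 (noisy latent) + 1 (mask) + 4 (masked-image latent) = 9."""
    # Downsample the pixel-space mask to latent resolution (the SD VAE downsamples by 8).
    mask_latent = F.interpolate(mask, size=tuple(noisy_latents.shape[-2:]), mode="nearest")
    return torch.cat([noisy_latents, mask_latent, masked_image_latents], dim=1)


# Stand-in tensors for a 512x512 photo; SD VAE latents are 4 x 64 x 64.
noisy = torch.randn(1, 4, 64, 64)  # x_t at the current timestep
mask = torch.zeros(1, 1, 512, 512)
mask[..., 128:384, 128:384] = 1.0  # hypothetical face region (1 = repaint)
# In a real pipeline this would be the VAE encoding of image * (1 - mask),
# multiplied by the VAE scaling factor; random values are used here instead.
masked_latents = torch.randn(1, 4, 64, 64)

x = build_inpaint_unet_input(noisy, mask, masked_latents)
print(x.shape)  # torch.Size([1, 9, 64, 64])
```

Only the first four channels change from one denoising step to the next; the mask and masked-image latents are computed once and reused at every timestep.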

Pipeline

Three components stacked on top of a Stable Diffusion inpainting backbone:

LoRA fine-tuning step
Fine-tuning. LoRA adapters are injected into the key (K) and value (V) projections of the UNet's cross-attention layers. Only these low-rank matrices are trained, on a curated Simpson-aesthetic image set under a masked denoising loss; the base weights stay frozen.
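The fine-tuning step can be sketched as follows, assuming a plain PyTorch setup rather than the project's actual training code. `ToyCrossAttention` is a hypothetical stand-in for a UNet cross-attention block (its `to_q`/`to_k`/`to_v` attribute names mirror diffusers). `LoRALinear` wraps a frozen projection with a zero-initialized low-rank update. `masked_denoising_loss` is one plausible form of a masked loss: squared error on the noise prediction, averaged only over masked positions. The rank, alpha, learning rate, and token mask are illustrative values.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank update: W x + (alpha / r) * B(A x)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base.requires_grad_(False)  # original weights stay frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)  # A: d_in -> r
        self.up = nn.Linear(rank, base.out_features, bias=False)   # B: r -> d_out
        nn.init.zeros_(self.up.weight)  # B = 0, so training starts from the unmodified layer
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))


class ToyCrossAttention(nn.Module):
    """Hypothetical stand-in for a UNet cross-attention block (not the diffusers class)."""

    def __init__(self, query_dim=320, context_dim=768, inner_dim=320):
        super().__init__()
        self.to_q = nn.Linear(query_dim, inner_dim, bias=False)
        self.to_k = nn.Linear(context_dim, inner_dim, bias=False)  # K reads the text context
        self.to_v = nn.Linear(context_dim, inner_dim, bias=False)  # V reads the text context

    def forward(self, x, context):
        q, k, v = self.to_q(x), self.to_k(context), self.to_v(context)
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v


def inject_lora_kv(model: nn.Module, rank: int = 4):
    """Freeze the model, wrap every cross-attention K and V projection, return trainable params."""
    model.requires_grad_(False)
    targets = [m for m in model.modules() if isinstance(m, ToyCrossAttention)]
    for m in targets:
        m.to_k = LoRALinear(m.to_k, rank)
        m.to_v = LoRALinear(m.to_v, rank)
    return [p for p in model.parameters() if p.requires_grad]


def masked_denoising_loss(noise_pred, noise, mask):
    """Squared error on the noise prediction, averaged only over masked (face) positions."""
    se = (noise_pred - noise) ** 2 * mask  # mask broadcasts over the feature/channel axis
    return se.sum() / (mask.expand_as(se).sum() + 1e-8)


torch.manual_seed(0)
attn = ToyCrossAttention()
x, ctx = torch.randn(2, 16, 320), torch.randn(2, 77, 768)
before = attn(x, ctx)
params = inject_lora_kv(attn, rank=4)
after = attn(x, ctx)
print(torch.allclose(before, after))   # True: B starts at zero
print(sum(p.numel() for p in params))  # 8704 = 2 * (768*4 + 4*320)

opt = torch.optim.AdamW(params, lr=1e-4)
noise = torch.randn(2, 16, 320)  # stand-in for the sampled noise target
mask = torch.zeros(2, 16, 1)
mask[:, :8] = 1.0  # hypothetical: the first 8 positions belong to the face
# The attention output stands in for the UNet's noise prediction here.
loss = masked_denoising_loss(attn(x, ctx), noise, mask)
loss.backward()
opt.step()
print(attn.to_q.weight.grad is None)  # True: frozen weights receive no gradient
```

Zero-initializing `B` means the adapted model starts out identical to the base model, and restricting the adapters to K and V limits them to changing how the text prompt (including any trigger tokens) is read, while the query path from the image features stays untouched.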

Dataset

Validation used a "Face Detection in the Wild" set with diverse poses, lighting, and occlusions. Style training used a curated Simpson-aesthetic image set paired with descriptive prompts so the LoRA learned to associate trigger tokens with the target style.

Results

The pipeline produces images that look like the subject rendered as a Simpson, rather than a generic Simpson stamped onto the subject. It holds up on challenging inputs such as side profiles, expressive faces, and dramatic lighting, because the diffusion backbone keeps the shading consistent with the surrounding scene while the ControlNet preserves the structure of the underlying face.

Selected samples 1 to 6: each row shows the input photograph, the facial mask, and the Simpsonized output. Background lighting and identity cues survive the stylization.

Limitations: on extreme profile views the model can struggle to map "Simpson" features onto a flat human profile; large facial occlusions can confuse the Canny conditioning; eyeglasses are sometimes blended unpredictably.

Takeaways