MIN_BILD_AI - GAN Archive

Why This Project?

This is the story of trying to teach a computer to make images that look like mine. Not copies, but new images carrying the same visual logic as my own image archive. A kind of mirror that shows back something I didn't ask for, but still recognize.

The project is called MIN BILD AI ("My Image AI") and is part of my degree project IN SILICO at Konstfack (University of Arts, Crafts and Design, Stockholm). It asks a question I still don't have the answer to: what is mine, really, when a machine learns from me and I learn from the machine?

The Journey

The project started in June 2025 with basic image experiments. After months of learning Python, PyTorch, and image processing fundamentals, I reached GAN experiments in November 2025. The GAN phase ran through January 2026 and spanned 18 versions before I switched to diffusion models.

What I learned: The most important insight has nothing to do with code. It's about distillation. About identifying what really matters and stripping away the rest. A model with 6 well-chosen features works better than one with 76 noisy ones. Complicated is not better.

What is a GAN?

A GAN (Generative Adversarial Network) consists of two neural networks trained against each other:

Generator (G): Tries to create fake images that look real
Discriminator (D): Tries to distinguish real from fake images

They "fight" each other until the generator becomes so good that the discriminator can't tell the difference.

# GAN Training Loop (simplified)
for epoch in range(epochs):
    # 1. Train Discriminator
    real_pred = D(real_images)      # Should give 1 (real)
    fake_pred = D(G(noise))         # Should give 0 (fake)
    d_loss = BCE(real_pred, 1) + BCE(fake_pred, 0)

    # 2. Train Generator
    fake_pred = D(G(noise))         # Generator wants D to give 1
    g_loss = BCE(fake_pred, 1)      # "Fool" the discriminator
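To make the two losses concrete, here is a small worked example in plain Python (no PyTorch needed). It assumes the discriminator outputs a probability between 0 and 1 that an image is real. The helper `bce` and the two example probabilities are made up for illustration and are not values from my training runs.

```python
import math

def bce(pred: float, target: float) -> float:
    """Binary cross-entropy for a single prediction in (0, 1)."""
    eps = 1e-7  # keep pred away from 0 and 1 so log() stays finite
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

# Hypothetical discriminator outputs (probability that the image is real)
real_pred = 0.9  # D(real_images): D is fairly sure the real image is real
fake_pred = 0.2  # D(G(noise)):    D is fairly sure the fake image is fake

# Discriminator wants real -> 1 and fake -> 0
d_loss = bce(real_pred, 1.0) + bce(fake_pred, 0.0)

# Generator wants the same fake prediction to be 1
g_loss = bce(fake_pred, 1.0)

print(f"d_loss = {d_loss:.3f}")  # d_loss = 0.329
print(f"g_loss = {g_loss:.3f}")  # g_loss = 1.609
```

The same `fake_pred` gives a low loss for the discriminator and a high loss for the generator. That opposition is the "fight": whatever improves one score worsens the other.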

Famous GAN Projects (Inspiration)

This Person Does Not Exist (thispersondoesnotexist.com)

StyleGAN2 trained on faces. Each page refresh shows a new face that never existed.

StyleGAN / StyleGAN2 / StyleGAN3 (NVIDIA)

NVIDIA's family of style-based GANs. StyleGAN introduced style mixing (and kept progressive training from Progressive GAN), StyleGAN2 improved image quality, and StyleGAN3 made the generator alias-free.

This Artwork Does Not Exist

GAN trained on artworks. Generates paintings in various styles.

Artbreeder

Lets users "blend" images using GAN latent space interpolation.
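Latent space interpolation can be sketched in a few lines. This is a minimal illustration of plain linear interpolation between two latent vectors, not Artbreeder's actual implementation. The function `lerp`, the small `z_dim` of 8, and the random vectors are assumptions for the example.

```python
import random

def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors.

    t = 0 returns z_a, t = 1 returns z_b, values in between blend the two.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]

random.seed(0)
z_dim = 8  # kept small for readability; a StyleGAN2 latent is typically 512-dimensional

# Two random latent vectors, each standing in for "one image"
z_a = [random.gauss(0.0, 1.0) for _ in range(z_dim)]
z_b = [random.gauss(0.0, 1.0) for _ in range(z_dim)]

# Five evenly spaced points on the line from z_a to z_b.
# Feeding each one to a generator would give a gradual blend between the two images.
steps = [lerp(z_a, z_b, i / 4) for i in range(5)]

for i, z in enumerate(steps):
    print(f"t={i / 4:.2f}  first component = {z[0]:+.3f}")
```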

My Version History (v1-v18)

v1-v3 - Basic experiments (Nov 2025)

First attempts with vanilla GAN. Problems: mode collapse, unstable training.

v4-v6 - Compositional GAN (Dec 2025)

Tested separate generators for object and background. Mask pretrain, contrastive learning.

LESSON: Too complicated, too many moving parts.

v7-v8 - Mixed dataset (Dec 2025)

Dataset: 5684 images but mixed content (screenshots, memes, people).

LESSON: Data quality is MUCH more important than quantity!

v9-v10 - Cleaned dataset (Jan 2026)

Dataset: 3036 → 2310 images (~80% art). Problem: 148 features was too many.

LESSON: Fewer features = clearer signal. Model gets "confused" by too many inputs.

v11-v15 - Optimization (Jan 19-24, 2026)

76 features, dataset_v10, fine-tuned hyperparameters. Produced interesting samples but mode collapse after 200+ epochs.

v16-v18 - ADA collapse (Jan 24-26, 2026)

ADA-p dropped from 0.8 → 0.02. Discriminator became dominant, generator collapsed.
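For context, ADA-p is not set by hand during training; a feedback rule nudges it up or down. The sketch below paraphrases that rule as I understand it from the StyleGAN2-ADA paper and the official training loop: p moves by a fixed step so that the overfitting indicator r_t (the mean sign of D's outputs on real images) stays near ada_target. The function name `update_ada_p`, the r_t value of 0.3, and the defaults `ada_interval=4` and `ada_kimg=500` are assumptions for illustration, not values logged from v16-v18.

```python
def sign(x: float) -> int:
    """Return -1, 0 or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def update_ada_p(p, r_t, ada_target=0.6, batch_size=4,
                 ada_interval=4, ada_kimg=500):
    """One ADA adjustment (applied every `ada_interval` training iterations).

    r_t > ada_target: D looks too confident on real images -> raise p (more augmentation).
    r_t < ada_target: lower p (less augmentation). p never goes below 0.
    """
    step = (batch_size * ada_interval) / (ada_kimg * 1000)
    return max(p + sign(r_t - ada_target) * step, 0.0)

# Hypothetical scenario: r_t stays below the target for 1000 adjustments in a row
p = 0.8
for _ in range(1000):
    p = update_ada_p(p, r_t=0.3)

print(f"p after 1000 adjustments: {p:.3f}")  # p after 1000 adjustments: 0.768
```

With batch size 4, each adjustment is tiny (0.000032), so a large change in p means the indicator sat on one side of the target for a very long stretch of training.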

CONCLUSION: GAN training is fundamentally unstable for small datasets. Time to try Diffusion!

StyleGAN3 Test (Jan 2026)

Tested an upgrade to StyleGAN3. Problems: 57,210 sec/kimg (~16 h per 1,000 training images) and out-of-memory (OOM) errors on 6 GB VRAM.
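The ~16 h figure follows directly from the sec/kimg value, where one kimg means 1,000 training images shown to the discriminator. The short calculation below spells it out. The `target_kimg` of 100 is a made-up run length for illustration, not a number from my experiments.

```python
sec_per_kimg = 57_210  # measured StyleGAN3 speed on the GTX 1660 Super

# Convert to more readable units
hours_per_kimg = sec_per_kimg / 3600
sec_per_image = sec_per_kimg / 1000

print(f"{hours_per_kimg:.1f} h per kimg")    # 15.9 h per kimg
print(f"{sec_per_image:.1f} s per image")    # 57.2 s per image

# Hypothetical run length, only to show how quickly this adds up
target_kimg = 100
days = target_kimg * hours_per_kimg / 24
print(f"{target_kimg} kimg would take about {days:.0f} days")  # about 66 days
```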

Conclusion: Requires more powerful GPU than my GTX 1660 Super.

Sample Images from GAN Experiments

v10 - Cleaned dataset: [sample images]

v14 & v17 - Later versions: [sample images]

Code Attribution & Development Process

This project was developed during 1.5 years of programming studies at a vocational school, in dialogue with AI tools (both ChatGPT and Claude Code). They brought different perspectives and strengths which helped me understand problems from multiple angles.

What I Did Myself

Area	Examples
Idea & Concept	Decision to make conditional GAN, use visual features instead of text
Dataset curation	Collected, sorted, cleaned ~7000 of my own photos (5684 → 2310)
Feature selection	Chose which 76 features to extract, wrote extraction scripts
Experiment design	18 versions, trying different approaches: masks, fewer features, ADA tuning
Training config	Learning rates, batch size, r1_gamma, ada_target - all tuned manually
Monitoring & analysis	Watched ADA-p, loss curves, detected mode collapse
Code structure	Organized project folders, config files, auto-backup scripts

What AI Helped With

Area	Type of help
Monotonous calculations	Writing repetitive code patterns, boilerplate, tensor reshaping
Math implementation	Translating paper equations into PyTorch (loss functions, gradient penalties)
Debugging	Finding why mode collapse happened, tensor dimension mismatches
Concrete examples	Showing how a concept looks in actual code, not just theory
Finding references	Locating relevant papers, documentation, and tutorials

Base Code: NVIDIA StyleGAN2-ADA

Generator architecture - StyleGAN2 mapping network + synthesis network
Discriminator architecture - Residual (ResNet-style) discriminator, with ADA augmentations applied to its input images
Training loop - Based on official training_loop.py (modified for conditional input)
ADA augmentation - Adaptive augmentation from NVIDIA paper

Source: Official NVIDIA StyleGAN2-ADA repository
github.com/NVlabs/stylegan2-ada-pytorch

Libraries Used

PyTorch - Neural network framework
OpenCV - Image loading and preprocessing
NumPy - Numerical operations
Pillow - Image manipulation
scikit-image - Feature extraction (texture, color histograms)

IN SUMMARY
This project was developed in dialogue with AI, combining my 1.5 years of programming studies with AI assistance for implementation details. The ideas, experiments, feature selection, and dataset are my own work. I trained the model from scratch on ~7000 of my own photos across 18 versions.

Technical Details

Architecture

z_dim: 512 (latent space)
w_dim: 512 (intermediate space)
cond_dim: 76 features (v11-v15)
Resolution: 256x256
Batch size: 4 (6GB VRAM limit)

Best configuration (v11):

g_lr: 0.001          # Generator learning rate
d_lr: 0.00025        # Discriminator LR (4x lower)
r1_gamma: 20.0       # R1 regularization
ada_target: 0.6      # ADA augmentation target
style_mixing: 0.9    # Style mixing probability
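One way to keep such a configuration in a single place is a small frozen dataclass. This is a sketch under my own naming: the class `GanConfig` and the `lr_ratio` property are hypothetical and are not part of the StyleGAN2-ADA code base. The values are the ones listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GanConfig:
    """Hypothetical container for the v11 settings listed above."""
    # Architecture
    z_dim: int = 512          # latent space
    w_dim: int = 512          # intermediate space
    cond_dim: int = 76        # conditioning features (v11-v15)
    resolution: int = 256
    batch_size: int = 4       # 6 GB VRAM limit
    # Training
    g_lr: float = 0.001       # generator learning rate
    d_lr: float = 0.00025     # discriminator learning rate
    r1_gamma: float = 20.0    # R1 regularization weight
    ada_target: float = 0.6   # ADA augmentation target
    style_mixing: float = 0.9 # style mixing probability

    @property
    def lr_ratio(self) -> float:
        """How many times larger the generator LR is than the discriminator LR."""
        return self.g_lr / self.d_lr

cfg = GanConfig()
print(f"G/D learning-rate ratio: {cfg.lr_ratio:.1f}")
```

Freezing the dataclass means a version's settings cannot be changed by accident mid-run; a new experiment gets a new `GanConfig(...)` with only the changed fields overridden.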

Summary: What We Learned

1. Data quality > quantity
2310 clean art images >> 5684 mixed images

2. Fewer features = better
76 features worked ok, 148 was too many

3. GAN training is unstable
Requires constant monitoring, fine-tuning of lr-ratio, r1_gamma

4. Hardware matters
GTX 1660 Super (6GB) works for StyleGAN2 but not StyleGAN3

5. Masks are hard
Auto-generated masks vary too much. Bad masks are worse than no masks.

CONCLUSION: GAN is powerful but diffusion gives more stable training for small datasets.

→ See Diffusion v2 (current model)

Paper References

Original GAN - Goodfellow et al., "Generative Adversarial Nets" (NeurIPS 2014)
StyleGAN - Karras et al., "A Style-Based Generator Architecture for Generative Adversarial Networks" (CVPR 2019)
StyleGAN2 - Karras et al., "Analyzing and Improving the Image Quality of StyleGAN" (CVPR 2020)
StyleGAN3 - Karras et al., "Alias-Free Generative Adversarial Networks" (NeurIPS 2021)
ADA - Karras et al., "Training Generative Adversarial Networks with Limited Data" (NeurIPS 2020)
Progressive GAN - Karras et al., "Progressive Growing of GANs for Improved Quality, Stability, and Variation" (ICLR 2018)
WGAN-GP - Gulrajani et al., "Improved Training of Wasserstein GANs" (NeurIPS 2017)
