GAN - Generative Adversarial Networks
StyleGAN2 Conditional - 18 versions of experiments
ARCHIVED - SWITCHED TO DIFFUSION
Training progression animation
This is the story of trying to teach a computer to make images that look like mine. Not copies, but new images carrying the same visual logic as my own image archive. A kind of mirror that shows back something I didn't ask for, but still recognize.
The project is called MIN BILD AI ("My Image AI") and is part of my degree project IN SILICO at Konstfack (University of Arts, Crafts and Design, Stockholm). It asks a question I still don't have the answer to: what is mine, really, when a machine learns from me and I learn from the machine?
The Journey
The project started in June 2025 with basic image experiments. After months of learning Python, PyTorch, and image processing fundamentals, I reached GAN experiments in November 2025. The GAN phase ran through January 2026, spanning 18 versions before switching to diffusion models.
A GAN (Generative Adversarial Network) consists of two neural networks trained together:
- Generator (G): Tries to create fake images that look real
- Discriminator (D): Tries to distinguish real from fake images
They "fight" each other until the generator becomes so good that the discriminator can't tell the difference.
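The "fight" can be written down as the minimax objective from the original GAN paper (Goodfellow et al., 2014). Here $x$ is a real image, $z$ is a random latent vector, and $D(\cdot)$ is the discriminator's estimate of the probability that its input is real:

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

This is the conceptual form. StyleGAN2 trains with a non-saturating variant of this loss plus regularization, so the formula shows the idea rather than the exact loss used in these experiments.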
This Person Does Not Exist
StyleGAN2 trained on faces. Each page refresh = a new face that never existed.
StyleGAN / StyleGAN2 / 3
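NVIDIA's family of style-based GANs. StyleGAN introduced style mixing on top of progressive training, StyleGAN2 redesigned the generator and dropped progressive growing, and StyleGAN3 made the generator alias-free.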
This Artwork Does Not Exist
GAN trained on artworks. Generates paintings in various styles.
Artbreeder
Lets users "blend" images using GAN latent space interpolation.
First attempts with vanilla GAN. Problems: mode collapse, unstable training.
Tested separate generators for object and background. Mask pretrain, contrastive learning.
Dataset: 5684 images but mixed content (screenshots, memes, people).
Dataset: 3036 → 2310 images (~80% art). Problem: 148 features was too many.
76 features, dataset_v10, fine-tuned hyperparameters. Produced interesting samples but mode collapse after 200+ epochs.
ADA-p dropped from 0.8 to 0.02. The discriminator became dominant and the generator collapsed.
Tested an upgrade to StyleGAN3. Problem: 57,210 sec/kimg (~16 h per kimg) and out-of-memory errors on 6 GB VRAM.
Conclusion: Requires a more powerful GPU than my GTX 1660 Super.
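To make the 57,210 sec/kimg figure concrete, here is a small worked calculation. The speed is the measured value quoted above; the 100 kimg run length is a hypothetical number chosen only to show the scale, not a setting from these experiments.

```python
# Convert the measured StyleGAN3 speed into wall-clock time.
SEC_PER_KIMG = 57_210       # measured value quoted in the text
SECONDS_PER_HOUR = 3_600


def hours_for(kimg: float, sec_per_kimg: float = SEC_PER_KIMG) -> float:
    """Wall-clock hours needed to process `kimg` thousand training images."""
    return kimg * sec_per_kimg / SECONDS_PER_HOUR


hours_per_kimg = hours_for(1)
print(f"1 kimg ~ {hours_per_kimg:.1f} h")

# Hypothetical run length, used only for illustration.
HYPOTHETICAL_KIMG = 100
days = hours_for(HYPOTHETICAL_KIMG) / 24
print(f"{HYPOTHETICAL_KIMG} kimg ~ {days:.0f} days")
```

At this speed even a short run would take months, which is why the upgrade was not practical on this GPU.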
v10 - Cleaned dataset:


v14 & v17 - Later versions:


This project was developed during 1.5 years of programming studies at a vocational school, in dialogue with AI tools (both ChatGPT and Claude Code). They brought different perspectives and strengths which helped me understand problems from multiple angles.
What I Did Myself
| Area | Examples |
|---|---|
| Idea & Concept | Decision to build a conditional GAN and to use visual features instead of text |
| Dataset curation | Collected, sorted, cleaned ~7000 of my own photos (5684 → 2310) |
| Feature selection | Chose which 76 features to extract, wrote extraction scripts |
| Experiment design | 18 versions, trying different approaches: masks, fewer features, ADA tuning |
| Training config | Learning rates, batch size, r1_gamma, ada_target - all tuned manually |
| Monitoring & analysis | Watched ADA-p, loss curves, detected mode collapse |
| Code structure | Organized project folders, config files, auto-backup scripts |
What AI Helped With
I used both ChatGPT and Claude Code throughout the project.
| Area | Type of help |
|---|---|
| Monotonous calculations | Writing repetitive code patterns, boilerplate, tensor reshaping |
| Math implementation | Translating paper equations into PyTorch (loss functions, gradient penalties) |
| Debugging | Finding why mode collapse happened, tensor dimension mismatches |
| Concrete examples | Showing how a concept looks in actual code, not just theory |
| Finding references | Locating relevant papers, documentation, and tutorials |
Base Code: NVIDIA StyleGAN2-ADA
- Generator architecture - StyleGAN2 mapping network + synthesis network
- Discriminator architecture - StyleGAN2 residual discriminator, trained with ADA
- Training loop - Based on official training_loop.py (modified for conditional input)
- ADA augmentation - Adaptive augmentation from NVIDIA paper
github.com/NVlabs/stylegan2-ada-pytorch
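The ADA-p value monitored during training is driven by a simple feedback rule. The sketch below follows the rule described in the ADA paper and, to the best of my understanding, the official training loop: p moves by a fixed step toward keeping the overfitting heuristic r_t (the mean sign of the discriminator's output on real images) near a target. The defaults for `ada_target`, `ada_interval`, and `ada_kimg` are assumed to match the official repo; the values actually used in these experiments may differ, and `update_ada_p` is a name made up for this illustration.

```python
def update_ada_p(p, rt, ada_target=0.6, batch_size=4, ada_interval=4, ada_kimg=500):
    """One ADA adjustment step (illustrative helper, not the repo's function).

    p  : current augmentation probability
    rt : overfitting heuristic, mean of sign(D(real)) over recent batches
    """
    # Fixed step size: p can travel from 0 to 1 in `ada_kimg` thousand images.
    step = (batch_size * ada_interval) / (ada_kimg * 1000)
    # +1 if the discriminator looks overfitted, -1 if not, 0 if exactly on target.
    direction = (rt > ada_target) - (rt < ada_target)
    # Only the lower bound is clamped here.
    return max(0.0, p + direction * step)


# Example: r_t stays below the target, so p drifts downward.
p = 0.8
for _ in range(1000):
    p = update_ada_p(p, rt=0.3)
print(round(p, 3))
```

With a batch size of 4 each step is tiny (0.000032), so a large change in p reflects many consecutive updates in the same direction.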
Libraries Used
- PyTorch - Neural network framework
- OpenCV - Image loading and preprocessing
- NumPy - Numerical operations
- Pillow - Image manipulation
- scikit-image - Feature extraction (texture, color histograms)
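As an illustration of what a color histogram feature can look like, here is a minimal NumPy-only sketch. It is not the extraction script used in this project: the function name, the 8-bin setting, and the synthetic test image are all assumptions made for the example.

```python
import numpy as np


def color_histogram_features(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Return a normalized per-channel histogram for an HxWx3 uint8 image."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB image")
    features = []
    for channel in range(3):
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        features.append(hist / hist.sum())  # each channel's bins sum to 1
    return np.concatenate(features).astype(np.float32)


# Synthetic stand-in for a photo from the archive.
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
features = color_histogram_features(demo_image)
print(features.shape)
```

With 8 bins per channel this yields 24 numbers per image. A conditioning vector like the 76-feature one would combine several such descriptors.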
This project was developed in dialogue with AI, combining my 1.5 years of programming studies with AI assistance for implementation details. The ideas, experiments, feature selection, and dataset are my own work. I trained the model from scratch on ~7000 of my own photos across 18 versions.
Architecture
- z_dim: 512 (latent space)
- w_dim: 512 (intermediate space)
- cond_dim: 76 features (v11-v15)
- Resolution: 256x256
- Batch size: 4 (6GB VRAM limit)
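The dimensions above fit together roughly as follows. This is a shape-level NumPy sketch, assuming the 76-feature vector enters the mapping network the same way the official StyleGAN2-ADA code handles its conditioning labels: embed to w_dim, normalize, concatenate with the normalized z. The weights are random stand-ins, and a single leaky-ReLU layer replaces the full mapping MLP.

```python
import numpy as np

Z_DIM, W_DIM, COND_DIM, BATCH = 512, 512, 76, 4


def normalize_2nd_moment(x, eps=1e-8):
    """Scale each row so that its mean squared value is about 1."""
    return x / np.sqrt(np.mean(x ** 2, axis=1, keepdims=True) + eps)


rng = np.random.default_rng(0)

# Random stand-ins for learned weights (a trained model would load these).
embed_weight = rng.standard_normal((COND_DIM, W_DIM)) / np.sqrt(COND_DIM)
mlp_weight = rng.standard_normal((Z_DIM + W_DIM, W_DIM)) / np.sqrt(Z_DIM + W_DIM)

z = rng.standard_normal((BATCH, Z_DIM))  # latent noise
c = rng.random((BATCH, COND_DIM))        # hypothetical feature vectors in [0, 1)

z_norm = normalize_2nd_moment(z)
c_embed = normalize_2nd_moment(c @ embed_weight)   # 76 -> 512
joint = np.concatenate([z_norm, c_embed], axis=1)  # shape (4, 1024)

# One leaky-ReLU layer standing in for the multi-layer mapping MLP.
pre = joint @ mlp_weight
w = np.where(pre > 0, pre, 0.2 * pre)              # shape (4, 512)
print(joint.shape, w.shape)
```

The resulting `w` vectors are what the synthesis network would consume, so the image features influence every style layer rather than being appended to the image itself.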
Best configuration (v11):
2310 clean art images >> 5684 mixed images
76 features worked OK; 148 were too many
Requires constant monitoring and fine-tuning of lr-ratio and r1_gamma
GTX 1660 Super (6GB) works for StyleGAN2 but not StyleGAN3
Auto-generated masks vary too much. Bad masks are worse than no masks.
→ See Diffusion v2 (current model)
- Original GAN - Goodfellow et al., "Generative Adversarial Nets" (NeurIPS 2014)
- StyleGAN - Karras et al., "A Style-Based Generator Architecture for Generative Adversarial Networks" (CVPR 2019)
- StyleGAN2 - Karras et al., "Analyzing and Improving the Image Quality of StyleGAN" (CVPR 2020)
- StyleGAN3 - Karras et al., "Alias-Free Generative Adversarial Networks" (NeurIPS 2021)
- ADA - Karras et al., "Training Generative Adversarial Networks with Limited Data" (NeurIPS 2020)
- Progressive GAN - Karras et al., "Progressive Growing of GANs for Improved Quality, Stability, and Variation" (ICLR 2018)
- WGAN-GP - Gulrajani et al., "Improved Training of Wasserstein GANs" (NeurIPS 2017)