Types of Generative AI Models

Meet the Family

Prasad Sawant
9 min read · Oct 17, 2024

Hello 👋 and welcome back to our Gen AI 101 series! In our previous posts, we introduced Generative AI and explored the basics of neural networks.

If you are new to this series, the links below will help you get started:

The Series

Today, we will meet the show’s stars — the different types of generative AI models. Think of these models as a family of talented artists, each with their own unique style and specialty, just like the diverse artistic traditions we have in India. Let’s get to know them!

1. GANs: The Mehndi Artist and Wedding Planner Duo

Generative Adversarial Networks (GANs) are like a dynamic duo at an Indian wedding — a skilled mehndi artist and a perfectionist wedding planner working together (or playfully competing) to create the perfect bridal mehndi design.

Persona:

Meet Gayatri the Generator (our mehndi artist) and Dhruv the Discriminator (our wedding planner).

How they work:

  1. Gayatri creates intricate mehndi designs for the bride, blending traditional patterns with modern twists.
  2. Dhruv, the detail-oriented wedding planner, examines both traditional mehndi designs and Gayatri’s creations, judging whether each one could pass as an authentic traditional design that fits the family’s expectations and wedding theme.
  3. As Dhruv gets better at spotting inconsistencies or modern elements that don’t fit the theme, Gayatri has to refine her skills to create designs that are both innovative and traditionally appropriate.

With each wedding (or training iteration), both improve their craft:

  • Gayatri learns to create more convincing and beautiful designs.
  • Dhruv develops a keener eye for authenticating and appreciating mehndi artistry.

The GAN process in Mehndi terms:

  1. Gayatri starts with a basic design (random noise).
  2. Dhruv provides feedback (“The peacock motif isn’t traditional enough!”).
  3. Gayatri refines the design based on the feedback.
  4. This back-and-forth continues until the final design is both unique and convincingly traditional.
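
For readers who like to see the duo in code, here is a minimal sketch of one round of feedback between Gayatri and Dhruv. It is a toy, not a production GAN: a "design" is just a single number, the names `Generator`, `Discriminator`, `train_step`, and `train` are made up for this example, the learning rate and the "authentic design" distribution are arbitrary assumptions, and the gradients are worked out by hand for this tiny model. The generator uses the commonly used non-saturating loss (it tries to make Dhruv say "real").

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class Generator:
    """Gayatri: turns random noise z into a 'design' x = a * z + b."""

    def __init__(self):
        self.a, self.b = 1.0, 0.0

    def sample(self, z):
        return self.a * z + self.b


class Discriminator:
    """Dhruv: estimates the probability that a design x is authentic."""

    def __init__(self):
        self.w, self.c = 0.1, 0.0

    def prob_real(self, x):
        return sigmoid(self.w * x + self.c)


def train_step(gen, disc, real_x, z, lr=0.05):
    """One round: Dhruv learns to judge, then Gayatri learns from his verdict."""
    fake_x = gen.sample(z)

    # Discriminator update: minimize -log D(real) - log(1 - D(fake)).
    d_real = disc.prob_real(real_x)
    d_fake = disc.prob_real(fake_x)
    grad_w = (d_real - 1.0) * real_x + d_fake * fake_x
    grad_c = (d_real - 1.0) + d_fake
    disc.w -= lr * grad_w
    disc.c -= lr * grad_c

    # Generator update: minimize -log D(fake), the non-saturating loss.
    d_fake = disc.prob_real(fake_x)
    grad_x = -(1.0 - d_fake) * disc.w  # d(loss) / d(fake_x)
    gen.a -= lr * grad_x * z
    gen.b -= lr * grad_x


def train(steps=200, seed=0):
    """Repeat the feedback loop on toy data (authentic designs near 4.0)."""
    rng = random.Random(seed)
    gen, disc = Generator(), Discriminator()
    for _ in range(steps):
        real_x = rng.gauss(4.0, 0.5)  # an authentic design (assumed toy data)
        z = rng.gauss(0.0, 1.0)       # the random noise Gayatri starts from
        train_step(gen, disc, real_x, z)
    return gen, disc
```

Calling `train()` runs the back-and-forth a few hundred times. Real GANs replace the two one-line models with deep neural networks, and they are well known for being tricky to balance, so treat this only as an illustration of who updates what, and when.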

Real-world applications:

GANs, like our mehndi duo, are fantastic at creating realistic yet novel content:

  1. In fashion: Generating new paisley patterns for sarees, blending traditional motifs with modern aesthetics.
  2. In film: Creating realistic crowd scenes for Bollywood movies, populating backgrounds with diverse, unique individuals.
  3. In gaming: Designing unique characters for Indian mythology-based video games, each with distinct features and outfits.
  4. In art: Generating new Madhubani or Gond art pieces, inspired by traditional styles but with novel compositions.

2. VAEs: The Fusion Chef at a Modern Indian Restaurant

Variational Autoencoders (VAEs) are like innovative chefs at a modern Indian fusion restaurant, who capture the essence of traditional dishes and recreate them with a contemporary twist.

Persona:

Meet Vani the VAE, a creative chef known for her unique interpretations of classic Indian cuisine.

How they work:

  1. Encoding (Understanding the dish): Vani carefully studies a traditional dish, like Butter Chicken. She identifies the key elements that make it special: the creamy tomato base, the tender chicken, the blend of spices.
  2. Latent Space (Distilling the essence): In her mind, Vani creates a 'flavor profile' of the dish. This profile isn't an exact recipe, but rather a compact representation of the dish's essential characteristics. In a VAE, this profile is a small range of plausible values rather than a single fixed point, which is why nearby profiles lead to similar but not identical dishes.
  3. Decoding (Recreating with a twist): Using this flavor profile, Vani creates a new dish that captures the essence of Butter Chicken but in a novel form. The result might be a "Butter Chicken Mousse with Tandoori Crisp": not exactly Butter Chicken, but embodying its key flavors and textures.

The VAE process in culinary terms:

  • Encoding: Tasting and analyzing the original dish.
  • Latent Space: Creating a mental ‘flavor map’ of the dish’s essence.
  • Decoding: Using the flavor map to inspire a new creation.
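
Here is a rough sketch of Vani's three steps in code. It assumes the simplest possible setup: the encoder and decoder are plain linear maps with weights passed in by hand (in a real VAE they are trained neural networks), and the function names `encode`, `reparameterize`, `decode`, `kl_divergence`, and `vae_loss` are illustrative choices for this post. The part that makes it "variational" is that the encoder returns a mean and a log-variance, and the flavor map is sampled from that Gaussian.

```python
import math
import random


def dot(row, vec):
    """Dot product of two equal-length lists."""
    return sum(r * v for r, v in zip(row, vec))


def encode(x, w_mu, w_logvar):
    """Encoding: map the dish x to the mean and log-variance of its flavor profile."""
    mu = [dot(row, x) for row in w_mu]
    logvar = [dot(row, x) for row in w_logvar]
    return mu, logvar


def reparameterize(mu, logvar, rng):
    """Latent space: sample z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def decode(z, w_dec):
    """Decoding: turn a sampled flavor profile back into a dish."""
    return [dot(row, z) for row in w_dec]


def kl_divergence(mu, logvar):
    """KL divergence between N(mu, sigma^2) and the standard normal prior."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def vae_loss(x, x_hat, mu, logvar):
    """Squared reconstruction error plus the KL regularizer."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_divergence(mu, logvar)
```

To generate something new, you would sample `z` (either around an encoded dish or straight from the prior) and pass it to `decode`. The KL term is what keeps the flavor map smooth, so that nearby points decode to related dishes.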

Real-world applications:

VAEs, like our fusion chef, are great at:

  1. Recipe Innovation: Creating new Indian fusion dishes by understanding the essence of traditional recipes. Example: An AI that suggests a “Masala Dosa Taco” or a “Gulab Jamun Cheesecake”.
  2. Personalized Nutrition Plans: Generating meal plans that capture the essence of a person’s favorite foods while meeting dietary restrictions. Example: Creating a low-calorie version of Biryani that still captures its essential flavors.
  3. Food Product Development: Helping food companies develop new products that blend familiar Indian flavors in novel ways. Example: Designing a new flavor of chips that tastes like Pani Puri.
  4. Aroma and Flavor Science: In the perfume or food industry, creating new scents or flavors that capture the essence of traditional Indian spices and flowers. Example: Developing a new perfume that evokes the scent of a jasmine-adorned bride.
  5. Art and Design: In fashion, creating new patterns for fabrics that capture the essence of traditional Indian textiles. Example: Generating a modern print that evokes the feel of Bandhani tie-dye without directly copying it.

3. Transformers: The Versatile Bollywood Scriptwriter

Transformer models are like versatile and prolific Bollywood scriptwriters who can craft stories in multiple languages, genres, and styles, adapting to any cinematic demand.

Persona:

Meet Tara the Transformer, a renowned scriptwriter known for her ability to write everything from epic historical dramas to contemporary rom-coms, in any Indian language.

How they work:

  1. Attention Mechanism (Script Research): Tara has an uncanny ability to focus on relevant information from vast amounts of source material. She can quickly identify important elements in historical texts, contemporary news, or classic literature that are relevant to her current script.
  2. Parallel Processing (Collaborative Writing): Tara doesn’t write scenes sequentially. Instead, she works on multiple parts of the script simultaneously. She can juggle dialogue, character development, and plot points all at once, much like how Transformer models process all the positions in an input sequence in parallel rather than one at a time.
  3. Contextual Understanding (Nuanced Storytelling): Tara excels at understanding context. She can write dialogues that perfectly fit each character’s background, personality, and the situation they’re in. This is similar to how Transformers understand the context of words in sentences.
  4. Multilingual Proficiency (Language Mastery): Tara can seamlessly switch between writing in Hindi, Tamil, Bengali, or any other Indian language, capturing the nuances of each. She can even translate her scripts between languages, maintaining the essence of the story.
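
Tara's "script research" has a compact mathematical core called scaled dot-product attention. The sketch below shows it in plain Python, assuming a single attention head, no learned projection matrices, and no masking; the function names `softmax` and `attention` are just labels chosen for this example. Each query scores every key, the scores become weights that sum to 1, and the output is the weighted mix of the values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # How relevant is each key to this query?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Blend the values according to those relevance weights.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

Notice that each query is handled independently of the others. That independence is what lets real implementations compute all positions at once as matrix multiplications, which is the "parallel processing" from point 2.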

The Transformer process in scriptwriting terms:

  • Input: Tara receives a brief for a new movie project.
  • Processing: She draws upon her vast knowledge of stories, dialogues, and cultural contexts.
  • Output: Tara produces a script that perfectly fits the project requirements.

Real-world applications:

Transformers, like our versatile scriptwriter, excel at:

  1. Multilingual Content Creation: Generating news articles or blog posts in multiple Indian languages. Example: An AI that can write a tech review in Hindi, then adapt it for a Tamil audience.
  2. Language Translation: Providing nuanced translations between India’s numerous languages. Example: Translating a Gujarati poem into Malayalam, preserving both meaning and style.
  3. Chatbots and Virtual Assistants: Creating conversational AI that can understand and respond in multiple Indian languages and dialects. Example: A customer service bot that can switch between formal Hindi and colloquial Hinglish based on the user’s style.
  4. Content Summarization: Condensing long articles or documents while retaining key information. Example: Summarizing a lengthy legal document in simple, clear language for a layperson.
  5. Personalized Education: Generating educational content tailored to individual students’ learning styles and language preferences. Example: Explaining the same physics concept differently to students from different linguistic backgrounds.
  6. Creative Writing Assistance: Helping authors overcome writer’s block by suggesting plot developments or dialogue. Example: An AI writing assistant that can help craft a mystery novel in the style of a Bengali detective story.

4. Diffusion Models: The Patient Sandalwood Carver

Diffusion models are like a master sandalwood carver who transforms a rough block of wood into an intricately detailed sculpture through a gradual, patient process of refinement.

Persona:

Meet Devi the Diffusion model, a renowned sandalwood artisan from Mysuru, known for her ability to reveal exquisite sculptures hidden within blocks of sandalwood.

How they work:

  1. Initial Block (Raw Input): Devi starts with a rough, unformed block of sandalwood. This represents the initial 'noise' or unstructured data in diffusion models.
  2. Rough Shaping (Early Iterations): Using coarse tools, Devi begins to chip away at the block, creating basic shapes and outlines. This mirrors the early stages of a diffusion model, where broad features start to emerge from the noise.
  3. Progressive Refinement (Iterative Denoising): With each pass, Devi uses finer tools to add more detail. The figure within the wood becomes clearer. This reflects the iterative denoising process in diffusion models, where the output becomes more defined with each step.
  4. Detail Emergence (Feature Clarification): As carving progresses, intricate details like facial features, clothing folds, and ornate designs start to appear. This corresponds to how diffusion models gradually reveal finer details and structures in the data.
  5. Final Polishing (High-Resolution Output): In the final stages, Devi uses the finest tools to add the most delicate details and smooth the surface to a glossy finish. This represents the high-resolution, refined output of the diffusion model in its final iterations.

The Diffusion process in Sandalwood Carving terms:

  • Start: Rough sandalwood block (initial noisy state).
  • Middle: Gradual shaping and detailing (denoising process).
  • End: Emergence of a finely detailed, polished sculpture (final refined output).
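
Devi's carving loop can be sketched in a few lines of Python. Some assumptions to flag: the noise schedule is a simple linear one with illustrative default values, the update rule is the deterministic DDIM-style step (one of several samplers in use, not the only one), and `predict_noise` is a hypothetical stand-in for the trained neural network that a real diffusion model would use to guess the noise at each step. The names `make_alpha_bars`, `add_noise`, `denoise_step`, and `sample` are made up for this example.

```python
import math


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction per step for a linear noise schedule."""
    alpha_bars = [1.0]  # index 0 means "no noise at all"
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(x0, eps, alpha_bar):
    """Forward process: blend the clean data x0 with noise eps."""
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
            for x, e in zip(x0, eps)]


def denoise_step(x_t, eps_pred, ab_t, ab_prev):
    """One carving pass: estimate the clean data, then step to less noise."""
    x0_pred = [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
               for x, e in zip(x_t, eps_pred)]
    return [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
            for x0, e in zip(x0_pred, eps_pred)]


def sample(x_T, predict_noise, alpha_bars):
    """Reverse process: start from the noisy block and refine step by step."""
    x = x_T
    for t in range(len(alpha_bars) - 1, 0, -1):
        eps_pred = predict_noise(x, t)  # a trained network in a real model
        x = denoise_step(x, eps_pred, alpha_bars[t], alpha_bars[t - 1])
    return x
```

If you noise a sample with `add_noise` and then hand `sample` a "cheating" predictor that returns the exact noise you added, the loop walks back to the original data. A real model never sees the true noise; it has to learn to guess it, and that learned guess is where the creativity comes from. The loop also shows why diffusion models are slower than a single forward pass: every image needs many such steps.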

Real-world applications:

Diffusion models, like our sandalwood carver, excel at:

  1. Image Generation: Creating highly detailed images from text descriptions. Example: Generating a photorealistic image of “an ancient Hoysala temple with intricate carvings, surrounded by a lush forest at sunset.”
  2. 3D Model Creation: Producing detailed 3D models from simple descriptions or rough sketches. Example: Creating a 3D model of a mythological creature described in ancient Indian texts.
  3. Audio Generation: Synthesizing natural-sounding speech or music. Example: Generating a realistic rendition of a classical Carnatic music composition based on a written score.
  4. Video Synthesis: Creating short video clips or animations from still images or text descriptions. Example: Animating a series of Mughal miniature paintings to tell a story.
  5. Scientific Visualization: Turning complex scientific data into clear, detailed visual representations. Example: Creating a detailed visual model of the molecular structure of Ayurvedic herbs.
  6. Architectural Design: Generating detailed architectural plans or 3D renderings from basic concepts. Example: Producing a detailed design for a modern building incorporating elements of traditional Indian architecture.

Comparing Our AI Kalakaars (Artists)

Each of these models has its unique strengths and characteristics:

  1. GANs (The Mehndi Artist and Wedding Planner Duo):
    Create highly realistic and detailed outputs, much like the intricate mehndi designs at an Indian wedding. However, they can be as challenging to perfect as coordinating a grand Indian wedding — requiring a delicate balance and constant refinement.
  2. VAEs (The Fusion Chef):
    Excel at capturing the essence of data and generating creative variations, similar to how our fusion chef creates new dishes that embody the spirit of traditional recipes. However, the outputs might lack some of the fine details of the original, much like how a fusion dish captures the essence of a traditional meal but might not be an exact replica.
  3. Transformers (The Versatile Bollywood Scriptwriter):
    Handle vast amounts of data and excel at understanding context, much like our multilingual scriptwriter who can craft stories in various languages and genres. They’re particularly adept at language-related tasks, adapting to different styles and contexts with the versatility of a seasoned Bollywood writer.
  4. Diffusion Models (The Patient Sandalwood Carver):
    Create high-quality, diverse outputs with incredible detail, much like the intricate sculptures carved from sandalwood. However, they can be slower to generate results, requiring a patient, step-by-step process similar to the meticulous art of sandalwood carving. The results, though, are often worth the wait, revealing complex and beautiful structures from initially rough or noisy inputs.

Conclusion

Each of these AI ‘artists’ brings something unique to the table:

  • GANs offer realism and detail but require careful balancing.
  • VAEs provide creative interpretations while capturing core essences.
  • Transformers excel in understanding and generating contextual, language-based content.
  • Diffusion models deliver highly detailed and diverse outputs through a gradual refinement process.

Just as different art forms serve different purposes in Indian culture, these AI models each have their own strengths and ideal applications in the world of generative AI. As technology advances, we’re seeing these models combined in new and exciting ways, pushing the boundaries of what AI can create.

Links you should bookmark for further reading:

Which of these AI कलाकार (Kalakar — artists) fascinates you the most? Share your thoughts in the comments!

Thank you for reading till the end. I will see you in the next one, “The Magic Behind ChatGPT: Understanding Large Language Models”, where we’ll dive deeper into one of these models and explore how it’s changing the world of AI.

P.S. Don’t forget to follow to ensure you don’t miss any posts in this series. And if you know someone who might be interested in learning about Generative AI, feel free to share this with them!


Written by Prasad Sawant

I write to simplify complex things. Building the largest tech learning community at LetsUpgrade.in