
Building CaptionZen: A YouTube Video Summarizer using Microsoft.Extensions.AI and Blazor

January 10, 2025 by Binny Kanjur



Generative AI has transformed how we build modern applications, enabling everything from chatbots that improve customer service to code assistants that speed up development. As developers, we're constantly exploring ways to enhance our applications with AI capabilities. Microsoft.Extensions.AI simplifies this process by providing a unified way to interact with various AI providers, letting us focus on building impactful features instead of wrestling with different APIs and implementations.

In this blog post, we'll explore how to build a YouTube video summarizer using Microsoft.Extensions.AI and Blazor, designed to work seamlessly across platforms—desktop, tablet, or phone. By combining the web capabilities of Blazor with the native strengths of .NET MAUI, CaptionZen offers a multi-platform experience. The summarizer extracts video transcripts and generates concise summaries, enabling users to quickly grasp the key points of videos without watching them in their entirety.

TL;DR


If you’re keen to try the tool without exploring the implementation details, follow these steps to build and run the application:

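Prerequisites

The steps below use the git and dotnet command-line tools, so you need Git and the .NET SDK installed, plus credentials for the AI provider you plan to use.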

Steps to Run the Application

  1. Clone the Repository

    git clone --branch v1.0 --single-branch https://github.com/binnykanjur/caption-zen.git
    

    Alternatively, download and extract the source zip.

  2. Navigate to the Project Folder

    cd caption-zen/src/CaptionZen.Web
    
  3. Set Encryption Keys

    CaptionZen encrypts sensitive details, such as AI provider keys, before storing them in the database. Configure the key and IV it uses:

    dotnet user-secrets set "Encryption:Key" "a8F3kL9mQ2rT5vX1"
    dotnet user-secrets set "Encryption:IV" "Z7pR4sW8nJ6uY3b2"
    
  4. Run the Application

    dotnet run
    
  5. Summarize a YouTube Video

    • Update Settings: Provide your AI provider details.

      • Click the settings icon in the sidebar to open the Settings pane.
      • Select the AI Provider from the AI Providers dropdown list.
      • Enter the details required by the AI Provider.
      • Close the Settings pane.
    • Enter Video URL: Input the YouTube video URL to extract its thumbnail and title.

    • Summarize: Click the summarize button to generate a concise summary.
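
The key and IV shown in step 3 are sample values. If you would rather not reuse them, the sketch below generates random replacements. It assumes the application expects 16-character strings like the samples, which this post does not confirm, and `random_token` is a hypothetical helper that is not part of CaptionZen.

```shell
#!/bin/sh
# Hypothetical helper (not part of CaptionZen): print N random alphanumeric characters.
random_token() {
  # Read a bounded chunk of random bytes so the pipeline ends cleanly, keep only
  # alphanumeric bytes, then cut the result down to the requested length.
  head -c 1024 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "$1"
}

# Assumption: 16 characters each, mirroring the length of the sample values in step 3.
key="$(random_token 16)"
iv="$(random_token 16)"

# Print the commands instead of running them, so they can be reviewed first.
printf 'dotnet user-secrets set "Encryption:Key" "%s"\n' "$key"
printf 'dotnet user-secrets set "Encryption:IV" "%s"\n' "$iv"
```

Run the two printed commands from the CaptionZen.Web project folder in place of the ones in step 3. Values that were encrypted under a different key will presumably not decrypt after the key changes, so it is safest to set the keys before saving any provider details.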

Architecture


This section covers the core implementation details of CaptionZen: persistence, core services, and key workflows.

Persistence

CaptionZen uses EF Core with SQLite to store settings, AI provider configurations, chat histories, and messages. Key tables include:

  1. Setting: Stores application-specific configurations and encrypted API keys.
  2. AiProvider: Contains AI provider-specific details like endpoint URLs and model IDs.
  3. Chat: Maintains a list of conversations or summarizations.
  4. ChatMessage: Captures user and AI responses within chats.

The persistence layer is built around a custom DbContext, implemented in: /src/CaptionZen.Shared/Services/CaptionZenDbContext.cs. The database structure follows a clear relational design, illustrated in the diagram below:

Core Services

CaptionZen relies on several services that encapsulate its core logic, interact with external APIs (the AI provider, YouTube), and integrate with the persistence layer. All of them are registered with the DI container through the AddCaptionZenServices method in /src/CaptionZen.Shared/ServiceCollectionExtensions.cs.

ISettingsService

Manages AI provider API keys and user preferences securely. Since we support both web and native platforms, we've implemented it in two ways:

  • Web Implementation: Uses SQLite for storage (/src/CaptionZen.Shared/Services/DbSettingsService.cs).
  • MAUI Implementation: Leverages platform-native storage mechanisms (/src/CaptionZen.Maui/Services/SettingsService.cs).

IYouTubeService

Retrieves video details (title, description, thumbnail) and extracts transcripts/captions. Implementation: /src/CaptionZen.Shared/Services/ScraperYouTubeService.cs.

⚠️ A Word of Caution: We're using Kelter.Forks.YoutubeTranscriptApi to grab video transcripts. This is an unofficial library that scrapes YouTube pages, so it can break whenever YouTube changes those pages. In a production environment, consider one of the following:

  • YouTube's official Data API: /src/CaptionZen.Shared/Services/GoogleYouTubeService.cs can serve as a good starting point. However, please note that this functionality may be limited to videos you've uploaded rather than working with all YouTube videos.
  • SearchApi: Use its YouTube API to extract the subtitles.

ICaptionZenService

This is the central service: it orchestrates interactions with AI providers, manages chats, and integrates with the other services.

  • AI Integration: Interfaces with Microsoft.Extensions.AI to process transcripts.
  • Chat Management: Maintains session histories.
  • Workflow Orchestration: Coordinates between ISettingsService and IYouTubeService.

Implementation: /src/CaptionZen.Shared/Services/CaptionZenService.cs.

Prompts Usage


Prompts play a crucial role in generating summaries tailored to user needs. CaptionZen uses Fabric patterns for its prompts. Specifically, the extract_wisdom pattern serves as the system prompt that turns a video transcript into an insightful, concise summary.

Implementation Highlights


Key Workflows

Video Thumbnail Rendering

Extracts and displays video metadata (e.g., title, thumbnail) from YouTube URLs.
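
As an illustration of the idea, and not of how ScraperYouTubeService does it, a thumbnail can be located from the URL alone: pull out the 11-character video ID and plug it into YouTube's static thumbnail host. The function names below are hypothetical, and the pattern covers only the common URL shapes.

```shell
#!/bin/sh
# Hypothetical helper: print the 11-character video ID found in common YouTube URL
# shapes (watch?v=ID, youtu.be/ID, /embed/ID, /shorts/ID). Prints nothing if none is found.
youtube_video_id() {
  printf '%s\n' "$1" |
    sed -nE 's#.*(v=|youtu\.be/|/embed/|/shorts/)([A-Za-z0-9_-]{11}).*#\2#p'
}

# Build the address of the statically hosted "hqdefault" thumbnail for a video URL.
# Returns a non-zero status when no video ID can be extracted.
thumbnail_url() {
  video_id="$(youtube_video_id "$1")"
  [ -n "$video_id" ] || return 1
  printf 'https://img.youtube.com/vi/%s/hqdefault.jpg\n' "$video_id"
}

thumbnail_url "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
thumbnail_url "https://youtu.be/dQw4w9WgXcQ?t=42"
```

Both calls print the same thumbnail address. The title is not encoded in the URL and has to be fetched separately. One option outside this sketch is YouTube's oEmbed endpoint, which returns it as JSON.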

Summarization Process

  1. Fetch video transcripts.
  2. Send the transcript to the configured AI provider.
  3. Receive and display the summary in the UI.

Conclusion


This project demonstrates how to integrate AI features into Blazor applications using Microsoft.Extensions.AI. Through .NET MAUI Blazor Hybrid, the same CaptionZen code runs on Android, iOS, macOS, Windows, and the web, providing a unified experience for diverse use cases. By combining transcript extraction with structured prompts and AI responses, CaptionZen provides a practical solution for summarizing YouTube videos.

The approach balances simplicity (e.g., SQLite for persistence) and flexibility (e.g., multi-provider AI integration), making it adaptable to other content-processing workflows. Explore the CaptionZen GitHub repository for more details on the implementation and to experiment with building similar tools.

