Kumbh SahAIyak

A RAG (Retrieval-Augmented Generation) chat application with AI governance and deepfake video generation. Retrieval combines BM25, fuzzy search, and vector search, and BGE Reranker then reorders the candidates by relevance. Features include custom guardrails, prompt optimization with Textgrad, and deepfake video generation with lip-syncing.

System Design & Architecture

Architecture Overview

The system follows a microservices architecture, with separate components for the RAG pipeline, the guardrails, and deepfake generation. The RAG pipeline uses hybrid retrieval: BM25 (lexical) and BGE embeddings (semantic) fetch the initial candidates, and BGE Reranker then reorders them by relevance. The deepfake pipeline integrates multiple TTS models with a lip-sync service.
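The hybrid retrieval step can be sketched in plain Python as follows. The description above does not say how the BM25 and embedding rankings are combined, so this sketch assumes Reciprocal Rank Fusion (RRF) with the common constant k = 60; the BM25 parameters (k1 = 1.5, b = 0.75) are textbook defaults, and the short vectors stand in for real BGE embeddings. It is an illustration, not the project's implementation.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for a tokenized query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # IDF with +1 inside the log so it never goes negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(scores):
    """Document indices sorted by descending score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def hybrid_retrieve(query_tokens, query_vec, docs_tokens, doc_vecs, top_k=3, rrf_k=60):
    """Fuse the lexical (BM25) and semantic (cosine) rankings with RRF (assumed)."""
    lexical = rank(bm25_scores(query_tokens, docs_tokens))
    semantic = rank([cosine(query_vec, v) for v in doc_vecs])
    fused = Counter()
    for ranking in (lexical, semantic):
        for position, doc_id in enumerate(ranking):
            fused[doc_id] += 1.0 / (rrf_k + position + 1)
    return [doc_id for doc_id, _ in fused.most_common(top_k)]
```

RRF only uses rank positions, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales; the fused candidates would then go to the reranker.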

Key Components

  • RAG Pipeline: BM25 + Vector Search + BGE Reranker
  • Custom Guardrails: Content safety and compliance checks
  • Prompt Optimization: Textgrad-based prompt engineering
  • Deepfake Audio: VITS, Coqui-TTS, ElevenLabs integration
  • Deepfake Video: Wav2Lip for lip-syncing
  • Caching Layer: Fuzzy search with Levenshtein distance
  • Chunk Merging: Jensen-Shannon divergence for similar chunks
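Chunk merging with Jensen-Shannon divergence can be sketched as below. The source only names the measure, so the rest is assumed: each chunk is represented as a bag-of-words term distribution, only adjacent chunks are compared, and the merge threshold of 0.3 is a hypothetical value. With log base 2 the divergence is bounded in [0, 1], which makes a fixed threshold easy to reason about.

```python
import math
from collections import Counter

def term_distribution(tokens, vocab):
    """Relative frequency of each vocabulary term in a (non-empty) chunk."""
    counts = Counter(tokens)
    total = sum(counts[t] for t in vocab)
    return [counts[t] / total for t in vocab]

def kl_divergence(p, q):
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def merge_similar_chunks(chunks, threshold=0.3):
    """Greedily merge each chunk into the previous one when their JSD is small."""
    if not chunks:
        return []
    merged = [chunks[0]]
    for chunk in chunks[1:]:
        prev_tokens = merged[-1].lower().split()
        cur_tokens = chunk.lower().split()
        vocab = sorted(set(prev_tokens) | set(cur_tokens))
        p = term_distribution(prev_tokens, vocab)
        q = term_distribution(cur_tokens, vocab)
        if js_divergence(p, q) < threshold:
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)
    return merged
```

Because the mixture m is never zero where p or q is non-zero, the divergence is always finite, unlike plain KL divergence between two chunks.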

Data Flow

RAG: user query → input guardrails → hybrid retrieval (BM25 + vector) → BGE Reranker → LLM generation → response guardrails → user.
Deepfake: text input → TTS models → generated audio → Wav2Lip → video output.
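The RAG data flow, with a guardrail check on both the query and the response, can be sketched as a small orchestrator. The retriever, reranker, and generator are injected as plain callables, and the regex deny-list is a hypothetical stand-in for the project's custom guardrails, which are not described in detail here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical deny-list; real guardrails would add classifiers and policy configs.
BLOCKED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"\bpassword\b", r"\bcredit\s+card\b")]
REFUSAL = "Sorry, I can't help with that request."

def violates_policy(text: str) -> bool:
    return any(p.search(text) for p in BLOCKED_PATTERNS)

@dataclass
class RagPipeline:
    retrieve: Callable[[str], List[str]]                # hybrid BM25 + vector retrieval
    rerank: Callable[[str, List[str]], List[str]]      # e.g. a cross-encoder reranker
    generate: Callable[[str, List[str]], str]          # LLM call with retrieved context

    def answer(self, query: str) -> str:
        if violates_policy(query):                     # input guardrail
            return REFUSAL
        candidates = self.retrieve(query)
        context = self.rerank(query, candidates)
        response = self.generate(query, context)
        if violates_policy(response):                  # response guardrail
            return REFUSAL
        return response
```

Checking the response as well as the query matters because retrieved documents can surface content that the original question did not ask for.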

Technologies Used

Python, PyTorch, Docker, BGE Embeddings, BM25, Wav2Lip, VITS, Coqui-TTS, ElevenLabs, NaturalSpeech2

Architecture References

BGE Embedding Fine-tuning

Fine-tuned BGE embeddings for enhanced retrieval and reasoning capabilities using FlagOpen/FlagEmbedding framework.
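FlagEmbedding's fine-tuning scripts read training data as JSON Lines, one object per line with a "query" string and "pos"/"neg" lists of passages. A minimal sketch for building such a file from (query, relevant passage) pairs is shown below; drawing negatives at random from other pairs' passages is a simplifying assumption, and mined hard negatives would usually train a stronger retriever.

```python
import json
import random

def build_finetune_records(pairs, num_negatives=2, seed=0):
    """pairs: list of (query, positive_passage). Negatives come from other pairs."""
    rng = random.Random(seed)
    passages = [p for _, p in pairs]
    records = []
    for query, positive in pairs:
        pool = [p for p in passages if p != positive]
        negatives = rng.sample(pool, min(num_negatives, len(pool)))
        records.append({"query": query, "pos": [positive], "neg": negatives})
    return records

def write_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Keeping ensure_ascii=False preserves Hindi or other non-Latin text as-is in the training file.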

Wav2Lip - Lip Sync

Open-source lip-sync model by IIIT Hyderabad for synchronizing audio with video.
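Wav2Lip is typically run through the inference.py script in its repository. The helper below only builds the argument list for that script (--checkpoint_path, --face, --audio, --outfile, and optional --pads for top, bottom, left, right face padding); the checkpoint and media file names used with it are hypothetical.

```python
from typing import List, Optional, Sequence

def wav2lip_command(checkpoint: str, face_video: str, audio: str,
                    outfile: str = "results/result_voice.mp4",
                    pads: Optional[Sequence[int]] = None) -> List[str]:
    """Build the argument list for Wav2Lip's inference.py script."""
    cmd = ["python", "inference.py",
           "--checkpoint_path", checkpoint,
           "--face", face_video,
           "--audio", audio,
           "--outfile", outfile]
    if pads is not None:
        # Padding around the detected face: top, bottom, left, right.
        cmd += ["--pads", *map(str, pads)]
    return cmd
```

The list can be passed to subprocess.run(cmd, cwd=<path to the Wav2Lip checkout>, check=True). Increasing the bottom padding (for example 0 20 0 0) helps keep the chin inside the synced region.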

VITS - Voice Cloning

A conditional variational autoencoder with adversarial learning for end-to-end text-to-speech, used here for voice cloning.

Textgrad - Prompt Optimization

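Similar in purpose to DSPy, but optimizes prompts by backpropagating natural-language feedback ("textual gradients") from an LLM; used here as the preferred approach for prompt optimization in the RAG application.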
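The feedback loop behind this kind of optimization can be sketched without any library. This is not Textgrad's API (Textgrad wraps the loop in PyTorch-like variables, a text loss, and an optimizer); here run, critique, and rewrite are hypothetical callables that would each wrap an LLM call: answer a question with the current prompt, criticize the answer, and rewrite the prompt from the criticism.

```python
from typing import Callable, List, Tuple

def optimize_prompt(prompt: str,
                    run: Callable[[str, str], str],
                    critique: Callable[[str, str, str], str],
                    rewrite: Callable[[str, str], str],
                    questions: List[str],
                    steps: int = 3) -> Tuple[str, List[str]]:
    """Iteratively refine a system prompt from natural-language feedback."""
    history = []
    for _ in range(steps):
        feedback = []
        for q in questions:
            answer = run(prompt, q)                         # forward pass
            feedback.append(critique(prompt, q, answer))    # textual "gradient"
        combined = "\n".join(f for f in feedback if f)
        if not combined:
            break                                           # no criticism left: stop early
        prompt = rewrite(prompt, combined)                  # optimizer step
        history.append(prompt)
    return prompt, history
```

In practice the critique prompt encodes what "good" means for the RAG answers (grounding in retrieved context, citing sources, tone), so it plays the role of the loss function.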

Key Features

  • Hybrid retrieval system (BM25 + Vector Search + BGE Reranker)
  • Custom AI guardrails for content safety and compliance
  • Deepfake video generation with lip-sync
  • Voice cloning with multiple TTS models
  • Prompt optimization with Textgrad
  • Similar chunk merging using Jensen-Shannon divergence
  • Fuzzy search caching with Levenshtein distance
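A fuzzy cache keyed on Levenshtein distance can be sketched as below: a new query reuses a cached answer when its edit distance to a stored query is small relative to their length. The 20% distance ratio and the whitespace/case normalization are assumptions; the linear scan is fine for a small cache, while a library such as RapidFuzz would be faster at scale.

```python
from typing import Dict, Optional

def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

class FuzzyCache:
    def __init__(self, max_ratio: float = 0.2):
        self.max_ratio = max_ratio
        self.store: Dict[str, str] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        return " ".join(text.lower().split())

    def get(self, query: str) -> Optional[str]:
        """Return the answer of the closest cached query within the ratio, if any."""
        q = self._normalize(query)
        best, best_dist = None, None
        for key, answer in self.store.items():
            dist = levenshtein(q, key)
            within = dist <= self.max_ratio * max(len(q), len(key))
            if within and (best_dist is None or dist < best_dist):
                best, best_dist = answer, dist
        return best

    def put(self, query: str, answer: str) -> None:
        self.store[self._normalize(query)] = answer
```

A cache hit skips retrieval, reranking, and generation entirely, which is where the latency savings come from for repeated questions with small typos.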

My Responsibilities

  • Built a robust question-answering pipeline combining BM25 with fuzzy search and vector search
  • Applied BGE Reranker to optimize relevance and ranking accuracy
  • Optimized the RAG application's prompts with Textgrad
  • Fine-tuned BGE embeddings to enhance retrieval and reasoning capabilities
  • Designed and enforced AI governance guardrails modeled on AWS Guardrails
  • Developed an AI-driven deepfake system for generating CM videos
  • Integrated audio cloning technologies including ElevenLabs, Coqui-TTS, VITS, and NaturalSpeech2
  • Implemented precise lip-syncing using Wav2Lip
  • Fine-tuned VITS and Coqui-TTS models, cleaning the training audio with NVIDIA NeMo
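The reranking stage from the list above can be sketched independently of any model: every (query, passage) pair is scored by a cross-encoder and the candidates are re-sorted by that score. The score_pairs callable is an assumption standing in for a model such as BGE Reranker (FlagEmbedding's FlagReranker exposes a compute_score method over [query, passage] pairs); a trivial word-overlap scorer can replace it for testing.

```python
from typing import Callable, List, Sequence, Tuple

def rerank(query: str, passages: Sequence[str],
           score_pairs: Callable[[List[Tuple[str, str]]], List[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Re-order first-stage candidates by a cross-encoder relevance score."""
    pairs = [(query, p) for p in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

A cross-encoder reads the query and passage together, so it is more accurate but slower than the bi-encoder used for retrieval; that is why it only rescores the short list of fused candidates.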
