AI Overview
A portfolio of artificial intelligence implementations spanning four distinct production applications, each leveraging OpenAI models for different domains: business intelligence, healthcare, dating, and customer support. These implementations demonstrate expertise in RAG pipelines, structured prompt engineering, confidence-scored classification, adaptive multi-stage AI workflows, NLP fallback systems, and production-grade AI integration patterns.
AI Implementations by Project:
- Lead Web Scraper — 6 distinct AI analyzers using GPT-4o-mini for revenue prediction, owner detection, chain detection, review analysis, business classification, and vendor detection
- Third Eye Health (NatMed) — Full RAG pipeline with local ONNX embeddings (all-MiniLM-L6-v2), HNSW vector search, and OpenAI GPT-4o for healthcare chatbot
- LuvNote Dating App — AI-powered ambassador pipeline using GPT-4o-mini for Instagram DM intent classification (14 intents), response generation, and automated recruitment
- LuvNote Support Portal — AI question routing via GPT-4o-mini with structured JSON responses, confidence scoring, and a keyword-based NLP fallback engine
10+ AI use cases · 4 production applications · 3 programming languages · OpenAI GPT-4o & GPT-4o-mini
RAG Pipeline & Embeddings
Full RAG Pipeline with Local Embeddings THIRD EYE HEALTH
A complete Retrieval-Augmented Generation pipeline that enables an AI chatbot to answer health-related questions using the application's knowledge base. The pipeline uses local ONNX embeddings for vector search and OpenAI GPT-4o (or local LmStudio) for text generation — no external API calls needed for the embedding step.
Pipeline Architecture:
- Text Tokenization: FastBertTokenizer and SharpToken process user queries into tokens for embedding
- Local Embedding Generation: OnnxEmbeddingProvider runs the all-MiniLM-L6-v2 model locally via Microsoft.ML.OnnxRuntime to generate 384-dimensional vector embeddings
- Vector Search: HNSW (Hierarchical Navigable Small World) algorithm finds the most semantically similar documents in the knowledge base
- Context Assembly: Retrieved documents are assembled into context for the LLM prompt
- Text Generation: OpenAiProvider sends the context-enriched prompt to GPT-4o for response generation (with LmStudioProvider as a local alternative)
- Caching: Memory cache stores embeddings and results for performance optimization
- Fallback: RagPipelineStub provides a graceful fallback when AI services are unavailable
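The retrieval and context-assembly steps above can be sketched as follows. This is a minimal illustration, not the production code: the actual pipeline is C#, and the sketch is written in TypeScript with hypothetical names (`Doc`, `cosineSimilarity`, `topK`, `buildPrompt`). An exhaustive cosine-similarity scan stands in for the HNSW index, and the embeddings are assumed to be precomputed.

```typescript
// Hypothetical document shape: text plus a precomputed embedding vector
// (384 dimensions for all-MiniLM-L6-v2; any length works here).
interface Doc {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive nearest-neighbour search. An HNSW index approximates this
// ranking without scanning every document.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}

// Context assembly: retrieved passages are placed ahead of the user question.
function buildPrompt(question: string, context: Doc[]): string {
  const contextBlock = context.map((d) => `- ${d.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${contextBlock}\n\nQuestion: ${question}`;
}
```

The exhaustive scan is fine for a small knowledge base; an approximate index such as HNSW pays off once scanning every vector per query becomes too slow.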
Confidence-Weighted Evidence System LEAD WEB SCRAPER
Every AI-extracted data field in the system carries a confidence score (0-1) and an array of evidence objects with source URLs, descriptive notes, and optional source snippets. This creates a fully auditable data provenance chain from raw source to final AI-generated value.
Evidence Structure:
- Value: The AI-extracted data (text, number, or JSON)
- Confidence: 0.0-1.0 score indicating reliability of the AI extraction
- Evidence Array: Each entry contains source URL, human-readable note, and optional raw text snippet
- Pipeline Integration: Confidence scores drive automated decisions — revenue web search trigger at 0.6, owner early exit at 0.8, chain classification at 0.7
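One way to model this structure in TypeScript is shown below. The field names (`sourceUrl`, `note`, `snippet`) and helper functions are hypothetical; only the 0.6 and 0.8 thresholds and the overall shape come from the description above. The chain threshold of 0.7 is listed but not used, since the text does not say how it is compared.

```typescript
// Hypothetical field names; the shape follows the evidence structure described above.
interface Evidence {
  sourceUrl: string;
  note: string;
  snippet?: string; // optional raw text from the source
}

interface ExtractedField<T> {
  value: T;
  confidence: number; // 0.0-1.0
  evidence: Evidence[];
}

// Thresholds quoted in the text.
const THRESHOLDS = {
  revenueWebSearch: 0.6,
  ownerEarlyExit: 0.8,
  chainClassification: 0.7,
} as const;

// A field is auditable only if its confidence is in range and every
// evidence entry points at a source.
function isAuditable<T>(field: ExtractedField<T>): boolean {
  const inRange = field.confidence >= 0 && field.confidence <= 1;
  return inRange && field.evidence.length > 0 && field.evidence.every((e) => e.sourceUrl.length > 0);
}

// Revenue estimates below the threshold trigger web research.
function needsWebSearch(revenue: ExtractedField<number>): boolean {
  return revenue.confidence < THRESHOLDS.revenueWebSearch;
}

// Owner extraction stops as soon as one page is confident enough.
function canExitEarly(owner: ExtractedField<string>): boolean {
  return owner.confidence >= THRESHOLDS.ownerEarlyExit;
}
```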
AI Analyzers & Classifiers
Two-Stage Revenue Prediction Pipeline LEAD WEB SCRAPER
A two-stage revenue estimation pipeline that adapts its data gathering to the model's confidence. The AI first estimates revenue from available business data; if its confidence is low, the pipeline runs web searches for industry benchmarks and re-estimates with the added context.
Pipeline Flow:
- Stage 1 — Initial AI Estimate: GPT-4o-mini analyzes the restaurant profile (name, category, rating, review count, price level, location) and produces a revenue breakdown by channel
- Confidence Gate: If the AI's confidence score falls below 0.6, the pipeline automatically triggers web research
- Web Research: Three targeted search queries are executed via headless Playwright browser
- Search Provider Cascade: Google Search first; if no results, falls back to Bing Search API
- Stage 2 — Enhanced Estimate: AI re-analyzes with the original data plus web search results for a refined prediction
- Multi-Channel Output: Final estimate breaks revenue into dine-in, takeout/pickup, and delivery channels
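The confidence gate and provider cascade can be sketched roughly like this. The estimator and search providers are injected as plain async functions, so nothing here calls OpenAI, Google, or Bing directly. All names (`RevenueEstimate`, `Estimator`, `Searcher`, `estimateRevenue`) are illustrative assumptions, not the project's actual API.

```typescript
// Hypothetical output shape: revenue split by channel plus model confidence.
interface RevenueEstimate {
  dineIn: number;
  takeout: number;
  delivery: number;
  confidence: number; // 0.0-1.0
}

// Injected dependencies: an LLM-backed estimator and two search providers.
type Estimator = (profile: string, research: string[]) => Promise<RevenueEstimate>;
type Searcher = (query: string) => Promise<string[]>;

// Provider cascade: use the secondary provider only when the primary returns nothing.
async function searchWithFallback(query: string, primary: Searcher, secondary: Searcher): Promise<string[]> {
  const results = await primary(query);
  return results.length > 0 ? results : secondary(query);
}

async function estimateRevenue(
  profile: string,
  queries: string[],
  estimate: Estimator,
  primary: Searcher,
  secondary: Searcher,
  threshold = 0.6
): Promise<RevenueEstimate> {
  // Stage 1: estimate from the business profile alone.
  const initial = await estimate(profile, []);
  if (initial.confidence >= threshold) return initial;

  // Confidence gate failed: gather research, then run stage 2.
  const research: string[] = [];
  for (const query of queries) {
    research.push(...(await searchWithFallback(query, primary, secondary)));
  }
  return estimate(profile, research);
}
```

Injecting the providers keeps the gate logic testable with stubs and makes the search cost visible: the extra queries run only for low-confidence estimates.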
Multi-Source Owner Detection with Consensus Scoring LEAD WEB SCRAPER
An intelligent multi-page owner extraction system that discovers restaurant owners by crawling relevant pages, sending each to the AI for name extraction, and aggregating results across sources using a custom consensus scoring algorithm.
Detection Algorithm:
- Page Discovery: Scans the website for links containing keywords: about, story, team, founder, contact, leadership
- Priority Ordering: Pages are ranked by relevance for optimal processing order
- AI Extraction: Each page's content is sent to GPT-4o-mini with structured JSON output (owner_name, confidence, reasoning, source_snippet)
- Early Exit: If any single extraction returns confidence ≥ 0.8, processing stops immediately
- Consensus Aggregation: Final score = sum(confidences) ÷ number of sources (the mean confidence across sources), plus a multi-source boost of up to +0.15
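A minimal sketch of the aggregation step, under stated assumptions: the base score is taken to be the mean confidence of the pages that agree on a name, the boost is assumed to grow by 0.05 per additional agreeing page up to the +0.15 cap, and names are compared case-insensitively. The step size, the name normalization, and all identifiers are illustrative guesses rather than the project's actual rules.

```typescript
interface OwnerExtraction {
  ownerName: string;
  confidence: number; // 0.0-1.0
}

const EARLY_EXIT = 0.8; // from the text
const MAX_BOOST = 0.15; // from the text
const BOOST_PER_EXTRA_SOURCE = 0.05; // assumed step size

// Mean confidence plus a capped bonus for agreement across sources.
function consensusScore(confidences: number[]): number {
  if (confidences.length === 0) return 0;
  const mean = confidences.reduce((sum, c) => sum + c, 0) / confidences.length;
  const boost = Math.min(MAX_BOOST, BOOST_PER_EXTRA_SOURCE * (confidences.length - 1));
  return Math.min(1, mean + boost);
}

function pickOwner(extractions: OwnerExtraction[]): { ownerName: string; score: number } | null {
  const groups = new Map<string, { ownerName: string; confidences: number[] }>();
  for (const e of extractions) {
    // Early exit: one sufficiently confident page ends processing.
    if (e.confidence >= EARLY_EXIT) return { ownerName: e.ownerName, score: e.confidence };
    const key = e.ownerName.trim().toLowerCase();
    const group = groups.get(key) ?? { ownerName: e.ownerName.trim(), confidences: [] };
    group.confidences.push(e.confidence);
    groups.set(key, group);
  }
  // Score each candidate name and keep the highest.
  const candidates: { ownerName: string; score: number }[] = [];
  groups.forEach((group) => {
    candidates.push({ ownerName: group.ownerName, score: consensusScore(group.confidences) });
  });
  candidates.sort((a, b) => b.score - a.score);
  return candidates.length > 0 ? candidates[0] : null;
}
```

With these assumptions, two pages naming the same owner at 0.6 and 0.7 score about 0.70 (mean 0.65 plus a 0.05 boost), which beats a single page at 0.65.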
Chain Affiliation Detection & Pre-Screening LEAD WEB SCRAPER
A two-phase chain detection system that first pre-screens businesses to identify obvious chains (saving processing time), then performs deep AI analysis for borderline cases. Detected chains are automatically declined from the lead pipeline.
Two-Phase Flow:
- Phase 1 — Pre-Screening: chainPreScreener sends a quick AI query to classify as large chain, small chain, or independent
- Skip Logic: If pre-screening identifies a large chain with high confidence, the full crawl is skipped entirely
- Phase 2 — Deep Analysis: For non-obvious cases, full AI analysis considering website structure and business name patterns
- Auto-Decline: When is_chain === true, the lead status is automatically set to "declined"
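The skip and auto-decline logic could look something like the sketch below. The type names, the `screenLead` helper, and the use of 0.7 as the "high confidence" cutoff are assumptions for illustration. The deep analysis is passed in as a callback so the sketch stays free of network calls.

```typescript
type ChainClass = "large_chain" | "small_chain" | "independent";

interface PreScreenResult {
  classification: ChainClass;
  confidence: number; // 0.0-1.0
}

interface Lead {
  name: string;
  status: "new" | "declined";
  is_chain?: boolean;
}

// Phase 1 gate: only confident large-chain verdicts skip the full crawl.
// The 0.7 default is an assumption.
function shouldSkipFullCrawl(pre: PreScreenResult, minConfidence = 0.7): boolean {
  return pre.classification === "large_chain" && pre.confidence >= minConfidence;
}

// Auto-decline: a chain verdict sets the lead status to "declined".
function applyChainDecision(lead: Lead, isChain: boolean): Lead {
  return { ...lead, is_chain: isChain, status: isChain ? "declined" : lead.status };
}

// Phase 2 (deepAnalysis) runs only when pre-screening is not conclusive.
function screenLead(lead: Lead, pre: PreScreenResult, deepAnalysis: (lead: Lead) => boolean): Lead {
  if (shouldSkipFullCrawl(pre)) return applyChainDecision(lead, true);
  return applyChainDecision(lead, deepAnalysis(lead));
}
```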
Business Type Classification (16 Categories) LEAD WEB SCRAPER
A hybrid classification system that combines regex-based pre-screening with AI-powered deep classification. The regex classifier provides a fast initial guess, while GPT-4o-mini analyzes website content, Google Place types, and service indicators for a definitive classification across 16 business categories.
Classification Flow:
- Fast Regex Pass: Pattern matching on website text for keywords with confidence 0.3-0.7
- Google Type Mapping: Maps Google Place types to the internal taxonomy
- AI Deep Classification: GPT-4o-mini selects from 16 categories: restaurant, fast_food, fast_casual, cafe, bakery, bar, pizzeria, food_truck, catering, deli, dessert_shop, breakfast_spot, juice_bar, buffet, food_hall, ghost_kitchen
- Validation: AI output is validated against the TypeScript enum; invalid types are rejected and defaulted
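The validation step might be sketched as follows. The text mentions a TypeScript enum; this sketch uses a `const` array with a derived union type instead, which gives the same compile-time safety and a runtime list to check against. The `"restaurant"` default and the function name are assumptions.

```typescript
// The 16 categories listed above.
const BUSINESS_TYPES = [
  "restaurant", "fast_food", "fast_casual", "cafe",
  "bakery", "bar", "pizzeria", "food_truck",
  "catering", "deli", "dessert_shop", "breakfast_spot",
  "juice_bar", "buffet", "food_hall", "ghost_kitchen",
] as const;

type BusinessType = (typeof BUSINESS_TYPES)[number];

// Accepts whatever the model returned and yields a valid category.
// Unknown or non-string values fall back to the default (assumed to be "restaurant").
function validateBusinessType(raw: unknown, fallback: BusinessType = "restaurant"): BusinessType {
  if (typeof raw !== "string") return fallback;
  const normalized = raw.trim().toLowerCase();
  const known = (BUSINESS_TYPES as readonly string[]).indexOf(normalized) !== -1;
  return known ? (normalized as BusinessType) : fallback;
}
```

Treating the model output as `unknown` forces the check to happen before the value reaches typed code, so a hallucinated category cannot leak into the database.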
AI Review Problem Analysis LEAD WEB SCRAPER
Combines SerpAPI-powered review scraping with AI analysis to identify recurring business problems from customer reviews. Categorizes problems across 7 dimensions: food quality, service, delivery, cleanliness, pricing, wait times, and order accuracy.
Analysis Pipeline:
- Review Scraping: SerpAPI fetches up to 30 recent Google reviews with pagination
- Date Parsing: Handles three date formats: Unix timestamps, ISO dates, and relative strings ("3 days ago", "a week ago")
- Metrics Calculation: reviewsPerWeek, responseRate, and responseFrequency
- AI Problem Detection: GPT-4o-mini analyzes up to 10 most negative reviews and categorizes problems
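The date-parsing step lends itself to a short example. The sketch below handles the three formats named above. The function name, the seconds-versus-milliseconds heuristic, and the approximation of months and years as 30 and 365 days are assumptions.

```typescript
// Approximate unit lengths in milliseconds (month = 30 days, year = 365 days).
const UNIT_MS: Record<string, number> = {
  minute: 60_000,
  hour: 3_600_000,
  day: 86_400_000,
  week: 7 * 86_400_000,
  month: 30 * 86_400_000,
  year: 365 * 86_400_000,
};

// Parses Unix timestamps, ISO date strings, and relative strings such as
// "3 days ago" or "a week ago". Returns null when nothing matches.
function parseReviewDate(raw: string | number, now: Date = new Date()): Date | null {
  if (typeof raw === "number") {
    // Assumption: values below 1e12 are seconds, larger values are milliseconds.
    return new Date(raw < 1e12 ? raw * 1000 : raw);
  }
  const text = raw.trim();
  const relative = /^(an?|\d+)\s+(minute|hour|day|week|month|year)s?\s+ago$/i.exec(text);
  if (relative) {
    const count = /^an?$/i.test(relative[1]) ? 1 : parseInt(relative[1], 10);
    return new Date(now.getTime() - count * UNIT_MS[relative[2].toLowerCase()]);
  }
  const parsed = new Date(text);
  return isNaN(parsed.getTime()) ? null : parsed;
}
```

Passing `now` explicitly keeps relative dates reproducible, which matters when metrics such as reviewsPerWeek are recomputed later from stored strings.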
NLP & Chat Systems
AI-Powered Support Chat with Question Routing LUVNOTE SUPPORT
A live AI-powered customer support portal that allows users to ask questions about the LuvNote app in natural language. Each question is processed by OpenAI GPT-4o-mini, which analyzes the query against the support page database and returns a structured JSON response with the best matching page, confidence score, and a helpful answer.
Routing Flow:
- User Input: Natural language question submitted through chat interface
- AI Routing: OpenaiSupportRouter sends the question and all support pages to GPT-4o-mini for analysis
- Structured Output: AI returns JSON with support_page_slug, confidence, answer, and reason
- Confidence Scoring: AI returns a 0.0-1.0 score indicating how well it understood the question
- Deep Links: Pages with in_app_url values allow the AI to link users directly into the relevant app feature
- Fallback: When OpenAI is unavailable, SupportPageMatcher provides keyword-based NLP routing
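The structured-output handling can be illustrated with a defensive parser. The production router is Ruby; this TypeScript sketch is only an analogue. The JSON keys match the ones listed above, while the function names, the slug whitelist check, and the choice to fall back on malformed output (not just on outages) are assumptions.

```typescript
interface RoutingResult {
  support_page_slug: string;
  confidence: number; // 0.0-1.0
  answer: string;
  reason: string;
}

// Returns a validated result, or null if the model output is malformed
// or names a page that does not exist.
function parseRoutingResponse(raw: string, knownSlugs: string[]): RoutingResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const r = data as Record<string, unknown>;
  if (typeof r.support_page_slug !== "string" || knownSlugs.indexOf(r.support_page_slug) === -1) return null;
  if (typeof r.confidence !== "number" || r.confidence < 0 || r.confidence > 1) return null;
  if (typeof r.answer !== "string") return null;
  return {
    support_page_slug: r.support_page_slug,
    confidence: r.confidence,
    answer: r.answer,
    reason: typeof r.reason === "string" ? r.reason : "",
  };
}

// aiRaw is null when the AI service is unavailable; the keyword matcher
// (injected as `fallback`) then handles the question.
function routeQuestion(
  question: string,
  aiRaw: string | null,
  knownSlugs: string[],
  fallback: (question: string) => RoutingResult
): RoutingResult {
  const parsed = aiRaw === null ? null : parseRoutingResponse(aiRaw, knownSlugs);
  return parsed ?? fallback(question);
}
```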
Keyword-Based NLP Fallback Engine LUVNOTE SUPPORT
SupportPageMatcher provides a self-contained keyword matching engine that can route questions without any external AI API dependency. Built with NLP fundamentals: tokenization, stopword removal, weighted field matching, and margin-based confidence scoring.
NLP Pipeline:
- Tokenization: Query is split, lowercased, and cleaned of punctuation
- Stopword Filtering: Common words (the, a, is, how, do, etc.) are removed
- Weighted Matching: Keywords matched against page title (weight 3), tags (weight 2), summary (weight 1), and slug (weight 1)
- Margin-Based Confidence: Score calculated from top match vs second-best match separation
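The four steps above fit in a few dozen lines. The sketch below is a TypeScript analogue of the approach, not the Ruby `SupportPageMatcher` itself. The field weights come from the text; the stopword list, the margin formula `(top - second) / top`, and all names are assumptions.

```typescript
interface SupportPage {
  slug: string;
  title: string;
  tags: string[];
  summary: string;
}

// Illustrative stopword list; the real list is not given in the text.
const STOPWORDS = new Set([
  "the", "a", "an", "is", "how", "do", "i", "my", "your", "to", "of", "and", "in", "on", "for", "can",
]);

// Lowercase, strip punctuation, split on whitespace, drop stopwords.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t));
}

// Each query token earns the field's weight once per field it appears in.
function scorePage(queryTokens: string[], page: SupportPage): number {
  const fields: [string, number][] = [
    [page.title, 3],
    [page.tags.join(" "), 2],
    [page.summary, 1],
    [page.slug, 1],
  ];
  let score = 0;
  for (const [text, weight] of fields) {
    const fieldTokens = new Set(tokenize(text));
    for (const token of queryTokens) {
      if (fieldTokens.has(token)) score += weight;
    }
  }
  return score;
}

// Confidence is the relative gap between the best and second-best score.
function matchPage(query: string, pages: SupportPage[]): { slug: string | null; confidence: number } {
  const tokens = tokenize(query);
  const ranked = pages
    .map((page) => ({ slug: page.slug, score: scorePage(tokens, page) }))
    .sort((a, b) => b.score - a.score);
  if (ranked.length === 0 || ranked[0].score === 0) return { slug: null, confidence: 0 };
  const second = ranked.length > 1 ? ranked[1].score : 0;
  return { slug: ranked[0].slug, confidence: (ranked[0].score - second) / ranked[0].score };
}
```

A margin-based score reports low confidence when two pages match almost equally well, even if both scores are high.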
AI Automation Pipelines
AI-Powered Ambassador Recruitment Pipeline LUVNOTE
An AI automation pipeline that uses OpenAI GPT-4o-mini to manage Instagram DM conversations for ambassador recruitment. The system classifies user intent across 14 categories, generates contextual responses from 12 templates, and handles the complete conversation lifecycle autonomously.
Complete AI Pipeline Flow:
- Message Ingestion: Instagram webhook delivers incoming DMs to the application
- Context Loading: OpenAiRouterService loads the last 10 messages from the thread for conversation context
- AI Intent Analysis: GPT-4o-mini analyzes the message and classifies intent across 14 categories:
  - yes_interested, question_compensation, question_details, not_interested, already_ambassador
  - wrong_person, spam, unclear, needs_followup, and more
- Confidence Scoring: AI assigns a confidence score (0.0-1.0) indicating certainty of the intent classification
- Response Generation: AI drafts a contextual response, selecting from 12 response templates
- Queue Processing: OutboundQueue with priority ordering manages message delivery
- Intelligent Escalation: Low confidence or complex intents trigger admin email alerts
- Retry Logic: Failed sends retry with exponential backoff; permanent failures trigger admin notification
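The escalation and retry steps can be sketched as below. The production pipeline is not TypeScript and its actual values are not stated here: the 0.6 escalation cutoff, the set of "complex" intents, the 1-second base delay, the 60-second cap, and the four-attempt limit are all illustrative assumptions, as are the function names.

```typescript
// Escalate when the classifier is unsure or the intent needs a human.
// Both defaults are assumptions, not documented values.
function shouldEscalate(
  intent: string,
  confidence: number,
  minConfidence = 0.6,
  complexIntents: string[] = ["unclear", "needs_followup"]
): boolean {
  return confidence < minConfidence || complexIntents.indexOf(intent) !== -1;
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Tries to send up to maxAttempts times. After the final failure the admin
// is notified and false is returned. `sleep` is injectable for testing.
async function sendWithRetry(
  send: () => Promise<void>,
  notifyAdmin: (error: unknown) => void,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await send();
      return true;
    } catch (error) {
      if (attempt === maxAttempts - 1) {
        notifyAdmin(error);
        return false;
      }
      await sleep(backoffDelayMs(attempt));
    }
  }
  return false;
}
```

With the defaults, retries wait 1 s, 2 s, and 4 s before the fourth and final attempt.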
Multi-Signal Ordering Vendor Detection LEAD WEB SCRAPER
A configurable rules-based detection engine that identifies which ordering platform a restaurant uses. The system follows ordering CTAs, navigates to order pages, and applies a multi-signal scoring algorithm across four detection dimensions with different weights.
Detection Signals & Weights:
- CTA Discovery: Scans for links matching order keywords (order online, start order, pickup, delivery)
- Link Following: Playwright navigates to the top 2 ordering links, capturing final URLs after redirects
- Domain Match (+5 pts): Checks order page domain against known vendor domains
- URL Regex (+4 pts): Pattern matching on the full URL for vendor-specific patterns
- Script Host Detection (+3 pts): Collects all <script src> hostnames and matches against vendor script domains
- Text Pattern (+2 pts): Scans page content for strings like "powered by Owner"
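The four weighted signals combine into a simple additive score. In the sketch below the point values come from the list above, while the rule shape (`VendorRule`), the page shape (`PageSignals`), and the function names are hypothetical. Link following with Playwright is out of scope here; the sketch starts from signals already collected from the order page.

```typescript
// Hypothetical rule format for one ordering vendor.
// Patterns should not use the `g` flag, since RegExp.test is stateful with it.
interface VendorRule {
  name: string;
  domains: string[];
  urlPatterns: RegExp[];
  scriptHosts: string[];
  textPatterns: RegExp[];
}

// Signals collected from the order page after redirects.
interface PageSignals {
  finalUrl: string;
  scriptHosts: string[];
  text: string;
}

const POINTS = { domain: 5, url: 4, script: 3, text: 2 };

// True for an exact match or any subdomain of `domain`.
function hostMatches(host: string, domain: string): boolean {
  return host === domain || host.endsWith("." + domain);
}

function scoreVendor(rule: VendorRule, page: PageSignals): number {
  let host = "";
  try {
    host = new URL(page.finalUrl).hostname.toLowerCase();
  } catch {
    host = ""; // unparseable URL: the domain signal simply does not fire
  }
  let score = 0;
  if (rule.domains.some((d) => hostMatches(host, d))) score += POINTS.domain;
  if (rule.urlPatterns.some((p) => p.test(page.finalUrl))) score += POINTS.url;
  if (page.scriptHosts.some((h) => rule.scriptHosts.some((d) => hostMatches(h.toLowerCase(), d)))) score += POINTS.script;
  if (rule.textPatterns.some((p) => p.test(page.text))) score += POINTS.text;
  return score;
}

// Highest-scoring vendor wins; a score of 0 means no vendor was detected.
function detectVendor(rules: VendorRule[], page: PageSignals): { vendor: string | null; score: number } {
  let best: { vendor: string | null; score: number } = { vendor: null, score: 0 };
  for (const rule of rules) {
    const score = scoreVendor(rule, page);
    if (score > best.score) best = { vendor: rule.name, score };
  }
  return best;
}
```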
Zipcode Discovery with AI Summary Generation LEAD WEB SCRAPER
A session-based discovery pipeline that finds all restaurants in a zipcode, creates scaffold database entries, and processes them in resumable batches. Each business gets a full AI-generated summary by orchestrating all 6 AI analyzers in parallel.
Discovery Flow:
- Geocoding: Converts zipcode to lat/lng via Google Geocoding API
- Multi-Strategy Search: 6 place types plus 16 food keywords for comprehensive coverage
- Chain Pre-Screening: AI identifies obvious chains before expensive full crawls
- Full AI Summary: businessSummary orchestrates parallel execution of all AI analyzers per business
- Session Cleanup: Sessions auto-expire after 1 hour
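Parallel orchestration and session expiry might look like the following. Analyzers are injected as async functions keyed by name. Isolating failures so that one failed analyzer does not discard the others is an assumption about the design, and `buildSummary`, `SummaryPart`, and `isSessionExpired` are hypothetical names.

```typescript
type Analyzer = (businessId: string) => Promise<unknown>;

interface SummaryPart {
  name: string;
  ok: boolean;
  value?: unknown;
  error?: string;
}

// Starts every analyzer at once and collects one result per analyzer.
// Each analyzer's failure is captured individually (assumed behaviour).
async function buildSummary(businessId: string, analyzers: Record<string, Analyzer>): Promise<SummaryPart[]> {
  const names = Object.keys(analyzers);
  return Promise.all(
    names.map(async (name): Promise<SummaryPart> => {
      try {
        return { name, ok: true, value: await analyzers[name](businessId) };
      } catch (error) {
        return { name, ok: false, error: error instanceof Error ? error.message : String(error) };
      }
    })
  );
}

// Sessions expire one hour after creation by default.
function isSessionExpired(createdAt: Date, now: Date, ttlMs = 60 * 60 * 1000): boolean {
  return now.getTime() - createdAt.getTime() >= ttlMs;
}
```

Because every analyzer promise is created before any is awaited, total latency is roughly that of the slowest analyzer rather than the sum of all six.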
AI Skills & Technologies
LLM Integration
- OpenAI GPT-4o (generation)
- OpenAI GPT-4o-mini (classification)
- Structured JSON prompt engineering
- Chat Completions API
- System prompt design
- Confidence scoring patterns
- LmStudio local LLM support
RAG & Embeddings
- Retrieval-Augmented Generation pipeline
- ONNX Runtime local embeddings
- all-MiniLM-L6-v2 model
- HNSW vector search algorithm
- FastBertTokenizer / SharpToken
- Knowledge base ingestion
- Embedding caching
AI Pipeline Design
- Multi-stage adaptive pipelines
- Confidence-gated processing
- Consensus scoring algorithms
- Early exit optimization
- Graceful degradation & fallbacks
- Provider cascade patterns
- Hybrid regex + AI classification
NLP Fundamentals
- Text tokenization
- Stopword filtering
- Weighted field scoring
- Intent classification (14 intents)
- Sentiment analysis (7 problem categories)
- Named entity extraction
- Semantic similarity search
Production AI Patterns
- Evidence provenance tracking
- AI output validation
- JSON response parsing with fallbacks
- Rate limiting & retry logic
- Background worker queues
- Audit trails for AI decisions
- Admin escalation on low confidence
Languages & Frameworks
- C# / ASP.NET Core 8.0 (NatMed AI)
- TypeScript / Node.js (Web Scraper)
- Ruby on Rails 8.1 (Support Portal)
- OpenAI Ruby Gem
- Microsoft.ML.OnnxRuntime
- Playwright (headless browser automation)
Key Achievements
10+ AI Use Cases
Revenue prediction, owner detection, chain detection, review analysis, classification, support routing, ambassador automation, RAG chatbot, NLP fallback, vendor detection
Full RAG Pipeline
Local ONNX embeddings with HNSW vector search and GPT-4o generation for healthcare chatbot
6 AI Analyzers
Independent AI modules for revenue, owners, chains, reviews, business types, and summaries
14 Intent Classes
AI-powered Instagram DM intent classification with confidence scoring and template responses
Adaptive Pipelines
Confidence-gated workflows that autonomously enrich data when AI certainty is low
Consensus Scoring
Multi-source AI extraction with cross-page confidence aggregation to reduce hallucination
NLP Fallback Engine
Zero-dependency keyword matching with tokenization, stopwords, and weighted scoring
3 Languages
AI implementations in C#, TypeScript, and Ruby across production applications