# LeetMentor — Voice-Enabled AI Interviewer (Chrome Extension)

Jorge Delgadillo · Repo: LeetMentor
LeetMentor overlays a conversational, voice‑enabled AI interviewer directly on LeetCode. It supports two voice modes—Traditional and Realtime—and cleanly separates the extension runtime (background, content, popup) from a small Node WebSocket proxy for low‑latency Realtime audio.
This document is written as MDX so we can embed rich UI bits later (e.g., callouts, tabs). It’s intentionally concise but technical, with links to source files and a couple of deep‑dive code snippets.
## TL;DR
- What: Chrome MV3 extension with a React + TypeScript UI over LeetCode, styled with Tailwind.
- Why: Practice algorithms via a guided mock interview with voice.
- How: Background service worker orchestrates config, sessions, OpenAI calls, and messaging; a content script injects the interview UI; a Node Realtime proxy bridges mic → OpenAI Realtime → audio stream back.
## Architecture (lean)

- Extension (MV3)
  - Popup: settings + start button
  - Content script: injects UI on LeetCode and extracts/caches problem data
  - Background (service worker): config/session store, message router, OpenAI calls, SPA navigation detection
  - Interview page (web‑accessible): React app that reads the cached problem plus session/config and runs the conversation
- Backend (optional, for Realtime): tiny Node `ws` proxy that relays mic/audio frames between the client and OpenAI Realtime and keeps the API key server‑side
- Data/State
  - `chrome.storage.sync`: API key, model, voice settings, history window
  - `chrome.storage.local`: current session, transcript, cached problem
  - `chrome.runtime.sendMessage`: popup/content/interview ↔ background
## Key Surfaces

- Background service worker (`src/background/background.ts`): Central message bus; reads/writes config (API key, model, voices) to `chrome.storage.sync`; manages sessions and OpenAI API calls; detects SPA navigation and notifies the content script.
- Content script (`src/content/standalone-react.tsx` or `src/content/content.ts`): Injects the panel, extracts LeetCode problem metadata, and caches the current problem to `chrome.storage.local` for the interview page.
- Interview page (`src/interview/interview.tsx` → `InterviewApp`): Loads the session and cached problem, fetches config, then drives the conversation via Traditional Voice (`VoiceService`) or Realtime Voice (`RealtimeVoiceService`).
- Realtime proxy (`backend/server.js`): Thin WS bridge: client ⇄ proxy ⇄ OpenAI Realtime. Keeps keys server-side, forwards audio frames, handles reconnection and basic per-origin control.
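The background's role as a central message bus can be sketched as a plain dispatcher. This is a minimal sketch, not the real router: apart from `START_INTERVIEW` and `NAVIGATION_DETECTED`, the message names, payload shapes, and handler bodies below are illustrative assumptions.

```typescript
// Minimal sketch of a background message router, assuming a
// chrome.runtime.onMessage-style contract. GET_CONFIG and all
// payload fields are hypothetical; see src/background/background.ts
// for the real implementation.
type Message =
  | { type: 'START_INTERVIEW'; problemId: string }
  | { type: 'GET_CONFIG' }
  | { type: 'NAVIGATION_DETECTED'; url: string };

type Handler = (msg: Message) => Record<string, unknown>;

const handlers: Record<Message['type'], Handler> = {
  // Create a session and hand back its id to the caller.
  START_INTERVIEW: () => ({ sessionId: `session-${Date.now()}` }),
  // Return resolved config (stubbed here with a default model).
  GET_CONFIG: () => ({ model: 'gpt-4o' }),
  // Acknowledge SPA navigation so the content script can re-extract.
  NAVIGATION_DETECTED: () => ({ acknowledged: true }),
};

function route(msg: Message): Record<string, unknown> {
  const handler = handlers[msg.type];
  if (!handler) throw new Error(`Unknown message type: ${msg.type}`);
  return handler(msg);
}
```

Centralizing dispatch like this keeps every surface (popup, content, interview) talking to one place, which is what makes the key-handling guarantees below enforceable.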
## Runtime Flow

1. Open a LeetCode problem. The content script extracts title/difficulty/tags and writes them to `chrome.storage.local`.
2. Click Start Interview (popup or injected UI). The popup/content script sends `START_INTERVIEW` to the background.
3. The background resolves config, ensures a session, and opens `interview.html` (a web-accessible resource) with a session id.
4. `InterviewApp` loads the session and problem, greets the user, and selects a voice mode based on settings.
5. Traditional mode uses `ChatGPTService` (Chat Completions) plus `VoiceService` (Web Speech / Whisper + TTS). Realtime mode uses `RealtimeVoiceService` to stream mic audio to the Node proxy, which speaks back with low latency.
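Step 3 (opening `interview.html` with a session id) can be sketched as a small URL builder. The `session` query-parameter name and the injected `getURL` stub are assumptions; in the extension, `chrome.runtime.getURL` resolves the web-accessible resource under the extension's origin.

```typescript
// Hypothetical sketch of how the background might build the interview
// page URL. The real code may pass the session id differently.
function buildInterviewUrl(
  getURL: (path: string) => string, // stand-in for chrome.runtime.getURL
  sessionId: string
): string {
  const url = new URL(getURL('interview.html'));
  url.searchParams.set('session', sessionId); // parameter name is an assumption
  return url.toString();
}
```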
Tech Stack
- UI: React 18, TypeScript, Tailwind (
src/**
,src/shared/styles.css
) - Bundling: Webpack 5,
ts-loader
, CSS pipeline, two configs (webpack.config.js
,webpack.standalone.config.js
) - Chrome: Manifest V3 service worker, content scripts, web-accessible resources (
public/manifest.json
) - OpenAI: Chat Completions, Whisper (STT), TTS, Realtime (
src/shared/constants.ts
) - Backend: Node +
ws
(backend/server.js
)
## Commands

```bash
# Build the extension (dist/)
npm run build

# Watch mode for rapid iteration
npm run dev

# Watch + local React test page (e.g., src/content/react-test.html)
npm run dev:react

# Realtime voice backend
cd backend && npm run dev
```

Load the unpacked extension from `dist/` at `chrome://extensions`.
## Configuration & Storage

- User config (API key, model, voice settings, history window) → `chrome.storage.sync`
- Session, transcript, problem cache → `chrome.storage.local`
- Messaging → `chrome.runtime.sendMessage` (popup/content/interview ↔ background)

Keep API keys out of the client where possible. Only Realtime requires a backend proxy (which keeps keys server‑side).
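Config resolution with defaults can be sketched as a pure merge over whatever `chrome.storage.sync` returns. The interface below is an assumption (only the fields listed above are confirmed by the doc; `voiceMode` and the default values are illustrative), and the history-window default of 8 mirrors the deep-dive snippet later in this document.

```typescript
// Sketch: merge stored user config with defaults. Shape and defaults
// are assumptions; the real config type lives in the extension source.
interface LeetMentorConfig {
  apiKey: string;
  model: string;
  historyWindow: number;
  voiceMode: 'traditional' | 'realtime';
}

const DEFAULTS: Omit<LeetMentorConfig, 'apiKey'> = {
  model: 'gpt-4o',
  historyWindow: 8,
  voiceMode: 'traditional',
};

function resolveConfig(stored: Partial<LeetMentorConfig>): LeetMentorConfig {
  // The API key has no sensible default: fail loudly, as the background does.
  if (!stored.apiKey) {
    throw new Error('API key not configured. Set it in the extension popup.');
  }
  return { ...DEFAULTS, ...stored, apiKey: stored.apiKey };
}
```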
## Voice Modes

### Traditional Voice

- TTS: Browser voice or OpenAI TTS via background
- STT: Web Speech API; fallback to Whisper via background
- LLM: `ChatGPTService` composes concise prompts + last‑N history and calls Chat Completions; tracks usage/cost

### Realtime Voice

- Transport: Mic audio → Node proxy → OpenAI Realtime → streaming audio back
- UX: Lower latency, barge‑in, conversational flow
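On the Realtime path, mic audio from the Web Audio API arrives as `Float32Array` samples in [-1, 1], while OpenAI Realtime expects 16-bit PCM frames. A conversion along these lines presumably happens in `RealtimeVoiceService` before frames are forwarded to the proxy; the exact framing LeetMentor uses may differ.

```typescript
// Sketch: convert Float32 Web Audio samples ([-1, 1]) to 16-bit PCM,
// the sample format the OpenAI Realtime API accepts for input audio.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    // Scale asymmetrically so -1 maps to -32768 and +1 to 32767.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```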
## Deep‑Dive Snippet #1 — Background Chat Handling

Here's a real excerpt from the background service showing how embedded interview messages are routed through OpenAI. It demonstrates:

- Config resolution (including storage fallback)
- Prompt construction with system messages + a history window
- Phase‑aware interviewing (understanding → implementation → testing)
- The API call to Chat Completions
```typescript
private async handleEmbeddedMessage(data: any): Promise<{ response: string }> {
  const { problem, message, conversationHistory = [], interviewPhase = 'problem-understanding' } = data;

  // Resolve config, falling back to storage if the caller did not pass one.
  if (!data.config || !data.config.apiKey) {
    const configResponse = await this.getConfig();
    if (!configResponse || !configResponse.apiKey) {
      throw new Error('API key not configured. Please configure your OpenAI API key in the extension popup.');
    }
    data.config = configResponse;
  }

  try {
    const conciseSystem = `You are a technical interviewer. Keep answers 1–2 sentences. Ask questions, avoid lecturing.`;
    const minimalProblem = `Problem: ${problem?.title || 'Unknown'} (${problem?.difficulty || 'Unknown'}). Phase: ${interviewPhase}`;
    const phaseInstruction = interviewPhase === 'implementation'
      ? 'Explicitly ask the candidate to implement now in the editor.'
      : interviewPhase === 'testing-review'
        ? 'Focus on analysis, complexity, and edge cases.'
        : '';

    // Read the window from data.config, which is guaranteed after the fallback above.
    const historyN = data.config.historyWindow || 8;
    const lastN = (conversationHistory || []).slice(-historyN);

    const messages = [
      { role: 'system', content: conciseSystem },
      { role: 'system', content: minimalProblem },
      phaseInstruction ? { role: 'system', content: phaseInstruction } : undefined,
      ...lastN,
      { role: 'user', content: message }
    ].filter(Boolean);

    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${data.config.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: data.config.model || 'gpt-4o',
        messages,
        max_tokens: 200,
        temperature: 0.7,
      })
    });

    const result = await response.json();
    return { response: result.choices[0]?.message?.content ?? 'Error: empty response.' };
  } catch (err) {
    console.error('Error handling embedded message:', err);
    throw err;
  }
}
```
Notice how the background service centralizes API calls. This keeps keys out of content scripts and ensures consistent prompting.
## Content Script Notes

- The manifest currently loads one of the two content scripts (`src/content/standalone-react.tsx` or `src/content/content.ts`); the more feature‑rich problem detector lives in the other.
- Ensure the chosen content script caches the problem under `leetmentor_current_problem`; the interview page depends on it.

```typescript
chrome.storage.local.set({ leetmentor_current_problem: { title, difficulty, examples } });
```
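Part of the metadata the content script extracts can come straight from the URL rather than the DOM. A hypothetical sketch, assuming LeetCode's `/problems/<slug>/` URL scheme; the real detector reads richer fields (difficulty, tags, examples) from the page itself.

```typescript
// Hypothetical helpers: derive a problem slug and a display title from a
// LeetCode URL, e.g. https://leetcode.com/problems/two-sum/description/.
function parseProblemSlug(url: string): string | null {
  const match = new URL(url).pathname.match(/^\/problems\/([^/]+)/);
  return match ? match[1] : null; // null for non-problem routes
}

function slugToTitle(slug: string): string {
  // "two-sum" -> "Two Sum"
  return slug
    .split('-')
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(' ');
}
```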
## Getting Started (Dev)

1. Create `.env` files (client and backend). Do not commit keys.
2. Run `npm run dev` to watch the extension; `npm run dev:react` for UI test pages.
3. Run `cd backend && npm run dev` to enable Realtime mode.
4. Load `dist/` in `chrome://extensions`; toggle Realtime in the settings popup.
## Minimal Test Plan

- Problem detection: Navigate across LeetCode SPA routes; ensure `NAVIGATION_DETECTED` fires and the cache updates.
- Popup flow: Start an interview; `interview.html` opens with a session id present.
- Traditional voice: Mic capture → STT (Web Speech / Whisper) → Chat → TTS round‑trip.
- Realtime voice: Proxy connects; confirm full‑duplex audio and barge‑in.
- Persistence: Config survives reload via `chrome.storage.sync`; local transcripts are stored.
## Security Considerations

- Keep OpenAI keys out of content scripts. Background is acceptable for non‑Realtime calls; Realtime must use the backend.
- Restrict host permissions to what is necessary (`api.openai.com`, `leetcode.com/*`).
- Consider per‑origin connection controls in the proxy; log minimal PII; rotate keys if leaked.
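A per-origin gate for the proxy can be as simple as checking the WebSocket upgrade request's `Origin` header against an allow-list. This is a sketch only: the extension id is a placeholder, and `backend/server.js` may implement (or omit) this differently.

```typescript
// Sketch: allow-list check for the Realtime proxy. Entries are placeholders;
// a real deployment would list the actual extension id.
const ALLOWED_ORIGINS = new Set([
  'chrome-extension://your-extension-id', // placeholder extension id
  'https://leetcode.com',
]);

function isOriginAllowed(originHeader: string | undefined): boolean {
  if (!originHeader) return false; // reject connections with no Origin header
  // Normalize a trailing slash so "https://leetcode.com/" still matches.
  return ALLOWED_ORIGINS.has(originHeader.replace(/\/$/, ''));
}
```

With `ws`, a check like this would typically run in the server's connection (or `verifyClient`) handler, closing the socket before any audio is relayed.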
## Appendix: File Map

- `package.json` — scripts, deps
- `public/manifest.json` — MV3 definition
- `src/background/background.ts` — service worker / orchestrator
- `src/content/standalone-react.tsx` or `src/content/content.ts` — injected UI
- `src/interview/components/InterviewApp.tsx` — interview UX
- `src/shared/voice-service.ts` — Traditional voice
- `src/shared/realtime-voice-service.ts` — Realtime voice
- `src/shared/chatgpt-service.ts` — Chat + usage tracking
- `backend/server.js` — WS proxy to OpenAI Realtime
## Changelog

- 2025‑09‑17: First MDX draft with embedded `handleEmbeddedMessage` snippet.