AI Error Diagnosis and Log Analysis for Frontend Applications
Real-World Problem Context
A production React application serves 500K daily users. The error monitoring dashboard (Sentry) shows 2,300 unresolved errors, 800 of which appeared in the last week after a deploy. The frontend team of eight developers rotates on-call duty, spending 3-4 hours per shift triaging errors: reading stack traces, reproducing issues, identifying root causes, and determining if an error is a new regression or a known issue resurfacing. Most errors lack context — a TypeError: Cannot read properties of undefined (reading 'map') stack trace tells you WHERE it crashed but not WHY the data was undefined. Was it a race condition? An API returning a different shape? A null user session? The team integrates AI at three points: (1) automatic error grouping and deduplication using embeddings instead of stack trace fingerprinting, (2) root cause analysis that examines the error context (browser, user actions, API responses, component tree) and generates a diagnosis, and (3) fix suggestion that proposes code changes with confidence scores. This post covers how each pipeline works.
Problem Statements
- Intelligent Error Grouping: How do you group errors semantically (same root cause) rather than syntactically (same stack trace)? How does embedding-based grouping catch variants of the same bug that traditional fingerprinting misses?
- Automated Root Cause Analysis: How does an LLM analyze an error's full context (stack trace, breadcrumbs, network requests, user actions, component props) to determine the root cause? How do you prevent hallucinated diagnoses?
- Fix Suggestion Pipeline: How do you map a diagnosed error to a specific code change? How does the AI find the relevant source code, generate a patch, and estimate confidence, and what validation prevents bad fixes?
Deep Dive: Internal Mechanisms
1. Traditional Error Fingerprinting vs Semantic Grouping
/*
* Traditional grouping (Sentry default): stack trace fingerprint
*
* How it works:
* 1. Take the top N frames of the stack trace
* 2. Normalize: remove line numbers, file hashes, query strings
* 3. Hash the normalized stack → fingerprint
* 4. Errors with same fingerprint → same group
*
* Problem:
* The SAME bug in different components produces DIFFERENT fingerprints:
*
* Error A:
*   TypeError: Cannot read properties of null (reading 'map')
*     at UserList (UserList.tsx:42)
*     at renderWithHooks (react.js)
*
* Error B:
*   TypeError: Cannot read properties of null (reading 'map')
*     at ProductList (ProductList.tsx:28)
*     at renderWithHooks (react.js)
*
* Same root cause (API returning null instead of an array)
* but different stack traces → different groups → triaged twice.
*
* Semantic grouping with embeddings:
*
* ┌──────────────────────────────────────────────────┐
* │ Error event                                      │
* │ ┌───────────────────────┐                        │
* │ │ 1. Embed error        │ Convert to vector      │
* │ │    (message + stack   │ using a text embedding │
* │ │    + context)         │ model                  │
* │ └────────┬──────────────┘                        │
* │          ▼                                       │
* │ ┌───────────────────────┐                        │
* │ │ 2. Similarity search  │ Find nearest existing  │
* │ │    (cosine similarity)│ error cluster          │
* │ └────────┬──────────────┘                        │
* │          ▼                                       │
* │ ┌───────────────────────┐                        │
* │ │ 3. Cluster or create  │ If similarity ≥ 0.85   │
* │ │    new group          │ → add to that cluster  │
* │ └───────────────────────┘ Otherwise → new group  │
* └──────────────────────────────────────────────────┘
*/
// Embedding-based error grouping:
async function groupError(newError, existingGroups) {
// 1. Create a rich text representation of the error:
const errorText = formatErrorForEmbedding(newError);
// 2. Generate embedding:
const embedding = await getEmbedding(errorText);
// 3. Find nearest existing group:
let bestMatch = null;
let bestSimilarity = 0;
for (const group of existingGroups) {
const similarity = cosineSimilarity(embedding, group.centroidEmbedding);
if (similarity > bestSimilarity) {
bestSimilarity = similarity;
bestMatch = group;
}
}
// 4. Threshold decision:
const SIMILARITY_THRESHOLD = 0.85;
if (bestMatch && bestSimilarity >= SIMILARITY_THRESHOLD) {
// Add to existing group:
bestMatch.errors.push(newError);
bestMatch.centroidEmbedding = updateCentroid(bestMatch, embedding);
return { group: bestMatch, isNew: false };
} else {
// Create new group:
const newGroup = {
id: generateId(),
errors: [newError],
centroidEmbedding: embedding,
createdAt: Date.now(),
};
existingGroups.push(newGroup);
return { group: newGroup, isNew: true };
}
}
function formatErrorForEmbedding(error) {
// Combine multiple signals for richer embedding:
return [
`Error: ${error.type}: ${error.message}`,
`Component: ${error.componentStack?.[0] || 'unknown'}`,
`Action: ${error.breadcrumbs?.slice(-3).map(b => b.message ?? JSON.stringify(b.data)).join(' → ') || 'unknown'}`,
`API: ${error.networkContext?.url || 'none'} ${error.networkContext?.status || ''}`,
`Stack: ${normalizeStack(error.stack).slice(0, 3).join(' → ')}`,
].join('\n');
}
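`groupError` relies on helpers the article leaves undefined. Below is a minimal sketch of `cosineSimilarity` and `updateCentroid`, plus the traditional top-N fingerprint described above for comparison. The frame shape (`{ functionName, fileName }`), the content-hash regex, and the 32-bit FNV-1a hash are illustrative assumptions, not Sentry's actual grouping algorithm. `updateCentroid` keeps the `(group, embedding)` call signature from `groupError` and computes a running mean, which works because the new error is pushed before the centroid is updated.

```javascript
// Hypothetical helpers; frame shape and normalization rules are illustrative.

// --- Traditional fingerprinting: normalize the top N frames, then hash ---
function normalizeFrame({ functionName, fileName }) {
  const file = fileName
    .split('?')[0]                          // drop query strings (?v=123)
    .replace(/\.[0-9a-f]{4,}\.js$/, '.js'); // drop content hashes: main.a3f2.js -> main.js
  return `${functionName}@${file}`;         // line/column numbers are deliberately omitted
}

// 32-bit FNV-1a over UTF-16 code units: a small non-cryptographic hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

function fingerprint(frames, topN = 3) {
  return fnv1a(frames.slice(0, topN).map(normalizeFrame).join('\n'));
}

// --- Semantic grouping primitives ---
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Running mean: group.errors.length already counts the newly pushed error,
// so centroid_n = centroid_(n-1) + (x - centroid_(n-1)) / n.
function updateCentroid(group, embedding) {
  const n = group.errors.length;
  return group.centroidEmbedding.map((c, i) => c + (embedding[i] - c) / n);
}
```

With the frames from Error A and Error B, `fingerprint` returns different hashes even though the bug is the same; that gap is what the embedding path targets. The linear scan over `existingGroups` is fine for a sketch, but a large error history would call for a vector index.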
2. Error Context Collection
/*
* Rich context is what makes AI diagnosis useful.
* Collect everything available at error time:
*
* ┌──────────────────────────────────────────────────────┐
* │ Error Context Object:                                │
* │                                                      │
* │ {                                                    │
* │   // Error itself:                                   │
* │   type: "TypeError",                                 │
* │   message: "Cannot read properties of null           │
* │             (reading 'map')",                        │
* │   stack: "at UserList.tsx:42...",                    │
* │                                                      │
* │   // React component tree:                           │
* │   componentStack: ["UserList", "Dashboard",          │
* │                    "AppLayout", "App"],              │
* │   componentProps: { userId: "123", filter: "all" },  │
* │                                                      │
* │   // User actions leading to error:                  │
* │   breadcrumbs: [                                     │
* │     { type: "navigation", data: "/dashboard" },      │
* │     { type: "click", data: "Filter: Active" },       │
* │     { type: "xhr", data: "GET /api/users/123" },     │
* │     { type: "error", data: "TypeError: ..." }        │
* │   ],                                                 │
* │                                                      │
* │   // Network context:                                │
* │   recentRequests: [                                  │
* │     { url: "/api/users/123", status: 200,            │
* │       responseSnippet: '{"user":{"posts":null}}' }   │
* │   ],                                                 │
* │                                                      │
* │   // Environment:                                    │
* │   browser: "Chrome 120", os: "macOS 14",             │
* │   viewport: "1440x900",                              │
* │   url: "/dashboard?filter=active"                    │
* │ }                                                    │
* └──────────────────────────────────────────────────────┘
*/
// Error boundary that collects rich context:
class DiagnosticErrorBoundary extends React.Component {
  constructor(props) {
    super(props);
    this.state = { hasError: false, error: null };
  }
  static getDerivedStateFromError(error) {
    return { hasError: true, error };
  }
  componentDidCatch(error, info) {
    const context = this.collectContext(error, info);
    reportErrorWithContext(error, context);
  }
  render() {
    // A boundary must render its children normally and a fallback after an error.
    // `fallback` is an illustrative prop name, not a React built-in.
    if (this.state.hasError) return this.props.fallback ?? null;
    return this.props.children;
  }
  collectContext(error, reactInfo) {
    return {
      // Error (any value can be thrown, not only Error instances):
      type: error?.constructor?.name ?? typeof error,
      message: error?.message ?? String(error),
      stack: error?.stack,
      // React component stack:
      componentStack: parseComponentStack(reactInfo.componentStack),
      // Props passed to THIS boundary. React does not expose the props of the
      // child that threw, so place boundaries close to the components of interest:
      componentProps: this.sanitizeProps(this.props),
      // User action breadcrumbs (from global tracker):
      breadcrumbs: window.__breadcrumbs?.slice(-20) || [],
      // Recent network requests:
      recentRequests: window.__networkLog?.slice(-10).map(req => ({
        url: req.url,
        method: req.method,
        status: req.status,
        duration: req.duration,
        // Truncate response to avoid sending large payloads:
        responseSnippet: truncateJSON(req.response, 500),
      })) || [],
      // State snapshot (if using a store; getRelevantState is app-specific):
      stateSnapshot: this.getRelevantState(),
      // Environment:
      browser: navigator.userAgent,
      url: window.location.href,
      viewport: `${window.innerWidth}x${window.innerHeight}`,
      timestamp: Date.now(),
      sessionDuration: window.__sessionStart ? Date.now() - window.__sessionStart : null,
    };
  }
  sanitizeProps(props) {
    // Remove React elements, functions, DOM refs, and sensitive data:
    const safe = {};
    for (const [key, value] of Object.entries(props)) {
      if (key === 'children' || key === 'fallback') continue; // elements can be circular
      if (typeof value === 'function') continue;
      if (value?.current instanceof HTMLElement) continue;
      if (/password|token|secret|key/i.test(key)) {
        safe[key] = '<redacted>';
        continue;
      }
      if (value !== null && typeof value === 'object') {
        try {
          safe[key] = JSON.stringify(value).slice(0, 200);
        } catch {
          safe[key] = '<unserializable>'; // e.g., circular references
        }
      } else {
        safe[key] = value;
      }
    }
    return safe;
  }
}
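`collectContext` calls `parseComponentStack` and `truncateJSON`, which are not defined above. The sketch below is one hypothetical implementation of each. It assumes React's `componentStack` string has one frame per line, prefixed with `in` (older React versions) or `at` (newer versions), and it drops lowercase host elements such as `div` so only components remain.

```javascript
// Hypothetical helper: "\n    at UserList (...)\n    at div\n    at Dashboard"
// -> ["UserList", "Dashboard"]. Handles both "in Name" and "at Name" lines.
function parseComponentStack(componentStack = '') {
  return componentStack
    .split('\n')
    .map(line => line.match(/^\s*(?:in|at)\s+([\w$.]+)/))
    .filter(Boolean)
    .map(match => match[1])
    .filter(name => /^[A-Z]/.test(name)); // keep components, drop host elements like "div"
}

// Hypothetical helper: serialize and cap the size of a response body.
// Note that a truncated snippet is usually no longer valid JSON.
function truncateJSON(value, maxChars) {
  let text;
  try {
    text = typeof value === 'string' ? value : JSON.stringify(value);
  } catch {
    return '<unserializable>'; // circular structures, BigInt values, etc.
  }
  if (text === undefined) return ''; // JSON.stringify(undefined) returns undefined
  if (text.length <= maxChars) return text;
  return `${text.slice(0, maxChars)}...[+${text.length - maxChars} chars]`;
}
```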
3. LLM-Powered Root Cause Analysis
/*
* The diagnosis prompt is the most critical part.
* It must be structured to reduce (not eliminate) hallucination
* and produce actionable output.
*/
async function diagnoseError(errorContext) {
const prompt = `You are a senior frontend engineer diagnosing a production error.
RULES:
- Base your diagnosis ONLY on the provided evidence
- If you're unsure, say "uncertain" — don't guess
- Rank possible causes by likelihood
- Reference specific evidence for each hypothesis
ERROR:
Type: ${errorContext.type}
Message: ${errorContext.message}
Component: ${errorContext.componentStack[0]}
Component tree: ${errorContext.componentStack.join(' → ')}
Props: ${JSON.stringify(errorContext.componentProps)}
STACK TRACE:
${errorContext.stack}
USER ACTIONS (chronological):
${errorContext.breadcrumbs.map((b, i) =>
`${i + 1}. [${b.type}] ${b.message ?? JSON.stringify(b.data)}`
).join('\n')}
RECENT API CALLS:
${errorContext.recentRequests.map(r =>
`${r.method} ${r.url} → ${r.status} (${r.duration}ms)\n Response: ${r.responseSnippet}`
).join('\n\n')}
ENVIRONMENT:
Browser: ${errorContext.browser}
URL: ${errorContext.url}
Viewport: ${errorContext.viewport}
DIAGNOSE:
1. What is the root cause? (one sentence)
2. What evidence supports this? (reference specific data above)
3. What are alternative explanations? (if any)
4. What is the user impact? (who is affected and how)
5. What is the fix priority? (P0-critical / P1-high / P2-medium / P3-low)
6. What additional data would help confirm the diagnosis?`;
const diagnosis = await callLLM(prompt, {
model: 'gpt-4o',
temperature: 0.1, // Low temperature for analytical tasks
maxTokens: 800,
});
return parseDiagnosis(diagnosis);
}
// Example diagnosis output:
/*
* 1. ROOT CAUSE: The /api/users/123 endpoint returns `posts: null`
* instead of `posts: []`, and UserList.tsx:42 calls
* `user.posts.map()` without null checking.
*
* 2. EVIDENCE:
* - API response shows `"posts":null` (not an empty array)
* - Error is TypeError on .map() — null doesn't have .map
* - Component props show userId: "123" — specific user
* - Breadcrumbs show user navigated to /dashboard, clicked filter
*
* 3. ALTERNATIVES:
* - Race condition: component renders before API completes (unlikely —
* API returned 200 before error)
* - Stale cache: previous API response had posts, new one doesn't
*
* 4. USER IMPACT: Users with no posts see a crash instead of empty state.
* Affects ~15% of users (new accounts with no posts).
*
* 5. PRIORITY: P1-high — crashes visible page for significant user segment
*
* 6. ADDITIONAL DATA: Check if API behavior changed in recent deploy.
* Check if `posts: null` is intentional API contract or a bug.
*/
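The RULES block asks the model to cite evidence, but nothing in `diagnoseError` enforces it. One cheap guard is to verify that each cited piece of evidence actually appears in the context that was sent. The sketch below assumes the prompt is changed to request JSON in which every evidence item carries a verbatim `quote` copied from the prompt; that JSON shape and `checkEvidenceGrounding` are hypothetical, not part of the article's `parseDiagnosis`.

```javascript
// Hypothetical grounding check. Assumes the model returns JSON like
// { rootCause, evidence: [{ claim, quote }], alternatives }, where each
// `quote` is supposed to be copied verbatim from the prompt text.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

function checkEvidenceGrounding(diagnosis, promptText) {
  const haystack = normalizeWhitespace(promptText);
  const grounded = [];
  const ungrounded = [];
  for (const item of diagnosis.evidence ?? []) {
    const quote = normalizeWhitespace(String(item.quote ?? ''));
    // An empty quote cites nothing, so it counts as ungrounded.
    if (quote.length > 0 && haystack.includes(quote)) {
      grounded.push(item);
    } else {
      ungrounded.push(item);
    }
  }
  return {
    grounded,
    ungrounded,
    // Strict rule: any quote not found in the prompt marks the diagnosis unverified.
    trustworthy: grounded.length > 0 && ungrounded.length === 0,
  };
}
```

A diagnosis that fails this check can be shown as "uncertain" rather than discarded, since a paraphrased quote is not necessarily a wrong diagnosis.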
4. Fix Suggestion Pipeline
/*
* From diagnosis → code fix requires:
* 1. Finding the relevant source file
* 2. Understanding the surrounding code
* 3. Generating a minimal fix
* 4. Validating the fix doesn't break anything
*
* ┌────────────────────────────────────────────────────┐
* │ Diagnosis: "user.posts is null, needs null check"  │
* │                │                                   │
* │                ▼                                   │
* │ 1. Locate: UserList.tsx:42 (from stack trace)      │
* │                │                                   │
* │                ▼                                   │
* │ 2. Read: surrounding code (±20 lines)              │
* │                │                                   │
* │                ▼                                   │
* │ 3. Generate: minimal fix                           │
* │    - const posts = user.posts ?? [];               │
* │                │                                   │
* │                ▼                                   │
* │ 4. Validate:                                       │
* │    - TypeScript compiles                           │
* │    - Existing tests pass                           │
* │    - Fix addresses the null case                   │
* │                │                                   │
* │                ▼                                   │
* │ 5. Output: patch + confidence score                │
* └────────────────────────────────────────────────────┘
*/
async function suggestFix(diagnosis, errorContext, sourceCode) {
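// 1. Extract file and line from the (source-mapped) stack trace:
const location = parseStackLocation(errorContext.stack);
// { file: 'src/components/UserList.tsx', line: 42 }
if (!location || !sourceCode[location.file]) {
return null; // Can't locate the source: report the diagnosis without a patch
}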
// 2. Read source code around the error:
const fileContent = sourceCode[location.file];
const relevantCode = extractLines(fileContent,
Math.max(1, location.line - 20),
location.line + 20
);
// 3. Generate fix:
const prompt = `You are fixing a production frontend bug.
DIAGNOSIS:
${diagnosis.rootCause}
SOURCE CODE (${location.file}, error at line ${location.line}):
\`\`\`typescript
${relevantCode}
\`\`\`
RULES:
- Make the MINIMAL change needed to fix the bug
- Don't refactor surrounding code
- Don't change function signatures
- Preserve existing behavior for non-error cases
- Add a brief inline comment explaining the fix
- If the fix requires changes in multiple files, list them all
Generate a unified diff patch:`;
const fixSuggestion = await callLLM(prompt, {
model: 'gpt-4o',
temperature: 0.1,
maxTokens: 500,
});
// 4. Parse and validate:
const patch = parsePatch(fixSuggestion);
const validation = await validatePatch(patch, sourceCode);
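// 5. Only surface patches that compile and pass tests:
if (!validation.typescriptCompiles || !validation.testsPass) {
return { patch: null, confidence: 0, validation, diagnosis: diagnosis.rootCause };
}
// 6. Confidence scoring:
const confidence = calculateConfidence(diagnosis, patch, validation);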
return {
patch,
confidence,
validation,
diagnosis: diagnosis.rootCause,
};
}
function calculateConfidence(diagnosis, patch, validation) {
let score = 0.5; // Base confidence
// Positive signals:
if (validation.typescriptCompiles) score += 0.15;
if (validation.testsPass) score += 0.15;
if (patch.linesChanged <= 5) score += 0.1; // Small change = focused fix
if (diagnosis.evidence.length >= 3) score += 0.1; // Strong evidence
// Negative signals:
if (patch.linesChanged > 20) score -= 0.2; // Large change = risky
if (validation.newWarnings > 0) score -= 0.1;
if (diagnosis.alternatives.length > 2) score -= 0.1; // Many alternatives = uncertain
return Math.max(0, Math.min(1, score));
}
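`suggestFix` depends on `parseStackLocation` and `extractLines`, which the article does not define. The sketch below assumes V8-style stack lines (`at fn (file:line:col)` or `at file:line:col`), as produced by Chrome and Node, after source-map resolution. This hypothetical `extractLines` also prefixes each line with its number; the intent is to give the model real line numbers for the hunk headers of the diff it generates, which is a design choice rather than something the article specifies.

```javascript
// Hypothetical: return the first "file:line:col" location in a V8-style stack.
function parseStackLocation(stack) {
  for (const line of stack.split('\n')) {
    // Matches "    at fn (path:42:15)" and "    at path:42:15"; skips the message line.
    const match = line.match(/^\s*at\s+(?:.*?\()?(.+?):(\d+):(\d+)\)?\s*$/);
    if (match) {
      return { file: match[1], line: Number(match[2]), column: Number(match[3]) };
    }
  }
  return null; // No parseable frame (e.g., non-V8 format)
}

// Hypothetical: 1-based inclusive range, clamped to the file, with line numbers.
function extractLines(content, start, end) {
  const lines = content.split('\n');
  const from = Math.max(1, start);
  const to = Math.min(lines.length, end);
  const width = String(to).length;
  const out = [];
  for (let n = from; n <= to; n++) {
    out.push(`${String(n).padStart(width)} | ${lines[n - 1]}`);
  }
  return out.join('\n');
}
```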
5. Pattern Recognition Across Error Groups
/*
* AI can identify patterns across multiple error groups
* that humans miss when triaging one error at a time.
*
* Examples:
* - "5 different TypeError groups all involve API responses
* from the /api/v2 endpoints → API migration broke contracts"
* - "All ChunkLoadError instances happen on Safari 16 after
* 10+ minutes → stale ServiceWorker cache issue"
* - "ResizeObserver loop errors spike at 2 PM UTC daily
* → related to a scheduled analytics widget"
*/
async function analyzeErrorPatterns(errorGroups, timeRange) {
// Collect metadata across all groups:
const groupSummaries = errorGroups.map(group => ({
id: group.id,
message: group.errors[0].message,
type: group.errors[0].type,
count: group.errors.length,
affectedUsers: new Set(group.errors.map(e => e.userId)).size,
browsers: countBy(group.errors, e => parseBrowser(e.browser)),
urls: countBy(group.errors, e => e.url),
timeDistribution: buildTimeHistogram(group.errors),
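// reduce() instead of Math.min(...spread): large groups can exceed engine argument limits
firstSeen: group.errors.reduce((min, e) => Math.min(min, e.timestamp), Infinity),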
apiEndpoints: [...new Set(group.errors.flatMap(e =>
e.recentRequests?.map(r => r.url) || []
))],
}));
const prompt = `You are analyzing error patterns in a production frontend app.
PERIOD: ${timeRange}
TOTAL ERROR GROUPS: ${groupSummaries.length}
GROUP SUMMARIES:
${groupSummaries.map((g, i) => `
Group ${i + 1}: ${g.type}: ${g.message}
Count: ${g.count} | Users affected: ${g.affectedUsers}
Browsers: ${JSON.stringify(g.browsers)}
URLs: ${JSON.stringify(g.urls)}
Time pattern: ${describeTimePattern(g.timeDistribution)}
First seen: ${new Date(g.firstSeen).toISOString()}
Related APIs: ${g.apiEndpoints.join(', ') || 'none'}
`).join('\n')}
IDENTIFY:
1. Groups that likely share a ROOT CAUSE (explain the connection)
2. Environment-specific patterns (browser, device, URL)
3. Time-based patterns (deploy-correlated, time-of-day, gradual increase)
4. Recommended investigation order (highest impact first)
5. Any patterns suggesting a SYSTEMIC issue (not just individual bugs)`;
const analysis = await callLLM(prompt, {
model: 'gpt-4o',
temperature: 0.2,
maxTokens: 1500,
});
return parsePatternAnalysis(analysis);
}
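`analyzeErrorPatterns` also leans on `countBy`, `buildTimeHistogram`, and `describeTimePattern`. A hypothetical sketch follows; bucketing by UTC hour of day is an assumption that surfaces time-of-day spikes like the "2 PM UTC" example but not day-of-week effects, and the 25% cutoff is illustrative.

```javascript
// Hypothetical helpers for analyzeErrorPatterns.
function countBy(items, keyFn) {
  const counts = {};
  for (const item of items) {
    const key = keyFn(item);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

function buildTimeHistogram(errors) {
  const buckets = new Array(24).fill(0); // index = UTC hour of day
  for (const e of errors) {
    buckets[new Date(e.timestamp).getUTCHours()] += 1;
  }
  return buckets;
}

function describeTimePattern(buckets) {
  const total = buckets.reduce((sum, n) => sum + n, 0);
  if (total === 0) return 'no events';
  const peakCount = Math.max(...buckets);
  const peakHour = buckets.indexOf(peakCount);
  const share = peakCount / total;
  // Evenly spread events put ~4% (1/24) in each hour; 25%+ in one hour is a clear spike.
  const label = share >= 0.25 ? 'concentrated' : 'spread out';
  return `${label}: peak at ${String(peakHour).padStart(2, '0')}:00 UTC (${Math.round(share * 100)}% of events)`;
}
```

Summarizing the histogram into one sentence keeps the prompt short; sending 24 raw counts per group would also work but costs tokens across many groups.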
6. Source Map Integration for Production Errors
/*
* Production errors have MINIFIED stack traces.
* Source maps must be resolved BEFORE AI analysis.
*
* Pipeline:
* 1. Error arrives with minified stack: "main.a3f2.js:1:45632"
* 2. Fetch source map for that bundle
* 3. Resolve to original source: "UserList.tsx:42:15"
* 4. Fetch the actual source code around that line
* 5. Send resolved stack + source to AI
*/
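const { SourceMapConsumer } = require('source-map'); // v0.7+: async consumer API
async function resolveProductionError(error) {
  const resolvedFrames = [];
  const consumers = new Map(); // bundle URL -> consumer (or null): one fetch per bundle
  try {
    for (const frame of parseStackFrames(error.stack)) {
      if (!frame.fileName || !/\.js(\?|$)/.test(frame.fileName)) {
        resolvedFrames.push(frame); // Not a bundle frame (or already resolved)
        continue;
      }
      if (!consumers.has(frame.fileName)) {
        // Assumes the map is served next to the bundle; the authoritative
        // location is the bundle's sourceMappingURL comment.
        const sourceMap = await fetchSourceMap(`${frame.fileName}.map`);
        consumers.set(frame.fileName,
          sourceMap ? await new SourceMapConsumer(sourceMap) : null);
      }
      const consumer = consumers.get(frame.fileName);
      if (!consumer) {
        resolvedFrames.push(frame); // No source map available
        continue;
      }
      const original = consumer.originalPositionFor({
        line: frame.lineNumber,         // 1-based in both stack traces and source-map
        column: frame.columnNumber - 1, // source-map columns are 0-based; V8's are 1-based
      });
      if (!original.source) {
        resolvedFrames.push(frame); // Position not covered by the map: keep the minified frame
        continue;
      }
      const resolved = {
        fileName: original.source,
        lineNumber: original.line,
        columnNumber: original.column + 1, // back to 1-based for display
        functionName: original.name || frame.functionName,
      };
      // Attach source from the map's sourcesContent, if the build embedded it
      // (second argument: return null instead of throwing when it is missing):
      const sourceContent = consumer.sourceContentFor(original.source, true);
      if (sourceContent) {
        resolved.sourceCode = extractLines(sourceContent,
          Math.max(1, original.line - 10),
          original.line + 10
        );
      }
      resolvedFrames.push(resolved);
    }
  } finally {
    // v0.7 consumers hold WebAssembly memory and must be freed explicitly:
    for (const consumer of consumers.values()) consumer?.destroy();
  }
  return {
    ...error,
    resolvedStack: resolvedFrames,
    sourceSnippets: resolvedFrames
      .filter(f => f.sourceCode)
      .map(f => ({
        file: f.fileName,
        line: f.lineNumber,
        code: f.sourceCode,
      })),
  };
}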
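The pipeline above assumes each map lives at `<bundle>.map`. Per the Source Map specification, a generated file instead points at its map with a trailing `//# sourceMappingURL=<url>` comment (or a `SourceMap` HTTP response header), and a relative URL resolves against the bundle's own URL. A hypothetical `findSourceMapUrl` for the comment form:

```javascript
// Hypothetical helper: locate a bundle's source map from its sourceMappingURL
// comment. "//#" is the current prefix; "//@" is the deprecated older one.
function findSourceMapUrl(bundleText, bundleUrl) {
  const matches = [...bundleText.matchAll(/\/\/[#@]\s*sourceMappingURL=(\S+)\s*$/gm)];
  if (matches.length === 0) return null;
  // Bundlers append the comment at the end, so take the last match.
  const reference = matches[matches.length - 1][1];
  // Resolve relative references against the bundle URL; data: URLs pass through.
  return new URL(reference, bundleUrl).href;
}
```

Inline maps (`data:` URLs) still need decoding before being handed to `SourceMapConsumer`; this helper only locates them.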
7. Breadcrumb Analysis for User Action Context
/*
* Breadcrumbs tell the story of WHAT THE USER DID
* before the error occurred. AI can reconstruct
* the user journey and identify the trigger action.
*/
// Breadcrumb collector (runs in the browser):
class BreadcrumbCollector {
constructor(maxBreadcrumbs = 50) {
this.breadcrumbs = [];
this.max = maxBreadcrumbs;
this.setupListeners();
}
setupListeners() {
// Navigation (popstate fires only on back/forward; SPA routers use history.pushState, which needs wrapping too):
window.addEventListener('popstate', () => {
this.add('navigation', { url: window.location.href });
});
// Clicks:
document.addEventListener('click', (e) => {
const target = e.target.closest('[data-testid], button, a, [role="button"]');
if (target) {
this.add('click', {
element: describeElement(target),
text: target.textContent?.slice(0, 50),
testId: target.dataset?.testid,
});
}
}, true);
// Network requests:
this.interceptFetch();
this.interceptXHR();
// Console errors:
const origError = console.error;
console.error = (...args) => {
this.add('console.error', {
message: args.map(a => String(a)).join(' ').slice(0, 200)
});
origError.apply(console, args);
};
// State changes (for React Query / Zustand):
this.interceptStateChanges();
}
interceptFetch() {
const originalFetch = window.fetch;
const collector = this;
window.fetch = async function(...args) {
const input = args[0]; // string, URL, or Request
const url = input instanceof Request ? input.url : String(input);
const method = args[1]?.method || (input instanceof Request ? input.method : 'GET');
const startTime = Date.now();
try {
const response = await originalFetch.apply(this, args);
collector.add('fetch', {
url: sanitizeUrl(url),
method,
status: response.status,
duration: Date.now() - startTime,
});
return response;
} catch (error) {
collector.add('fetch.error', {
url: sanitizeUrl(url),
method,
error: error.message,
duration: Date.now() - startTime,
});
throw error;
}
};
}
add(type, data) {
this.breadcrumbs.push({
type,
data,
timestamp: Date.now(),
});
if (this.breadcrumbs.length > this.max) {
this.breadcrumbs.shift();
}
}
getRecent(count = 20) {
return this.breadcrumbs.slice(-count);
}
}
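Because `popstate` does not fire when an SPA router calls `history.pushState` or `history.replaceState`, most client-side route changes would be missing from the breadcrumbs. A hypothetical `instrumentHistory` helper wraps both methods; it takes the history object as a parameter so it can be exercised outside a browser.

```javascript
// Hypothetical helper: report client-side navigations made via the History API.
function instrumentHistory(historyObj, onNavigate) {
  for (const method of ['pushState', 'replaceState']) {
    const original = historyObj[method];
    historyObj[method] = function (state, unused, url) {
      const result = original.apply(this, arguments); // preserve native behavior
      // url is optional in the History API; omitting it keeps the current URL.
      onNavigate({ method, url: url == null ? null : String(url) });
      return result;
    };
  }
}
```

In the browser this would be wired up as `instrumentHistory(window.history, nav => collector.add('navigation', nav))`, alongside the existing `popstate` listener.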
// AI analysis of breadcrumb sequence:
async function analyzeBreadcrumbs(breadcrumbs, error) {
const prompt = `Analyze this sequence of user actions that led to a frontend error.
USER ACTIONS (chronological, most recent last):
${breadcrumbs.map((b, i) => {
const time = new Date(b.timestamp).toISOString().split('T')[1].split('.')[0];
return `${time} [${b.type}] ${JSON.stringify(b.data)}`;
}).join('\n')}
ERROR THAT OCCURRED:
${error.type}: ${error.message}
DETERMINE:
1. Which action TRIGGERED the error? (the direct cause)
2. Which previous actions SET UP the conditions for the error?
3. Is this a user-reproducible path? (can we write reproduction steps?)
4. What state was the app likely in when the error occurred?
Be specific — reference the exact breadcrumb entries by timestamp.`;
return await callLLM(prompt, {
model: 'gpt-4o',
temperature: 0.1,
maxTokens: 500,
});
}
8. Automated Triage and Priority Assignment
/*
* AI triage replaces the manual process of reading each error
* and deciding: is this critical? who should fix it? is it new?
*
* Triage dimensions:
* 1. Severity: how bad is the user impact?
* 2. Scope: how many users are affected?
* 3. Novelty: is this new or a known issue?
* 4. Assignment: which team/person should fix it?
* 5. Actionability: can we fix it, or is it external?
*/
async function triageError(errorGroup, codeOwners, recentDeploys) {
// Quantitative signals (no AI needed):
const metrics = {
errorCount: errorGroup.errors.length,
uniqueUsers: new Set(errorGroup.errors.map(e => e.userId)).size,
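// errors/hour; the window is floored at 1 hour so a brand-new group with a
// handful of events isn't extrapolated into an inflated hourly rate:
errorRate: errorGroup.errors.length /
Math.max(Date.now() - errorGroup.firstSeen, 3600000) * 3600000,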
isIncreasing: detectTrend(errorGroup.errors) === 'increasing',
firstSeen: errorGroup.firstSeen,
deployCorrelated: checkDeployCorrelation(errorGroup.firstSeen, recentDeploys),
};
// Determine severity from metrics:
let severity;
if (metrics.errorRate > 100 || metrics.uniqueUsers > 1000) {
severity = 'P0-critical';
} else if (metrics.errorRate > 10 || metrics.uniqueUsers > 100) {
severity = 'P1-high';
} else if (metrics.isIncreasing) {
severity = 'P2-medium';
} else {
severity = 'P3-low';
}
// Assignment comes from CODEOWNERS (deterministic); only actionability uses the LLM:
const sampleError = errorGroup.errors[0];
const affectedFiles = extractFilesFromStack(sampleError.stack);
// Find code owner:
const owner = affectedFiles
.map(file => findCodeOwner(file, codeOwners))
.find(o => o !== null) || 'unassigned';
// AI: determine actionability
const actionability = await callLLM(`
Is this frontend error actionable by our team, or is it external/uncontrollable?
Error: ${sampleError.type}: ${sampleError.message}
Stack: ${sampleError.stack?.split('\n').slice(0, 5).join('\n')}
Browser: ${sampleError.browser}
Count: ${metrics.errorCount} in ${Math.round((Date.now() - metrics.firstSeen) / 3600000)} hours
Classify as:
- "actionable" — a bug in our code we can fix
- "external" — third-party script, browser bug, ad blocker, extension
- "environmental" — network error, device-specific, CSP violation
- "expected" — user abort, navigation away, ResizeObserver loop
One-word classification, then one-sentence explanation.`,
{ model: 'gpt-4o-mini', temperature: 0, maxTokens: 50 }
);
return {
severity,
owner,
actionability: parseActionability(actionability),
metrics,
deployCorrelated: metrics.deployCorrelated,
};
}
// Deploy correlation detection:
function checkDeployCorrelation(errorFirstSeen, recentDeploys) {
for (const deploy of recentDeploys) {
const timeDiff = errorFirstSeen - deploy.timestamp;
// Error appeared within 1 hour of deploy:
if (timeDiff >= 0 && timeDiff < 3600000) {
return {
correlated: true,
deploy: deploy.id,
sha: deploy.commitSha,
timeSinceDeploy: timeDiff,
};
}
}
return { correlated: false };
}
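`triageError` calls `detectTrend`, which is never defined. One simple interpretation, sketched below with hypothetical defaults (a 6-hour window, a 1.5× ratio, and a minimum difference of 5 events), compares event counts in the earlier and later halves of a recent window.

```javascript
// Hypothetical trend detector; window size and thresholds are illustrative, not tuned.
function detectTrend(errors, { now = Date.now(), windowMs = 6 * 3600000,
    ratio = 1.5, minDelta = 5 } = {}) {
  const start = now - windowMs;
  const mid = now - windowMs / 2;
  let earlier = 0;
  let later = 0;
  for (const e of errors) {
    if (e.timestamp < start || e.timestamp > now) continue; // outside the window
    if (e.timestamp < mid) earlier += 1;
    else later += 1;
  }
  // Require both a relative and an absolute change so 1 -> 2 isn't a "trend".
  if (later > earlier * ratio && later - earlier >= minDelta) return 'increasing';
  if (earlier > later * ratio && earlier - later >= minDelta) return 'decreasing';
  return 'stable';
}
```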
9. Reproduction Step Generation
/*
* From breadcrumbs and error context, AI can generate
* reproduction steps — saving 30-60 minutes of manual reproduction.
*/
async function generateReproductionSteps(errorContext, breadcrumbs, diagnosis) {
const prompt = `Generate reproduction steps for this frontend bug.
ERROR: ${errorContext.type}: ${errorContext.message}
DIAGNOSIS: ${diagnosis.rootCause}
URL: ${errorContext.url}
BROWSER: ${errorContext.browser}
VIEWPORT: ${errorContext.viewport}
USER ACTIONS BEFORE ERROR:
${breadcrumbs.map((b, i) => `${i + 1}. [${b.type}] ${JSON.stringify(b.data)}`).join('\n')}
Generate:
1. Preconditions (account type, data state, browser)
2. Step-by-step reproduction (user actions)
3. Expected result (what should happen)
4. Actual result (the error)
5. A Playwright test script that reproduces this
For the Playwright test, use realistic selectors (getByRole, getByText).
If you need API mocking, include the mock setup.`;
const steps = await callLLM(prompt, {
model: 'gpt-4o',
temperature: 0.2,
maxTokens: 1000,
});
return steps;
}
// Example output:
/*
* ## Preconditions
* - Logged in as a user with userId: "123" who has no posts
* - Browser: Chrome (any version)
*
* ## Steps to Reproduce
* 1. Navigate to /dashboard
* 2. Wait for the page to load (user data appears in header)
* 3. Click the "Active" filter button
* 4. Observe: page crashes with white screen
*
* ## Expected Result
* Dashboard shows an empty state: "No active posts yet"
*
* ## Actual Result
* TypeError: Cannot read properties of null (reading 'map')
* Page becomes unresponsive
*
* ## Playwright Test
* ```typescript
* import { test, expect } from '@playwright/test';
*
* test('dashboard handles user with no posts', async ({ page }) => {
*   // Mock API to return user with null posts (a RegExp matches any origin):
*   await page.route(/\/api\/users\/123$/, (route) =>
*     route.fulfill({
*       status: 200,
*       contentType: 'application/json',
*       body: JSON.stringify({ user: { id: '123', posts: null } }),
*     })
*   );
*
*   // Relative URL: assumes use.baseURL is set in the Playwright config
*   await page.goto('/dashboard');
*   await page.getByRole('button', { name: 'Active' }).click();
*
*   // Should show empty state, not crash:
*   await expect(page.getByText('No active posts yet')).toBeVisible();
* });
* ```
*/
10. Continuous Learning from Resolved Errors
/*
* Each resolved error teaches the system.
* Track: diagnosis accuracy, fix success, resolution time.
*
* Feedback loop:
* 1. AI diagnoses error → developer reviews
* 2. Developer marks diagnosis as correct/incorrect
* 3. Developer applies fix (AI-suggested or manual)
* 4. System tracks if the fix actually resolved the error
* 5. Feedback improves future diagnoses
*/
class DiagnosisTracker {
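async recordDiagnosis(errorGroupId, diagnosis, fixSuggestion, errorMessage) {
await db.diagnoses.create({
errorGroupId,
errorMessage, // read later by triggerPromptReview
diagnosis: diagnosis.rootCause,
confidence: fixSuggestion?.confidence ?? null, // suggestFix may produce no patch
suggestedFix: fixSuggestion?.patch ?? null,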
timestamp: Date.now(),
status: 'pending_review',
});
}
async recordFeedback(diagnosisId, feedback) {
await db.diagnoses.update(diagnosisId, {
developerFeedback: feedback.correct ? 'correct' : 'incorrect',
actualRootCause: feedback.actualCause || null,
actualFix: feedback.actualFix || null,
fixApplied: feedback.fixApplied,
resolvedAt: feedback.fixApplied ? Date.now() : null,
});
// Track accuracy metrics:
await this.updateAccuracyMetrics();
}
async updateAccuracyMetrics() {
const recent = await db.diagnoses.find({
timestamp: { $gt: Date.now() - 30 * 24 * 3600000 },
developerFeedback: { $exists: true },
});
const metrics = {
total: recent.length,
correct: recent.filter(d => d.developerFeedback === 'correct').length,
incorrect: recent.filter(d => d.developerFeedback === 'incorrect').length,
fixApplied: recent.filter(d => d.fixApplied).length,
};
metrics.accuracy = metrics.correct / Math.max(metrics.total, 1);
metrics.fixAcceptanceRate = metrics.fixApplied / Math.max(metrics.total, 1);
console.log(`Diagnosis accuracy: ${(metrics.accuracy * 100).toFixed(1)}%`);
console.log(`Fix acceptance: ${(metrics.fixAcceptanceRate * 100).toFixed(1)}%`);
// If accuracy drops, adjust prompts or model:
if (metrics.accuracy < 0.7 && metrics.total > 20) {
await this.triggerPromptReview(recent.filter(d =>
d.developerFeedback === 'incorrect'
));
}
}
// Use incorrect diagnoses to improve the system prompt:
async triggerPromptReview(incorrectDiagnoses) {
const patterns = await callLLM(`
Analyze these cases where the AI diagnosis was INCORRECT.
Identify common patterns in the misdiagnoses.
Cases:
${incorrectDiagnoses.map(d => `
AI said: ${d.diagnosis}
Actual cause: ${d.actualRootCause}
Error: ${d.errorMessage}
`).join('\n---\n')}
What types of errors does the AI consistently misdiagnose?
What additional context would have helped?
Suggest prompt improvements.`,
{ model: 'gpt-4o', temperature: 0.3, maxTokens: 500 }
);
// Store improvement suggestions for prompt engineering review
return patterns;
}
}
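`updateAccuracyMetrics` mixes database access with the arithmetic that decides when to trigger a prompt review. Pulling that arithmetic into a pure function makes the thresholds unit-testable. The hypothetical `computeAccuracyMetrics` below reuses the article's 30-day window, 20-review minimum, and 70% accuracy threshold as defaults.

```javascript
// Hypothetical pure version of the accuracy check in updateAccuracyMetrics.
function computeAccuracyMetrics(records, { now = Date.now(), windowDays = 30,
    minSample = 20, minAccuracy = 0.7 } = {}) {
  const cutoff = now - windowDays * 24 * 3600000;
  // Only reviewed diagnoses inside the window count toward accuracy.
  const reviewed = records.filter(r => r.timestamp > cutoff && r.developerFeedback);
  const total = reviewed.length;
  const correct = reviewed.filter(r => r.developerFeedback === 'correct').length;
  const accuracy = correct / Math.max(total, 1);
  return {
    total,
    correct,
    incorrect: total - correct, // feedback is recorded as 'correct' or 'incorrect'
    accuracy,
    needsPromptReview: total > minSample && accuracy < minAccuracy,
  };
}
```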
Trade-offs & Considerations
| Aspect | Manual Triage | Rule-Based | AI-Assisted | Full AI Automation |
|---|---|---|---|---|
| Triage time/error | 15-30 min | 1-2 min | 1-2 min + review | < 30 sec |
| Diagnosis accuracy | High (human) | Low (pattern match) | 70-85% | 70-85% |
| Fix suggestion | N/A | N/A | Available (needs review) | Auto-PR (risky) |
| Novel errors | Handled well | Misses them | Handles most | Handles most |
| Cost | Engineer time | Free | ~$0.05/error | ~$0.05/error |
| False confidence | Low | None | Medium risk | High risk |
| Setup complexity | None | Medium | High | Very high |
Best Practices
- Collect rich error context at capture time — stack traces alone are insufficient for AI diagnosis. Instrument error boundaries to capture breadcrumbs (user actions), recent network requests (with truncated responses), component props (sanitized), and environment details. Diagnosis quality depends heavily on context quality: a stack trace tells you WHERE, context tells you WHY.
- Resolve source maps before sending errors to AI — minified stack traces produce garbage diagnoses. Production JavaScript is minified, so stack traces reference locations like main.a3f2.js:1:45632; resolve these to original source files and line numbers using source maps, and include the actual source code around the error line. An LLM analyzing the original TypeScript has real code to reason about; one analyzing minified JavaScript is prone to hallucinated diagnoses.
- Use quantitative metrics for severity and AI for root cause — don't let AI decide priority. Error rate, unique affected users, trend direction, and deploy correlation are objective and should determine P0/P1/P2/P3 automatically; use AI for the subjective analysis, such as what the root cause is and whether the error is actionable. This prevents the AI from under- or over-prioritizing based on the wording of an error message.
- Validate AI fix suggestions with TypeScript compilation and existing tests before presenting them. Run the suggested patch through tsc --noEmit and the test suite, and only present fixes that pass both checks. Include a confidence score based on patch size (smaller = higher confidence), evidence strength, and validation results, and never auto-apply fixes to production code.
- Build a feedback loop from developer reviews to improve diagnosis accuracy over time. Track whether each AI diagnosis was correct or incorrect; when accuracy drops below 70%, analyze the misdiagnosed cases to identify patterns and update the system prompt. Common blind spots include race conditions, state synchronization bugs, and browser-specific issues, all of which need additional context in the prompt.
Conclusion
AI-powered error diagnosis transforms frontend error triage from a manual, time-consuming process into a structured pipeline. Error context collection at the boundary captures breadcrumbs (user actions), network requests, component props, and environment data, providing the WHY behind the WHERE of a stack trace. Source map resolution converts minified production stacks into original file references with surrounding source code. Embedding-based error grouping uses semantic similarity (cosine similarity between text embeddings of the error message and context) to cluster errors by root cause rather than stack trace fingerprint, catching variants that traditional grouping misses. The LLM diagnosis prompt is structured to reduce hallucination: it requires citing specific evidence from the provided context, ranking alternative explanations, and flagging uncertainty. Fix suggestions map diagnosed causes to minimal code patches, validated by TypeScript compilation and existing tests, with confidence scores based on evidence strength and patch size. Automated triage uses quantitative metrics (error rate, affected users, deploy correlation) for severity, code ownership for assignment, and AI for subjective analysis (actionability, root cause). The feedback loop, which tracks diagnosis accuracy against developer reviews, drives continuous improvement by triggering prompt reviews when accuracy drops below a threshold. The key architectural principle is that AI handles analysis and suggestion while deterministic systems handle severity scoring, validation, and deployment decisions.