AI Commit Messages and Changelog Generation: LLM Workflow for Frontend Teams
Real-World Problem Context
A frontend team of twelve developers merges 40+ PRs per week across a React monorepo. Commit messages range from "fix stuff" to three-paragraph essays — there's no consistency. Release notes are manually assembled every two weeks by a rotation engineer who spends half a day reading through merged PRs, git diffs, and Jira tickets to write a changelog. Half the time, important changes are missed or described inaccurately. Meanwhile, PR descriptions are often copy-pasted templates with empty sections. The team decides to integrate AI at three points: (1) a git hook that generates commit messages from staged diffs, (2) a GitHub Action that auto-generates PR descriptions from the branch's commits, and (3) a release automation that produces changelogs from merged PRs using an LLM. This post covers how each of these works internally — how diffs are tokenized for LLM context, how conventional commit formats are enforced, how semantic versioning is inferred, and what the prompt engineering and validation pipeline looks like.
Problem Statements
- Diff-to-Message Pipeline: How do you convert a git diff into a well-structured commit message using an LLM? How do you handle large diffs that exceed token limits, and how do you enforce conventional commit format (feat/fix/chore) reliably?
- PR Description Generation: How do you aggregate multiple commits into a coherent PR description that includes what changed, why, testing notes, and breaking changes — without hallucinating details that aren't in the code?
- Changelog Generation: How do you compile merged PRs into release notes grouped by category (features, fixes, breaking changes), infer semantic version bumps, and produce both human-readable and machine-parseable changelogs?
Deep Dive: Internal Mechanisms
1. Diff Tokenization and Context Windowing
/*
* The core challenge: git diffs can be huge (thousands of lines),
* but LLMs have finite context windows.
*
* A typical commit might touch 5 files with 200 lines changed.
* At ~4 chars per token, that's ~2000 tokens for the diff alone.
* But a large refactor touching 50 files? 20,000+ tokens.
*
* Strategy: intelligent diff chunking and prioritization.
*
* ┌──────────────────────────────────────────────────┐
* │ Raw git diff │
* │ (could be 50,000 tokens) │
* │ │
* │ ┌──────────────────┐ │
* │ │ 1. Parse hunks │ Split by file + hunk │
* │ └────────┬─────────┘ │
* │ ▼ │
* │ ┌──────────────────┐ │
* │ │ 2. Score hunks │ Rank by semantic impact │
* │ └────────┬─────────┘ │
* │ ▼ │
* │ ┌──────────────────┐ │
* │ │ 3. Fit to window │ Pack highest-scored hunks │
* │ └────────┬─────────┘ into token budget │
* │ ▼ │
* │ ┌──────────────────┐ │
* │ │ 4. Summarize │ LLM generates message │
* │ │ overflow │ from selected hunks + │
* │ └──────────────────┘ file-level summaries │
* └──────────────────────────────────────────────────┘
*/
// Diff parsing and chunking:
function parseDiffIntoHunks(rawDiff) {
const files = [];
let currentFile = null;
let currentHunk = null;
for (const line of rawDiff.split('\n')) {
if (line.startsWith('diff --git')) {
// Reset the hunk pointer so file header lines (index, ---, +++) aren't counted against the previous file's last hunk:
currentHunk = null;
currentFile = {
path: extractFilePath(line),
hunks: [],
additions: 0,
deletions: 0,
};
files.push(currentFile);
} else if (line.startsWith('@@')) {
currentHunk = {
header: line,
lines: [],
additions: 0,
deletions: 0,
};
currentFile.hunks.push(currentHunk);
} else if (currentHunk) {
currentHunk.lines.push(line);
if (line.startsWith('+')) {
currentHunk.additions++;
currentFile.additions++;
} else if (line.startsWith('-')) {
currentHunk.deletions++;
currentFile.deletions++;
}
}
}
return files;
}
// Hunk scoring — prioritize semantically important changes:
function scoreHunk(file, hunk) {
let score = 0;
// File type priority:
if (file.path.match(/\.(tsx?|jsx?)$/)) score += 10; // Source code
if (file.path.match(/\.(test|spec)\./)) score += 3; // Tests
if (file.path.match(/\.(css|scss)$/)) score += 2; // Styles
if (file.path.match(/package\.json/)) score += 8; // Dependencies
if (file.path.match(/\.lock$/)) score -= 20; // Lock files: skip
if (file.path.match(/\.generated\./)) score -= 15; // Generated: skip
// Change type priority:
const content = hunk.lines.join('\n');
if (content.includes('export ')) score += 5; // Public API changes
if (content.includes('interface ') || content.includes('type ')) score += 4;
if (content.includes('throw ') || content.includes('catch')) score += 3;
if (content.includes('BREAKING')) score += 10;
// Size penalty — very large hunks are likely refactors, not semantic:
if (hunk.lines.length > 100) score -= 5;
return score;
}
// Token-budget packing:
function fitDiffToTokenBudget(files, maxTokens = 3000) {
const allHunks = files.flatMap(file =>
file.hunks.map(hunk => ({
file: file.path,
hunk,
score: scoreHunk(file, hunk),
tokens: estimateTokens(hunk.lines.join('\n')),
}))
);
// Sort by score descending:
allHunks.sort((a, b) => b.score - a.score);
let usedTokens = 0;
const selected = [];
const skippedFiles = new Set();
for (const item of allHunks) {
if (usedTokens + item.tokens <= maxTokens) {
selected.push(item);
usedTokens += item.tokens;
} else {
skippedFiles.add(item.file);
}
}
// For skipped files, add a one-line summary:
const summaries = [...skippedFiles].map(path => {
const file = files.find(f => f.path === path);
return `${path}: +${file.additions}/-${file.deletions} lines (truncated)`;
});
return { selected, summaries, totalFiles: files.length };
}
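The parsing and packing code above references two helpers that aren't defined here, extractFilePath and estimateTokens. Minimal sketches, with estimateTokens using the rough 4-characters-per-token heuristic from the comment above:
// Hypothetical helpers assumed by the parsing and packing code above.
// extractFilePath pulls the new path out of a "diff --git a/... b/..." header line:
function extractFilePath(diffGitLine) {
  const match = diffGitLine.match(/ b\/(\S+)$/);
  return match ? match[1] : diffGitLine.replace('diff --git ', '');
}
// estimateTokens uses a rough ~4 characters per token heuristic, which is good enough for budgeting:
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}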
2. Commit Message Prompt Engineering
/*
* The prompt must produce CONSISTENT, PARSEABLE output.
* Key techniques:
* 1. Strict format specification with examples
* 2. Negative examples (what NOT to do)
* 3. Scope inference rules
* 4. Output validation with retry
*/
function buildCommitMessagePrompt(diff, context) {
return `You are a commit message generator for a frontend codebase.
Generate a commit message following the Conventional Commits specification.
FORMAT (strict):
<type>(<scope>): <description>

<optional body>
TYPES (use exactly one):
- feat: A new feature visible to users
- fix: A bug fix
- refactor: Code change that neither fixes a bug nor adds a feature
- perf: Performance improvement
- test: Adding or updating tests
- docs: Documentation changes
- style: CSS/formatting changes (not code logic)
- chore: Build process, dependency updates, tooling
- ci: CI/CD configuration changes
SCOPE RULES:
- Use the primary component or module name (e.g., Button, auth, api)
- For cross-cutting changes, use the feature area (e.g., i18n, a11y)
- For infrastructure, use the tool name (e.g., webpack, eslint)
DESCRIPTION RULES:
- Imperative mood ("add", not "added" or "adds")
- No period at the end
- Max 72 characters
- Describe WHAT changed, not HOW
BODY RULES (optional, only for complex changes):
- Explain WHY the change was made
- Max 3 lines
- Reference ticket numbers if present in branch name
NEGATIVE EXAMPLES (do NOT generate these):
❌ "fix: fixed the bug" (past tense)
❌ "feat: stuff" (vague)
❌ "refactor: refactored code to be better" (circular)
❌ "chore: update" (too vague)
POSITIVE EXAMPLES:
✅ "feat(SearchBar): add debounced autocomplete suggestions"
✅ "fix(Cart): prevent duplicate items on rapid click"
✅ "refactor(auth): extract token refresh into dedicated hook"
CONTEXT:
Branch: ${context.branchName}
Files changed: ${context.filesChanged}
Staged diff:
${diff}
Generate ONLY the commit message. No explanations, no markdown fences.`;
}
// Git hook implementation (prepare-commit-msg):
// .husky/prepare-commit-msg
import { execSync } from 'child_process';
import fs from 'fs';
async function generateCommitMessage() {
const diff = execSync('git diff --cached --no-color').toString();
if (!diff.trim()) return; // Nothing staged
const files = parseDiffIntoHunks(diff);
const { selected, summaries } = fitDiffToTokenBudget(files, 3000);
const formattedDiff = selected
.map(s => `--- ${s.file} ---\n${s.hunk.lines.join('\n')}`)
.join('\n\n');
const context = {
branchName: execSync('git branch --show-current').toString().trim(),
filesChanged: files.map(f => f.path).join(', '),
};
const fullDiff = summaries.length > 0
? `${formattedDiff}\n\nAdditional files (truncated):\n${summaries.join('\n')}`
: formattedDiff;
const prompt = buildCommitMessagePrompt(fullDiff, context);
const message = await callLLM(prompt, {
model: 'gpt-4o-mini', // Fast, cheap — good for commit messages
temperature: 0.1, // Low creativity — we want consistency
maxTokens: 200, // Commit messages are short
});
// Validate the format before using:
const validated = validateConventionalCommit(message);
if (validated.valid) {
fs.writeFileSync(process.argv[2], validated.message);
} else {
// Retry once with format correction prompt:
const retryMessage = await callLLM(
`Fix this commit message to follow Conventional Commits format:\n${message}\n\nErrors: ${validated.errors.join(', ')}`,
{ model: 'gpt-4o-mini', temperature: 0, maxTokens: 200 }
);
fs.writeFileSync(process.argv[2], retryMessage);
}
}
3. Conventional Commit Validation
/*
* AI output MUST be validated — LLMs can hallucinate format violations.
*
* Validation pipeline:
* 1. Regex check: matches conventional commit pattern
* 2. Type check: type is in allowed set
* 3. Length check: description ≤ 72 chars
* 4. Grammar check: imperative mood (basic heuristic)
* 5. Scope check: scope exists in known list (optional)
*/
const VALID_TYPES = ['feat', 'fix', 'refactor', 'perf', 'test', 'docs', 'style', 'chore', 'ci', 'build', 'revert'];
const CONVENTIONAL_COMMIT_REGEX = /^(?<type>\w+)(\((?<scope>[^)]+)\))?(?<breaking>!)?: (?<description>.+)$/;
function validateConventionalCommit(message) {
const errors = [];
const lines = message.trim().split('\n');
const header = lines[0];
// 1. Format check:
const match = header.match(CONVENTIONAL_COMMIT_REGEX);
if (!match) {
errors.push(`Header doesn't match format: type(scope): description`);
return { valid: false, errors, message };
}
const { type, scope, description } = match.groups;
// 2. Type check:
if (!VALID_TYPES.includes(type)) {
errors.push(`Invalid type "${type}". Must be one of: ${VALID_TYPES.join(', ')}`);
}
// 3. Length check:
if (header.length > 72) {
errors.push(`Header too long: ${header.length} chars (max 72)`);
}
// 4. Imperative mood check (basic heuristic — map common offenders to their imperative form):
const firstWord = description.split(' ')[0].toLowerCase();
const moodFixes = {
added: 'add', fixed: 'fix', updated: 'update', removed: 'remove', changed: 'change', modified: 'modify',
adds: 'add', fixes: 'fix', updates: 'update', removes: 'remove', changes: 'change',
};
if (moodFixes[firstWord]) {
errors.push(`Use imperative mood: "${firstWord}" should be "${moodFixes[firstWord]}"`);
}
// 5. No period at end:
if (description.endsWith('.')) {
errors.push('Description should not end with a period');
}
// 6. Body format (if present):
if (lines.length > 1 && lines[1].trim() !== '') {
errors.push('Second line must be blank (separates header from body)');
}
return {
valid: errors.length === 0,
errors,
message: message.trim(),
parsed: { type, scope, description, hasBreaking: !!match.groups.breaking },
};
}
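As an illustration of what the validator catches, consider a message in past tense with a trailing period:
// Illustrative example — a message that violates two rules:
const result = validateConventionalCommit('fix: fixed the login bug.');
// result.valid === false
// result.errors:
//   'Use imperative mood: "fixed" should be "fix"'
//   'Description should not end with a period'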
4. PR Description Generation from Commits
/*
* PR descriptions aggregate multiple commits into a coherent summary.
*
* Pipeline:
* 1. Collect all commits in the branch (vs main)
* 2. Group by type (feat, fix, etc.)
* 3. Fetch the full diff for context
* 4. Identify breaking changes, new dependencies, UI changes
* 5. Generate structured PR description
*
* ┌─────────────────────────────────────────────────┐
* │ Branch: feature/search-autocomplete │
* │ │
* │ Commits: │
* │ 1. feat(SearchBar): add debounced input │
* │ 2. feat(SearchBar): add suggestion dropdown │
* │ 3. test(SearchBar): add autocomplete tests │
* │ 4. fix(SearchBar): handle empty results state │
* │ 5. style(SearchBar): align dropdown with input │
* │ │
* │ ↓ LLM + structured template │
* │ │
* │ ## What │
* │ Adds debounced autocomplete to SearchBar with │
* │ dropdown suggestions, empty state handling, and │
* │ visual alignment fixes. │
* │ │
* │ ## Why │
* │ Users reported difficulty finding products... │
* │ │
* │ ## Testing │
* │ - Added 12 unit tests for autocomplete flow │
* │ - Manual testing: Chrome, Firefox, Safari │
* │ │
* │ ## Breaking Changes │
* │ None │
* └─────────────────────────────────────────────────┘
*/
// Prompt builder used by the GitHub Action for PR descriptions
// (see section 8: .github/workflows/ai-pr-description.yml)
function buildPRDescriptionPrompt(commits, diff, metadata) {
return `Generate a PR description for a frontend project.
TEMPLATE (fill in each section):
## What
[1-3 sentences: what this PR does]
## Why
[1-2 sentences: motivation, link to ticket if branch name contains one]
## Changes
[Bulleted list of specific changes, grouped by area]
## Testing
[What tests were added/modified, manual testing notes]
## Breaking Changes
[List any breaking changes, or "None"]
## Screenshots
[If UI changes detected, note "Screenshots needed" — otherwise omit]
RULES:
- Be specific. Reference actual component names and file paths from the diff.
- Do NOT hallucinate features not present in the diff.
- Do NOT assume testing that isn't evidenced in test file changes.
- If you're unsure about "Why", infer from the branch name and changes.
- Keep the total description under 500 words.
BRANCH: ${metadata.branch}
BASE: ${metadata.base}
AUTHOR: ${metadata.author}
COMMITS:
${commits.map(c => `- ${c.message}`).join('\n')}
DIFF SUMMARY:
${diff}
Generate the PR description using the template above.`;
}
// Anti-hallucination: cross-reference generated claims with actual diff
function validatePRDescription(description, actualFiles, actualCommits) {
const warnings = [];
// Extract mentioned file paths from description:
const mentionedPaths = description.match(/[\w/-]+\.(tsx?|jsx?|css|scss)/g) || [];
for (const path of mentionedPaths) {
if (!actualFiles.some(f => f.includes(path))) {
warnings.push(`Description mentions "${path}" but it's not in the diff`);
}
}
// Check for common hallucination patterns:
if (description.includes('100% test coverage') &&
!actualFiles.some(f => f.includes('.test.'))) {
warnings.push('Claims test coverage but no test files were changed');
}
return warnings;
}
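As an illustration (the paths here are made up), a description that name-drops a file outside the diff produces a warning:
// Illustrative example — the second argument is the real file list from the PR diff:
const warnings = validatePRDescription(
  'Refactors SearchBar.tsx and adds request caching in api/client.ts',
  ['src/components/SearchBar.tsx'],
  []
);
// warnings[0] === 'Description mentions "api/client.ts" but it\'s not in the diff'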
5. Semantic Version Inference from Commits
/*
* Conventional commits enable AUTOMATED semantic versioning:
*
* MAJOR (X.0.0): breaking change (type! or BREAKING CHANGE footer)
* MINOR (0.X.0): feat type
 * PATCH (0.0.X): fix, perf types
*
* The LLM doesn't decide the version — the commit types do.
* This is deterministic, not AI-dependent.
*
* ┌──────────────────────────────────────────────────┐
* │ Merged PRs since last release (v2.3.1): │
* │ │
* │ PR #45: feat(Search): add voice input │ ← MINOR
* │ PR #46: fix(Cart): correct tax calculation │ ← PATCH
* │ PR #47: feat(Auth)!: migrate to OAuth 2.0 │ ← MAJOR (!)
* │ PR #48: perf(Images): add lazy loading │ ← PATCH
* │ PR #49: chore(deps): update React to 19 │ ← (none)
* │ │
* │ Highest bump: MAJOR → next version: v3.0.0 │
* └──────────────────────────────────────────────────┘
*/
function inferVersionBump(commits) {
let bump = 'none'; // none < patch < minor < major
const features = [];
const fixes = [];
const breaking = [];
for (const commit of commits) {
const parsed = parseConventionalCommit(commit.message);
if (!parsed) continue;
// Check for breaking change:
if (parsed.hasBreaking || commit.body?.includes('BREAKING CHANGE:')) {
bump = 'major';
breaking.push(parsed.description);
} else if (parsed.type === 'feat') {
if (bump !== 'major') bump = 'minor';
features.push(parsed.description);
} else if (['fix', 'perf'].includes(parsed.type)) {
if (bump === 'none') bump = 'patch';
fixes.push(parsed.description);
}
}
return { bump, features, fixes, breaking };
}
function calculateNextVersion(currentVersion, bump) {
const [major, minor, patch] = currentVersion.split('.').map(Number);
switch (bump) {
case 'major': return `${major + 1}.0.0`;
case 'minor': return `${major}.${minor + 1}.0`;
case 'patch': return `${major}.${minor}.${patch + 1}`;
default: return currentVersion;
}
}
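Both inferVersionBump above and the changelog pipeline below call a parseConventionalCommit helper that isn't shown. A minimal sketch, reusing CONVENTIONAL_COMMIT_REGEX from the validator in section 3:
// Minimal sketch: parse a commit header into the fields used above; return null if it doesn't conform.
function parseConventionalCommit(message) {
  const header = message.trim().split('\n')[0];
  const match = header.match(CONVENTIONAL_COMMIT_REGEX);
  if (!match) return null;
  const { type, scope, breaking, description } = match.groups;
  return { type, scope: scope || null, description, hasBreaking: !!breaking };
}
// With the commits from the diagram above, inferVersionBump reports bump === 'major',
// and calculateNextVersion('2.3.1', 'major') returns '3.0.0'.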
6. Changelog Generation Pipeline
/*
* The changelog combines deterministic grouping with
* AI-generated human-readable summaries.
*
* Deterministic parts:
* - Version number (from commit types)
* - Grouping (feat/fix/breaking)
* - PR links, authors, dates
*
* AI-generated parts:
* - Release summary paragraph
* - User-facing descriptions (rewording technical commits)
* - Migration guide for breaking changes
*/
async function generateChangelog(mergedPRs, previousVersion) {
// 1. Parse all commits from merged PRs:
const allCommits = mergedPRs.flatMap(pr =>
pr.commits.map(c => {
const parsed = parseConventionalCommit(c.message);
// Keep the raw message so inferVersionBump (which re-parses it) still works; drop non-conforming commits:
return parsed && {
...parsed,
message: c.message,
pr: pr.number,
author: pr.author,
sha: c.sha,
};
})
).filter(Boolean);
// 2. Deterministic version calculation:
const { bump, features, fixes, breaking } = inferVersionBump(allCommits);
const nextVersion = calculateNextVersion(previousVersion, bump);
// 3. Group by type (deterministic):
const groups = {
breaking: allCommits.filter(c => c.hasBreaking),
features: allCommits.filter(c => c.type === 'feat' && !c.hasBreaking),
fixes: allCommits.filter(c => c.type === 'fix'),
performance: allCommits.filter(c => c.type === 'perf'),
other: allCommits.filter(c => !['feat', 'fix', 'perf'].includes(c.type) && !c.hasBreaking),
};
// 4. AI-generated: user-facing summary
const summary = await callLLM(`
Write a 2-3 sentence release summary for version ${nextVersion} of a frontend application.
Features added: ${features.join('; ') || 'none'}
Bugs fixed: ${fixes.join('; ') || 'none'}
Breaking changes: ${breaking.join('; ') || 'none'}
Write from the perspective of "This release..." — be concise and user-facing.
Do not mention internal refactors or tooling changes.`,
{ temperature: 0.3, maxTokens: 150 }
);
// 5. AI-generated: migration guide for breaking changes
let migrationGuide = '';
if (groups.breaking.length > 0) {
const breakingDiffs = await getBreakingChangeDiffs(groups.breaking);
migrationGuide = await callLLM(`
Write a migration guide for these breaking changes in a frontend library.
For each breaking change, show BEFORE and AFTER code examples.
Breaking changes:
${breakingDiffs}
Format as markdown with ### headings for each change.`,
{ temperature: 0.2, maxTokens: 500 }
);
}
// 6. Assemble changelog (template + AI):
return formatChangelog({ nextVersion, summary, groups, migrationGuide });
}
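The migration-guide step calls getBreakingChangeDiffs, which isn't shown. A minimal sketch, under the assumption that each breaking commit carries a sha and that git is available locally; the per-commit character cap is an arbitrary guard for the token budget:
import { execSync } from 'child_process';
// Hypothetical helper: fetch the diff text for each breaking-change commit via git show.
function getBreakingChangeDiffs(breakingCommits, maxCharsPerCommit = 4000) {
  return breakingCommits
    .map(c => {
      const diff = execSync(`git show --no-color ${c.sha}`, { encoding: 'utf8' });
      return `### ${c.description}\n${diff.slice(0, maxCharsPerCommit)}`;
    })
    .join('\n\n');
}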
function formatChangelog({ nextVersion, summary, groups, migrationGuide }) {
const date = new Date().toISOString().split('T')[0];
let md = `## [${nextVersion}] - ${date}\n\n${summary}\n\n`;
if (groups.breaking.length > 0) {
md += `### ⚠️ Breaking Changes\n\n`;
for (const c of groups.breaking) {
md += `- ${c.description} ([#${c.pr}](${prUrl(c.pr)})) — @${c.author}\n`;
}
md += `\n${migrationGuide}\n\n`;
}
if (groups.features.length > 0) {
md += `### Features\n\n`;
for (const c of groups.features) {
md += `- **${c.scope || 'core'}**: ${c.description} ([#${c.pr}](${prUrl(c.pr)}))\n`;
}
md += '\n';
}
if (groups.fixes.length > 0) {
md += `### Bug Fixes\n\n`;
for (const c of groups.fixes) {
md += `- **${c.scope || 'core'}**: ${c.description} ([#${c.pr}](${prUrl(c.pr)}))\n`;
}
md += '\n';
}
return md;
}
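formatChangelog assumes a prUrl helper for linking PRs; a one-line sketch where the repository slug is a placeholder:
// Hypothetical — replace the slug with your repository:
const prUrl = (number) => `https://github.com/acme/frontend-monorepo/pull/${number}`;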
7. LLM API Integration and Cost Management
/*
* Each AI call costs money. For commit messages at 40 PRs/week
* with ~5 commits each, that's 200 LLM calls per week.
*
* Cost optimization strategies:
*
* ┌──────────────────────────────────────────────────┐
* │ Task │ Model │ Cost/call │
* │──────────────────│──────────────│──────────────│
* │ Commit message │ gpt-4o-mini │ ~$0.001 │
* │ PR description │ gpt-4o-mini │ ~$0.003 │
* │ Changelog summary │ gpt-4o │ ~$0.01 │
* │ Migration guide │ gpt-4o │ ~$0.02 │
* │──────────────────│──────────────│──────────────│
* │ Weekly total │ │ ~$1.50 │
* │ Monthly total │ │ ~$6.00 │
* └──────────────────────────────────────────────────┘
*
* Key: use cheap models for frequent, simple tasks;
* expensive models only for complex generation.
*/
class AICommitService {
constructor(config) {
this.apiKey = config.apiKey;
this.cache = new Map();
this.rateLimiter = createRateLimiter({
maxConcurrent: 5,
minDelay: 100,
});
}
async callLLM(prompt, options = {}) {
const {
model = 'gpt-4o-mini',
temperature = 0.1,
maxTokens = 300,
} = options;
// Cache by prompt hash — identical diffs get identical messages:
const cacheKey = hashPrompt(prompt + model);
if (this.cache.has(cacheKey)) {
return this.cache.get(cacheKey);
}
// Rate limit to avoid API throttling:
await this.rateLimiter.acquire();
try {
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${this.apiKey}`,
},
body: JSON.stringify({
model,
messages: [{ role: 'user', content: prompt }],
temperature,
max_tokens: maxTokens,
}),
});
const data = await response.json();
if (!response.ok) {
throw new Error(`LLM API error ${response.status}: ${data.error?.message ?? 'unknown error'}`);
}
const result = data.choices[0].message.content.trim();
this.cache.set(cacheKey, result);
return result;
} finally {
this.rateLimiter.release();
}
}
// Fallback: if API is down, generate a basic message from diff:
generateFallbackMessage(files) {
const types = new Set();
for (const file of files) {
if (file.path.includes('.test.')) types.add('test');
else if (file.path.includes('.css') || file.path.includes('.scss')) types.add('style');
else if (file.additions > file.deletions * 2) types.add('feat');
else types.add('fix');
}
const type = types.has('feat') ? 'feat' : types.has('fix') ? 'fix' : 'chore';
const scope = extractCommonDirectory(files.map(f => f.path));
const fileList = files.map(f => path.basename(f.path)).join(', ');
return `${type}(${scope}): update ${fileList}`;
}
}
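The service relies on helpers that aren't shown: hashPrompt, createRateLimiter, and extractCommonDirectory (plus the global fetch available in Node 18+). Minimal sketches under those assumptions; a library such as p-limit could replace the hand-rolled limiter:
import { createHash } from 'crypto';
// Stable cache key derived from the prompt text:
function hashPrompt(text) {
  return createHash('sha256').update(text).digest('hex');
}
// Simple semaphore: at most maxConcurrent calls in flight, each waiting at least minDelay ms before it starts.
function createRateLimiter({ maxConcurrent, minDelay }) {
  let active = 0;
  const queue = [];
  const tryNext = () => {
    if (active >= maxConcurrent || queue.length === 0) return;
    active++;
    setTimeout(queue.shift(), minDelay);
  };
  return {
    acquire: () => new Promise(resolve => { queue.push(resolve); tryNext(); }),
    release: () => { active--; tryNext(); },
  };
}
// Longest shared directory of the changed files, used as a fallback scope:
function extractCommonDirectory(paths) {
  const segments = paths.map(p => p.split('/').slice(0, -1));
  let common = segments[0] || [];
  for (const segs of segments.slice(1)) {
    let i = 0;
    while (i < common.length && segs[i] === common[i]) i++;
    common = common.slice(0, i);
  }
  return common[common.length - 1] || 'repo';
}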
8. GitHub Actions Integration
# .github/workflows/ai-pr-description.yml
# Generates PR description when PR is opened or updated
name: AI PR Description
on:
pull_request:
types: [opened, synchronize]
jobs:
generate-description:
runs-on: ubuntu-latest
# Only run if PR body is empty or has the template placeholder:
if: |
github.event.pull_request.body == '' ||
contains(github.event.pull_request.body, '<!-- AI_GENERATE -->')
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for diff
- name: Get PR commits and diff
id: pr-data
run: |
# Get commits in this PR:
git log --format="%s" origin/${{ github.base_ref }}..HEAD > commits.txt
# Get diff (truncated for token budget):
git diff origin/${{ github.base_ref }}...HEAD \
--stat > diff-stat.txt
git diff origin/${{ github.base_ref }}...HEAD \
-- '*.ts' '*.tsx' '*.js' '*.jsx' '*.css' \
--no-color | head -c 50000 > diff.txt
- name: Generate PR description
uses: actions/github-script@v7
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
with:
script: |
const fs = require('fs');
const commits = fs.readFileSync('commits.txt', 'utf8');
const diff = fs.readFileSync('diff.txt', 'utf8');
const diffStat = fs.readFileSync('diff-stat.txt', 'utf8');
// Build prompt and call LLM (same pattern as above)
const description = await generatePRDescription(
commits, diff, diffStat, context.payload.pull_request
);
// Update the PR body:
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: context.payload.pull_request.number,
body: description + '\n\n---\n*Generated by AI — please review and edit.*',
});
9. Local CLI Tool Integration
/*
* Developers want AI commit messages in their terminal workflow,
* not just in CI. Integration points:
*
* 1. Git hook (prepare-commit-msg) — auto-fills commit message
* 2. CLI command (ai-commit) — interactive commit with AI
* 3. VS Code extension — shows suggestion in editor
*
* The CLI flow:
*
* $ git add .
* $ ai-commit
*
* Analyzing staged changes...
* 3 files changed: SearchBar.tsx, SearchBar.test.tsx, SearchBar.css
*
* Suggested commit message:
* feat(SearchBar): add debounced autocomplete with dropdown suggestions
*
* [a]ccept / [e]dit / [r]egenerate / [m]anual? a
*
* ✓ Committed: abc1234
*/
#!/usr/bin/env node
// bin/ai-commit.js
import fs from 'fs';
import { execSync } from 'child_process';
import { createInterface } from 'readline';
async function main() {
// 1. Check for staged changes:
const stagedDiff = execSync('git diff --cached --no-color', { encoding: 'utf8' });
if (!stagedDiff.trim()) {
console.log('No staged changes. Use `git add` first.');
process.exit(1);
}
// 2. Parse and prepare diff:
const files = parseDiffIntoHunks(stagedDiff);
const { selected, summaries } = fitDiffToTokenBudget(files);
console.log(`\nAnalyzing ${files.length} changed files...`);
// 3. Generate message:
const service = new AICommitService({ apiKey: getApiKey() });
let message = await service.callLLM(
buildCommitMessagePrompt(formatHunks(selected, summaries), {
branchName: execSync('git branch --show-current', { encoding: 'utf8' }).trim(),
filesChanged: files.map(f => f.path).join(', '),
}),
{ temperature: 0.1 }
);
// 4. Validate:
const validation = validateConventionalCommit(message);
if (!validation.valid) {
console.log(`⚠ Format issues: ${validation.errors.join(', ')}`);
// Auto-fix common issues:
message = autoFixCommitMessage(message, validation.errors);
}
// 5. Interactive prompt:
console.log(`\nSuggested commit message:`);
console.log(` ${message}\n`);
const rl = createInterface({ input: process.stdin, output: process.stdout });
const answer = await question(rl, '[a]ccept / [e]dit / [r]egenerate / [m]anual? ');
switch (answer.toLowerCase()) {
case 'a':
execSync(`git commit -m ${escapeShellArg(message)}`);
console.log('✓ Committed');
break;
case 'e':
// Open in $EDITOR with pre-filled message:
const tmpFile = '/tmp/ai-commit-msg';
fs.writeFileSync(tmpFile, message);
execSync(`$EDITOR ${tmpFile}`, { stdio: 'inherit' });
const edited = fs.readFileSync(tmpFile, 'utf8').trim();
execSync(`git commit -m ${escapeShellArg(edited)}`);
break;
case 'r':
// Regenerate by re-running the flow (could bump temperature here for more variety):
return main();
case 'm':
execSync('git commit', { stdio: 'inherit' }); // Normal commit
break;
}
rl.close();
}
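The CLI also assumes a few small utilities: question, escapeShellArg, and getApiKey, plus formatHunks (the same formatting done inline in the prepare-commit-msg hook) and autoFixCommitMessage. Illustrative sketches of the first three, followed by the call that kicks off the flow:
// Promise wrapper around readline's callback-style question:
function question(rl, prompt) {
  return new Promise(resolve => rl.question(prompt, resolve));
}
// POSIX-style single-quote escaping so multi-line messages survive `git commit -m`:
function escapeShellArg(str) {
  return `'${str.replace(/'/g, "'\\''")}'`;
}
// API key from the environment (same variable the GitHub Action uses):
function getApiKey() {
  const key = process.env.OPENAI_API_KEY;
  if (!key) throw new Error('OPENAI_API_KEY is not set');
  return key;
}
// Kick off the interactive flow:
main();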
10. Guardrails and Quality Assurance
/*
* AI-generated content needs guardrails to prevent:
* 1. Hallucinated file paths or feature descriptions
* 2. Leaked sensitive data in commit messages
* 3. Incorrect breaking change classification
* 4. Inconsistent format across team members
*
* Defense layers:
*
* ┌──────────────────────────────────────────────────┐
* │ Layer 1: Input Sanitization │
* │ - Strip secrets from diffs (.env changes) │
* │ - Remove binary file diffs │
* │ - Truncate lock file diffs │
* │ │
* │ Layer 2: Output Validation │
* │ - Regex validation (conventional commit format) │
* │ - Cross-reference: mentioned files ∈ actual diff │
* │ - Length constraints │
* │ │
* │ Layer 3: Human Review │
* │ - Interactive accept/edit/reject │
* │ - PR descriptions marked as AI-generated │
* │ - Changelogs reviewed before publish │
* └──────────────────────────────────────────────────┘
*/
// Input sanitization — prevent secrets from reaching the LLM:
function sanitizeDiff(diff) {
const lines = diff.split('\n');
const sanitized = [];
let skipFile = false;
for (const line of lines) {
// Skip sensitive files entirely:
if (line.startsWith('diff --git')) {
skipFile = false;
const path = extractFilePath(line);
if (path.match(/\.(env|pem|key|secret|credentials)/)) {
skipFile = true;
sanitized.push(`diff --git ${path} (contents redacted — sensitive file)`);
continue;
}
if (path.match(/\.(lock|sum)$/)) {
skipFile = true;
sanitized.push(`diff --git ${path} (lock file — changes omitted)`);
continue;
}
}
if (skipFile) continue;
// Redact inline secrets:
const redactedLine = line
.replace(/(api[_-]?key|secret|password|token)\s*[:=]\s*['"]?[^'"}\s]+/gi,
'$1=<REDACTED>')
.replace(/Bearer\s+[A-Za-z0-9._-]+/g, 'Bearer <REDACTED>');
sanitized.push(redactedLine);
}
return sanitized.join('\n');
}
// Cross-reference validation for changelogs:
function validateChangelogAccuracy(changelog, mergedPRs) {
const issues = [];
// Extract PR numbers mentioned in changelog:
const mentionedPRs = [...changelog.matchAll(/#(\d+)/g)].map(m => parseInt(m[1]));
const actualPRs = mergedPRs.map(pr => pr.number);
// Check for phantom PRs:
for (const pr of mentionedPRs) {
if (!actualPRs.includes(pr)) {
issues.push(`Changelog mentions PR #${pr} but it wasn't merged in this release`);
}
}
// Check for missing PRs (feat/fix not mentioned):
for (const pr of mergedPRs) {
const hasFeatureOrFix = pr.commits.some(c => {
const parsed = parseConventionalCommit(c.message);
return parsed && ['feat', 'fix'].includes(parsed.type);
});
if (hasFeatureOrFix && !mentionedPRs.includes(pr.number)) {
issues.push(`PR #${pr.number} has feat/fix commits but isn't in the changelog`);
}
}
return issues;
}
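In the release script these checks would typically gate publishing. An illustrative wiring, where changelogMarkdown is the output of generateChangelog:
// Illustrative: block the release when the changelog drifts from what was actually merged.
const issues = validateChangelogAccuracy(changelogMarkdown, mergedPRs);
if (issues.length > 0) {
  console.error('Changelog validation failed:');
  for (const issue of issues) console.error(`  - ${issue}`);
  process.exit(1);
}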
// Team consistency: shared config file
// .ai-commit.json (shown here as its JS equivalent — in the JSON file, redactPatterns are stored as strings)
const CONFIG_SCHEMA = {
model: 'gpt-4o-mini', // LLM model for commits
temperature: 0.1, // Low for consistency
maxDiffTokens: 3000, // Token budget for diff
allowedTypes: ['feat', 'fix', 'refactor', 'perf', 'test', 'docs', 'style', 'chore'],
allowedScopes: ['SearchBar', 'Cart', 'Auth', 'Nav', 'API', 'i18n', 'a11y'],
requireScope: true, // Scope is mandatory
maxHeaderLength: 72,
autoCommit: false, // Always prompt for confirmation
redactPatterns: [ // Additional secret patterns
/NEXT_PUBLIC_/,
/DATABASE_URL/,
],
};
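A small loader can let the git hook, the CLI, and CI all read the same file. A sketch, where the merge strategy and the string-to-RegExp conversion for redactPatterns are assumptions:
import fs from 'fs';
// Hypothetical loader: file values override defaults; redactPatterns arrive as strings in JSON.
function loadTeamConfig(cwd = process.cwd()) {
  let fileConfig = {};
  try {
    fileConfig = JSON.parse(fs.readFileSync(`${cwd}/.ai-commit.json`, 'utf8'));
  } catch {
    // No config file present — fall back to defaults.
  }
  const merged = { ...CONFIG_SCHEMA, ...fileConfig };
  merged.redactPatterns = (merged.redactPatterns || []).map(p =>
    typeof p === 'string' ? new RegExp(p) : p
  );
  return merged;
}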
Trade-offs & Considerations
| Aspect | Manual Workflow | AI-Assisted (Local) | AI-Automated (CI) |
|---|---|---|---|
| Consistency | Low — varies by developer | High — enforced format | Highest — no human variance |
| Speed | Slow (writing + reviewing) | Fast (accept/edit) | Instant (no human step) |
| Accuracy | Depends on developer | Good with validation | Needs guardrails |
| Cost | Developer time | ~$6/month API | ~$6/month API |
| Hallucination risk | None | Low (human reviews) | Medium (auto-published) |
| Adoption friction | None | Low (opt-in hook) | None (runs in CI) |
| Offline capability | Full | Needs fallback | N/A |
Best Practices
- Use cheap, fast models for frequent tasks and expensive models only for complex generation — commit messages are short and formulaic, so gpt-4o-mini at ~$0.001/call is sufficient; reserve gpt-4o for changelog summaries and migration guides that need deeper reasoning; this keeps monthly costs under $10 for a team of 12 while maintaining quality where it matters.
- Always validate AI output against the actual diff before committing — cross-reference mentioned file paths, component names, and feature descriptions against the staged diff to catch hallucinations; enforce conventional commit format with regex validation and retry once on format failure; never auto-commit without at least format validation passing.
- Sanitize diffs before sending to any LLM API — strip secrets, credentials, and sensitive file contents: environment files, private keys, and API tokens in diffs will be sent to third-party APIs; maintain a redaction pipeline that strips .env changes, inline secrets matching common patterns, and lock file noise; this is both a security requirement and improves prompt quality by removing irrelevant content.
- Keep version inference deterministic — let commit types drive semver, not the LLM: the LLM generates human-readable text; the version number comes from parsing conventional commit types (feat → minor, fix → patch, ! → major); mixing AI judgment into versioning creates unpredictable releases; use the LLM only for the prose around deterministic structure.
- Mark all AI-generated content and require human review before publishing — PR descriptions should include a footer noting AI generation; changelogs should be reviewed by the release engineer before tagging; commit messages should default to interactive mode (accept/edit/reject) rather than auto-commit; this maintains accountability and catches the edge cases AI misses.
Conclusion
AI commit message generation works through a pipeline of diff parsing (splitting raw git output into scored hunks), token-budget packing (selecting the most semantically important changes within the LLM's context window), prompt engineering (strict format specification with conventional commit rules, negative examples, and scope inference), and output validation (regex matching, imperative mood checking, cross-referencing mentioned files against actual diffs).

PR descriptions aggregate branch commits into structured templates where the AI fills in the "what" and "why" while deterministic logic handles PR links, author attribution, and file listings. Changelog generation separates concerns: version numbers are computed deterministically from commit types (feat → minor, fix → patch, breaking → major), while the LLM generates human-readable summaries and migration guides. Cost stays low (~$6/month for a 12-person team) by using cheap models for frequent simple tasks.

The critical guardrails are input sanitization (stripping secrets from diffs before they reach any API), output validation (format checking with retry), cross-reference verification (ensuring generated text matches actual changes), and human review (interactive accept/edit for commits, review gates for changelogs). The AI accelerates the mechanical work of writing structured messages while the validation pipeline and human review maintain accuracy and accountability.