feat: add AI generation language preferences by richiemcilroy · Pull Request #1839 · CapSoftware/Cap

richiemcilroy · 2026-05-18T19:34:31Z

Summary

Add an organization preference for AI generation language, using the same language options as transcript translation plus auto-detect.
Apply the selected language to Deepgram transcription and AI-generated titles, summaries, chapter names, section summaries, and key points.
Keep viewer setting lookups narrowed to boolean-only settings.

Validation

pnpm exec biome check --write packages/web-domain/src/Language.ts packages/web-domain/src/index.ts packages/database/schema.ts apps/web/actions/videos/translation-languages.ts apps/web/actions/organization/settings.ts apps/web/app/(org)/dashboard/settings/organization/components/CapSettingsCard.tsx apps/web/app/s/[videoId]/Share.tsx apps/web/workflows/transcribe.ts apps/web/workflows/generate-ai.ts apps/web/__tests__/unit/generate-ai-title.test.ts
pnpm --dir apps/web exec next typegen && pnpm exec tsc -b packages/web-domain packages/database apps/web
pnpm --dir apps/web test __tests__/unit/generate-ai-title.test.ts __tests__/integration/transcribe.test.ts

Greptile Summary

This PR adds a per-organization AI generation language preference that flows through Deepgram transcription and all Groq AI prompt paths (titles, summaries, chapters, key points). It also addresses two prior review concerns: a markError recovery path prevents videos from being stuck in PROCESSING when transcription fails, and detect_language is now constrained to a curated list of Nova-3-supported codes rather than the unbounded detect_language: true.

packages/web-domain/src/Language.ts — new module centralises SUPPORTED_LANGUAGES, defines the narrower AI_GENERATION_LANGUAGE_CODES set (Punjabi excluded), and exports type guards/parse helpers used across the stack.
apps/web/workflows/transcribe.ts — wraps the transcription+save steps in a try/catch that calls markError on failure; getDeepgramTranscriptionOptions is extracted as a pure, testable function and selects either language or a constrained detect_language array.
apps/web/workflows/generate-ai.ts — joins organizations to read aiGenerationLanguage, derives a languageInstruction string injected into every AI prompt, and adds a "Keep JSON property names exactly as shown" guard to prevent key translation.
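
A minimal sketch of what the guard, parse helper, and prompt instruction described above could look like. The language subset, the fallback to "auto", the function names, and the instruction wording are assumptions for illustration; only `SUPPORTED_LANGUAGES`, `AI_GENERATION_LANGUAGE_CODES`, `AI_GENERATION_LANGUAGE_AUTO`, and the quoted JSON guard sentence appear in this PR page.

```typescript
// Hypothetical subset of codes; the real lists live in
// packages/web-domain/src/Language.ts and are not reproduced here.
const SUPPORTED_LANGUAGES = {
	en: "English",
	es: "Spanish",
	fr: "French",
	pa: "Punjabi",
} as const;

type LanguageCode = keyof typeof SUPPORTED_LANGUAGES;

const AI_GENERATION_LANGUAGE_AUTO = "auto";

// Narrower set: every supported code except Punjabi ("pa").
const AI_GENERATION_LANGUAGE_CODES = (
	Object.keys(SUPPORTED_LANGUAGES) as LanguageCode[]
).filter((code): code is Exclude<LanguageCode, "pa"> => code !== "pa");

type AiGenerationLanguage =
	| typeof AI_GENERATION_LANGUAGE_AUTO
	| (typeof AI_GENERATION_LANGUAGE_CODES)[number];

// Type guard: accepts "auto" or one of the allowed codes.
function isAiGenerationLanguage(value: unknown): value is AiGenerationLanguage {
	return (
		value === AI_GENERATION_LANGUAGE_AUTO ||
		(typeof value === "string" &&
			(AI_GENERATION_LANGUAGE_CODES as readonly string[]).includes(value))
	);
}

// Parse helper (assumed behaviour): unrecognised input falls back to "auto".
function parseAiGenerationLanguage(value: unknown): AiGenerationLanguage {
	return isAiGenerationLanguage(value) ? value : AI_GENERATION_LANGUAGE_AUTO;
}

// Hypothetical prompt fragment; the wording of the first sentence is invented.
function getLanguageInstruction(language: AiGenerationLanguage): string {
	const guard = "Keep JSON property names exactly as shown.";
	if (language === AI_GENERATION_LANGUAGE_AUTO) {
		return `Respond in the same language as the transcript. ${guard}`;
	}
	return `Respond in ${SUPPORTED_LANGUAGES[language]}. ${guard}`;
}
```

Deriving the union type from the filtered array keeps the picker, the runtime validation, and the prompt builder in agreement about which codes are allowed.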

Confidence Score: 5/5

Safe to merge — the new error-recovery path, constrained detect_language list, and type-narrowed viewer settings are all well-implemented with no new correctness issues introduced.

The changes are incremental and well-scoped: language constants are centralised, the AI workflows gain a pure options-builder that is unit-tested, and the transcription workflow now correctly marks videos as ERROR instead of leaving them in PROCESSING on failure. The two inline suggestions are minor edge-case robustness improvements that do not affect the happy path.

apps/web/workflows/transcribe.ts — the placement of markNoAudio inside the try block and the unguarded cleanupTempAudio call in the catch are worth a quick look.

Important Files Changed

Filename	Overview
packages/web-domain/src/Language.ts	New module centralising language constants — introduces AI_GENERATION_LANGUAGES (without "pa"), type guards, and parse helpers; well-structured.
apps/web/workflows/transcribe.ts	Adds error-recovery path (markError sets transcriptionStatus=ERROR on throw), constrained detect_language array for auto-detect, and per-org language forwarding to Deepgram.
apps/web/workflows/generate-ai.ts	Joins organizations to fetch aiGenerationLanguage, passes a language instruction to all AI prompt paths (single-chunk and multi-chunk), and adds "Keep JSON property names exactly as shown" guard.
apps/web/actions/organization/settings.ts	Adds aiGenerationLanguage to pro settings, runtime-validates the submitted value, and correctly defaults non-Pro orgs to "auto" via defaultProOrganizationSettings.
apps/web/app/(org)/dashboard/settings/organization/components/CapSettingsCard.tsx	Adds language dropdown with click-outside handler; correctly gates the button on user.isPro and types boolean settings with BooleanOrganizationSettingKey to prevent accidental non-boolean comparisons.
apps/web/app/s/[videoId]/Share.tsx	Narrows isDisabled parameter from keyof OrganizationSettings to ViewerSettingKey, preventing the string aiGenerationLanguage value from being accidentally treated as a boolean toggle.
apps/web/__tests__/unit/transcribe-language.test.ts	New test file covering AI generation language guards and Deepgram option construction; "pa" exclusion and constrained detect_language array are verified.
packages/database/schema.ts	Adds aiGenerationLanguage to the organisations settings JSON type; no migration needed as it extends an existing JSON column.
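
The boolean-only narrowing mentioned for CapSettingsCard.tsx and Share.tsx can be sketched with a mapped type. This is an illustration under assumptions: the settings shape and the `disableSummary` / `disableComments` keys are hypothetical, and the real `BooleanOrganizationSettingKey` and `ViewerSettingKey` definitions are not shown on this page.

```typescript
// Hypothetical settings shape, not the real OrganizationSettings type.
type OrganizationSettings = {
	disableSummary?: boolean;
	disableComments?: boolean;
	aiGenerationLanguage?: string;
};

// Collect only the keys whose (non-undefined) value type is boolean.
type BooleanOrganizationSettingKey = {
	[K in keyof OrganizationSettings]-?: NonNullable<
		OrganizationSettings[K]
	> extends boolean
		? K
		: never;
}[keyof OrganizationSettings];

// Accepts boolean keys only, so a string setting cannot be read as a toggle.
function isDisabled(
	settings: OrganizationSettings,
	key: BooleanOrganizationSettingKey,
): boolean {
	return settings[key] === true;
}

// isDisabled(settings, "aiGenerationLanguage") would be rejected by the compiler.
```

With this shape, adding another string-valued setting later does not widen the set of keys that toggle-style lookups accept.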

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
apps/web/workflows/transcribe.ts:67-87
The `markNoAudio` call is inside the `try` block, so if it throws (transient DB failure), the `catch` fires and sets `transcriptionStatus = "ERROR"` — overwriting the intended `"NO_AUDIO"` status — and then also calls `cleanupTempAudio`, even though `extractAudio` never produced a temp file. Moving `markNoAudio` out of the try block avoids this mis-classification and the unnecessary cleanup call.

```suggestion
	const audioUrl = await extractAudio(videoId, userId, videoData.video);

	if (!audioUrl) {
		await markNoAudio(videoId);
		return {
			success: true,
			message: "Video has no audio track - skipped transcription",
		};
	}

	try {
		const [transcription] = await Promise.all([
			transcribeWithDeepgram(audioUrl, videoData.aiGenerationLanguage),
		]);

		await saveTranscription(videoId, userId, videoData.video, transcription);
	} catch (error) {
		await markError(videoId);
		await cleanupTempAudio(videoId, userId, videoData.video);
		throw error;
	}
```

### Issue 2 of 2
apps/web/workflows/transcribe.ts:83-87
If `cleanupTempAudio` itself throws inside the catch block, the exception propagates before `throw error` is reached, silently discarding the original transcription error. Wrapping the cleanup in a nested try/catch ensures the real error is always re-thrown.

```suggestion
	} catch (error) {
		await markError(videoId);
		try {
			await cleanupTempAudio(videoId, userId, videoData.video);
		} catch (cleanupError) {
			console.error("Failed to clean up temp audio after error:", cleanupError);
		}
		throw error;
	}
```

Reviews (2): Last reviewed commit: "feat: add AI generation language prefere..."

tembo · 2026-05-18T19:39:18Z

Since transcription language can now be forced, it might help to include the selected language in the Deepgram error message (makes unsupported/invalid language codes easier to debug).

Suggested change

throw new Error(
	`Deepgram transcription failed (language=${language}): ${error.message}`,
);

greptile-apps · 2026-05-18T19:39:30Z

+export function getDeepgramTranscriptionOptions(
+	language: AiGenerationLanguage,
+) {
+	const baseOptions = {
+		model: "nova-3",
+		smart_format: true,
+		utterances: true,
+		mime_type: "audio/mpeg",
+	} as const;
+
+	if (language === AI_GENERATION_LANGUAGE_AUTO) {
+		return {
+			...baseOptions,
+			detect_language: true,
+		};
+	}
+
+	return {
+		...baseOptions,
+		language,
+	};
+}


Unsupported language codes will break all transcriptions for an org

SUPPORTED_LANGUAGES was originally built for a translation API and includes codes that Deepgram Nova-3 does not accept — among them bn, gu, pa, mr, te, fa, he, ur, and possibly sk, ar, zh (Deepgram expects zh-CN, not bare zh). When a Pro user selects one of these, transcribeWithDeepgram passes the bare code to Deepgram, which returns an error. That error is rethrown as an exception inside a "use step" — but by then validateVideo has already set transcriptionStatus = "PROCESSING" with no recovery path, so every subsequent video for that org will be stuck in PROCESSING indefinitely.

The two fix options are: (a) define a separate DEEPGRAM_SUPPORTED_LANGUAGES allowlist that only contains codes Deepgram Nova-3 accepts, and use that for the language picker; or (b) add a mapping layer that converts zh → zh-CN and rejects codes with no Deepgram equivalent before calling the API.
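
Option (b) could be sketched roughly as follows. The map contents are an assumption for illustration and are not a verified list of Nova-3 languages; `DEEPGRAM_LANGUAGE_MAP` and `toDeepgramLanguage` are hypothetical names that do not appear in the PR.

```typescript
// Assumed mapping for illustration only; check Deepgram's language
// documentation before relying on any of these entries.
const DEEPGRAM_LANGUAGE_MAP: Readonly<Record<string, string>> = {
	en: "en",
	es: "es",
	fr: "fr",
	zh: "zh-CN", // the review notes Deepgram expects zh-CN, not bare zh
};

type DeepgramLanguageResult =
	| { ok: true; language: string }
	| { ok: false; reason: string };

// Convert an org-level code to a Deepgram code, or reject it before the API call.
function toDeepgramLanguage(code: string): DeepgramLanguageResult {
	const known = Object.prototype.hasOwnProperty.call(
		DEEPGRAM_LANGUAGE_MAP,
		code,
	);
	const mapped = known ? DEEPGRAM_LANGUAGE_MAP[code] : undefined;
	if (mapped === undefined) {
		return { ok: false, reason: `No Deepgram equivalent for "${code}"` };
	}
	return { ok: true, language: mapped };
}
```

Returning a result object instead of throwing lets the caller decide whether an unmapped code should fail the save in the settings action or fall back to auto-detect at transcription time.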

Prompt To Fix With AI

This is a comment left during a code review.
Path: apps/web/workflows/transcribe.ts
Line: 288-309

Comment:
**Unsupported language codes will break all transcriptions for an org**

`SUPPORTED_LANGUAGES` was originally built for a translation API and includes codes that Deepgram Nova-3 does not accept, among them `bn`, `gu`, `pa`, `mr`, `te`, `fa`, `he`, `ur`, and possibly `sk`, `ar`, `zh` (Deepgram expects `zh-CN`, not bare `zh`). When a Pro user selects one of these, `transcribeWithDeepgram` passes the bare code to Deepgram, which returns an error. That error is rethrown as an exception inside a `"use step"`, but by then `validateVideo` has already set `transcriptionStatus = "PROCESSING"` with no recovery path, so every subsequent video for that org will be stuck in PROCESSING indefinitely.

The two fix options are: (a) define a separate `DEEPGRAM_SUPPORTED_LANGUAGES` allowlist that only contains codes Deepgram Nova-3 accepts, and use that for the language picker; or (b) add a mapping layer that converts `zh → zh-CN` and rejects codes with no Deepgram equivalent before calling the API.

How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-05-18T19:39:34Z

+export function getDeepgramTranscriptionOptions(
+	language: AiGenerationLanguage,
+) {
+	const baseOptions = {
+		model: "nova-3",
+		smart_format: true,
+		utterances: true,
+		mime_type: "audio/mpeg",
+	} as const;
+
+	if (language === AI_GENERATION_LANGUAGE_AUTO) {
+		return {
+			...baseOptions,
+			detect_language: true,
+		};
+	}
+
+	return {
+		...baseOptions,
+		language,
+	};
+}


nova-3 + detect_language combination

When language === "auto", the code sends detect_language: true to Deepgram Nova-3. Deepgram's own documentation notes that language detection queries against "35 supported languages", but the Nova-3 model page lists a much smaller subset. If detection returns a language code that Nova-3 can't decode (which can happen with less common audio), the result may silently produce a poor-quality transcript. Consider adding detect_language=en&detect_language=es&... with only the languages Nova-3 supports to constrain the detection pool, instead of relying on unconstrained auto-detect.
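
As a rough illustration of the repeated-parameter form suggested above, the helper below builds the `detect_language=...&detect_language=...` portion of a query string. The language list is a hypothetical subset, and `DETECTABLE_LANGUAGES` and `buildDetectLanguageQuery` are invented names; the PR's curated list is not reproduced on this page.

```typescript
// Hypothetical subset; replace with the codes the chosen model actually supports.
const DETECTABLE_LANGUAGES = ["en", "es", "fr"] as const;

// Emit one detect_language parameter per candidate language.
function buildDetectLanguageQuery(languages: readonly string[]): string {
	return languages
		.map((code) => `detect_language=${encodeURIComponent(code)}`)
		.join("&");
}

const query = buildDetectLanguageQuery(DETECTABLE_LANGUAGES);
// query === "detect_language=en&detect_language=es&detect_language=fr"
```

When an SDK is used rather than a raw URL, the same constraint is typically expressed by passing the array itself, as in the later `detect_language: [...DEEPGRAM_DETECTABLE_LANGUAGES]` suggestion.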

Prompt To Fix With AI

This is a comment left during a code review.
Path: apps/web/workflows/transcribe.ts
Line: 288-309

Comment:
**`nova-3` + `detect_language` combination**

When `language === "auto"`, the code sends `detect_language: true` to Deepgram Nova-3. Deepgram's own documentation notes that language detection queries against "35 supported languages", but the Nova-3 model page lists a much smaller subset. If detection returns a language code that Nova-3 can't decode (which can happen with less common audio), the result may silently produce a poor-quality transcript. Consider adding `detect_language=en&detect_language=es&...` with only the languages Nova-3 supports to constrain the detection pool, instead of relying on unconstrained auto-detect.

How can I resolve this? If you propose a fix, please make it concise.

richiemcilroy · 2026-05-18T19:49:59Z

please re-review the pr @greptileai

tembo · 2026-05-18T19:50:37Z

+
+	return {
+		...baseOptions,
+		language,


getDeepgramTranscriptionOptions passes any non-"auto" org setting straight through as language. Since AiGenerationLanguage currently includes codes outside DEEPGRAM_DETECTABLE_LANGUAGES (and some APIs expect mapped codes like zh-CN), this can turn into a hard failure for every transcription in that org.

If DEEPGRAM_DETECTABLE_LANGUAGES is effectively the allowlist for nova-3, consider guarding here (or constraining the picker) so unsupported codes fall back to auto-detect instead of erroring:

Suggested change

language,

	if (
		language === AI_GENERATION_LANGUAGE_AUTO ||
		!DEEPGRAM_DETECTABLE_LANGUAGES.includes(language)
	) {
		return {
			...baseOptions,
			detect_language: [...DEEPGRAM_DETECTABLE_LANGUAGES],
		};
	}

	return {
		...baseOptions,
		language,
	};

richiemcilroy marked this pull request as ready for review May 18, 2026 19:34

superagent-security Bot added the contributor:verified (Contributor passed trust analysis.) and pr:verified (PR passed security analysis.) labels May 18, 2026

tembo Bot reviewed May 18, 2026

greptile-apps Bot reviewed May 18, 2026

feat: add AI generation language preferences

7fc140a

richiemcilroy force-pushed the ai-generation-language-preferences branch from 7d65afa to 7fc140a May 18, 2026 19:46

tembo Bot reviewed May 18, 2026

richiemcilroy merged commit 38506f8 into main May 18, 2026
20 checks passed
