Vox

Vox is a highly configurable Android application designed for seamless audio recording, speech-to-text transcription, and advanced AI-driven text processing. By leveraging external Automatic Speech Recognition (ASR) and Large Language Model (LLM) APIs, Vox empowers users to transcribe voice memos and immediately process the text for summarization, translation, grammatical correction, and custom contextual prompts.

Features

High-Fidelity Audio Recording: Captures audio in 16kHz, 16-bit PCM WAV format to ensure optimal compatibility with modern ASR engines.
Customizable ASR Integration: Interfaces with OpenAI's Whisper API by default, but can be configured to communicate with any compatible external or self-hosted ASR endpoint.
AI Text Processing: Integrated LLM actions allow users to seamlessly:
- Summarize transcriptions.
- Translate text into specified target languages.
- Correct grammar and spelling.
- Execute custom user-defined prompts against the transcribed text.
Intent Sharing Support: Accepts audio files shared from other applications via the native Android sharing menu (Intent.ACTION_SEND), processing them instantly.
Markdown Rendering: Formats and presents LLM responses using built-in HTML/Markdown rendering for high readability.
Complete Data Control: Users can define their own API URLs, authentication tokens, and model identifiers directly within the application settings.

Installation

Vox is available for download across the following platforms:

Google Play Store: Download Vox on Google Play
GitHub Releases: Pre-compiled APK files for the latest versions can be found on the Releases page.

Configuration and Usage

Upon launching Vox for the first time, you will need to configure your API credentials.

Tap the Settings (gear) icon in the top right corner.
Under the ASR section:
- API URL: Enter your transcription endpoint (e.g., https://api.openai.com/v1/audio/transcriptions).
- Auth Token: Enter your API Bearer token.
- Model: Specify the model name (e.g., whisper-1).
Under the LLM section:
- API URL: Enter your LLM chat completion endpoint (e.g., https://api.openai.com/v1/chat/completions).
- Auth Token: Enter your API Bearer token.
- Model: Specify the text model name (e.g., gpt-4o or gpt-3.5-turbo).
- Language: Specify your preferred default language for translations and summarizations.
Tap Save.

Recording and Processing

Tap the Microphone icon to begin recording. You will be prompted to grant audio recording permissions upon first use.
Tap the Stop icon to end the recording. The app will automatically upload the audio to your configured ASR endpoint.
Once the transcription is complete, tap the AI icon (located in the top left corner of the app bar) to open the LLM action menu and select your desired text processing operation.

Permissions

RECORD_AUDIO: Required to capture voice input via the device microphone.
INTERNET: Required to communicate with the configured external APIs.

Building from Source

To build the project locally:

Clone the repository:

git clone https://github.com/eja/vox.git

Open the project in Android Studio.
Sync the Gradle files to download dependencies (Jetpack Compose, OkHttp, CommonMark, etc.).
Build and deploy to your emulator or physical device.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.github/workflows		.github/workflows
app		app
gradle		gradle
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.gradle.kts		build.gradle.kts
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
settings.gradle.kts		settings.gradle.kts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vox

Features

Installation

Configuration and Usage

Recording and Processing

Permissions

Building from Source

About

Uh oh!

Releases 1

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Vox

Features

Installation

Configuration and Usage

Recording and Processing

Permissions

Building from Source

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Contributors

Uh oh!

Languages