csm.rs

csm.rs is a high-performance Rust implementation of Sesame's Conversational Speech Model (CSM), designed for fast, efficient, and real-time streaming text-to-speech (TTS) inference. It is built on the candle machine learning framework.

This implementation is simple, straightforward, and aims for raw performance.

✨ Features

⚡️ Blazing-Fast: High-performance inference powered by Rust and candle.
🤗 Broad Model Support: Natively supports both the original sesame/csm-1b weights and weights from Hugging Face transformers-compatible fine-tunes.
🤏 Quantization: Supports GGUF-based q8_0 and q4_k quantization for reduced memory footprint and faster inference on CPU.
⚙️ Multiple Backends: Leverages candle to support multiple hardware targets, including MKL, Accelerate (macOS), CUDA, cuDNN, and Metal (Apple Silicon).
🔌 OpenAI Compatible: Includes an OpenAI-compatible API web server for seamless integration with existing tools.

🚀 Getting Started

Compilation

To build the project, select the appropriate feature flag for your target hardware. The project provides three main binaries: main (for command-line interface usage), benchmark (for throughput measurement), and server (for the OpenAI-compatible API).

CPU (MKL - Linux/Windows) For optimal performance on Intel CPUs.

RUSTFLAGS="-C target-cpu=native" cargo build --release --features mkl

CPU (Accelerate - macOS) For optimal performance on Apple CPUs.

RUSTFLAGS="-C target-cpu=native" cargo build --release --features accelerate

NVIDIA GPU (CUDA) Requires the CUDA Toolkit to be installed.

cargo build --release --features cuda

NVIDIA GPU (cuDNN) For faster CUDA performance with cuDNN.

cargo build --release --features cudnn

Apple Silicon GPU (Metal) For running on M-series Macs.

cargo build --release --features metal

The compiled binaries will be available in the ./target/release/ directory.

💻 Usage

Command-Line Interface (CLI)

The CLI allows you to generate audio directly from your terminal. Models are downloaded automatically from the Hugging Face Hub on first use.

Generate audio with a full-precision model:

./target/release/main \
    --text "Hello there from the full precision model" \
    --model-id "sesame/csm-1b" \
    --output "output_fp16.wav"

Generate audio with a quantized model:

./target/release/main \
    --text "Hello there from the quantized model" \
    --model-id cartesia/sesame-csm-1b-gguf \
    --model-file q8.gguf \
    --output "output_q8.wav"

To quantize your own models see the Quantization section.

OpenAI-Compatible Server

csm.rs includes a server that is compatible with the OpenAI Speech API, allowing you to use it as a drop-in replacement.

Start the server with a full-precision model:

./target/release/server --port 8080 --model-id "sesame/csm-1b"

Start the server with a quantized model:

./target/release/server \
    --port 8080 \
    --model-id cartesia/sesame-csm-1b-gguf \
    --model-file q8.gguf

Python Client Example You can use the official OpenAI Python client to interact with the server.

# pip install openai
from openai import OpenAI
from pathlib import Path

# Point the client to your local server
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Request speech synthesis
response = client.audio.speech.create(
    model="csm-1b", # Model name is ignored by the server but required by the API
    input="Hello! This audio was generated by the server.",
    voice="alloy", # Voice is ignored, use speaker_id instead
    # You can pass custom parameters in extra_body
    extra_body={
        "speaker_id": 0,
        "temperature": 0.7,
    }
)

# Save the output to a file
speech_file_path = Path("server_output.wav")
response.stream_to_file(speech_file_path)

# Or use the streaming endpoint
with client.audio.speech.with_streaming_response.create(
    model="csm-1b",
    voice="alloy",
    input="Hello from the streaming endpoint",
    response_format="wav",
    extra_body=dict(
        speaker_id=0,
    )
) as response:
    for chunk in response.iter_bytes(chunk_size=1024):
        print(chunk)

Command-Line Arguments

All binaries share a common set of arguments for model loading and hardware selection.

Common Arguments (for `main`, `benchmark`, `server`)

Argument	Description	Default Value
`--weights-path`	Absolute path to a weight file (`.safetensors` or `.gguf`). Overrides all other model loading options.	`None`
`--model-id`	The model ID from the Hugging Face Hub (e.g., `'sesame/csm-1b'`).	`None`
`--model-path`	Path to a local directory containing the model files.	`None`
`--model-file`	The name of a single model file to use within a `--model-id` or `--model-path`.	`None`
`--index-file`	The name of the index file for sharded models.	`None`
`--tokenizer-id`	The tokenizer ID from the Hugging Face Hub. Defaults to the `--model-id` if not set.	`None`
`--cpu`	If set, forces the computation to run on the CPU.	`false`

Specific Arguments for `main` (CLI)

Argument	Description	Default Value
`--text`	The text to generate audio from.	`"Hello there, this is a test"`
`--output`	The path to save the output `.wav` file.	`"csm_output.wav"`
`--speaker-id`	The speaker ID to use for generation.	`0`
`--temperature`	Sampling temperature.	`0.7`
`--top-k`	The number of highest probability tokens to consider for sampling (Top-K).	`100`
`--max-audio-len-ms`	The maximum length of the generated audio in milliseconds.	`30000.0`
`--buffer-size`	The number of audio frames to buffer before decoding to audio.	`20`
`--tokenizer-template`	A custom tokenizer template. E.g., `"<\|begin_of_text\|>[{speaker_id}]{text}<\|end_of_text\|>"`.	`None`

Specific Arguments for `benchmark`

Argument	Short	Description	Default Value
`--text`	`-t`	The text to use for benchmarking.	`"Hi there, this is a test"`
`--warmup-runs`	`-w`	The number of warm-up runs to perform before timing.	`1`
`--num-runs`	`-n`	The number of timed runs to perform for the benchmark.	`5`
`--speaker-id`		The speaker ID to use for generation.	`0`
`--temperature`		Sampling temperature.	`0.7`
`--top-k`		The number of highest probability tokens to consider for sampling (Top-K).	`100`
`--buffer-size`		The number of audio frames to buffer before decoding to audio.	`20`
`--tokenizer-template`		A custom tokenizer template. E.g., `"<\|begin_of_text\|>[{speaker_id}]{text}<\|end_of_text\|>"`.	`None`

Specific Arguments for `server`

Argument	Description	Default Value
`--host`	The host address to bind the server to.	`"0.0.0.0"`
`--port`	The port to run the server on.	`8080`
`--api-key`	If set, requires clients to provide this key in the `Authorization: Bearer <key>` header.	`None`

🤏 Quantization

You can significantly reduce the model size and improve CPU inference speed by quantizing the weights to 8-bit (q8_0) or 4-bit (q4_k). We use the GGUF file format for quantized models.

A Python script is provided to handle downloading, loading, and converting .safetensors weights into a quantized GGUF file. The script can work directly with both single-file and sharded models from local paths or the Hugging Face Hub.

Install dependencies:
```
pip install -r scripts/requirements.txt
```

Run the quantization script:

The script can quantize a model directly from the Hugging Face Hub or from a local directory.

To quantize a model from the Hub (e.g., sesame/csm-1b) to Q8_0:

python scripts/quantize.py \
    --model-id "sesame/csm-1b" \
    --index-file "transformers.safetensors.index.json" \
    --output-path ./csm-1b-q8_0.gguf \
    --qtype q8_0

To quantize a local model to Q4_K:

python scripts/quantize.py \
    --model-path /path/to/your/local/model/directory \
    --output-path ./csm-1b-q4_k.gguf \
    --qtype q4_k

📊 Benchmarks

You can run the built-in benchmark tool to measure the performance on your hardware. The tool reports the Real-Time Factor (RTF), which is the time taken to generate 1 second of audio (lower is better), and Throughput (xRealTime), which is how many seconds of audio are generated in 1 second (higher is better).

Example benchmark command:

# For a full-precision model with CUDA
cargo run --release --features cuda --bin benchmark

# For a quantized model on CPU
RUSTFLAGS="-C target-cpu=native" cargo run --release --features mkl --bin benchmark -- --weights-path ./csm-1b-q8_0.gguf

📜 License

This project is licensed under the GNU Affero General Public License Version 3. See the LICENSE file for details.

🤝 Contributing

Contributions are welcome!

If you have suggestions for improvements, find a bug, or want to add a new feature, please feel free to open an issue or submit a pull request.

Name		Name	Last commit message	Last commit date
Latest commit History 179 Commits
.devcontainer		.devcontainer
.planning		.planning
csm-cli		csm-cli
csm-core		csm-core
csm-server		csm-server
scripts		scripts
src		src
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
TESTING.md		TESTING.md
docker-compose.yml		docker-compose.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

csm.rs

✨ Features

🚀 Getting Started

Compilation

💻 Usage

Command-Line Interface (CLI)

OpenAI-Compatible Server

Command-Line Arguments

Common Arguments (for `main`, `benchmark`, `server`)

Specific Arguments for `main` (CLI)

Specific Arguments for `benchmark`

Specific Arguments for `server`

🤏 Quantization

📊 Benchmarks

📜 License

🤝 Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

csm.rs

✨ Features

🚀 Getting Started

Compilation

💻 Usage

Command-Line Interface (CLI)

OpenAI-Compatible Server

Command-Line Arguments

Common Arguments (for main, benchmark, server)

Specific Arguments for main (CLI)

Specific Arguments for benchmark

Specific Arguments for server

🤏 Quantization

📊 Benchmarks

📜 License

🤝 Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Common Arguments (for `main`, `benchmark`, `server`)

Specific Arguments for `main` (CLI)

Specific Arguments for `benchmark`

Specific Arguments for `server`

Packages