-description: str="An open source speech-to-text API that runs completely locally. The project is based on the OpenAI Whisper model and the faster inference Faster Whisper model, and implements an asynchronous model pool, using the asynchronous features of FastAPI for efficient packaging, supporting thread-safe asynchronous task queues, asynchronous file IO, asynchronous database IO, asynchronous web crawler modules, and more custom features."
+description: str="⚡ A high-performance asynchronous API for Automatic Speech Recognition (ASR) and translation. No need to purchase the Whisper API—perform inference using a locally running Whisper model with support for multi-GPU concurrency and designed for distributed deployment. It also includes built-in crawlers for social media platforms like TikTok and Douyin, enabling seamless media processing from multiple social platforms. This provides a powerful and scalable solution for automated media content data processing."
 # 项目版本 | Project version
-version: str="1.0.3"
+version: str="1.0.4"
 # Swagger 文档 URL | Swagger docs URL
 docs_url: str="/"
 # 是否开启 debug 模式 | Whether to enable debug mode
 debug: bool=False
 # 当检测到项目代码变动时是否自动重载项目 | Whether to automatically reload the project when changes to the project code are detected
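The fields in this hunk line up with keyword arguments that FastAPI's constructor accepts (`description`, `version`, `docs_url`, `debug`). Assuming the project forwards them to `FastAPI(...)` (the diff does not show where they are consumed), a minimal stdlib-only sketch of that wiring might look like this. `ProjectSettings`, `fastapi_kwargs` and `version_tuple` are hypothetical names, and the description string is abbreviated.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProjectSettings:
    # Hypothetical mirror of the fields in the diff (description abbreviated).
    description: str = "⚡ A high-performance asynchronous API for Automatic Speech Recognition (ASR) and translation."
    version: str = "1.0.4"
    docs_url: str = "/"  # serve the Swagger UI at the site root
    debug: bool = False

    def fastapi_kwargs(self) -> dict[str, object]:
        # Assumption: each field is passed straight to fastapi.FastAPI(...),
        # whose constructor accepts these names as keyword arguments.
        return asdict(self)


def version_tuple(version: str) -> tuple[int, ...]:
    # "1.0.4" -> (1, 0, 4): compare releases numerically, not as strings,
    # so that "1.0.10" sorts after "1.0.9".
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    settings = ProjectSettings()
    print(settings.fastapi_kwargs())
    # With FastAPI installed: app = FastAPI(**settings.fastapi_kwargs())
    print(version_tuple(settings.version) > version_tuple("1.0.3"))  # True
```

Keeping the settings in one frozen object means a version bump like this commit's touches a single line.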
README.md: 40 additions & 13 deletions
@@ -27,6 +27,7 @@ The system efficiently manages resource scheduling and task management through a
 * **Asynchronous Model Pool**: Implements an efficient asynchronous AI model pool that supports multi-instance concurrent processing for OpenAI Whisper and Faster Whisper models under thread-safe conditions. In CUDA-accelerated, multi-GPU environments, intelligent loading mechanisms dynamically assign models to GPUs, balancing load and optimizing task processing. Note: Concurrency is unavailable on single-GPU setups.
 * **Asynchronous Database**: Supports MySQL and SQLite databases. It can run locally without MySQL, as SQLite allows for quick setup. When using MySQL, it facilitates distributed computing with multiple nodes accessing the same database for tasks.
 * **Asynchronous Web Crawlers**: Equipped with data crawler modules for multiple platforms, currently supporting `Douyin` and `TikTok`. By simply entering the video link, users can quickly process media for speech recognition, with plans for more social media platform support in the future.
+* **ChatGPT Integration**: Integrates ChatGPT as the project's LLM layer, so data stored in the database can be used in conversations with ChatGPT.
 * **Workflow and Component Design (Pending)**: With a focus on Whisper transcription tasks, the project will support a highly customizable workflow system. Users can define components, task dependencies, and execution orders in JSON files or write custom components in Python, facilitating complex multi-step processing.
 * **Event-Driven Intelligent Workflow (Pending)**: The workflow system supports event-driven triggers, including time-based, manual, or crawler module auto-triggers. More than single-task processing, workflows will offer intelligent, automated control with conditional branching, task dependencies, dynamic parameter passing, and retry strategies.
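The model-pool bullet describes Whisper instances being checked out of a shared pool spread across GPUs. A minimal sketch of that pattern, assuming one model instance per GPU and using `asyncio.Queue` as the pool; `FakeModel`, `AsyncModelPool`, the model name and the device strings are hypothetical placeholders, not the project's classes.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class FakeModel:
    """Hypothetical stand-in for a loaded Whisper / Faster Whisper instance."""

    name: str
    device: str

    def transcribe(self, audio_path: str) -> str:
        # A real model would run (blocking) GPU inference here.
        return f"{self.name}@{self.device}:{audio_path}"


class AsyncModelPool:
    """Each model instance serves at most one task at a time."""

    def __init__(self, model_name: str, devices: list[str]) -> None:
        # Create the pool inside the running event loop (see main()).
        self._queue: asyncio.Queue = asyncio.Queue()
        # One instance per GPU: with a single device, requests are served
        # one after another, matching the README's single-GPU note.
        for device in devices:
            self._queue.put_nowait(FakeModel(model_name, device))

    async def transcribe(self, audio_path: str) -> str:
        model = await self._queue.get()  # waits while every instance is busy
        try:
            # Run the blocking call in a worker thread so the event loop
            # can keep accepting requests.
            return await asyncio.to_thread(model.transcribe, audio_path)
        finally:
            self._queue.put_nowait(model)  # hand the instance back


async def main() -> list[str]:
    pool = AsyncModelPool("large-v3", devices=["cuda:0", "cuda:1"])
    jobs = (pool.transcribe(f"clip{i}.wav") for i in range(4))
    return list(await asyncio.gather(*jobs))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A queue is a convenient pool because waiting tasks are woken in arrival order. A production pool would also load real weights onto each device and decide what to do when an instance fails.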
@@ -52,10 +53,11 @@ The system efficiently manages resource scheduling and task management through a
 * **Generate Subtitle File**: Users can generate subtitles for a task by specifying the `task_id` and output format (`output_format`). Currently supports the `srt` and `vtt` subtitle formats.
 * **Create TikTok Task**: Users can create tasks by crawling TikTok videos through a video link.
 * **Create Douyin Task**: Users can create tasks by crawling Douyin videos through a video link.
+- **Use ChatGPT to Summarize Tasks**: Given a task ID, users can send the task's transcribed text to ChatGPT for summarization and other interactions; the interface lets users select the model and supply a custom prompt.