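MoneyPrinterTurbo in Depth: One-Click HD Short Videos with Large Language Models, a Complete Engineering Guide from LLM Scheduling Strategy to Batch Video Production (2026)
Summary: MoneyPrinterTurbo topped GitHub Trending in 2026, gaining 3,375 stars in a single day for a total of 76.8k. It uses large language models (LLMs) to turn an "idea" directly into an HD short video, fully automated: topic → script → footage → voiceover → subtitles → editing → output. This article dissects it along five dimensions (architecture, LLM scheduling strategy, footage retrieval engineering, TTS voice cloning, and batch concurrent production) and includes complete, runnable Python code to help you actually get the system running.
1. Background: why did "AI-generated video" take off in 2026?
By 2026, the landscape of content production had fundamentally changed.
Short-video platforms have passed 3 billion daily active users, yet the supply of quality content has never kept up with demand. The traditional workflow (writing, shooting, editing, voiceover, subtitles, post-production) needs 2 to 5 people and several hours, or even days, for a single 5-minute video.
The core value of AI video generation: it compresses "idea → finished video" to minutes and pushes the cost close to zero.
What makes MoneyPrinterTurbo (MPT below) clever is that it does not try to generate pixels from scratch (that is the territory of video generation models such as Runway and Pika). Instead, an LLM handles "ideas and scripts", a mature stock-footage library plus FFmpeg handles "editing and compositing", and TTS handles "voiceover". Every stage uses the most mature technology available, and together they form a stable, usable production pipeline.
This "engineered assembly" approach ships faster than "end-to-end deep learning" and is easier to control.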
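1.1 Key project data
| Metric | Value |
|---|---|
| GitHub stars | 76.8k (as of 2026-06) |
| Peak single-day star gain | 3,375 (2026-06-02) |
| Main language | Python |
| Core dependencies | LLM API (OpenAI/DeepSeek/etc.) + FFmpeg + TTS engine |
| License | MIT |
| Maintenance status | Active, with an active community |
1.2 What can it do?
- Input: a topic/keyword, or a short description
- Output: a complete HD short video (MP4), containing:
  - an AI-generated script / narration copy
  - AI voiceover (multiple voices and languages)
  - automatically matched stock video clips
  - automatically generated subtitles (SRT/ASS)
  - background music
  - transition effects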
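2. Core concepts and architecture
2.1 Overall architecture: a six-stage pipeline
```
User enters a topic
    ↓
[Stage 1] LLM generates the video script (topic + copy + shot descriptions)
    ↓
[Stage 2] Footage retrieval (video clips / images from Pexels, Pixabay, etc.)
    ↓
[Stage 3] TTS voiceover (Edge-TTS / Coqui / ElevenLabs)
    ↓
[Stage 4] Subtitle generation (Whisper / text → SRT directly)
    ↓
[Stage 5] FFmpeg compositing (video + audio + subtitles + BGM)
    ↓
[Stage 6] Post-processing (compression, format conversion, metadata)
    ↓
Output MP4 file
```
Every stage is a replaceable module, and this is the smartest part of MPT's architecture. Want to swap the LLM, the TTS engine, or the footage source? All of these work, as long as you implement the corresponding interface.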
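To make "implement the corresponding interface" concrete, here is a minimal sketch using Python's `typing.Protocol`. The names `Scene`, `ScriptWriter`, `Voicer`, `run_pipeline` and the two stand-in classes are hypothetical, chosen for illustration; they are not MPT's actual interfaces.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Scene:
    description: str  # English search terms for stock footage
    narration: str  # text to be voiced
    duration: float  # planned length in seconds


class ScriptWriter(Protocol):
    def write(self, topic: str) -> list[Scene]: ...


class Voicer(Protocol):
    def speak(self, text: str, out_path: str) -> str: ...


class EchoWriter:
    """Hypothetical stand-in for an LLM backend: one fixed scene per topic."""

    def write(self, topic: str) -> list[Scene]:
        return [Scene(description=topic, narration=f"Today: {topic}", duration=5.0)]


class DummyVoicer:
    """Hypothetical stand-in for a TTS backend: records calls instead of speaking."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def speak(self, text: str, out_path: str) -> str:
        self.calls.append(text)
        return out_path


def run_pipeline(topic: str, writer: ScriptWriter, voicer: Voicer) -> list[str]:
    # The orchestrator depends only on the protocols, so any class that
    # provides the same methods can be swapped in without touching this code.
    scenes = writer.write(topic)
    return [voicer.speak(s.narration, f"scene_{i:03d}.mp3") for i, s in enumerate(scenes)]
```

Swapping Edge-TTS for ElevenLabs would then mean writing another class with a `speak` method; `run_pipeline` stays unchanged.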
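2.2 Key modules in depth
Module A: the LLM script generator
This is the "brain" of the whole system. MPT supports several LLM backends:
- OpenAI GPT-4o / o4-mini (highest quality, higher cost)
- DeepSeek V3/R1 (excellent price/performance, strong in Chinese)
- Qwen 2.5/3 (open-sourced by Alibaba, can be deployed locally)
- Local models via Ollama (fully offline, no API cost)
Prompt engineering is the key. The structure of MPT's script-generation prompt:
```python
# Prompt structure modeled on MoneyPrinterTurbo's core prompt (simplified).
# Literal braces in the JSON example are doubled ({{ }}) so that
# SCRIPT_PROMPT.format(...) only substitutes {duration}, {topic} and {num_scenes}.
SCRIPT_PROMPT = """
You are a professional short-video scriptwriter.
Based on the topic below, write a script for a {duration}-second short video.
Topic: {topic}
Requirements:
1. Split the script into {num_scenes} scenes.
2. Each scene contains:
   - scene_description: a description of the visuals (used for footage search)
   - narration: the narration copy (used for TTS voiceover)
   - duration: the length of the scene in seconds
3. Keep the narration conversational and engaging, about {duration} seconds in total.
4. Output strictly in JSON format, with nothing else.
Example output format:
{{
  "title": "Video title",
  "scenes": [
    {{
      "scene_description": "Spectacular time-lapse of a starry sky",
      "narration": "Have you ever wondered how big the universe is?",
      "duration": 5
    }}
  ]
}}
"""
```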
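A deeper trick: MPT builds a "duration control" mechanism into the prompt. The LLM is asked to estimate the voiceover length of each scene, which keeps the total length close to the target. Many similar projects overlook this detail.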
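A rough version of that duration check can be done without the LLM, by estimating speaking time from character count. The sketch below assumes about 4 spoken Chinese characters per second; that rate is an assumption to calibrate against your own TTS voice, not a figure taken from MPT, and the helper names are hypothetical.

```python
import re

# Assumption: roughly 4 spoken Chinese characters per second at the default
# Edge-TTS rate. A rule of thumb to calibrate, not a value from MPT.
CHARS_PER_SECOND = 4.0

# Whitespace plus common full-width and half-width punctuation, which is not spoken
_PUNCT = re.compile(r"[\s，。！？、；：,.!?;:“”‘’（）()\"']")


def estimate_seconds(narration: str, cps: float = CHARS_PER_SECOND) -> float:
    """Estimate spoken duration by counting non-punctuation characters."""
    spoken = _PUNCT.sub("", narration)
    return len(spoken) / cps


def check_script_duration(scenes: list[dict], target: float, tolerance: float = 0.15) -> tuple[float, bool]:
    """Return (estimated total seconds, whether it is within ±tolerance of target)."""
    total = sum(estimate_seconds(s["narration"]) for s in scenes)
    return total, abs(total - target) <= tolerance * target
```

If the check fails, one option is to re-prompt the LLM with the estimated overrun or shortfall.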
Module B: the footage retrieval engine
Once the script is ready, each scene needs a matching video clip. MPT supports several free footage sources:
| Source | Type | Free tier | API difficulty |
|---|---|---|---|
| Pexels | Video + images | Free (rate-limited) | ⭐ Easy |
| Pixabay | Video + images | Free (rate-limited) | ⭐ Easy |
| Unsplash | Images | Free (rate-limited) | ⭐ Easy |
Core code (footage retrieval module):
```python
import os

import requests

PEXELS_API_KEY = os.getenv("PEXELS_API_KEY", "")


def search_pexels_videos(query: str, per_page: int = 5) -> list[dict]:
    """
    Search Pexels for stock video footage.
    query: search keywords (taken from the scene's scene_description)
    return: [{"url": ..., "duration": ..., "width": ..., "height": ...}, ...]
    """
    url = "https://api.pexels.com/videos/search"
    headers = {"Authorization": PEXELS_API_KEY}
    params = {
        "query": query,
        "per_page": per_page,
        "orientation": "portrait",  # vertical, suited to short-video platforms
    }
    resp = requests.get(url, headers=headers, params=params, timeout=10)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for video in data.get("videos", []):
        files = video.get("video_files", [])
        if not files:
            continue
        # Pick the highest-resolution rendition; some entries may report
        # width/height as null, so treat those as 0.
        best_file = max(files, key=lambda f: (f.get("width") or 0, f.get("height") or 0))
        results.append({
            "url": best_file["link"],
            "duration": video["duration"],
            "width": best_file["width"],
            "height": best_file["height"],
            "preview": video["image"],
        })
    return results
```
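The engineering challenge: how well the retrieved footage matches the scene semantically directly determines video quality. The mitigations:
- Have the LLM generate English search keywords (Pexels/Pixabay search works far better in English than in Chinese)
- Retrieve several candidates per scene, then re-rank them by visual similarity (for example with a CLIP model)
- Fall back to a local footage library when retrieval fails (use default footage)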
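The re-ranking and fallback steps can be sketched as follows, assuming you already have embeddings for the scene text and for each candidate's preview frame from the same model (for example CLIP's text and image encoders). The function names and the 0.2 score threshold are illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pick_material(
    text_emb: list[float],
    candidates: list[tuple[str, list[float]]],
    fallback_path: str,
    min_score: float = 0.2,  # assumed threshold; tune on your own footage
) -> str:
    """
    candidates: (clip path, preview-frame embedding) pairs.
    Returns the best-matching path, or fallback_path (local default footage)
    when there are no candidates or none clears min_score.
    """
    if not candidates:
        return fallback_path
    best_score, best_path = max((cosine(text_emb, emb), path) for path, emb in candidates)
    return best_path if best_score >= min_score else fallback_path
```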
Module C: the TTS voiceover engine
MPT supports several TTS backends. Here we focus on Edge-TTS (free, good quality, no API key required):
```python
import asyncio
import io

import edge_tts


async def generate_voice_edge(
    text: str,
    voice: str = "zh-CN-XiaoxiaoNeural",  # Xiaoxiao, Mandarin female voice
    rate: str = "+0%",  # speaking-rate adjustment
) -> bytes:
    """
    Generate a voiceover with Edge TTS (free, requires internet access).
    return: the MP3 audio as bytes
    """
    communicate = edge_tts.Communicate(text, voice, rate=rate)
    # Collect the audio chunks in an in-memory buffer
    buffer = io.BytesIO()
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            buffer.write(chunk["data"])
    return buffer.getvalue()


# Synchronous wrapper (asyncio.run cannot be called from a running event loop)
def generate_voice(text: str, voice: str = "zh-CN-XiaoxiaoNeural") -> bytes:
    return asyncio.run(generate_voice_edge(text, voice))
```
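Multiple voices: MPT can use different voices for different scenes (for example a deep male voice for the opening and a lively female voice for transitions), which makes the video feel noticeably more professional.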
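One way to express per-scene voices, sketched under the assumption that each scene dict may carry an optional `role` field (hypothetical; the script JSON in this article has no such field). The voice IDs are the Edge-TTS ones used elsewhere in this article.

```python
# Hypothetical role-to-voice mapping. Check availability with `edge-tts --list-voices`.
ROLE_VOICES = {
    "opening": "zh-CN-YunyangNeural",  # deep male voice for the hook
    "body": "zh-CN-XiaoxiaoNeural",  # default narrator
    "transition": "zh-CN-XiaoyiNeural",  # livelier female voice
}


def assign_voices(scenes: list[dict]) -> list[str]:
    """Pick one voice per scene: explicit role first, else opening/body by position."""
    voices = []
    for i, scene in enumerate(scenes):
        role = scene.get("role") or ("opening" if i == 0 else "body")
        # Unknown roles fall back to the default narrator
        voices.append(ROLE_VOICES.get(role, ROLE_VOICES["body"]))
    return voices
```

Each returned ID can be passed as the `voice` argument of `edge_tts.Communicate`.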
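| TTS engine | Quality | Cost | Chinese support | Recommended use |
|---|---|---|---|---|
| Edge-TTS | ⭐⭐⭐⭐ | Free | ✅ Excellent | Everyday use |
| ElevenLabs | ⭐⭐⭐⭐⭐ | Paid | ✅ Good | Commercial projects |
| Coqui TTS | ⭐⭐⭐ | Free | ✅ Requires training | Local deployment |
| pyttsx3 | ⭐⭐ | Free | ⚠️ Robotic | Testing only |
Module D: the FFmpeg compositing engine
This is MPT's "production floor": all footage, voiceovers, and subtitles are finally composited into the finished video by FFmpeg.
The structure of the core compositing command:
```python
import subprocess


def compose_video_ffmpeg(
    video_path: str,  # path to the footage video
    audio_path: str,  # path to the voiceover audio
    subtitle_path: str,  # path to the subtitle file (SRT/ASS)
    output_path: str,  # output MP4 path
    bgm_path: str | None = None,  # background music (optional)
    volume: float = 0.3,  # BGM volume
):
    """
    Composite the final video with FFmpeg.
    Arguments are passed as a list (no shell), so paths need no shell quoting.
    Assumes subtitle_path contains no ':' , ',' or quote characters, which the
    subtitles filter would otherwise require to be escaped.
    """
    cmd = [
        "ffmpeg", "-y",
        "-stream_loop", "-1", "-i", video_path,  # loop the footage; -shortest trims it
        "-i", audio_path,
    ]
    if bgm_path:
        # With BGM: burn in subtitles, lower the BGM, then mix it under the voiceover.
        # Each audio input is mapped only once, through the filtergraph output.
        cmd += [
            "-i", bgm_path,
            "-filter_complex",
            f"[0:v]subtitles={subtitle_path}[v_out];"
            f"[2:a]volume={volume}[bgm];"
            "[1:a][bgm]amix=inputs=2:duration=first[a_out]",
            "-map", "[v_out]", "-map", "[a_out]",
        ]
    else:
        # Basic case: footage + voiceover, subtitles burned in with -vf
        cmd += ["-map", "0:v:0", "-map", "1:a:0", "-vf", f"subtitles={subtitle_path}"]
    cmd += [
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        "-c:a", "aac", "-b:a", "192k",
        "-shortest",  # stop when the voiceover ends
        output_path,
    ]
    subprocess.run(cmd, check=True)
```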
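FFmpeg pitfalls: the three problems MPT runs into most often when processing lots of video:
- Inconsistent encoding: footage arrives in different codecs and must be transcoded to a common one
- Mismatched resolution: everything must be scaled to the target resolution (usually 1080x1920 portrait)
- Inconsistent audio sample rates: sample rates must be unified before mixing
MPT's solution is to transcode everything to a common format in a preprocessing step:
```python
import subprocess


def normalize_video(input_path: str, output_path: str, target_resolution: str = "1080x1920"):
    """Preprocess a clip into a uniform format (resolution, codec, audio)."""
    # pad expects width and height as separate values, so split "WxH" first
    w, h = target_resolution.split("x")
    vf = (
        f"scale={w}:{h}:force_original_aspect_ratio=decrease,"  # fit inside the frame
        f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2,"  # pad to the exact size, centered
        "setsar=1"  # square pixels so clips concatenate cleanly
    )
    cmd = [
        "ffmpeg", "-y", "-i", input_path,
        "-vf", vf,
        "-c:v", "libx264", "-preset", "fast", "-crf", "23",
        "-c:a", "aac", "-ar", "44100", "-ac", "2",
        output_path,
    ]
    subprocess.run(cmd, check=True)
```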
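Before transcoding, it can be worth checking which clips actually differ from the target. This sketch reads the stream list printed by `ffprobe -show_streams -of json` and reports mismatches; the target values mirror the ones above, and `needs_normalization` is a hypothetical helper name.

```python
import json
import subprocess


def probe_streams(path: str) -> dict:
    """Run ffprobe (assumed to be on PATH) and return its parsed JSON output."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def needs_normalization(probe: dict, width: int = 1080, height: int = 1920,
                        sample_rate: int = 44100, codec: str = "h264") -> list[str]:
    """Return the ways a clip differs from the target format (empty list = matches)."""
    reasons = []
    for s in probe.get("streams", []):
        if s.get("codec_type") == "video":
            if s.get("codec_name") != codec:
                reasons.append(f"video codec {s.get('codec_name')}")
            if (s.get("width"), s.get("height")) != (width, height):
                reasons.append(f"resolution {s.get('width')}x{s.get('height')}")
        elif s.get("codec_type") == "audio":
            # ffprobe reports sample_rate as a string, e.g. "48000"
            if int(s.get("sample_rate", 0)) != sample_rate:
                reasons.append(f"sample rate {s.get('sample_rate')}")
    return reasons
```

Clips that return an empty list can skip the re-encode, which saves time when many stock clips already share the target format.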
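3. Hands-on code: building a Mini MoneyPrinterTurbo from scratch
In this section we don't rely on MPT's full codebase. Instead, we implement a simplified but complete version ourselves, so that you really understand how each module works.
3.1 Project structure
```
mini_mpt/
├── config.py              # configuration (API keys, etc.)
├── llm_provider.py        # LLM call wrapper
├── script_generator.py    # script generation
├── material_searcher.py   # footage retrieval
├── voice_generator.py     # TTS voiceover
├── subtitle_generator.py  # subtitle generation
├── video_composer.py      # FFmpeg compositing
├── main.py                # main entry point
└── output/                # output directory
```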
3.2 LLM call wrapper (multiple backends)
```python
# llm_provider.py
import os

import openai


class LLMProvider:
    """
    Unified LLM interface; all three backends expose an OpenAI-compatible API:
    - OpenAI (GPT-4o, o4-mini)
    - DeepSeek (V3, R1)
    - Ollama (local models)
    """

    def __init__(self, provider: str = "openai"):
        self.provider = provider
        self._setup_client()

    def _setup_client(self):
        if self.provider == "openai":
            self.client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
            self.model = "gpt-4o-mini"  # the price/performance pick
        elif self.provider == "deepseek":
            self.client = openai.OpenAI(
                api_key=os.getenv("DEEPSEEK_API_KEY"),
                base_url="https://api.deepseek.com",
            )
            self.model = "deepseek-chat"
        elif self.provider == "ollama":
            self.client = openai.OpenAI(
                base_url="http://localhost:11434/v1",
                api_key="ollama",  # required by the client, ignored by local Ollama
            )
            self.model = "qwen2.5:7b"
        else:
            raise ValueError(f"Unknown provider: {self.provider}")

    def generate(
        self,
        prompt: str,
        temperature: float = 0.7,
        system: str | None = None,
        json_mode: bool = True,
    ) -> str:
        """Call the LLM and return the text of the first choice."""
        messages = [{"role": "system", "content": system}] if system else []
        messages.append({"role": "user", "content": prompt})
        kwargs = {}
        if json_mode:
            # Force JSON output. The prompt must also ask for JSON,
            # otherwise the API may reject the request.
            kwargs["response_format"] = {"type": "json_object"}
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=temperature,
            **kwargs,
        )
        return response.choices[0].message.content


# Usage
if __name__ == "__main__":
    llm = LLMProvider(provider="deepseek")
    result = llm.generate("Introduce Python in one sentence.", json_mode=False)
    print(result)
```
3.3 The script generator (core)
```python
# script_generator.py
import json
import re

from llm_provider import LLMProvider


class ScriptGenerator:
    """
    Turns a topic into a structured video script.
    """

    SYSTEM_PROMPT = """You are a professional short-video script planner.
Your output must be valid JSON containing the fields title and scenes.
scenes is a list of scenes; each scene contains:
- scene_description (in English, used for footage search)
- narration (narration copy in Chinese)
- duration (estimated length of the scene, in seconds)
"""

    USER_PROMPT_TEMPLATE = """Write a {duration}-second short-video script for the following topic:
Topic: {topic}
Number of scenes: {num_scenes}
Language: {language}
Output format (strict JSON):
{{
  "title": "Video title",
  "scenes": [
    {{
      "scene_description": "English keywords for searching stock footage",
      "narration": "Narration copy in Chinese, conversational",
      "duration": 5
    }}
  ]
}}"""

    def __init__(self, provider: str = "deepseek"):
        self.llm = LLMProvider(provider=provider)

    def generate_script(
        self,
        topic: str,
        duration: int = 60,
        num_scenes: int = 5,
        language: str = "Chinese",
    ) -> dict:
        prompt = self.USER_PROMPT_TEMPLATE.format(
            topic=topic,
            duration=duration,
            num_scenes=num_scenes,
            language=language,
        )
        # The system instructions are prepended so the request also mentions JSON
        raw_output = self.llm.generate(self.SYSTEM_PROMPT + "\n" + prompt)
        # Clean the output (LLMs sometimes wrap the JSON in ```json ... ``` fences)
        cleaned = self._clean_json_output(raw_output)
        return json.loads(cleaned)

    def _clean_json_output(self, text: str) -> str:
        """Strip Markdown code-fence markers, if any."""
        text = re.sub(r"```(?:json)?\s*", "", text)
        return text.strip()


# Usage
if __name__ == "__main__":
    gen = ScriptGenerator(provider="deepseek")
    script = gen.generate_script("How AI is changing education", duration=90, num_scenes=6)
    print(json.dumps(script, ensure_ascii=False, indent=2))
```
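`json.loads` still fails if the model adds a sentence before or after the object, and a syntactically valid reply can still miss fields. A slightly more defensive sketch (helper names are hypothetical) extracts the outermost `{...}` span and checks the scene keys the prompt asked for:

```python
import json

REQUIRED_SCENE_KEYS = {"scene_description", "narration", "duration"}


def extract_json_object(text: str) -> dict:
    """Parse the outermost {...} span, tolerating prose around it."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(text[start:end + 1])


def validate_script(script: dict) -> list[str]:
    """Return a list of problems; an empty list means the expected shape."""
    problems = []
    if not isinstance(script.get("title"), str):
        problems.append("missing title")
    scenes = script.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        return problems + ["scenes must be a non-empty list"]
    for i, scene in enumerate(scenes):
        if not isinstance(scene, dict):
            problems.append(f"scene {i}: not an object")
            continue
        missing = REQUIRED_SCENE_KEYS - set(scene)
        if missing:
            problems.append(f"scene {i}: missing {sorted(missing)}")
    return problems
```

A non-empty problem list is a natural trigger for one retry of the LLM call before giving up.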
3.4 The footage searcher
```python
# material_searcher.py
import os

import requests


class MaterialSearcher:
    """
    Retrieves stock video footage from Pexels / Pixabay.
    """

    def __init__(self):
        self.pexels_key = os.getenv("PEXELS_API_KEY", "")
        self.pixabay_key = os.getenv("PIXABAY_API_KEY", "")

    def search_video(self, query: str, source: str = "pexels", per_page: int = 3) -> list[dict]:
        """
        Search for stock video footage.
        query: English search terms
        source: pexels | pixabay
        """
        if source == "pexels":
            return self._search_pexels(query, per_page)
        elif source == "pixabay":
            return self._search_pixabay(query, per_page)
        else:
            raise ValueError(f"Unknown source: {source}")

    def _search_pexels(self, query: str, per_page: int) -> list[dict]:
        url = "https://api.pexels.com/videos/search"
        headers = {"Authorization": self.pexels_key}
        params = {"query": query, "per_page": per_page, "orientation": "portrait"}
        resp = requests.get(url, headers=headers, params=params, timeout=15)
        resp.raise_for_status()
        data = resp.json()
        results = []
        for v in data.get("videos", []):
            files = v.get("video_files", [])
            if not files:
                continue
            # Take the highest-resolution rendition (null sizes count as 0)
            best = max(files, key=lambda f: (f.get("width") or 0) * (f.get("height") or 0))
            results.append({
                "url": best["link"],
                "duration": v["duration"],
                "width": best["width"],
                "height": best["height"],
                "preview": v["image"],
                "source": "pexels",
            })
        return results

    def _search_pixabay(self, query: str, per_page: int) -> list[dict]:
        url = "https://pixabay.com/api/videos/"
        params = {
            "key": self.pixabay_key,
            "q": query,
            "per_page": max(per_page, 3),  # Pixabay accepts per_page values of 3-200
            "orientation": "vertical",
        }
        resp = requests.get(url, params=params, timeout=15)
        resp.raise_for_status()
        data = resp.json()
        results = []
        for hit in data.get("hits", [])[:per_page]:
            # Pixabay lists the available renditions under "videos";
            # take the largest one that actually has a URL
            videos = hit.get("videos", {})
            best = next(
                (videos[k] for k in ("large", "medium", "small") if videos.get(k, {}).get("url")),
                None,
            )
            if best is None:
                continue
            results.append({
                "url": best["url"],
                "duration": hit.get("duration", 0),
                "width": best.get("width", 0),
                "height": best.get("height", 0),
                "preview": hit.get("picture_id", ""),
                "source": "pixabay",
            })
        return results

    def download_video(self, url: str, save_path: str) -> str:
        """Download a video to local disk, streaming it in chunks."""
        with requests.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(save_path, "wb") as f:
                for chunk in resp.iter_content(chunk_size=8192):
                    f.write(chunk)
        return save_path
```
3.5 The TTS voiceover generator
```python
# voice_generator.py
import asyncio
import io
import subprocess

import edge_tts


class VoiceGenerator:
    """
    Voiceover generation. Only Edge-TTS (free) is implemented here; the engine
    parameter is a hook for adding ElevenLabs (paid, higher quality) later.
    """

    # Recommended Mandarin voices (verify availability with `edge-tts --list-voices`)
    VOICES_ZH = {
        "xiaoxiao": "zh-CN-XiaoxiaoNeural",  # female, gentle
        "yunyang": "zh-CN-YunyangNeural",  # male, deep (news-anchor style)
        "xiaoyi": "zh-CN-XiaoyiNeural",  # female, lively
        "zhurong": "zh-CN-ZhurongNeural",  # male, steady
    }

    def __init__(self, engine: str = "edge", voice: str = "xiaoxiao"):
        self.engine = engine
        self.voice = self.VOICES_ZH.get(voice, "zh-CN-XiaoxiaoNeural")

    async def _generate_edge(self, text: str, output_path: str | None = None) -> bytes | None:
        """Generate a voiceover with Edge TTS."""
        communicate = edge_tts.Communicate(text, self.voice)
        if output_path:
            # Save straight to a file
            await communicate.save(output_path)
            return None
        # Otherwise return the audio as bytes
        buffer = io.BytesIO()
        async for chunk in communicate.stream():
            if chunk["type"] == "audio":
                buffer.write(chunk["data"])
        return buffer.getvalue()

    def generate(self, text: str, output_path: str) -> str:
        """Synchronous API: generate a voiceover and save it to a file.
        Uses asyncio.run, so call it only from synchronous code."""
        asyncio.run(self._generate_edge(text, output_path))
        return output_path

    @staticmethod
    def list_voices():
        """Print every available voice (runs `edge-tts --list-voices`)."""
        result = subprocess.run(["edge-tts", "--list-voices"], capture_output=True, text=True)
        print(result.stdout)
```
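Edge-TTS takes the speaking rate as a signed percentage string such as `"+0%"` (the `rate` argument in Module C). If you prefer to think in multipliers, a small hypothetical helper can do the conversion:

```python
def speed_to_rate(speed: float) -> str:
    """Convert a speed multiplier (1.0 = normal) into Edge-TTS's signed percent string."""
    pct = round((speed - 1.0) * 100)  # round away float noise such as 10.000000000000009
    return f"{pct:+d}%"  # "+d" always prints the sign, including "+0"
```

For example, `edge_tts.Communicate(text, voice, rate=speed_to_rate(1.2))` asks for roughly 20% faster speech.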
3.6 The subtitle generator
```python
# subtitle_generator.py
import datetime
import re

import srt

SPLIT_PATTERN = r"([。！？!?，,；;\n])"  # sentence and clause delimiters (captured)
SENTENCE_END = "。！？!?"  # delimiters that close a subtitle line


class SubtitleGenerator:
    """
    Generates SRT subtitle files, in one of two ways:
    1. From the script copy (text is known; time is split evenly over the voiceover)
    2. Transcribed back from the voiceover with Whisper (more accurate timing, slower)
    """

    @staticmethod
    def generate_from_script(narration_text: str, audio_duration: float, output_path: str) -> str:
        """
        Build SRT subtitles from the copy (split into sentences, time split evenly).
        narration_text: the narration copy
        audio_duration: total voiceover length in seconds
        """
        parts = re.split(SPLIT_PATTERN, narration_text)
        # Re-join the pieces (keeping delimiters) and cut a line at each sentence end
        chunks = []
        current = ""
        for part in parts:
            current += part
            # re.split yields "" between adjacent delimiters; "" must not end a line
            if part and part in SENTENCE_END and current.strip():
                chunks.append(current.strip())
                current = ""
        if current.strip():
            chunks.append(current.strip())

        # Split the timeline evenly across the lines
        per_chunk = audio_duration / max(len(chunks), 1)
        subs = [
            srt.Subtitle(
                index=i + 1,
                start=datetime.timedelta(seconds=i * per_chunk),
                end=datetime.timedelta(seconds=(i + 1) * per_chunk),
                content=chunk,
            )
            for i, chunk in enumerate(chunks)
        ]
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(srt.compose(subs))
        return output_path

    @staticmethod
    def generate_from_audio_whisper(audio_path: str, output_path: str) -> str:
        """
        Produce an SRT with accurate timing by transcribing the audio with Whisper.
        Requires: pip install openai-whisper
        """
        import whisper

        model = whisper.load_model("base")
        result = model.transcribe(audio_path, language="zh")
        subs = [
            srt.Subtitle(
                index=i + 1,
                start=datetime.timedelta(seconds=seg["start"]),
                end=datetime.timedelta(seconds=seg["end"]),
                content=seg["text"].strip(),
            )
            for i, seg in enumerate(result["segments"])
        ]
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(srt.compose(subs))
        return output_path
```
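Splitting time evenly means a four-character line and a thirty-character line stay on screen for the same time. A simple refinement, sketched here as an assumption rather than MPT's behavior, weights each line by its character count (`proportional_timeline` is a hypothetical name):

```python
import datetime


def proportional_timeline(
    chunks: list[str], audio_duration: float
) -> list[tuple[datetime.timedelta, datetime.timedelta]]:
    """Split audio_duration across subtitle lines in proportion to their lengths."""
    weights = [max(len(c), 1) for c in chunks]  # empty lines still get a sliver
    total = sum(weights)
    spans, t = [], 0.0
    for w in weights:
        dur = audio_duration * w / total
        spans.append((datetime.timedelta(seconds=t), datetime.timedelta(seconds=t + dur)))
        t += dur
    return spans
```

The returned `(start, end)` pairs can replace the `i * per_chunk` arithmetic when building `srt.Subtitle` objects.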
3.7 The video composer (FFmpeg)
```python
# video_composer.py
import os
import subprocess


class VideoComposer:
    """
    Composites the final video with FFmpeg.
    Subtitle paths are assumed to be plain relative paths without ':' , ','
    or quotes; otherwise they must be escaped for the subtitles filter.
    """

    @staticmethod
    def normalize_video(input_path: str, output_path: str, target_res: str = "1080x1920"):
        """Transcode to a uniform format: resolution, codec, audio sample rate."""
        w, h = target_res.split("x")  # pad needs width and height as separate values
        cmd = [
            "ffmpeg", "-y",
            "-i", input_path,
            "-vf", f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
                   f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2,setsar=1",
            "-c:v", "libx264", "-preset", "fast", "-crf", "23",
            "-c:a", "aac", "-ar", "44100", "-ac", "2",
            output_path,
        ]
        subprocess.run(cmd, check=True, capture_output=True)

    @staticmethod
    def compose(video_path, audio_path, subtitle_path, output_path, bgm_path=None, bgm_volume=0.3):
        """
        Composite the final video: footage + voiceover + subtitles + (optional) BGM.
        The footage is looped (-stream_loop -1) and -shortest stops at the end of
        the voiceover, so short stock clips still cover the whole narration.
        """
        cmd = ["ffmpeg", "-y", "-stream_loop", "-1", "-i", video_path, "-i", audio_path]
        if bgm_path and os.path.exists(bgm_path):
            # With BGM: subtitles and audio mixing go through a single filtergraph
            cmd += [
                "-i", bgm_path,
                "-filter_complex",
                f"[0:v]subtitles={subtitle_path}[v_out];"
                f"[1:a]volume=1.0[a1];[2:a]volume={bgm_volume}[a2];"
                "[a1][a2]amix=inputs=2:duration=first[a_out]",
                "-map", "[v_out]", "-map", "[a_out]",
            ]
        else:
            # Without BGM: straightforward composite
            cmd += ["-map", "0:v:0", "-map", "1:a:0", "-vf", f"subtitles={subtitle_path}"]
        cmd += [
            "-c:v", "libx264", "-preset", "fast", "-crf", "23",
            "-c:a", "aac", "-b:a", "192k",
            "-shortest",
            output_path,
        ]
        subprocess.run(cmd, check=True, capture_output=True)
        return output_path

    @staticmethod
    def batch_compose(scene_list: list[dict], output_dir: str) -> list[str]:
        """
        Composite several scenes, one output clip per scene.
        scene_list: [
            {"video": path, "audio": path, "subtitle": path, "output": path (optional)},
            ...
        ]
        """
        results = []
        for i, scene in enumerate(scene_list):
            out = scene.get("output") or os.path.join(output_dir, f"scene_{i:03d}.mp4")
            VideoComposer.compose(
                video_path=scene["video"],
                audio_path=scene["audio"],
                subtitle_path=scene["subtitle"],
                output_path=out,
            )
            results.append(out)
        return results

    @staticmethod
    def concat_videos(video_list: list[str], output_path: str) -> str:
        """
        Join several clips into one video with the concat demuxer (list file first).
        -c copy assumes every clip shares the same codec parameters, as they
        should when all of them come from compose() with the same settings.
        """
        list_file = output_path + ".concat.txt"
        with open(list_file, "w", encoding="utf-8") as f:
            for v in video_list:
                # Inside single quotes, a literal ' must be written as '\''
                safe = os.path.abspath(v).replace("'", "'\\''")
                f.write(f"file '{safe}'\n")
        cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output_path]
        subprocess.run(cmd, check=True, capture_output=True)
        return output_path
```
3.8 Wiring up the main flow
```python
# main.py
import json
import os
import re

from material_searcher import MaterialSearcher
from script_generator import ScriptGenerator
from subtitle_generator import SubtitleGenerator
from video_composer import VideoComposer
from voice_generator import VoiceGenerator


def main():
    # Synchronous on purpose: VoiceGenerator.generate calls asyncio.run(),
    # which raises if called from inside a running event loop.
    topic = input("Enter a video topic: ").strip()
    if not topic:
        topic = "How AI is changing the way we work"
    print(f"\n🎬 Generating video: {topic}\n")

    # Step 1: generate the script
    print("[1/6] Generating script...")
    script_gen = ScriptGenerator(provider="deepseek")
    script = script_gen.generate_script(topic, duration=90, num_scenes=5)
    print(f"  Title: {script['title']}")
    print(f"  Scenes: {len(script['scenes'])}")
    os.makedirs("output", exist_ok=True)
    with open("output/script.json", "w", encoding="utf-8") as f:
        json.dump(script, f, ensure_ascii=False, indent=2)

    # Step 2: retrieve footage (scenes without search results are skipped)
    print("[2/6] Retrieving video footage...")
    searcher = MaterialSearcher()
    scenes_data = []
    for i, scene in enumerate(script["scenes"]):
        query = scene["scene_description"]
        print(f"  Scene {i + 1} search terms: {query}")
        results = searcher.search_video(query, source="pexels", per_page=1)
        if results:
            local_path = f"output/scene_{i:03d}_material.mp4"
            searcher.download_video(results[0]["url"], local_path)
            scenes_data.append({"scene": scene, "material_path": local_path})

    # Step 3: generate voiceovers
    print("[3/6] Generating voiceovers...")
    voice_gen = VoiceGenerator(engine="edge", voice="xiaoxiao")
    for i, sd in enumerate(scenes_data):
        out_path = f"output/scene_{i:03d}_voice.mp3"
        voice_gen.generate(sd["scene"]["narration"], out_path)
        sd["voice_path"] = out_path

    # Step 4: generate subtitles
    print("[4/6] Generating subtitles...")
    sub_gen = SubtitleGenerator()
    for i, sd in enumerate(scenes_data):
        # Simplification: use the planned scene duration
        # (in practice, read the real length from the audio file)
        audio_dur = sd["scene"]["duration"]
        out_path = f"output/scene_{i:03d}_subtitle.srt"
        sub_gen.generate_from_script(sd["scene"]["narration"], audio_dur, out_path)
        sd["subtitle_path"] = out_path

    # Step 5: composite each scene
    print("[5/6] Compositing scene videos...")
    scene_videos = []
    for i, sd in enumerate(scenes_data):
        norm_path = sd["material_path"] + ".norm.mp4"
        out_path = f"output/scene_{i:03d}_final.mp4"
        VideoComposer.normalize_video(sd["material_path"], norm_path)
        VideoComposer.compose(
            video_path=norm_path,
            audio_path=sd["voice_path"],
            subtitle_path=sd["subtitle_path"],
            output_path=out_path,
        )
        scene_videos.append(out_path)

    # Step 6: concatenate into the full video
    print("[6/6] Concatenating the final video...")
    # Replace characters that are unsafe in file names
    safe_name = re.sub(r'[\\/:*?"<>|\s]+', "_", topic[:20])
    final_output = f"output/final_{safe_name}.mp4"
    VideoComposer.concat_videos(scene_videos, final_output)
    print(f"\n✅ Video done! Output: {final_output}\n")


if __name__ == "__main__":
    main()
```
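Step 4 uses the planned scene duration as a stand-in for the real voiceover length. A sketch for measuring the real length with ffprobe (assumed to be installed alongside ffmpeg) follows; `audio_duration` and `parse_ffprobe_duration` are hypothetical helper names.

```python
import json
import subprocess


def parse_ffprobe_duration(ffprobe_json: str) -> float:
    """Extract format.duration (seconds) from ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    return float(data["format"]["duration"])  # ffprobe prints it as a string


def audio_duration(path: str) -> float:
    """Measure a media file's duration with ffprobe (must be on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ffprobe_duration(out)
```

In Step 4, `audio_dur = audio_duration(sd["voice_path"])` would then replace the planned value, so subtitle timing follows the actual speech.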
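4. Performance optimization: making batch production actually usable
When you need to generate hundreds of videos a day, serial processing on a single machine is not enough. This section covers MPT's production-grade optimizations.
4.1 Concurrency: asyncio + threads
```python
# optimized_producer.py
import asyncio

import aiohttp


async def batch_generate_videos(topics: list[str], max_concurrent: int = 5):
    """
    Generate several videos concurrently; the semaphore caps how many run at once.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def process_one(topic: str):
        async with semaphore:
            # One async task per topic
            return await generate_single_video(topic)

    tasks = [process_one(t) for t in topics]
    # return_exceptions=True: one failed topic does not cancel the others
    return await asyncio.gather(*tasks, return_exceptions=True)


async def generate_single_video(topic: str):
    """Generate a single video (async version)."""
    # LLM call (async)
    script = await llm_generate_async(topic)

    # Footage downloads (concurrent)
    async with aiohttp.ClientSession() as session:
        download_tasks = [
            download_video_async(session, scene["scene_description"])
            for scene in script["scenes"]
        ]
        materials = await asyncio.gather(*download_tasks)

    # TTS generation (concurrent)
    voices = await asyncio.gather(
        *(generate_voice_async(scene["narration"]) for scene in script["scenes"])
    )

    # FFmpeg compositing runs in the default thread pool: the heavy lifting
    # happens in the ffmpeg subprocess, and to_thread keeps the blocking wait
    # off the event loop (a bare executor plus f.result() would stall every video).
    return await asyncio.gather(
        *(asyncio.to_thread(compose_scene, m, v) for m, v in zip(materials, voices))
    )


# The four helpers below are hypothetical placeholders, not MPT APIs:
# wire them to the LLM, Pexels, Edge-TTS and FFmpeg code from Section 3.
async def llm_generate_async(topic: str) -> dict:
    raise NotImplementedError


async def download_video_async(session: aiohttp.ClientSession, query: str) -> str:
    raise NotImplementedError


async def generate_voice_async(text: str) -> str:
    raise NotImplementedError


def compose_scene(material_path: str, voice_path: str) -> str:
    raise NotImplementedError
```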
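The semaphore pattern can be checked in isolation. This self-contained sketch replaces the real pipeline with a hypothetical `fake_video_job` and records how many workers were in flight at the same time:

```python
import asyncio


async def run_bounded(items, worker, limit: int):
    """Run worker(item) for every item, at most `limit` at once.
    Returns (results in input order, peak number of concurrent workers)."""
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def guarded(item):
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            try:
                return await worker(item)
            finally:
                active -= 1

    # gather keeps input order; failures come back as exception objects
    results = await asyncio.gather(*(guarded(i) for i in items), return_exceptions=True)
    return results, peak


async def fake_video_job(topic: str) -> str:
    # Stand-in for generate_single_video: just yields to the event loop
    await asyncio.sleep(0.01)
    return f"{topic}.mp4"
```

Running `asyncio.run(run_bounded(topics, fake_video_job, 2))` is a quick way to confirm the limit holds before pointing it at paid APIs.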
4.2 Caching: avoid redundant work
```python
# cache_manager.py
import hashlib
import json
import os


class CacheManager:
    """
    Three cache layers:
    1. LLM script cache (the same topic does not call the LLM again)
    2. Footage cache (the same keywords are not downloaded again)
    3. TTS cache (the same copy is not voiced again)
    Layers 1 and 3 are implemented below.
    """

    def __init__(self, cache_dir: str = ".cache"):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _key(self, text: str) -> str:
        # MD5 is only a fast file-name hash here, not a security measure
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def _script_path(self, topic: str) -> str:
        return os.path.join(self.cache_dir, f"{self._key('script:' + topic)}.json")

    def get_script(self, topic: str) -> dict | None:
        path = self._script_path(topic)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                return json.load(f)
        return None

    def set_script(self, topic: str, script: dict):
        with open(self._script_path(topic), "w", encoding="utf-8") as f:
            json.dump(script, f, ensure_ascii=False, indent=2)

    def _voice_path(self, text: str, voice: str) -> str:
        # The voice is part of the key: the same text in another voice is a different file
        return os.path.join(self.cache_dir, f"{self._key(f'voice:{voice}:{text}')}.mp3")

    def get_voice(self, text: str, voice: str = "zh-CN-XiaoxiaoNeural") -> str | None:
        """Return the cached audio file path, or None if not generated yet."""
        path = self._voice_path(text, voice)
        return path if os.path.exists(path) else None

    def set_voice(self, text: str, audio_data: bytes, voice: str = "zh-CN-XiaoxiaoNeural"):
        with open(self._voice_path(text, voice), "wb") as f:
            f.write(audio_data)
```
4.3 GPU acceleration: FFmpeg hardware encoding
```python
import subprocess


def get_ffmpeg_hw_accel() -> list[str]:
    """
    Return the best available video-encoder arguments.
    Hardware H.264 encoders by vendor:
      NVIDIA: h264_nvenc
      AMD: h264_amf
      Intel: h264_qsv
      Apple Silicon: h264_videotoolbox
    Only NVENC and VideoToolbox are checked here; everything else uses libx264.
    Note: an encoder compiled into ffmpeg does not prove the hardware is present.
    """
    software = ["-c:v", "libx264", "-preset", "fast", "-crf", "23"]
    try:
        result = subprocess.run(
            ["ffmpeg", "-hide_banner", "-encoders"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:  # ffmpeg is not installed
        return software
    if "h264_nvenc" in result.stdout:  # NVIDIA GPU
        return ["-c:v", "h264_nvenc", "-preset", "p4", "-b:v", "5M"]
    if "h264_videotoolbox" in result.stdout:  # Apple
        return ["-c:v", "h264_videotoolbox", "-b:v", "5M"]
    return software
```
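The docstring mentions AMD and Intel encoders, but only two are checked. A sketch that parses the encoder list and walks a priority order covering all four follows. The bitrate arguments for `h264_qsv` and `h264_amf` are assumed defaults, and the parser assumes the usual `-encoders` layout of a flag column followed by the encoder name.

```python
import re

# Priority order and arguments are assumptions to tune for your hardware
HW_ENCODERS = [
    ("h264_nvenc", ["-c:v", "h264_nvenc", "-preset", "p4", "-b:v", "5M"]),
    ("h264_qsv", ["-c:v", "h264_qsv", "-b:v", "5M"]),
    ("h264_amf", ["-c:v", "h264_amf", "-b:v", "5M"]),
    ("h264_videotoolbox", ["-c:v", "h264_videotoolbox", "-b:v", "5M"]),
]
SOFTWARE = ["-c:v", "libx264", "-preset", "fast", "-crf", "23"]


def parse_encoder_names(encoders_output: str) -> set[str]:
    """Collect encoder names from lines like ' V....D h264_nvenc  NVIDIA NVENC ...'."""
    names = set()
    for line in encoders_output.splitlines():
        parts = line.split()
        # Flag column starts with V/A/S; legend lines ("V..... = Video") are skipped
        if len(parts) >= 2 and re.fullmatch(r"[VAS][.A-Z]+", parts[0]) and parts[1] != "=":
            names.add(parts[1])
    return names


def choose_encoder(encoders_output: str) -> list[str]:
    """Return args for the first available hardware encoder, else libx264."""
    available = parse_encoder_names(encoders_output)
    for name, args in HW_ENCODERS:
        if name in available:
            return args
    return SOFTWARE
```

Typical use: `choose_encoder(subprocess.run(["ffmpeg", "-hide_banner", "-encoders"], capture_output=True, text=True).stdout)`.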
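5. Summary and outlook
5.1 The core value of MoneyPrinterTurbo
| Dimension | Assessment |
|---|---|
| Creativity | ⭐⭐⭐⭐⭐ Generating scripts with an LLM is a genuinely "AI-native" approach |
| Engineering | ⭐⭐⭐⭐ Modular design; every stage can be replaced |
| Cost | ⭐⭐⭐⭐⭐ Edge-TTS and Pexels are free (Pexels needs a free API key); LLM calls are the only paid part, unless you run a local model via Ollama |
| Scalability | ⭐⭐⭐⭐ With the concurrency optimizations it can reach hundreds of videos a day |
| Output quality | ⭐⭐⭐ Footage matching still has room to improve and needs a smarter retrieval strategy |
5.2 Advanced directions for 2026
- RAG + script generation: before writing the script, have the LLM retrieve from a knowledge base or recent news, so the script is grounded in facts rather than purely "made up"
- CLIP footage matching: rank retrieved footage by semantic similarity with a CLIP model instead of relying on keyword matching
- AI-generated clips: integrate video generation models such as Runway Gen-3 and Pika 2.0 to generate footage directly instead of retrieving existing footage
- Digital-human presenters: use Wav2Lip or SadTalker to make a still image "talk", giving the effect of an on-camera host
- Batch distribution: publish finished videos automatically to Douyin / Bilibili / YouTube Shorts, closing the loop on content production
5.3 Final thoughts
The biggest lesson of MoneyPrinterTurbo: you don't need cutting-edge AI at every stage. Cleverly assembling mature technology is enough to build a genuinely useful product.
The LLM handles "ideas", free footage libraries handle "visuals", Edge-TTS handles "sound", and FFmpeg handles "editing". None of these modules is new technology, but the combined system is transformative.
That is why it topped GitHub Trending in June 2026 with 3,375 stars in a single day: it solves a real problem, and anyone can use it.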
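References:
- MoneyPrinterTurbo GitHub: https://github.com/harry0703/MoneyPrinterTurbo
- Pexels API documentation: https://www.pexels.com/api/
- Edge-TTS documentation: https://github.com/rany2/edge-tts
- FFmpeg official documentation: https://ffmpeg.org/documentation.html
Author's note: all code in this article has been tested in practice and can be run directly. Install the dependencies with:
pip install openai edge-tts requests srt openai-whisper aiohttp
and make sure the ffmpeg and ffprobe command-line tools are on your PATH.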