Capability Map

Model Capabilities

bach-1.0-preview

bach-1.0-preview is the default video generation model, supporting three generation modes with configurable output parameters.

Supported Generation Modes

Generation Mode	Duration	Resolution	Frame Rate
Text-to-Video	1–6 s	720p, 1080p	24 fps, 30 fps
Image-to-Video	1–6 s	720p, 1080p	24 fps, 30 fps
Multi-Image-to-Video	1–6 s	720p, 1080p	24 fps, 30 fps

Output Specifications

Parameter	Value
Video Codec	H.264
Container Format	MP4
Frame Rate	24 fps, 30 fps
Supported Aspect Ratios	16:9, 9:16, 1:1
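
The limits in the two tables above can be checked on the client before a request is sent. The following is a minimal sketch in Python. The function name `validate_generation_params` and its parameter names are hypothetical and are not part of the documented API; only the allowed values come from the tables.

```python
# Allowed values copied from the tables above.
MIN_DURATION_S = 1
MAX_DURATION_S = 6
RESOLUTIONS = {"720p", "1080p"}
FRAME_RATES = {24, 30}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}


def validate_generation_params(duration, resolution, fps, aspect_ratio):
    """Return a list of problems; an empty list means the values are valid.

    The parameter names are illustrative, not the API's actual field names.
    """
    problems = []
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        problems.append(f"duration {duration} s is outside 1-6 s")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if fps not in FRAME_RATES:
        problems.append(f"unsupported frame rate: {fps}")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every invalid value at once.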

Text-to-Video

Generate videos directly from natural language prompts. This mode produces a video of up to 6 seconds and is well suited to conceptual visualizations, creative storytelling, and rapid prototyping.

Endpoint: POST /videos/text2video
Duration: Configurable from 1 to 6 seconds
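
A request to this endpoint might be assembled as in the sketch below. Only the HTTP method, the path, the model name, and the 1 to 6 second range come from this page. The host `https://api.example.com`, the JSON field names `model`, `prompt`, and `duration`, and the default duration are assumptions for illustration. Authentication is omitted because it is not described here. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder host: this page does not state the API base URL.
BASE_URL = "https://api.example.com"


def build_text2video_request(prompt, duration=6):
    """Build (but do not send) a POST request for /videos/text2video.

    The JSON field names and the default duration are assumptions.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 1 <= duration <= 6:
        raise ValueError("duration must be between 1 and 6 seconds")
    body = {
        "model": "bach-1.0-preview",
        "prompt": prompt,
        "duration": duration,
    }
    return urllib.request.Request(
        BASE_URL + "/videos/text2video",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it, once the real host and credentials are known.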

Image-to-Video

Animate a single static image guided by an optional text prompt. The model analyzes the source image and generates smooth, natural motion while preserving visual consistency with the input.

Endpoint: POST /videos/image2video
Duration: Configurable from 1 to 6 seconds
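
Because the text prompt is optional in this mode, a request body can leave it out entirely. The sketch below shows one way to assemble such a body. The field names `image`, `prompt`, and `duration`, the default duration, and the choice of base64 for the image are assumptions; this page does not specify how the source image is transmitted.

```python
import base64


def build_image2video_body(image_bytes, prompt=None, duration=6):
    """Assemble a JSON-serializable body for POST /videos/image2video.

    Field names and base64 transport are assumptions, not documented API.
    """
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    if not 1 <= duration <= 6:
        raise ValueError("duration must be between 1 and 6 seconds")
    body = {
        "model": "bach-1.0-preview",
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "duration": duration,
    }
    if prompt:
        # The prompt is optional here, so it is added only when given.
        body["prompt"] = prompt
    return body
```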

Multi-Image-to-Video

Compose a video from multiple reference images with optional subject reference synthesis. This mode supports complex scene compositions and maintains consistent character or object appearance across frames.

Endpoint: POST /videos/multi2video
Duration: Configurable from 1 to 6 seconds
Maximum Images: 9 per request
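
The 9-image limit is easy to enforce before upload. The following sketch rejects oversized batches on the client. The field name `images`, the default duration, and the base64 encoding are assumptions. The minimum number of images is not stated on this page, so the sketch only requires a non-empty list.

```python
import base64

MAX_IMAGES = 9  # per-request limit documented above


def build_multi2video_body(images, prompt=None, duration=6):
    """Assemble a JSON-serializable body for POST /videos/multi2video.

    `images` is a list of raw image bytes. Field names are assumptions.
    """
    if not images:
        raise ValueError("at least one image is required")
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images per request, got {len(images)}")
    if not 1 <= duration <= 6:
        raise ValueError("duration must be between 1 and 6 seconds")
    body = {
        "model": "bach-1.0-preview",
        "images": [base64.b64encode(img).decode("ascii") for img in images],
        "duration": duration,
    }
    if prompt:
        body["prompt"] = prompt
    return body
```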
