Captioning

Generate captions for images using a vision-language model.

Supported Engines

Engine      Default Model                Cost  Requires
openrouter  google/gemini-2.0-flash-001  Free  API key
openai      gpt-4o-mini                  Paid  API key
ollama      llava                        Free  Ollama running locally

Usage

from ciagen import caption

caption(
    images="data/real/train/images/",
    captions_dir="data/real/train/captions/",
    engine="openrouter",
    model="google/gemini-2.0-flash-001",
    api_key="sk-or-v1-...",  # Get from https://openrouter.ai/keys
)
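To keep the key out of source control, the arguments can be built from an environment variable instead of a literal string. The following is a minimal sketch, not part of ciagen: the helper name `build_caption_args` and the variable name `OPENROUTER_API_KEY` are assumptions chosen for illustration, and only the keyword arguments shown in the usage example above are used.

```python
import os


def build_caption_args(images, captions_dir, env_var="OPENROUTER_API_KEY"):
    """Collect keyword arguments for caption(), reading the API key from the environment.

    `env_var` is a hypothetical variable name; use whatever your setup defines.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set {env_var} before running captioning.")
    return {
        "images": images,
        "captions_dir": captions_dir,
        "engine": "openrouter",
        "model": "google/gemini-2.0-flash-001",
        "api_key": api_key,
    }


# Usage (requires ciagen to be installed):
#   from ciagen import caption
#   caption(**build_caption_args("data/real/train/images/",
#                                "data/real/train/captions/"))
```

The helper only assembles a dictionary, so it can be checked without network access or an installed engine.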

CLI

ciagen caption \
    --images data/real/train/images/ \
    --output data/real/train/captions/ \
    --engine openrouter \
    --model google/gemini-2.0-flash-001 \
    --api-key YOUR_KEY

Each caption is saved as a .txt file named after its image. Images that already have a caption file are skipped, so re-running the command only captions new images.
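The skip rule can be previewed before spending API calls. This sketch is not ciagen's implementation: it assumes the caption file uses the image's stem plus `.txt` (for example `cat.jpg` and `cat.txt`), and the helper names and the set of image extensions are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def caption_path(image_path, captions_dir):
    """Return the assumed caption location: <captions_dir>/<image stem>.txt."""
    return Path(captions_dir) / (Path(image_path).stem + ".txt")


def pending_images(images_dir, captions_dir):
    """List images in images_dir that have no caption file yet, sorted by path."""
    images = sorted(
        p for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [p for p in images if not caption_path(p, captions_dir).exists()]
```

Calling `pending_images("data/real/train/images/", "data/real/train/captions/")` gives a rough count of how many images a run would still need to caption, under the naming assumption above.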