Captioning

Generate captions for images using a vision-language model.

Supported Engines

Engine	Default Model	Cost	Requires
`openrouter`	`google/gemini-2.0-flash-001`	Free	API key
`openai`	`gpt-4o-mini`	Paid	API key
`ollama`	`llava`	Free	Ollama running locally

Usage

from ciagen import caption

caption(
    images="data/real/train/images/",
    captions_dir="data/real/train/captions/",
    engine="openrouter",
    model="google/gemini-2.0-flash-001",
    api_key="sk-or-v1-...",  # Get from https://openrouter.ai/keys
)
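
The call above passes the API key as a string literal. To keep the key out of source control, one option is to read it from the environment first. The sketch below is only an illustration: the helper `build_caption_kwargs` and the variable name `OPENROUTER_API_KEY` are assumptions made for this example, not part of ciagen, and the keyword names are copied from the usage example.

```python
import os


def build_caption_kwargs(images, captions_dir, env_var="OPENROUTER_API_KEY"):
    """Assemble keyword arguments for caption(), taking the key from the environment.

    Hypothetical helper: env_var is an assumed variable name, not one that
    ciagen is known to read on its own.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set {env_var} before running the captioner")
    return {
        "images": images,
        "captions_dir": captions_dir,
        "engine": "openrouter",
        "model": "google/gemini-2.0-flash-001",
        "api_key": api_key,
    }


# Usage (requires ciagen to be installed):
# from ciagen import caption
# caption(**build_caption_kwargs("data/real/train/images/",
#                                "data/real/train/captions/"))
```

Raising when the variable is unset keeps a missing key from surfacing later as an authentication error from the engine.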

CLI

ciagen caption \
    --images data/real/train/images/ \
    --output data/real/train/captions/ \
    --engine openrouter \
    --model google/gemini-2.0-flash-001 \
    --api-key YOUR_KEY

Each caption is saved as a .txt file named after its image. Images that already have a caption file are skipped.
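
To preview which images a run would still need to caption, the skip rule can be approximated outside of ciagen. This is a minimal sketch, not the library's implementation: it assumes the caption file is the image's base name plus `.txt` (so `cat.jpg` pairs with `cat.txt`), and both the set of image extensions and the function name `pending_images` are hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; ciagen may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def pending_images(images_dir, captions_dir):
    """Return image paths that have no matching .txt file in captions_dir."""
    images_dir = Path(images_dir)
    captions_dir = Path(captions_dir)
    pending = []
    for image in sorted(images_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore non-image files
        # Assumption: the caption shares the image's base name (stem).
        if not (captions_dir / f"{image.stem}.txt").exists():
            pending.append(image)
    return pending
```

Calling `pending_images("data/real/train/images/", "data/real/train/captions/")` before a run gives a rough count of how many requests the engine will receive, which is useful when using a paid engine.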