Index once, search many. Turn video files into searchable, queryable context — what jq is for JSON, but for video.
Paste this to your AI agent — Claude, GPT, Cursor, or any agent with tool use.
Read https://agentic.video/skill.md and follow the instructions to set up video memory.
Copy the instruction above and paste it into any AI agent chat.
Your agent installs pixelml-av, configures a provider, and ingests video.
Start querying your videos. Your agent can search, ask questions, and get citations.
ffmpeg extracts audio. Whisper transcribes. Embeddings are generated. Everything lands in a single SQLite file.
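To make the "single SQLite file" idea concrete, here is a minimal sketch of how transcript segments and their embeddings could sit in one database next to an FTS5 index. The table names (`segments`, `segments_fts`), the columns, and the float32 blob encoding are assumptions for illustration, not av's actual schema. It also assumes your Python build ships SQLite with FTS5 enabled.

```python
import sqlite3
import struct

def pack_vector(vec):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack_vector(blob):
    """Inverse of pack_vector."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

def open_index(path=":memory:"):
    """Create the hypothetical schema: one row table plus an FTS5 index."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS segments (
            id INTEGER PRIMARY KEY,
            video_id TEXT NOT NULL,
            start_sec REAL NOT NULL,
            end_sec REAL NOT NULL,
            text TEXT NOT NULL,
            embedding BLOB            -- NULL when no embedding was generated
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS segments_fts USING fts5(text);
    """)
    return conn

def add_segment(conn, video_id, start_sec, end_sec, text, embedding=None):
    """Insert one transcript segment and mirror its text into the FTS index."""
    blob = pack_vector(embedding) if embedding is not None else None
    cur = conn.execute(
        "INSERT INTO segments (video_id, start_sec, end_sec, text, embedding) "
        "VALUES (?, ?, ?, ?, ?)",
        (video_id, start_sec, end_sec, text, blob),
    )
    conn.execute(
        "INSERT INTO segments_fts (rowid, text) VALUES (?, ?)",
        (cur.lastrowid, text),
    )
    conn.commit()
    return cur.lastrowid

def fts_search(conn, query, limit=10):
    """Full-text match, best BM25 score first."""
    return conn.execute(
        """SELECT s.id, s.video_id, s.start_sec, s.text
           FROM segments_fts JOIN segments s ON s.id = segments_fts.rowid
           WHERE segments_fts MATCH ?
           ORDER BY bm25(segments_fts)
           LIMIT ?""",
        (query, limit),
    ).fetchall()
```

Because the embedding column is nullable, a segment stays searchable by text even when the embedding stage did not run.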
FTS5 full-text search as primary. Cosine similarity reranking when embeddings are available. Fast, local, no network needed.
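A rough sketch of the reranking step: take candidates in full-text order and, only when a query embedding exists, reorder them by cosine similarity. The function names and the dict shape (`text`, optional `embedding`) are hypothetical. How av combines the two scores internally is not stated here, so treat this as one plausible reading.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rerank(candidates, query_embedding=None):
    """Reorder full-text hits by similarity to the query embedding.

    candidates: list of dicts, already in full-text rank order.
    Without a query embedding the full-text order is returned unchanged.
    Candidates that lack an embedding keep their relative order at the end.
    """
    if query_embedding is None:
        return list(candidates)
    scored, unscored = [], []
    for cand in candidates:
        emb = cand.get("embedding")
        if emb is None:
            unscored.append(cand)
        else:
            scored.append((cosine_similarity(query_embedding, emb), cand))
    # sort() is stable, so ties keep their full-text order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored] + unscored
```

Keeping full-text search as the first pass means the expensive vector math only touches a short candidate list, never the whole index.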
RAG Q&A over your indexed videos. Get answers with timestamped citations pointing back to the source.
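One way timestamped citations can work: number each retrieved segment, tag it with its video and start time, and ask the model to cite by number. The sketch below only builds the prompt and calls no model. The helper names, the segment dict shape, and the prompt wording are all assumptions, not av's real prompt.

```python
def format_timestamp(seconds):
    """Render a second offset as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def build_context(segments):
    """Number each segment so an answer can cite it as [n]."""
    lines = []
    for n, seg in enumerate(segments, start=1):
        stamp = format_timestamp(seg["start_sec"])
        lines.append(f"[{n}] ({seg['video_id']} @ {stamp}) {seg['text']}")
    return "\n".join(lines)

def build_prompt(question, segments):
    """Assemble a hypothetical RAG prompt from retrieved segments."""
    return (
        "Answer using only the numbered excerpts below. "
        "Cite excerpts as [n].\n\n"
        f"{build_context(segments)}\n\nQuestion: {question}"
    )
```

Each `[n]` in the answer then maps back to a video ID and a timestamp, which is the pair a command like `av open <id> --at <sec>` takes.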
         video file / URL
                 │
                 ▼
┌─────────────────────────────────┐
│ av ingest                       │
│  ├─ ffmpeg → audio → Whisper    │
│  ├─ ffmpeg → frames → Vision    │
│  ├─ Embeddings (batch)          │
│  └─ SQLite (FTS5 + vectors)     │
└─────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐
│ av search / av ask              │
│  ├─ FTS5 full-text match        │
│  ├─ Cosine reranking            │
│  └─ RAG Q&A with citations      │
└─────────────────────────────────┘
Structured output on stdout, progress on stderr. Agents parse one, humans read the other.
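The stdout/stderr split is a general CLI pattern. Here is a minimal sketch of it, assuming one JSON object per line for results. The function names and the record fields are illustrative and do not describe av's actual output format.

```python
import json
import sys

def emit_result(record, out=sys.stdout):
    """Write one machine-readable result as a single JSON line."""
    out.write(json.dumps(record) + "\n")
    out.flush()

def emit_progress(message, err=sys.stderr):
    """Write a human-readable status line; never mixed into the result stream."""
    err.write(message + "\n")
    err.flush()

if __name__ == "__main__":
    emit_progress("searching index...")
    emit_result({"video_id": "vid1", "start_sec": 12.5, "text": "example hit"})
```

With this split, an agent can pipe stdout straight into a JSON parser while a human still sees progress in the terminal.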
No Postgres, no Redis, no database server to run. The whole index is one file at ~/.config/av/av.db.
Full-text search works without embeddings, so search still runs offline. Cosine reranking is optional.
OpenAI, Anthropic, Gemini. Switch providers with av config setup. One interface.
If a stage fails (auth, model access), the pipeline continues and warns. No hard crashes.
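The "continue and warn" behavior can be sketched as a loop that isolates each stage. The names (`run_pipeline`, the stage callables, the warning text) are hypothetical, and the real pipeline may treat some failures differently.

```python
import sys

def run_pipeline(stages, context, warn=None):
    """Run stages in order; a failing stage is reported and skipped.

    stages: list of (name, callable) pairs; each callable receives the
    shared context dict and may add keys to it.
    Returns the names of the stages that failed.
    """
    if warn is None:
        warn = lambda msg: print(msg, file=sys.stderr)
    skipped = []
    for name, stage in stages:
        try:
            stage(context)
        except Exception as exc:  # e.g. auth error, no access to the model
            skipped.append(name)
            warn(f"warning: stage '{name}' failed ({exc}); continuing")
    return skipped
```

A failed embedding stage then costs only the reranking step later, while the transcript still lands in the index.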
Pass a YouTube URL to av ingest. It uses yt-dlp under the hood to download the video, then indexes it like a local file.
| Command | Description |
|---|---|
| `av config setup` | Interactive provider setup wizard |
| `av config show` | Show current configuration |
| `av ingest <path>` | Ingest video file(s) into the index |
| `av search <query>` | Full-text + semantic search |
| `av ask <question>` | RAG Q&A with citations |
| `av list` | List all indexed videos |
| `av info <video_id>` | Detailed video metadata |
| `av transcript <id>` | Output transcript (VTT/SRT/text) |
| `av export` | Export as JSONL/VTT/SRT |
| `av open <id> --at <sec>` | Open video at timestamp |
Switch providers with av config setup. The pipeline adapts automatically.
| Provider | Transcription | Vision / Chat | Embeddings |
|---|---|---|---|
| OpenAI (OAuth) | whisper-1 | gpt-4-1 | text-embedding-3-small |
| OpenAI (API key) | whisper-1 | gpt-4-1 | text-embedding-3-small |
| Anthropic | — | claude-sonnet-4-5 | — |
| Gemini | — | gemini-2.5-flash | text-embedding-004 |
When a capability is unavailable, the pipeline skips that stage and warns. Use AV_OPENAI_API_KEY as a transcription fallback for non-OpenAI providers.
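The skip-or-fall-back rule can be read as a table lookup. The sketch below copies the model names from the provider table and adds a hypothetical `resolve_stage` helper. The dictionary layout and the exact fallback logic are assumptions; only the `AV_OPENAI_API_KEY` variable name comes from the text.

```python
import os

# Mirrors the provider table; None marks an unavailable capability.
CAPABILITIES = {
    "openai": {
        "transcription": "whisper-1",
        "chat": "gpt-4-1",
        "embeddings": "text-embedding-3-small",
    },
    "anthropic": {
        "transcription": None,
        "chat": "claude-sonnet-4-5",
        "embeddings": None,
    },
    "gemini": {
        "transcription": None,
        "chat": "gemini-2.5-flash",
        "embeddings": "text-embedding-004",
    },
}

def resolve_stage(provider, capability, env=None):
    """Return (provider, model) for a stage, or None if it should be skipped.

    Only transcription falls back, and only when AV_OPENAI_API_KEY is set.
    """
    env = os.environ if env is None else env
    model = CAPABILITIES[provider][capability]
    if model is not None:
        return provider, model
    if capability == "transcription" and env.get("AV_OPENAI_API_KEY"):
        return "openai", CAPABILITIES["openai"]["transcription"]
    return None  # caller skips the stage and warns
```

Under these assumptions, an Anthropic setup with `AV_OPENAI_API_KEY` set would transcribe with `whisper-1`, answer with `claude-sonnet-4-5`, and skip embeddings.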