v0.1.0 — Apache 2.0

Video memory for AI agents.

Index once, search many. Turn video files into searchable, queryable context — what jq is for JSON, but for video.

~/project

$ av ingest meeting.mp4

{"status": "complete", "video_id": "a1b2c3", "duration_sec": 3600, "artifacts_count": 847}

$ av search "what was discussed about pricing"

{"results": [{"rank": 1, "score": 0.87, "timestamp": "00:24:15", "text": "..."}]}

$ av ask "what were the key decisions?"

{"answer": "Three key decisions were made...", "citations": [{"timestamp": "00:24:15"}]}

Give your agent video memory

Paste this to your AI agent — Claude, GPT, Cursor, or any agent with tool use.

Read https://agentic.video/skill.md and follow the instructions to set up video memory.

Paste to your agent

Copy the instruction above and paste it into any AI agent chat.

Agent sets up av

Your agent installs pixelml-av, configures a provider, and ingests video.

Search and ask

Start querying your videos. Your agent can search, ask questions, and get citations.

Need managed infrastructure? hello@pixelml.com

How it works

Ingest

ffmpeg extracts audio. Whisper transcribes. Embeddings are generated. Everything lands in a single SQLite file.

Search

FTS5 full-text search is the primary retrieval path. When embeddings are available, results are reranked by cosine similarity. Fast, local, no network needed.
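The two-step search described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not av's actual code: the `segments` table, the in-memory embeddings dict, and the function names are all hypothetical, and it assumes your SQLite build includes FTS5.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is empty or zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(conn, query, query_vec=None, embeddings=None, limit=10):
    """FTS5 match first; rerank by cosine similarity only if vectors are supplied."""
    rows = conn.execute(
        "SELECT rowid, text, bm25(segments) AS score FROM segments "
        "WHERE segments MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()
    hits = [{"id": rowid, "text": text} for rowid, text, _ in rows]
    if query_vec is not None and embeddings:
        # Optional step: reorder the FTS5 candidates by semantic closeness.
        hits.sort(
            key=lambda h: cosine(query_vec, embeddings.get(h["id"], [])),
            reverse=True,
        )
    return hits


# Hypothetical demo data; the real schema is not documented here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE segments USING fts5(text)")
conn.executemany(
    "INSERT INTO segments(rowid, text) VALUES (?, ?)",
    [
        (1, "We agreed on the pricing tier for the starter plan"),
        (2, "Pricing came up again but nothing was decided"),
        (3, "The roadmap covers the next two quarters"),
    ],
)
toy_embeddings = {1: [1.0, 0.0], 2: [0.0, 1.0]}

fts_only = search(conn, "pricing")
reranked = search(conn, "pricing", query_vec=[0.0, 1.0], embeddings=toy_embeddings)
```

Without vectors the function returns plain FTS5 matches, which mirrors the claim that search still works when no embeddings exist.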

Ask

RAG Q&A over your indexed videos. Get answers with timestamped citations pointing back to the source.
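One way to picture the citation step: retrieved transcript excerpts are placed in the prompt with their timestamps, and timestamps that appear in the model's answer are mapped back to known excerpts. The sketch below assumes that approach; the prompt wording and both function names are hypothetical, and no model is called.

```python
import re


def build_prompt(question, hits):
    """Assemble a prompt from search hits, tagging each excerpt with its timestamp."""
    context = "\n".join(f"[{h['timestamp']}] {h['text']}" for h in hits)
    return (
        "Answer using only the transcript excerpts below. "
        "Cite the bracketed timestamps you rely on.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def extract_citations(answer, hits):
    """Keep only cited timestamps that match a retrieved excerpt, in order, without duplicates."""
    known = {h["timestamp"] for h in hits}
    cited = []
    for ts in re.findall(r"\[(\d{2}:\d{2}:\d{2})\]", answer):
        if ts in known and ts not in cited:
            cited.append(ts)
    return [{"timestamp": ts} for ts in cited]


hits = [{"timestamp": "00:24:15", "text": "We agreed on the $49/mo tier..."}]
prompt = build_prompt("what were the key decisions?", hits)
# A made-up model answer, including one timestamp that was never retrieved.
citations = extract_citations("The tier was agreed [00:24:15], see also [00:99:99].", hits)
```

Filtering against the retrieved set means a timestamp the model made up never reaches the citations list.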

architecture

video file / URL
       │
       ▼
┌─────────────────────────────────┐
│  av ingest                      │
│  ├─ ffmpeg → audio → Whisper    │
│  ├─ ffmpeg → frames → Vision    │
│  ├─ Embeddings (batch)          │
│  └─ SQLite (FTS5 + vectors)     │
└─────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────┐
│  av search / av ask             │
│  ├─ FTS5 full-text match        │
│  ├─ Cosine reranking            │
│  └─ RAG Q&A with citations      │
└─────────────────────────────────┘

Built for agents

JSON to stdout

Structured output on stdout, progress on stderr. Agents parse one, humans read the other.
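For an agent written in Python, the stdout/stderr split means the two streams can be captured separately and only stdout parsed. A minimal sketch follows; `run_json` is a hypothetical helper, and a stand-in child process is used so the example runs without av installed.

```python
import json
import subprocess
import sys


def run_json(cmd):
    """Run a command, forward its stderr to ours, and parse its stdout as JSON."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.stderr:
        sys.stderr.write(proc.stderr)  # progress messages stay human-readable
    return json.loads(proc.stdout)


# Stand-in for the CLI: progress goes to stderr, one JSON document to stdout.
child = (
    "import json, sys\n"
    "print('transcribing...', file=sys.stderr)\n"
    "print(json.dumps({'status': 'complete', 'video_id': 'a1b2c3'}))\n"
)
result = run_json([sys.executable, "-c", child])
```

With av installed, the same helper would presumably be called as `run_json(["av", "search", "pricing discussion"])`.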

Single SQLite file

No Postgres, no Redis, no external services. One file at ~/.config/av/av.db.
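Because the index is an ordinary SQLite file, it can be inspected with any SQLite client. The sketch below opens a database read-only and lists its tables. The default path comes from the text above, but the schema is not documented here, so the demo uses a throwaway database with a hypothetical `videos` table.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

DEFAULT_DB = Path.home() / ".config" / "av" / "av.db"


def list_tables(db_path=DEFAULT_DB):
    """Open the database read-only and return its table names, sorted."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Demo against a temporary database so the sketch runs without av installed.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.db"
    with closing(sqlite3.connect(demo_path)) as conn:
        conn.execute("CREATE TABLE videos (id TEXT PRIMARY KEY)")
        conn.commit()
    demo_tables = list_tables(demo_path)
```

Opening with `mode=ro` keeps an inspection script from modifying the index by accident.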

FTS5 primary search

Full-text search works offline and needs no embeddings. Cosine reranking is optional.

Provider-agnostic

OpenAI, Anthropic, Gemini. Switch providers with av config setup. One interface.

Best-effort pipeline

If a stage fails (auth, model access), the pipeline continues and warns. No hard crashes.
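The best-effort behavior can be pictured as a loop that catches each stage's failure, warns on stderr, and moves on. This is a sketch of the pattern only; the stage names and the `run_pipeline` helper are hypothetical, not av internals.

```python
import sys


def run_pipeline(stages, context):
    """Run each stage in order; on failure, warn on stderr and continue."""
    skipped = []
    for name, stage in stages:
        try:
            stage(context)
        except Exception as exc:
            print(f"warning: stage '{name}' skipped: {exc}", file=sys.stderr)
            skipped.append(name)
    return skipped


def transcribe(ctx):
    ctx["transcript"] = "We agreed on the $49/mo tier..."


def caption_frames(ctx):
    # Simulates an auth or model-access failure in one stage.
    raise PermissionError("model access denied")


def embed(ctx):
    ctx["embedded"] = True


context = {}
skipped = run_pipeline(
    [("transcribe", transcribe), ("captions", caption_frames), ("embed", embed)],
    context,
)
```

Here the captions stage fails, yet the transcript and embeddings are still produced and the caller learns which stage was skipped.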

YouTube URL support

Pass a YouTube URL to av ingest. The video is downloaded with yt-dlp under the hood, then indexed.

Command reference

config

# Interactive setup wizard

$ av config setup

# Show current config

$ av config show

{"provider": "openai", "transcribe_model": "whisper-1"}

ingest

# Ingest a video file

$ av ingest video.mp4

# With frame captions

$ av ingest video.mp4 --captions

# YouTube URL

$ av ingest "https://youtu.be/..."

search

$ av search "pricing discussion"

{
  "results": [{
    "rank": 1,
    "score": 0.87,
    "timestamp": "00:24:15",
    "text": "We agreed on the $49/mo tier..."
  }]
}

ask

$ av ask "what were the key decisions?"

{
  "answer": "Three key decisions...",
  "citations": [{
    "timestamp": "00:24:15",
    "score": 0.91
  }],
  "confidence": 0.85
}

Command	Description
`av config setup`	Interactive provider setup wizard
`av config show`	Show current configuration
`av ingest <path>`	Ingest video file(s) into the index
`av search <query>`	Full-text + semantic search
`av ask <question>`	RAG Q&A with citations
`av list`	List all indexed videos
`av info <video_id>`	Detailed video metadata
`av transcript <id>`	Output transcript (VTT/SRT/text)
`av export`	Export as JSONL/VTT/SRT
`av open <id> --at <sec>`	Open video at timestamp
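Search results carry timestamps like "00:24:15", while `av open` takes `--at <sec>`. A small helper can bridge the two. The sketch assumes timestamps are always HH:MM:SS as in the samples above and that `--at` accepts whole seconds; both function names are hypothetical.

```python
def timestamp_to_seconds(ts):
    """Convert an HH:MM:SS timestamp to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def open_command(video_id, ts):
    """Build the argument list for opening a video at a search hit's timestamp."""
    return ["av", "open", video_id, "--at", str(timestamp_to_seconds(ts))]


cmd = open_command("a1b2c3", "00:24:15")
```

The resulting list can be passed to `subprocess.run` without shell quoting.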

Provider compatibility

Switch providers with av config setup. The pipeline adapts automatically.

Provider	Transcription	Vision / Chat	Embeddings
OpenAI (OAuth)	`whisper-1`	`gpt-4-1`	`text-embedding-3-small`
OpenAI (API key)	`whisper-1`	`gpt-4-1`	`text-embedding-3-small`
Anthropic	—	`claude-sonnet-4-5`	—
Gemini	—	`gemini-2.5-flash`	`text-embedding-004`

When a capability is unavailable, the pipeline skips that stage and warns. Use AV_OPENAI_API_KEY as a transcription fallback for non-OpenAI providers.
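The skip-or-fall-back rule for transcription can be sketched as a lookup over the table above. The model names are copied from the table. The dict layout and the `plan_transcription` helper are hypothetical, and the real resolution logic may differ.

```python
import os

# Capabilities per provider, transcribed from the compatibility table (None = unavailable).
CAPABILITIES = {
    "openai": {
        "transcription": "whisper-1",
        "chat": "gpt-4-1",
        "embeddings": "text-embedding-3-small",
    },
    "anthropic": {"transcription": None, "chat": "claude-sonnet-4-5", "embeddings": None},
    "gemini": {
        "transcription": None,
        "chat": "gemini-2.5-flash",
        "embeddings": "text-embedding-004",
    },
}


def plan_transcription(provider, env=os.environ):
    """Return (provider, model) for transcription, or None if the stage should be skipped."""
    model = CAPABILITIES[provider]["transcription"]
    if model:
        return (provider, model)
    if env.get("AV_OPENAI_API_KEY"):
        # Fallback: a non-OpenAI provider borrows OpenAI for transcription only.
        return ("openai", CAPABILITIES["openai"]["transcription"])
    return None  # caller skips the stage and warns


no_key = plan_transcription("anthropic", env={})
with_key = plan_transcription("anthropic", env={"AV_OPENAI_API_KEY": "sk-test"})
native = plan_transcription("openai", env={})
```

Passing `env` explicitly keeps the decision easy to test without touching the real environment.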

Get started

Three commands to searchable video.

quickstart

# Install

$ pip install pixelml-av

# Configure your provider