v0.1.0 — Apache 2.0

Video memory for AI agents.

Index once, search many. Turn video files into searchable, queryable context — what jq is for JSON, but for video.

~/project
$ av ingest meeting.mp4
{"status": "complete", "video_id": "a1b2c3", "duration_sec": 3600, "artifacts_count": 847}
$ av search "what was discussed about pricing"
{"results": [{"rank": 1, "score": 0.87, "timestamp": "00:24:15", "text": "..."}]}
$ av ask "what were the key decisions?"
{"answer": "Three key decisions were made...", "citations": [{"timestamp": "00:24:15"}]}

Give your agent video memory

Paste this to your AI agent — Claude, GPT, Cursor, or any agent with tool use.

Read https://agentic.video/skill.md and follow the instructions to set up video memory.

1. Paste to your agent

Copy the instruction above and paste it into any AI agent chat.

2. Agent sets up av

Your agent installs pixelml-av, configures a provider, and ingests video.

3. Search and ask

Start querying your videos. Your agent can search, ask questions, and get citations.

Need managed infrastructure? Contact hello@pixelml.com.

How it works

1. Ingest

ffmpeg extracts audio. Whisper transcribes. Embeddings are generated. Everything lands in a single SQLite file.
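
The audio-extraction step can be pictured as a single ffmpeg call. This is a minimal sketch, not av's actual invocation: the mono 16 kHz WAV target and the helper name `build_audio_extract_cmd` are assumptions chosen because speech models commonly take that format. Only standard ffmpeg flags are used.

```python
from pathlib import Path


def build_audio_extract_cmd(video: str, out_dir: str = ".") -> list[str]:
    """Build an ffmpeg argv that writes the video's audio track as a WAV file."""
    out_path = Path(out_dir) / (Path(video).stem + ".wav")
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video,     # input video
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to mono (assumed target format)
        "-ar", "16000",  # resample to 16 kHz (assumed target format)
        str(out_path),
    ]


cmd = build_audio_extract_cmd("meeting.mp4", "/tmp")
print(" ".join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed.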

2. Search

FTS5 full-text search as primary. Cosine similarity reranking when embeddings are available. Fast, local, no network needed.
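
A rough sketch of the two-stage idea: FTS5 finds candidates, then cosine similarity reorders them when vectors exist. The table name `segments`, its one-column schema, and the toy 2-D vectors are hypothetical; av's real schema and embedding storage are not documented here.

```python
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is empty or all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(db, query, query_vec=None, vectors=None, limit=10):
    """FTS5 match first; rerank by cosine only when embeddings are supplied."""
    rows = db.execute(
        "SELECT rowid, text FROM segments WHERE segments MATCH ? "
        "ORDER BY bm25(segments) LIMIT ?",
        (query, limit),
    ).fetchall()
    if query_vec is None or not vectors:
        return rows  # full-text only, no embeddings needed
    return sorted(
        rows,
        key=lambda row: cosine(query_vec, vectors.get(row[0], [])),
        reverse=True,
    )


# Hypothetical in-memory index with three transcript segments.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE segments USING fts5(text)")
db.executemany(
    "INSERT INTO segments(rowid, text) VALUES (?, ?)",
    [
        (1, "pricing came up briefly at the start"),
        (2, "we agreed on the pricing tier"),
        (3, "lunch plans for friday"),
    ],
)
vectors = {1: [1.0, 0.0], 2: [0.0, 1.0]}  # toy embeddings keyed by rowid

text_only = search(db, "pricing")
reranked = search(db, "pricing", query_vec=[0.1, 0.9], vectors=vectors)
print([rowid for rowid, _ in reranked])
```

The full-text path never touches the vectors, which is why search keeps working when no embeddings were generated.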

3. Ask

RAG Q&A over your indexed videos. Get answers with timestamped citations pointing back to the source.
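
One way to picture the citation step: retrieved segments are numbered and placed in the prompt with their timestamps, so the answer can point back to them. This is an illustrative sketch only. The function `build_prompt`, the prompt wording, and the `ref` field are assumptions, and no model call is made.

```python
def build_prompt(question: str, hits: list[dict]) -> tuple[str, list[dict]]:
    """Number each retrieved segment and keep a citation record for it."""
    lines = []
    citations = []
    for ref, hit in enumerate(hits, start=1):
        lines.append(f"[{ref}] ({hit['timestamp']}) {hit['text']}")
        citations.append({"ref": ref, "timestamp": hit["timestamp"]})
    context = "\n".join(lines)
    prompt = (
        "Answer using only the numbered excerpts below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return prompt, citations


hits = [{"timestamp": "00:24:15", "text": "We agreed on the $49/mo tier..."}]
prompt, citations = build_prompt("what were the key decisions?", hits)
print(prompt)
```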

architecture
video file / URL
       │
       ▼
┌─────────────────────────────────┐
│  av ingest                      │
│  ├─ ffmpeg → audio → Whisper    │
│  ├─ ffmpeg → frames → Vision    │
│  ├─ Embeddings (batch)          │
│  └─ SQLite (FTS5 + vectors)     │
└─────────────────────────────────┘
       │
       ▼
┌─────────────────────────────────┐
│  av search / av ask             │
│  ├─ FTS5 full-text match        │
│  ├─ Cosine reranking            │
│  └─ RAG Q&A with citations      │
└─────────────────────────────────┘

Built for agents

JSON to stdout

Structured output on stdout, progress on stderr. Agents parse one, humans read the other.
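
A minimal sketch of how an agent might consume this split: capture stdout and parse it as JSON, and leave stderr attached to the terminal. The helper `run_json` is hypothetical, and a child Python process stands in for `av` so the example runs without it installed.

```python
import json
import subprocess
import sys


def run_json(argv: list[str]) -> dict:
    """Run a command and parse its stdout as JSON; stderr is not captured."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, text=True, check=True)
    return json.loads(proc.stdout)


# Stand-in for `av`: writes progress to stderr and one JSON object to stdout.
fake_av = [
    sys.executable,
    "-c",
    "import sys, json; print('transcribing...', file=sys.stderr); "
    "print(json.dumps({'status': 'complete', 'video_id': 'a1b2c3'}))",
]

result = run_json(fake_av)
print(result["video_id"])
```

With av installed, the same helper would be called as `run_json(["av", "search", "pricing discussion"])`.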

Single SQLite file

No Postgres, no Redis, no database server to run. One file at ~/.config/av/av.db.
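
Because the index is a plain SQLite file, any SQLite client can inspect it. The sketch below opens a database read-only and lists its tables. av's schema is not documented here, so the demo builds a throwaway file rather than assuming table names; `list_tables` is a hypothetical helper.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path

DEFAULT_DB = Path.home() / ".config" / "av" / "av.db"  # path stated above


def list_tables(db_path: Path = DEFAULT_DB) -> list[str]:
    """Return table names from a SQLite file opened in read-only mode."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Demo against a temporary database so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.db"
    with closing(sqlite3.connect(demo)) as conn:
        conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY)")
        conn.commit()
    tables = list_tables(demo)

print(tables)
```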

FTS5 primary search

Full-text search works without embeddings, so it also works offline. Cosine reranking is optional.

Provider-agnostic

OpenAI, Anthropic, Gemini. Switch providers with av config setup. One interface.

Best-effort pipeline

If a stage fails (auth, model access), the pipeline continues and warns. No hard crashes.
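
The continue-and-warn behavior can be sketched as a loop that isolates each stage. The stage names and the `run_best_effort` helper are illustrative assumptions, not av's internals.

```python
import sys
from typing import Callable


def run_best_effort(stages: list[tuple[str, Callable[[dict], None]]]) -> dict:
    """Run stages in order; a failing stage is logged to stderr and skipped."""
    state: dict = {"warnings": []}
    for name, stage in stages:
        try:
            stage(state)
        except Exception as exc:
            message = f"{name} skipped: {exc}"
            print(f"warning: {message}", file=sys.stderr)
            state["warnings"].append(message)
    return state


def transcribe(state: dict) -> None:
    state["transcript"] = "hello world"


def caption(state: dict) -> None:
    # Simulates an auth or model-access failure.
    raise PermissionError("no access to vision model")


def embed(state: dict) -> None:
    state["embedded"] = "transcript" in state


state = run_best_effort(
    [("transcribe", transcribe), ("caption", caption), ("embed", embed)]
)
print(state["warnings"])
```

The failed caption stage leaves a warning, and the embed stage still runs on the transcript.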

YouTube URL support

Pass a YouTube URL to av ingest. Uses yt-dlp under the hood to download and index.

Command reference

config
# Interactive setup wizard
$ av config setup
# Show current config
$ av config show
{"provider": "openai", "transcribe_model": "whisper-1"}
ingest
# Ingest a video file
$ av ingest video.mp4
# With frame captions
$ av ingest video.mp4 --captions
# YouTube URL
$ av ingest "https://youtu.be/..."
search
$ av search "pricing discussion"
{
  "results": [{
    "rank": 1,
    "score": 0.87,
    "timestamp": "00:24:15",
    "text": "We agreed on the $49/mo tier..."
  }]
}
ask
$ av ask "what were the key decisions?"
{
  "answer": "Three key decisions...",
  "citations": [{
    "timestamp": "00:24:15",
    "score": 0.91
  }],
  "confidence": 0.85
}
Command                   Description
av config setup           Interactive provider setup wizard
av config show            Show current configuration
av ingest <path>          Ingest video file(s) into the index
av search <query>         Full-text + semantic search
av ask <question>         RAG Q&A with citations
av list                   List all indexed videos
av info <video_id>        Detailed video metadata
av transcript <id>        Output transcript (VTT/SRT/text)
av export                 Export as JSONL/VTT/SRT
av open <id> --at <sec>   Open video at timestamp
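
The example results report timestamps as HH:MM:SS, while av open takes --at <sec>. A small conversion helper bridges the two. This sketch assumes --at expects whole seconds and that timestamps always have three colon-separated fields; both helper names are hypothetical.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert an HH:MM:SS string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_timestamp(total: int) -> str:
    """Convert whole seconds back to a zero-padded HH:MM:SS string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


offset = timestamp_to_seconds("00:24:15")
print(offset)
```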

Provider compatibility

Switch providers with av config setup. The pipeline adapts automatically.

Provider           Transcription   Vision / Chat       Embeddings
OpenAI (OAuth)     whisper-1       gpt-4-1             text-embedding-3-small
OpenAI (API key)   whisper-1       gpt-4-1             text-embedding-3-small
Anthropic          -               claude-sonnet-4-5   -
Gemini             -               gemini-2.5-flash    text-embedding-004

When a capability is unavailable, the pipeline skips that stage and warns. Use AV_OPENAI_API_KEY as a transcription fallback for non-OpenAI providers.

Get started

Three commands to searchable video.

quickstart
# Install
$ pip install pixelml-av
# Configure your provider
$ av config setup
# Index a video
$ av ingest meeting.mp4
# Search it
$ av search "action items"
Requires Python 3.11+ and FFmpeg