How VidSkill works

Four stages, one traceable chain from video frame to executable skill.

1. Watch

We open the video in a headless browser and capture frames at 1 fps. If audio is available, we transcribe it in real time. The video file itself never leaves its URL — we only index the frames and transcript we extract.
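The 1 fps sampling schedule can be sketched as follows — a minimal illustration, not VidSkill's actual capture code; the function name and signature are assumptions:

```python
def frame_timestamps(duration_s: float, fps: float = 1.0) -> list[float]:
    """Timestamps (in seconds) at which a frame would be captured."""
    step = 1.0 / fps
    t, out = 0.0, []
    while t < duration_s:
        out.append(round(t, 3))  # one frame per step until the video ends
        t += step
    return out

print(frame_timestamps(5))  # five capture points for a 5-second clip
```

At 1 fps, a ten-minute video yields 600 frames — enough to follow on-screen actions without indexing every frame.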

2. Extract observations

Every 30 seconds, Gemini 2.5 Pro bundles the frames and transcript and writes a list of timestamped, modality-tagged observations — quoting the presenter verbatim whenever they speak.
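One plausible shape for such an observation — the field names here are illustrative assumptions, not VidSkill's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    timestamp_s: float  # position in the source video, in seconds
    modality: str       # e.g. "visual", "audio", "screen_text"
    text: str           # verbatim quote when the presenter is speaking

obs = Observation(timestamp_s=92.0, modality="audio",
                  text="I always diff the config before deploying.")
```

Keeping the timestamp and modality on every observation is what makes the later citation chain possible.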

3. Consolidate into values

Claude processes the full observation list twice at different temperatures. Only the decision patterns that survive both runs become values — each a falsifiable claim about what the presenter does differently, backed by at least two citations.
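The consolidation rule can be sketched as a simple filter: a candidate value survives only if both runs produced it and it carries at least two citations. This is illustrative logic, not VidSkill's implementation:

```python
def consolidate(run_a: dict[str, list[float]],
                run_b: dict[str, list[float]]) -> dict[str, list[float]]:
    """Each run maps a claim to the observation timestamps it cites."""
    values = {}
    for claim in run_a.keys() & run_b.keys():      # survived both runs
        citations = sorted(set(run_a[claim]) | set(run_b[claim]))
        if len(citations) >= 2:                    # >= 2 citations required
            values[claim] = citations
    return values

a = {"diffs config before deploy": [92.0, 310.5], "uses dark mode": [12.0]}
b = {"diffs config before deploy": [92.0], "narrates shortcuts": [45.0]}
print(consolidate(a, b))  # only the claim both runs agree on survives
```

Requiring agreement across two temperatures filters out patterns the model only finds under one sampling regime.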

4. Compile into skills

Each value goes through three Claude passes: operationalize (what would an agent do?), steps (typed action DAG), checks (postconditions + value-adherence). Every skill is validated locally before it's admitted.
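The local validation gate can be sketched as two checks: the step graph must be a DAG, and the skill must carry at least one check. The structure below is an assumption for illustration, not VidSkill's skill format:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    value: str
    steps: dict[str, list[str]]  # step id -> ids of prerequisite steps
    checks: list[str]            # postconditions + value-adherence checks

def validate(skill: Skill) -> bool:
    if not skill.checks:
        return False             # every skill needs at least one check
    seen, done = set(), set()
    def acyclic(node: str) -> bool:  # DFS cycle detection
        if node in done:
            return True
        if node in seen:
            return False             # back edge means a cycle
        seen.add(node)
        ok = all(acyclic(dep) for dep in skill.steps.get(node, []))
        done.add(node)
        return ok
    return all(acyclic(s) for s in skill.steps)

skill = Skill(value="diff config before deploy",
              steps={"diff": [], "review": ["diff"], "deploy": ["review"]},
              checks=["deploy only ran after review passed"])
print(validate(skill))  # acyclic graph with a check: admitted
```

A skill that fails either check is rejected before it ever reaches an agent.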

Why it's trustworthy

Every paragraph in the brief, every step in a compiled skill, and every acceptance-criterion check links back to a timestamped observation in the source video. No hallucinated "here's what the video was about." Every claim is falsifiable.
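That guarantee amounts to one invariant: every citation must resolve to a real observation timestamp in the index. A minimal sketch, with hypothetical names:

```python
def fully_traced(claims: dict[str, list[float]],
                 observed: set[float]) -> bool:
    """True iff every claim cites only timestamps that were observed."""
    return all(c and all(t in observed for t in c) for c in claims.values())

observed = {12.0, 92.0, 310.5}
claims = {"diffs config before deploy": [92.0, 310.5]}
print(fully_traced(claims, observed))  # True: every citation resolves
```

A claim with no citations, or with a citation pointing nowhere, fails this check and is dropped.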

Try it with your video →