TLDW — Too Long; Didn't Watch

An AI content intelligence pipeline that watches 51 YouTube channels so I don't have to. It runs daily. The video at the top of this site is one of its outputs.

By the numbers

51 YouTube channels tracked

783+ videos processed

3 artifact types per video (transcript, summary, detailed)

Daily cron-driven pipeline

What it does

Every day, a scheduled job polls the YouTube Data API for new uploads across a curated set of strategy/AI/engineering channels. New videos are transcribed (Whisper, GPU-accelerated), classified, and run through two Claude models in sequence — one for a fast TLDW summary, one for a deeper analytical writeup. Output is committed to a git repo so every artifact is versioned, diffable, and immediately queryable by other Claude sessions.
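A minimal sketch of how the "new uploads" check could persist state between daily runs. The JSON state file, the function names, and the `video_id` / `published_at` keys are assumptions for illustration, not the pipeline's actual code; the YouTube Data API call itself is left out.

```python
import json
from pathlib import Path


def load_seen(state_path: Path) -> set[str]:
    """Return the IDs of videos already processed (empty on the first run)."""
    if not state_path.exists():
        return set()
    return set(json.loads(state_path.read_text(encoding="utf-8")))


def save_seen(state_path: Path, seen: set[str]) -> None:
    """Persist the seen set, sorted so the file diffs cleanly in git."""
    state_path.write_text(json.dumps(sorted(seen), indent=2), encoding="utf-8")


def select_new(uploads: list[dict], seen: set[str]) -> list[dict]:
    """Keep only uploads not processed yet, oldest first."""
    fresh = [u for u in uploads if u["video_id"] not in seen]
    return sorted(fresh, key=lambda u: u["published_at"])
```

Writing the seen set only after a video's artifacts are committed would mean a crashed run simply picks the same video up again the next day.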

Pipeline

YouTube Data API (polling)
    ↓ new video
Whisper transcription (faster-whisper, CUDA)
    ↓ transcript.md
Claude Haiku — fast summarization, watch recommendation, action items
    ↓ summary.md
Claude Sonnet — detailed analysis, themes, creator perspective, quotes
    ↓ detailed.md
Git commit per artifact · cross-channel search via Claude Code skill
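The stage sequence above can be sketched as an ordered list of (output file, producer) pairs where each stage is skipped if its file already exists. This is a hedged illustration of the idea: `run_pipeline` and the producer callables are hypothetical names, and the real transcription and model calls are stubbed out.

```python
from pathlib import Path
from typing import Callable

# A stage is (output filename, function that returns the file's text).
Stage = tuple[str, Callable[[Path], str]]


def run_pipeline(video_dir: Path, stages: list[Stage]) -> list[str]:
    """Run stages in order, skipping any whose output already exists.

    Returns the filenames written on this run.
    """
    video_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, produce in stages:
        target = video_dir / filename
        if target.exists():
            continue  # idempotent: a re-run after a crash resumes here
        target.write_text(produce(video_dir), encoding="utf-8")
        written.append(filename)
    return written
```

Each producer receives the video directory, so a later stage can read `transcript.md` from disk rather than relying on in-memory state from an earlier one.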

Output structure

For every video, three markdown files land in a per-channel directory:

transcript.md — frontmatter-tagged full transcript with video ID, channel ID, published date, YouTube URL.
summary.md — one-paragraph TLDW, watch/skip recommendation, key insights (3–5 bullets), action items.
detailed.md — multi-section deep analysis: core thesis, key points, tools mentioned, problems highlighted, notable quotes, creator perspective.
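As a rough illustration of the frontmatter on transcript.md, here is one way the file could be rendered. The field names (`video_id`, `channel_id`, `published`, `url`) and the function name are assumptions; only the four pieces of metadata themselves come from the description above.

```python
import json


def render_transcript(video_id: str, channel_id: str, published: str, text: str) -> str:
    """Build transcript.md: a frontmatter block followed by the transcript body."""
    meta = {
        "video_id": video_id,
        "channel_id": channel_id,
        "published": published,
        "url": f"https://www.youtube.com/watch?v={video_id}",
    }
    # json.dumps emits double-quoted strings, which are also valid YAML scalars.
    header = [f"{key}: {json.dumps(value)}" for key, value in meta.items()]
    return "\n".join(["---", *header, "---", "", text.strip(), ""])
```

Quoting every value keeps IDs that begin with a dash or contain a colon from being misread by a YAML parser.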

The directory layout is deliberately filesystem-native (channels/{channel-slug}/{date}_{video-id}/) so any Claude Code session can grep, glob, or summarize across the corpus without hitting a database.
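To show why the filesystem-native layout is convenient, here is a small sketch that builds a video's directory path and does a grep-style search across the corpus with nothing but `pathlib`. The helper names are hypothetical, and the sketch assumes the date part contains no underscore (for example an ISO date such as 2024-05-01).

```python
from pathlib import Path


def video_dir(root: Path, channel_slug: str, date: str, video_id: str) -> Path:
    """channels/{channel-slug}/{date}_{video-id}/ under the repo root."""
    return root / "channels" / channel_slug / f"{date}_{video_id}"


def split_dir_name(name: str) -> tuple[str, str]:
    """Recover (date, video_id); split once, since video IDs may contain '_'."""
    date, video_id = name.split("_", 1)
    return date, video_id


def search_corpus(root: Path, term: str, artifact: str = "summary.md") -> list[Path]:
    """Case-insensitive substring search over one artifact type, all channels."""
    hits = []
    for path in sorted(root.glob(f"channels/*/*/{artifact}")):
        if term.lower() in path.read_text(encoding="utf-8").lower():
            hits.append(path)
    return hits
```

For example, `search_corpus(Path("."), "agent legibility")` would return the summary files that mention the phrase, one path per video.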

Why it matters (the wedge)

Most "AI summarizer" tools are one-shot consumer apps. TLDW is a production pipeline: it has retry/backoff, state persistence, dual-model cost routing (Haiku for cheap summarization, Sonnet for analysis), and a cross-channel search skill that lets me ask "what did the strategy channels say about agent legibility this month?" without re-reading 50 videos.

This is the same operational discipline that managed 100+ Windows Mobile builds — branch isolation, idempotent steps, observable failures — applied to LLM workflows.
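The retry/backoff and dual-model routing mentioned above could look something like the sketch below. It is an assumption-laden illustration: `MODEL_FOR_TASK` uses the tier labels "haiku" and "sonnet" rather than real model identifiers, `with_backoff` is a hypothetical helper, and a production version would catch only transient errors (rate limits, timeouts) instead of every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Route by task: the cheap tier for summaries, the stronger tier for analysis.
MODEL_FOR_TASK = {"summary": "haiku", "detailed": "sonnet"}


def with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on failure with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the failure surface
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting, and re-raising on the last attempt keeps the failure visible to whatever supervises the job.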

Tech

YouTube Data API for channel polling
Whisper (faster-whisper, CUDA) for transcription
Anthropic Claude API — Haiku and Sonnet, routed by task complexity
Claude Code skill nate-transcript-digest for cross-channel synthesis
Git as the storage and audit layer
Cron / systemd timers on the homelab fleet