Claude Token Vampire – Monitor Claude Code token usage

Claude Token Vampire monitors your Claude Code token usage in real time. Anthropic shows only a percentage bar for your current 5-hour session quota, with no token counts behind it. Claude Token Vampire shows the actual numbers.

Free. Open source. Runs offline. Uses zero tokens.

GitHub repository  |  Download


Why it exists

Anthropic’s usage page shows a single progress bar: “Current session — Resets in X hours.” No token counts, no breakdown, no cost, no per-project history. Just a bar and a countdown.

The progress bar is honest about the reset (the 5-hour session really does hard-reset at that timestamp). What it is not honest about is the ceiling: Anthropic never publishes the exact token cap, so 72% could mean you have 10 minutes or 2 hours of heavy use left. That is the gap Claude Token Vampire fills.

How Anthropic’s 5-hour window actually works

Your first message starts a 5-hour clock. Every subsequent message counts against the same session budget. When the clock hits 5 hours the entire counter hard-resets to zero — not gradually, all at once. Your next message after that starts a fresh 5-hour session.

Anthropic’s own support page confirms this: “if you hit your limit at 2 PM, your next allocation begins at 7 PM, then 12 AM, and so on” (source). The Claude Code statusline exposes a single rate_limits.five_hour.resets_at timestamp (statusline docs) — one reset moment per session, not continuous expiry.
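The session rule described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation (which is written in Delphi): the function name `current_session` and the plain list of message timestamps are hypothetical, and the sketch assumes the session starts exactly at the first message's timestamp, with no rounding.

```python
from datetime import datetime, timedelta

SESSION_LENGTH = timedelta(hours=5)

def current_session(message_times, now):
    """Return (session_start, resets_at) for the session active at `now`.

    Returns None when no session is active, i.e. there are no messages yet
    or the last session has already hard-reset.
    """
    session_start = None
    for ts in sorted(message_times):
        if ts > now:
            break  # ignore messages that lie in the future relative to `now`
        # The first message, or the first message after a reset, opens a session.
        if session_start is None or ts >= session_start + SESSION_LENGTH:
            session_start = ts
    if session_start is None or now >= session_start + SESSION_LENGTH:
        return None
    return session_start, session_start + SESSION_LENGTH
```

Messages sent inside a running session never move the reset time; only the first message after a reset does.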

Correction note (2026-04-24): an earlier version of this page described the quota as a true sliding window where individual tokens “drip off” after 5 hours. That was wrong. The window is session-based. The rest of the page has been updated accordingly.

What’s actually missing from Anthropic’s UI

The reset countdown is fine. What you cannot see on the usage page:

  • How many tokens you’ve actually burned (only a percentage)
  • When inside the 5-hour session they were burned (an early big burst and a steady drip look identical)
  • What kind of tokens (cache reads are cheap; cache creation is expensive)
  • Per-project split (one heavy project can eat a quiet day’s budget)
  • Cache status (5-minute and 1-hour tiers: is your next message about to cost you a rebuild?)
  • A hard dollar number for the session so far

A single progress bar crushes all of that into one number.

What Claude Token Vampire shows instead

A per-15-minute bar chart spanning the full 5-hour session (start → +1h → +2h → +3h → +4h → end (reset)), with:

  • Exact token counts by type (input, output, cache creation, cache read)
  • A configurable ceiling so the bar chart’s colours actually mean something (green → yellow → orange → red)
  • A real countdown to the session hard reset
  • A cost estimate in USD with editable per-million rates
  • Cache warmth per tier (5-minute / 1-hour), with a warning before you pay for a full rebuild
  • Per-project breakdown so you know which project is eating the budget

[Screenshot: Claude Token Vampire main window showing the 5-hour session window]

Comparison

| Feature | claude.ai/settings/usage | Claude Token Vampire |
|---|---|---|
| Token counts | Hidden behind a vague percentage | Exact numbers by type |
| Usage over time | Flattened into one bar | 5h bar chart, 15-minute buckets |
| Reset timer | Correct, but the ceiling it resets against is hidden | Same hard countdown, plus your configured ceiling so the percentage means something |
| Per-project split | None | Every project tracked separately |
| Cache status | Not shown | Hit rate + warm/cold warning per tier |
| Cost estimate | Not shown | Configurable per-token rates |
| Chart span | None | Current 5-hour session (20 bars) |
| Works offline | Needs browser + login | Local files only. Zero tokens. Zero API calls. |

Anthropic tells you that you’re using tokens. Claude Token Vampire tells you how many, where, how fast, what kind, and how long until the session resets.

Related reading: The 5-hour mirage — Anthropic’s moving goalposts.

What it does

It puts you in control of your Claude Code tokens:

  • Tracks all billable token types: input, output, cache creation, cache reads
  • Shows the current 5-hour session with a per-bucket bar chart spanning session_start → session_end
  • Color-coded bars: green → yellow → orange → red as you approach your limit
  • Estimates cost (configurable $/1M token rates)
  • Shows cache hit rate and warns when a cache tier goes cold (after 5 minutes or 1 hour idle)
  • Counts down until the session hard-resets (all tokens reset at once, not gradually)
  • Runs quietly in the system tray — click the icon to show/hide
  • USES 0 TOKENS — runs entirely offline, no API calls, no Claude queries

Views

  • All Projects — combined current-session view across everything
  • Per Project — same chart broken down by project

User manual

All Projects tab — Stats panel

| Label | Meaning |
|---|---|
| Total tokens (session) | Sum of all tokens (input + output + cache creation + cache read) in the current 5-hour session. Shown as used / limit. |
| Messages | Number of assistant responses in the current session. |
| Cache hit rate | cache_read / (cache_read + input). Higher = cheaper. 99%+ is normal for long sessions. |
| Estimated cost | USD estimate based on token counts and per-million rates (configurable in Settings). |
| Reset in | Minutes until the current session hard-resets. When it hits zero, the entire counter returns to zero, all at once, not per-message. |
| Cache status | Two cache tiers. 5m cache (subagents/tools): warm if last message < 5 min ago. 1h cache (main conversation): warm if last message < 1 hour ago. Orange = 5m cold, 1h still warm. Red = both cold (full rebuild on next message). |
| Web searches / fetches | Count of web_search and web_fetch tool calls in the session. |
| Cache 1h / 5m | Breakdown of cache creation tokens by tier: 1-hour ephemeral vs 5-minute ephemeral. Display only; already included in the total. |
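As a rough illustration of the two formulas in the stats panel (session total and cache hit rate), here is a minimal Python sketch. The `TokenCounts` class and its field names are hypothetical and chosen for readability; they are not taken from the tool's source.

```python
from dataclasses import dataclass

@dataclass
class TokenCounts:
    """Token totals for one session (hypothetical container for illustration)."""
    input: int = 0
    output: int = 0
    cache_creation: int = 0
    cache_read: int = 0

    @property
    def total(self) -> int:
        # Total tokens = input + output + cache creation + cache read
        return self.input + self.output + self.cache_creation + self.cache_read

def cache_hit_rate(counts: TokenCounts) -> float:
    """cache_read / (cache_read + input), or 0.0 when there is no input at all."""
    denominator = counts.cache_read + counts.input
    return counts.cache_read / denominator if denominator else 0.0
```

For example, 99,000 cache-read tokens against 1,000 fresh input tokens gives a hit rate of 0.99, which is the "99%+" range the table calls normal for long sessions.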

All Projects tab — Chart

Each bar = one 15-minute bucket. The full 5-hour session has 20 bars, left edge at session_start, right edge at session_end (reset).

Y-axis scales to your per-bucket budget (limit / 20). So if your limit is 88M, the Y-axis tops out at 4.4M per bar.
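The bucketing and Y-axis arithmetic can be sketched as follows. This is an assumption-laden illustration rather than the tool's code: `bucket_tokens`, `y_axis_max`, and the `(timestamp, tokens)` event tuples are hypothetical names and shapes.

```python
from datetime import datetime, timedelta

BUCKET = timedelta(minutes=15)
BUCKETS_PER_SESSION = 20  # 5 hours / 15 minutes

def bucket_tokens(events, session_start):
    """Sum tokens into 20 fifteen-minute buckets starting at session_start.

    `events` is an iterable of (timestamp, token_count) pairs. Events that
    fall outside the 5-hour session are ignored.
    """
    buckets = [0] * BUCKETS_PER_SESSION
    for ts, tokens in events:
        index = (ts - session_start) // BUCKET  # timedelta // timedelta -> int
        if 0 <= index < BUCKETS_PER_SESSION:
            buckets[index] += tokens
    return buckets

def y_axis_max(limit):
    """Per-bucket budget: the session limit spread evenly over 20 bars."""
    return limit / BUCKETS_PER_SESSION
```

With the default limit of 88,000,000, `y_axis_max` returns 4,400,000, matching the 4.4M figure above.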

Bar colors (when limit is set):

  • Green: < 50% of Y-axis max
  • Yellow: 50–75%
  • Orange: 75–90%
  • Red: ≥ 90%

Bar colors (no limit set): blue (auto-scale mode). A value label appears above each bar showing the token count for that 15-minute interval.
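The colour thresholds can be expressed as a small function. This is a sketch under stated assumptions: the name `bar_color` is hypothetical, and because the page does not say which side the exact 50% and 75% boundaries fall on, the sketch treats each lower bound as inclusive.

```python
BUCKETS_PER_SESSION = 20

def bar_color(tokens_in_bucket, limit):
    """Pick a bar colour from the bucket's share of the per-bucket budget."""
    if limit <= 0:
        return "blue"  # no limit set: auto-scale mode
    ratio = tokens_in_bucket / (limit / BUCKETS_PER_SESSION)
    if ratio >= 0.90:
        return "red"
    if ratio >= 0.75:
        return "orange"  # assumption: exactly 75% counts as orange
    if ratio >= 0.50:
        return "yellow"  # assumption: exactly 50% counts as yellow
    return "green"
```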

Per Project tab

  • Project list: active projects (with token counts) and inactive known projects (gray, separated)
  • Per-project stats: tokens, messages, cache hit rate, cost, reset-in, cache status
  • Per-project bar chart (same renderer, filtered data)
  • Selection preserved across automatic refreshes
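Conceptually, the per-project view is the same session data grouped by project. A minimal sketch, assuming hypothetical `(project_name, tokens)` pairs as input (the tool's real data model may differ):

```python
from collections import defaultdict

def tokens_by_project(events):
    """Group (project_name, tokens) pairs and sort projects by usage, heaviest first."""
    totals = defaultdict(int)
    for project, tokens in events:
        totals[project] += tokens
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```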

Settings

[Screenshot: Claude Token Vampire settings dialog]

| Setting | Default | Notes |
|---|---|---|
| Max tokens (5h session) | 88,000,000 | Your estimated session ceiling. Anthropic does not publish exact caps; 88M is a community estimate of the Max 5x cap. Set to 0 if unknown (chart switches to auto-scale blue). |
| Cost: input tokens ($/1M) | 3.00 | Anthropic’s price per 1M input tokens. |
| Cost: output tokens ($/1M) | 15.00 | Per 1M output tokens. |
| Cost: cache read ($/1M) | 0.30 | Per 1M cache-read tokens. |
| Cost: cache creation ($/1M) | 3.75 | Per 1M cache-creation tokens. |
| Refresh interval (seconds) | 60 | How often to re-scan session files. Minimum 10. |
| Start minimized to tray | off | Hide window on app startup. |
| Start with Windows | off | Launch at Windows login. |
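To make the cost estimate concrete, here is a short sketch that applies the default per-million rates from the table above. The function name `estimate_cost` and the dictionary keys are hypothetical, and the result is only an approximation of real billing.

```python
# Default rates in USD per 1,000,000 tokens, as listed in the settings table.
DEFAULT_RATES = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_creation": 3.75,
}

def estimate_cost(tokens, rates=DEFAULT_RATES):
    """Estimate USD cost from a dict of token counts keyed by token type."""
    return sum(tokens.get(kind, 0) * rate / 1_000_000 for kind, rate in rates.items())
```

Cache reads are priced at a tenth of fresh input in these defaults, which is why a high cache hit rate lowers the estimate so sharply.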

Tips

  • “CACHE COLD” warning: Two tiers. 5-min cache (subagents/tools): expires after 5 min idle — step away briefly and subagent calls get more expensive. 1-hour cache (main conversation): expires after 60 min idle — the full prompt rebuild only hits when you’ve been away for more than an hour. Orange = subagent cache cold, main cache still warm. Red = both cold.
  • Reset-in countdown: When this hits 0, the whole session resets and your next message starts a new 5-hour window with full quota.
  • Per-bucket chart: Helps spot usage spikes. A single tall bar suggests a big refactor or long conversation in that 15-minute window.
  • Cost estimate: Approximate. Real billing may differ. Useful for relative comparison.
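The two-tier cache warning can be sketched as a function of idle time since the last message. The name `cache_status` and the tuple it returns are hypothetical; the 5-minute and 1-hour thresholds and the orange/red meanings come from the description above.

```python
from datetime import timedelta

def cache_status(idle):
    """Return (five_min_warm, one_hour_warm, warning) for a given idle duration.

    warning is None while both tiers are warm, "orange" when only the
    5-minute tier is cold, and "red" when both tiers are cold.
    """
    five_min_warm = idle < timedelta(minutes=5)
    one_hour_warm = idle < timedelta(hours=1)
    if five_min_warm:
        warning = None
    elif one_hour_warm:
        warning = "orange"
    else:
        warning = "red"
    return five_min_warm, one_hour_warm, warning
```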

Install

Download from the GitHub repository.

  1. Copy the extracted folder somewhere permanent (e.g. C:\Tools\ClaudeTokenVampire)
  2. Double-click Install.cmd
  3. In Claude Code, run: /reload-plugins

Launch (two ways):

  • Fast: Type “launch vampire”, “start vampire”, or “token monitor” in Claude Code (instant, bypasses the model)
  • Skill: Type /claudetokenvampire:monitor in Claude Code (~5 sec, loads full context)

Requirements

  • Windows 10/11
  • Claude Code (no API keys needed)
  • No external libraries needed

Platform support

| Platform | Status |
|---|---|
| Windows | Available now |
| macOS | Coming soon |

The codebase uses FMX (FireMonkey), which is cross-platform. The macOS port mainly requires swapping %USERPROFILE%\.claude\ for ~/.claude/.
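The path difference between the two platforms is small. As an illustration in Python (the tool itself is written in Delphi, and `claude_data_dir` is a hypothetical name), resolving the data folder might look like this:

```python
import os
import sys
from pathlib import Path

def claude_data_dir():
    """Return the .claude folder: %USERPROFILE%\\.claude on Windows, ~/.claude elsewhere."""
    if sys.platform == "win32":
        # Fall back to Path.home() if USERPROFILE is not set.
        return Path(os.environ.get("USERPROFILE", str(Path.home()))) / ".claude"
    return Path.home() / ".claude"
```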

Safety

  • Opens files in read-only shared mode — never interferes with Claude Code
  • Totally local
  • No data is sent anywhere
  • No tokens are wasted
  • No API key required

Roadmap

  • Read Anthropic’s authoritative rate_limits.five_hour.resets_at via a Claude Code statusline bridge (official, no scraping)
  • Weekly (7-day) cap tracking, alongside the session view
  • Show window in “minimal” mode (only critical info)
  • Beep when approaching maximum quota
  • Warning when using Claude during peak hours
  • Rate-limit prediction: “at this pace, limit in 47 min”
  • Budget enforcement via hooks
  • Real-time activity indicator — tray icon color change when Claude is active
  • Tool-call analytics — “Top 10 tool calls” stat
  • Session search — keyword search across sessions
  • Support for multiple computers (same account, two machines)

Stars are free

If you find Claude Token Vampire useful, drop a star on GitHub. It encourages further development.


Written in Delphi 13 (FMX). Part of my wider collection of open-source Delphi libraries and tools.

If AI-assisted Delphi development interests you, see my book “Delphi in all its glory – AI-Assisted Development for Delphi”.
