Claude Token Vampire - Monitor Claude Code token usage

Claude Token Vampire monitors your Claude Code token usage in real time. Anthropic doesn’t show you how much of your current 5-hour session quota you’ve consumed — Claude Token Vampire does.

Free. Open source. Runs offline. Uses zero tokens.

GitHub repository | Download

Why it exists

Anthropic’s usage page shows a single progress bar: “Current session — Resets in X hours.” No token counts, no breakdown, no cost, no per-project history. Just a bar and a countdown.

The progress bar is honest about the reset (the 5-hour session really does hard-reset at that timestamp). What it is not honest about is the ceiling: Anthropic never publishes the exact token cap, so 72% could mean you have 10 minutes or 2 hours of heavy use left. That is the gap Claude Token Vampire fills.

How Anthropic’s 5-hour window actually works

Your first message starts a 5-hour clock. Every subsequent message counts against the same session budget. When the clock hits 5 hours the entire counter hard-resets to zero — not gradually, all at once. Your next message after that starts a fresh 5-hour session.

Anthropic’s own support page confirms this: “if you hit your limit at 2 PM, your next allocation begins at 7 PM, then 12 AM, and so on” (source). The Claude Code statusline exposes a single rate_limits.five_hour.resets_at timestamp (statusline docs) — one reset moment per session, not continuous expiry.

Correction note (2026-04-24): an earlier version of this page described the quota as a true sliding window where individual tokens “drip off” after 5 hours. That was wrong. The window is session-based. The rest of the page has been updated accordingly.

What’s actually missing from Anthropic’s UI

The reset countdown is fine. What you cannot see on the usage page:

How many tokens you’ve actually burned (only a percentage)
When inside the 5-hour session they were burned (early big burst vs steady drip looks identical)
What kind of tokens (cache reads are cheap; cache creation is expensive)
Per-project split (one heavy project can eat a quiet day’s budget)
Cache status (5-minute and 1-hour tiers — about to cost you a rebuild?)
A hard dollar number for the session so far

A single progress bar crushes all of that into one number.

What Claude Token Vampire shows instead

A per-15-minute bar chart spanning the full 5-hour session (start → +1h → +2h → +3h → +4h → end (reset)), with:

Exact token counts by type (input, output, cache creation, cache read)
A configurable ceiling so the bar chart’s colours actually mean something (green → yellow → orange → red)
A real countdown to the session hard reset
A cost estimate in USD with editable per-million rates
Cache warmth per tier (5-minute / 1-hour), with a warning before you pay for a full rebuild
Per-project breakdown so you know which project is eating the budget

Comparison

	claude.ai/settings/usage	Claude Token Vampire
Token counts	Hidden behind a vague percentage	Exact numbers by type
Usage over time	Flattened into one bar	5h bar chart, 15-minute buckets
Reset timer	Correct, but the ceiling it resets against is hidden	Same hard countdown, plus your configured ceiling so the percentage means something
Per-project split	None	Every project tracked separately
Cache status	Not shown	Hit rate + warm/cold warning per tier
Cost estimate	Not shown	Configurable per-token rates
Chart span	None	Current 5-hour session (20 bars)
Works offline	Needs browser + login	Local files only. Zero tokens. Zero API calls.

Anthropic tells you that you’re using tokens. Claude Token Vampire tells you how many, where, how fast, what kind, and how long until the session resets.

What it does

It puts you in control of your Claude Code tokens:

Tracks all billable token types: input, output, cache creation, cache reads
Shows the current 5-hour session with a per-bucket bar chart spanning session_start → session_end
Color-coded bars: green → yellow → red as you approach your limit
Estimates cost (configurable $/1M token rates)
Shows cache hit rate and warns when the 5-minute cache gap expires
Counts down until the session hard-resets (all tokens reset at once, not gradually)
Runs quietly in the system tray — click the icon to show/hide
USES 0 TOKENS — runs entirely offline, no API calls, no Claude queries

Views

All Projects — combined current-session view across everything
Per Project — same chart broken down by project

User manual

All Projects tab — Stats panel

Label	Meaning
Total tokens (session)	Sum of all tokens (input + output + cache creation + cache read) in the current 5-hour session. Shown as `used / limit`.
Messages	Number of assistant responses in the current session.
Cache hit rate	`cache_read / (cache_read + input)`. Higher = cheaper. 99%+ is normal for long sessions.
Estimated cost	USD estimate based on token counts and per-million rates (configurable in Settings).
Reset in	Minutes until the current session hard-resets. When it hits zero, the entire counter returns to zero — all at once, not per-message.
Cache status	Two cache tiers. 5m cache (subagents/tools) — warm if last message < 5 min ago. 1h cache (main conversation) — warm if last message < 1 hour ago. Orange = 5m cold, 1h still warm. Red = both cold (full rebuild on next message).
Web searches / fetches	Count of web_search and web_fetch tool calls in the session.
Cache 1h / 5m	Breakdown of cache creation tokens by tier: 1-hour ephemeral vs 5-minute ephemeral. Display only — already included in the total.

All Projects tab — Chart

Each bar = one 15-minute bucket. The full 5-hour session has 20 bars, left edge at session_start, right edge at session_end (reset).

Y-axis scales to your per-bucket budget (limit / 20). So if your limit is 88M, the Y-axis tops out at 4.4M per bar.

Bar colors (when limit is set):

Green: < 50% of Y-axis max
Yellow: 50–75%
Orange: 75–90%
Red: ≥ 90%

Bar colors (no limit set): blue (auto-scale mode). A value label appears above each bar showing the token count for that 15-minute interval.

Per Project tab

Project list: active projects (with token counts) and inactive known projects (gray, separated)
Per-project stats: tokens, messages, cache hit rate, cost, reset-in, cache status
Per-project bar chart (same renderer, filtered data)
Selection preserved across automatic refreshes

Settings

Setting	Default	Notes
Max tokens (5h session)	88,000,000	Your estimated session ceiling. Anthropic does not publish exact caps; 88M is a community estimate of the Max 5x cap. Set to 0 if unknown (chart switches to auto-scale blue).
Cost: input tokens ($/1M)	3.00	Anthropic’s price per 1M input tokens.
Cost: output tokens ($/1M)	15.00	Per 1M output tokens.
Cost: cache read ($/1M)	0.30	Per 1M cache-read tokens.
Cost: cache creation ($/1M)	3.75	Per 1M cache-creation tokens.
Refresh interval (seconds)	60	How often to re-scan session files. Minimum 10.
Start minimized to tray	off	Hide window on app startup.
Start with Windows	off	Launch at Windows login.

Tips

“CACHE COLD” warning: Two tiers. 5-min cache (subagents/tools): expires after 5 min idle — step away briefly and subagent calls get more expensive. 1-hour cache (main conversation): expires after 60 min idle — the full prompt rebuild only hits when you’ve been away for more than an hour. Orange = subagent cache cold, main cache still warm. Red = both cold.
Reset-in countdown: When this hits 0, the whole session resets and your next message starts a new 5-hour window with full quota.
Per-bucket chart: Helps spot usage spikes. A single tall bar suggests a big refactor or long conversation in that 15-minute window.
Cost estimate: Approximate. Real billing may differ. Useful for relative comparison.

Install

Download from the GitHub repository.

Copy the extracted folder somewhere permanent (e.g. C:\Tools\ClaudeTokenVampire)
Double-click Install.cmd
In Claude Code, run: /reload-plugins

Launch (two ways):

Fast: Type launch vampire, start vampire, or token monitor in Claude Code (instant, bypasses the model)
Skill: Type /claudetokenvampire:monitor in Claude Code (~5 sec, loads full context)

Requirements

Windows 10/11
Claude Code (no API keys needed)
No external libraries needed

Platform support

Platform	Status
Windows	Available now
macOS	Coming soon

The codebase uses FMX (FireMonkey), which is cross-platform. The macOS port mainly requires swapping %USERPROFILE%\.claude\ for ~/.claude/.

Safety

Opens files in read-only shared mode — never interferes with Claude Code
Totally local
No data is sent anywhere
No tokens are wasted
No API key required

Roadmap

Read Anthropic’s authoritative rate_limits.five_hour.resets_at via a Claude Code statusline bridge (official, no scraping)
Weekly (7-day) cap tracking, alongside the session view
Show window in “minimal” mode (only critical info)
Beep when approaching maximum quota
Warning when using Claude during peak hours
Rate-limit prediction: “at this pace, limit in 47 min”
Budget enforcement via hooks
Real-time activity indicator — tray icon color change when Claude is active
Tool-call analytics — “Top 10 tool calls” stat
Session search — keyword search across sessions
Support for multiple computers (same account, two machines)

Sources

Anthropic Support: Manage extra usage for paid Claude plans
Anthropic Support: Usage limit best practices
Anthropic Support: What is the Max plan?
Claude Code: Customize your status line — documents rate_limits.five_hour.resets_at and used_percentage

Stars are free

If you find Claude Token Vampire useful, drop a star on GitHub. It encourages further development.

Written in Delphi 13 (FMX). Part of my wider collection of open-source Delphi libraries and tools.

If AI-assisted Delphi development interests you, see my book “Delphi in all its glory – AI-Assisted Development for Delphi”.