Claude Token Vampire - Monitor Claude Code token usage

Claude Token Vampire monitors your Claude Code token usage in real time. Anthropic doesn’t show you how much of your current 5-hour session quota you’ve consumed — Claude Token Vampire does.

Free. Open source. Runs offline. Uses zero tokens.

GitHub repository | Download | Vote on next feature

Why it exists

Anthropic’s usage page shows a single progress bar: “Current session — Resets in X hours.” No token counts, no breakdown, no cost, no per-project history. Just a bar and a countdown.

The progress bar is honest about the reset (the 5-hour session really does hard-reset at that timestamp). What it is not honest about is the ceiling: Anthropic never publishes the exact token cap, so 72% could mean you have 10 minutes or 2 hours of heavy use left. That is the gap Claude Token Vampire fills.

How Anthropic’s 5-hour window actually works

Your first message starts a 5-hour clock. Every subsequent message counts against the same session budget. When the clock hits 5 hours the entire counter hard-resets to zero — not gradually, all at once. Your next message after that starts a fresh 5-hour session.

Anthropic’s own support page confirms this: “if you hit your limit at 2 PM, your next allocation begins at 7 PM, then 12 AM, and so on” (source). The Claude Code statusline exposes a single rate_limits.five_hour.resets_at timestamp (statusline docs) — one reset moment per session, not continuous expiry.

Correction note (2026-04-24): an earlier version of this page described the quota as a true sliding window where individual tokens “drip off” after 5 hours. That was wrong. The window is session-based. The rest of the page has been updated accordingly.

What’s actually missing from Anthropic’s UI

The reset countdown is fine. What you cannot see on the usage page:

How many tokens you’ve actually burned (only a percentage)
When inside the 5-hour session they were burned (early big burst vs steady drip looks identical)
What kind of tokens (cache reads are cheap; cache creation is expensive)
Per-project split (one heavy project can eat a quiet day’s budget)
Cache status (5-minute and 1-hour tiers — about to cost you a rebuild?)
A hard dollar number for the session so far
The 7-day cap alongside the 5-hour cap (Anthropic enforces both)
A pace projection (at the current burn rate, when do you hit the cap?)

A single progress bar crushes all of that into one number.

What Claude Token Vampire shows instead

A per-15-minute bar chart spanning the full 5-hour session (start → +1h → +2h → +3h → +4h → end (reset)), with:

Exact token counts by type (input, output, cache creation, cache read)
A configurable ceiling so the bar chart’s colours actually mean something (green → yellow → orange → red)
A real countdown to the session hard reset
A cost estimate in USD with editable per-million rates
Cache warmth per tier (5-minute / 1-hour), with a warning before you pay for a full rebuild
Per-project breakdown so you know which project is eating the budget
7-day rolling tally alongside the session number, so the weekly cap doesn’t ambush you
Pace projection — “at this rate you’ll hit the cap in 47 min” when usage is on track for an early lockout
Peak-hours warning — alerts you when you’re working during Anthropic’s known high-load window (12:00–18:00 UTC), where rate-limit pressure is reportedly higher

Comparison

	claude.ai/settings/usage	Claude Token Vampire
Token counts	Hidden behind a vague percentage	Exact numbers by type
Usage over time	Flattened into one bar	5h bar chart, 15-minute buckets
Reset timer	Correct, but the ceiling it resets against is hidden	Same hard countdown, plus your configured ceiling so the percentage means something
Per-project split	None	Every project tracked separately
Cache status	Not shown	Hit rate + warm/cold warning per tier
Cost estimate	Not shown	Configurable per-token rates
7-day cap	Not shown	Tracked alongside the session number
Pace projection	None	“At this rate, cap in X min”
Chart span	None	Current 5-hour session (20 bars)
Works offline	Needs browser + login	Local files only. Zero tokens. Zero API calls.

Anthropic tells you that you’re using tokens. Claude Token Vampire tells you how many, where, how fast, what kind, and how long until the session resets.

What it does

It puts you in control of your Claude Code tokens:

Tracks all billable token types: input, output, cache creation, cache reads
Shows the current 5-hour session with a per-bucket bar chart spanning session_start → session_end
Tracks the 7-day rolling cap alongside the session, with a configurable weekly limit
Color-coded bars: green → yellow → red as you approach your limit
Estimates cost (configurable $/1M token rates)
Shows cache hit rate and warns when the 5-minute cache gap expires
Counts down until the session hard-resets (all tokens reset at once, not gradually)
Projects pace: “at this rate, you’ll hit the cap in N minutes”
Flags Anthropic peak hours (12:00–18:00 UTC), when rate-limit tightness is reported by users
Lets you add manual entries for token usage from other machines or the claude.ai web interface (helps when one quota covers multiple devices)
Runs quietly in the system tray — click the icon to show/hide. Choose between a solid status square or a 5-hour sparkline icon.
USES 0 TOKENS — runs entirely offline, no API calls, no Claude queries

Views

All Projects — combined current-session view across everything
Per Project — same chart broken down by project

User manual

All Projects tab — Stats panel

Label	Meaning
Total tokens (session)	Sum of all tokens (input + output + cache creation + cache read) in the current 5-hour session. Shown as `used / limit`. When the burn rate would hit the cap before the session ends, also shows `(time) to cap at pace`.
Total tokens (7d)	True sliding 7-day total. Shown raw if no weekly cap is configured, or as `used / cap (%)` when you set `MaxTokensWeekly` in Settings.
Messages	Number of assistant responses in the current session.
Cache hit rate	`cache_read / (cache_read + input)`. Higher = cheaper. 99%+ is normal for long sessions.
Estimated cost	USD estimate based on token counts and per-million rates (configurable in Settings).
Reset in	Minutes until the current session hard-resets. When it hits zero, the entire counter returns to zero — all at once, not per-message.
Cache status	Two cache tiers. 5m cache (subagents/tools) — warm if last message < 5 min ago. 1h cache (main conversation) — warm if last message < 1 hour ago. Orange = 5m cold, 1h still warm. Red = both cold (full rebuild on next message).
Web searches / fetches	Count of web_search and web_fetch tool calls in the session.
Cache 1h / 5m	Breakdown of cache creation tokens by tier: 1-hour ephemeral vs 5-minute ephemeral. Display only — already included in the total.

All Projects tab — Chart

Each bar = one 15-minute bucket (configurable 2–60 minutes per bar). The full 5-hour session has 20 bars at the default width, left edge at session_start, right edge at session_end (reset).

Y-axis scales to your per-bucket budget (limit / 20). So if your limit is 88M, the Y-axis tops out at 4.4M per bar. The Y-axis floor (MinYAxis) ratchets upward as your usage grows so the scale stays sensible.

Bar colors (when limit is set):

Green: < 50% of Y-axis max
Yellow: 50–75%
Orange: 75–90%
Red: ≥ 90%

Bar colors (no limit set): blue (auto-scale mode). A value label appears above each bar showing the token count for that 15-minute interval.

Per Project tab

Project list: active projects (with token counts) and inactive known projects (gray, separated)
Per-project stats: tokens, messages, cache hit rate, cost, reset-in, cache status
Per-project bar chart (same renderer, filtered data)
Selection preserved across automatic refreshes

Manual entries

The Add External toolbar button lets you log token usage that didn’t come from Claude Code on this machine — for example, sessions from claude.ai, the Anthropic console, or another computer running on the same account. Manual entries fold into the session and 7-day totals exactly like real entries, so the limits stay accurate when one quota covers more than one device.

Settings

Setting	Default	Notes
Max tokens (5h session)	88,000,000	Your estimated session ceiling. Anthropic does not publish exact caps; 88M is a community estimate of the Max 5x cap. Set to 0 if unknown (chart switches to auto-scale blue).
Max tokens (7-day)	0 (= unknown)	The 7-day rolling cap. Anthropic also doesn’t publish this number. Set to 0 to display raw weekly tokens, or enter your own estimate to get a percentage and color coding.
Cost: input tokens ($/1M)	3.00	Anthropic’s price per 1M input tokens.
Cost: output tokens ($/1M)	15.00	Per 1M output tokens.
Cost: cache read ($/1M)	0.30	Per 1M cache-read tokens.
Cost: cache creation ($/1M)	3.75	Per 1M cache-creation tokens.
Refresh interval (seconds)	60	How often to re-scan session files. Minimum 10.
Bucket width (minutes)	15	Width of each chart bar. Range 2–60. Smaller = more, narrower bars; larger = fewer, wider bars.
Y-axis minimum	10,000,000	Y-axis floor. Ratchets up as your usage grows so the scale stays useful; never shrinks back during a session.
Start minimized to tray	off	Hide window on app startup.
Launch at login	off	Launch at Windows login (Run registry key).
Minimize on close	on	If on, the X button hides the app to tray. If off, X exits the app instead.
Tray icon: sparkline	off	If off, tray icon is a solid color square (green/yellow/orange/red by usage). If on, the icon shows a tiny 5-hour sparkline of your bucket data.

Tips

“CACHE COLD” warning: Two tiers. 5-min cache (subagents/tools): expires after 5 min idle — step away briefly and subagent calls get more expensive. 1-hour cache (main conversation): expires after 60 min idle — the full prompt rebuild only hits when you’ve been away for more than an hour. Orange = subagent cache cold, main cache still warm. Red = both cold.
Reset-in countdown: When this hits 0, the whole session resets and your next message starts a new 5-hour window with full quota.
Per-bucket chart: Helps spot usage spikes. A single tall bar suggests a big refactor or long conversation in that 15-minute window.
Pace projection: When the rate of usage projects you’ll hit the cap before the session ends (and at least 10 minutes have passed for the rate to be meaningful), the tokens row shows (time) to cap at pace. Otherwise it stays hidden — no fake “you’re fine” reassurance.
Peak-hours banner: 12:00–18:00 UTC is when Anthropic’s load is highest based on user reports. The orange “Peak hours” banner translates that to your local clock.
Cost estimate: Approximate. Real billing may differ. Useful for relative comparison.

Install

Download from the GitHub repository.

Copy the extracted folder somewhere permanent (e.g. C:\Tools\ClaudeTokenVampire)
Double-click Install.cmd
In Claude Code, run: /reload-plugins

Launch (two ways):

Fast: Type launch vampire, start vampire, or token monitor in Claude Code (instant, bypasses the model)
Skill: Type /claudetokenvampire:monitor in Claude Code (~5 sec, loads full context)

Requirements

Windows 10/11
Claude Code (no API keys needed)
No external libraries needed

Platform support

Platform	Status
Windows	Available now
macOS	Coming soon

The codebase uses FMX (FireMonkey), which is cross-platform. The macOS port mainly requires swapping %USERPROFILE%\.claude\ for ~/.claude/.

Safety

Opens files in read-only shared mode — never interferes with Claude Code
Totally local
No data is sent anywhere
No tokens are wasted
No API key required

Vote on the next feature

Claude Token Vampire’s roadmap is shaped by what users actually need. Drop a vote (or a feature request) on GitHub Discussions. The most-requested items get prioritised. The app itself nudges you to vote once after your third launch and never asks again unless you want it to.

Roadmap

Read Anthropic’s authoritative rate_limits.five_hour.resets_at via a Claude Code statusline bridge (official, no scraping)
Show window in “minimal” mode (only critical info)
Beep when approaching maximum quota
Budget enforcement via Claude Code hooks (warn or block before tools fire when you’re over budget)
Real-time activity indicator — tray icon flashes when Claude is actively writing to the session file
Tool-call analytics — “Top 10 tool calls” stat for the current session
Session search — keyword search across all session files, with one-click resume
Multi-computer support (live sync between two machines on the same account, beyond today’s manual-entry workaround)
macOS port

Sources

Anthropic Support: Manage extra usage for paid Claude plans
Anthropic Support: Usage limit best practices
Anthropic Support: What is the Max plan?
Claude Code: Customize your status line — documents rate_limits.five_hour.resets_at and used_percentage

Stars are free

If you find Claude Token Vampire useful, drop a star on GitHub. It encourages further development.

Written in Delphi 13 (FMX). Part of my wider collection of open-source Delphi libraries and tools.

If AI-assisted Delphi development interests you, see my book “Delphi in all its glory – AI-Assisted Development for Delphi”.