Claude Token Vampire monitors your Claude Code token usage in real time. Anthropic doesn’t show you how much of your current 5-hour session quota you’ve consumed — Claude Token Vampire does.
Free. Open source. Runs offline. Uses zero tokens.

Why it exists
Anthropic’s usage page shows a single progress bar: “Current session — Resets in X hours.” No token counts, no breakdown, no cost, no per-project history. Just a bar and a countdown.
The progress bar is honest about the reset (the 5-hour session really does hard-reset at that timestamp). What it is not honest about is the ceiling: Anthropic never publishes the exact token cap, so 72% could mean you have 10 minutes or 2 hours of heavy use left. That is the gap Claude Token Vampire fills.
How Anthropic’s 5-hour window actually works
Your first message starts a 5-hour clock. Every subsequent message counts against the same session budget. When the clock hits 5 hours, the entire counter hard-resets to zero all at once, not gradually. Your next message after that starts a fresh 5-hour session.
Anthropic’s own support page confirms this: “if you hit your limit at 2 PM, your next allocation begins at 7 PM, then 12 AM, and so on” (source). The Claude Code statusline exposes a single rate_limits.five_hour.resets_at timestamp (statusline docs) — one reset moment per session, not continuous expiry.
Correction note (2026-04-24): an earlier version of this page described the quota as a true sliding window where individual tokens “drip off” after 5 hours. That was wrong. The window is session-based. The rest of the page has been updated accordingly.
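A minimal Python sketch of the session rule described above, for illustration only. Claude Token Vampire itself is written in Delphi and this is not its code. It assumes the session start can be inferred from local message timestamps alone, and session_windows and minutes_until_reset are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical helpers illustrating the hard-reset rule; not the app's actual code.
SESSION_LENGTH = timedelta(hours=5)

def session_windows(timestamps):
    """Group message timestamps into hard-reset 5-hour sessions.

    The first message opens a session; later messages before start + 5h
    count against it. The first message at or after the reset opens a new
    session. Returns a list of (start, reset) pairs.
    """
    windows = []
    for ts in sorted(timestamps):
        if not windows or ts >= windows[-1][1]:
            windows.append((ts, ts + SESSION_LENGTH))
    return windows

def minutes_until_reset(timestamps, now):
    """Minutes left in the session containing `now`, or None if none is active."""
    windows = session_windows(t for t in timestamps if t <= now)
    if not windows:
        return None
    _, reset = windows[-1]
    if now >= reset:
        return None  # session expired; the next message starts a fresh one
    return (reset - now).total_seconds() / 60
```

With messages at 09:00, 13:30 and 14:10, the first two share the session that resets at 14:00, and the 14:10 message opens a new session ending at 19:10, the same pattern as the support page's 2 PM to 7 PM example.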
What’s actually missing from Anthropic’s UI
The reset countdown is fine. What you cannot see on the usage page:
- How many tokens you’ve actually burned (only a percentage)
- When inside the 5-hour session they were burned (early big burst vs steady drip looks identical)
- What kind of tokens (cache reads are cheap; cache creation is expensive)
- Per-project split (one heavy project can eat a quiet day’s budget)
- Cache status (5-minute and 1-hour tiers — about to cost you a rebuild?)
- A hard dollar number for the session so far
A single progress bar crushes all of that into one number.
What Claude Token Vampire shows instead
A per-15-minute bar chart spanning the full 5-hour session (start → +1h → +2h → +3h → +4h → end (reset)), with:
- Exact token counts by type (input, output, cache creation, cache read)
- A configurable ceiling so the bar chart’s colors actually mean something (green → yellow → orange → red)
- A real countdown to the session hard reset
- A cost estimate in USD with editable per-million rates
- Cache warmth per tier (5-minute / 1-hour), with a warning before you pay for a full rebuild
- Per-project breakdown so you know which project is eating the budget

Comparison
| Feature | claude.ai/settings/usage | Claude Token Vampire |
|---|---|---|
| Token counts | Hidden behind a vague percentage | Exact numbers by type |
| Usage over time | Flattened into one bar | 5h bar chart, 15-minute buckets |
| Reset timer | Correct, but the ceiling it resets against is hidden | Same hard countdown, plus your configured ceiling so the percentage means something |
| Per-project split | None | Every project tracked separately |
| Cache status | Not shown | Hit rate + warm/cold warning per tier |
| Cost estimate | Not shown | Configurable per-token rates |
| Chart span | None | Current 5-hour session (20 bars) |
| Works offline | Needs browser + login | Local files only. Zero tokens. Zero API calls. |
Anthropic tells you that you’re using tokens. Claude Token Vampire tells you how many, where, how fast, what kind, and how long until the session resets.
Related reading: The 5-hour mirage — Anthropic’s moving goalposts.
What it does
It puts you in control of your Claude Code tokens:
- Tracks all billable token types: input, output, cache creation, cache reads
- Shows the current 5-hour session with a per-bucket bar chart spanning session_start → session_end
- Color-coded bars: green → yellow → orange → red as you approach your limit
- Estimates cost (configurable $/1M token rates)
- Shows cache hit rate and warns when the 5-minute or 1-hour cache goes cold
- Counts down until the session hard-resets (all tokens reset at once, not gradually)
- Runs quietly in the system tray — click the icon to show/hide
- USES 0 TOKENS — runs entirely offline, no API calls, no Claude queries
Views
- All Projects — combined current-session view across everything
- Per Project — same chart broken down by project
User manual
All Projects tab — Stats panel
| Label | Meaning |
|---|---|
| Total tokens (session) | Sum of all tokens (input + output + cache creation + cache read) in the current 5-hour session. Shown as used / limit. |
| Messages | Number of assistant responses in the current session. |
| Cache hit rate | cache_read / (cache_read + input). Higher = cheaper. 99%+ is normal for long sessions. |
| Estimated cost | USD estimate based on token counts and per-million rates (configurable in Settings). |
| Reset in | Minutes until the current session hard-resets. When it hits zero, the entire counter returns to zero — all at once, not per-message. |
| Cache status | Two cache tiers. 5m cache (subagents/tools) — warm if last message < 5 min ago. 1h cache (main conversation) — warm if last message < 1 hour ago. Orange = 5m cold, 1h still warm. Red = both cold (full rebuild on next message). |
| Web searches / fetches | Count of web_search and web_fetch tool calls in the session. |
| Cache 1h / 5m | Breakdown of cache creation tokens by tier: 1-hour ephemeral vs 5-minute ephemeral. Display only — already included in the total. |
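The Stats panel formulas can be sketched as follows. This assumes each assistant message carries a usage object with the Anthropic Messages API field names (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens, plus a nested cache_creation object with ephemeral_5m_input_tokens and ephemeral_1h_input_tokens). That input format and the session_stats name are assumptions, not the app's confirmed internals.

```python
# Hypothetical helper; assumes Anthropic Messages API usage field names.
def session_stats(usages):
    """Aggregate a list of per-message usage dicts into Stats panel numbers."""
    totals = {"input": 0, "output": 0, "cache_creation": 0, "cache_read": 0,
              "cache_5m": 0, "cache_1h": 0}
    for u in usages:
        totals["input"] += u.get("input_tokens", 0)
        totals["output"] += u.get("output_tokens", 0)
        totals["cache_creation"] += u.get("cache_creation_input_tokens", 0)
        totals["cache_read"] += u.get("cache_read_input_tokens", 0)
        # Tier breakdown is display-only: these tokens are already counted
        # in cache_creation_input_tokens above.
        tiers = u.get("cache_creation") or {}
        totals["cache_5m"] += tiers.get("ephemeral_5m_input_tokens", 0)
        totals["cache_1h"] += tiers.get("ephemeral_1h_input_tokens", 0)

    total = (totals["input"] + totals["output"]
             + totals["cache_creation"] + totals["cache_read"])
    # Cache hit rate = cache_read / (cache_read + input), as defined in the table.
    denom = totals["cache_read"] + totals["input"]
    hit_rate = totals["cache_read"] / denom if denom else 0.0
    return {"total": total, "messages": len(usages), "hit_rate": hit_rate, **totals}
```

Note that the 1h/5m tier counts are never added to the total a second time, matching the "display only" note above.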
All Projects tab — Chart
Each bar = one 15-minute bucket. The full 5-hour session has 20 bars, left edge at session_start, right edge at session_end (reset).
Y-axis scales to your per-bucket budget (limit / 20). So if your limit is 88M, the Y-axis tops out at 4.4M per bar.
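A minimal sketch of the bucketing and Y-axis arithmetic, assuming events arrive as (timestamp, tokens) pairs and the session start is already known. bucket_index, per_bucket_totals and y_axis_max are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical helpers; not the app's actual (Delphi) code.
BUCKET = timedelta(minutes=15)
BUCKETS_PER_SESSION = 20  # 5 hours / 15 minutes

def bucket_index(session_start, ts):
    """Return 0..19 for a timestamp inside the session, else None."""
    offset = ts - session_start
    if offset < timedelta(0) or offset >= BUCKET * BUCKETS_PER_SESSION:
        return None
    return offset // BUCKET  # timedelta // timedelta gives an int

def per_bucket_totals(session_start, events):
    """Sum (timestamp, tokens) pairs into 20 fifteen-minute bars."""
    bars = [0] * BUCKETS_PER_SESSION
    for ts, tokens in events:
        i = bucket_index(session_start, ts)
        if i is not None:
            bars[i] += tokens
    return bars

def y_axis_max(limit):
    """Per-bucket budget: the session limit spread evenly over 20 bars."""
    return limit / BUCKETS_PER_SESSION
```

y_axis_max(88_000_000) returns 4,400,000, the 4.4M figure from the example above.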
Bar colors (when limit is set):
- Green: < 50% of Y-axis max
- Yellow: 50–75%
- Orange: 75–90%
- Red: ≥ 90%
Bar colors (no limit set): blue (auto-scale mode). A value label appears above each bar showing the token count for that 15-minute interval.
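The color rules above map onto a small classifier. This sketch treats each range as half-open (exactly 50% is yellow, 75% orange, 90% red), which is an assumption about the boundaries; bar_color is a hypothetical name.

```python
# Hypothetical helper; boundary handling (half-open ranges) is an assumption.
def bar_color(tokens, limit, buckets=20):
    """Color for one 15-minute bar relative to the per-bucket Y-axis max."""
    if limit <= 0:
        return "blue"  # no limit set: auto-scale mode
    share = tokens / (limit / buckets)  # fraction of the Y-axis max
    if share < 0.50:
        return "green"
    if share < 0.75:
        return "yellow"
    if share < 0.90:
        return "orange"
    return "red"
```

With an 88M limit, a bar of 3.3M tokens is 75% of the 4.4M Y-axis max and so comes out orange.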
Per Project tab
- Project list: active projects (with token counts) and inactive known projects (gray, separated)
- Per-project stats: tokens, messages, cache hit rate, cost, reset-in, cache status
- Per-project bar chart (same renderer, filtered data)
- Selection preserved across automatic refreshes
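A per-project split can be sketched by grouping session files by their parent folder. The layout is an assumption not confirmed on this page: Claude Code transcripts stored as projects/&lt;project-folder&gt;/&lt;session&gt;.jsonl under the .claude folder, one JSON object per line, with assistant lines carrying message.usage. tokens_per_project is a hypothetical name, and it sums whole-file history; filtering by timestamp to the current session is left out for brevity.

```python
import json
from collections import defaultdict
from pathlib import Path

# Hypothetical helper; the file layout and field names are assumptions.
def tokens_per_project(projects_dir):
    """Sum billable tokens per project folder across all *.jsonl session files."""
    keys = ("input_tokens", "output_tokens",
            "cache_creation_input_tokens", "cache_read_input_tokens")
    totals = defaultdict(int)
    for jsonl in Path(projects_dir).glob("*/*.jsonl"):
        project = jsonl.parent.name  # one folder per project (assumed)
        with jsonl.open("r", encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # blank or partially written line
                if not isinstance(entry, dict) or entry.get("type") != "assistant":
                    continue
                message = entry.get("message")
                usage = message.get("usage") if isinstance(message, dict) else None
                if isinstance(usage, dict):
                    totals[project] += sum(usage.get(k, 0) for k in keys)
    return dict(totals)
```

Skipping undecodable lines matters because the monitor reads files while Claude Code may still be appending to them.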
Settings

| Setting | Default | Notes |
|---|---|---|
| Max tokens (5h session) | 88,000,000 | Your estimated session ceiling. Anthropic does not publish exact caps; 88M is a community estimate of the Max 5x cap. Set to 0 if unknown (chart switches to auto-scale blue). |
| Cost: input tokens ($/1M) | 3.00 | Price per 1M input tokens. The four cost defaults match Anthropic’s Claude Sonnet list prices; adjust them for other models. |
| Cost: output tokens ($/1M) | 15.00 | Per 1M output tokens. |
| Cost: cache read ($/1M) | 0.30 | Per 1M cache-read tokens. |
| Cost: cache creation ($/1M) | 3.75 | Per 1M cache-creation tokens. |
| Refresh interval (seconds) | 60 | How often to re-scan session files. Minimum 10. |
| Start minimized to tray | off | Hide window on app startup. |
| Start with Windows | off | Launch at Windows login. |
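The cost estimate is a weighted sum of token counts. This sketch uses the default rates from the table; estimate_cost_usd and the dictionary keys are hypothetical, and like the app's figure it is an approximation, not a bill.

```python
# Hypothetical helper; default $/1M rates copied from the Settings table.
DEFAULT_RATES = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_creation": 3.75,
}

def estimate_cost_usd(tokens, rates=DEFAULT_RATES):
    """tokens: dict with the same keys as rates, holding raw token counts."""
    return sum(tokens.get(kind, 0) / 1_000_000 * rate
               for kind, rate in rates.items())
```

For example, 2M input, 1M output, 40M cache-read and 3M cache-creation tokens come to 6 + 15 + 12 + 11.25 = 44.25 USD, which also shows how heavily cache reads can dominate token counts while contributing comparatively little cost.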
Tips
- “CACHE COLD” warning: Two tiers. 5-min cache (subagents/tools): expires after 5 min idle — step away briefly and subagent calls get more expensive. 1-hour cache (main conversation): expires after 60 min idle — the full prompt rebuild only hits when you’ve been away for more than an hour. Orange = subagent cache cold, main cache still warm. Red = both cold.
- Reset-in countdown: When this hits 0, the whole session resets and your next message starts a new 5-hour window with full quota.
- Per-bucket chart: Helps spot usage spikes. A single tall bar suggests a big refactor or long conversation in that 15-minute window.
- Cost estimate: Approximate. Real billing may differ. Useful for relative comparison.
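The two-tier warm/cold rule from the Cache status row and the tip above can be sketched like this. cache_status is a hypothetical name, and because the page does not say what color, if any, is shown when both tiers are warm, the sketch returns None in that case.

```python
from datetime import datetime, timedelta

# Hypothetical helper following the documented warm/cold thresholds.
FIVE_MIN = timedelta(minutes=5)  # subagent/tool cache tier
ONE_HOUR = timedelta(hours=1)    # main-conversation cache tier

def cache_status(last_message, now):
    """Return (warm_5m, warm_1h, warning_color) for the time since the last message."""
    idle = now - last_message
    warm_5m = idle < FIVE_MIN
    warm_1h = idle < ONE_HOUR
    if warm_5m:
        color = None      # both warm: no warning (display color not documented)
    elif warm_1h:
        color = "orange"  # 5m cold, 1h still warm
    else:
        color = "red"     # both cold: full rebuild on next message
    return warm_5m, warm_1h, color
```

Thirty minutes of idle time, for instance, yields orange: subagent calls pay to rebuild their cache, while the main conversation's cache is still warm.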
Install
Download from the GitHub repository.
- Copy the extracted folder somewhere permanent (e.g. C:\Tools\ClaudeTokenVampire)
- Double-click Install.cmd
- In Claude Code, run: /reload-plugins
Launch (two ways):
- Fast: Type “launch vampire”, “start vampire”, or “token monitor” in Claude Code (instant, bypasses the model)
- Skill: Type /claudetokenvampire:monitor in Claude Code (~5 sec, loads full context)
Requirements
- Windows 10/11
- Claude Code (no API keys needed)
- No external libraries needed
Platform support
| Platform | Status |
|---|---|
| Windows | Available now |
| macOS | Coming soon |
The codebase uses FMX (FireMonkey), which is cross-platform. The macOS port mainly requires swapping %USERPROFILE%\.claude\ for ~/.claude/.
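The path swap mentioned above amounts to choosing the data folder per platform. Shown in Python purely for illustration (the app itself is Delphi/FMX); claude_dir is a hypothetical name.

```python
import os
import sys
from pathlib import Path

# Hypothetical helper illustrating the Windows vs macOS path difference.
def claude_dir():
    """Locate the Claude Code data folder for the current platform."""
    if sys.platform == "win32":
        # %USERPROFILE%\.claude\ on Windows
        return Path(os.environ["USERPROFILE"]) / ".claude"
    # ~/.claude/ on macOS and other non-Windows platforms
    return Path.home() / ".claude"
```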
Safety
- Opens files in read-only shared mode — never interferes with Claude Code
- Totally local
- No data is sent anywhere
- No tokens are wasted
- No API key required
Roadmap
- Read Anthropic’s authoritative rate_limits.five_hour.resets_at via a Claude Code statusline bridge (official, no scraping)
- Weekly (7-day) cap tracking, alongside the session view
- Show window in “minimal” mode (only critical info)
- Beep when approaching maximum quota
- Warning when using Claude during peak hours
- Rate-limit prediction: “at this pace, limit in 47 min”
- Budget enforcement via hooks
- Real-time activity indicator — tray icon color change when Claude is active
- Tool-call analytics — “Top 10 tool calls” stat
- Session search — keyword search across sessions
- Support for multiple computers (same account, two machines)
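The planned rate-limit prediction could start as a simple linear projection from the average pace so far. This is one possible approach, not the author's design; minutes_to_limit is a hypothetical name, and a rate taken from only the last few buckets would react faster to bursts.

```python
# Hypothetical helper: linear extrapolation of session usage.
def minutes_to_limit(used_tokens, limit, elapsed_minutes):
    """At the average pace so far, minutes until usage reaches the limit.

    Returns None when there is no limit or nothing to extrapolate from.
    """
    if limit <= 0 or used_tokens <= 0 or elapsed_minutes <= 0:
        return None
    remaining = max(limit - used_tokens, 0)
    tokens_per_minute = used_tokens / elapsed_minutes
    return remaining / tokens_per_minute
```

With 44M of an 88M ceiling used in the first 60 minutes, the projection is another 60 minutes. If the projection is longer than the Reset-in countdown, the session resets before the limit would be reached.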
Sources
- Anthropic Support: Manage extra usage for paid Claude plans
- Anthropic Support: Usage limit best practices
- Anthropic Support: What is the Max plan?
- Claude Code: Customize your status line (documents rate_limits.five_hour.resets_at and used_percentage)
Stars are free
If you find Claude Token Vampire useful, drop a star on GitHub. It encourages further development.
Written in Delphi 13 (FMX). Part of my wider collection of open-source Delphi libraries and tools.
If AI-assisted Delphi development interests you, see my book “Delphi in all its glory – AI-Assisted Development for Delphi”.