Pricing

Pay only for what you use.

Pay-as-you-go. Every API call deducts in pence against a credit balance, billed monthly to a card or invoice. No subscription required. (Self-service top-up lands during the public beta — for now we invoice in arrears against beta keys.)

What

Rate

Unit

Chat completions (LLM)

qwen2.5-32b-instruct, qwen2.5-vl-7b-instruct

Blended rate — input + output tokens charged at the same rate. No separate prompt/completion pricing. Same rate for the vision model (qwen2.5-vl-7b-instruct); image inputs are billed as tokens.

£1.00

per 1M tokens

Speech-to-text

whisper-large-v3-turbo

Billed on actual audio duration (not file size). 6-second clip = 0.1 minutes = £0.0005.

£0.005

per audio minute

Embeddings

bge-m3

Multilingual 1024-dim dense vectors via BGE-M3. Token counts come from the model's own tokenizer for accuracy.

£0.05

per 1M tokens

Chat completions (LLM)

qwen2.5-32b-instruct, qwen2.5-vl-7b-instruct

£1.00per 1M tokens

Blended rate — input + output tokens charged at the same rate. No separate prompt/completion pricing. Same rate for the vision model (qwen2.5-vl-7b-instruct); image inputs are billed as tokens.

Speech-to-text

whisper-large-v3-turbo

£0.005per audio minute

Billed on actual audio duration (not file size). 6-second clip = 0.1 minutes = £0.0005.

Embeddings

bge-m3

£0.05per 1M tokens

Multilingual 1024-dim dense vectors via BGE-M3. Token counts come from the model's own tokenizer for accuracy.

All prices excluding VAT. UK customers see VAT added at checkout. EU B2B customers reverse-charge per their local rules.

Private beta access

Email us, we issue an API key the same day. No card required during the beta — first month's usage is on us up to £20, billed in arrears after that.

Request beta access

Sovereign Enterprise

Dedicated capacity, signed DPA, named-subprocessor disclosure, audit-log retention to your specification, 24-hour SLA. For banks, public sector, and regulated industries.

Talk to sales →

Model Hosting

Coming with Factory

Once you fine-tune a model with us, it stays with us — callable, durable, optionally always-warm. Hosting is a separate subscription on top of the inference rates above. Three tiers, three latency targets:

Cold

30–90s warm-up

£0.50

per GB / month

Stored in UK object storage. Loads to GPU on first call after idle. No minimum — pay for what you store.

Warm

2–5s cold start

Flat £/mo

per active model

Kept hot on local SSD; loads to VRAM on demand. Per-call inference billed at the standard rate above.

Always-hot

0s — permanent VRAM

Flat £/mo

reserves a VRAM slot

Permanently loaded into GPU memory. For latency-sensitive production traffic. Available once second GPU lands.

Sovereignty enterprise tier (hosted in the UK on hardware we own, flat monthly contract) available alongside Always-hot for regulated buyers. Email hello@gpubox.ai.

What's included at every price

Hardware located and operated in the United Kingdom
OpenAI-compatible API surface (drop-in for any OpenAI SDK)
Per-call audit log retained for 30 days minimum
Streaming responses on chat completions (SSE)
Token-level usage metering visible in dashboard
VAT-compliant invoicing (UK + EU)