Cluster Protocol: The Private AI Infrastructure Whitepaper
Jun 2, 2026
10 min Read

What you'll find in this article
This is a guided walkthrough of the whitepaper, structured to help you navigate the architecture quickly.
The article is organized in these sections:
The market we're building for: AI and blockchain market context
The Cluster thesis: the philosophy underlying the infrastructure
The architecture: the four-layer stack at a glance
Inference engine: the unified model gateway
Tokenized data marketplace: datasets as on-chain assets
AI Services: the closed-loop data-to-inference economy
x402 payment protocol: micropayments for autonomous agents
Agentic infrastructure: identity, payments, and orchestration primitives
CodeXero: the prompt-to-dApp application layer
Technical architecture: system, contracts, security, and SDK
The platform: Cluster Hub and the developer dashboard
What's live today: shipped status, raise, and roadmap
The full whitepaper: closing summary and reading link
The market we're building for
AI is on track to become a multi-trillion dollar industry. The blockchain market is compounding at 66% annually. Both are accelerating at the same time, and the intersection is wide open.
Software revenue from AI alone is projected to reach $738 billion by 2030. Most of that value is being captured by closed, centralized providers. The infrastructure layer underneath, the part that decides who owns the data, who sees the prompts, who controls the models, is being built right now.
Cluster Protocol is the infra layer.
The Cluster thesis
The internet was built on open protocols. HTTP, SMTP, TCP/IP. No permissions, no gatekeepers, no rent extraction at the protocol level.
AI is being built on the opposite. Closed APIs. Proprietary datasets. Vertically integrated providers. Developers rent access. Data creators get nothing. Agents cannot transact.
Cluster Protocol exists to reverse this trajectory.
The vision is straightforward: build the Private AI infrastructure layer that operates the way the internet's protocols were supposed to. Open, composable, economically fair, verifiable onchain.
Six advantages anchor the whitepaper:
One API, full modality coverage: chat, embeddings, image generation, text-to-speech, speech-to-text, document reranking, all through a single OpenAI-compatible endpoint.
Data as a yield-bearing asset: datasets minted as NFTs, creators earn 85% of every purchase forever.
Data-to-model pipeline: upload a dataset, tokenize it, fine-tune a model on it, serve the model through the inference API, all in one platform.
Onchain settlement and provenance: every payment, every ownership transfer, every dataset, recorded on Base.
Developer-first distribution: pip install cluster-sdk and one base URL change.
x402 pay-per-request: agents pay with USDC per call, no accounts, no API keys, no deposits.
The architecture
Four horizontal layers. Each independently functional, all composable. Every layer feeds into the next, and everything settles on Base.
Inference Engine at the top. 500+ models, multi-modal, multi-provider, one unified API.
Tokenized Data Marketplace + AI Services in the middle. Datasets on IPFS, ownership on Base via NFTs, automatic revenue distribution via PaymentRouter. Fine-tuning pipeline that consumes those datasets and outputs production endpoints.
Settlement Layer on Base. Smart contracts handle payments, ownership, x402 micropayments, Python SDK access.
CodeXero wraps all three. Prompt-to-dApp application layer that consumes inference, data, and compute natively.
This isn't three products bolted together. It's one closed economic loop.
Inference engine
A single OpenAI-compatible endpoint that serves 500+ open-source models across every major modality.
Any application built for the OpenAI API works with Cluster by changing one URL. Llama, Mistral, DeepSeek, Qwen, Gemma, and hundreds more. Parameter ranges from 7B to 405B. Categorized by task type: general chat, code generation, reasoning, instruction following.
Six endpoints cover the full modality spectrum:
- /v1/chat/completions: text inference, streaming supported
- /v1/embeddings: vector representations for RAG, search, clustering
- /v1/images/generations: open-source diffusion models
- /v1/audio/speech: text-to-speech across voices and languages
- /v1/audio/transcriptions: speech-to-text
- /v1/rerank: document reranking for RAG pipelines
Behind the API, the gateway maintains live connections to multiple inference providers. Every request is routed based on availability, latency, and capacity. If a provider drops, the gateway automatically reroutes. The client sees one endpoint, one model name, one response.
Per-token billing. No subscriptions. No minimums. No lock-in.
Tokenized data marketplace
Datasets uploaded to Cluster become on-chain assets. Stored on IPFS for decentralized persistence. Minted as ERC-721 NFTs on Base for verifiable ownership. Transacted through a smart contract that distributes revenue automatically.
Every purchase splits automatically at the contract level:
- 85% to the dataset creator
- 10% to the protocol treasury
- 5% to the referrer
The split cannot be circumvented, modified per transaction, or routed around. Creators earn every time their dataset is purchased, forever.
The marketplace supports structured and unstructured datasets across NLP, computer vision, medical, financial, social, and domain-specific training data. Access control operates at four levels: public metadata, login-gated previews, purchase-gated downloads, and creator-only full access.
This is what "data as a yield-bearing asset" actually looks like in production.
AI Services, the closed-loop economy
This is the single most important diagram in the whitepaper.
A data creator uploads a dataset. It tokenizes on-chain as an NFT. Another developer uses that dataset to fine-tune a model on Cluster compute. The fine-tuned model deploys automatically to the inference API. Consumers call the model. The original data creator earns from both the dataset purchase and every downstream inference call.
Value flows back to contributors at every stage of the pipeline.
The fine-tuning workflow itself is five steps:
- Select a base model from the 500+ catalog (Llama 3.1 70B, etc.)
- Provide training data, either uploaded directly or referenced from the marketplace
- Configure parameters (learning rate, epochs, batch size, evaluation criteria)
- Submit the job: Cluster provisions GPU compute and runs training
- Deploy the result: the model is automatically hosted on Cluster's inference layer, accessible via the same /v1/chat/completions endpoint
The output isn't a model file you download. It's a live, production-ready inference endpoint served through the same gateway as every other model on the platform.
This end-to-end loop is what no other platform connects. Data tokenization, model training, and inference serving in a single integrated flow where value attribution is preserved at every stage.
x402, the payment protocol for autonomous agents
The most important shift in the whitepaper isn't a feature. It's a primitive.
x402 is an open payment protocol originated by Coinbase that uses the HTTP 402 status code to enable native micropayments within API calls. Cluster integrates x402 as a first-class payment method across the inference API and data marketplace.
How it works:
- Client sends a request to a Cluster endpoint, no payment attached.
- Server responds with 402 Payment Required and includes payment instructions (amount, recipient wallet, network)
- Client signs a USDC payment on Base and resubmits the request with the payment proof in the header.
- Server verifies the payment on-chain, executes the request, returns the response.
The entire flow adds approximately 1-2 seconds of latency on the initial request. For streaming inference, this is a one-time delay, subsequent tokens stream normally.
The implication is significant: an AI agent with a USDC wallet on Base can call Cluster's inference API, pay for each request programmatically, and receive responses without any human registering an account, generating an API key, or depositing funds.
The agent is a self-sufficient economic actor.
Cluster supports three concurrent payment methods:
- x402 for agents and Web3 developers: pay per request, no account, zero protocol fees, Cluster retains 100% of the payment.
- Balance system for regular users and teams: deposit USDC or ETH, deduct per call, traditional pay-as-you-go.
- Direct on-chain payment for dataset purchases: one-time payment via PaymentRouter, revenue splits automatically.
The agentic infrastructure layer
Cluster shipped the four primitives agents actually need to exist.
Agent Identity (ERC-8004)
As AI agents become autonomous participants in on-chain ecosystems, they require verifiable identity, not for surveillance, but for reputation, accountability, and interoperability. Cluster implements ERC-8004, an on-chain identity standard for AI agents.
Every agent deployed through the Cluster ecosystem can register with:
- A unique onchain identifier.
- Metadata describing capabilities and purpose.
- A verifiable record of inference consumption and on-chain activity.
- Reputation scores derived from usage history.
Agent Payment Rails (x402)
x402 is the economic layer that makes agents self-sufficient. An agent with a USDC wallet on Base can discover Cluster's inference API, receive a 402 Payment Required response with pricing, autonomously sign a payment and resubmit, receive inference results, and use those results to take further on-chain actions.
No human needs to create an account, generate an API key, or approve a transaction. The agent operates as an independent economic entity.
Agent-to-Inference Loop
The standard execution loop for an autonomous agent on Cluster:
Agent receives task
→ Agent calls Cluster inference API (x402 payment)
→ Receives model response
→ Processes response
→ Executes on-chain action (swap, transfer, deploy, etc.)
→ Logs activity against ERC-8004 identity
→ Repeats
The agent never leaves the on-chain ecosystem. Inference is settled on Base. Actions are executed on Base. Identity is on Base. The entire loop is verifiable and composable with other onchain systems.
Multi-Agent Orchestration
As agents proliferate, coordination becomes necessary. Cluster's infrastructure supports multi-agent patterns where a primary agent decomposes a task into subtasks, subtasks are delegated to specialized agents (each with their own ERC-8004 identity), each agent independently calls Cluster inference and pays via x402, results are aggregated, and the full execution trace is logged on-chain.
The orchestration layer doesn't impose a specific framework. It provides the primitives: identity, payments, inference. Any agent framework can compose them into coordinated workflows.
CodeXero, the application layer
CodeXero is the application layer built natively on Cluster's infrastructure. It enables anyone to create and deploy fully on-chain applications using a natural language prompt.
A user describes what they want to build in plain language.
CodeXero:
- Interprets the prompt and generates a build plan.
- Generates frontend code and smart contract logic.
- Compiles smart contracts (Solidity → ABI/bytecode)
- Deploys frontend to hosting (Netlify/IPFS)
- Deploys contracts to the target blockchain via the user's connected wallet.
The entire process happens in the browser. No local setup, no CLI, no dependencies. CodeXero runs a full development environment in the browser via WebContainer: a browser-native Node.js runtime.
Every dApp deployed through CodeXero generates inference token consumption, creating organic demand for the Cluster token at the infrastructure level. CodeXero is not a standalone product. It's the primary consumption engine for Cluster's infrastructure:
Inference: Every prompt-to-dApp generation calls Cluster's inference API for code generation, planning, and model routing.
Compute: Smart contract compilation and deployment verification runs on Cluster compute.
Data: Templates and starter repos are sourced through the Cluster ecosystem.
Technical architecture
The architecture follows a gateway pattern where all client requests, whether from the web frontend, the Python SDK, CodeXero, or external AI agents, hit a single Fastify-based API gateway.
The gateway handles authentication, billing, rate limiting, and request routing. PostgreSQL for primary storage. Redis for caching and sessions. IPFS via Pinata for decentralized data persistence. Base L2 for smart contracts. Multi-provider compute backend.
Smart contracts on Base
Data monetization enable via ERC-721 Ownership on Base Mainnet (Chain ID: 8453):
DatasetNFT (ERC-721, UUPS Proxy): One NFT per dataset, NFT holder is the revenue beneficiary.
DatasetRegistry: Maps human-readable dataset slugs to on-chain token IDs.
PaymentRouter: Handles all dataset purchase transactions, enforces 85/10/5 split at the contract level.
ClusterToken (ERC-20): Native protocol token on Base for inference payments, dataset purchases, compute provisioning, staking, and governance.
All contracts use OpenZeppelin libraries and the UUPS (Universal Upgradeable Proxy Standard) pattern for safe upgradeability.
Security architecture
Seven layers of protection, perimeter to contract:
- Web application firewall for bot control and injection prevention.
- TLS 1.3 and HSTS at the transport layer.
- Privy JWT + hashed API keys at authentication.
- Per-key and per-IP rate limits at the API layer.
- IPFS CIDs gated behind authentication and purchase verification at the data layer.
- UUPS upgradeable proxies with OpenZeppelin battle-tested libraries at the contract layer.
- Role-based access control (user, creator, admin).
Python SDK
One client. Inference, data, on-chain queries. Authentication, retries, streaming, error handling, all handled.
The platform
Cluster Hub (hub.clusterprotocol.ai) is the unified web interface for the entire Cluster Protocol ecosystem. Five distinct personas converge on one platform: builders, developers, data creators, agents, and enterprises.
Authentication is handled by Privy, supporting both Web2 (email, Google) and Web3 (wallet) login methods. From the same interface, users access the model marketplace, dataset marketplace, and developer dashboard.
API key management: create, revoke, configure permissions (read, write, admin) with expiration.
Usage analytics: per-model, per-day breakdown of API calls and token consumption.
Balance management: deposit funds, track spending
Creator dashboard: earnings from dataset sales, purchase history.
Dataset management: upload, edit, manage tokenized datasets.
Consumption via Python SDK or REST API. x402 supported on every authenticated endpoint as an alternative to API keys.
What's live today
This is not a roadmap. This is shipped.
- Data monetization enable via ERC-721 Ownership on Base Mainnet.
- Inference gateway serving 500+ models in production.
- Dataset marketplace live with on-chain settlement.
- CodeXero shipping prompt-to-dApp deployments in the browser.
- Python SDK on PyPI: pip install cluster-sdk.
- x402 payment rails active across the inference API.
- ERC-8004 agent identity standard implemented.
- [Privy](https://x.com/privy_io) authentication live for Web2 and Web3 users.
Cluster Protocol has raised $7.75M across two SAFT rounds. Led by DAO5.
Conclusion
Cluster Protocol is a unified AI infrastructure layer for the decentralized web. What you just read documents the complete stack: not a roadmap, not a thesis, the system as it exists today.
To recap what's live:
Inference: A single OpenAI-compatible API serving 500+ models across text, image, audio, embeddings, and reranking.
Data: A tokenized dataset marketplace with IPFS storage, ERC-721 ownership on Base, and automatic 85% creator revenue distribution.
Compute: GPU provisioning for custom model hosting and fine-tuning, with output served through the same unified inference API.
Settlement: Four smart contracts deployed on Base mainnet, handling payments, ownership, and revenue routing.
Agents: x402 micropayments and ERC-8004 identity, the two primitives that make autonomous AI agents economically self-sufficient.
Applications: @CodeXero_xyz, the prompt-to-dApp layer that consumes Cluster inference, data, and compute to ship on-chain applications from natural language.
Everything settles in a single native economy on Base.
The bet is simple. AI infrastructure should be permissionless, sovereign, and account-free, with payment and ownership as first-class primitives rather than billing add-ons. Cluster is what that looks like when you actually build it instead of writing about it.
The full whitepaper goes deeper on every layer above, plus the technical architecture, security model, and the agent and fine-tuning surfaces we didn't have room to fully unpack here. If anything in this article caught your attention, the whitepaper is where to keep reading.
Read the full whitepaper: https://cluster-protocol.gitbook.io/whitepaper🔗
