qapish/labman

labman

labman is an open-source, operator-friendly node for a p2p AI mesh.

It turns one or more local LLM runtimes (via Portman) into a provider in a decentralised network where:

there is no central control plane (no Conplane),
some labman nodes act as public gateways exposing OpenAI-compatible HTTP endpoints,
other labman nodes act as private providers, reachable only via the mesh,
Front-End Portals (FEPs) handle users, balances, and payouts,
providers get paid via signed receipts and periodic settlements.

This README describes the new mesh-based, conplane-less architecture and points to deeper design docs:

architecture.md – mesh-based architecture and security design
protocol.md – Portman ↔ labman protocol and mesh message formats
economics.md – economics, FEPs, receipts, and payouts

The goal is that this README, together with architecture.md, protocol.md, and economics.md in the repo root, can stand alone without requiring any additional docs.

1. High-Level Overview

In the current design, labman is not a thin agent talking to a central control plane. Instead, each labman instance is a full mesh peer:

maintains a secure, authenticated p2p overlay (“labnet”),
tracks other peers’ capabilities and demand via gossip,
can expose public OpenAI-compatible HTTP endpoints (gateway mode),
can host local models via Portman (provider mode),
participates in a receipt-based economic system driven by FEPs.

1.1 Roles

There are three architectural roles:

Gateway labman
- runs an OpenAI-compatible HTTP API (/v1/chat/completions, etc.) on a public endpoint (e.g. --public-endpoint),
- authenticates and authorises requests via a FEP,
- uses mesh data to route requests to suitable providers (possibly itself),
- streams results back to the client,
- ensures providers receive FEP-signed receipts for completed work.
Provider labman
- usually does not expose HTTP publicly,
- attaches one or more Portman daemons that run the actual LLMs,
- advertises which models it can/is hosting, plus capacity and prices,
- receives proxied requests over the mesh,
- verifies and stores receipts for payment.
Front-End Portal (FEP)
- external service that manages:
  - user accounts, API keys, and balances,
  - deposits and withdrawals (crypto/fiat/test tokens),
- is logically paired with one or more gateway nodes,
- issues cryptographically signed receipts per completed job,
- periodically settles payouts to providers.

Portman is the local LLM runner:

talks only to its local labman instance,
loads/unloads models, sets concurrency, and reports metrics,
has no direct knowledge of the mesh or FEPs.

2. What labman Does (and Does Not Do)

2.1 Responsibilities

labman is responsible for:

Mesh participation
- node identity and keys,
- p2p connections to a subset of peers,
- gossiping peers, capabilities, and market signals.
Gateway behaviour (if configured)
- OpenAI-compatible HTTP endpoints for end users,
- integrating with FEPs for auth and billing,
- routing to the best provider(s) based on mesh view and policy,
- proxying OpenAI requests/responses as mesh messages.
Provider behaviour (if configured)
- attaching to Portman instances,
- keeping selected models warm according to a hosting plan,
- executing proxied jobs from gateways via Portman,
- recording and verifying receipts from FEPs,
- building claims and tracking payouts.
Operator ergonomics
- single config file (TOML),
- one main daemon (labmand),
- a CLI for inspecting peers, hosting plans, receipts, etc.

2.2 Non-Goals / Invariants

labman explicitly does not:

execute arbitrary remote commands,
expose the operator’s LAN to remote peers,
manage user balances or payments directly (that’s the FEP’s job),
require a centralised scheduler or control plane.

The only remotely exercised behaviours are the ones described by:

the mesh protocol (mesh section in protocol.md), and
the Portman control protocol (Portman ⇄ labman local WS).

3. Mesh Architecture (Labnet)

The mesh is provided by a set of logical subsystems often referred to as labnet:

node identity & keys,
encrypted p2p transport,
peer store,
gossip,
typed messages for proxying requests, receipts, and market data.

3.1 Node Identity

Each labman node has a long-lived keypair:

node_id is derived from the public key,
all mesh messages are signed by the sender,
receivers verify:
- signature validity,
- that sender_node_id matches the key,
- timestamp sanity (anti-replay).

This enables:

per-node trust/deny lists,
verifiable audit of who sent what,
stable addressing in gossip and protocol messages.
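The receiver-side checks above can be sketched in Rust with only the standard library. This is a minimal, hypothetical sketch rather than labman's code: `derive_node_id`, `Envelope`, `ReplayGuard`, the `nonce` field, and the skew window are all assumptions. `DefaultHasher` is a non-cryptographic stand-in for a real hash of the public key, and the `signature_ok` flag stands in for whatever signature scheme labnet actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical node_id derivation. NOTE: SipHash (DefaultHasher) is NOT
/// cryptographic; a real node would use a cryptographic hash of the key.
pub fn derive_node_id(public_key: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    public_key.hash(&mut h);
    h.finish()
}

#[derive(Debug, PartialEq)]
pub enum Reject {
    BadSignature,
    NodeIdMismatch,
    StaleTimestamp,
    Replay,
}

/// Hypothetical envelope around a mesh message.
pub struct Envelope<'a> {
    pub sender_node_id: u64,
    pub sender_public_key: &'a [u8],
    pub timestamp_secs: u64,
    pub nonce: u64,
    pub signature_ok: bool, // stand-in for a real signature verification result
}

pub struct ReplayGuard {
    max_skew_secs: u64,
    seen: HashSet<(u64, u64)>, // (sender_node_id, nonce) pairs already accepted
}

impl ReplayGuard {
    pub fn new(max_skew_secs: u64) -> Self {
        Self { max_skew_secs, seen: HashSet::new() }
    }

    /// Runs the checks in order: signature, node_id/key binding, clock skew, replay.
    pub fn check(&mut self, env: &Envelope<'_>, now_secs: u64) -> Result<(), Reject> {
        if !env.signature_ok {
            return Err(Reject::BadSignature);
        }
        if derive_node_id(env.sender_public_key) != env.sender_node_id {
            return Err(Reject::NodeIdMismatch);
        }
        if now_secs.abs_diff(env.timestamp_secs) > self.max_skew_secs {
            return Err(Reject::StaleTimestamp);
        }
        // insert() returns false if this (sender, nonce) pair was already recorded.
        if !self.seen.insert((env.sender_node_id, env.nonce)) {
            return Err(Reject::Replay);
        }
        Ok(())
    }
}
```

Bounding clock skew and rejecting a previously seen (node_id, nonce) pair is one common way to get anti-replay; a real node would also prune entries older than the skew window so the set stays bounded.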

3.2 Transport

The transport layer provides:

encrypted, authenticated streams between peers,
bi-directional, back-pressured channels suitable for token streaming,
NAT-friendly behaviour for homelabs.

The exact implementation (e.g. libp2p/QUIC/custom) is an internal concern, but from the outside you can rely on:

stream-oriented messaging (ProxyRequest / ProxyResponse),
clear ownership of each stream by a (gateway, provider) pair,
separation between:
- control messages (gossip, errors, receipts, claims),
- data streams (OpenAI request/response chunks).
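One way to model the control/data split is a single message enum with a classification method. The message names below are taken from this README; the payload fields and the `Channel` type are hypothetical placeholders, since the actual schemas are defined in protocol.md.

```rust
/// Hypothetical mesh message enum; names follow this README, payloads are placeholders.
#[derive(Debug)]
pub enum MeshMessage {
    GossipPeers(Vec<u8>),
    GossipCapabilities(Vec<u8>),
    GossipMarket(Vec<u8>),
    ProxyRequest { request_id: String, openai_body: Vec<u8> },
    ProxyResponse { request_id: String, chunk: Vec<u8> },
    ProxyError { request_id: String, code: String },
    ReceiptDelivery { request_id: String, receipt: Vec<u8> },
}

#[derive(Debug, PartialEq)]
pub enum Channel {
    /// Gossip, errors, receipts, claims.
    Control,
    /// OpenAI request/response chunks on a (gateway, provider) stream.
    Data,
}

impl MeshMessage {
    /// Classify a message so control traffic never queues behind token chunks.
    pub fn channel(&self) -> Channel {
        match self {
            MeshMessage::ProxyRequest { .. } | MeshMessage::ProxyResponse { .. } => Channel::Data,
            _ => Channel::Control,
        }
    }
}
```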

3.3 Gossip

labman maintains an eventually-consistent view of the mesh via signed gossip messages (see the mesh section of protocol.md):

GossipPeers – who exists; addresses; roles; public endpoints.
GossipCapabilities – what models each node can/does host; capacity and pricing hints; payout currencies/addresses.
GossipMarket – demand snapshots and utilisation for models.

Gossip is:

small and periodic (seconds-scale),
merged into local peer and market views,
used by:
- gateways for routing decisions,
- providers (via labman-market) to decide which models to keep warm.
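Merging gossip into an eventually-consistent view is often done with a last-writer-wins rule per node. The sketch below assumes each node stamps its own capability advertisements with a monotonically increasing `seq` (a hypothetical field) and that signatures were verified before merging; it is illustrative, not labman's merge logic.

```rust
use std::collections::HashMap;

/// Hypothetical capability advertisement carried by GossipCapabilities.
#[derive(Clone, Debug, PartialEq)]
pub struct CapabilityAd {
    pub node_id: String,
    pub seq: u64, // assumed monotonically increasing per advertising node
    pub models: Vec<String>,
}

#[derive(Default)]
pub struct CapabilityView {
    by_node: HashMap<String, CapabilityAd>,
}

impl CapabilityView {
    /// Merge an already signature-verified ad, keeping only the newest per node.
    /// Returns true if the local view changed.
    pub fn merge(&mut self, ad: CapabilityAd) -> bool {
        let newer = self
            .by_node
            .get(&ad.node_id)
            .map_or(true, |existing| ad.seq > existing.seq);
        if newer {
            self.by_node.insert(ad.node_id.clone(), ad);
        }
        newer
    }

    /// Node IDs currently advertising `model`, sorted for stable output.
    pub fn providers_for(&self, model: &str) -> Vec<&str> {
        let mut ids: Vec<&str> = self
            .by_node
            .values()
            .filter(|ad| ad.models.iter().any(|m| m == model))
            .map(|ad| ad.node_id.as_str())
            .collect();
        ids.sort();
        ids
    }
}
```

Because out-of-order or duplicate gossip is simply ignored, peers converge on the same view regardless of delivery order.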

4. Request Flow

4.1 User ➝ Gateway (HTTP)

Client sends a standard OpenAI-style request to a gateway:

```
POST https://gateway.example/v1/chat/completions
Authorization: Bearer <api-key-from-FEP>

{
  "model": "tenant:ep:model",
  "messages": [...],
  "stream": true
}
```

Gateway labman:
- validates the API key with the FEP,
- checks user balance/limits for the predicted cost,
- parses model slug and any routing hints.
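Parsing the `tenant:ep:model` slug could look like the following. The meaning of the three segments and the choice to let the model segment contain further colons (as in `llama3:8b`-style names) are assumptions; `ModelSlug` and `parse_model_slug` are hypothetical names.

```rust
/// Hypothetical parsed form of a `tenant:ep:model` slug.
#[derive(Debug, PartialEq)]
pub struct ModelSlug<'a> {
    pub tenant: &'a str,
    pub ep: &'a str,
    pub model: &'a str, // may itself contain ':' because of splitn(3, ..)
}

/// Returns None unless all three segments are present and non-empty.
pub fn parse_model_slug(s: &str) -> Option<ModelSlug<'_>> {
    let mut parts = s.splitn(3, ':');
    let tenant = parts.next()?;
    let ep = parts.next()?;
    let model = parts.next()?;
    if tenant.is_empty() || ep.is_empty() || model.is_empty() {
        return None;
    }
    Some(ModelSlug { tenant, ep, model })
}
```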

4.2 Gateway ➝ Provider (Mesh)

Gateway selects candidate providers from its mesh view:
- providers that advertise:
  - support for tenant:ep:model (or compatible),
  - compatible payout currencies/addresses,
  - acceptable price per 1k tokens,
- filtered by:
  - trust lists,
  - risk limits (e.g. max unpaid exposure per FEP).
Gateway ranks candidates by:
- expected latency,
- price,
- historical reliability,
- current capacity.
Gateway initiates a p2p stream to the chosen provider and sends a ProxyRequest message which includes:
- a stable request_id,
- the embedded OpenAI request body,
- FEP identifier (fep_id),
- a soft upper bound on cost,
- an economics hint matching the provider’s advertised currency/address.
Provider labman:
- verifies the message signature and sender trust,
- checks the economics hint against local config,
- either:
  - accepts the job and routes to Portman, or
  - rejects with a ProxyError (e.g. MODEL_NOT_AVAILABLE).
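The gateway's candidate filtering and ranking steps can be sketched as a hard filter followed by a weighted score. `Candidate`, `rank`, and the weights are hypothetical; real weights and inputs would come from operator policy and the gateway's mesh view.

```rust
/// Hypothetical candidate record assembled from the gateway's mesh view.
#[derive(Debug)]
pub struct Candidate {
    pub node_id: String,
    pub serves_model: bool,
    pub currency_ok: bool,
    pub price_per_1k: f64, // illustrative units
    pub trusted: bool,
    pub within_risk_limit: bool,
    pub expected_latency_ms: f64,
    pub reliability: f64, // 0.0..=1.0, e.g. historical success rate
    pub free_slots: u32,  // current spare capacity
}

/// Filter on hard constraints, then sort by a score where lower is better.
pub fn rank(cands: &[Candidate], max_price_per_1k: f64) -> Vec<String> {
    let mut scored: Vec<(f64, &Candidate)> = cands
        .iter()
        .filter(|c| c.serves_model && c.currency_ok && c.trusted && c.within_risk_limit)
        .filter(|c| c.price_per_1k <= max_price_per_1k)
        .map(|c| {
            // Latency and price add cost; reliability and spare capacity reduce it.
            let score = c.expected_latency_ms / 100.0 + c.price_per_1k * 10.0
                - c.reliability * 5.0
                - f64::from(c.free_slots.min(8)) * 0.25;
            (score, c)
        })
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().map(|(_, c)| c.node_id.clone()).collect()
}
```

The gateway would try the first ID and fall back to later ones if the provider answers with a ProxyError.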
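On the provider side, once the message signature and sender trust have been verified, the accept/reject decision might reduce to a few lookups against local config. The struct shapes are hypothetical, and every error code except `MODEL_NOT_AVAILABLE` (which appears in this README) is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical subset of a ProxyRequest relevant to admission.
pub struct ProxyRequest {
    pub fep_id: String,
    pub model: String,
    pub payout_currency: String, // economics hint
    pub payout_address: String,  // economics hint
}

/// Hypothetical local provider policy derived from labman's config.
pub struct ProviderPolicy {
    pub trusted_feps: HashSet<String>,
    pub servable_models: HashSet<String>,
    pub payout_currency: String,
    pub payout_address: String,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Accept,
    Reject(&'static str),
}

pub fn decide(req: &ProxyRequest, p: &ProviderPolicy) -> Decision {
    if !p.trusted_feps.contains(&req.fep_id) {
        return Decision::Reject("FEP_NOT_TRUSTED"); // hypothetical code
    }
    // The economics hint must match what this provider advertised.
    if req.payout_currency != p.payout_currency || req.payout_address != p.payout_address {
        return Decision::Reject("ECONOMICS_MISMATCH"); // hypothetical code
    }
    if !p.servable_models.contains(&req.model) {
        return Decision::Reject("MODEL_NOT_AVAILABLE");
    }
    Decision::Accept
}
```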

4.3 Provider ➝ Portman (Local Execution)

Provider uses the local Portman protocol to:
- ensure the model is hosted (load if allowed by hosting plan/policy),
- execute the request against the correct runtime (vLLM, llama.cpp, etc.),
- collect metrics (tokens in/out, duration, status).
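The "ensure the model is hosted" step is essentially a three-way decision. A hypothetical sketch, assuming the hosting plan is a simple allow list and that free memory is the limiting resource (neither detail is specified here); `HostAction` and `ensure_hosted` are illustrative names.

```rust
#[derive(Debug, PartialEq)]
pub enum HostAction {
    AlreadyLoaded,
    Load,
    Refuse,
}

/// Decide what to do before running a job on Portman.
pub fn ensure_hosted(
    model: &str,
    loaded: &[&str],      // models Portman reports as loaded
    plan_allows: &[&str], // models the hosting plan/policy allows loading
    free_mem_gb: u32,     // hypothetical resource check
    needed_mem_gb: u32,
) -> HostAction {
    if loaded.contains(&model) {
        HostAction::AlreadyLoaded
    } else if plan_allows.contains(&model) && free_mem_gb >= needed_mem_gb {
        HostAction::Load
    } else {
        HostAction::Refuse // surfaced to the gateway as a ProxyError
    }
}
```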

4.4 Provider ➝ Gateway ➝ User (Streaming Back)

Provider sends one or more ProxyResponse messages back over the p2p stream, each containing an OpenAI-style chunk.
Gateway forwards these chunks to the end-user as a normal OpenAI streaming response.
When the job completes:
- Gateway reports final usage to the FEP,
- FEP computes the charge and signs a receipt,
- The receipt reaches the provider either:
  - embedded in the final ProxyResponse, or
  - in a follow-up ReceiptDelivery message.
Provider verifies the receipt signature and economic fields, and stores it durably for future payouts.
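Because a receipt can arrive by either of two paths, the provider needs to record it exactly once per job. This is a hypothetical per-request tracker: `verified` stands in for the signature and economic-field checks, and the state names are assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum JobState {
    Streaming,
    AwaitingReceipt,
    Receipted,
}

#[derive(Default)]
pub struct ReceiptTracker {
    jobs: HashMap<String, JobState>,
}

impl ReceiptTracker {
    pub fn start(&mut self, request_id: &str) {
        self.jobs.insert(request_id.to_string(), JobState::Streaming);
    }

    pub fn finished_streaming(&mut self, request_id: &str) {
        if let Some(s) = self.jobs.get_mut(request_id) {
            if *s == JobState::Streaming {
                *s = JobState::AwaitingReceipt;
            }
        }
    }

    /// Handles a receipt from either path (embedded or ReceiptDelivery).
    /// Returns true only the first time a verified receipt is recorded.
    pub fn on_receipt(&mut self, request_id: &str, verified: bool) -> bool {
        if !verified {
            return false;
        }
        if let Some(s) = self.jobs.get_mut(request_id) {
            if *s != JobState::Receipted {
                *s = JobState::Receipted;
                return true;
            }
        }
        false
    }

    pub fn state(&self, request_id: &str) -> Option<JobState> {
        self.jobs.get(request_id).copied()
    }
}
```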

Detailed message schemas are in protocol.md (mesh section).

5. Economics & FEPs (Receipts and Payouts)

The economics layer is specified in economics.md.

Key points:

End-users never pay providers directly.
End-users deposit funds into a FEP (crypto, fiat, or testnet tokens).
FEPs issue signed receipts per successful job, containing:
- receipt_id, fep_id, provider_node_id, gateway_node_id,
- canonical billing currency (e.g. USD equivalent),
- the provider’s chosen payout currency + address,
- tokens in/out, timestamps, and other metadata,
- a cryptographic signature.
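Verifying a receipt signature requires the FEP and the provider to serialise the signed fields identically. Below is a hypothetical sketch of a deterministic encoding (fixed field order, length-prefixed strings, big-endian integers); the `amount_minor` field and the encoding itself are assumptions, and economics.md is where the actual receipt format is specified.

```rust
/// Hypothetical receipt following the field list above.
pub struct Receipt {
    pub receipt_id: String,
    pub fep_id: String,
    pub provider_node_id: String,
    pub gateway_node_id: String,
    pub billing_currency: String, // e.g. "USD"
    pub amount_minor: u64,        // charge in billing-currency minor units (assumption)
    pub payout_currency: String,
    pub payout_address: String,
    pub tokens_in: u64,
    pub tokens_out: u64,
    pub issued_at_secs: u64,
    pub signature: Vec<u8>, // excluded from the signed bytes
}

impl Receipt {
    /// Deterministic bytes the FEP signs and the provider re-derives to verify.
    /// Length prefixes keep field boundaries unambiguous.
    pub fn signing_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for s in [
            &self.receipt_id,
            &self.fep_id,
            &self.provider_node_id,
            &self.gateway_node_id,
            &self.billing_currency,
            &self.payout_currency,
            &self.payout_address,
        ] {
            out.extend_from_slice(&(s.len() as u32).to_be_bytes());
            out.extend_from_slice(s.as_bytes());
        }
        for n in [self.amount_minor, self.tokens_in, self.tokens_out, self.issued_at_secs] {
            out.extend_from_slice(&n.to_be_bytes());
        }
        out
    }
}
```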

Providers:

verify each receipt’s signature and fields,
store receipts locally (e.g. in SQLite),
periodically construct claims to each FEP (a batch of receipts per currency/address),
receive payouts off-chain/on-chain as configured by the FEP.
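Building claims is a keyed aggregation over (FEP, payout currency, payout address). A hypothetical sketch: `StoredReceipt`, `Claim`, and `build_claims` are illustrative, the `min_total_minor` threshold stands in for a per-instance minimum payout, and amounts are assumed to be integer minor units so totals are exact.

```rust
use std::collections::BTreeMap;

/// Hypothetical already-verified receipt as stored locally.
pub struct StoredReceipt {
    pub receipt_id: String,
    pub fep_id: String,
    pub payout_currency: String,
    pub payout_address: String,
    pub amount_minor: u64,
}

#[derive(Debug, PartialEq)]
pub struct Claim {
    pub fep_id: String,
    pub currency: String,
    pub address: String,
    pub receipt_ids: Vec<String>,
    pub total_minor: u64,
}

/// One claim per (fep, currency, address); groups below the minimum are held back.
pub fn build_claims(receipts: &[StoredReceipt], min_total_minor: u64) -> Vec<Claim> {
    // BTreeMap gives a deterministic claim order.
    let mut groups: BTreeMap<(&str, &str, &str), Claim> = BTreeMap::new();
    for r in receipts {
        let key = (r.fep_id.as_str(), r.payout_currency.as_str(), r.payout_address.as_str());
        let claim = groups.entry(key).or_insert_with(|| Claim {
            fep_id: r.fep_id.clone(),
            currency: r.payout_currency.clone(),
            address: r.payout_address.clone(),
            receipt_ids: Vec::new(),
            total_minor: 0,
        });
        claim.receipt_ids.push(r.receipt_id.clone());
        claim.total_minor += r.amount_minor;
    }
    groups.into_values().filter(|c| c.total_minor >= min_total_minor).collect()
}
```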

labman’s role in economics is to:

map Portman instances to payout configs (currency/address/min payout),
surface this information in capabilities gossip,
verify receipts and help build claims,
enforce risk limits (e.g. max unpaid amount per FEP).
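A per-FEP risk limit can be enforced by tracking receipted-but-unsettled amounts and checking a job's soft cost bound before accepting it. `ExposureLimits` and its methods are hypothetical, as is the single limit shared by all FEPs.

```rust
use std::collections::HashMap;

/// Hypothetical tracker of unpaid exposure per FEP.
pub struct ExposureLimits {
    max_unpaid_minor: u64,
    unpaid: HashMap<String, u64>, // receipted but not yet settled
}

impl ExposureLimits {
    pub fn new(max_unpaid_minor: u64) -> Self {
        Self { max_unpaid_minor, unpaid: HashMap::new() }
    }

    /// Would accepting a job with this soft cost bound stay within the limit?
    pub fn can_accept(&self, fep_id: &str, max_cost_minor: u64) -> bool {
        let current = self.unpaid.get(fep_id).copied().unwrap_or(0);
        current.saturating_add(max_cost_minor) <= self.max_unpaid_minor
    }

    pub fn record_receipt(&mut self, fep_id: &str, amount_minor: u64) {
        *self.unpaid.entry(fep_id.to_string()).or_insert(0) += amount_minor;
    }

    pub fn record_payout(&mut self, fep_id: &str, amount_minor: u64) {
        if let Some(u) = self.unpaid.get_mut(fep_id) {
            *u = u.saturating_sub(amount_minor);
        }
    }
}
```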

Settlement details (mock FEP for MVP, real payouts later) live in economics.md.

6. Portman Integration (Local Orchestration)

Portman remains the local LLM orchestrator.

labman’s Portman-facing responsibilities:

track one or more Portman instances (e.g. via local WebSockets),
derive a HostingPlan from local and global signals:
- which models to keep warm,
- which models to unload,
convert that plan into Portman control messages:
- LoadModel(model_id, priority),
- UnloadModel(model_id, reason),
- SetConcurrency or equivalent,
collect runtime metrics:
- per-model utilisation,
- per-request tokens in/out and latency.
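Turning a HostingPlan into Portman control messages amounts to a set difference between the desired warm set and what is currently loaded. In this hypothetical sketch, `HostingPlan`'s shape, the unload-before-load ordering, and the reason string are assumptions, and SetConcurrency is omitted.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum PortmanControl {
    LoadModel { model_id: String, priority: u8 },
    UnloadModel { model_id: String, reason: String },
}

/// Hypothetical plan: models to keep warm, each with a load priority.
pub struct HostingPlan {
    pub keep_warm: Vec<(String, u8)>,
}

pub fn plan_to_messages(plan: &HostingPlan, loaded: &BTreeSet<String>) -> Vec<PortmanControl> {
    let desired: BTreeSet<&str> = plan.keep_warm.iter().map(|(m, _)| m.as_str()).collect();
    let mut msgs = Vec::new();
    // Unloads first, so resources are freed before new loads (assumed ordering).
    for m in loaded {
        if !desired.contains(m.as_str()) {
            msgs.push(PortmanControl::UnloadModel {
                model_id: m.clone(),
                reason: "not in hosting plan".into(),
            });
        }
    }
    for (m, prio) in &plan.keep_warm {
        if !loaded.contains(m) {
            msgs.push(PortmanControl::LoadModel { model_id: m.clone(), priority: *prio });
        }
    }
    msgs
}
```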

The labman-market subsystem performs the hosting decision logic, based on:

local demand and receipts (how profitable a model is locally),
mesh demand and supply (how hot and how replicated a model is),
operator policy (allow/deny lists, minimum prices).
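A toy version of the hosting decision can combine these three inputs into one score per model. The formula (local revenue plus mesh demand per existing replica) and all names are illustrative assumptions, not labman-market's algorithm.

```rust
/// Hypothetical per-model inputs; units are illustrative.
pub struct ModelSignals {
    pub local_revenue_per_hour: f64, // from local receipts
    pub mesh_demand: f64,            // e.g. requests/min seen via GossipMarket
    pub mesh_replicas: u32,          // providers already hosting it
    pub allowed: bool,               // operator allow/deny lists
    pub price_per_1k: f64,
    pub min_price_per_1k: f64, // operator minimum price
}

/// None if policy forbids hosting; otherwise higher means "keep warm".
pub fn warm_score(s: &ModelSignals) -> Option<f64> {
    if !s.allowed || s.price_per_1k < s.min_price_per_1k {
        return None;
    }
    // Demand per replica: hot, under-replicated models score higher.
    let scarcity = s.mesh_demand / (f64::from(s.mesh_replicas) + 1.0);
    Some(s.local_revenue_per_hour + scarcity)
}

/// Pick the top `slots` models by score.
pub fn choose_warm(models: &[(&str, ModelSignals)], slots: usize) -> Vec<String> {
    let mut scored: Vec<(f64, &str)> = models
        .iter()
        .filter_map(|(name, s)| warm_score(s).map(|sc| (sc, *name)))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0)); // descending
    scored.into_iter().take(slots).map(|(_, n)| n.to_string()).collect()
}
```

Dividing demand by replica count is what lets providers spread across models without coordination: a model that is already widely hosted scores lower even when demand is high.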

This keeps Portman simple and allows the mesh economy to adapt dynamically without any central controller.

7. Project Structure (Conceptual)

The workspace is organised into crates aligned with the mesh design. Names are indicative; check Cargo.toml for the exact list.

```
bin/labmand/           # main daemon
bin/labman-cli/        # CLI for operators

crates/
  labman-core/         # shared types, node/model descriptors, errors
  labman-config/       # config loading and validation
  labman-telemetry/    # logging, tracing, metrics hooks

  labman-labnet-core/       # node identity, peer store, protocol message defs
  labman-labnet-transport/  # encrypted p2p transport and streams
  labman-labnet-gossip/     # peers, capabilities, market gossip

  labman-gateway/      # OpenAI HTTP server + routing into the mesh
  labman-market/       # hosting decision engine (local/mesh demand)
  labman-economics/    # provider payout config, receipt verification, claims

  labman-portman-ws/   # Portman WebSocket client/server integration

architecture.md        # mesh-based architecture & security design (root)
protocol.md            # Portman ↔ labman and mesh-level protocol
economics.md           # economics layer, FEPs, receipts, and claims
  economics-and-portals.md  # FEPs, receipts, payouts, claims
```

Over time, any conplane-specific crates or references should be either removed or repurposed to fit this mesh-first picture.

8. Running labman (Conceptually)

The full mesh implementation is in progress, but the intended operator experience is:

Install labmand and labman-cli (binaries or container).
Write a config file (e.g. /etc/labman/labman.toml) that defines:
- node identity (or allow labman to generate it),
- bootstrap peers,
- FEP endpoints and credentials (if acting as a gateway),
- Portman instances (if acting as a provider),
- payout currencies and addresses.
Start the daemon:
```
sudo systemctl enable --now labmand
```
Use labman-cli to:
- inspect mesh peers and roles,
- check which models are hosted locally,
- view pending receipts and build claims,
- adjust policies (trusted FEPs, risk limits, etc.).

Implementation details and exact CLI arguments will evolve, but the one daemon + one config + one CLI shape is stable.
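To illustrate the kind of cross-field checks a config loader might run once the TOML is parsed, here is a hypothetical in-memory config and validator. The field names do not reflect the real labman.toml schema, and treating a node with neither role as an error is an assumption.

```rust
/// Hypothetical parsed config; not the real labman.toml schema.
pub struct NodeConfig {
    pub public_endpoint: Option<String>,   // set => gateway mode
    pub fep_endpoints: Vec<String>,
    pub portman_instances: Vec<String>,    // non-empty => provider mode
    pub payout_addresses: Vec<(String, String)>, // (currency, address)
}

/// Collect every problem instead of stopping at the first one.
pub fn validate(c: &NodeConfig) -> Result<(), Vec<String>> {
    let mut errs = Vec::new();
    let gateway = c.public_endpoint.is_some();
    let provider = !c.portman_instances.is_empty();
    if !gateway && !provider {
        errs.push("node is configured as neither gateway nor provider".to_string());
    }
    if gateway && c.fep_endpoints.is_empty() {
        errs.push("gateway mode requires at least one FEP endpoint".to_string());
    }
    if provider && c.payout_addresses.is_empty() {
        errs.push("provider mode requires a payout currency and address".to_string());
    }
    if errs.is_empty() { Ok(()) } else { Err(errs) }
}
```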

9. Relationship to Older Control-Plane Design

Earlier versions of this repository assumed:

a separate Conplane control plane,
a post-quantum WireGuard tunnel from each labman node to Conplane,
a Portman ⇄ labman ⇄ Conplane WebSocket protocol for directives and metrics.

That design has been superseded by the mesh-based architecture described in this README and in:

architecture.md
protocol.md
economics.md

Any remaining references to Conplane, central control-plane APIs, or WG-based per-node tunnels are historical context only and should be updated or removed as the mesh implementation progresses.