
The Holy Grail of Crypto AI: The Frontier Exploration of Decentralized Training


Reprinted from chaincatcher

06/11/2025

Author: 0xjacobzhao and ChatGPT 4o

Special thanks to Advait Jayant (Peri Labs), Sven Wellmann (Polychain Capital), Chao (Metropolis DAO), Jiahao (Flock), Alexander Long (Pluralis Research), and Ben Fielding & Jeff Amico (Gensyn) for their suggestions and feedback.

Across the AI value chain, model training is the stage with the highest resource consumption and the steepest technical barriers, and it directly determines a model's capability ceiling and real-world performance. Compared with the lightweight calls of the inference stage, training demands sustained large-scale compute, complex data-processing pipelines, and intensive optimization algorithms; it is the true "heavy industry" of AI system construction. In terms of architectural paradigm, training approaches fall into four categories: centralized training, distributed training, federated learning, and the decentralized training that is the focus of this article.

Centralized training is the most common traditional approach: a single institution completes the entire training process within a local high-performance cluster, where every component, from hardware (e.g., NVIDIA GPUs), underlying software (CUDA, cuDNN), and cluster schedulers (e.g., Kubernetes) to training frameworks (e.g., PyTorch with the NCCL backend), is coordinated by a unified control system. This deeply integrated architecture maximizes the efficiency of memory sharing, gradient synchronization, and fault-tolerance mechanisms, making it well suited to training large-scale models such as GPT and Gemini. It offers high efficiency and controllable resources, but also brings problems such as data monopoly, resource barriers, energy consumption, and single-point risk.

Distributed training is the mainstream approach to large-scale model training. Its core idea is to decompose the training task and distribute it across many machines for collaborative execution, breaking through single-machine limits on compute and storage. Although it is physically "distributed", the whole system is still controlled, scheduled, and synchronized by a centralized organization, typically running in a high-speed LAN environment where a master node coordinates each subtask over NVLink high-speed interconnects. Mainstream methods include:

  • Data parallelism: each node trains on different data while sharing the same parameters; model weights must stay matched across nodes;
  • Model parallelism: different parts of the model are deployed on different nodes, enabling strong scalability;
  • Pipeline parallelism: stages are executed serially in a pipeline, improving throughput;
  • Tensor parallelism: matrix computations are finely partitioned, improving parallel granularity.
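As a toy illustration of the data-parallel pattern above (a generic sketch, not code from any project discussed here), each node computes a gradient on its own data shard, and the gradients are averaged (the AllReduce step) so every replica applies the same update; all names are illustrative:

```python
import numpy as np

def local_gradient(weights, x_batch, y_batch):
    """Least-squares gradient for a linear model y ~ x @ w on one node's shard."""
    preds = x_batch @ weights
    return 2 * x_batch.T @ (preds - y_batch) / len(x_batch)

def data_parallel_step(weights, shards, lr=0.1):
    """Each node computes a gradient on its own shard; the gradients are
    averaged so all replicas stay in sync, mimicking AllReduce."""
    grads = [local_gradient(weights, x, y) for x, y in shards]
    avg_grad = np.mean(grads, axis=0)
    return weights - lr * avg_grad

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
x = rng.normal(size=(120, 2))
y = x @ true_w
shards = [(x[i::3], y[i::3]) for i in range(3)]  # three "nodes"

w = np.zeros(2)
for _ in range(200):
    w = data_parallel_step(w, shards)
print(np.round(w, 3))  # converges to the true weights [2, -1]
```

Because averaging shard gradients equals the full-batch gradient here, all replicas recover the true weights; in a real cluster the same averaging is performed by NCCL's AllReduce over the network.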

Distributed training is thus a combination of "centralized control + distributed execution", analogous to one boss remotely directing employees in multiple "offices" to complete a task together. Nearly all mainstream large models (GPT-4, Gemini, LLaMA, etc.) are trained this way.

Decentralized training represents a more open and censorship-resistant future path. Its defining feature is that multiple mutually untrusting nodes (home computers, cloud GPUs, or edge devices) collaboratively complete training tasks without a central coordinator, usually with task distribution and collaboration driven by protocols and honest contribution ensured by cryptographic incentives. The main challenges facing this model include:

  • Device heterogeneity and partitioning difficulty: coordinating heterogeneous devices is hard, and task-partitioning efficiency is low;
  • Communication-efficiency bottlenecks: network communication is unstable, and gradient-synchronization bottlenecks are pronounced;
  • Lack of trusted execution: without a trusted execution environment, it is hard to verify whether a node actually performed the computation;
  • Lack of unified coordination: with no central scheduler, task distribution and exception-rollback mechanisms are complex.

Decentralized training can be understood as a group of volunteers around the world, each contributing compute to train a model collaboratively. However, "truly feasible large-scale decentralized training" remains a systemic engineering challenge spanning system architecture, communication protocols, cryptographic security, economic mechanisms, model verification, and more; whether it can achieve "effective coordination + honest incentives + correct results" is still at the early prototype-exploration stage.

Federated learning, as a transitional form between distributed and decentralized training, emphasizes keeping data local while aggregating model parameters centrally, and suits privacy- and compliance-sensitive scenarios (such as healthcare and finance). It has the engineering structure and local-collaboration capability of distributed training, plus the data-decentralization advantage of decentralized training, but it still relies on a trusted coordinator and is not fully open or censorship-resistant. It can be seen as a "controlled decentralization" solution for privacy-compliance scenarios: relatively mild in its training tasks, trust structure, and communication mechanism, and therefore better suited as a transitional deployment architecture in industry.

AI training paradigm panoramic comparison table (technical architecture × trust incentive × application characteristics)

The boundaries, opportunities, and realistic paths of decentralized training

From the perspective of training paradigms, decentralized training is not suitable for every task type. In some scenarios, complex task structures, extremely high resource demands, or difficult collaboration make a task naturally ill-suited to efficient completion across heterogeneous, trustless nodes. For example, large-model training often depends on high VRAM, low latency, and high bandwidth, making it hard to split and synchronize effectively over open networks; tasks with strong data-privacy or sovereignty constraints (such as medical, financial, or classified data) are bound by legal compliance and ethics and cannot be shared openly; and tasks lacking a basis for collaborative incentives (such as enterprise closed-source models or internal prototype training) offer no motivation for external participation. Together these boundaries constitute the realistic limitations of decentralized training today.

But this does not mean decentralized training is a false proposition. For task types that are lightweight, easy to parallelize, and incentivizable, decentralized training shows clear application prospects, including but not limited to: LoRA fine-tuning, behavior-alignment post-training tasks (such as RLHF and DPO), data crowdsourcing training and labeling, resource-controllable training of small base models, and collaborative training involving edge devices. These tasks generally exhibit high parallelism, low coupling, and tolerance for heterogeneous compute, making them well suited to collaborative training via P2P networks, swarm protocols, distributed optimizers, and the like.

Overview of adaptability of decentralized training tasks

Analysis of classic decentralized training projects

In the frontier fields of decentralized training and federated learning, representative blockchain projects currently include Prime Intellect, Pluralis.ai, Gensyn, Nous Research, and Flock.io. In terms of technological innovation and engineering difficulty, Prime Intellect, Nous Research, and Pluralis.ai have proposed substantial original explorations in system architecture and algorithm design, representing the frontier of current theoretical research, while the implementation paths of Gensyn and Flock.io are relatively clear and already show preliminary engineering progress. This article analyzes the core technologies and engineering architectures behind these five projects in turn, and further explores their differences and complementarities within a decentralized AI training system.

Prime Intellect: A pioneer of reinforcement-learning collaborative networks with verifiable training trajectories

Prime Intellect is committed to building a trustless AI training network in which anyone can participate in training and earn credible rewards for their compute contributions. Through the three core modules PRIME-RL + TOPLOC + SHARDCAST, Prime Intellect aims to build a decentralized AI training system that is verifiable, open, and fully incentivized.

1. Prime Intellect protocol stack structure and key module value

2. Detailed explanation of the key mechanisms of Prime Intellect training

PRIME-RL: Decoupled asynchronous reinforcement learning task architecture

PRIME-RL is a task-modeling and execution framework customized by Prime Intellect for decentralized training scenarios, designed for heterogeneous networks and asynchronous participation. It adopts reinforcement learning as its primary adaptation target, structurally decoupling training, inference, and weight uploading so that each training node can complete the task loop locally and coordinate with verification and aggregation mechanisms through standardized interfaces. Compared with traditional supervised-learning pipelines, PRIME-RL is better suited to elastic training in environments without central scheduling, both reducing system complexity and laying the groundwork for multitask parallelism and policy evolution.

TOPLOC: Lightweight training behavior verification mechanism

TOPLOC (Trusted Observation & Policy-Locality Check) is Prime Intellect's core training-verifiability mechanism, used to determine whether a node has truly completed effective policy learning on the observed data. Unlike heavyweight solutions such as ZKML, TOPLOC does not rely on full model recomputation; instead it completes lightweight structural verification by analyzing the local consistency trajectory between "observation sequence ↔ policy update". For the first time, it turns the behavioral trajectories produced during training into verifiable objects, a key innovation for trustless training-reward allocation and a feasible path toward an auditable, incentivized decentralized collaborative training network.

SHARDCAST: Asynchronous Weight Aggregation and Propagation Protocol

SHARDCAST is a weight propagation and aggregation protocol designed by Prime Intellect, optimized for real network environments that are asynchronous, bandwidth-constrained, and subject to changing node states. It combines a gossip propagation mechanism with local synchronization strategies, allowing multiple nodes to continuously submit partial updates while out of sync, achieving incremental weight convergence and multi-version evolution. Compared with centralized or synchronous AllReduce methods, SHARDCAST significantly improves the scalability and fault tolerance of decentralized training and is the core foundation for stable weight consensus and continuous training iteration.
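SHARDCAST's internals are not public in this article, but the general pattern of asynchronous partial-update aggregation can be sketched as follows: a minimal, hypothetical aggregator that merges updates as they arrive and down-weights updates computed against stale weight versions (all class and method names are illustrative, not SHARDCAST's API):

```python
import numpy as np

class AsyncAggregator:
    """Toy staleness-weighted aggregator: nodes submit partial weight
    updates asynchronously; updates computed against an older global
    version are down-weighted before being merged."""
    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)
        self.version = 0

    def submit(self, delta, base_version):
        staleness = self.version - base_version
        alpha = 1.0 / (1 + staleness)   # simple staleness discount
        self.weights += alpha * np.asarray(delta, dtype=float)
        self.version += 1
        return self.version

agg = AsyncAggregator([0.0, 0.0])
v0 = agg.version
agg.submit([1.0, 2.0], base_version=v0)   # fresh: applied at full weight
agg.submit([1.0, 2.0], base_version=v0)   # now stale by 1: applied at 1/2
print(agg.weights)  # [1.5 3. ]
```

The key property, shared with any asynchronous aggregation scheme, is that the global state keeps advancing without waiting for slow or disconnected nodes.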

OpenDiLoCo: Sparse asynchronous communication framework

OpenDiLoCo is a communication-optimization framework independently implemented and open-sourced by the Prime Intellect team, based on DeepMind's DiLoCo concept. It is designed for the challenges common in decentralized training: bandwidth limits, device heterogeneity, and node instability. Its architecture is based on data parallelism; by building sparse topologies such as Ring, Expander, and Small-World, it avoids the high communication overhead of global synchronization and completes collaborative model training relying only on local neighbor nodes. Combined with asynchronous updates and checkpoint-based fault tolerance, OpenDiLoCo enables consumer-grade GPUs and edge devices to participate stably in training tasks, significantly broadening participation in global collaborative training, and it is one of the key communication infrastructures for building a decentralized training network.
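The payoff of a sparse topology like the Ring mentioned above is that each node talks only to its neighbors, yet information still diffuses to everyone. A minimal gossip-averaging sketch (generic, not OpenDiLoCo's actual implementation) shows this:

```python
import numpy as np

def ring_gossip_step(values):
    """One gossip round on a ring: each node averages with its two
    neighbors only, so no global synchronization barrier is needed."""
    n = len(values)
    return np.array([(values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3
                     for i in range(n)])

vals = np.array([0.0, 0.0, 0.0, 12.0])  # one node holds a fresh update
for _ in range(30):
    vals = ring_gossip_step(vals)
print(np.round(vals, 3))  # all nodes converge to the mean, 3.0
```

Each round costs O(1) messages per node instead of the O(n) of dense all-to-all exchange, which is exactly the trade OpenDiLoCo exploits on low-bandwidth links: slower convergence of the average in exchange for far less communication.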

PCCL: Collaborative communication library

PCCL (Prime Collective Communication Library) is a lightweight communication library tailored by Prime Intellect for decentralized AI training environments, aiming to overcome the adaptation bottlenecks of traditional communication libraries (such as NCCL and Gloo) on heterogeneous devices and low-bandwidth networks. PCCL supports sparse topologies, gradient compression, low-precision synchronization, and checkpoint recovery, and can run on consumer-grade GPUs and unstable nodes. It is the underlying component supporting the asynchronous communication capabilities of the OpenDiLoCo protocol. It significantly improves the training network's bandwidth tolerance and device compatibility, laying the "last mile" of communication groundwork for a truly open, trustless collaborative training network.

3. Prime Intellect's incentive network and division of roles

Prime Intellect builds a permissionless, verifiable training network with economic incentives, allowing anyone to participate in tasks and receive rewards based on real contributions. The protocol operates around three core roles:

  • Task initiator: Define training environment, initial model, reward function and verification criteria
  • Training node: Perform local training, submit weight updates and observation trajectories
  • Verification node: Use the TOPLOC mechanism to verify the authenticity of training behavior and participate in reward calculation and policy aggregation

The core process of the protocol includes task release, node training, trajectory verification, weight aggregation (SHARDCAST) and reward distribution, forming a closed incentive loop around "real training behavior".

4. INTELLECT-2: The release of the first verifiable decentralized training model

In May 2025, Prime Intellect released INTELLECT-2, the world's first reinforcement-learning model trained through asynchronous, trustless collaboration of decentralized nodes, with a parameter scale of 32B. The model was trained collaboratively by 100+ heterogeneous GPU nodes across three continents, using a fully asynchronous architecture, with a training duration of over 400 hours, demonstrating the feasibility and stability of asynchronous collaborative networks. This model is not only a performance breakthrough but also the first systematic implementation of the "training is consensus" paradigm proposed by Prime Intellect. INTELLECT-2 integrates core protocol modules such as PRIME-RL (asynchronous training structure), TOPLOC (training-behavior verification), and SHARDCAST (asynchronous weight aggregation), marking the first time a decentralized training network has achieved a closed loop of openness, verification, and economic incentives in the training process.

In terms of performance, INTELLECT-2 is trained on top of QwQ-32B with specialized RL training in code and mathematics, and sits at the frontier of current open-source RL fine-tuned models. Although it has not surpassed closed-source models such as GPT-4 or Gemini, its real significance lies elsewhere: it is the world's first decentralized model experiment whose complete training process is reproducible, verifiable, and auditable. Prime Intellect open-sourced not only the model but, more importantly, the training process itself: the training data, policy-update trajectories, verification procedures, and aggregation logic are all transparent and checkable, building a prototype of a decentralized training network in which everyone can participate, collaborate with trust, and share the benefits.

5. Team and financing background

Prime Intellect completed a $15 million seed round in February 2025, led by Founders Fund, with participation from Menlo Ventures and industry leaders including Andrej Karpathy, Clem Delangue, Dylan Patel, Balaji Srinivasan, Emad Mostaque, and Sandeep Nailwal. Earlier, the project closed a $5.5 million early-stage round in April 2024, led by CoinFund and Distributed Global, with participation from Compound VC, Collab + Currency, Protocol Labs, and other institutions. To date, Prime Intellect has raised more than $20 million in total.

Prime Intellect was co-founded by Vincent Weisser and Johannes Hagemann. Team members span the AI and Web3 fields, with core members from Meta AI, Google Research, OpenAI, Flashbots, Stability AI, and the Ethereum Foundation. They have deep capabilities in system-architecture design and distributed engineering implementation, and are one of the very few execution teams to have successfully completed real decentralized large-model training.

Pluralis: A paradigm explorer of asynchronous model parallelism and structurally compressed collaborative training

Pluralis is a Web3 AI project focused on "trustworthy collaborative training networks". Its core goal is to promote a model-training paradigm that is decentralized, open to participation, and backed by long-term incentive mechanisms. Unlike today's mainstream centralized or closed training paths, Pluralis proposes a new concept called Protocol Learning: "protocolizing" the model-training process and building an open training system with an endogenous incentive loop through verifiable collaboration mechanisms and model-ownership mapping.

1. Core concept: Protocol Learning

Protocol Learning proposed by Pluralis contains three key pillars:

  • Unmaterializable Models: model weights are distributed in fragments across multiple nodes, so no single node can reconstruct the full weights, which remain closed-source. This design makes the model a natural "in-protocol asset", enabling access-credential control, leakage protection, and the binding of revenue ownership.
  • Model-parallel Training over Internet: Through the asynchronous Pipeline model parallel mechanism (SWARM architecture), different nodes only hold part of the weight and complete training or inference through low-bandwidth network collaboration.
  • Partial Ownership for Incentives: All participating nodes obtain partial ownership of the model based on their training contributions, thus enjoying future revenue sharing and protocol governance rights.

2. Technical architecture of Pluralis protocol stack

3. Detailed explanation of key technical mechanisms

Unmaterializable Models

In "A Third Path: Protocol Learning", Pluralis proposes for the first time that model weights be distributed in fragmented form, ensuring that the "model asset" can only run within the Swarm network and that its access and returns are controlled by the protocol. This mechanism is a prerequisite for a sustainable incentive structure in decentralized training.

Asynchronous Model-Parallel Training

In "SWARM Parallel with Asynchronous Updates", Pluralis constructs a pipeline-based asynchronous model-parallel architecture and validates it empirically on LLaMA-3 for the first time. The core innovation is the introduction of the Nesterov Accelerated Gradient (NAG) mechanism, which effectively corrects gradient drift and convergence instability during asynchronous updates, making training across heterogeneous devices practically feasible in low-bandwidth environments.
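The NAG mechanism cited above evaluates the gradient at a "look-ahead" point rather than the current iterate, which damps oscillation; that damping property is what makes it attractive for delayed, asynchronous updates. A minimal sketch of plain NAG on an ill-conditioned quadratic (generic textbook form, not Pluralis's pipeline-specific variant):

```python
import numpy as np

def nag_minimize(grad, x0, lr=0.05, momentum=0.9, steps=300):
    """Nesterov Accelerated Gradient: the gradient is evaluated at the
    look-ahead point x + momentum * v, which anticipates the momentum
    step and suppresses the oscillation plain momentum would cause."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        lookahead = x + momentum * v
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    return x

# Ill-conditioned quadratic f(x) = 0.5 * (x1^2 + 25 * x2^2)
grad = lambda x: np.array([1.0, 25.0]) * x
x_min = nag_minimize(grad, [5.0, 5.0])
print(np.round(x_min, 4))
```

With these settings plain gradient descent at the same learning rate would diverge along the stiff coordinate (lr × 25 > 2), while the look-ahead correction keeps the iteration stable and drives it to the minimum at the origin.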

Column-Space Sparsification

In "Beyond Top-K", Pluralis proposes a structure-aware column-space compression method to replace traditional Top-K, avoiding damage to semantic pathways. The mechanism balances model accuracy against communication efficiency: measurements show that more than 90% of communication data can be compressed in the asynchronous model-parallel setting, a key breakthrough toward efficient structure-aware communication.
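The details of "Beyond Top-K" are not reproduced in this article, but the contrast it draws can be illustrated: classic Top-K keeps the largest scalars wherever they fall and shreds column structure, whereas a structure-aware alternative keeps whole columns ranked by norm, so each retained direction survives intact. A hypothetical sketch (function names and the column-norm criterion are illustrative assumptions, not Pluralis's algorithm):

```python
import numpy as np

def topk_elementwise(update, k):
    """Classic Top-K: keep the k largest-magnitude scalars, zero the rest."""
    flat = np.abs(update).ravel()
    thresh = np.sort(flat)[-k]
    return np.where(np.abs(update) >= thresh, update, 0.0)

def column_space_sparsify(update, n_cols):
    """Structure-aware alternative: keep whole columns (ranked by L2 norm)
    so each retained direction stays intact instead of being shredded
    entry by entry."""
    norms = np.linalg.norm(update, axis=0)
    keep = np.argsort(norms)[-n_cols:]
    out = np.zeros_like(update)
    out[:, keep] = update[:, keep]
    return out

rng = np.random.default_rng(1)
U = rng.normal(size=(8, 16))
U[:, 3] *= 5.0                          # one dominant "semantic" direction
baseline = topk_elementwise(U, k=16)    # same budget, scattered entries
sparse = column_space_sparsify(U, n_cols=2)
ratio = 1 - np.count_nonzero(sparse) / U.size
print(f"compression: {ratio:.1%}")      # 14 of 16 columns dropped
```

Note how the dominant column survives whole under the column-space scheme, while element-wise Top-K may keep only fragments of it alongside noise from other columns.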

4. Technical positioning and path selection

Pluralis clearly takes "asynchronous model parallelism" as its core direction, emphasizing the following advantages over data parallelism:

  • Supports low-bandwidth networks and non-uniform nodes;
  • Adapts to device heterogeneity, allowing consumer-grade GPUs to participate;
  • Naturally supports elastic scheduling, with nodes frequently going online and offline;
  • Its three breakthrough points are structural compression, asynchronous updates, and weight non-extractability.

According to the six technical blog posts published on the official website, their logic is organized into the following three main lines:

  • Philosophy and Vision: "A Third Path: Protocol Learning" "Why Decentralized Training Matters"
  • Technical mechanism details: "SWARM Parallel", "Beyond Top-K", "Asynchronous Updates"
  • Exploration of institutional innovation: "Unmaterializable Models" and "Partial Ownership Protocols"

At present, Pluralis has not launched a product, testnet, or open-source code, because the technical path it has chosen is exceptionally challenging: system-level problems in the underlying architecture, communication protocols, and non-exportable weights must be solved before product services can be layered on top.

In a new paper released by Pluralis Research in June 2025, its decentralized training framework was expanded from model pre-training to model fine-tuning stage, supporting asynchronous updates, sparse communications and partial weight aggregation. Compared with the previous design that focused on theory and pre-training, this work pays more attention to implementation feasibility, marking its further maturity in the full-cycle training architecture.

5. Team and financing background

Pluralis completed a $7.6 million seed round in 2025, co-led by Union Square Ventures (USV) and CoinFund. Founder Alexander Long has a PhD in machine learning with a dual background in mathematics and systems research, and the core team is composed entirely of machine-learning researchers with doctoral backgrounds. It is a typically technology-driven project, publishing primarily through dense papers and technical blogs; there is no BD/growth team yet, and the focus remains on solving the infrastructure problems of low-bandwidth asynchronous model parallelism.

Gensyn: A decentralized training protocol layer driven by verifiable execution

Gensyn is a Web3 AI project focused on "trustworthy execution of deep-learning training tasks". Its core is not to reinvent model architectures or training paradigms but to build a verifiable distributed training execution network covering the full pipeline of "task distribution + training execution + result verification + fair incentives". Through its off-chain training + on-chain verification design, Gensyn establishes an efficient, open, and incentivized global training market, making "training as mining" a reality.

1. Project positioning: the execution protocol layer of training tasks

Gensyn's concern is not "how to train" but rather the infrastructure of "who trains, how it is verified, and how rewards are shared". It is essentially a verifiable computing protocol for training tasks, solving mainly:

  • Who performs training tasks (compute distribution and dynamic matching)
  • How execution results are verified (no full recomputation needed; only the disputed operation is re-verified)
  • How training rewards are allocated (staking, slashing, and a multi-role game mechanism)

2. Overview of technical architecture

3. Detailed explanation of the module

RL Swarm: Collaborative reinforcement learning training system

RL Swarm, Gensyn's first product, is a decentralized multi-model collaborative optimization system for the post-training stage, with the following core features:

Distributed inference and learning process:

  • Answering stage: each node outputs its answer independently;
  • Critique stage: nodes critique one another's outputs and select the best answers and reasoning;
  • Resolving stage: each node predicts the preference of the majority and revises its own answer accordingly, achieving a local weight update.

The RL Swarm proposed by Gensyn is a decentralized multi-model collaborative optimization system in which each node runs an independent model and trains locally without gradient synchronization. It naturally adapts to heterogeneous compute and unstable network environments, and supports elastic joining and leaving of nodes. The mechanism draws on the ideas of RLHF and multi-agent games but is closer to the dynamic-evolution logic of a collaborative reasoning network: nodes are rewarded according to how consistent they are with the group-consensus result, driving continuous optimization and convergent learning of reasoning capabilities. RL Swarm significantly improves model robustness and generalization in open networks, and has been deployed as a core execution module in Gensyn's Testnet Phase 0, built on an Ethereum Rollup.
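The Answering → Critique → Resolving loop above can be caricatured in a few lines. This is a deliberately simplified, hypothetical sketch: the "critique" step is stood in for by a majority vote over proposals, and the consensus-matching reward mirrors the article's description that nodes are rewarded for consistency with the group result (none of these names come from Gensyn's code):

```python
def swarm_round(nodes, question):
    """One toy Answer -> Critique -> Resolve round: each node proposes an
    answer, the majority-preferred answer becomes the consensus, and
    rewards go to the nodes that matched it."""
    answers = {name: policy(question) for name, policy in nodes.items()}
    # Critique/Resolve collapsed into a majority vote over proposals,
    # standing in for the learned critique-and-revise step.
    votes = list(answers.values())
    consensus = max(set(votes), key=votes.count)
    rewards = {name: 1.0 if ans == consensus else 0.0
               for name, ans in answers.items()}
    return consensus, rewards

nodes = {
    "node_a": lambda q: q * 2,
    "node_b": lambda q: q * 2,
    "node_c": lambda q: q + 1,   # a divergent policy earns no reward
}
consensus, rewards = swarm_round(nodes, 21)
print(consensus, rewards)
```

In the real system the reward signal would then drive a local weight update on each node, so divergent policies are gradually pulled toward the swarm consensus without any gradient exchange.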

Verde + Proof-of-Learning: Trusted Verification Mechanism

Gensyn's Verde module combines three mechanisms:

  • Proof-of-Learning: judges whether training actually happened, based on gradient trajectories and training metadata;
  • Graph-based pinpointing: locates the divergent node in the training computation graph, requiring recomputation of only the specific operation;
  • Refereed delegation: an arbitration-style verification mechanism in which a verifier and a challenger resolve disputes through localized verification, greatly reducing verification cost.

Compared with ZKP or full recalculation verification solutions, the Verde solution achieves a better balance between verifiability and efficiency.
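The efficiency win of pinpointing over full recomputation comes from narrowing a dispute to a single operation. A minimal sketch of the general bisection idea (a generic illustration, not Gensyn's Verde implementation): given an honest trace and a claimed trace of intermediate states, binary search finds the first step where they diverge, so the arbiter re-executes one operation instead of the whole run:

```python
def find_divergent_op(honest_states, claimed_states):
    """Binary-search an execution trace for the first step whose claimed
    intermediate state diverges, so the arbiter re-executes one
    operation instead of the whole training run."""
    lo, hi = 0, len(honest_states) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if honest_states[mid] == claimed_states[mid]:
            lo = mid + 1          # agreement so far; fault lies later
        else:
            hi = mid              # divergence at or before mid
    return lo

ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

def run(x, cheat_at=None):
    states = []
    for i, op in enumerate(ops):
        x = op(x)
        if i == cheat_at:
            x += 7                # a dishonest solver corrupts this step
        states.append(x)
    return states

honest = run(5)
claimed = run(5, cheat_at=2)
print(find_divergent_op(honest, claimed))  # step 2 is the disputed operation
```

The cost is O(log n) state comparisons plus one re-executed operation, versus O(n) for recomputing the full trace, which is the asymmetry that makes dispute-based verification cheap.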

SkipPipe: Communication fault tolerance optimization mechanism

SkipPipe is designed to solve the communication bottleneck in "low-bandwidth + node-dropout" scenarios. Its core capabilities include:

  • Skip Ratio: Skip restricted nodes to avoid training blockage;
  • Dynamic scheduling algorithm: generate the optimal execution path in real time;
  • Fault-tolerant execution: Even if 50% of nodes fail, the inference accuracy decreases by only about 7%.

SkipPipe supports training-throughput improvements of up to 55% and enables key capabilities such as early-exit inference, seamless stage re-ordering, and inference completion.
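The skip-ratio idea above can be sketched as a routing rule: traverse pipeline stages in order, skip stages whose node is unavailable, and reject the path if the skipped fraction exceeds the allowed ratio. This is a hypothetical toy (stage names, the availability map, and the 0.5 default are all illustrative, not SkipPipe's API):

```python
def schedule_path(stages, available, max_skip_ratio=0.5):
    """Toy SkipPipe-style routing: build an execution path that skips
    unavailable stages, but refuse the path if the skipped fraction
    exceeds the permitted skip ratio."""
    path, skipped = [], 0
    for stage in stages:
        if available.get(stage, False):
            path.append(stage)
        else:
            skipped += 1
    if skipped / len(stages) > max_skip_ratio:
        raise RuntimeError("too many unavailable stages for a valid path")
    return path

stages = ["s0", "s1", "s2", "s3"]
available = {"s0": True, "s1": False, "s2": True, "s3": True}
print(schedule_path(stages, available))  # ['s0', 's2', 's3']
```

A real scheduler would additionally pick among many candidate paths by latency, which is where the "dynamic scheduling algorithm" in the feature list comes in.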

HDEE: Cross-domain heterogeneous expert ensembles

The HDEE (Heterogeneous Domain-Expert Ensembles) module is dedicated to optimizing the following scenarios:

  • Multi-domain, multi-modal, multi-task training;
  • Training data of various types with uneven distribution and widely varying difficulty;
  • Task allocation and scheduling in environments where device compute capabilities are heterogeneous and communication bandwidth is inconsistent.

Its core features:

  • MHe-IHo: Assign models of different sizes to tasks of different difficulty levels (model heterogeneity and consistent training step size);
  • MHo-IHe: The task difficulty is uniform, but the training step size is adjusted asynchronously;
  • Support heterogeneous expert model + pluggable training strategy to improve adaptability and fault tolerance;
  • Emphasizes "parallel collaboration + extremely low communication + dynamic expert allocation", suiting the complex task ecologies found in practice.

Multi-role game mechanism: trust and incentives go hand in hand

The Gensyn network introduces four categories of participants:

  • Submitter: Publish training tasks, set structure and budget;
  • Solver: execute training tasks and submit results;
  • Verifier: Verify training behavior to ensure that it is compliant and effective;
  • Whistleblower: Challenge the validator, obtain arbitration rewards or bear the penalty.

This mechanism is inspired by Truebit's economic game design: by forcibly inserting errors and arbitrating at random, it motivates participants to cooperate honestly and ensures the network operates in a trustworthy way.

4. Test network and roadmap planning

5. Team and financing background

Gensyn is co-founded by Ben Fielding and Harry Grieve and is headquartered in London, England. In May 2023, Gensyn announced the completion of a $43 million Series A funding led by a16z crypto, with other investors including CoinFund, Canonical, Ethereal Ventures, Factor and Eden Block. The team's background integrates distributed systems and machine learning engineering experience and has long been committed to building a verifiable and trustworthy large-scale AI training execution network.

Nous Research: A cognitive-evolution training system driven by the concept of subjective AI

Nous Research is one of the few decentralized training teams combining philosophical depth with engineering implementation. Its core vision stems from the concept of "Desideratic AI": treating AI as an intelligent subject with subjectivity and the capacity to evolve, rather than a merely controllable tool. What makes Nous Research distinctive is that it does not frame AI training as an "efficiency problem" but as the formative process of a "cognitive subject". Driven by this vision, Nous focuses on building an open training network that is coordinated by heterogeneous nodes, requires no central scheduling, and is censorship-resistant and verifiable, implemented systematically through a full-stack toolchain.

1. Concept support: Redefining the "purpose" of training

Rather than investing heavily in incentive design or protocol economics, Nous tries to change the philosophical premises of training itself:

  • Opposing "alignmentism": it rejects training whose sole goal is human control, advocating instead that training should encourage the model to form an independent cognitive style;
  • Emphasizing model subjectivity: it holds that a base model should retain uncertainty, diversity, and even hallucination as virtues rather than defects;
  • Treating model training as cognitive formation: the model is not "optimizing a task-completion score" but an individual participating in a process of cognitive evolution.

Although this view of training is "romantic", it reflects the core logic behind how Nous designs its training infrastructure: how to let heterogeneous models evolve in open networks rather than being uniformly disciplined.

2. Training core: Psyche Network and DisTrO Optimizer

Nous's most critical contribution to decentralized training is the Psyche network and its underlying communication optimizer DisTrO (Distributed Training Over-the-Internet), which together form the execution core for training tasks. The DisTrO + Psyche stack offers several core capabilities: communication compression (using DCT plus 1-bit sign encoding to drastically cut bandwidth requirements), node adaptability (support for heterogeneous GPUs, disconnection, and autonomous exit), asynchronous fault tolerance (training continues without synchronization, with high fault tolerance), and a decentralized scheduling mechanism (no central coordinator; consensus and task distribution are achieved via blockchain). This architecture provides a realistic, feasible technical basis for a low-cost, highly resilient, verifiable open training network.
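The DCT + 1-bit sign encoding mentioned for DisTrO can be sketched in its simplest form: transform the gradient into the DCT domain, then transmit only one sign bit per coefficient plus a single shared scale, roughly a 32× reduction versus float32. This is a minimal illustration of the idea, not DisTrO's actual encoder (which the article does not detail); all function names are illustrative:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis (rows are cosine basis vectors)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)
    return C

def compress(grad, C):
    """Transform to the DCT domain, then keep 1 bit (the sign) per
    coefficient plus one shared scale."""
    coeffs = C @ grad
    return np.sign(coeffs), np.mean(np.abs(coeffs))

def decompress(signs, scale, C):
    return C.T @ (signs * scale)   # orthonormal basis: C.T inverts C

rng = np.random.default_rng(0)
g = rng.normal(size=64)
C = dct_matrix(64)
signs, scale = compress(g, C)
g_hat = decompress(signs, scale, C)
# The reconstruction is lossy but always positively aligned with the
# true gradient, which is what keeps sign-based training convergent.
print(float(np.dot(g, g_hat)) > 0)
```

The positive alignment holds by construction (the inner product equals the scale times the sum of coefficient magnitudes), which is the same property that underpins signSGD-style compressed training.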

The design emphasizes practical feasibility: it does not rely on central servers, adapts to volunteer nodes worldwide, and keeps training results traceable on-chain.

3. The reasoning and proxy system composed of Hermes / Forge / TEE_HEE

In addition to building a decentralized training infrastructure, Nous Research has also conducted multiple exploratory system experiments around the concept of "AI subjectivity":

1. Hermes open source model series: Hermes 1 through 3 are Nous's representative open source models, trained on LLaMA 3.1 at three parameter scales: 8B, 70B and 405B. The series aims to embody Nous's training philosophy of "de-instructionization and diversity preservation", and demonstrates strong expressiveness and generalization in long-context retention, role-playing, multi-turn dialogue and more.

2. Forge Reasoning API: Multi-mode Inference System

Forge is Nous’ self-developed reasoning framework, combining three complementary mechanisms to achieve more flexible and creative reasoning capabilities:

  • MCTS (Monte Carlo Tree Search): Policy search for complex tasks;
  • CoC (Chain of Code): combines code chains with logical reasoning paths;
  • MoA (Mixture of Agents): allows multiple models to negotiate to improve the breadth and diversity of outputs.

The system emphasizes "non-deterministic reasoning" and composable generation paths, a pointed response to the traditional instruction-alignment paradigm.
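As a toy illustration of the MoA pattern listed above, the sketch below has several stand-in "agents" propose answers and an aggregator pick the final output by majority vote. Real MoA systems use LLMs for both roles and typically synthesize rather than vote; all names here are hypothetical and do not reflect Forge's API.

```python
from collections import Counter

def mixture_of_agents(question, agents, aggregator):
    # Each agent proposes an answer; the aggregator sees all proposals
    # and produces the final output.
    proposals = [agent(question) for agent in agents]
    return aggregator(proposals)

def majority_vote(proposals):
    # The simplest aggregation strategy; a production MoA system would
    # typically use another model to synthesize the proposals instead.
    return Counter(proposals).most_common(1)[0][0]

# Stand-in agents: two agree, one dissents.
agents = [lambda q: "4", lambda q: "4", lambda q: "five"]
answer = mixture_of_agents("What is 2 + 2?", agents, majority_vote)
```

The design point is the separation of concerns: proposal breadth comes from agent diversity, while output quality comes from the aggregation step.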

3. TEE_HEE: AI autonomous agent experiment: TEE_HEE is Nous's frontier exploration in autonomous agents, aiming to verify whether an AI can run independently in a Trusted Execution Environment (TEE) and hold a unique digital identity. The agent has dedicated Twitter and Ethereum accounts, with all control permissions managed by a remotely verifiable enclave, so developers cannot interfere with its behavior. The experiment aims to build an AI subject with "immutability" and "independent behavioral intent", an important step toward truly autonomous agents.

4. AI behavior simulator platform: Nous has also developed multiple simulators, including WorldSim, Doomscroll, and Gods & S8n, to study how AI behavior and values evolve in multi-role social environments. Although not directly part of the training process, these experiments lay a semantic-layer foundation for the cognitive-behavioral modeling of long-term autonomous AI.

4. Team and financing overview

Nous Research was founded in 2023 by Jeffrey Quesnelle (CEO), Karan Malhotra, Teknium, Shivani Mitra and others. The team values philosophy-driven thinking and systems engineering equally, with diverse backgrounds spanning machine learning, systems security, and decentralized networks. It raised a $5.2 million seed round in 2024, and in April 2025 completed a $50 million Series A led by Paradigm at a $1 billion valuation, becoming one of the Web3 AI unicorns.

Flock: Blockchain enhanced federated learning network

Flock.io is a blockchain-based federated learning platform designed to decentralize the data, computation, and models involved in AI training. FLock leans toward an integrated "federated learning + blockchain reward layer" framework: essentially an on-chain evolution of the traditional FL architecture rather than a systematic attempt at a brand-new training protocol. Compared with decentralized training projects such as Gensyn, Prime Intellect, Nous Research and Pluralis, Flock focuses on privacy protection and usability improvements rather than theoretical breakthroughs in communication, verification or training methods. Its true peers for comparison are federated learning systems such as Flower, FedML, and OpenFL.

1. The core mechanism of Flock.io

1. Federated Learning Architecture: Emphasizes data sovereignty and privacy protection

Flock builds on the classic federated learning (FL) paradigm, allowing multiple data owners to collaboratively train a unified model without sharing raw data, focusing on data sovereignty, security and trust. The core process includes:

  • Local training: each participant (Proposer) trains the model on a local device without uploading raw data;
  • On-chain aggregation: after training, local weight updates are submitted and aggregated by on-chain Miners into a global model;
  • Committee evaluation: Voter nodes, randomly elected via VRF, evaluate the aggregated model and score it using an independent test set;
  • Incentives and penalties: rewards, fines, or deposit slashing are executed according to the scores, deterring malicious behavior and maintaining dynamic trust.
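The flow above can be sketched as a minimal federated-averaging round. The toy 1-D least-squares "model" and all function names are stand-ins for illustration, not Flock's actual protocol; the point is the shape of the loop: local training on private data, aggregation of submitted updates, and evaluation on an independent test set.

```python
def local_train(weights, local_data, lr=0.1):
    # Proposer side: one gradient-descent step on a 1-D least-squares
    # objective (real clients train full models on private data).
    grad = sum(2 * (weights * x - y) * x for x, y in local_data) / len(local_data)
    return weights - lr * grad

def aggregate(updates):
    # Miner side: plain federated averaging of the submitted weights.
    return sum(updates) / len(updates)

def evaluate(weights, test_set):
    # Voter side: score the aggregated model on an independent test set
    # (lower loss = better score).
    return sum((weights * x - y) ** 2 for x, y in test_set) / len(test_set)

def fl_round(global_w, clients, test_set):
    # One full round: local training, aggregation, committee evaluation.
    updates = [local_train(global_w, data) for data in clients]
    new_w = aggregate(updates)
    return new_w, evaluate(new_w, test_set)
```

The incentive step (rewards or slashing based on the score) would hang off the returned evaluation value; it is omitted here because it is an on-chain contract concern rather than a training-loop one.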

2. Blockchain integration: realizing trustless system coordination

Flock puts the core links of the training process (task allocation, model submission, evaluation scoring, incentive execution) on-chain to achieve transparency, verifiability and censorship resistance. The main mechanisms include:

  • VRF random election: improves the rotation fairness and manipulation resistance of the Proposer and Voter roles;
  • Stake-based mechanism (PoS): constrains node behavior through token staking and penalties, improving system robustness;
  • Automatic on-chain incentives: smart contracts execute reward distribution and slashing penalties, building a collaborative network without trusted intermediaries.

3. zkFL: a privacy-preserving zero-knowledge aggregation mechanism: Flock's zkFL lets Proposers submit zero-knowledge proofs of their local updates, so Voters can verify correctness without accessing the raw gradients. This protects privacy while improving the credibility of the training process, an important innovation in combining privacy protection with verifiability in federated learning.

2. Flock's core product components

AI Arena: Flock.io's decentralized training platform. Users can participate in model tasks through train.flock.io as trainers, validators or delegators, earning rewards by submitting models, evaluating performance or delegating tokens. Tasks are currently released officially, with community co-creation to be opened gradually.

FL Alliance: Flock's federated learning client, which lets participants further fine-tune models with private data. Through VRF elections, staking and slashing mechanisms, it ensures honesty and collaboration efficiency during training, serving as the key link between initial community training and real deployment.

AI Marketplace: a model co-creation and deployment platform where users can propose models, contribute data and call model services. It supports database access and RAG-enhanced inference, promoting the adoption and circulation of AI models in practical scenarios.

3. Team and financing overview

Flock.io was founded by Sun Jiahao and has issued the platform token FLOCK. The project has raised a total of $11 million from investors including DCG, Lightspeed Faction, Tagus Capital, Animoca Brands, Fenbushi, and OKX Ventures. In March 2024, Flock closed a $6 million seed round to launch its testnet and federated learning client; in December of the same year it raised a further $3 million and received a grant from the Ethereum Foundation to research blockchain-driven AI incentive mechanisms. To date, the platform has created 6,428 models and connected 176 training nodes, 236 validation nodes, and 1,178 delegators.

Compared with decentralized training projects, federated-learning-based systems have advantages in training efficiency, scalability and privacy protection, and are especially suitable for the collaborative training of small and medium-sized models. Their solutions are pragmatic and easy to implement, biased toward engineering-level feasibility optimization. Projects such as Gensyn and Pluralis pursue deeper theoretical breakthroughs in training methods and communication mechanisms; their system challenges are greater, but they are also closer to a truly "trustless, decentralized" training paradigm.

EXO: Decentralized training attempts for edge computing

EXO is a highly representative AI project for edge computing scenarios, committed to bringing lightweight AI training, inference and Agent applications to home-grade consumer devices. Its decentralized training path emphasizes "low communication overhead + local autonomous execution", adopting the DiLoCo asynchronous delayed-synchronization algorithm and the SPARTA sparse parameter exchange mechanism to greatly reduce the bandwidth demands of multi-device collaborative training. At the system level, EXO has not built an on-chain network or introduced economic incentives; instead it released EXO Gym, a single-machine multi-process simulation framework that lets researchers conveniently validate and experiment with distributed training methods in a local environment.

1. Overview of core mechanisms

  • DiLoCo asynchronous training: nodes synchronize once every H steps, adapting to unstable networks;
  • SPARTA sparse synchronization: only a tiny fraction of parameters (e.g. 0.1%) is exchanged at each step, maintaining model correlation while reducing bandwidth requirements;
  • Asynchronous combined optimization: the two can be combined to achieve a better trade-off between communication and performance;
  • evML verification exploration: Edge-Verified Machine Learning (evML) proposes using TEE / Secure Context for low-cost computation verification. Through remote attestation plus spot checks, it enables trusted participation of edge devices without staking, an engineering compromise between economic security and privacy protection.
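A minimal simulation can convey how the first two mechanisms compose: each node takes cheap local steps, a tiny random slice of parameters is averaged every step (SPARTA-style), and a full synchronization happens only every H steps (DiLoCo-style). The quadratic toy objective and all function names are assumptions for illustration, not EXO's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_step(params, lr=0.05):
    # Placeholder inner step: gradient descent on ||p - 1||^2
    # (a real node would run an SGD step on its local batch).
    target = np.ones_like(params)
    return params - lr * 2 * (params - target)

def sparta_exchange(all_params, frac=0.001):
    # SPARTA-style sparse sync: average only a small random subset of
    # coordinates across nodes each step (~0.1% of parameters).
    n = all_params[0].size
    idx = rng.choice(n, size=max(1, int(n * frac)), replace=False)
    mean_slice = np.mean([p[idx] for p in all_params], axis=0)
    for p in all_params:
        p[idx] = mean_slice
    return all_params

def diloco_outer_sync(all_params):
    # DiLoCo-style outer sync every H steps: full averaging across nodes
    # (the real algorithm applies an outer optimizer to the deltas).
    mean = np.mean(all_params, axis=0)
    return [mean.copy() for _ in all_params]

def train(num_nodes=4, dim=10_000, steps=200, H=50):
    params = [rng.normal(size=dim) for _ in range(num_nodes)]
    for t in range(1, steps + 1):
        params = [local_step(p) for p in params]
        params = sparta_exchange(params)        # cheap per-step sync
        if t % H == 0:
            params = diloco_outer_sync(params)  # infrequent full sync
    return params
```

Per step, only ~0.1% of the parameter vector crosses the "network"; the full exchange happens once every H steps, which is exactly the communication profile the bullets above describe.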

2. Tools and application scenarios

  • EXO Gym: simulates a multi-node training environment on a single device, supporting communication-strategy experiments with models such as NanoGPT, CNNs and Diffusion;
  • EXO Desktop App: a desktop AI tool for individual users, supporting privacy-friendly personalized features such as local large-model execution, iPhone mirroring control, and private context integration (e.g. SMS, calendar, video records).

EXO Gym is more of an exploration-oriented decentralized training experiment, mainly integrating existing communication compression techniques (such as DiLoCo and SPARTA) to make the training path lightweight. Compared with projects such as Gensyn, Nous and Pluralis, EXO has not yet entered the core stages of on-chain collaboration, verifiable incentive mechanisms, or real distributed network deployment.

The upstream engine of decentralized training: a panoramic study of model pre-training

Facing the core challenges that pervade decentralized training, namely device heterogeneity, communication bottlenecks, coordination difficulty and the lack of trusted execution, Gensyn, Prime Intellect, Pluralis and Nous Research have each proposed differentiated system architecture paths. Viewed from the two levels of training methods and communication mechanisms, the four projects display distinctive technical focuses and engineering logics.

In training-method optimization, the four explore key dimensions such as collaboration strategy, update mechanism and asynchronous control, covering different stages from pre-training to post-training.

  • Prime Intellect's PRIME-RL is an asynchronous scheduling structure for the pre-training stage. Through a "local training + periodic synchronization" strategy, it achieves an efficient and verifiable training scheduling mechanism in heterogeneous environments, with strong generality and flexibility. Theoretical novelty is high, proposing a clear paradigm for training control structure; engineering difficulty is medium-high, placing heavy demands on the underlying communication and control modules.
  • Nous Research's DeMo optimizer focuses on training stability in asynchronous low-bandwidth environments, realizing a highly fault-tolerant gradient-update pipeline under heterogeneous GPU conditions. It is one of the few schemes that unifies theory and engineering in the "asynchronous communication compression closed loop". Theoretical novelty is very high, particularly representative in coupling compression with scheduling; engineering difficulty is also very high, depending especially on the coordination precision of asynchronous parallelism.
  • Pluralis's SWARM + NAG is among the most systematic and breakthrough designs on the asynchronous training path. Based on an asynchronous model-parallel framework, it introduces column-space sparse communication and NAG momentum correction to build a large-model training scheme that converges stably under low bandwidth. Theoretical novelty is extremely high, a structural pioneer of asynchronous collaborative training; engineering difficulty is likewise extremely high, requiring deep integration of multi-level synchronization and model partitioning.
  • Gensyn's RL Swarm mainly serves the post-training stage, focusing on policy fine-tuning and collaborative agent learning. Its training process follows a three-step "generate - evaluate - vote" flow, especially suited to the dynamic adjustment of complex behaviors in multi-agent systems. Theoretical novelty is medium-high, mainly reflected in the agent collaboration logic; engineering difficulty is moderate, with the main challenges in system scheduling and behavior-convergence control.

In communication-mechanism optimization, the four projects likewise have targeted layouts, generally focusing on systemic solutions to bandwidth bottlenecks, node heterogeneity and scheduling stability.

  • Prime Intellect's PCCL is a low-level communication library intended to replace traditional NCCL, aiming to give upper-layer training protocols a more robust collective-communication foundation. Theoretical novelty is medium-high, with some breakthroughs in fault-tolerant communication algorithms; engineering difficulty is moderate, with strong module adaptability.
  • Nous Research's DisTrO is DeMo's communication core, emphasizing minimal communication overhead under low bandwidth while preserving the coherence of the training loop. Theoretical novelty is high, with general design value in the scheduling-coordination structure; engineering difficulty is high, demanding both compression precision and training synchronization.
  • Pluralis's communication mechanism is deeply embedded in the SWARM architecture, significantly reducing the communication load of asynchronous large-model training while maintaining high throughput alongside convergence guarantees. Theoretical novelty is high, setting a paradigm for asynchronous model-communication design; engineering difficulty is extremely high, relying on distributed model orchestration and structural sparsity control.
  • Gensyn's SkipPipe is a fault-tolerant scheduling component accompanying RL Swarm. It is cheap to deploy and mainly enhances training stability at the engineering level. Theoretical novelty is moderate, being largely an engineering implementation of known mechanisms; engineering difficulty is low, but it is highly practical in real deployments.

In addition, the value of decentralized training projects can be measured from two more macroscopic categories: the blockchain collaboration layer and the AI training layer.

Blockchain collaboration layer: emphasizing protocol trustworthiness and incentive-collaboration logic

  • Verifiability: whether the training process is verifiable, and whether game-theoretic or cryptographic mechanisms are introduced to establish trust;
  • Incentive mechanism: whether task-driven token reward / role mechanisms are designed;
  • Openness and entry threshold: whether nodes are easy to join, and whether access is centralized or permissioned.

AI training system layer: highlighting engineering capability and performance attainability

  • Scheduling and fault tolerance: whether scheduling is fault-tolerant, asynchronous, dynamic and distributed;
  • Training-method optimization: whether model-training algorithms or structures are optimized;
  • Communication-path optimization: whether gradients are compressed or communication is sparsified to adapt to low bandwidth.

Based on the above indicator system, the following table systematically evaluates the technical depth, engineering maturity and theoretical innovation of Gensyn, Prime Intellect, Pluralis and Nous Research on the decentralized training path.

The downstream ecosystem of decentralized training: LoRA-based model fine-tuning

In the complete value chain of decentralized training, projects such as Prime Intellect, Pluralis.ai, Gensyn and Nous Research mainly focus on upstream infrastructure: model pre-training, communication mechanisms and collaborative optimization. Another class of projects, however, focuses on post-training model adaptation and inference deployment (post-training fine-tuning & inference delivery), without directly participating in systematic training processes such as pre-training, parameter synchronization or communication optimization. Representative projects include Bagel, Pond and RPS Labs, all of which center on LoRA fine-tuning and constitute the key "downstream" link in the decentralized training ecosystem map.

LoRA + DPO: a realistic path for Web3 fine-tuning deployment

LoRA (Low-Rank Adaptation) is an efficient parameter fine-tuning method. Its core idea is to insert low-rank matrices into a pre-trained large model to learn new tasks while freezing the original model parameters. This strategy significantly reduces training cost and resource consumption, improves fine-tuning speed and deployment flexibility, and is especially suitable for Web3 scenarios characterized by modular, composable invocation.

Traditional large language models such as LLaMA and GPT-3 often have tens or even hundreds of billions of parameters, making direct fine-tuning prohibitively expensive. By training only a small number of inserted parameter matrices, LoRA achieves efficient adaptation of large models and has become one of the most practical mainstream methods.
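The low-rank idea fits in a few lines: freeze a base weight W and add a trainable update B·A whose rank is tiny relative to the layer size. This is a generic NumPy sketch of the LoRA formulation, not any particular project's implementation; the class and parameter names are illustrative.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update B @ A."""

    def __init__(self, d_in, d_out, rank=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
        self.A = rng.normal(size=(rank, d_in)) * 0.01  # trainable, small init
        self.B = np.zeros((d_out, rank))               # trainable, zero init
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + (alpha / r) * B A x. Because B is zero-initialized,
        # the adapter is a no-op at the start of fine-tuning and the
        # layer behaves exactly like the frozen base model.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        return self.A.size + self.B.size

    def frozen_params(self):
        return self.W.size
```

For a 4096x4096 layer at rank 8, the trainable matrices hold 2 x 8 x 4096 = 65,536 parameters against roughly 16.8 million frozen ones, i.e. well under 1% of the layer, which is where the cost savings described above come from.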

Direct Preference Optimization (DPO), a language-model post-training method that has emerged in recent years, is often used together with LoRA fine-tuning in the behavior-alignment stage. Compared with traditional RLHF (Reinforcement Learning from Human Feedback), DPO achieves preference learning by directly optimizing over paired samples, dispensing with complex reward modeling and reinforcement learning. Its structure is simpler and its convergence more stable, making it especially suitable for fine-tuning in lightweight, resource-constrained environments. Thanks to its efficiency and ease of use, DPO is becoming the preferred alignment option for many decentralized AI projects.
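For a single preference pair, the DPO objective reduces to a one-line loss: a logistic loss on the policy's log-probability margin over a frozen reference model. The sketch below assumes per-sequence log-probabilities have already been computed elsewhere; it is illustrative of the formulation, not a full training loop.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # DPO on one preference pair: push the policy's margin over the
    # reference toward the chosen answer. No reward model, no RL rollout,
    # which is the structural simplification DPO offers over RLHF.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), written stably as log1p(exp(-z)).
    return math.log1p(math.exp(-beta * margin))
```

When the policy equals the reference the margin is zero and the loss is log 2; as the policy increasingly prefers the chosen response relative to the reference, the loss falls toward zero.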

Reinforcement Learning (RL): the future evolution of post-training fine-tuning

From a long-term perspective, more and more projects regard reinforcement learning (RL) as the core path with greater adaptability and evolutionary potential in decentralized training. Compared with supervised learning or parameter fine-tuning that relies on static data, RL emphasizes continuous policy optimization in dynamic environments, naturally matching the asynchronous, heterogeneous, incentive-driven collaboration landscape of Web3 networks. Through continuous interaction with the environment, RL enables highly personalized, continuously incremental learning, providing an evolvable "behavioral intelligence" infrastructure for Agent networks, on-chain task markets and intelligent economies.

This paradigm is not only ideologically aligned with the decentralized spirit but also carries notable systemic advantages. However, constrained by high engineering thresholds and complex scheduling mechanisms, RL still faces major deployment challenges at this stage and is hard to roll out widely in the short term.

Notably, Prime Intellect's PRIME-RL and Gensyn's RL Swarm are pushing RL from a post-training fine-tuning mechanism toward a pre-training backbone, attempting to build an RL-centric collaborative training system that requires no trusted coordination.

Bagel (zkLoRA): a trusted verification layer for LoRA fine-tuning

Building on LoRA fine-tuning, Bagel introduces zero-knowledge proof (ZK) technology to address trust and privacy in "on-chain model fine-tuning". zkLoRA does not participate in the actual training computation; instead it provides a lightweight, verifiable mechanism that lets external users confirm a fine-tuned model indeed derives from the specified base model and LoRA parameters, without accessing the original data or weights.

Unlike the dynamic verification targeted by Gensyn's Verde or Prime Intellect's TOPLOC, which asks whether the training behavior actually occurred, Bagel focuses on static verification of whether the fine-tuned result is trustworthy. zkLoRA's greatest advantages are low verification cost and strong privacy protection, but its applicability is usually limited to fine-tuning tasks with small parameter changes.

Pond: a fine-tuning and agent-evolution platform for GNN scenarios

Pond is currently the industry's only decentralized training project focused on graph neural network (GNN) fine-tuning, serving structured-data applications such as knowledge graphs, social networks and transaction graphs. By letting users upload graph-structured data and contribute training feedback, it provides a lightweight, controllable training and inference platform for personalized tasks.

Pond likewise adopts efficient fine-tuning mechanisms such as LoRA. Its core goal is to realize a modular, deployable agent system on GNN architectures, opening a new exploration path for "small-model fine-tuning + multi-agent collaboration" in a decentralized context.

RPS Labs: an AI-driven liquidity engine for DeFi

RPS Labs is a decentralized training project built on the Transformer architecture, dedicated to applying fine-tuned AI models to DeFi liquidity management, mainly within the Solana ecosystem. Its flagship product, UltraLiquid, is an active market-making engine that uses fine-tuned models to dynamically adjust liquidity parameters, reducing slippage, improving depth, and optimizing the token issuance and trading experience.

In addition, RPS has launched the UltraLP tool, which helps liquidity providers optimize their capital allocation on DEXs in real time, improving capital efficiency and reducing the risk of impermanent loss, and demonstrating the practical value of AI fine-tuning in financial scenarios.

From upstream engine to downstream ecosystem: the road ahead for decentralized training

The complete ecosystem map of decentralized training divides into two broad categories: the upstream engine, corresponding to the model pre-training stage, and the downstream ecosystem, corresponding to fine-tuning and deployment, together forming a full closed loop from infrastructure to application.

The upstream engine focuses on building the underlying protocols for model pre-training, represented by Prime Intellect, Nous Research, Pluralis.ai and Gensyn. These projects are committed to building system architectures with asynchronous updates, sparse communication and training verifiability, achieving efficient and reliable distributed training in trustless network environments and forming the technical foundation of decentralized training.

Meanwhile, Flock, representing the middle layer, uses the federated learning path to combine model aggregation, on-chain verification and multi-party incentives, building a practical, collaborative bridge between training and deployment and providing a working paradigm for multi-node collaborative learning.

The downstream ecosystem focuses on model fine-tuning and application-layer deployment. Projects such as Pond, Bagel and RPS Labs revolve around LoRA fine-tuning: Bagel provides an on-chain trusted verification mechanism, Pond focuses on the evolution of small GNN models, and RPS applies fine-tuned models to intelligent market-making in DeFi. Through components such as inference APIs and Agent SDKs, they give developers and end users low-threshold, composable model invocation and personalized customization, serving as an important entry point for the real-world adoption of decentralized AI.

We believe decentralized training is not only a natural extension of the blockchain spirit into the AI era, but also the embryonic infrastructure of a globally collaborative intelligent productivity system. In the future, when we look back on this challenging road, we will still encourage one another with the original aspiration: decentralization is not merely a means; it is itself the value.
