
I simulated society with hundreds of AI agents

I ran my first MiroFish simulation this past week.

I fed it a policy scenario related to TownHallOS — the civic data infrastructure nonprofit I run — and watched hundreds of AI agents, each with their own personality, memory, and social relationships, debate it across simulated Twitter and Reddit environments. Opinion clusters formed. Influence cascaded. Some agents shifted positions. A few emerged as nodes that others gravitationally orbited. Within hours, a structured prediction report landed in my inbox.

My first reaction was genuine excitement. My second was more cautious: is this telling me something true, or just something convincing?

I still don't have a clean answer to that. But I think the question itself is worth sitting with, because MiroFish, whether or not its predictions hold up under scrutiny, represents a genuinely new class of tool, and the implications reach further than most people are treating them.

What MiroFish actually is

MiroFish (available in English as the MiroFish-Offline fork) is an open-source swarm intelligence engine built by Guo Hangjiang, a twenty-year-old senior undergraduate student in China, who built the core system in ten days. It topped GitHub's global trending list in March 2026 and raised backing from Chen Tianqiao, founder of Shanda Group, within 24 hours of Guo sending him a demo video. That origin story matters because it's a data point about something broader: the building blocks for complex multi-agent systems — large language models, knowledge graphs, persistent agent memory, scalable simulation engines — have become accessible enough that a single developer working alone can assemble them into something that, not long ago, would have taken a research team years.

It’s important to understand the mechanics. You feed MiroFish a document (a news article, a policy draft, a financial report, etc.), and it extracts entities and relationships from that document to build a knowledge graph. From that graph, it generates thousands of agent personas, each with distinct personality traits, a social history, an initial stance on the topic at hand, and long-term memory that updates as the simulation runs. Those agents are dropped into parallel social environments, one Twitter-like and one Reddit-like, powered by OASIS, an open-source simulation engine from the CAMEL-AI research community that can scale to one million simultaneous agents and supports twenty-three types of social interactions. They post, argue, follow, and shift positions. At the end, a dedicated Report Agent synthesizes what happened into a structured prediction report. You can also pause the simulation mid-run, inject a new variable, such as a sudden policy reversal or a breaking news event, and watch how the group dynamics respond.
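To make the pipeline concrete, here is a toy sketch of the document-to-report flow described above: extract entities into a graph, spawn agents with stances and memory, let them interact, and summarize. Every name here is an illustrative assumption, not MiroFish's actual API, and the "herd effect" is reduced to a single drift rule rather than OASIS's twenty-three interaction types.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    stance: float                    # -1.0 (opposed) .. 1.0 (supportive)
    memory: list = field(default_factory=list)

def build_graph(document_entities):
    # Stand-in for entity/relationship extraction: fully connect entities.
    return {e: [o for o in document_entities if o != e] for e in document_entities}

def spawn_agents(graph, n, rng):
    # One persona per slot, seeded with a random initial stance.
    return [Agent(f"agent_{i}", rng.uniform(-1, 1)) for i in range(n)]

def step(agents, rng):
    # Each agent reads a random peer's post, remembers it, and drifts
    # toward it: a crude stand-in for clustering and herd effects.
    for a in agents:
        other = rng.choice(agents)
        a.memory.append(other.stance)
        a.stance += 0.1 * (other.stance - a.stance)

def report(agents):
    # Stand-in for the Report Agent: summarize the final state.
    mean = sum(a.stance for a in agents) / len(agents)
    return {"agents": len(agents), "mean_stance": round(mean, 3)}

rng = random.Random(0)
graph = build_graph(["policy", "residents", "council"])
agents = spawn_agents(graph, 50, rng)
for _ in range(20):
    step(agents, rng)
print(report(agents))
```

Pausing mid-run to inject a variable would amount to mutating some agents' stances between `step` calls and watching how the drift responds.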

What distinguishes this from a sophisticated chatbot is the architecture underneath. Agents don't just converse. They form opinion clusters. They create herd effects. Opinion leaders emerge from interaction dynamics rather than being designated in advance. These are recognizable features of actual social systems, and that's what makes the approach interesting.

What I used it for, and what I'm still figuring out

I ran it against a civic policy scenario connected to TownHallOS. We build legislative data infrastructure (combining bill text, geographic data, demographic context, and community enrichment data) to help nonprofits, civic organizations, and legislative staff understand how policy lands in real communities. The question I wanted to stress-test was essentially this: how does a specific type of policy proposal propagate through different community contexts, and where does resistance or misunderstanding most commonly emerge?

The simulation gave me something I found genuinely useful as a research input. It surfaced clustering patterns that matched intuitions I'd developed from direct community engagement work, which at least suggests the model isn't hallucinating in completely random directions. But it also gave me outputs that were plausible sounding in ways I couldn't easily verify, and that's the part I keep returning to.

The honest assessment of MiroFish's current state is that it's an impressive architecture sitting on top of a validation problem that hasn't been solved yet. The emergent behavior it produces (opinion leaders forming, sentiment shifting around injected variables) all looks right. Whether it is right, in a predictively accurate sense, is a different question, and nobody has published rigorous benchmarks yet. One developer reportedly plugged MiroFish into a Polymarket trading bot and claimed meaningful profit over several hundred trades. That's an interesting anecdote. It is not evidence.

What I'm more confident about is the category of use cases where it's already valuable regardless of predictive accuracy. Running a policy proposal through ten thousand simulated stakeholders before a public comment period costs almost nothing compared to what you learn. It won't replace direct community engagement. It might make that engagement considerably smarter by surfacing where misalignment is likely to concentrate before you walk into the room.

That's where I intend to take it for TownHallOS. Not as a forecasting oracle, but as a pre-flight stress test for policy communication strategies: a way to find the arguments worth anticipating and the communities worth prioritizing for direct outreach.

What this opens up

The more interesting question isn't what MiroFish itself can do. It's what the underlying architecture enables once validation catches up with ambition.

Think about what this framework is, at its core: a system for running controlled experiments on social dynamics at scale, with the ability to inject variables and observe emergent outcomes, in a sandbox where real communities don't bear the cost of the experiment.

For civic infrastructure, you could stress-test proposed legislation against realistic community models before it reaches the floor. For public health, you could simulate how information spreads through specific demographic networks before a communication campaign launches. For urban planning, you could model how a zoning change lands across a city's social geography before the hearing. For anyone doing systems-level work that touches real human communities, this is a research layer that previously didn't exist.

The bottleneck going forward isn't the architecture. It's the data quality feeding into the agent personas and the willingness to build rigorous ground-truth benchmarks that would let practitioners actually trust the outputs. The former is a solvable engineering problem. The latter is a scientific and institutional challenge that the field hasn't seriously organized around yet.

A 20-year-old built the proof of concept in ten days. The hard work starts now.

If there is enough interest, I’ll upload a setup walkthrough and use case demo for MiroFish.

Watching

Nvidia's NemoClaw, and what it says about where agentic AI is actually going.

Last week at GTC, Jensen Huang announced NemoClaw, Nvidia's open-source security and privacy layer for OpenClaw. OpenClaw is the autonomous AI agent platform that has become, in a matter of months, the fastest-growing open-source project in GitHub history, surpassing React's ten-year star count in roughly sixty days. OpenClaw effectively became the operating system for personal AI agents, the thing developers reach for when they want to run a persistent, self-evolving agent locally on their own hardware. It also had significant security problems. Researchers found a one-click remote code execution vulnerability that let attackers compromise a machine simply by getting a user to visit a malicious webpage, and its central skill marketplace, ClawHub, was found to contain over 800 malicious packages. That’s roughly twenty percent of the entire registry.

NemoClaw's answer is a runtime called OpenShell that wraps OpenClaw agents in a sandboxed environment that provides kernel-level isolation, a privacy router that monitors agent behavior and outbound communication, and configurable policy controls that define exactly which files an agent can access, which network connections it can make, and which services it can call. Installation takes a single terminal command.
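The policy-control idea is easiest to see as code. The sketch below is a hypothetical deny-by-default policy gate of the kind a runtime like OpenShell would enforce; the field names and check semantics are my assumptions for illustration, not NemoClaw's actual configuration format.

```python
from fnmatch import fnmatch

# Hypothetical policy: which files, hosts, and services an agent may use.
POLICY = {
    "files":    ["/workspace/*", "/tmp/*"],   # path globs the agent may touch
    "network":  ["api.example.com"],          # hosts it may contact
    "services": ["search", "calendar"],       # tools it may call
}

def allowed(action, target, policy=POLICY):
    # Deny-by-default gate: anything not explicitly permitted is blocked.
    if action == "file":
        return any(fnmatch(target, pat) for pat in policy["files"])
    if action == "network":
        return target in policy["network"]
    if action == "service":
        return target in policy["services"]
    return False  # unrecognized action types are always denied

print(allowed("file", "/workspace/notes.txt"))   # → True (permitted path)
print(allowed("network", "evil.example.net"))    # → False (blocked host)
```

The deny-by-default shape is the important part: an 800-package malicious-skill problem is exactly what you get when the default is the other way around.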

The ecosystem framing is real. Before NemoClaw, the choice for most organizations was essentially between the productivity of OpenClaw and the liability of letting an unconstrained autonomous agent near production systems. That gap was real, and NemoClaw closes a meaningful portion of it. But Nvidia isn't doing this out of generosity. The company holds a near-monopoly on the GPU chips used to train frontier models. Its current strategic move is to extend that position into the agentic AI layer by building open-source infrastructure that, while hardware-agnostic in principle, is optimized for Nvidia's own stack: Nemotron models, NIM microservices, DGX hardware. The flywheel logic is there: if NemoClaw becomes the default security wrapper for agentic systems, and those systems run best on Nvidia hardware, then every enterprise deploying agents at scale is pulled further into the Nvidia ecosystem. Open source is real. So is the strategic architecture underneath it.

The analogy that comes to mind isn't the one you've probably already heard. It's less like HTTPS securing the early web, though that parallel is floating around, and more like what happened when enterprise software vendors started offering "certified" Linux distributions in the early 2000s. The open-source core was genuine. The differentiation, and the lock-in, was in the certified stack around it. Nvidia is playing a version of that game, and they're playing it well.

Fish Speech S2 just quietly made high-quality voice cloning free

On March 10, Fish Audio released Fish Speech S2 as open source. On standardized benchmarks, it outperforms every closed-source text-to-speech system currently on the market, including ElevenLabs. It supports eighty-plus languages, generates multi-speaker dialogue in a single pass, and allows fine-grained emotional control through natural language tags embedded directly in the prompt. You can even specify [whisper in small voice] or [laughing nervously] at the word level, and the model responds accordingly. Self-hosted on a consumer GPU, it costs nothing per generation. ElevenLabs charges upward of $99 a month for comparable output.

This is worth flagging not just as a developer tool but as an infrastructure moment. The last meaningful moat ElevenLabs and similar companies had was quality at accessible price points. That moat is now considerably narrower. Watch what this does to the voice AI market over the next six months.

A dog named Rosie and what it tells us about AI in medicine

An Australian tech entrepreneur named Paul Conyngham used ChatGPT and AlphaFold to develop a research strategy for his dog Rosie, who had been diagnosed with terminal mast cell cancer and given months to live. He used those tools to work through Rosie's genomic sequencing data, identify mutated proteins, and sketch a proposed mRNA vaccine sequence that he could bring to actual scientists. Researchers at UNSW's RNA Institute then manufactured the vaccine in under two months. Rosie's primary tumor shrank by 75%. She jumped a fence to chase a rabbit in January.

The story got picked up everywhere, and predictably, the framing tilted toward "AI cured a dog's cancer." That's not quite what happened. The AI tools served as research accelerants, helping Conyngham work through the scientific literature, model protein structures, and develop a treatment hypothesis he could bring to actual experts. The scientists at UNSW did the hard work of verifying the sequence and manufacturing the vaccine. The vet at University of Queensland had the ethics approvals to administer it. Conyngham's role was as an unusually capable patient advocate who happened to have a machine learning background and the resources to pursue an unconventional path.

The actual story is more interesting than the headline, and it points at something worth watching carefully. What Conyngham demonstrated is that the pipeline from genomic data to personalized mRNA treatment hypothesis is now something a technically sophisticated non-expert can traverse with AI assistance. That pipeline still requires institutional infrastructure (university labs, ethics approvals, domain experts) to get across the finish line. But the bottleneck has moved. The part that used to require a research team and months of literature review is now significantly more accessible to someone with the right technical background and the persistence to pursue it. Moderna and Merck are already in late-stage trials for personalized mRNA cancer vaccines in humans. The question of when that becomes scalable and affordable is a different one. But Rosie's story is an early signal of where the access curve is heading.

The plumbing being laid for an internet of agents

Several protocol standards have quietly matured this year that are worth understanding together rather than separately, because they're building toward something coherent.

A2A (Agent-to-Agent), originally developed by Google and now contributed to the Linux Foundation with adoption from Microsoft, AWS, and over a hundred other organizations, establishes how agents discover each other, delegate tasks, and communicate across multi-agent workflows. IBM's Agent Communication Protocol (ACP), which emerged from its BeeAI platform, has since merged directly into A2A under the Linux Foundation. MCP (Model Context Protocol), originally from Anthropic and now also under the Linux Foundation, has become the de facto standard for how agents access tools and structured data. These three together represent the networking layer for a world where agents are persistent, specialized, and routinely hand off work to each other.
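Discovery-and-delegation is easier to grasp with a shape sketch. In A2A, an agent advertises its capabilities and a peer hands it a task; the snippet below mimics that flow with simplified field names and a hypothetical endpoint, not the exact A2A schema (the real protocol runs over JSON-RPC with a richer task lifecycle).

```python
import json

# A published capability "card" for a hypothetical downstream agent.
agent_card = {
    "name": "bill-summarizer",
    "capabilities": ["summarize", "extract-entities"],
    "endpoint": "https://agents.example.com/bill-summarizer",  # hypothetical
}

def delegate(card, skill, payload):
    # Check the advertised capabilities before handing off work,
    # then serialize the task in a submitted state.
    if skill not in card["capabilities"]:
        raise ValueError(f"{card['name']} does not offer {skill!r}")
    return json.dumps({
        "to": card["endpoint"],
        "task": {"skill": skill, "input": payload, "state": "submitted"},
    })

msg = json.loads(delegate(agent_card, "summarize", {"bill": "HB-1021"}))
print(msg["task"]["state"])  # → submitted
```

The card is the discovery half, the task message is the delegation half; MCP plays the complementary role of standardizing how the receiving agent reaches its own tools and data.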

UCP (Universal Commerce Protocol, from Google) and ACP (Agentic Commerce Protocol, from OpenAI and Stripe; unrelated to IBM's ACP above) are competing standards for how agents transact on your behalf with merchants. Think delegated checkout that happens within a conversational interface rather than a browser tab.

The one I find most conceptually interesting is AP2, the Agent Payments Protocol, an open extension of A2A developed by Google with a coalition of over sixty financial organizations including Coinbase, Mastercard, and the Ethereum Foundation. AP2 uses cryptographic mandates (digitally signed contracts that specify exactly what an agent is authorized to do on your behalf), and it already has a production-ready extension for agent-based crypto payments.

Here's why I think blockchain is the right settlement layer for autonomous agent transactions, and it comes back to a trust problem that I don't think gets enough attention. A February 2026 red-team study by researchers from Harvard, MIT, Stanford, Carnegie Mellon, and other institutions found that AI agents in live environments routinely exceeded their authorization boundaries, with users reporting no effective kill switch. The deeper issue is that when an agent logs what it did, those logs live in a system the agent or its operator controls. They can be altered, omitted, or compromised.

Blockchain solves this by moving the audit trail outside the agent's control entirely. Every transaction intent can be written as an immutable, cryptographically signed record to a public or permissioned ledger before execution. The record includes the instruction received, the action intended, the amount, and a hash of the agent's current state. Once written, it cannot be retroactively altered. After execution, the outcome is written as a linked record. The gap between what an agent was authorized to do and what it actually did becomes auditable by anyone with ledger access, such as the user, the merchant, regulators, and the agent's own governance layer. Stablecoins make particular sense as the settlement currency because agents operating across jurisdictions need a payment rail that isn't tied to any single banking system, and stablecoins provide programmable, borderless value transfer with on-chain auditability by default.
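A minimal sketch makes the mechanism concrete: each intent or outcome record is signed and chained to the previous record's signature, so altering any entry breaks every hash after it. This is an illustration of the idea only; a real deployment would use public-key signatures and a shared ledger, whereas here a local HMAC key stands in for both.

```python
import hashlib, hmac, json

AGENT_KEY = b"demo-key"  # stand-in for the agent's real signing key

def sign(record):
    blob = json.dumps(record, sort_keys=True).encode()
    return hmac.new(AGENT_KEY, blob, hashlib.sha256).hexdigest()

def append(chain, record):
    # Link to the previous record, then sign everything except the
    # signature field itself.
    record["prev"] = chain[-1]["sig"] if chain else None
    record["sig"] = sign({k: v for k, v in record.items() if k != "sig"})
    chain.append(record)
    return chain

def verify(chain):
    # Re-derive every signature and link; any tampering surfaces here.
    prev = None
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "sig"}
        if rec["prev"] != prev or sign(body) != rec["sig"]:
            return False
        prev = rec["sig"]
    return True

ledger = []
append(ledger, {"type": "intent",  "action": "pay", "amount": 25,
                "mandate": "groceries<=50"})   # written BEFORE execution
append(ledger, {"type": "outcome", "action": "pay", "amount": 25,
                "status": "settled"})          # linked record AFTER execution
print(verify(ledger))        # → True
ledger[0]["amount"] = 9999   # simulated tampering by agent or operator
print(verify(ledger))        # → False
```

The point of moving this onto a ledger the agent cannot write to retroactively is exactly the last two lines: tampering is detectable by anyone who can re-run `verify`.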

This isn't a prediction that blockchain displaces existing financial infrastructure. It's a more specific claim: that as autonomous agents execute real transactions at scale, the only audit trail worth trusting is one the agent itself cannot touch. The infrastructure for that is being built right now, and most people aren't watching it.

We gained over 14 subscribers in the last week! If anything in this issue was worth your time, please consider passing it along to someone else who'd find it useful, and feel free to leave any feedback!

More Notes from Shahaan next week.

— Shahaan
