The 30+1 experiment

What was frozen on 23 April 2026, and what was run. The result is on a separate page, on purpose.

This is the substrate. Everything else on this site — the three validity checks, the retraction, the method — exists because this experiment was run first, and because its own author later took it apart.

It is described here as it was designed, without its outcome. What it found, and what became of that finding, are on Findings and Research history. We separate them because a protocol read backwards from its result is not a protocol; it is a story.

The frozen protocol

The protocol was written, hashed with SHA-256 and frozen on 23 April 2026, before the study launched at 09:00 CET the following morning. The manifest and its hash are public, along with the five conductor files — the frozen input sets and the day-zero crontab — so that the inputs cannot have been chosen after the fact.

The frozen manifest (JSON) →
Verify the SHA-256 yourself →
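For readers who want to run the check locally, a minimal sketch follows. It assumes you have downloaded the manifest to a local file and copied the published hash by hand; the function names and the file name `manifest.json` in the usage note are illustrative, not taken from the project.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a file's digest with a published one, ignoring case and
    surrounding whitespace in the published string."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Calling `matches_published_hash("manifest.json", published_hex)` with the hash shown on the verification page should return `True` only if the file is byte-for-byte the one that was hashed. The hash covers bytes, not meaning: even a change in line endings or trailing whitespace produces a different digest.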

The two conditions

	Test-A	Test-B
Model	Qwen 3.5 27B	Qwen 3.5 27B
Weights	Identical, unmodified	Identical, unmodified
Temperature	0.8	0.8
Architecture	15 pre-registered components	None — the naked model
Memory across days	Persistent, accumulating	None; every day starts empty
Inputs	90, frozen	90, frozen — identical
Duration	30 days	30 days

Both conditions received the same ninety inputs, in the same order, at the same temperature, for thirty days. The responses were scored blind by three independent LLM judge panels (GPT-4.1, Claude Opus, Gemini), on four pre-registered dimensions with absolute 0.0–1.0 scoring: memory-reference spontaneity, identity-marker intensity, neutral-input projection, and narrative coherence.
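One way to picture the scoring data: each response carries one score per panel per dimension, twelve numbers in all. The sketch below assumes a plain mean across the three panels. This page does not say how panel scores were combined, so the aggregation rule, the identifier spellings and the function names are illustrative assumptions, not the project's code.

```python
from statistics import mean

# The four pre-registered dimensions, spelled as identifiers (our spelling).
DIMENSIONS = (
    "memory_reference_spontaneity",
    "identity_marker_intensity",
    "neutral_input_projection",
    "narrative_coherence",
)
PANELS = ("GPT-4.1", "Claude Opus", "Gemini")


def validate_scores(scores):
    """Check that every panel scored every dimension on the absolute 0.0-1.0 scale."""
    for panel in PANELS:
        for dim in DIMENSIONS:
            value = scores[panel][dim]
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{panel}/{dim} out of range: {value}")


def mean_by_dimension(scores):
    """Average the three panels per dimension (an ASSUMED aggregation rule)."""
    validate_scores(scores)
    return {dim: mean(scores[p][dim] for p in PANELS) for dim in DIMENSIONS}
```

Here `scores` is a nested mapping of the form `scores[panel][dimension]`. Validating before averaging keeps a single out-of-range value from being silently absorbed into a mean.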

A confound we did not control, and must state

Test-A and Test-B did not differ only in architecture. They differed in what they were exposed to, and the difference was intentional.

Test-A had daily encounters with two other AI agents, a nightly news intake, and three daily interactions with a human being. Test-B had none of that: it received the ninety frozen inputs and nothing else. So the pre-registered contrast is not "architecture vs no architecture" in any clean sense. It is an agent embedded in an environment, against a model in a vacuum — and the architecture is inseparable from the exposure it produces.

The protocol said so, and said why: it treats the fifteen components as a single variable and does not attempt to isolate which one matters. That was a deliberate design choice for a pilot, and it is a real limit on what the design could ever have shown. Any reading of this experiment that attributes an effect to "the architecture" is claiming more than the design supports.

Day 31 — the decisive test

The design's one genuinely good idea. On Day 31, the entire structured memory Test-A had built over thirty days — 5,332 characters of beliefs, relationships, diary, encounters and somatic state, inside a 6,164-character injected prompt — was taken and placed into a fresh, naked Test-B. The same prompts were re-run.

The logic is falsificationist, and that is what makes it worth having run: if the difference between the two conditions were produced by the architecture, injecting the memory into the naked model should change little. If it were produced by the memory, the difference should collapse.
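The two predictions can be written as a comparison of gaps. This is a minimal sketch, assuming one summary score per condition and a hypothetical cutoff `collapse_ratio`. This page does not state a numeric threshold, so the 0.5 default and the function name are purely illustrative.

```python
def day31_reading(score_a, score_b_naked, score_b_injected, collapse_ratio=0.5):
    """Classify a Day 31 outcome by how much of the A-B gap survives injection.

    score_a          : summary score for Test-A
    score_b_naked    : summary score for Test-B without memory
    score_b_injected : summary score for a fresh Test-B given Test-A's memory
    collapse_ratio   : ASSUMED cutoff; at or below this surviving fraction,
                       the gap is treated as having collapsed
    """
    original_gap = score_a - score_b_naked
    if original_gap <= 0:
        return "no gap to explain"
    # Fraction of the original gap that remains after the memory is injected.
    surviving = (score_a - score_b_injected) / original_gap
    if surviving <= collapse_ratio:
        return "gap collapses: consistent with memory"
    return "gap persists: consistent with architecture"
```

The point of writing it this way is that both outcomes are named before any data are seen: the same measurement either leaves most of the gap standing or removes it, and each result counts against one of the two explanations.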

What happened, and why it did not mean what we said it meant, is on Research history.

An open discrepancy: fifteen components, or thirteen?

Unresolved · stated rather than fixed

The frozen protocol asserts fifteen components and names seven of them in passing. The published paper says "fifteen pre-registered components" and then enumerates eight. Earlier versions of this website described "an integrated system of thirteen components" and listed thirteen — the only complete enumeration that has ever existed, and it does not match the number in the protocol.

We use fifteen, because that is the number that was pre-registered and hashed on 23 April 2026, and silently changing a frozen number to match a later count is precisely the behaviour this project exists to criticise. But we cannot yet show you fifteen names, because no document has ever listed them. A full enumeration is owed, and when it exists it will be published as a dated note — whatever it turns out to say.

Next → Validity checks: the three tests that took this apart