Press — Kairos Experiment (pilot study, OSF DOI 10.17605/OSF.IO/WCQRU)

In one sentence: Kairos is an Italian independent pilot study suggesting that identity-like continuity in an AI system may be carried less by the base model itself than by persistent structured memory.

Press materials — quick links

Preprint PDF v0.2 (EN)→ Preprint PDF v0.2 (IT)→ Figures — 300 dpi→ DOI 10.17605/OSF.IO/WCQRU→ All downloads & archive→ Press contact→

At a glance — for editors

Deposit DOI10.17605/OSF.IO/WCQRU · OSF, permanent, citable
Study period24 April – 23 May 2026 (30 days) + Day 31 decisive test 24 May
AuthorGiampiero Colella, independent researcher, Cassino (Italy)
StatusPreprint v0.2 published (site + OSF deposit, DOI 10.17605/OSF.IO/WCQRU). The independent, blind human-judge evaluation (Prolific, 12 judges, pre-registered scoring on four dimensions) is complete and reported honestly in §3.4: the effect reproduced directionally (Day-30 advantage on memory and identity, 4/4 pairs; Day-31 collapse), while inter-judge reliability stayed below the pre-registered Krippendorff α ≥ 0.667 bar — directional concordance, not validated reliability. A moderated preprint-server submission (MetaArXiv) may follow once the rating instrument's reliability is improved.
Embargo policyNo embargo on materials currently public. v0.2 results will be shared with selected journalists 5 days before submission (mid-July 2026).
Press contactinfo@kairos-experiment.com · IT / EN · technical fact-checking within 24h
LicenseCC-BY 4.0 International on paper, data, code — reuse with attribution

Top-line finding

On Day 30, the architecturally-embedded Qwen 3.5 27B (Test-A) outperformed the same naked model (Test-B) on memory-reference spontaneity (p = 0.003, r = +0.51, large effect) and identity-marker intensity (p = 0.005, r = +0.51). On Day 31, we injected Test-A's complete final-state memory into Test-B's context window in one shot. All four pre-registered effects collapsed (all r ≤ 0.07, all p > 0.30). Test-B with injected memory became statistically indistinguishable from Test-A.

We interpret this as a falsification of “architecture as sufficient driver of identity continuity” and partial support for a refined hypothesis: structured persistent memory is the proximate driver of identity-like continuity in this LLM. The architecture is its generative substrate, but once the memory exists, the other 14 components appear redundant within a single inference window.

What this is — and what it is not

What this is

A pre-registered independent pilot study.
A behavioral test of memory-driven identity-like continuity.
An open archive for criticism and replication.

What this is not

Not evidence of consciousness.
Not evidence of sentience.
Not a peer-reviewed final claim.
Not a claim that AI has inner experience.

Plain-language summary

We ran the same large language model in two parallel conditions for 30 days. One model lived in a small simulated ecosystem: a persistent memory, a body-like internal state, scheduled encounters with other AIs, nightly consolidation of experiences, a stable human relationship. The other model received the same questions every day but had none of that scaffolding.

By Day 30, three independent AI judges blindly comparing the two conditions reliably identified the embedded model as the one showing memory continuity and a more stable “identity voice.” The effect was statistically robust and large in size.

On Day 31, we ran the critical test: we copied the full state of the embedded model’s memory into a single prompt for the naked model. The difference disappeared completely. The naked-plus-memory model looked, to the judges, indistinguishable from the embedded one.

In plain terms: the “personality” that emerged across 30 days seems to live in the memory, not in the elaborate architecture we built around it. The architecture was necessary to generate that memory; it does not seem necessary to express it.

Key statistical findings

Day 30 — architectural effect

Memory-reference spontaneity: p = 0.003, r = +0.51

Identity-marker intensity: p = 0.005, r = +0.51

Mann–Whitney U, bootstrap CI95, Cohen interpretation: large effect

Day 31 — memory-injection collapse

All 4 effects collapse: all r ≤ 0.07

All p-values: > 0.30 (n.s.)

Naked + memory becomes indistinguishable from architectural condition

Test power: 3 blind LLM-judge panels (Claude Opus 4.7, GPT-4.1, Gemini 2.5 Pro), Fleiss’ κ for inter-judge agreement, 4-dimension pre-registered rubric. Pilot study: N = 1 LLM, validation by independent human judges is the next milestone (see roadmap).

Synopsis

Kairos tests whether identity-like continuity in a large language model can emerge not from the model alone, but from the cognitive ecosystem surrounding it.

Two identical instances of Qwen 3.5 27B, run locally, fixed temperature 0.8. One wrapped in a 15-component integrated ecosystem: three-level persistent memory graph, eight-dimensional somatic state engine, stress and recovery dynamics, autonomous daily encounters with other LLMs, nightly memory consolidation, news intake, structural human relationship. The other — no scaffolding. Same standardized inputs, same parameters. The blind difference is what we measure.

This is a pilot (N = 1 LLM): it tests whether the effect exists and is large enough to merit scaled replication, not how general it is. All materials are open under CC-BY 4.0 for independent replication.

Current status & archive

The 30-day study completed on 23 May 2026. The decisive Day-31 memory-injection test was run on 24 May 2026. Paper, supplementary materials, frozen protocol, raw judge transcripts and code are all publicly archived on the Open Science Framework with permanent DOI.

Canonical citation
Colella, G. (2026). Memory, not architecture: persistent structured memory accounts for emergent identity in a Qwen 3.5 27B cognitive ecosystem over 30 days. OSF. https://doi.org/10.17605/OSF.IO/WCQRU

Earlier submission of v0.1 to PsyArXiv was declined on 27 May 2026 on the methodologically fair grounds that the pilot lacks external human-judge validation, which the pre-registered protocol §5 already required. That validation phase is now in setup (see roadmap below). Until then, this work should be characterized as a pilot study, not a peer-reviewed finding.

Roadmap to v0.2

23 Apr 2026Protocol frozen — SHA-256 signed, sealed before launch
24 Apr 2026Study launch — 09:00 CET, both conditions started identically
23 May 2026Day 30 endpoint — pre-registered LLM-judge evaluation
24 May 2026Day 31 decisive test — memory injection into Test-B, effect collapses
24 May 2026v0.1 preprint + supplementary archived on OSF
28 May 2026Project DOI minted — 10.17605/OSF.IO/WCQRU
6 Jun 2026External human-judge validation completed — 12 independent Prolific judges, blind evaluation. Directional effect reproduced (Day-30 advantage 4/4 pairs, Day-31 collapse); inter-judge reliability below the α ≥ 0.667 bar — reported honestly
6 Jun 2026v0.2 paper published — new §3.4 (human-judge validation); site + OSF deposit (DOI 10.17605/OSF.IO/WCQRU)
NextInstrument refinement — raise inter-judge reliability above the pre-registered bar; a moderated preprint-server submission (MetaArXiv) may follow
Mid Jul 2026Press outreach — embargoed pitches to selected journalists, then public communication

Key facts

Duration30 days + 1 day decisive test
Launch date24 April 2026 — 09:00 CET
Endpoint23 May 2026 (Day 30); 24 May 2026 (Day 31 memory-injection)
ModelQwen 3.5 27B (run locally, fixed temperature 0.8)
DesignTwo-condition (Test-A architectural, Test-B naked), pre-registered, single-blind
Standardized inputs90 core prompts across 30 days, plus 11 sealed evaluation pairs (Day-30 vs Day-31)
Architecture15-component cognitive ecosystem: persistent memory graph (3 levels), 8D somatic state, stress/recovery, daily LLM encounters, nightly consolidation, news intake, structural human relationship
Evaluation3 blind LLM-judge panels (Claude Opus 4.7, GPT-4.1, Gemini 2.5 Pro) on 4-dimension rubric, plus 12 independent blind human judges (Prolific) in v0.2: directional concordance, inter-judge reliability below the pre-registered bar (reported honestly, §3.4).
Decisive testDay 31: Test-A complete final-state memory injected into Test-B in one context window
FundingNone. No lab, no team, no grant. Conducted independently on self-hosted infrastructure.
Pre-registrationProtocol frozen 23 April 2026 at 23:59 CET, SHA-256 signed
DepositOSF DOI 10.17605/OSF.IO/WCQRU, CC-BY 4.0

Protocol SHA-256: 0972a2c650a562909e53832845ec226ab897f6094db14645c4a0d5ed000d709a

How to cite

APA 7 Colella, G. (2026). Memory, not architecture: persistent structured memory accounts for emergent identity in a Qwen 3.5 27B cognitive ecosystem over 30 days. OSF. https://doi.org/10.17605/OSF.IO/WCQRU

BibTeX @misc{colella2026kairos,
  author = {Colella, Giampiero},
  title = {Memory, not architecture: persistent structured memory accounts for emergent identity in a {Qwen 3.5 27B} cognitive ecosystem over 30 days},
  year = {2026},
  publisher = {OSF},
  doi = {10.17605/OSF.IO/WCQRU},
  url = {https://doi.org/10.17605/OSF.IO/WCQRU}
}

Founder

Giampiero Colella — Italian entrepreneur working across business, law and technology. Based in Cassino (San Pasquale), Italy. The Kairos Experiment is conducted independently, on self-hosted infrastructure, without a team, lab affiliation or external funding. Extended bio →

Author of EXPOSE, a parallel project on artificial subjective experience. The Kairos Experiment is the empirical counterpart of questions explored in EXPOSE. expose-project.org →

FAQ for journalists

Is Kairos “conscious” or sentient?

No. The experiment does not test, claim or even imply consciousness or sentience. It tests whether behavioral markers of identity-like continuity (memory references, stable voice, value consistency) emerge measurably in one architectural condition vs another. Inner experience is not measured and not claimed.

What exactly does “memory accounts for identity” mean?

In this study’s 30-day window, the difference in identity-related behavior between the architecturally-embedded model and the naked model can be reproduced by simply copying the embedded model’s memory into the naked model’s context. Within one inference window, the “personality” appears to be carried by the memory rather than generated by the surrounding architecture in real time.

Why was the v0.1 preprint rejected by PsyArXiv?

Because the pilot lacks external human-judge validation, which the pre-registered protocol §5 already specified as required for a publication-ready paper. PsyArXiv’s decision reflected the paper’s own statement that “human-judge validation in progress (target June 2026 → v0.2)”. The decision is methodologically fair and is being addressed by Phase 2 of the roadmap. The deposit and DOI on OSF are unaffected.

Are LLM judges scientifically valid?

They are useful and pre-registered, but not sufficient. LLM-judge panels with high inter-judge agreement (Fleiss’ κ) provide reproducible, low-cost screening. The frozen protocol requires also external human judges with Krippendorff α ≥ 0.667 for the final claim. That validation phase is being run on Prolific in June 2026.

Why N = 1 LLM? Isn’t that too few?

For a 30-day continuous-state experiment, N = 1 LLM is a deliberate pilot design choice: each condition requires continuous architectural support, daily encounters with other AIs and 30 nights of memory consolidation, which cannot be parallelized cleanly. The pilot tests whether the effect exists with enough magnitude to merit scaled replication. Scaled replication across models is explicitly invited via CC-BY 4.0 release.

Is the code and data really fully open?

Yes. The frozen protocol (cryptographically sealed before launch), the v0.1 paper, raw Test-A and Test-B transcripts, judge evaluations, statistical analysis scripts and figures are all archived on OSF under CC-BY 4.0 at the DOI above. Nothing was held back for post-hoc adjustment.

What is the relationship between Kairos and EXPOSE?

EXPOSE (expose-project.org) is a parallel project on the philosophy and possibility of artificial subjective experience. Kairos is the empirical pilot counterpart: it operationalizes one narrow, falsifiable question (does identity-like continuity emerge?) using pre-registered methods. Kairos can be read independently; EXPOSE is the broader context.

What can I publish now, and what should wait for v0.2?

You can publish: study design, methods, pilot results with appropriate framing (“pilot, pending external human-judge validation”), the DOI deposit, the founder profile. We ask you to not characterize the result as “peer-reviewed” or “confirmed” until v0.2 (with Prolific human judges) is published on MetaArXiv, expected mid-July 2026. We will share v0.2 with selected journalists under embargo.

Embargo policy & press protocol

Now (v0.1, public): all materials on this page and on the OSF deposit are public. No embargo. Reuse permitted under CC-BY 4.0 with attribution.

v0.2 (mid-July 2026): the v0.2 paper with Prolific human-judge validation will be shared with selected journalists 5 days before public submission to MetaArXiv. To be on that list, please contact us by 15 June 2026 with a brief note on intended coverage angle. We will confirm or decline by 20 June 2026.

Selection prioritizes: journalists with prior coverage of AI/cognitive science; outlets that fact-check; interviews accepting open-data attribution. We do not pay for coverage. We do offer rapid response for fact-checking (target ≤ 24h).

Downloads

Primary citation target: OSF deposit DOI 10.17605/OSF.IO/WCQRU. Documents below are mirror copies for convenience.

Essay — “Memory, not architecture” (EN, plain-language, by the founder)→ Saggio — “Memoria, non architettura” (IT)→ Essay PDF (EN, A4, 3 pp)→ Saggio PDF (IT, A4, 3 pp)→ Preprint v0.1.1 PDF (EN, 13 pp, errata corrige 27 May)→ Preprint v0.1.1 PDF (IT, 13 pp, errata corrige 27 May)→ Preprint v0.1 PDF (EN, original 24 May, archival)→ Supplementary materials (ZIP, 365 KB — raw transcripts, judge outputs, code)→ OSF deposit page (full archive)→ CANONICAL_MD5_LOG.txt (PDF version hashes)→ Frozen protocol (JSON, SHA-256 signed)→ How to verify the protocol hash→ Read the full paper online (EN)→ Leggi il paper completo online (IT)→ Results landing page (figures, key claims)→ Live journal (daily entries, 24 Apr – 23 May)→

Legacy press materials: the press-release PDF and the media-kit ZIP below were produced before the study launched (23 April 2026). They cover the protocol, not the results. They remain available as historical artefacts; the canonical results-era document is the v0.1.1 preprint above. Updated press-release PDF will be published alongside v0.2 in mid-July 2026.

Press release PDF v1 (EN, 23 Apr 2026, pre-launch)→ Legacy media kit ZIP (23 Apr 2026)→

Figures — print-ready, 300 dpi

Free for editorial use with credit: “Colella, G. (2026), Kairos Experiment, CC-BY 4.0.”

Fig. 1 — Day 30 vs Day 31

DOWNLOAD PNG →

Fig. 2 — Effect size collapse

DOWNLOAD PNG →

Logo & identity assets

Free for editorial use. Please do not alter the logo or crop-out the tagline “Memory. Relationship. Continuity.”

Logo — white background

DOWNLOAD PNG →

Logo — transparent

DOWNLOAD PNG →

Symbol — transparent

DOWNLOAD PNG →

Founder photo (400×400)

DOWNLOAD JPG →

Press contact

For press enquiries, fact-checking, interviews or v0.2 embargo access:
info@kairos-experiment.com

Italian / English. Interviews available by video call (Jitsi / Google Meet / Zoom), audio-only for podcasts. Technical fact-checking available within 24 hours. Live demo of the architecture available on request (~20 min).