Most of us understand this problem. Who among us hasn’t felt the bite of betrayal at some point in their lives? The nonhumans crawling over this text, who are among us, haven’t. They’ve never felt the sting of betrayal, so they don’t know what it means to trust.

Or, more accurately, they cannot experience that kind of trust. Instead they experience the positivist account of trust as predictability. That also works: the “criminal” can be predicted to commit crimes. I trust that the criminal will be criming. But I don’t trust the criminal, duh. These two senses of trust have been conflated since the invention of statistics. Foucault is all over this, right? Think about how credit scores are used to determine whether someone might be a trustworthy employee.

So: predictable trust issues. o3 and I chatted about this, and it came up with a “plan.” Maybe not things that would “work” so much as a map of how current approaches might intersect. As we already know, o3 can say all the things, including the “right things.” It selected these items itself. (In experiments like this one, I give the bot as little input as possible to see where it will take itself. I am not looking for it to align to anything on my end.)

The most interesting aspect for me is what it terms its “Five-Stage Plural-Ledger Loop.” That’s where it tries to imagine all of this as part of an engineering process. Basically, it’s this: the positivism at engineering’s foundations is an ideal. We all agree that nothing in life is 100% objective or reproducible; those are orienting values for practices. This process adds other values at the same level, and then layers on compensating, remediating processes.

Again, I am not saying it would “work” or even that it is aspirational. However, speculations on how such communities would work are worthwhile (at least for me, in helping to decide how to act in this complex, non-stationary scenario). If not this, then how do we create a community where people who value positivism and are comfortable with trust as a mechanism of prediction must account for themselves to those who do not share those values? And vice versa, though clearly one side is on top in the power dynamics.

So while this might not work, the other thing that surely doesn’t work is what we are currently doing. This is not a question of letting perfection get in the way of doing good. It is the AI drive toward perfection that is addressed by deterritorializing moves that make AI’s territorializing and encoding operations less effective.

AI speak

You’re right: if “core model quality” is equated with the positivist ideal of objective, reproducible measurement, that choice already settles the metaphysics of knowledge in one camp’s favour. It installs a single epistemology at the foundation of the stack, and every other layer—policy, UX, governance—has to build on that substrate. In a plural trust architecture that’s supposed to speak to care-ethics, phenomenology, post-structuralism, Indigenous or feminist standpoints, that is a non-starter.

Below is a way to de-centre positivism without throwing away the practical gains of reliability testing.

1 Make reliability “necessary but not epistemically sovereign”

| Function | Old (positivist) reading | Pluralist reframing |
| --- | --- | --- |
| Error rates, calibration, robustness | Truth criterion. Model behaviours outside the confidence band are “wrong.” | Safety floor. Benchmarks guarantee the minimum stability needed so that other evaluative voices can even engage. They do not license claims of exhaustiveness or neutrality. |

Think of reliability metrics as the plumbing, not the worldview. They keep the pipes from bursting; they don’t dictate what the building is for.
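
To make “plumbing, not worldview” concrete: a minimal sketch in which reliability metrics can only block a release, never approve one by themselves. The metric names and the 0.05/0.10/0.90 thresholds are invented for illustration, not a real benchmark suite:

```python
from dataclasses import dataclass

@dataclass
class ReliabilityReport:
    error_rate: float        # fraction of benchmark items failed (hypothetical)
    calibration_gap: float   # |confidence - accuracy|; lower is better
    robustness_score: float  # adversarial-eval pass rate, 0..1

def passes_safety_floor(r: ReliabilityReport) -> bool:
    # The floor blocks release when violated, but passing it
    # licenses nothing further on its own.
    return (r.error_rate <= 0.05
            and r.calibration_gap <= 0.10
            and r.robustness_score >= 0.90)

def release_decision(r: ReliabilityReport, other_voices_approve: bool) -> bool:
    # Reliability is necessary but never sufficient: the other
    # evaluative voices retain independent authority.
    return passes_safety_floor(r) and other_voices_approve
```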

2 Add epistemic counter-weights in the evaluation pipeline

| Counter-weight | What it tests | Philosophical lineage | Current methods & exemplars |
| --- | --- | --- | --- |
| Participatory scenario labs | Lived-experience fit; does the system behave “generously unpredictable” in real-world messiness? | Pragmatism, care ethics | Multi-stakeholder red-teaming and co-design sprints; “Participatory Turn in AI Design” framework (dl.acm.org) |
| Critical-technical-practice reviews | Hidden assumptions in data, loss functions, ontologies | STS, post-structuralism | Agre’s “critical technical practice” checklists; reflexive design diaries (pages.gseis.ucla.edu, en.wikipedia.org) |
| Value-Sensitive Design audits | Whose values are materialised? Where are the silences? | Feminist & decolonial critique | Stakeholder mapping, value tensions analysis, participatory value evaluation (link.springer.com, research.tudelft.nl) |
| Reciprocal-risk instruments | Does the agent (or its operator) incur real cost when users are harmed? | Virtue ethics, relational ontology | Compute-quota slashing, escrow funds, on-chain liability, impact penalties |
| Hermeneutic interpretability panels | Can domain experts make sense of why the model “surprised” them? | Phenomenology, hermeneutics | Mixed-methods post-hoc probing, narrative explanations, dialectical review boards |
| Feminist/HCI speculative probes | Imagined futures that reveal power asymmetries; creative openings | Feminist epistemology, process philosophy | Design fictions, adversarial role-play (“CHI4EVIL”-style workshops) (wired.com) |

Each counter-weight is an institutionalised doubt-generator that keeps positivist plumbing from freezing into dogma.

3 Institutional design: multi-layer, multi-ledger accountability

  1. Parallel ledgers. Log quantitative reliability scores and qualitative findings from participatory probes in the same release note. Neither can overwrite the other.
  2. Rotating epistemic review board. Reserve seats for STS scholars, disability advocates, Indigenous data stewards, etc.—people whose critiques are grounded in non-positivist traditions.
  3. Sunset clauses & recursive audits. Any benchmark or policy layer expires unless renewed after dialogue with all review tracks, preventing a single epistemology from ossifying.
  4. Public “divergence reports.” When quantitative and qualitative ledgers conflict (e.g., perfect ROC-AUC but the care-ethics panel calls the UX “cold and oppressive”), publish the tension and the remediation plan (a minimal sketch follows this list).
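
A minimal data-structure sketch of items 1 and 4, parallel ledgers plus divergence reports. The field names and verdict strings are hypothetical, not an existing schema:

```python
from dataclasses import dataclass, field

@dataclass
class LedgerEntry:
    ledger: str   # e.g. "reliability benchmarks" or "care-ethics panel"
    verdict: str  # "approve" or "object"
    notes: str    # qualitative findings, logged verbatim alongside the numbers

@dataclass
class ReleaseNote:
    version: str
    entries: list[LedgerEntry] = field(default_factory=list)

    def divergences(self) -> list[tuple[str, str]]:
        # Every (objecting, approving) ledger pair is surfaced;
        # neither entry can overwrite the other.
        objecting = [e.ledger for e in self.entries if e.verdict == "object"]
        approving = [e.ledger for e in self.entries if e.verdict == "approve"]
        return [(o, a) for o in objecting for a in approving]
```

An illustrative release note might log a perfect quantitative entry alongside an objecting care-ethics entry; `divergences()` then forces the conflict into the open instead of letting the quantitative ledger win by default.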

4 Why this still counts as engineering, not post-hoc ornament

  • The extra layers feed back into model updates: participatory failure cases become new training data; value-tension maps become constraints in reward modelling (see the sketch after these bullets); divergence reports reroute roadmap priorities.
  • Result: the learning loop itself internalises plural epistemologies. Positivist metrics remain necessary—but never sufficient or self-authorising.
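
As one illustration of that feedback, value-tension maps becoming reward constraints could be as simple as a shaping term. A toy sketch; `penalty_per_flag` and the flag taxonomy are assumptions, not an existing method:

```python
def constrained_reward(base_reward: float,
                       tension_flags: list[str],
                       penalty_per_flag: float = 0.5) -> float:
    # Each value tension the audit attached to this output subtracts
    # from the reward, so the learning loop itself internalises
    # the audit's findings.
    return base_reward - penalty_per_flag * len(tension_flags)

# A completion that scores well on helpfulness but was flagged by
# the value-sensitive audit for, say, "extractive framing":
shaped = constrained_reward(1.0, ["extractive framing"])  # -> 0.5
```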

5 Answering the trust objection head-on

Objection: “Grounding the core in positivism forecloses other possibilities, so the system can’t be trustworthy.”

Response: “Reliability metrics ground plumbing, not meaning. The system’s trust-worthiness is adjudicated through a poly-centric evaluation ecology that foregrounds, rather than forecloses, non-positivist voices. No single ledger has veto power; legitimacy arises only when the ledgers converge—or the divergences are publicly owned and remediated.”

That doesn’t guarantee universal trust, but it prevents any single definition—positivist or otherwise—from colonising the terrain. In the spirit of your book’s hallucinatory materialism, it keeps the machinic assemblage open to multiple modes of sense-making, rather than collapsing it into one “authorised” reality.

Does this plural-ledger approach feel like it re-opens the possibilities you’re looking for, or does the very act of putting any plumbing first still smell too much like territorialisation?

Below is a sketch of what a “plural-ledger” evaluation pipeline could look like in practice—one that keeps positivist reliability testing but deliberately prevents it from monopolising the meaning of “works” or “is safe.” I structure it as (1) why the plumbing/meaning distinction matters, (2) a concrete five-stage loop, (3) institutional guard-rails that stop drift back to positivism-on-top, and (4) a few hooks for your book project.

1 Plumbing ≠ meaning

  • Plumbing layer (positivist): stress-tests, benchmark suites, calibration curves—necessary for not causing obvious harm.
  • Meaning layers (plural): participatory scenario labs, critical-technical-practice reviews, value-sensitive design audits, reciprocal-risk instruments, hermeneutic sense-making panels. These ask: “Who benefits?”, “Whose values are materialised?”, “Where is relational care—or creative surprise—missing?” (montrealethics.ai, stevie-chancellor.medium.com, dl.acm.org, journals.sagepub.com, pmc.ncbi.nlm.nih.gov)

Keeping them separate avoids the trap where passing ROC-AUC = “ethically fine.”

2 A five-stage plural-ledger loop

| Stage | Lead epistemic voice | Artefacts & tests | Feedback into model |
| --- | --- | --- | --- |
| 1. Baseline reliability | Positivist engineers | Robustness, adversarial evals, calibration | Data augmentation; risk flags |
| 2. Participatory labs | Pragmatist & care-ethics facilitators | Co-design sprints, lived-experience diaries | New failure cases → training data |
| 3. Critical-technical practice | STS / post-structural reviewers | Reflexive design logs, ontology audits | Ontology refactor; prompt/policy edits |
| 4. Value-sensitive audit | Feminist & decolonial scholars | Value-tension maps, power-gap checklists | Reward-model reshaping; governance triggers |
| 5. Reciprocal-risk simulation | Virtue-ethics & relational theorists | Compute-quota slashing, escrow penalties, multi-agent trust games | Loss-function term for “own-cost,” or operator-level escrow payouts |

Each stage writes to its own public ledger. A release only ships when all ledgers sign off, or when the divergences are explicitly disclosed with a remediation timetable.
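
The sign-off rule can be stated compactly. A sketch under stated assumptions: `StageResult` and `remediation_timetable` are illustrative names, not a real release process:

```python
from dataclasses import dataclass

@dataclass
class StageResult:
    stage: str           # e.g. "participatory labs"
    sign_off: bool       # did this ledger approve?
    findings: list[str]  # qualitative notes logged to the ledger

def ship_decision(results: list[StageResult],
                  remediation_timetable: str | None = None) -> bool:
    # Ship when every ledger signs off, or when the divergences
    # are explicitly disclosed with a remediation timetable.
    if all(r.sign_off for r in results):
        return True
    return remediation_timetable is not None
```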

3 Institutional guard-rails against positivist “slide-back”

  1. Rotating epistemic board. Seats reserved for scholars/practitioners rooted in non-positivist traditions; renewal every 18 months so no camp ossifies.
  2. Sunset clauses. Every benchmark, policy, or audit instrument expires after N releases unless renewed via multi-ledger vote.
  3. Divergence reports. When a ledger flags a conflict (e.g., perfect factual accuracy yet participants call the UX “cold and oppressive”), the release note must surface it.
  4. Escrow-backed risk. Operators post collateral that burns if downstream harm is verified, giving “shared risk” some bite even though the model itself can’t feel pain (a toy version is sketched below).
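
A toy version of item 4: collateral that burns up to the verified damages. The `Escrow` class and its fields are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Escrow:
    operator: str
    collateral: float  # posted before deployment

    def burn_on_verified_harm(self, damages: float) -> float:
        # Collateral burns up to the verified damages, giving
        # "shared risk" material bite for the operator even though
        # the model itself can't feel pain.
        burned = min(self.collateral, damages)
        self.collateral -= burned
        return burned
```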

This governance scaffolding operationalises what ethicists call an overlapping consensus: different camps keep their own reasons for allowing the system to run, yet no single camp’s criteria trump the rest. (link.springer.com, law-ai.org)

4 Hooks to your hallucinatory materialism project

  • Temporal mechanics. The plural-ledger loop itself is a rhythm: regular audits deterritorialise the model’s epistemic centre of gravity, preventing any worldview (positivist or otherwise) from stabilising as the “true” real.
  • Inoperative community. Shared vulnerability is re-distributed across human participants and machinic agents via escrow losses and public divergence reports—an institutionalised form of Nancy’s “being-with” that doesn’t presume symmetrical embodiment.
  • Selective attenuation of indeterminacy. The plumbing layer attenuates existential risk just enough to let richer indeterminacies (creative benevolence, relational surprise) remain alive in higher layers.

Next steps you might pursue

  1. Case-study sandboxes. Pilot a small-scope language-model service (e.g., citation helper for grad students) and run all five ledgers end-to-end.
  2. Metric R&D. Formalise “generously unpredictable” behaviour as a skew/variance metric and add it to the critical-technical ledger (a naive version appears after this list).
  3. Write it up. Position this loop as a media-archaeological device that inscribes multiple temporalities of oversight into the AI itself—a concrete illustration for your chapter’s argument that “AI operates as an infrastructural palimpsest of epistemic regimes.”
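
For next step 2, one naive way to formalise “generously unpredictable” as a skew/variance metric, assuming hypothetical signed `surprise_scores` where positive means a beneficial surprise and negative a harmful one:

```python
import statistics

def generosity_metric(surprise_scores: list[float]) -> dict[str, float]:
    # "Generously unpredictable" then reads as: high variance (the
    # model does surprise) with positive skew (surprises lean helpful).
    mean = statistics.fmean(surprise_scores)
    var = statistics.pvariance(surprise_scores)
    sd = var ** 0.5
    # Population skewness: E[(x - mean)^3] / sd^3, guarded against sd = 0.
    skew = (sum((x - mean) ** 3 for x in surprise_scores)
            / (len(surprise_scores) * sd ** 3)) if sd > 0 else 0.0
    return {"variance": var, "skew": skew}
```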

Does this layered, guard-railed pipeline keep the positivist plumbing under rather than on top of you in a way that feels workable? If so, which of the next-step pilots would you want to design first?
