Evidence Scores: When Arguments Break Under Pressure
The problem: A link to a peer-reviewed study with disclosed methodology looks exactly like a link to a government press release. Same blue underline. Same visual weight. Your brain knows one is more reliable. The interface doesn't care.
The solution: The ISE doesn't ask "who said it?" It asks "what's the argument, and does it break when you pull on it?" Evidence Scores make argument quality visible — and credentials don't get you out of making the argument.
The Formula
Every piece of evidence gets scored on two independent dimensions, which multiply into a single Evidence Impact score:

Evidence Impact = Quality × Linkage

Quality asks: how well does the methodology behind this evidence hold up when challenged? Linkage asks: does it actually prove the specific claim being made, or just point in the same general direction? Multiplying them prevents two common failure modes: strong methodology applied to an irrelevant question, and a directly relevant claim backed by nothing but anecdote.
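A minimal sketch of that arithmetic, assuming both dimensions are expressed as fractions between 0 and 1 (the function and parameter names here are illustrative, not from the ISE spec):

```python
def evidence_impact(quality: float, linkage: float) -> float:
    """Combine the two independent dimensions into one score.

    quality: how well the methodology survives challenge (0.0 to 1.0)
    linkage: how directly the evidence supports the specific claim (0.0 to 1.0)

    Multiplication means a weakness on either dimension drags the whole
    score down: strong methodology can't rescue an irrelevant result, and
    relevance can't rescue an anecdote.
    """
    if not (0.0 <= quality <= 1.0 and 0.0 <= linkage <= 1.0):
        raise ValueError("scores must be fractions between 0 and 1")
    return quality * linkage


# The worked example later on this page: Quality 40%, Linkage 70%.
print(evidence_impact(0.40, 0.70))  # 0.28 -> Evidence Impact around 28%
```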
Why Credentials Don't Determine the Score
Institutions lie. The government fabricated the Gulf of Tonkin incident to justify escalating the Vietnam War. Pharmaceutical companies claimed OxyContin was less than 1% addictive based on a single paragraph in a letter, not a study. The tobacco industry funded research denying the cancer link for decades. In each case, the dissenters who got it right had less institutional authority than the experts who got it wrong.
This isn't an argument against expertise. NASA often produces better work than random blogs, and the ISE framework will show that. But not because NASA has authority. Because NASA often uses better methodology: transparent measurement protocols, redundant verification, published data others can check, and replication across different instruments. Those are the things that earn a high Quality Score. Being NASA doesn't.
When NASA failed, with Challenger and Columbia, it was because institutional pressure overrode good methodology. The engineers making valid arguments (about O-ring behavior in cold weather before Challenger, about foam-strike damage before Columbia) had sound reasoning and were outranked. The ISE would have elevated their arguments. The system doesn't care about org charts.
What "Quality" Actually Means
Rather than a credential hierarchy, the ISE tracks four patterns that consistently produce arguments that survive scrutiny:
| Pattern | Why it survives scrutiny | How credentials fake it — and get caught |
|---|---|---|
| Transparent Measurement With Controls | You showed your methodology, controlled for alternative explanations, and made your data available. When challenged, you can defend each step with specifics. | Fake: "We're experts, trust our model" without showing the model. Caught: "Release your code" exposes assumptions you can't defend. |
| Replication Across Contexts | Multiple independent groups using different methods arrive at similar conclusions. Hard to fake because it requires coordinated fabrication. | Fake: Citing 10 studies from the same network that all cite each other. Caught: "These aren't independent — they share funding and authors." |
| Falsifiable Predictions | You said "if X is true, we should observe Y." Then Y happened. Reality validated the argument. | Fake: Vague predictions retrofitted to any outcome. Caught: "Your prediction was so fuzzy it couldn't be wrong." |
| Explicit Assumptions | You stated your assumptions clearly so others can challenge them. "We assume A, B, C. If you disagree with C, here's how that changes the conclusion." | Fake: Hiding assumptions in jargon or calling them "standard in the field." Caught: "That 'standard assumption' is doing all the work, and it isn't justified." |
Notice what's absent from that table: journals, degrees, institutions, funding sources. Those might correlate with quality in some domains. They don't cause it, and they don't protect against its absence.
How Scoring Actually Works: Arguments All the Way Down
When someone submits evidence, they're not just dropping a link. They're making claims about why that evidence is reliable. Those claims form an argument network that gets evaluated the same way every other argument does.
Here's a concrete example. A government report claims "Policy X will create Y jobs," based on an input-output economic model using Bureau of Labor Statistics multipliers with an assumed Z% implementation rate.
Three challenges come in. A grad student points out the multipliers are from 2010 data and labor markets have shifted significantly since then — and shows more recent studies with lower multipliers. A credentialed think tank responds: "This model is standard in the field, used by top economists." Your uncle the accountant notices the Z% implementation rate assumes mandatory participation, but the actual legal text makes it voluntary.
The grad student's challenge is methodologically valid and survives scrutiny. The think tank's defense is appeal to authority — it doesn't address the outdated multipliers — and gets evaluated accordingly. Your uncle caught a genuine assumption error. The score that emerges: Quality around 40%, Linkage around 70%, Evidence Impact around 28%. The government's credentials didn't protect the weak methodology. Your uncle's lack of credentials didn't stop his valid challenge from counting.
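A hedged sketch of how those three challenges might feed the score. Everything here, from the data shapes to the penalty sizes, is an illustration of the idea rather than the ISE's actual algorithm, which derives these values from the argument network itself:

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    description: str
    survives_scrutiny: bool  # did the challenge itself hold up?
    quality_penalty: float   # how much it degrades methodology confidence

# Illustrative, hand-set values only.
challenges = [
    Challenge("Multipliers use 2010 data; newer studies show lower values", True, 0.35),
    Challenge("'Model is standard in the field' (appeal to authority)", False, 0.0),
    Challenge("Z% rate assumes mandatory participation; the law is voluntary", True, 0.25),
]

quality = 1.0
for c in challenges:
    if c.survives_scrutiny:           # only challenges that hold up count;
        quality -= c.quality_penalty  # the appeal to authority changes nothing

linkage = 0.70  # the model does address the jobs claim, just imperfectly
print(f"Quality ~{quality:.0%}, Linkage ~{linkage:.0%}, Impact ~{quality * linkage:.0%}")
# Quality ~40%, Linkage ~70%, Impact ~28%
```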
This is how ReasonRank builds over time. People who make valid methodological challenges earn it. People whose challenges don't hold up lose it. The system doesn't care about your resume. It cares about whether your reasoning survives.
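This page doesn't publish ReasonRank's update rule, but the principle in the paragraph above could look something like the following minimal sketch, where the function name, step sizes, and clamping are all assumptions:

```python
def update_reason_rank(rank: float, challenge_survived: bool,
                       gain: float = 0.05, loss: float = 0.05) -> float:
    """Adjust a contributor's ReasonRank after one of their challenges is
    resolved. Valid challenges raise it; refuted ones lower it. The step
    sizes are placeholders; credentials never enter the update."""
    rank += gain if challenge_survived else -loss
    return max(0.0, min(1.0, rank))  # keep the score in [0, 1]

# The grad student's and the uncle's challenges held up; the think tank's didn't.
print(update_reason_rank(0.50, True))   # 0.55
print(update_reason_rank(0.50, False))  # 0.45
```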
The foundational rule: Evidence quality is determined by arguments that survive challenge — not by the letterhead they came on.
The Iraq WMD Test Case
In 2002, the expert consensus included the CIA, Colin Powell's UN presentation, bipartisan Congressional support, and major media amplification. The dissenters included weapons inspector Scott Ritter, some intelligence analysts, the Knight Ridder bureau, and ordinary citizens saying "this doesn't add up."
Under the traditional system, credentials won. Under the ISE framework, argument quality wins. The government's claim that aluminum tubes proved a nuclear program faced an immediate methodological challenge: the tubes were the wrong specification for centrifuges, and the Department of Energy's own experts said so. The claim that a single second-hand source confirmed mobile biological labs failed the replication test and the independent verification test simultaneously. The dissenters' core argument — "the evidence presented doesn't support the conclusion, and the details keep changing" — was falsifiable, survived scrutiny, and its predictions (no WMDs found post-invasion) were validated.
The dissenters would have earned ReasonRank. The credential-holders would have lost it. Because what mattered wasn't who had the fancy title. What mattered was whose arguments held up.