
Evidence Tiers and Objective Criteria Would Make Quality Visible

Part 4 of 5: Debate Reform Series

The thesis: Treating peer-reviewed studies the same as blog posts is insane. Every debate needs category-level objective criteria established FIRST, then evidence evaluated by tier. This makes quality measurable instead of subjective.


The Insanity: All Evidence Treated Equally

The Current Disaster

In every debate, on every platform, this happens:

Participant A: "Here's a peer-reviewed study from Johns Hopkins showing X"

Participant B: "Here's a blog post from RandoOpinions.com saying not-X"

Platform's response: Both treated equally. Both get same visibility. No distinction made.

Audience conclusion: "Experts disagree, I guess it's just opinion"

This is insane.

A peer-reviewed study that survived:

  • Rigorous methodology review
  • Statistical analysis verification
  • Expert scrutiny
  • Replication attempts
  • Publication in vetted journal

...gets treated the same as someone's blog written in their basement with zero fact-checking.

We wouldn't accept this anywhere else:

  • Court: "Your Honor, I have this legal precedent" vs. "I have this blog post" → Court only considers precedent
  • Medicine: "This drug passed FDA trials" vs. "This herb has a testimonial" → Doctor only prescribes proven drug
  • Engineering: "This design passed stress tests" vs. "I think this looks sturdy" → Engineer only uses tested design

But in public debate? Blog post = peer-reviewed study. Both cited as if equivalent.


The Four-Tier Evidence System

Making Quality Measurable

Tier 1: Highest Quality (Score: 85-95)

What qualifies:

  • Peer-reviewed studies in established journals
  • Government statistics and official data
  • Certified measurements and testing
  • Systematic reviews and meta-analyses
  • Replicated findings

Examples:

  • IPCC climate reports (thousands of peer-reviewed studies synthesized)
  • Census Bureau demographic data
  • FDA drug trial results
  • Consumer Reports lab testing
  • Multiple universities replicating same finding

Why Tier 1:

  • Multiple expert review layers
  • Methodology transparent and verified
  • Data available for scrutiny
  • Replication possible and attempted
  • Institutional accountability

Base Score: 90


Tier 2: Good Quality (Score: 70-80)

What qualifies:

  • Expert analysis from credentialed professionals
  • Think tank research with transparent methodology
  • Professional organization consensus statements
  • Investigative journalism from established outlets
  • Industry data verified by third parties

Examples:

  • Brookings Institution policy analysis
  • American Medical Association guidelines
  • ProPublica investigative report
  • McKinsey consulting research
  • IEEE engineering standards

Why Tier 2:

  • Expert-level analysis
  • Transparent methodology
  • Institutional credibility
  • Not peer-reviewed but professionally vetted
  • Accountable sources

Base Score: 75


Tier 3: Moderate Quality (Score: 50-65)

What qualifies:

  • Quality journalism citing experts
  • Aggregated user reviews (large sample)
  • Professional opinion pieces with citations
  • Documentary films with verifiable sources
  • Trade publication reporting

Examples:

  • New York Times news article citing studies
  • Amazon reviews (10,000+ sample)
  • Wall Street Journal editorial with data
  • Well-researched documentary
  • Industry magazine analysis

Why Tier 3:

  • Some vetting process
  • Sources cited (though not peer-reviewed)
  • Professional standards applied
  • Large sample sizes where applicable
  • Generally reliable but not rigorously verified

Base Score: 58


Tier 4: Lowest Quality (Score: 20-40)

What qualifies:

  • Opinion blogs without citations
  • Individual testimonials and anecdotes
  • Social media posts
  • Advocacy organization claims (uncited)
  • Self-published books without peer review

Examples:

  • Personal blog: "In my experience..."
  • Facebook post: "My cousin said..."
  • YouTube comment: "I heard that..."
  • Advocacy site: "Studies show..." [no links]
  • Self-published book with no references

Why Tier 4:

  • No vetting process
  • No expert review
  • Anecdotal rather than systematic
  • No accountability
  • High bias potential

Base Score: 30
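
In code, the four-tier table above is just a lookup. A minimal Python sketch of the mapping; the score ranges and base scores come straight from this section, but the names and data structure are illustrative:

```python
# Minimal sketch of the four-tier table above; names are illustrative.
TIER_RANGES = {
    1: (85, 95),  # peer-reviewed studies, official statistics
    2: (70, 80),  # expert analysis, professionally vetted research
    3: (50, 65),  # quality journalism, large aggregated samples
    4: (20, 40),  # uncited blogs, testimonials, social media
}

TIER_BASE_SCORES = {1: 90, 2: 75, 3: 58, 4: 30}

def base_score(tier: int) -> int:
    """Base evidence-quality score assigned to a source's tier."""
    return TIER_BASE_SCORES[tier]
```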


How Tiers Affect Argument Scores

From Part 2's scoring formula:

Argument Score = (Evidence Quality × Logical Validity × Linkage Strength) + Expert Weighting

Evidence tier determines base quality score.

Example 1: High-quality evidence

  • Tier 1 study (base 90)
  • Logical validity 95% (no fallacies)
  • Linkage strength 90% (directly relevant)
  • Contribution: 90 × 0.95 × 0.90 = 77

Example 2: Low-quality evidence

  • Tier 4 blog (base 30)
  • Logical validity 90% (actually sound reasoning)
  • Linkage strength 85% (relevant to claim)
  • Contribution: 30 × 0.90 × 0.85 = 23

The tier difference is explicit and measurable.
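
A short sketch of how the Part 2 formula plays out in code, using the two examples above (the function name and signature are illustrative, not the platform's actual API):

```python
def argument_score(evidence_quality: float,
                   logical_validity: float,
                   linkage_strength: float,
                   expert_weighting: float = 0.0) -> float:
    """Part 2 formula: evidence quality scaled down by validity
    and linkage, plus any expert weighting."""
    return evidence_quality * logical_validity * linkage_strength + expert_weighting

print(round(argument_score(90, 0.95, 0.90)))  # Example 1: Tier 1 study -> 77
print(round(argument_score(30, 0.90, 0.85)))  # Example 2: Tier 4 blog  -> 23
```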


Why This Matters: The Volume Problem

Current Gish Gallop Success

Gish Gallop: Overwhelming opponent with volume of weak sources.

Example: Anti-vaccine argument

"Look at all this evidence vaccines are dangerous!"

  • 50 blog posts
  • 30 personal testimonials
  • 20 YouTube videos
  • 15 advocacy site articles
  • 10 social media threads

Total: 125 sources cited

Current platform response: "Wow, 125 sources, must be true!"

Reality check with tiers:

125 sources breakdown:
- Tier 1: 0
- Tier 2: 0
- Tier 3: 5 (news articles about controversy)
- Tier 4: 120 (blogs, testimonials, social media)

Average quality: (0×90 + 0×75 + 5×58 + 120×30) / 125 = 31.1

vs.

Vaccine safety argument:
- 50 Tier 1 peer-reviewed studies
- 10 Tier 2 CDC/WHO analyses
- 5 Tier 3 news reports

Average quality: (50×90 + 10×75 + 5×58) / 65 = 85.2

Volume of weak sources: 31.1
Smaller number of strong sources: 85.2

Quality beats quantity when tiers are explicit.
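
The tier-weighted average above is simple to compute. A sketch, assuming the base scores from the four-tier table (the helper name is hypothetical):

```python
TIER_BASE_SCORES = {1: 90, 2: 75, 3: 58, 4: 30}

def average_quality(tier_counts: dict) -> float:
    """Average source quality, weighted by how many sources sit in each tier."""
    total = sum(tier_counts.values())
    return sum(TIER_BASE_SCORES[t] * n for t, n in tier_counts.items()) / total

print(round(average_quality({3: 5, 4: 120}), 1))        # 31.1 (125 weak sources)
print(round(average_quality({1: 50, 2: 10, 3: 5}), 1))  # 85.2 (65 strong sources)
```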


Real Example: Hydroxychloroquine Debate

March 2020: Initial hype

Claim: "Hydroxychloroquine cures COVID-19"

Evidence cited (pro):

  • French study (small sample, methodological issues) [Tier 3: 55]
  • Trump tweets [Tier 4: 25]
  • Fox News segments [Tier 4: 30]
  • Anecdotal doctor testimonials [Tier 4: 25]
  • Social media success stories [Tier 4: 20]

Average supporting evidence: 31


June 2020: Real studies complete

Evidence cited (against):

  • Large RCT showing no benefit [Tier 1: 90]
  • WHO SOLIDARITY trial [Tier 1: 90]
  • NIH analysis [Tier 1: 88]
  • Multiple hospital studies [Tier 1: 87]
  • Lancet meta-analysis [Tier 1: 90]

Average contradicting evidence: 89


With tier-explicit scoring:

"Hydroxychloroquine cures COVID" 
├─ Supporting: Average 31 (mostly Tier 4)
├─ Contradicting: Average 89 (all Tier 1)
└─ Final Score: 15 (strong contradicting evidence dominates)

Clear conclusion: Does not work, evidence overwhelming.

Without tiers: "Experts disagree" (treating blog posts = peer-reviewed studies)

With tiers: "Weak anecdotal support vs. strong systematic evidence" (clear winner)


Objective Criteria: Define "Best" Before Evaluating

The "Best" Problem

Every product debate has this structure:

"X is better than Y"

Immediate responses:

  • "Better how?"
  • "Depends what you prioritize"
  • "Better for whom?"
  • "In what context?"

The fundamental issue: Can't evaluate "best" without first defining what you're measuring.


The Solution: Category-Level Criteria First

Process:

Step 1: Establish what makes ANY item in this category good

  • Independent of specific brands
  • Measurable criteria
  • Agreed upon by category experts
  • Weighted by importance

Step 2: THEN evaluate specific items against those criteria

  • Objective measurements
  • Evidence-based scoring
  • Trade-offs explicit
  • Clear comparisons possible

This isn't revolutionary. It's how every serious evaluation works.


Example 1: Product Debate (Trucks)

Wrong Approach (Current Mess)

Claim: "Ford makes the best trucks"

Debate immediately devolves into:

  • "Best at what?"
  • "Chevy is better for [different thing]"
  • "Depends what you need"
  • "My dad had a Ford and it was great/terrible"
  • No resolution possible

Right Approach (Criteria First)

Step 1: Define objective criteria for truck category

What makes ANY truck good?

  1. Towing Capacity (measurable: lbs)
     • Importance weight: 90% (critical for truck buyers)
  2. Payload Capacity (measurable: lbs)
     • Importance weight: 85%
  3. Reliability (measurable: Consumer Reports, J.D. Power)
     • Importance weight: 95%
  4. Fuel Economy (measurable: MPG)
     • Importance weight: 70%
  5. Total Cost of Ownership (measurable: 5-year TCO)
     • Importance weight: 80%
  6. Safety Scores (measurable: IIHS, NHTSA)
     • Importance weight: 90%

These are category standards, not brand-specific.


Step 2: Evaluate specific trucks against criteria

Ford F-150:

  • Towing: 14,000 lbs [Score: 90]
  • Payload: 3,325 lbs [Score: 85]
  • Reliability: Above average [Score: 78, Consumer Reports Tier 1]
  • Fuel Economy: 22 MPG combined [Score: 75]
  • TCO: $52,000 [Score: 70]
  • Safety: 5-star overall [Score: 95, NHTSA Tier 1]

Weighted Score: 83

Chevy Silverado:

  • Towing: 13,300 lbs [Score: 85]
  • Payload: 2,280 lbs [Score: 70]
  • Reliability: Average [Score: 72]
  • Fuel Economy: 20 MPG combined [Score: 68]
  • TCO: $49,500 [Score: 75]
  • Safety: 5-star overall [Score: 95]

Weighted Score: 77

Toyota Tundra:

  • Towing: 12,000 lbs [Score: 78]
  • Payload: 1,730 lbs [Score: 60]
  • Reliability: Well above average [Score: 92, legendary reliability]
  • Fuel Economy: 18 MPG combined [Score: 60]
  • TCO: $51,000 [Score: 72]
  • Safety: 4-star overall [Score: 85]

Weighted Score: 75
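
One plausible way to combine these per-criterion scores is a weighted average over the Step 1 importance weights; the article's exact rounding may differ slightly. A sketch:

```python
# Step 1 importance weights (criterion -> weight), from the list above.
CRITERIA_WEIGHTS = {
    "towing": 90, "payload": 85, "reliability": 95,
    "fuel_economy": 70, "tco": 80, "safety": 90,
}

def weighted_score(scores: dict) -> float:
    """Weighted average of per-criterion scores over the category weights."""
    total_weight = sum(CRITERIA_WEIGHTS.values())
    return sum(scores[c] * w for c, w in CRITERIA_WEIGHTS.items()) / total_weight

f150 = {"towing": 90, "payload": 85, "reliability": 78,
        "fuel_economy": 75, "tco": 70, "safety": 95}
print(round(weighted_score(f150)))  # ~83, in line with the article's figure
```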


Step 3: Make trade-offs explicit

Analysis:

"Ford F-150 scores highest overall (83) due to strong performance across most categories, particularly towing capacity, payload, and safety."

BUT:

"If you prioritize reliability above all else, Toyota Tundra (92 reliability) may be better choice despite lower overall score."

"If you prioritize value (lowest TCO), Chevy Silverado offers good balance."

"If you need maximum capability, Ford wins on towing and payload."

Different priorities = different "best" choices, but measurements are objective.


Why This Works

Benefits:

  1. Objective measurements: Towing capacity is 14,000 lbs or it isn't
  2. Evidence-based: Reliability from Consumer Reports (Tier 1)
  3. Trade-offs explicit: Can't have maximum towing AND best fuel economy
  4. User profiles: Different needs = different optimal choices
  5. No more endless subjective debates

The key: Criteria established BEFORE evaluating brands.


Example 2: Policy Debate (Healthcare)

Wrong Approach (Current Mess)

Claim: "Universal healthcare is the best system"

Debate immediately becomes:

  • "Best by what measure?"
  • "Depends what you value"
  • "My preferred system is better"
  • "That's just your opinion"
  • Endless circular argument

Right Approach (Criteria First)

Step 1: Define objective criteria for healthcare system category

What makes ANY healthcare system good?

  1. Coverage Rate (measurable: % population covered)
     • Evidence: Census data, OECD statistics
     • Importance: 95%
  2. Health Outcomes (measurable: life expectancy, mortality rates)
     • Evidence: WHO data, medical research
     • Importance: 100%
  3. Cost Per Capita (measurable: annual spending)
     • Evidence: OECD health data
     • Importance: 90%
  4. Administrative Efficiency (measurable: overhead %)
     • Evidence: Healthcare spending analysis
     • Importance: 75%
  5. Wait Times (measurable: days to specialist, procedure)
     • Evidence: Patient surveys, government data
     • Importance: 80%
  6. Patient Satisfaction (measurable: surveys)
     • Evidence: Commonwealth Fund surveys
     • Importance: 70%
  7. Innovation (measurable: new treatments, medical patents)
     • Evidence: Patent data, FDA approvals
     • Importance: 65%

These criteria apply to ALL healthcare systems.


Step 2: Evaluate specific systems against criteria

US System (mixed private/public):

  • Coverage: 92% [Score: 70, evidence: Census Bureau Tier 1]
  • Outcomes: 78.9 years life expectancy [Score: 75, WHO Tier 1]
  • Cost: $12,318/capita [Score: 40, highest in world, OECD Tier 1]
  • Admin: 8% overhead [Score: 60, high overhead]
  • Wait Times: Varies by insurance [Score: 70]
  • Satisfaction: 69% satisfied [Score: 70, Commonwealth Fund Tier 2]
  • Innovation: High [Score: 90, patent data Tier 1]

Overall Weighted Score: 67

UK System (NHS single-payer):

  • Coverage: 100% [Score: 100]
  • Outcomes: 81.3 years [Score: 85]
  • Cost: $4,500/capita [Score: 85]
  • Admin: 2% overhead [Score: 95]
  • Wait Times: Longer for non-urgent [Score: 60]
  • Satisfaction: 72% satisfied [Score: 75]
  • Innovation: Moderate [Score: 70]

Overall Weighted Score: 82

Germany System (multi-payer universal):

  • Coverage: 100% [Score: 100]
  • Outcomes: 81.0 years [Score: 84]
  • Cost: $6,739/capita [Score: 75]
  • Admin: 5% overhead [Score: 80]
  • Wait Times: Short [Score: 85]
  • Satisfaction: 78% satisfied [Score: 85]
  • Innovation: High [Score: 85]

Overall Weighted Score: 85


Step 3: Evidence quality matters

All measurements above cited from:

  • WHO statistics [Tier 1]
  • OECD official data [Tier 1]
  • Census Bureau [Tier 1]
  • Commonwealth Fund surveys [Tier 2]

NOT cited from:

  • Think tank advocacy pieces [Tier 3]
  • Blog posts about healthcare [Tier 4]
  • Personal anecdotes [Tier 4]

Quality difference is explicit and measurable.


Step 4: Value trade-offs become clear

Analysis:

"By objective criteria, Germany (85) and UK (82) score higher than US (67), primarily due to universal coverage and lower costs."

BUT:

"If you weight innovation extremely high (breakthrough treatments), US system's advantage there might offset other weaknesses for some people."

"If you prioritize fast access over universal coverage, different trade-off emerges."

"If you weight cost minimization highest, single-payer systems win clearly."

Different values = different optimal policies, but measurements are objective.
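
A small sketch of that sensitivity: the same objective scores, re-ranked under whatever importance weights a reader plugs in (scores and baseline weights taken from the tables above; the function is illustrative):

```python
# Per-criterion scores from the tables above.
SCORES = {
    "US": {"coverage": 70, "outcomes": 75, "cost": 40, "admin": 60,
           "wait": 70, "satisfaction": 70, "innovation": 90},
    "UK": {"coverage": 100, "outcomes": 85, "cost": 85, "admin": 95,
           "wait": 60, "satisfaction": 75, "innovation": 70},
    "Germany": {"coverage": 100, "outcomes": 84, "cost": 75, "admin": 80,
                "wait": 85, "satisfaction": 85, "innovation": 85},
}

def rank(weights: dict) -> list:
    """Rank systems by weighted average under a given value profile."""
    total = sum(weights.values())
    results = {name: sum(s[c] * w for c, w in weights.items()) / total
               for name, s in SCORES.items()}
    return sorted(results.items(), key=lambda kv: -kv[1])

baseline = {"coverage": 95, "outcomes": 100, "cost": 90, "admin": 75,
            "wait": 80, "satisfaction": 70, "innovation": 65}
for name, score in rank(baseline):
    print(name, round(score))  # Germany 85, UK 82, US 67
```

Passing a different weights dict re-ranks the same measurements under a different value profile; the measurements themselves never change.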


Example 3: Consumer Decision (Buying a Car)

Without Objective Criteria (Normal Experience)

Shopping for car:

  • Dealer: "This is a great car!"
  • Friend: "I heard that brand is unreliable"
  • Online review: "Best car ever!"
  • Different online review: "Worst car ever!"
  • Consumer Reports: [paywalled, ignored]

Decision process:

  • Confusion
  • Reliance on gut feeling or brand loyalty
  • Influenced by whatever source you saw last
  • No systematic comparison

With Objective Criteria (Better Approach)

Step 1: Define your priorities (personal objective criteria)

What matters to me in a car?

  1. Reliability: 95% importance
  2. Fuel economy: 80%
  3. Safety: 90%
  4. Total cost (5 years): 85%
  5. Cargo space: 60%
  6. Tech features: 50%

Step 2: Gather Tier 1-2 evidence for candidates

Honda Accord:

  • Reliability: 4.5/5 [Consumer Reports Tier 1: 90]
  • Fuel economy: 32 MPG combined [EPA Tier 1: 85]
  • Safety: Top Safety Pick+ [IIHS Tier 1: 95]
  • 5-year TCO: $38,000 [Edmunds Tier 2: 80]
  • Cargo: 16.7 cu ft [Manufacturer spec Tier 1: 70]
  • Tech: Good [Reviews Tier 3: 75]

Weighted Score: 86

Toyota Camry:

  • Reliability: 5/5 [Consumer Reports Tier 1: 95]
  • Fuel economy: 31 MPG [EPA Tier 1: 83]
  • Safety: Top Safety Pick [IIHS Tier 1: 90]
  • 5-year TCO: $39,500 [Edmunds Tier 2: 78]
  • Cargo: 15.1 cu ft [Manufacturer Tier 1: 65]
  • Tech: Good [Reviews Tier 3: 75]

Weighted Score: 85

Mazda6:

  • Reliability: 4/5 [Consumer Reports Tier 1: 85]
  • Fuel economy: 29 MPG [EPA Tier 1: 78]
  • Safety: Top Safety Pick [IIHS Tier 1: 90]
  • 5-year TCO: $37,000 [Edmunds Tier 2: 82]
  • Cargo: 14.7 cu ft [Manufacturer Tier 1: 63]
  • Tech: Excellent [Reviews Tier 3: 85]

Weighted Score: 83


Step 3: Decision becomes clear

"For my priorities (reliability and safety most important), Honda Accord (86) scores highest."

"But if I weighted TCO higher, Mazda6's lower cost might make it better choice."

"All three are objectively good cars (scores 83-86), so I can't really go wrong."

Decision made with confidence based on evidence, not marketing or gut feeling.


Why This Beats "Both Sides" Journalism

The False Balance Problem

Current journalism standard: Present both sides

Climate change article:

  • "Scientists say climate change is real [5 paragraphs]"
  • "Skeptics disagree [5 paragraphs]"
  • "Truth is probably somewhere in the middle"

What this implies: 50/50 split, equal evidence on both sides.

Actual reality:

  • Scientists: 10,000+ peer-reviewed studies [Tier 1]
  • Skeptics: 50 blog posts and industry-funded papers [Tier 4]

Evidence ratio: 10,000 Tier 1 vs. 50 Tier 4 = not equal


With Evidence Tiers, Asymmetry Becomes Visible

Same article with tiers:

"Climate change scientific consensus:

  • Supporting: 10,000 peer-reviewed studies [Tier 1, avg score: 90]
  • Contradicting: 50 sources [mostly Tier 4, avg score: 32]
  • Evidence asymmetry: 90 vs. 32 (strong scientific consensus)"

No false balance. Quality difference is explicit and measurable.


Vaccine Safety Example

Without tiers: "Some studies show vaccines are safe. Other sources raise concerns. Parents should research and decide."

Implication: Equal evidence on both sides, it's just opinion.


With tiers:

"Vaccine safety evidence:

  • Supporting: 500+ peer-reviewed studies, decades of data [Tier 1, avg: 92]
  • Contradicting: Wakefield study (retracted), blog posts, testimonials [Tier 4, avg: 18]
  • Evidence asymmetry: 92 vs. 18 (overwhelming safety consensus)"

No ambiguity. Quality difference explicit.


Integration with Other Components

Connection to Evidence Linking (Part 2)

Evidence tier determines base score in linking formula:

Argument Score = (Evidence Quality × Logical Validity × Linkage Strength)
                  ^
                  Evidence tier determines this component

When evidence tier changes:

  • Tier 1 study gets retracted → drops to Tier 0
  • All arguments citing it automatically recalculate
  • Entire chain updates based on new tier

Tiers enable automatic updating.
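
A sketch of that propagation, assuming a hypothetical object model in which evidence knows which arguments cite it (the "Tier 0" retraction case above is modeled as a near-zero quality score):

```python
TIER_BASE_SCORES = {1: 90, 2: 75, 3: 58, 4: 30}

class Evidence:
    def __init__(self, tier: int):
        self.tier = tier
        self.citing_arguments = []

    @property
    def quality(self) -> int:
        # Outside the four tiers (e.g. "Tier 0" after retraction): near zero.
        return TIER_BASE_SCORES.get(self.tier, 5)

    def set_tier(self, new_tier: int) -> None:
        """Changing a source's tier recalculates every argument citing it."""
        self.tier = new_tier
        for arg in self.citing_arguments:
            arg.recalculate()

class Argument:
    def __init__(self, evidence: Evidence, validity: float, linkage: float):
        self.evidence, self.validity, self.linkage = evidence, validity, linkage
        evidence.citing_arguments.append(self)
        self.recalculate()

    def recalculate(self) -> None:
        self.score = self.evidence.quality * self.validity * self.linkage

study = Evidence(tier=1)
arg = Argument(study, validity=0.95, linkage=0.90)
print(round(arg.score))  # 77
study.set_tier(0)        # retraction: the whole chain updates automatically
print(round(arg.score))  # 4
```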


Connection to Fallacy Detection (Part 3)

Cherry-picking detection uses tiers:

System auto-flags:

  • "This argument cites 5 Tier 4 sources"
  • "But 10 Tier 1 sources exist contradicting"
  • "Possible cherry-picking detected"

Tiers make evidence asymmetry visible.
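
A minimal sketch of such a flag; the counts and thresholds here are illustrative, not the platform's actual detection rule:

```python
def flag_cherry_picking(cited_tiers: list, contradicting_tiers: list):
    """Flag arguments built on low-tier sources when higher-tier
    contradicting sources exist. Thresholds are illustrative."""
    weak_cited = sum(1 for t in cited_tiers if t == 4)
    strong_against = sum(1 for t in contradicting_tiers if t == 1)
    if weak_cited >= 3 and strong_against >= weak_cited:
        return (f"This argument cites {weak_cited} Tier 4 sources, but "
                f"{strong_against} Tier 1 sources contradict it: "
                "possible cherry-picking detected")
    return None

print(flag_cherry_picking([4, 4, 4, 4, 4], [1] * 10))
```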


Connection to Similarity Grouping (Part 3)

Objective criteria enable consolidation:

If you don't define "best truck" criteria:

  • Every person measures differently
  • No way to group similar claims
  • Endless fragmented debates

If you DO define criteria:

  • All "best truck" claims evaluated against same standards
  • Can consolidate to One Page Per Topic
  • Cumulative analysis possible

Criteria enable grouping.


Why This Prevents Gaming

Gaming Attempt 1: Publish in Fake Journal

Strategy: Create "peer-reviewed" study in predatory journal, claim Tier 1 status.

Why it fails:

  • Journal vetting process identifies predatory journals
  • Community flagging: "This journal has no editorial board"
  • Pattern detection: Multiple fake papers from same source
  • Auto-downgrade: Tier 1 → Tier 4
  • Contribution to argument drops from 90 to 30

Gaming Attempt 2: Cite Volume of Weak Sources

Strategy: Cite 100 blogs supporting position.

Why it fails:

100 Tier 4 blogs (avg 30 each) = average 30
vs.
1 Tier 1 study (90) = 90

One quality source beats 100 weak sources.

Volume doesn't overcome tier difference.


Gaming Attempt 3: Misrepresent Source Tier

Strategy: Cite blog as "research shows" implying Tier 1.

Why it fails:

  • Users can check source
  • Community can flag misrepresentation
  • System tracks source URLs
  • Tier automatically assigned based on source type
  • Misrepresentation = credibility loss for submitter

Real-World Application: COVID-19 Treatment Debates

Hydroxychloroquine (Already Covered)

Early hype: Tier 4 sources.
Real evidence: Tier 1 studies showed no benefit.
The tier system makes the outcome clear.


Ivermectin

Claim: "Ivermectin treats COVID-19"

Evidence (March 2021):

Supporting:

  • Small studies from Egypt, Iran [Tier 2: 70]
  • Doctor testimonials [Tier 4: 25]
  • Social media success stories [Tier 4: 20]
  • Advocacy group claims [Tier 4: 28]

Average supporting: 36

Contradicting:

  • Large RCTs showing no benefit [Tier 1: 90]
  • Cochrane review: No evidence [Tier 1: 92]
  • FDA warning against use [Tier 1: 88]
  • NIH guidelines: Not recommended [Tier 1: 90]

Average contradicting: 90

With tiers explicit:

Claim Score: 20 (weak anecdotal support vs. strong systematic evidence)
Conclusion: Does not work for COVID-19

Without tiers: "Experts disagree" (false equivalence between blog posts and RCTs)


Vitamin D

Claim: "Vitamin D reduces COVID-19 severity"

Evidence:

Supporting:

  • Observational studies showing correlation [Tier 2: 72]
  • Plausible mechanism (immune function) [Tier 2: 70]
  • Low risk intervention [Tier 2: 75]

Average supporting: 72

Contradicting:

  • RCTs showing no benefit [Tier 1: 88]
  • Other RCTs showing modest benefit [Tier 1: 85]

Average contradicting: 86

With tiers explicit:

Claim Score: 55 (moderate observational support, mixed RCT evidence)
Conclusion: Unclear, may have modest benefit, low risk to try

Nuanced conclusion possible because tier difference less extreme.


The Bottom Line: Quality Must Be Measurable

Current debate systems:

  • All evidence treated equally
  • Blog post = peer-reviewed study
  • Volume beats quality
  • "Experts disagree" even when they don't
  • No way to measure evidence quality
  • No objective criteria for "best"

With evidence tiers + objective criteria:

  • Quality difference explicit (Tier 1 vs. Tier 4)
  • Strong evidence beats weak evidence regardless of volume
  • Evidence asymmetry visible
  • Consensus vs. outliers clear
  • Objective criteria enable meaningful comparison
  • "Best" becomes measurable, not just opinion

The transformation: From subjective shouting match to objective measurement.


Continue Reading

Part 5 of 5: Wikipedia, Google, and Stack Overflow Already Proved This Model Works
