The Logical Validity Score
The Wall-of-Words Problem
Every platform that hosts opinions suffers from the same structural flaw.
Amazon reviews. Letterboxd comments. YouTube reactions. Podcast discussions. Article comment sections. They all allow people to express arguments, but they treat those arguments as monolithic blocks. Upvote or downvote. Helpful or not. Thumbs up or thumbs down.
A thoughtful critic writes five paragraphs explaining why something fails. Buried in that wall are three distinct critiques: a factual error, a logical flaw, and a contested value judgment. Those three claims deserve separate evaluation. Maybe the first is devastating, the second is weak, and the third is genuinely contested. But the platform treats the whole review as one unit.
The arguments never get decomposed. When hundreds of critics independently raise the same objection, those remain hundreds of separate walls of text. Nobody consolidates them into one crisp statement with branching pros and cons. When someone raises a genuinely novel objection, it doesn't stand out from the redundant pile.
The Logical Validity Score (LVS) solves this by decomposing arguments into atomic claims, each with its own branching tree of supporting and opposing sub-arguments. Similar points merge. Redundancy disappears. Every critique links to what it's critiquing. What emerges is a structured argument network: the most comprehensive, deduplicated map of why something's reasoning succeeds or fails.
Note: The LVS measures logical validity, not overall quality. A film can be beautifully shot while making flawed arguments. A book can be a slog to read while being logically airtight. A podcast can be entertaining while spreading misinformation. For craftsmanship, enjoyment, and other dimensions, see the companion Quality Scores for each medium.
Why Branching Sub-Arguments Matter
Here's where most evaluation systems fundamentally fail: they treat claims as self-contained, when actually every claim rests on deeper contested assumptions.
Consider a critic who writes: "This story is depressing. It doesn't offer hope or a satisfying resolution. Two stars."
That sounds like a straightforward opinion. But the critic is making a claim ("lack of hope is a flaw") that depends on winning several deeper arguments they never explicitly make:
Surface claim: "This is bad because it's hopeless"
Sub-argument 1: "Stories should offer hope"
- Pro: Narrative evolved to model how humans overcome adversity. From prehistoric campfires to modern cinema, storytelling exists to show people triumphing over obstacles. Psychologists argue this is the fundamental utility of the medium. Wanting hope isn't naive Pollyanna thinking; it's what stories are for.
- Con: That's a narrow view of narrative art. Tragedy is an ancient and respected genre. Oedipus doesn't offer hope. Neither does 1984, Chinatown, or Blood Meridian. "Art should comfort" is itself a contested aesthetic position, not an objective standard.
Sub-argument 2: "Realism that lacks hope is a flaw, not a feature"
- Pro: Art that offers no path forward breeds despair and passivity. Relentless bleakness isn't "honest"; it's one selective framing of reality, just as distorted as forced optimism.
- Con: Life doesn't always have happy endings. Demanding that stories resolve neatly is demanding they lie. Some audiences specifically seek unflinching portrayals because sugar-coated narratives feel false to their experience.
Sub-argument 3: "My emotional response (depression) indicates a flaw in the work"
- Pro: Audience response matters. If a work leaves people feeling worse without compensating insight, it has failed at a basic level.
- Con: Discomfort isn't failure. The best art often disturbs. Kafka, Dostoevsky, Cormac McCarthy, and Lars von Trier regularly leave audiences unsettled. That's the point.
Inherited Validity
Here's the critical insight: the critic's two-star rating is only as valid as these sub-arguments.
If "stories should offer hope" scores 55% in its own right (strong evolutionary psychology support, but legitimate counter-examples from tragic art), then the surface claim "this is bad because it's hopeless" inherits that contested foundation.
The critic might be completely right that the work is hopeless. But whether "hopeless" equals "bad" is a separate question requiring its own analysis. The Idea Stock Exchange separates these:
- Factual component: "This work lacks hopeful resolution" (verifiable against the content)
- Evaluative component: "Lack of hope is a flaw" (links to the existing debate node on this question)
The factual component gets evaluated on accuracy. The evaluative component inherits its score from the linked debate. Critics can't smuggle contested assumptions as if they were obvious truths.
This cascading structure means:
- Every critique is only as strong as the assumptions it rests on
- Common debates ("should art comfort or challenge?") don't get re-litigated in every review; they get consolidated into canonical nodes
- When underlying debates shift (new evidence about narrative psychology, for instance), all dependent critiques automatically update
- Readers can trace any evaluation back through the reasoning that produced it
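The cascading structure above can be sketched in code. This is a minimal illustration, not the platform's actual algorithm: the `Claim` class and the multiplicative aggregation rule (own support scaled by the mean validity of sub-arguments) are assumptions chosen for clarity.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    own_score: float              # validity of the claim's direct support, 0..1
    subs: list["Claim"] = field(default_factory=list)

    def validity(self) -> float:
        """Inherited validity: a claim's score is its own support scaled by
        the average validity of the sub-arguments it depends on."""
        if not self.subs:
            return self.own_score
        inherited = sum(s.validity() for s in self.subs) / len(self.subs)
        return self.own_score * inherited

# The "hopeless story" example: the factual component is solid (0.9),
# but the evaluative premise "lack of hope is a flaw" sits at 0.55.
premise = Claim("Lack of hope is a flaw", 0.55)
review = Claim("This is bad because it's hopeless", 0.9, [premise])
```

Because `validity()` recurses through sub-arguments, re-scoring the premise node automatically moves every review that links to it; nothing has to be re-litigated review by review.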
The 6 Logic Battlegrounds
Every claim becomes a belief node stress-tested in six arenas. These apply across all media types, though the specific application varies by medium.
1. Fallacy Decomposition
When someone flags a logical flaw, that accusation becomes its own node requiring evidence.
Example: A documentary critic claims "The filmmaker commits post hoc fallacy by linking social media to teen depression."
- Supporting the accusation:
- Correlation timeline doesn't establish causation
- Multiple confounding variables changed simultaneously
- Opposing the accusation:
- The film cites longitudinal studies controlling for confounds
- Dose-response relationship strengthens causal inference
The accusation's score depends on which branch survives scrutiny. No more "gotcha" accusations that go unchallenged. No more legitimate critiques that get buried.
Impact: Proven fallacies reduce the original claim's validity. Failed accusations get flagged so future readers don't waste time on them.
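One simple way to score an accusation node from its pro and con branches is by the share of total argument weight on each side. The function below is a hypothetical rule for illustration; the real system would weight branches by their own validity trees.

```python
def accusation_score(pro: list[float], con: list[float]) -> float:
    """Score a fallacy accusation (0..1) from the strengths of its
    supporting and opposing sub-arguments. Hypothetical rule: the
    accusation's score is the share of argument weight on its side."""
    total = sum(pro) + sum(con)
    return sum(pro) / total if total else 0.5  # no arguments yet: undecided

# The post hoc accusation from the documentary example, with two
# supporting and two opposing sub-arguments of varying strength:
score = accusation_score(pro=[0.7, 0.6], con=[0.8, 0.5])
# roughly 0.5: the accusation is genuinely contested, not proven
```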
2. Contradiction Mapping
Internal inconsistencies get explicit evaluation: Is this a genuine contradiction or a nuanced position the critic missed?
Example: "The author praises free markets in Chapter 1 but demands regulation in Chapter 8."
- Arguments it's a real contradiction:
- Both chapters use absolute language that precludes nuance
- No explicit framework distinguishing when each applies
- Arguments it's nuanced consistency:
- Chapter 1 addresses commodity markets; Chapter 8 addresses healthcare
- Position matches mainstream economics on market failures
Impact: Confirmed contradictions reduce coherence scores. Alleged contradictions that turn out to be nuanced positions actually boost the score; the work survived a stress test.
3. Evidence Evaluation Trees
We decompose the link between cited evidence and stated conclusions.
Example: A book claims "10,000 hours of practice produces expertise" and cites Ericsson (1993).
- Arguments the evidence supports the claim:
- Study found strong correlation between practice hours and skill
- Replicated across multiple contexts
- Arguments the evidence fails to support the claim:
- Ericsson himself says the author misrepresented his findings
- Meta-analyses show practice explains only 26% of variance
- 10,000 was an average, not a threshold
Each sub-argument links to its source. You can trace critiques back to the original research.
Impact: Claims backed by evidence that survives scrutiny gain validity. Claims where cited evidence doesn't actually support the conclusion lose validity proportionally. A debunked foundational study triggers cascading score drops across everything that depends on it.
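The proportional adjustment can be sketched as scaling a claim's base validity by the mean score of its evidence links. The rule and the 10,000-hours numbers below are illustrative assumptions, not real platform values.

```python
def evidence_adjusted(base: float, link_scores: dict[str, float]) -> float:
    """Scale a claim's base validity by how well its cited evidence
    actually supports the conclusion (mean evidence-link score).
    An illustrative rule, not the platform's actual formula."""
    if not link_scores:
        return base
    return base * sum(link_scores.values()) / len(link_scores)

# The 10,000-hours claim before and after its key citation is challenged:
links = {"Ericsson 1993": 0.9}
before = evidence_adjusted(0.8, links)
links["Ericsson 1993"] = 0.3   # author shown to misrepresent the study
after = evidence_adjusted(0.8, links)
# the claim's score drops automatically with its foundation
```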
4. Metaphor Analysis
Creators use analogies to make complex ideas accessible. These can illuminate or mislead.
Example: A pundit argues "AI development is like building a nuclear bomb."
- Points of illumination:
- Both involve potentially irreversible large-scale consequences
- Both require international coordination to manage
- Points of distortion:
- AI development is iterative and correctable; detonation isn't
- Comparing tool to weapon prejudices policy conversation
- "Nuclear" triggers emotional response that bypasses rational evaluation
Impact: Scores are weighted by how much argumentative work the metaphor does. A metaphor used for illustration is judged lightly. A "load-bearing metaphor" that carries the central argument is judged heavily. If it fails, the argument it supported fails with it.
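The load-bearing weighting can be expressed as a blend: the more argumentative work the metaphor does, the more its quality dominates the final score. The linear blend and the sample numbers are assumptions for illustration.

```python
def metaphor_weighted(argument_score: float, metaphor_score: float,
                      load: float) -> float:
    """Blend an argument's non-metaphorical support with its metaphor's
    quality, weighted by how load-bearing the metaphor is
    (0 = pure illustration, 1 = carries the whole argument)."""
    return (1 - load) * argument_score + load * metaphor_score

# The nuclear-bomb analogy scoring poorly (0.3) inside an otherwise
# decent argument (0.8):
decorative = metaphor_weighted(0.8, 0.3, load=0.1)    # barely dented
load_bearing = metaphor_weighted(0.8, 0.3, load=0.9)  # fails with the metaphor
```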
5. Prediction Tracking
Testable predictions get tracked against real-world outcomes.
Example: A 2015 futurist book predicted "VR will replace traditional screens by 2020."
- Arguments the prediction succeeded:
- VR headset sales grew 30% annually
- Major tech companies invested billions
- Arguments the prediction failed:
- "Replace" implies VR became primary interface; it didn't
- Consumer adoption plateaued below 5% of households
- Traditional screen sales continued growing
Impact: Failed predictions retroactively lower scores. Accurate foresight gains credibility over time. Partial credit for directionally correct predictions that missed on magnitude or timing.
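Partial credit for predictions can be sketched as a penalty scheme: zero if the direction was wrong, otherwise deductions for magnitude and timing misses. The penalty weights below are illustrative, not the platform's actual values.

```python
def prediction_credit(direction_ok: bool, magnitude_error: float,
                      years_off: float) -> float:
    """Partial credit for a tracked prediction: no credit if the direction
    was wrong, otherwise 1 minus penalties for magnitude (0..1) and timing
    misses. Illustrative weights only."""
    if not direction_ok:
        return 0.0
    return max(0.0, 1.0 - 0.5 * min(magnitude_error, 1.0) - 0.1 * years_off)

# "VR will replace screens by 2020": right trend, badly wrong magnitude,
# and still unrealized years later
credit = prediction_credit(direction_ok=True, magnitude_error=0.9, years_off=5)
```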
6. Influence vs. Validity Mapping
We track the gap between how widely an idea spreads and how well it's reasoned.
Metrics tracked:
- Academic citations
- Policy influence
- Social shares and viral reach
- Sales and audience size
Impact: Pairing influence with validity reveals which weak arguments spread virally and which strong arguments languish. This exposes where marketing beats reasoning, where tribal loyalty trumps evidence, and where society is collectively failing at critical thinking.
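The influence-validity gap itself is a simple difference once both quantities are normalized. The function and percentile framing below are assumptions sketched for illustration.

```python
def influence_gap(influence_percentile: float, validity: float) -> float:
    """Positive gap: the idea spreads further than its reasoning warrants
    (marketing beats reasoning). Negative gap: a strong argument languishes.
    Both inputs assumed normalized to 0..1."""
    return influence_percentile - validity

viral_but_weak = influence_gap(0.95, 0.30)    # positive: overexposed
sound_but_unread = influence_gap(0.10, 0.90)  # negative: underexposed
```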
Scoring Dynamics
Recursive Validity
Claims inherit validity from their sub-arguments. If a foundational study gets debunked, every argument that relies on it sees its score drop automatically. Strong foundations strengthen everything built on them; weak links weaken everything downstream.
Consolidation
When hundreds of critics make the same point, those merge into one well-stated critique incorporating the best evidence from all versions. The consolidated node includes the strongest supporting arguments and the strongest counter-arguments. Novel objections become visible instead of buried in redundancy.
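A toy version of consolidation can be shown with string similarity: near-duplicate objections merge into one group while novel objections stand alone. Real consolidation would need semantic similarity and human review; `difflib` string matching here is just a stand-in.

```python
from difflib import SequenceMatcher

def consolidate(arguments: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedy near-duplicate grouping: each argument joins the first group
    whose representative it closely matches, else starts a new group."""
    groups: list[list[str]] = []
    for arg in arguments:
        for group in groups:
            if SequenceMatcher(None, arg.lower(),
                               group[0].lower()).ratio() >= threshold:
                group.append(arg)
                break
        else:
            groups.append([arg])
    return groups

groups = consolidate([
    "Correlation does not establish causation here.",
    "correlation does not establish causation here!",
    "The sample size is far too small to generalize.",
])
# the two near-duplicates merge; the novel objection stands out on its own
```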
Credibility Weighting
Users earn Reasoning Reputation based on:
- Survival rate of their arguments against counter-arguments
- Accuracy in identifying genuine fallacies versus false accusations
- Quality of evidence provided
- How well their consolidations hold up
Top contributors have their arguments weighted more heavily in initial scoring, but all arguments must survive on their merits.
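The weighting described above might look like the sketch below: an equally weighted reputation aggregate that nudges only the initial score. The aggregation, the `max_boost` cap, and all function names are assumptions for illustration.

```python
def reasoning_reputation(survival_rate: float, fallacy_accuracy: float,
                         evidence_quality: float,
                         consolidation_quality: float) -> float:
    """One plausible reputation aggregate: an equally weighted mean of the
    four tracked signals, each already normalized to 0..1."""
    signals = (survival_rate, fallacy_accuracy,
               evidence_quality, consolidation_quality)
    return sum(signals) / len(signals)

def weighted_initial_score(raw_score: float, reputation: float,
                           max_boost: float = 0.2) -> float:
    """Reputation nudges the *initial* score only; the argument still has
    to survive on its merits as counter-arguments arrive."""
    return min(1.0, raw_score * (1.0 + max_boost * reputation))
```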
Time-Decay and Updates
Scores evolve as evidence emerges. A claim that looked solid in 2015 might face new counter-evidence by 2025. Predictions get evaluated against outcomes. Works don't coast on past reputation.
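One way to implement this time-sensitivity is to decay unrevisited scores toward an uncertain prior, so a 2015 evaluation that nobody has re-examined carries less weight in 2025. The half-life decay and the 0.5 prior are illustrative choices, not platform constants.

```python
def staleness_weight(years_since_review: float,
                     half_life: float = 5.0) -> float:
    """Weight on an evaluation that hasn't been re-examined recently;
    half_life is an illustrative parameter."""
    return 0.5 ** (years_since_review / half_life)

def current_score(last_score: float, years_since_review: float,
                  prior: float = 0.5) -> float:
    """Decay an unrevisited score toward an uncertain prior, so works
    can't coast indefinitely on past evaluations."""
    w = staleness_weight(years_since_review)
    return w * last_score + (1 - w) * prior
```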
Human + AI Synergy
Neither humans nor AI alone solves the wall-of-words problem optimally.
| Role | Strength | Limitation |
| --- | --- | --- |
| AI | Pattern detection at scale; flags redundant arguments for consolidation; identifies where branching depth is insufficient | Misses context, nuance, and value-laden disputes |
| Crowd | Contextual judgment; lived experience with contested values; determines where debates actually stand | Inconsistent quality; tribal biases; often stops at surface claims |
| Experts | Deep domain knowledge; verifies technical claims; spots where arguments depend on resolved debates in other fields | Limited bandwidth; own blind spots |
Together, they produce better decomposition than any could individually.
Validity Score vs. Quality Score
The Logical Validity Score measures one thing: does the reasoning hold up under decomposition?
But creative works have other virtues. Cinematography, prose style, pacing, and sheer entertainment value are real merits that have nothing to do with whether an argument holds up.
Remember the "hopeless story" example: even if "lack of hope is a flaw" scores only 55% in the evaluative debate, an audience member might still weight enjoyment highly. "Yes, I know hopelessness isn't objectively a flaw. I still don't enjoy hopeless stories. That's a preference, not an error."
Each medium has a companion Quality Score addressing non-logical dimensions: craftsmanship, readability/watchability, challenge level, emotional experience, and whether the work delivers what it promises.
A complete evaluation shows both scores. You might choose a 70% validity / 95% quality work for enjoyment, and a 95% validity / 60% quality work for research. Both choices are legitimate once you see the tradeoffs.
Medium-Specific Applications
The LVS applies across all media that make claims, but each medium has its own considerations.
Join the Reasoning Network
The wall-of-words problem isn't going away on its own. The Logical Validity Score is our answer: decompose claims, branch into sub-arguments, consolidate redundancy, link to evidence, update as knowledge evolves.
Ways to participate:
- Submit content for decomposition
- Add branching sub-arguments where depth is insufficient
- Evaluate arguments at every level
- Link claims to evidence and related debates
- Propose better consolidations of redundant points
Help us refine the system. | View the Algorithm Code
Related Scores and Algorithms
Argument scores from sub-argument scores
Evidence Scores
Importance Score
Linkage Scores
Truth Scores
Objective Criteria Scores
Book Logical Validity Score
Logical Validity and Truth Scores
The Truth Score integrates two complementary aspects:
- Logical Validity: The quality of reasoning.
- Level of Verification: Empirical validation.
A baseline multiplier governs the interplay between these components, dynamically shifting weight between observation and argumentation according to their contextual relevance. Arguments about the importance of each type drive the multiplier's adjustment, so the scoring adapts to the specific nature of the belief being evaluated. This dynamic mechanism maintains fairness and adaptability across diverse belief systems.
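The baseline-multiplier blend described above can be sketched as a convex combination. The formula and sample values are an assumed reading of the description, not a confirmed specification.

```python
def truth_score(logical_validity: float, verification: float,
                baseline_multiplier: float) -> float:
    """Blend reasoning quality with empirical verification. The baseline
    multiplier (0..1) is the weight on verification; arguments about which
    component matters more for a given belief type would set it."""
    m = min(max(baseline_multiplier, 0.0), 1.0)
    return m * verification + (1.0 - m) * logical_validity

# An empirical claim leans on verification; a philosophical one on logic:
empirical = truth_score(logical_validity=0.6, verification=0.9,
                        baseline_multiplier=0.8)
philosophical = truth_score(logical_validity=0.9, verification=0.2,
                            baseline_multiplier=0.1)
```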
Logical Soundness Framework
We will build a robust framework to assign each belief a "logical validity" score, among other metrics. This score will dynamically reflect the logical soundness of the belief based on a structured evaluation of its arguments and reasoning.
Key Components of the Framework
- Logical Fallacy Identification and Templates
- Pro/Con Evaluation of Accusations
- Structured Argumentation

Dynamic Scoring System
- Logical Validity Score
- Weakening Linkage Strength
- Real-Time Updates

Benefits of the Framework
- Encourages Logical Rigor
- Transparency: Arguments are visually organized into premises, conclusions, and linkages, making the reasoning process transparent and easy to follow.
- User Education
- Enhanced Debate Quality