When Approximate Reasoning from Data Beats Classical Deduction

When a rule is learned from real data, should we reject it as soon as we see one exception?

Contributors: Mark Song, Yiyang Sun, Zhaohua Zheng
CSE 5103: Theory of Artificial Intelligence and Machine Learning | Spring 2026

Summary

This project argues that a rule with a limited number of exceptions should not be rejected outright, as classical deduction would require. Across five experiments, classical deduction is often too brittle for finite, noisy, real-world data. Approximate reasoning retains rules that are still highly reliable, generalize to held-out data, and reveal when context changes the story.

Core idea

A classical rule test asks whether a rule has zero counterexamples in the observed sample. That standard is useful for formal proof, but it becomes a straitjacket when rules are learned from data.

Approximate reasoning asks a different question: among the cases where the rule applies, how often is the rule correct?

This shift matters because real datasets usually contain measurement error, sampling noise, rare exceptions, subgroup effects, and temporal drift. In that setting, one exception should not automatically erase an otherwise useful rule.
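
The two standards described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names, the toy records, and the use of 0.85 as an acceptance threshold for approximate reasoning are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional, Tuple

Record = dict  # hypothetical record type: one observed case


def evaluate_rule(
    records: Iterable[Record],
    condition: Callable[[Record], bool],
    conclusion: Callable[[Record], bool],
) -> Tuple[int, int, Optional[float]]:
    """Return (matches, counterexamples, validity) for one rule.

    Validity is the share of matching cases where the conclusion holds;
    it is None when the condition never applies.
    """
    matches = 0
    counterexamples = 0
    for record in records:
        if condition(record):
            matches += 1
            if not conclusion(record):
                counterexamples += 1
    validity = (matches - counterexamples) / matches if matches else None
    return matches, counterexamples, validity


def classical_accepts(counterexamples: int) -> bool:
    # Classical standard: a single counterexample rejects the rule.
    return counterexamples == 0


def approximate_accepts(validity: Optional[float], threshold: float = 0.85) -> bool:
    # Approximate standard: accept if validity clears a threshold.
    # The 0.85 default is an assumption chosen for illustration.
    return validity is not None and validity >= threshold


# Hypothetical toy sample: 20 matching cases, one of which is an exception.
records = [{"cond": True, "outcome": True}] * 19 + [{"cond": True, "outcome": False}]

matches, counterexamples, validity = evaluate_rule(
    records,
    condition=lambda r: r["cond"],
    conclusion=lambda r: r["outcome"],
)
print(matches, counterexamples, validity)
print("classical:", classical_accepts(counterexamples))
print("approximate:", approximate_accepts(validity))
```

On this toy sample the single exception is enough for the classical test to reject the rule, while the approximate test keeps it because 19 of the 20 matching cases are correct.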

Key definitions

Term	Meaning
Rule	A statement of the form “if this condition holds, this outcome is likely.”
Classical deduction	A strict rule-evaluation standard that accepts a rule only if the sample contains zero counterexamples.
Approximate reasoning	A rule-evaluation standard that asks how often the rule is correct when its condition appears.
Validity	The share of matching cases where the rule’s conclusion is correct.
Context shift	A case where a subgroup, season, region, or other context makes a rule stronger, weaker, or even reversed.
Abduction	Reasoning backward from an observed outcome to plausible causes or explanations.

Findings at a glance

Area	Headline finding	Why it matters
Synthetic noise test	With a 5% exception rate and 5,000 samples, classical deduction accepted the rule in 0 of 10 runs, while approximate validity stayed around 94.9%.	One exception can kill a rule that is still very reliable.
Context sensitivity	When context effects were made strong, rule reversals happened in 100% of runs	Good average rules can fail inside the wrong subgroup.
Abduction	The elimination-based abduction method recovered the planted cause in 40/40 trials	The project is not only scoring rules; it can also recover explanations.
Mushroom data	172 of 757 candidate rules were rejected classically on train, yet still reached at least 85% validity on test.	Real data contains many useful rules that the strict baseline throws away.
AQI data	`winter -> PM2.5` held at about 0.910 test validity, and `West & summer -> Ozone` held at about 0.689, even though both are classically rejected.	Approximate rules remain useful in a noisy, real-world setting.

Main evidence

1. Exact rejection can miss strong rules

In the controlled synthetic setting, a small amount of noise makes classical acceptance collapse. Approximate validity, however, remains close to the true reliability of the rule.

Synthetic validity gap

Figure 1. Once exceptions are present, exact acceptance drops away quickly, while approximate validity remains stable.

The result shows why exact acceptance is poorly matched to noisy data-derived rules. A rule can be false in a small number of cases and still be highly useful overall.
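
A small simulation makes the collapse concrete. The sketch below is not the project's experiment code. It assumes that every sample matches the rule's condition and that exceptions occur independently with a fixed rate; the function name `simulate` and the seed are hypothetical. Under those assumptions, the chance that a sample of size n contains zero counterexamples is (1 - rate)^n, which is vanishingly small for 5,000 samples at a 5% exception rate.

```python
import random


def simulate(n_samples: int = 5000, exception_rate: float = 0.05,
             n_runs: int = 10, seed: int = 0):
    """Return (classical acceptances, mean approximate validity) over n_runs.

    Assumption: each sample matches the rule, and each is an exception
    independently with probability exception_rate.
    """
    rng = random.Random(seed)
    accepted = 0
    validities = []
    for _ in range(n_runs):
        exceptions = sum(rng.random() < exception_rate for _ in range(n_samples))
        if exceptions == 0:
            accepted += 1  # classical deduction accepts only with zero exceptions
        validities.append(1 - exceptions / n_samples)
    return accepted, sum(validities) / n_runs


accepted, mean_validity = simulate()
print(f"classical acceptances: {accepted}/10")
print(f"mean approximate validity: {mean_validity:.3f}")

# Probability of seeing zero exceptions in one run under the same assumptions.
p_zero = (1 - 0.05) ** 5000
print(f"P(zero exceptions in 5000 samples): {p_zero:.3e}")
```

Approximate validity, by contrast, is simply an estimate of one minus the exception rate, so it stays near 0.95 regardless of how large the sample grows.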

2. Real data confirms the same pattern

The Mushroom case study gives the clearest real-data version of the same result. Many rules are rejected by the classical baseline because they have at least one training counterexample, but they still perform well on held-out examples.

Mushroom rule gap types

Figure 2. Many Mushroom rules fall into the “approximate reasoning helps” category.

One especially memorable rule had 8 training counterexamples but still achieved 100% validity on 124 test cases. Under the classical standard, that rule is discarded. Under approximate reasoning, it is recognized as a strong empirical regularity.
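
The "approximate reasoning helps" category can be expressed as a simple filter: a rule qualifies when it has at least one training counterexample (so the classical baseline rejects it) but still reaches a minimum validity on held-out data. The sketch below assumes a hypothetical `RuleStats` record and rule names. The first example reuses the counts quoted above; the second is invented for contrast. The 0.85 cutoff follows the 85% test-validity figure reported for the Mushroom rules.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    # Hypothetical summary of one candidate rule; field names are assumptions.
    name: str
    train_counterexamples: int
    test_matches: int
    test_correct: int


def approximate_helps(rule: RuleStats, min_test_validity: float = 0.85) -> bool:
    """True if the rule is rejected classically on train yet valid on test."""
    if rule.train_counterexamples == 0:
        return False  # classically accepted, so there is no gap to explain
    if rule.test_matches == 0:
        return False  # the rule never fires on test data; validity is undefined
    return rule.test_correct / rule.test_matches >= min_test_validity


# Counts taken from the rule described above; the name is a placeholder.
memorable = RuleStats("memorable_rule", train_counterexamples=8,
                      test_matches=124, test_correct=124)
# Invented rule that fails on held-out data, for contrast.
weak = RuleStats("weak_rule", train_counterexamples=40,
                 test_matches=100, test_correct=60)

for rule in (memorable, weak):
    print(rule.name, approximate_helps(rule))
```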

3. Context matters, especially in AQI

The AQI case study shows that seasonal and regional rules can still generalize across years, even though the set of exactly accepted rules is effectively empty in this domain.

AQI proposal rules

Figure 3. Approximate AQI rules remain useful on future-year test data.

For example, `winter -> PM2.5` remained strong on test data, with validity around 0.910. The rule is not exact, but it is clearly informative.

The AQI study also shows that a rule’s reliability can change sharply by context.

AQI context shifts

Figure 4. The same broad rule can strengthen in one season and weaken sharply in another.

This is important because average validity can hide subgroup behavior. A rule may look strong overall while weakening, disappearing, or reversing in a specific region, season, or subgroup.
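
One way to surface this effect is to compute validity separately for each context value alongside the overall figure. The sketch below uses invented toy records, not the AQI data, and treats a per-context validity below 0.5 as a sign that the rule has weakened or reversed. That cutoff and the function name `validity_by_context` are assumptions made for illustration.

```python
from collections import defaultdict


def validity_by_context(records, condition, conclusion, context_of):
    """Return (overall validity, {context: validity}) for one rule."""
    counts = defaultdict(lambda: [0, 0])  # context -> [matches, correct]
    for record in records:
        if condition(record):
            entry = counts[context_of(record)]
            entry[0] += 1
            entry[1] += int(conclusion(record))
    total_matches = sum(m for m, _ in counts.values())
    total_correct = sum(c for _, c in counts.values())
    overall = total_correct / total_matches if total_matches else None
    per_context = {ctx: c / m for ctx, (m, c) in counts.items()}
    return overall, per_context


# Invented toy records: the rule West -> Ozone, split by season.
records = (
    [{"region": "West", "season": "summer", "pollutant": "Ozone"}] * 9
    + [{"region": "West", "season": "summer", "pollutant": "PM2.5"}] * 1
    + [{"region": "West", "season": "winter", "pollutant": "Ozone"}] * 2
    + [{"region": "West", "season": "winter", "pollutant": "PM2.5"}] * 8
)

overall, per_context = validity_by_context(
    records,
    condition=lambda r: r["region"] == "West",
    conclusion=lambda r: r["pollutant"] == "Ozone",
    context_of=lambda r: r["season"],
)
# Assumed cutoff: below 0.5 the rule is wrong more often than right.
weak_contexts = [ctx for ctx, v in per_context.items() if v < 0.5]

print("overall:", overall)
print("by season:", per_context)
print("weakened or reversed in:", weak_contexts)
```

In this toy sample the overall validity sits between the two seasonal values, so reporting only the average would hide both the strong summer behavior and the winter failure.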

Interpretation

The experiments support three connected conclusions.

Classical deduction is too brittle for noisy empirical data: It treats one counterexample as enough to reject a rule, even when the rule remains highly reliable.
Approximate reasoning better matches the structure of real datasets: It evaluates rules by validity, generalization, and uncertainty rather than by exactness alone.
Context-sensitive analysis is necessary: A rule’s average performance is not enough. We also need to ask where the rule holds, where it weakens, and where it reverses.