Every metric is an estimate; confidence intervals
Every metric is an estimate from a finite sample, and real decisions hinge on whether a difference is signal or noise. The first habit is putting an error bar on a measured rate. For a proportion p from n items, the rough 95% confidence interval is
p +/- 1.96 * sqrt( p * (1 - p) / n )
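The headline to internalise is that the margin of error shrinks only as 1/sqrt(n): in the worst case (p = 0.5) it is about +/-10 percentage points at n = 100, +/-5 at n = 400, and +/-3 at n = 1000. So a metric that "moved from 92% to 88%" on 50 items each is well within noise: each rate alone carries a margin of roughly +/-8 to 9 points, so a 4-point drop is not evidence of a regression.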
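A minimal sketch of that arithmetic, assuming the normal approximation throughout. The helper names `margin_of_error` and `diff_z` are illustrative, not from any library, and the unpooled two-proportion z-score is one simple way to judge the 92% vs 88% comparison, not the only one.

```python
import math

Z95 = 1.96  # two-sided 95% critical value of the standard normal


def margin_of_error(p, n, z=Z95):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def diff_z(p1, n1, p2, n2):
    """z-score of the difference between two independent proportions (unpooled)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Worst case is p = 0.5, where p * (1 - p) is largest.
for n in (100, 400, 1000):
    print(n, round(margin_of_error(0.5, n), 3))

# 92% vs 88% on 50 items each: |z| is far below 1.96.
print(round(diff_z(0.92, 50, 0.88, 50), 2))
```

The z-score for the 4-point drop comes out well under 1.96, so at this sample size the two rates cannot be told apart.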
The normal-approximation (Wald) interval is a direct calculation: estimate the proportion, its standard error, then go z standard errors either side.
z = 1.96  # two-sided 95% critical value
phat = k / n  # k successes out of n items
se = (phat * (1 - phat) / n) ** 0.5  # standard error of phat
low, high = phat - z * se, phat + z * se
This is fine for moderate p and n; for very small samples or proportions near 0 or 1, use a Wilson or Clopper-Pearson interval instead, which behave better at the extremes.
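To illustrate the difference at the extremes, here is a sketch of the Wilson score interval next to the Wald calculation. The function names are hypothetical and the code assumes n > 0 and 0 <= k <= n; it follows the standard Wilson formula with no continuity correction.

```python
import math


def wald_interval(k, n, z=1.96):
    """Normal-approximation (Wald) interval for k successes in n trials."""
    phat = k / n
    se = math.sqrt(phat * (1 - phat) / n)
    return phat - z * se, phat + z * se


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials (assumes n > 0)."""
    phat = k / n
    denom = 1 + z**2 / n
    # The centre is pulled from phat towards 0.5, more strongly for small n.
    centre = (phat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# 50 successes out of 50: the Wald interval collapses to a single point.
print(wald_interval(50, 50))
print(wilson_interval(50, 50))
```

With 50 out of 50, the Wald interval has zero width at 1.0, which overstates the certainty, while the Wilson interval runs from roughly 0.93 to 1.0.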