Rough notes on sample size calculations for a given confidence interval for a Binomial population, i.e. one having a probability p of success and (1 – p) of failure. The first article of relevance is Binomial Confidence Interval, which lists the different approaches to take when dealing with:
- Large n (> 15), large p (> 0.1) => Normal Approximation
- Large n (> 15), small p (< 0.1) => Poisson Approximation
- Small n (< 15), small p (< 0.1) => Binomial Table
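As an illustration of the first rule above, here is a minimal sketch of the normal-approximation (Wald) interval for a binomial proportion; the function name and the 40-out-of-100 example are made up for illustration:

```python
import math

def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a binomial
    proportion; reasonable when n is large and p is not near 0 or 1.
    z = 1.96 corresponds to ~95% confidence."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Example: 40 successes out of 100 trials, ~95% confidence
lo, hi = wald_ci(40, 100)  # roughly (0.304, 0.496)
```

For small n or small p, where this approximation breaks down, the Poisson approximation or an exact Binomial table should be used instead, per the rules above.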
On the other side, there are derivatives of the Bayes Success Run theorem, such as Acceptance Sampling and Zero Defect Sampling, used to work out statistically valid sampling plans. These approaches are based on a successful run of n tests, in which either zero or at most an upper bound of k failures are seen.
These approaches are used in various industries (healthcare, automotive, military, etc.) for performing inspections, checks and certifications of components, parts and devices. The sampling could be single sampling (one sample of size n with acceptance number c), double sampling (a first smaller sample n1 with acceptance number c1, and a second larger sample n2 with acceptance number c2, used if the test on sample n1 shows more than c1 failures), or other sequential sampling versions. A few rule-of-thumb approximations have also emerged in practice based on the success run technique:
- Rule of 3s: provides an upper bound of p = 3/n, with 95% confidence, for a given success run of length n with zero defects.
- Success run sample size (n) using confidence level (C) and reliability (R = 1 – p), when sampling with replacement (each sampled item is replaced and may be selected again), assuming a Binomial distribution:
n = ln(1 – C)/ln(R), where ln is the natural log. This follows from requiring the probability R^n of a zero-defect run of length n to be at most 1 – C. It can be further extended to the case where a maximum of k defects/failures is acceptable.
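The zero-defect formula above can be sketched directly; the function name is made up for illustration, and the n = 29 example can be cross-checked against the Rule of 3s:

```python
import math

def success_run_sample_size(confidence, reliability):
    """Smallest n such that a zero-failure run of length n demonstrates
    the given reliability at the given confidence level:
    R**n <= 1 - C, i.e. n = ln(1 - C) / ln(R), rounded up."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

# Example: demonstrating R = 0.90 at 95% confidence requires n = 29
n = success_run_sample_size(0.95, 0.90)  # n = 29
# Rule of 3s cross-check: p = 3/n = 3/29 ~ 0.103, close to 1 - R = 0.10
```

The Rule of 3s is recovered as the large-n limit of this formula at C = 0.95, since ln(0.05) is approximately -3.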
For a small population, and when sampling without replacement (sampled items are not replaced), the Hypergeometric distribution is used for sampling n items from a population of size N containing D defects. Operating Characteristic (OC) curves are then prepared for the given problem and used to read off values of C and R for a given n.
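One point on such an OC curve is the hypergeometric probability of accepting the lot; a minimal sketch, assuming a made-up function name and an at-most-k-defects acceptance rule:

```python
from math import comb

def accept_probability(N, D, n, k=0):
    """Probability of accepting a lot of size N containing D defects,
    when sampling n items without replacement and allowing at most k
    observed defects (hypergeometric tail; one point on an OC curve)."""
    return sum(
        comb(D, d) * comb(N - D, n - d) / comb(N, n)
        for d in range(0, min(k, D, n) + 1)
    )

# Example: lot of 50 with 5 defects, sample of 10, zero defects allowed
p_accept = accept_probability(50, 5, 10, 0)  # ~0.311
```

Sweeping D (or the defect fraction D/N) while holding n and k fixed traces out the full OC curve for that sampling plan.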
Footnote on Distributions:
- The Poisson distribution is used to model events in time/space that are rare (small p) but have many opportunities to occur (large n), and that occur independently of the time since the last event. Inter-arrival times are iid exponential random variables.
- The Poisson confidence interval is derived from the Gamma distribution, which is defined by two parameters: shape and scale. The Exponential, Erlang and Chi-Squared distributions are all special cases of the Gamma distribution. The Gamma distribution is used in areas such as prediction of wait times, insurance claims, wireless communication signal power fading, age distribution of cancer events, inter-spike intervals, and genomics. In Bayesian statistics, the Gamma distribution is also the conjugate prior for the rate of the Poisson and exponential distributions.
- The Bayes Success Run result can be derived using the Beta distribution, which is the conjugate prior for the Binomial and is defined by two shape parameters. Applications of the Beta distribution are found in order statistics (selection of the k-th smallest from a Uniform distribution), subjective logic, wavelet analysis, project management (PERT), etc.
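The Beta-based derivation has a closed form in the zero-failure case. A minimal sketch, assuming a uniform Beta(1, 1) prior (an assumption not stated in the notes): after n successes and no failures the posterior on reliability is Beta(n + 1, 1), whose CDF at r is r^(n + 1), so the lower reliability bound at confidence C is (1 – C)^(1/(n + 1)).

```python
def bayes_success_run_reliability(n, confidence):
    """Lower reliability bound after n consecutive successes with zero
    failures, assuming a uniform Beta(1, 1) prior. Posterior is
    Beta(n + 1, 1) with CDF r**(n + 1), so solving
    P(R > r) = confidence gives r = (1 - confidence)**(1 / (n + 1))."""
    return (1 - confidence) ** (1 / (n + 1))

# Example: 29 successes, zero failures, 95% confidence
r = bayes_success_run_reliability(29, 0.95)  # ~0.905
```

This is slightly less conservative than the frequentist success run formula (which gives R = 0.90 for n = 29 at 95% confidence), because the uniform prior contributes the equivalent of one extra success.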