There is a story about the famous French mathematician Poincaré, where he kept track of the weight of the loaves of bread that he bought for a year from one single baker. At the end of the year he complained to the police that the baker has been cheating people selling loaves with a mean weight of 950g instead of 1000g. The baker was issued a warning & let off.
Poincaré continued weighing the loaves that he bought from the baker through the next year at the end which he again complained. This time even though the mean weight of the loaves was 1000g, he pointed out that the baker was still continuing to bake loaves with lower weights, but handing out the heavier ones to Poincaré. The story though made up makes for an interesting exercise for a stats intro class.
The crux of the solution, as shown by others, is to model the baked loaves as a Normal distribution.
At the end of year 1
The expected mean m=1000g, since the baker had been cheating the mean shows up as 950g. The standard deviation (sd) is assumed to remain unaffected. To know if the drop in mean from the expected value m1=1000 to the actual m2=950 for identical sd, a hypothesis test can be performed to ascertain if the sampled breads are drawn from the same Normal distribution having mean value m1=1000, or from a separate one having mean m2=950. The null hypothesis is H0(m1 = m2) vs. the alternate hypothesis H1(m1 != m2).
|z| = |(m1-m2)/(sd/sqrt(n))| where n is the no of samples.
With an assumption of Poincaré's frequency of buying at 1 loaf/ day (n=365), & sd=50,
z=19.1, much larger than 1% (2.576) level of significance, so the null hypothesis can be rejected. Poincaré was right in calling out the baker's cheating!
From the original story we are only certain about the values of m1=1000 & m2=950. Playing around with the other values, for instance changing n=52 (1 loaf/ week), or sd to 100g (novice baker, more variance), could yield other z values & different decisions about the null hypothesis.
At the end of year 2
What Poincaré observes is a different sort of curve with a mean=1000g, but that is far narrower than Normal. A few quick checks make him realize that he's looking at an Extreme Value Distribution (EVD) or the Gumbell distribution. Using data from his set of bread samples, he could easily work out the parameters of the EVD (mean, median, shape β, location μ, etc.). Evident from the EVD curve was the baker's modus-operandi of handing out specially earmarked heavier loaves (heaviest ones would yield narrowest curve) to Poincaré.
Poincaré continued weighing the loaves that he bought from the baker through the next year at the end which he again complained. This time even though the mean weight of the loaves was 1000g, he pointed out that the baker was still continuing to bake loaves with lower weights, but handing out the heavier ones to Poincaré. The story though made up makes for an interesting exercise for a stats intro class.
The crux of the solution, as shown by others, is to model the baked loaves as a Normal distribution.
At the end of year 1
The expected mean m=1000g, since the baker had been cheating the mean shows up as 950g. The standard deviation (sd) is assumed to remain unaffected. To know if the drop in mean from the expected value m1=1000 to the actual m2=950 for identical sd, a hypothesis test can be performed to ascertain if the sampled breads are drawn from the same Normal distribution having mean value m1=1000, or from a separate one having mean m2=950. The null hypothesis is H0(m1 = m2) vs. the alternate hypothesis H1(m1 != m2).
|z| = |(m1-m2)/(sd/sqrt(n))| where n is the no of samples.
With an assumption of Poincaré's frequency of buying at 1 loaf/ day (n=365), & sd=50,
z=19.1, much larger than 1% (2.576) level of significance, so the null hypothesis can be rejected. Poincaré was right in calling out the baker's cheating!
From the original story we are only certain about the values of m1=1000 & m2=950. Playing around with the other values, for instance changing n=52 (1 loaf/ week), or sd to 100g (novice baker, more variance), could yield other z values & different decisions about the null hypothesis.
At the end of year 2
What Poincaré observes is a different sort of curve with a mean=1000g, but that is far narrower than Normal. A few quick checks make him realize that he's looking at an Extreme Value Distribution (EVD) or the Gumbell distribution. Using data from his set of bread samples, he could easily work out the parameters of the EVD (mean, median, shape β, location μ, etc.). Evident from the EVD curve was the baker's modus-operandi of handing out specially earmarked heavier loaves (heaviest ones would yield narrowest curve) to Poincaré.
No comments:
Post a Comment