Appendix: A Crash Course in Fundamental Statistical Concepts

(1 minus the level of confidence (1 - 0.95 = 0.05)) and the degrees of freedom (sample size minus 1 for a one-sample t), for which t = 2.14. Therefore, a more accurate confidence interval would be 2.14 standard errors, which generates the slightly wider margin of error of 13.3 (6.2 × 2.14). This would provide us with a 95% confidence interval around the sample mean of 80 ranging from 66.7 to 93.3. Confidence intervals based on t-scores will always be larger than those based on z-scores (reflecting the slightly higher variability associated with small sample estimates), but will be more likely to contain the population mean at the specified level of confidence. Chapter 3 provides more detail on computing confidence intervals for a variety of data.

SIGNIFICANCE TESTING AND p-VALUES

The concept of the number of standard errors that sample means differ from population means applies to both confidence intervals and significance tests. If we want to know if a new design actually improves task-completion times but can't measure everyone, we need to estimate the difference from sample data. Sampling error then plays a role in our decision.

For example, Figure A.12 shows the times from 14 users who attempted to add a contact in a CRM application. The average sample completion time is 33 seconds with a standard deviation of 22 seconds. A new version of the data entry screen was developed and a different set of 13 users attempted the same task (see Figure A.13). This time the mean completion time was 18 seconds with a standard deviation of 10 seconds.

So, our best estimate is that the new version is 15 seconds faster than the older version. A natural question to ask is whether the difference is statistically significant. That is, it could be that there is really no difference in task-completion times between versions.
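
The "number of standard errors" idea can be sketched with the figures quoted on this page. The first half reproduces the confidence-interval arithmetic (mean 80, standard error 6.2, t = 2.14). The second half expresses the 15-second difference between the two versions in standard errors. The formula used for the standard error of the difference (unpooled, from each sample's standard deviation and size) is an assumption made for illustration; the appendix may use a different formula, and the variable names are hypothetical.

```python
import math

# Confidence interval from the text: sample mean 80, standard error 6.2,
# and the t critical value 2.14 quoted for a 95% level of confidence.
mean, se, t_crit = 80.0, 6.2, 2.14
margin = t_crit * se                      # margin of error = t x standard error
ci = (mean - margin, mean + margin)       # interval around the sample mean

# Summary statistics for the two designs, as reported in the text.
n_old, mean_old, sd_old = 14, 33.0, 22.0  # original screen
n_new, mean_new, sd_new = 13, 18.0, 10.0  # new screen

diff = mean_old - mean_new                # best estimate of the improvement

# ASSUMPTION: unpooled standard error of the difference between two
# independent sample means. This is one common choice, not necessarily
# the one the appendix goes on to use.
se_diff = math.sqrt(sd_old**2 / n_old + sd_new**2 / n_new)

# How many standard errors the observed difference lies from zero.
n_standard_errors = diff / se_diff

print(f"margin of error: {margin:.1f}")
print(f"95% CI: {ci[0]:.1f} to {ci[1]:.1f}")
print(f"difference: {diff:.0f} s, standard error: {se_diff:.2f} s")
print(f"difference in standard errors: {n_standard_errors:.2f}")
```

The larger the difference is relative to its standard error, the harder it is to attribute to sampling error alone.
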
It could be that our sampling error from our relatively modest sample sizes is just leading us to believe there is a difference. We could just be taking two random samples from the same population with a mean of 26 seconds. How can we be sure and convince others that at this sample size we can be confident the difference isn't due to chance alone?

FIGURE A.12 Task-completion times from 14 users. (Axis marked from 10 to 80.)

FIGURE A.13 Task-completion times from 13 other users. (Axis marked from 10 to 80.)
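
The "two random samples from the same population" scenario can be explored with a small simulation. The sketch below repeatedly draws one sample of 14 and one of 13 from a single population with a mean of 26 seconds and counts how often the two sample means differ by 15 seconds or more. Two things here are assumptions and not part of the text: the population is treated as normally distributed (real task times are usually skewed), and its standard deviation is taken to be the pooled value of the two reported standard deviations. The result illustrates the reasoning only; it is not the p-value the appendix reports.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same result

N_OLD, N_NEW = 14, 13          # sample sizes from the text
SD_OLD, SD_NEW = 22.0, 10.0    # sample standard deviations from the text
POP_MEAN = 26.0                # common population mean suggested in the text
OBSERVED_DIFF = 15.0           # observed difference between sample means

# ASSUMPTION: use the pooled standard deviation of the two samples as the
# spread of the single hypothetical population.
pop_sd = math.sqrt(
    ((N_OLD - 1) * SD_OLD**2 + (N_NEW - 1) * SD_NEW**2) / (N_OLD + N_NEW - 2)
)

TRIALS = 20_000
extreme = 0
for _ in range(TRIALS):
    # ASSUMPTION: normally distributed times; both samples come from the
    # same population, so any difference in means is sampling error.
    sample_a = [random.gauss(POP_MEAN, pop_sd) for _ in range(N_OLD)]
    sample_b = [random.gauss(POP_MEAN, pop_sd) for _ in range(N_NEW)]
    gap = abs(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    if gap >= OBSERVED_DIFF:
        extreme += 1

# Share of trials in which chance alone produced a gap of 15 s or more.
chance_fraction = extreme / TRIALS
print(f"assumed population SD: {pop_sd:.1f} s")
print(f"fraction of trials with a gap >= {OBSERVED_DIFF:.0f} s: {chance_fraction:.3f}")
```

A small fraction means a 15-second gap would rarely arise if the two versions truly performed the same, which is the intuition a significance test formalizes.
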