CHAPTER 7 What Sample Sizes Do We Need? Part 2

iteration (n = 7), the total sample size is 14, and the expectation is the target discovery of 90% of problems with p = 0.15 (and even discovery of 77% of problems with p = 0.1).

RECONCILING THE "MAGIC NUMBER 5" WITH "EIGHT IS NOT ENOUGH"

Some usability practitioners use the "Magic Number 5" as a rule of thumb for sample sizes for formative usability tests (Barnum et al., 2003; Nielsen, 2000), believing that this sample size will usually reveal about 85% of the problems available for discovery. Others (Perfetti and Landesman, 2001; Spool and Schroeder, 2001) have argued that "Eight Is Not Enough"; in fact, their experience showed that it could take over 50 participants to achieve this goal. Is there any way to reconcile these apparently opposing points of view?

Some History: The 1980s

Although strongly associated with Jakob Nielsen (see, for example, Nielsen, 2000), the idea of running formative user studies with small sample iterations goes back much further--to one of the fathers of modern human factors engineering, Alphonse Chapanis. In an award-winning paper for the IEEE Transactions on Professional Communication about developing tutorials for first-time computer users, Al-Awar et al. (1981, p. 34) wrote:

    Having collected data from a few test subjects--and initially a few are all you need--you are ready for a revision of the text. Revisions may involve nothing more than changing a word or a punctuation mark. On the other hand, they may require the insertion of new examples and the rewriting, or reformatting, of an entire frame. This cycle of test, evaluate, rewrite is repeated as often as is necessary.

Any iterative method must include a stopping rule to prevent infinite iterations. In the real world, resource constraints and deadlines often dictate the stopping rule. In the study by Al-Awar et al. (1981), their stopping rule was an iteration in which 95% of participants completed the tutorial without any serious problems.

Al-Awar et al. (1981) did not specify their sample sizes, but did refer to collecting data from "a few test subjects." The usual definition of "few" is a number that is greater than one, but indefinitely small. When there are two objects of interest, the typical expression is "a couple." When there are six, it's common to refer to "a half dozen." From this, it's reasonable to infer that the per-iteration sample sizes of Al-Awar et al. (1981) were in the range of three to five--at least, not dramatically larger than that.

The publication and promotion of this method by Chapanis and his students had an almost immediate influence on product development practices at IBM (Kennedy, 1982; Lewis, 1982) and other companies, notably Xerox (Smith et al., 1982) and Apple (Williams, 1983). Shortly thereafter, John Gould and his associates at the IBM T. J. Watson Research Center began publishing influential papers on usability testing and iterative design (Gould, 1988; Gould and Boies, 1983; Gould et al., 1987; Gould and Lewis, 1984), as did Whiteside et al. (1988) at DEC (Baecker, 2008; Dumas, 2007; Lewis, 2012).
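
As an aside, the discovery percentages quoted earlier (90% of problems with p = 0.15 and 77% with p = 0.1, for a total sample size of 14) can be checked with a short calculation. The sketch below assumes the cumulative binomial discovery model, in which the expected proportion of problems found at least once is 1 - (1 - p)^n; that formula does not appear in this excerpt. The function name discovery_rate is illustrative and not taken from the text.

```python
def discovery_rate(p: float, n: int) -> float:
    """Expected proportion of problems seen at least once.

    Assumes each of n participants independently encounters a given
    problem with probability p (cumulative binomial model; an assumption
    of this sketch, not something stated in the excerpt).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Two iterations of n = 7 give a total sample size of 14.
    for p in (0.15, 0.10):
        print(f"p = {p:.2f}, n = 14: {discovery_rate(p, 14):.0%} expected discovery")
```

Under this assumption the two cases work out to roughly 90% and 77%, which matches the figures in the text.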