CHAPTER 9 Six Enduring Controversies in Measurement and Statistics

As true as this might be for psychological research, it is even truer for usability research intended to affect the design of products or systems. If you run a test with a composite measure and find a significant difference between products, then what do you really know? You will have to follow up that test with separate tests of the component metrics, so one could reasonably argue against running the test with the composite metric and instead starting with the tests of the component metrics.

Our Recommendation

Both of us, at various times in our careers, have worked on methods for combining different usability metrics into single scores (Lewis, 1991; Sauro and Kindlund, 2005). Clearly, we are on the side of combining usability metrics when it is appropriate, but using a method that produces an interpretable composite such as SUM rather than MANOVA. There are situations in the real world in which practitioners must choose only one product from a summative competitive usability test of multiple products and, in so doing, must either rely on a single measurement (a very limiting approach), try to rationally justify some priority of the dependent measures, or use a composite score. Composite usability scores can also be useful on executive management dashboards. Even without an increase in reliability it can still be advantageous to combine scores in these situations, and the factor analysis of Sauro and Lewis (2009) lends statistical support to the practice of combining component usability metrics into a single score. Any summary score (median, mean, index, or other composite) must lose important information (just as an abstract does not contain all of the information in a full paper); that is the price paid for summarizing data.
It is certainly not appropriate to rely exclusively on summary data, but it is important to keep in mind that the data that contribute to a summary score remain available as component scores for any analyses and decisions that require more detailed information (such as providing guidance about how a product or system should change in a subsequent design iteration). You don't lose anything permanently when you combine scores; you just gain an additional view.

WHAT IF YOU NEED TO RUN MORE THAN ONE TEST?

"In 1972 Maurice Kendall commented on how regrettable it was that during the 1940s mathematics had begun to 'spoil' statistics. Nowhere is this shift in emphasis from practice, with its room for intuition and pragmatism, to theory and abstraction, more evident than in the area of multiple comparison procedures. The rules for making such comparisons have been discussed ad nauseam and they continue to be discussed" (Cowles, 1989, p. 171).

On One Hand

When the null hypothesis of no difference is true, you can think of a single test with α = 0.05 as the flip of a single coin that has a 95% chance of heads (correctly failing to reject the null hypothesis) and a 5% chance of tails (falsely concluding there is a difference when there really isn't one: a false alarm, a Type I error). These are the probabilities for a single toss of the coin (a single test), but what if you run more than one test? Statisticians sometimes make a distinction between the error rate per comparison (EC) and the error rate per family (EF, or family-wise error rate) (Myers, 1979).
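To extend the coin-flip analogy: if each of k independent tests is run at significance level α, the probability of at least one false alarm across the family is 1 − (1 − α)^k, which grows quickly with k. A minimal sketch (the function name is ours, not the book's notation):

```python
# Family-wise error rate (EF) for k independent tests, each run at
# per-comparison significance level alpha (EC).

def familywise_error_rate(alpha, k):
    """Probability of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha) ** k

# With alpha = 0.05, the family-wise rate climbs as tests accumulate.
for k in (1, 5, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

For example, running ten independent tests at α = 0.05 yields a family-wise error rate of about 0.40, which is why multiple-comparison procedures adjust the per-comparison criterion.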