
CHAPTER 9 Six Enduring Controversies in Measurement and Statistics

MEANS WORK BETTER THAN MEDIANS WHEN ANALYZING ORDINAL MULTIPOINT DATA

How Acting in Accordance with Stevens' Levels of Measurement Nearly Tripped Me Up

From the files of Jim Lewis

In the late 1980s I was involved in a high-profile project at IBM in which we were comparing performance and satisfaction across a set of common tasks for three competitive office application suites (Lewis et al., 1990). Based on what I had learned in my college statistics classes about Stevens' levels of measurement, I pronounced that the multipoint rating-scale data we were dealing with did not meet the assumptions required to take the mean of the data for the rating scales because they were ordinal rather than interval or ratio, so we should present their central tendencies using medians rather than means. I also advised against the use of t-tests for individual comparisons of the rating-scale results, promoting instead its nonparametric analog, the Mann-Whitney U-test.

The folks who started running the statistics and putting the presentation together (which would have been given to a group that included high-level IBM executives) called me in a panic after they started following my advice. In the analyses, there were cases where the medians were identical, but the U-test detected a statistically significant difference. It turns out that the U-test is sensitive not only to central tendency, but also to the shape of the distribution, and in these cases the distributions had opposite skew but overlapping medians.

As a follow-up, I systematically investigated the relationship among mean and median differences for multipoint scales and the observed significance levels of t- and U-tests conducted on the same data, all taken from our fairly large-scale usability test.
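The situation described above, identical medians with a significant U-test, can be reproduced with a small sketch. The ratings below are invented for illustration and are not the data from Lewis et al. (1990). The names `suite_a`, `suite_b`, and `mann_whitney_u` are hypothetical, and the test uses the large-sample normal approximation with a tie correction and no continuity correction, which is only one of several ways to compute the p-value.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist, mean, median


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    # U for x: each (x, y) pair scores 1 if x is larger, 0.5 if tied.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n = n1 + n2
    # Tie correction: t is the number of observations sharing a value.
    tie_counts = Counter(list(x) + list(y)).values()
    tie_term = sum(t**3 - t for t in tie_counts) / (n * (n - 1))
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    z = (u - n1 * n2 / 2) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical 7-point ratings: same median, opposite skew.
suite_a = [1] * 8 + [2] * 6 + [4] * 16   # tail toward the low end
suite_b = [4] * 16 + [6] * 6 + [7] * 8   # tail toward the high end

u_stat, p_value = mann_whitney_u(suite_a, suite_b)
print("medians:", median(suite_a), median(suite_b))
print("means:  ", mean(suite_a), mean(suite_b))
print("U =", u_stat, " p =", p_value)
```

Both groups have a median of 4, so a report based only on medians would show no difference, yet the means differ and the U-test rejects the null hypothesis at the usual 0.05 level.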
It turned out that, for discrete multipoint scale data, the mean difference correlated more strongly than the median difference with the observed significance levels (both parametric and nonparametric). Consequently, I no longer promote the concepts of Stevens' levels of measurement with regard to permissible statistical analysis, although I believe this distinction is critical when interpreting and applying results. It appears