Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.

Share this Page URL

Chapter 31. Descriptive Statistics in SQ... > 31.4 Variance and Standard Deviation - Pg. 572

572 Chapter 31 dESCrIPTIvE STATISTICS IN SQl are separate function calls. The duplicate rows can be in any order. There is an assumption that we would get: X 1 2 2 3 3 3 a 6 5 4 3 2 1 b 1 2 3 4 5 6 (a ­ b) 5 3 1 > candidate row x=2 ­1 > candidate row x=3 ­3 ­3 The assumption comes from the idea that the ascending and descending ROW_NUMBER() values could be computed at the same time using the formula: ROW_NUMBER() OVER(ORDER BY x ASC) = (COUNT(*) - ROW_NUMBER() OVER(ORDER BY x DESC) + 1) This is a nice optimizer trick, but it is not required. This is why we need to add a key to the ORDER BY list to assure that the ascending list is the physical reverse order of the descending list. 31 .4 Variance and Standard Deviation The standard deviation is a measure of how far away from the average the values in a normally distributed population are. It is hard to calculate in SQL, because it involves a square root and standard SQL has only the basic four arithmetic operators. Many vendors will allow you to use other math functions, but in all fairness, most SQL databases are in commercial appli- cations and have little or no need for engineering or statistical calculations. The usual trick is to load the raw data into an appro- priate host language, such as FORTRAN, and do the work there. The variance is defined as the standard deviation squared, so we can avoid taking a square root and keep the calculations in pure SQL. The queries look like this: CREATE TABLE Samples (x REAL NOT NULL); INSERT INTO Samples (x) VALUES (64.0), (48.0), (55.0), (68.0), (72.0), (59.0), (57.0), (61.0), (63.0), (60.0), (60.0), (43.0), (67.0), (70.0), (65.0), (55.0), (56.0), (64.0), (61.0), (60.0);