Application

Now that you have completed all of the activities in this chapter, use the techniques you've learned to respond to these questions.

  1. Scenario: We'll continue our analysis of the variation in life expectancy at birth. We'll begin with the subset of the Life Expectancy file, focusing initially on the 2010 data.

    1. When we first constructed the Life Exp histogram, we described it as single-peaked and left-skewed. Use the hand tool to increase and reduce the number of bars. Adjust the number of bars so that a second peak appears. Describe what you did, and where the peaks are located.

    2. Rescale the axes of the same histogram and see whether you can emphasize the two peaks even more (in other words, make them more distinctly separated). Describe what you did to make these peaks more distinct and noticeable.

    3. Based on what you've seen in these exercises, why is it a good idea to think critically about an analyst's choice of scale in a reported graph?

    4. Using the lasso tool, highlight the outliers in the box plot for LifeExp. Which continent or continents are home to the seven countries with the shortest life expectancies in the world? What might account for this?

  2. Scenario: Now let's look at the distribution of life expectancy 25 years before 2010. Return to the original Life Expectancy data table, and choose the subset of observations from 1985.

    1. Use the Distribution platform to summarize Region and LifeExp for this subset. In a few sentences, describe the shape, center, and spread of LifeExp in 1985.

    2. Compare the five-number summaries for life expectancy in 1985 and in 2010. Comment on what you find.

    3. Compare the standard deviations for life expectancy in 1985 and 2010. Comment on what you find.

    4. You'll recall that in 2010, the mean life expectancy was shorter than the median, consistent with the left-skewed shape. How do the mean and median compare in the 1985 data?

  3. Scenario: The data file called Sleeping Animals contains data about the size, sleep habits, lifespan, and other attributes of different mammalian species.

    1. Construct box plots for Lifespan and Sleep. For each plot, explain what its landmarks tell you about the distribution of the variable. Comment on any noteworthy features of each plot.

    2. Which distribution is more symmetric? Explain specifically how the graphs and descriptive statistics helped you come to a conclusion.

    3. According to the data table, "Man" has a maximum lifespan of 100 years. Approximately what percent of mammals in the data set live less than 100 years?

    4. Sleep hours are divided into "dreaming" and "non-dreaming" sleep. How do the distributions of these types of sleep compare?

    5. Select the species that tend to get the most total sleep. Comment on how those species compare to the other species in terms of their predation, exposure, and overall danger indexes.

    6. Now use the Distribution platform to analyze the body weights of these mammals. What's different about this distribution in comparison to the other continuous variables that you've analyzed thus far?

    7. Select those mammals that sleep in the most exposed locations. How do their body weights tend to compare to the other mammals? What might explain this comparison?

  4. Scenario: When financial analysts want a benchmark for the performance of individual equities (stocks), they often rely on a "broad market index" such as the S&P 500 in the U.S. There are many such indexes in stock markets around the world. One major index on the London Stock Exchange is the FTSE 100, and this set of questions refers to data about the monthly values of the FTSE 100 from January 1, 2003 through December 1, 2007. In other words, our data table called FTSE100 reflects monthly market activity for a five-year period.

    1. The variable called Volume is the total number of shares traded per month (in millions of shares). Describe the distribution of this variable.

    2. The variable called Change% is the monthly change, expressed as a percentage, in the closing value of the index. When Change% is positive, the index increased that month; when the variable is negative, the index decreased that month. Describe the distribution of this variable.

    3. Use the Quantiles report to determine approximately how often the FTSE declines. (Hint: What percentile is 0?)

    4. Use the Chart command to make a Line Graph (you'll need to find your own way to make a line graph rather than a bar chart) that shows closing prices over time. Then use the Distribution platform to create a histogram of closing prices. Each graph summarizes the Close variable, but each graph presents a different view of the data. Comment on the comparison of the two graphs.

    5. Now make a line graph of the monthly percentage changes over time. How would you describe the pattern in this graph?

  5. Scenario: Anyone traveling by air understands that there is always some chance of a flight delay. In the United States, the Department of Transportation monitors the arrival and departure times of every flight. The data table Airline Delays contains a sample of nearly 15,000 flights for two airlines destined for four busy airports.

    1. The variable called Dest is the airport code for the flight destination. Describe the distribution of this variable.

    2. The variable called Delay is the actual arrival delay, measured in minutes. A positive value indicates that the flight was late and a negative value indicates that the flight arrived early. Describe the distribution of this variable.

    3. Notice that the distribution of Delay is skewed. Based on your experience as a traveler, why should we have anticipated that this variable would have a skewed distribution?

    4. Use the Quantiles report to determine approximately how often flights in this sample were delayed. (Hint: What percentile is 0?)

  6. Scenario: For many years, it has been understood that tobacco use leads to health problems related to the heart and lungs. The Tobacco Use data table contains recent data about the prevalence of tobacco use and of certain diseases around the world.

    1. Use an appropriate technique from this chapter to summarize and describe the variation in tobacco usage (TobaccoUse) around the world.

    2. Use an appropriate technique from this chapter to summarize and describe the variation in cancer mortality (CancerMort) around the world.

    3. Use an appropriate technique from this chapter to summarize and describe the variation in cardiovascular mortality (CVMort) around the world.

    4. You've now examined three distributions. Comment on the similarities and differences in the shapes of these three distributions.

    5. Summarize the distribution of the region variable and comment on what you find.

    6. We have two columns containing the percentage of males and females around the world who use tobacco. Create a summary for each of these variables and explain how tobacco use among men compares to that among women.
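Several of these questions (the five-number summaries, the mean-versus-median comparison, the standard deviations, the effect of the number of bars, and the "What percentile is 0?" hints) rest on a handful of simple calculations that the Distribution platform performs for you. The following Python sketch, which uses only the standard library, shows those calculations on a small set of hypothetical values. The function names and the data are illustrative assumptions, not part of JMP or of this book's data tables, and the quartile method used here may not match JMP's exactly.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers.

    Quartiles use statistics.quantiles(method="inclusive"). Software packages
    (JMP included) may define quantiles slightly differently, so Q1 and Q3
    can differ a little from a package's Quantiles report.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def describe(values):
    """Center and spread summaries for a continuous variable."""
    lo, q1, median, q3, hi = five_number_summary(values)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "min": lo, "q1": q1, "median": median, "q3": q3, "max": hi,
    }


def share_below(values, threshold):
    """Fraction of observations strictly below threshold.

    This is the empirical percentile rank of threshold; with threshold=0 it
    estimates how often a change or delay is negative.
    """
    return sum(1 for v in values if v < threshold) / len(values)


def bin_counts(values, n_bins):
    """Counts per bar for a histogram with n_bins equal-width bars."""
    lo, hi = min(values), max(values)
    if hi == lo:  # every value identical: everything lands in one bar
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, n_bins - 1)] += 1  # the maximum goes in the last bar
    return counts


# Hypothetical monthly percent changes, made up for illustration only.
changes = [2.1, -0.8, 1.5, 3.0, -2.4, 0.6, 1.1, -0.3, 2.7, 0.9]

summary = describe(changes)
print(summary)
print(summary["mean"] < summary["median"])  # True: mean below median, a hint of left skew
print(share_below(changes, 0))  # 0.3: three of the ten values are negative
for bars in (3, 5, 8):
    print(bars, bin_counts(changes, bars))  # same data, different bar counts
```

Varying `bars` plays the same role as dragging the hand tool in the first scenario: the data never change, but the apparent number and location of peaks can. Counting the values below 0 directly gives the exact share that a Quantiles report lets you read only approximately.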