Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.
Scatter plots are used to display the relationship between two continuous variables. In a scatter plot, each observation in a data set is represented by a point. Often, a scatter plot will also have a line showing the predicted values based on some statistical model. This is easy to do with R and ggplot2, and can help to make sense of data when the trends aren’t immediately obvious just by looking at it.
With large data sets, it can be problematic to plot every single observation because the points will be overplotted, obscuring one another. When this happens, you’ll probably want to summarize the data before displaying it. We’ll also see how to do that in this chapter.