The user behavior example and the fruit tree were both classification problems, since the outcomes were categories rather than numbers. The remaining examples in this chapter, home prices and hotness, are both problems with numerical outcomes.
While it’s possible to run buildtree on a dataset with numerical
outcomes, the result probably won’t be very good. If every number is
treated as a separate category, the algorithm ignores the fact that
some numbers are close together and others are far apart. To deal with
this, when a tree has numerical outcomes, you can use variance as the
scoring function instead of entropy or Gini impurity. Add the variance
function to treepredict.py:
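
A minimal sketch of such a function, assuming each row is a list whose last item is the numerical outcome. The entropy helper and the two sample datasets are hypothetical and included only to show the contrast: entropy cannot tell the two sets apart, while variance can.

```python
from collections import Counter
from math import log2


def variance(rows):
    """Variance of the outcome column (assumed to be the last item of each row)."""
    if len(rows) == 0:
        return 0.0
    data = [float(row[-1]) for row in rows]
    mean = sum(data) / len(data)
    return sum((d - mean) ** 2 for d in data) / len(data)


def entropy(rows):
    """Entropy of the outcome column, treating every distinct value as a category."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical data: the last column is the numerical outcome.
close_together = [["a", 100], ["b", 101], ["c", 102], ["d", 103]]
far_apart = [["a", 100], ["b", 500], ["c", 900], ["d", 1300]]

# Entropy sees four distinct categories in each set, so both score the same.
print(entropy(close_together), entropy(far_apart))

# Variance is low when the outcomes are close together and high when they are spread out.
print(variance(close_together), variance(far_apart))
```

A low variance means the outcomes in a set are close to one another, so a split that produces low-variance subsets groups similar numbers together.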