Overview
Want to tap the power behind search rankings, product
recommendations, social bookmarking, and online matchmaking? This
fascinating book demonstrates how you can build Web 2.0
applications to mine the enormous amount of data created by people
on the Internet. With the sophisticated algorithms in this book,
you can write smart programs to access interesting datasets from
other web sites, collect data from users of your own applications,
and analyze and understand the data once you've found it.
Programming Collective Intelligence takes you into the
world of machine learning and statistics, and explains how to draw
conclusions about user experience, marketing, personal tastes, and
human behavior in general -- all from information that you and
others collect every day. Each algorithm is described clearly and
concisely with code that can immediately be used on your web site,
blog, Wiki, or specialized application. This book explains:
- Collaborative filtering techniques that enable online retailers to recommend products or media
- Methods of clustering to detect groups of similar items in a large dataset
- Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
- Optimization algorithms that search millions of possible solutions to a problem and choose the best one
- Bayesian filtering, used in spam filters for classifying documents based on word types and other features
- Using decision trees not only to make predictions, but to model the way decisions are made
- Predicting numerical values rather than classifications to build price models
- Support vector machines to match people in online dating sites
- Non-negative matrix factorization to find the independent features in a dataset
- Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
Each chapter includes exercises for extending the algorithms to
make them more powerful. Go beyond simple database-backed
applications and put the wealth of Internet data to work for
you.
"Bravo! I cannot think of a better way for a developer to first
learn these algorithms and methods, nor can I think of a better way
for me (an old AI dog) to reinvigorate my knowledge of the
details."
-- Dan Russell, Google
"Toby's book does a great job of breaking down the complex subject
matter of machine-learning algorithms into practical,
easy-to-understand examples that can be directly applied to
analysis of social interaction across the Web today. If I had this
book two years ago, it would have saved precious time going down
some fruitless paths."
-- Tim Wolters, CTO, Collective Intellect
Reader Reviews From Amazon (Ranked by 'Helpfulness')
Average Customer Rating: based on 47 reviews.
A very interesting book, 2009-05-29
I picked this book up at a local Barnes and Noble. While I am certainly not trained in some of the areas this book covered, I found them completely accessible. While it should be obvious from the title that someone new to programming would find this book an incredibly tough read, I'll state it for the record. If you are learning how to program, this book is worth purchasing and holding on to until you're ready.
The whole idea of "Collective Intelligence" is an interesting one. Given the way things are changing every day, technology is growing, and the web is expanding, it only makes sense that the ideas in this book, and elsewhere, should be explored.
The author chose to use Python as the language to realize code for the different topics of the book. This certainly is not to say that they can only be coded in Python, but I would tend to agree with his choice. Python is a clear language that can be written procedurally or in an object-oriented style. Even if you don't "speak" Python, in many cases you can understand what is going on in the code.
For me though, this book wasn't about the code so much as the ideas. Data, data, everywhere... now, how can we explore, extrapolate, quantify, and qualify that data? That is what I took as the essence of the book. It covers many different techniques to do this, and I found it all fascinating.
In my opinion, if you are into this kind of thing, this book is well worth it.
Excellent to refresh my knowledge, 2009-05-15
Back in school, a few years ago (too many to remember), I had to study most of these concepts. At the time they were too abstract for me, and the examples and exercises were so simple that they didn't make sense in real life. After that I went on to work on other kinds of systems and projects, and never had the chance to play around with these concepts and see how to apply them in real life. Now that I have had the chance to read this book, I can see how to apply these ideas and concepts in real life and take advantage of this knowledge.
Great breadth; poor references; crippled by terse Python, 2009-04-27
This book provides very good breadth on a number of subjects related to machine learning. The author covers supervised classification and prediction systems (e.g. Bayesian classification, neural networks, and support vector machines), unsupervised clustering (e.g. K-Means), and stochastic optimisation (e.g. simulated annealing, genetic algorithms, and genetic programming).
Although I already had some knowledge of genetic algorithms, I knew next to nothing about machine learning in general (my dissertation topic wasn't anywhere close to it), and my previous attempts at reading the machine learning tomes by Bishop and Alpaydin were futile. This book was nearly perfect for me.
The book is well written and well organised. A typical chapter comprises a high-level description of the topic, some discussion of a Python implementation with some small examples and data sets, and finally a 3-5 page hands-on example at the end where the implementation is run against data accessed from a commercial website. I personally found the introductory matter in each chapter to be the most interesting, and thankfully the author provides nice illustrations for all the topics.
The author saves the best for last: Chapter 12 provides a summary of all the topics he covered with relative strengths and weaknesses of each algorithm. The author gives an excellent recurring example of email spam filtering that he carries through this chapter's discussion on Bayesian classification, decision tree classification, and neural networks, thereby allowing the reader to see how each of these techniques handles the problem differently. The illustrated example of the neural network in itself was worth its weight in gold. When I finished the book, I realised this high-level overview in Chapter 12 was invaluable and well-positioned at the end, as it neatly covers all the topics and places them in context (assuming the reader had indeed read through the previous chapters). If the author is bored enough to read this review, I would recommend that he place a similar high-level overview in Chapter 1 to guide the reader.
Now, here are things I did not like about this book:
1. The author does not provide many references to related work, particularly with how the problems presented in the book could be solved in the absence of the techniques he presents. For example, in Chapter 8 on building price models, he states that small problems "can more easily be solved with traditional statistical techniques," but he does not say what those are or give any references. Furthermore, from my own work in genetic algorithms, I know that stochastic optimisation is not the end-all to optimisation problems; sometimes the same problem can be solved quickly and efficiently with, say, linear programming, dynamic programming, or an approximation algorithm. The author does not discuss such alternatives in his chapter on this topic.
2. There is little depth in this book, but I will not hold that against him since this book was intended for a general audience. I wanted to know HOW a neural network does what it does, and WHY a support vector machine produces its result. Again, the author provides no references. I guess Wikipedia will have to be my next step, but assuming the author himself has read through relevant material, it would have been nice if he could let us know which papers or books are important to read.
3. The terse Python examples were confusing, which was due to a combination of that language's horrid type system and the author's lack of comments. Each example is difficult to follow. What do I need to pass into the function, and what do I get back? Is it a scalar, an array, a map/multimap, a set/multiset, or what? The author should have provided better comments at the very least. Occasionally, I wondered if the reader would have been better served with examples presented in pseudocode (a la the algorithms shown in CLRS), but in the end I decided that having working code in Python outweighed issues in clarity. My recommendation for advanced readers: Read the Python as simply pseudocode and implement each algorithm in your favourite respectable programming language (which better be C++, Java, or C#). I learned much more that way since I had to carefully understand what each line was doing.
In summary: this is an excellent book on machine learning with much more interesting and advanced topics than other O'Reilly works. I hope O'Reilly will continue to produce similar books.
Practical and accessible, 2009-03-22
The book is interesting and easy to read. It shows how to apply AI concepts to the kind of applications that the majority of programmers produce, and for those who, like me, studied AI years ago but haven't used it much since then, it's a good reminder.
But the quality of the Python code leaves a lot to be desired. I'm sure it works, and for strictly personal use it could be OK, but it lacks the elegance expected of a textbook; it overuses list comprehensions and long expressions (to make the code compact, I guess), which makes it hard to follow the examples in detail.
I don't regret having bought it, though.
Great Introduction, 2009-03-19
One of the best books I have bought in a while. It strikes a perfect balance between introducing the algorithms and applying them in practice. The book is organized around different problem areas such as "search", "optimization", "categorizing", etc., and the algorithms that address them. It starts each section with a naive solution to a problem, and gradually works through to more intelligent solutions. I really enjoyed the evolution of the search implementation. It starts with a trivial implementation and keeps augmenting it, adding features such as a simplified PageRank and other optimizations.
Some information above was provided using data from Amazon.com.
© 2009 Safari Books Online. All rights reserved.