Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.
Preparing your data (also known as preprocessing or data cleansing ) is the most important part of data mining. It's also the least glamorous and one of the least-discussed parts. Preprocessing can be as simple as making sure your data is in the right format for the program that reads it, or it can involve extensive calculations.
As a bioinformatics researcher, you must be especially careful of your data. Your results and reputation are based on data that have been provided by other researchers. Consequently, you must be scrupulous in collecting and using that data. The following is a list of some general questions about data integrity to answer when you work on a project (this list isn't exhaustive; you will probably come up with other things to check that are specific to the project at hand):