KEY TERMS AND DEFINITIONS


Hierarchical Clustering:

An unsupervised data mining algorithm. It groups the data into a tree diagram, or dendrogram, which shows the relationships between samples according to a proximity matrix. The root node of the dendrogram represents the whole data set, and each leaf node represents a single data point. Clusters are obtained by cutting the dendrogram at different levels.
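As an illustration of this definition, the following Python sketch builds a dendrogram bottom-up and then cuts it at a chosen level. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); neither choice comes from the definition itself, and the function names `agglomerate` and `cut` are hypothetical.

```python
import math
from itertools import combinations


def agglomerate(points):
    """Single-linkage agglomerative clustering (an assumed linkage choice).

    Returns the merges in the order they happen, as tuples
    (cluster_a, cluster_b, distance), where clusters are frozensets
    of point indices. The last merge corresponds to the dendrogram root.
    """
    clusters = [frozenset([i]) for i in range(len(points))]  # one leaf per point
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: distance between the two closest members.
            d = min(math.dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, d))
    return merges


def cut(merges, n_points, height):
    """Cut the dendrogram: keep only merges made at distance <= height."""
    clusters = {frozenset([i]) for i in range(n_points)}
    for a, b, d in merges:
        if d <= height:
            clusters -= {a, b}
            clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = agglomerate(points)
print(cut(merges, len(points), 2))   # [[0, 1], [2, 3], [4]]
print(cut(merges, len(points), 10))  # [[0, 1, 2, 3], [4]]
```

Cutting at a low level leaves many small clusters, while cutting closer to the root merges them into fewer, larger ones. Other linkage criteria (complete, average) would change only the line that computes `d`.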


IL Mining:

Data mining algorithms applied to a biological database containing different types of data (for example, metabolic profiles and transcriptional data from microarrays) from an original genome that has been modified by introgression of wild species alleles (cisgenic plants), or from transgenic plants overexpressing a gene of interest. An introgression line (IL) is defined as a genotype that carries genetic material derived from a similar species, for example a "wild" relative. The objective of IL mining is to find hidden relations among the IL data in order to infer new knowledge about the biological processes that involve them.


K-means:

One of the best-known and most popular clustering algorithms. It begins by selecting a desired number of clusters, k, and initializing their centroids to data points chosen at random from the data set. At each iteration, every data point is assigned to the cluster whose centroid is closest, and new cluster centroids are then computed as the average of all the points belonging to each cluster. Iteration continues until the assignments no longer change or a maximum number of iterations is reached.
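The two alternating steps described here (assign each point to its nearest centroid, then recompute the centroids) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the squared Euclidean distance, the fixed random seed, the `max_iter` limit, and the function name `kmeans` are all assumptions made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (centroids, assignment) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Initial centroids: k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the centroids would not change either
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, assignment = kmeans(points, k=2)
print(centroids)
print(assignment)
```

Because the starting centroids are random, different seeds can lead to different final partitions, including poor local optima. A common remedy is to run the algorithm several times and keep the best result.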


Metabolite:

The metabolome forms a large network of metabolic reactions, where outputs from one enzymatic chemical reaction are inputs to other chemical reactions. Metabolites are the intermediates and products of metabolism. A primary metabolite is directly involved in normal growth, development, and reproduction. A secondary metabolite is not directly involved in those processes, but usually has an important ecological function.

Objective Clustering Measurement:

An objective measure is based only on the raw data. No additional knowledge about the data is required. An objective measure usually represents the correlation or distribution of the data. In the case of clustering, objective measurements evaluate the quality of the clusters found by a data mining algorithm.
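One widely used example of such a measure is the silhouette coefficient, which compares how close each point is to its own cluster with how close it is to the nearest other cluster, using nothing but the data and the cluster labels. The sketch below is offered only as an illustration of an objective measure; the definition above does not name a specific one, and the function name `mean_silhouette` and the use of Euclidean distance are assumptions.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]  # labels that follow the two visible groups
bad = [0, 1, 0, 1, 0, 1, 0, 1]   # labels that ignore them
print(mean_silhouette(points, good), mean_silhouette(points, bad))
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or poorly assigned points. No domain knowledge enters the computation, which is what makes the measure objective in the sense defined here.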


Metabolic Pathway:

A series of chemical reactions catalyzed by enzymes and connected by their intermediates, i.e., the reactants of one reaction are the products of the previous one, and so on. Reconstructing a metabolic pathway consists of inferring the relations between genes, proteins (enzymes), and reactions in a given metabolic system.
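The chaining of reactions through shared intermediates can be made concrete with a small data-structure sketch. The example uses the first three steps of glycolysis, deliberately simplified to one substrate and one product per reaction (cofactors such as ATP are omitted); the dictionary layout and the function name `order_pathway` are hypothetical and chosen only for illustration.

```python
# Reactions listed in arbitrary order; each links one substrate to one product.
reactions = [
    {"enzyme": "phosphofructokinase",
     "substrate": "fructose-6-phosphate",
     "product": "fructose-1,6-bisphosphate"},
    {"enzyme": "hexokinase",
     "substrate": "glucose",
     "product": "glucose-6-phosphate"},
    {"enzyme": "phosphoglucose isomerase",
     "substrate": "glucose-6-phosphate",
     "product": "fructose-6-phosphate"},
]


def order_pathway(reactions):
    """Order a linear set of reactions so each product feeds the next reaction."""
    by_substrate = {r["substrate"]: r for r in reactions}
    produced = {r["product"] for r in reactions}
    # The entry reaction consumes a compound that no listed reaction produces.
    starts = [r for r in reactions if r["substrate"] not in produced]
    if len(starts) != 1:
        raise ValueError("expected exactly one entry reaction")
    chain = [starts[0]]
    while chain[-1]["product"] in by_substrate:
        if len(chain) >= len(reactions):
            raise ValueError("reactions form a cycle")
        chain.append(by_substrate[chain[-1]["product"]])
    return chain


pathway = order_pathway(reactions)
print(" -> ".join(r["enzyme"] for r in pathway))
# hexokinase -> phosphoglucose isomerase -> phosphofructokinase
```

Real pathways branch and share metabolites, so a general representation would be a graph rather than a single chain. The linear case is enough to show how intermediates connect one reaction to the next.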

Self-organizing Maps:

Neural networks that use competitive learning. They map complex high-dimensional input patterns onto a simpler low-dimensional discrete map, with prototype vectors that can be visualized in a two-dimensional lattice structure, while preserving the proximity relationships of the original data as much as possible.
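A minimal sketch of the competitive-learning loop may help make this definition concrete. For every input, the unit whose prototype vector is closest (the best matching unit) wins, and it and its lattice neighbours are moved toward the input. The Gaussian neighbourhood, the linear decay of learning rate and radius, the initialisation from random samples, and all names and parameter values below are assumptions made for the illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Lattice coordinates of the unit whose prototype is closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): prototype vector}."""
    rng = random.Random(seed)
    units = [(r, c) for r in range(rows) for c in range(cols)]
    # Prototypes start as copies of randomly chosen samples (an assumption).
    weights = {u: list(rng.choice(data)) for u in units}
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays to 0
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for u in units:
                # Neighbourhood strength depends on lattice distance to the BMU.
                d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                w = weights[u]
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
weights = train_som(data, rows=2, cols=2)
for sample in data:
    print(sample, "->", best_matching_unit(weights, sample))
```

Because neighbouring units are updated together, inputs that are close in the original space tend to win at the same or adjacent lattice positions. This is the proximity-preserving property mentioned in the definition.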

Subjective Clustering Measure:

A subjective measure takes into account both the data and the user domain or background knowledge about the data. In the case of clustering, subjective measurements evaluate the validity of the clusters found by a technique from the application point of view (for example, for a biological database, the biological validity of the groupings found).


Transcript:

An RNA molecule, a type of compound produced directly from genes. Transcription levels are obtained from hybridization chips containing all the genes of the material of interest, arranged in spots that are incubated with the transcripts obtained from the material. The transcripts are fluorescently labeled, and the results are observed as quantifiable intensity peaks.


