
A major challenge in human genetics is to identify novel disease genes to understand the mechanisms underlying genetic conditions and, in the long term, elaborate novel treatments for these disorders. Genetic studies, such as association studies and linkage analyses, identify chromosomal regions involved in a disease or phenotype of interest, but often result in large lists of candidate genes of which only one or a few are actually associated with the disease or phenotype under study.

Identifying, among such a list, the most promising candidate genes for a disease of interest has been defined as the gene prioritization problem. Candidate gene prioritization is key in genetics because it is generally too expensive and time-consuming to experimentally validate all candidate genes. Because of the huge amount of genomic data that is publicly available, computational approaches have been developed to avoid performing candidate gene prioritization manually.

In the past couple of years, several gene prioritization methods have been proposed by the bioinformatics community to address this problem.

For a detailed review of web-based gene prioritization tools and their information sources, the reader is referred to our recent review (1) and its associated web site ( ). Most of the available gene prioritization tools combine different data and information sources, among which the most commonly used are literature, functional annotations, interactions, expression data and sequence information (2, 3). Among these tools, ToppGene (4), SNPs3D (5), GeneDistiller (6) and Posmed (7) additionally include model organism data (mainly mouse data). Other tools, such as GeneWanderer (8), Prioritizer (9) and PhenoPred (10), make use exclusively of genome-wide protein–protein interaction networks. Prioritizer integrates several networks obtained from different databases (including expression data) and uses the resulting large network to investigate diseases for which several loci are known. Candidate genes from one locus that are connected to candidate genes in another locus are considered promising candidate genes. GeneWanderer uses a global network distance measure to define similarity in protein–protein interaction networks. PhenoPred is based on a human protein–protein interaction network and uses a supervised algorithm that detects gene-disease associations from this network together with known gene-disease associations, protein sequence and protein functional information. However, most of these tools use a guilt-by-association concept (candidate genes that are similar to the already confirmed disease genes are considered promising) and are therefore not applicable when little is known about the phenotype or when no confirmed disease genes are available beforehand.

One common strategy to circumvent that problem is to rely on keywords to define the genetic condition under study. However, most of the existing tools that accept keywords also rely solely on text-mining of the literature and are therefore less suitable for novel discoveries (11–16).
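To make the guilt-by-association concept discussed above concrete, the following minimal sketch ranks candidate genes by their average network distance to already confirmed disease genes. It is an illustration only and does not reproduce any of the cited tools: the gene names and the toy network are hypothetical, the function names are our own, and a plain shortest-path (hop count) distance is used as a simple stand-in for the global distance measures that tools such as GeneWanderer employ.

```python
from collections import deque


def shortest_path_lengths(network, source):
    """Breadth-first search: hop count from `source` to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for neighbor in network.get(gene, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist


def rank_by_association(network, candidates, known_disease_genes):
    """Rank candidates by mean hop distance to the known disease genes.

    Smaller mean distance means a more promising candidate. Genes that
    cannot reach a seed are given a penalty distance equal to the network size.
    """
    if not known_disease_genes:
        # The limitation noted in the text: without confirmed disease genes,
        # there is nothing to be "associated" with.
        raise ValueError("guilt-by-association needs at least one known disease gene")
    penalty = len(network)
    scores = {}
    for candidate in candidates:
        dist = shortest_path_lengths(network, candidate)
        total = sum(dist.get(seed, penalty) for seed in known_disease_genes)
        scores[candidate] = total / len(known_disease_genes)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(candidates, key=lambda gene: (scores[gene], gene))


# Hypothetical undirected interaction network (adjacency lists).
NETWORK = {
    "SEED1": ["CAND_A", "SEED2"],
    "SEED2": ["SEED1", "CAND_A"],
    "CAND_A": ["SEED1", "SEED2", "CAND_B"],
    "CAND_B": ["CAND_A", "CAND_C"],
    "CAND_C": ["CAND_B"],
    "CAND_D": [],  # isolated gene: no interaction evidence at all
}
CANDIDATES = ["CAND_D", "CAND_C", "CAND_B", "CAND_A"]
SEEDS = ["SEED1", "SEED2"]

ranking = rank_by_association(NETWORK, CANDIDATES, SEEDS)
print(ranking)
```

In this toy setting the candidate that interacts directly with both confirmed disease genes is ranked first and the isolated gene last. Calling the function with an empty list of known disease genes raises an error, which mirrors why such approaches are not applicable when no confirmed disease genes are available beforehand.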
