My expertise lies at the intersection of information theory, machine learning, and biology. Currently, my main research interest is (transcriptional) regulatory network inference and the reverse engineering of cells and genomes.
Regulatory networks:
A gene can produce a protein that activates or represses the production of another protein. Hence, circuits are encoded in the DNA of a cell. A convenient representation of this circuitry is a graph in which the nodes represent genes and the arcs represent the interactions between them. These networks aim to give a global picture of the cell. Regulatory networks also delineate the interactions between the components of a biological system and show how these interactions give rise to the functions and behaviors of the system as a whole.
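As a small illustration of the graph representation described above, the sketch below stores a regulatory network as a directed graph with signed arcs. The gene names, the interactions, and the helper functions build_network and regulators_of are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical genes and interactions, for illustration only.
interactions = [
    ("geneA", "geneB", "activates"),
    ("geneA", "geneC", "represses"),
    ("geneB", "geneC", "activates"),
]


def build_network(edges):
    """Return a directed graph as {regulator: {target: effect}}."""
    network = defaultdict(dict)
    for regulator, target, effect in edges:
        network[regulator][target] = effect
    return dict(network)


def regulators_of(network, gene):
    """List the genes that have an arc pointing at `gene`."""
    return sorted(r for r, targets in network.items() if gene in targets)


network = build_network(interactions)
print(regulators_of(network, "geneC"))
```

A nested dictionary keeps the direction and sign of each arc explicit, which is usually enough for small examples; a dedicated graph library would be a natural choice for larger networks.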
Results: (200+ citations)
MRNET and MINET:
We proposed an original regulatory network inference method, called MRNET (Minimum Redundancy NETwork), inspired by a fast variable selection technique. While most standard approaches are not suited to a large number of genes, MRNET has been able to infer very large networks (up to several thousand nodes) in a reasonable amount of time. Furthermore, numerous experimental results show the competitive accuracy of this approach (EURASIP Systems Biology 2007). Research on MRNET has led to an open-source R/Bioconductor package called MINET (Mutual Information NETworks inference) (BMC Bioinformatics 2008).
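The idea of building a network from variable selection can be sketched as follows. This is a rough illustration under stated assumptions, not the MINET package or its API: it assumes that the variable selection step is a greedy maximum-relevance/minimum-redundancy criterion (as the method's name suggests), and it uses a Gaussian estimate of mutual information as a stand-in for whatever estimator is actually used. The function names mutual_information_matrix and mrnet_sketch are hypothetical.

```python
import numpy as np


def mutual_information_matrix(data):
    """Gaussian estimate of pairwise mutual information: -0.5 * log(1 - rho^2).

    `data` has one row per sample and one column per gene.
    This estimator is an assumption made for the sketch.
    """
    rho = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(rho, 0.0)  # ignore self-information
    rho2 = np.clip(rho ** 2, 0.0, 1.0 - 1e-12)
    return -0.5 * np.log(1.0 - rho2)


def mrnet_sketch(mim):
    """Score gene pairs by greedy max-relevance/min-redundancy selection."""
    n = mim.shape[0]
    scores = np.zeros((n, n))
    for target in range(n):
        candidates = [g for g in range(n) if g != target]
        selected = []
        while candidates:
            # Relevance: information each candidate shares with the target.
            relevance = mim[candidates, target]
            # Redundancy: mean information shared with already selected genes.
            if selected:
                redundancy = mim[np.ix_(candidates, selected)].mean(axis=1)
            else:
                redundancy = np.zeros(len(candidates))
            gain = relevance - redundancy
            best = int(np.argmax(gain))
            if gain[best] <= 0:
                break  # no remaining candidate adds information
            gene = candidates.pop(best)
            selected.append(gene)
            scores[gene, target] = gain[best]
    # Keep the larger of the two directed scores for each pair.
    return np.maximum(scores, scores.T)


# Synthetic data: gene 1 depends strongly on gene 0, gene 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = x0 + 0.1 * rng.normal(size=200)
x2 = rng.normal(size=200)
data = np.column_stack([x0, x1, x2])

weights = mrnet_sketch(mutual_information_matrix(data))
print(np.round(weights, 3))
```

In this toy setting the pair (gene 0, gene 1) should receive a much larger weight than either pair involving the independent gene 2. Thresholding the weight matrix would then give the arcs of the inferred network.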
Results: (200+ citations)
Datasets are generally analyzed in isolation, and each often contains too little information to build accurate models. One of the main problems is the limited number of samples within each dataset, which makes functional relationships difficult to detect. To circumvent this difficulty, several datasets can be collected and analyzed together in the hope of improving the estimation of functional relationships. However, it is extremely difficult to integrate heterogeneous datasets in which the noise, range, and distribution of each variable differ substantially.
The efforts of the Drosophila Model Organism Encyclopedia Of DNA Elements project (modENCODE) consortium have led to the development of many different types of functional datasets that interrogate different aspects of the fly genome, including the genomic, epigenomic, and transcriptomic levels. These datasets offer a unique opportunity to construct a complete functional regulatory map for the fly. We developed a framework for reconstructing the regulatory network of Drosophila melanogaster by integrating physical (ChIP and motifs) and functional (expression and chromatin) data generated by the modENCODE consortium. We also developed a benchmarking framework that allows us to rigorously assess the quality of the reconstruction (Science 2010; Genome Research 2012).