From genes to organisms: Bioinformatics System Models and Software
Gene regulatory networks play a central role in the development and health of any organism. Unraveling and modeling these networks is of primary importance for improving our understanding of developmental processes and genetically related diseases. The goal of my PhD thesis was to develop a comprehensive framework for reverse engineering gene regulatory networks.
The building blocks of this framework have been designed to be extensible so that they can one day support the reverse engineering of whole organisms.
Biological interaction networks are often organized into groups (also called clusters, modules, or communities) of related genes and proteins carrying out specific biological functions. We have developed a consensus-based community detection method for reliable module detection in networks. The software can also be used to identify modules in metabolic, neural, social (e.g. LinkedIn, Facebook, etc.) and technological networks.
In addition to the above research contributions, the methods developed during this thesis have been implemented and released as open-source, extensible, and user-friendly software applications. As an example, GeneNetWeaver (GNW) has become a standard tool for benchmark generation and performance profiling of gene network inference methods. To date, the community has used GNW to evaluate the accuracy of 5,000 gene network predictions. Moreover, GNW has been used to organize three editions of the DREAM challenge, an annual community-wide network inference challenge.
Keywords: gene network inference, community detection, optimization algorithms, data integration, consensus methods, reverse engineering, unsupervised algorithms, image segmentation, computational tools
Towards unsupervised and systematic segmentation of biological organisms
Significant effort has been put into reverse engineering gene regulatory networks from gene expression levels measured in what can be seen as a single cell of the organism (DNA microarray data). However, reconstructing gene networks is usually an underdetermined problem; that is, the real gene network cannot be identified due to a lack of exploitable information.
High-throughput microscopy imaging applications represent an important research field that will one day provide tools for automatic quantification of living organisms at a multiscale systems level (molecular, cellular, and tissue level). In this project, we have developed a method for unsupervised and systematic segmentation of the Drosophila wing. This method is released as an open-source, user-friendly and extensible software called WingJ.
We have developed a method for generating quantitative descriptions of multicellular systems (such as body or organ systems) from 3D microscope images. The quantitative description accounts for morphological, gene expression, and cell nuclei information.
A robust and reliable quantitative description requires the quantification of numerous individual systems. In this project, we have developed an unsupervised image segmentation algorithm for fast and systematic quantification of the morphological structure of the Drosophila wing. The segmentation of the Drosophila embryo is also supported.
We have applied our method to segment hundreds of Drosophila wings imaged at different time points during their development and for different genetic backgrounds. The quantitative description generated provides a powerful tool for better understanding the biology of the wing and quantitatively assessing the effects of drugs or mutations.
Biological interaction networks are often organized into groups (also called clusters, modules, or communities) of related genes and proteins carrying out specific biological functions. Community detection has numerous applications for systems that can be described as graphs, for example metabolic, neural, social (e.g. Facebook), and technological networks.
Jmod is an open-source Java library for community structure detection in networks that can be easily integrated in third-party applications. Jmod implements several algorithms including a novel consensus community detection method that we have developed for identifying functional modules in transcriptional networks. Jmod is also available as a standalone application with both graphical and command-line user interfaces.
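To illustrate the general idea behind consensus community detection, here is a minimal sketch in Python. It is not Jmod's API or the method developed in this project: the function name `consensus_communities`, the `threshold` parameter, and the co-assignment voting scheme are assumptions chosen for illustration. Several partitions of the same network are combined by counting how often each pair of nodes lands in the same community, and pairs that agree in more than a given fraction of partitions are merged.

```python
from itertools import combinations


def consensus_communities(partitions, threshold=0.5):
    """Combine several partitions (dicts mapping node -> label) into one.

    Hypothetical helper for illustration only; not part of Jmod.
    """
    nodes = sorted(partitions[0])

    # Fraction of partitions in which each pair of nodes shares a community.
    together = {}
    for a, b in combinations(nodes, 2):
        votes = sum(1 for p in partitions if p[a] == p[b])
        together[(a, b)] = votes / len(partitions)

    # Union-find: merge nodes whose co-assignment frequency exceeds threshold.
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), freq in together.items():
        if freq > threshold:
            parent[find(a)] = find(b)

    # Collect the merged groups as sorted lists of nodes.
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    runs = [
        {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
        {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1},
        {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0},
    ]
    print(consensus_communities(runs))
```

Community labels are arbitrary in each run, which is why the sketch compares pairs of nodes rather than the labels themselves.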
The second goal of this project is to provide an intuitive and complete environment for developing novel community detection methods. Jmod implements several benchmarks and metrics for evaluating the performance of these methods. A variety of additional tools allow researchers to focus on the development of novel methods and spend less time on common aspects of community detection (e.g., reading network structures, implementing standard metrics, etc.).
In silico benchmark generation and performance profiling of network inference methods
Numerous methods have been developed for reverse engineering gene regulatory networks from expression data. Unraveling and modeling these networks is of primary importance for improving our understanding of biological organisms. However, both the absolute and the comparative performance of these methods remain poorly understood. The aim of this project is to provide benchmarks and tools for rigorous testing of methods for gene network inference.
Our framework is available as an open-source and user-friendly software application called GeneNetWeaver (GNW). GNW is the first tool that provides methods for both in silico benchmark generation and performance profiling of network reconstruction algorithms.
Detailed models of gene regulatory networks are generated in just a few clicks. Their structure is extracted from known transcriptional networks (E. coli, S. cerevisiae, etc.) before being endowed with detailed dynamical models of gene regulation accounting for both transcription and translation, independent and synergistic interactions, as well as molecular and measurement noise.
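As a rough illustration of what a dynamical model with separate transcription and translation steps can look like, the sketch below simulates a single target gene activated by one regulator. It is not GNW's actual model: the Hill-type activation function, all parameter names and values, and the simple Euler integration are assumptions, and noise and synergistic interactions are left out.

```python
def hill_activation(x, k=1.0, n=2.0):
    """Fraction of maximal transcription for regulator concentration x."""
    return x**n / (k**n + x**n)


def simulate_target(regulator_level, steps=5000, dt=0.01,
                    max_transcription=1.0, translation=1.0,
                    mrna_decay=1.0, protein_decay=1.0):
    """Euler integration of a two-stage gene expression model.

    d(mrna)/dt    = max_transcription * hill(regulator) - mrna_decay * mrna
    d(protein)/dt = translation * mrna - protein_decay * protein
    """
    mrna = 0.0
    protein = 0.0
    activation = hill_activation(regulator_level)
    for _ in range(steps):
        d_mrna = max_transcription * activation - mrna_decay * mrna
        d_protein = translation * mrna - protein_decay * protein
        mrna += dt * d_mrna
        protein += dt * d_protein
    return mrna, protein


if __name__ == "__main__":
    for level in (0.0, 0.5, 1.0, 2.0):
        mrna, protein = simulate_target(level)
        print(f"regulator={level:.1f} mRNA={mrna:.3f} protein={protein:.3f}")
```

With these illustrative parameters, the mRNA and protein levels both settle at the value of the activation function, so a stronger regulator signal yields a higher steady-state expression of the target.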
We have used GNW to organize three editions of the DREAM challenge, an annual community-wide network inference challenge. In this context, GNW was used to identify systematic errors of network inference algorithms, thus providing useful insights into how to improve their performance.
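One common way to score a network prediction against a known benchmark network is to rank the predicted edges by confidence and measure precision and recall along that ranking. The sketch below shows this under stated assumptions: the function names, the edge representation as (regulator, target) tuples, and the use of average precision as a summary score are illustrative choices, not a description of how GNW computes its scores.

```python
def precision_recall_curve(ranked_edges, true_edges):
    """Return (precision, recall) after each position in the ranked list."""
    true_edges = set(true_edges)
    hits = 0
    curve = []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
        curve.append((hits / k, hits / len(true_edges)))
    return curve


def average_precision(ranked_edges, true_edges):
    """Mean of the precision values at the rank of each true edge found."""
    true_edges = set(true_edges)
    hits = 0
    total = 0.0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / k
    return total / len(true_edges)


if __name__ == "__main__":
    # Hypothetical gold standard and prediction, most confident edge first.
    gold = {("A", "B"), ("B", "C")}
    prediction = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
    print(precision_recall_curve(prediction, gold))
    print(average_precision(prediction, gold))
```

Because the benchmark network is generated in silico, the set of true edges is known exactly, which is what makes this kind of comparison possible.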
Observation and interaction in experimental environments
Distinguishing subpopulations in group behavioral experiments can reveal how differences in genetics, pharmacological treatment, and life history affect social interactions and decision-making. We have developed Fluorescence Behavioral Imaging (FBI), a toolkit that uses transgenic fluorescence to discriminate subpopulations, imaging hardware that simultaneously records behavior and fluorescence expression, and open-source software for automated, high-accuracy determination of genetic identity.
Scalable reverse engineering of nonlinear gene networks
The effective reverse engineering of gene regulatory networks is one of the great challenges of systems biology and is expected to have substantial impact on the pharmaceutical and biotech industries in the next decades. A gene network is formed by regulatory genes, which code for proteins that enhance or inhibit the expression of other regulatory and/or non-regulatory genes, thereby forming a complex web of interactions. The goal of reverse engineering is to automatically identify such a network from experimental data. In this project,
I have developed a reverse engineering algorithm capable of generating predictive models of gene networks. These models are biologically plausible dynamical models which can be used to predict the response of the gene networks to new perturbations such as the application of a drug;
I have taken the initiative and developed a method for generating in silico benchmark networks and assessing the performance of reverse engineering algorithms. This work later inspired the development of the GeneNetWeaver project;
I received the best poster award at the 2008 exhibition of microengineering master's projects (more than 90 participants).