Bioinformatics & Computational Biology

 

We are developing bioinformatics techniques and tools for uncovering the molecular-level pathways involved in complex diseases such as cancer, with the aim of identifying disease markers and therapeutic targets.

 

·      Knowledge discovery and machine learning (especially for extracting functional information from microarray gene expression data and for reconstructing/refining pathways from large-scale microarray data)

-        Biclustering and metaclustering gene expression data

-        Gene network inference

·      Analysis of real-life microarray datasets

-        Pancreatic cancer

-        Lung cancer dataset of Bhattacharjee et al.

-        Lung cancer dataset of Garber et al.

-        Type 2 diabetes dataset of Mootha et al.

·      Intelligent query answering, ontologies (mediator architecture providing reasoning-aware query answering)

·      Determining potential drug targets from a combination of pathways and gene expression data. 

 

Bioinformatic analysis of a large pancreatic cancer dataset (GENOPACT project)

Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest forms of cancer, and the best known therapeutic options are currently extremely ineffective. In the framework of the GENOPACT project (Research of Excellence Program CEEX 56/2005), we analyzed a set of 78 pancreatic cancer-normal sample pairs from the tissue bank of the Fundeni Clinical Institute (ICF), measured with Affymetrix U133 Plus 2.0 microarrays. This is one of the largest available pancreatic ductal adenocarcinoma datasets, which allows a statistically reliable identification of the genes involved in this disease.

 

We have developed a complex bioinformatic framework for the analysis of this dataset including:

-        various preprocessing algorithms (RMA, dChip, MAS5)

-        a number of different clustering algorithms, including widely used ones such as hierarchical clustering, but also original biclustering algorithms allowing for overlapping clusters

-        promoter analysis for detection of transcription factor binding sites, useful for determining the regulatory programs of the differentially expressed genes

-        a large database of gene interactions and pathways compiled from several sources, including the PubMed literature.
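As an illustration of the overlapping biclustering idea listed above, the following is a minimal sketch based on nonnegative matrix factorization with the standard Lee-Seung multiplicative updates. It is not the algorithm used in the framework: the function names (nmf, reconstruction_error, biclusters), the toy matrix and the 0.5 membership threshold are all assumptions made for this example.

```python
import random


def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative genes x samples matrix X into W (genes x k)
    and H (k x samples) with Lee-Seung multiplicative updates
    (squared-error objective)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), computed from the old H
        WtX = [[sum(W[i][a] * X[i][j] for i in range(n)) for j in range(m)]
               for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)]
               for a in range(k)]
        H = [[H[a][j] * WtX[a][j]
              / (sum(WtW[a][b] * H[b][j] for b in range(k)) + eps)
              for j in range(m)] for a in range(k)]
        # W <- W * (X H^T) / (W H H^T), computed from the old W
        XHt = [[sum(X[i][j] * H[a][j] for j in range(m)) for a in range(k)]
               for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)]
               for a in range(k)]
        W = [[W[i][a] * XHt[i][a]
              / (sum(W[i][b] * HHt[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(n)]
    return W, H


def reconstruction_error(X, W, H):
    """Squared Frobenius distance between X and W H."""
    k = len(H)
    return sum((X[i][j] - sum(W[i][a] * H[a][j] for a in range(k))) ** 2
               for i in range(len(X)) for j in range(len(X[0])))


def biclusters(W, H, frac=0.5):
    """One (genes, samples) bicluster per factor. A gene or sample may
    appear in several biclusters, so the clusters can overlap."""
    out = []
    for a in range(len(H)):
        wmax = max(row[a] for row in W)
        hmax = max(H[a])
        genes = [i for i, row in enumerate(W) if row[a] >= frac * wmax]
        samples = [j for j, h in enumerate(H[a]) if h >= frac * hmax]
        out.append((genes, samples))
    return out


# Toy expression matrix (rows = genes, columns = samples); made-up values.
# Gene 2 is active in both sample groups, so it may land in both biclusters.
X = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [5, 5, 5, 5],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
W, H = nmf(X, k=2)
found = biclusters(W, H)
for genes, samples in found:
    print("genes", genes, "samples", samples)
```

The result depends on the random initialization, so in practice such a factorization would be repeated from several starting points and the runs compared for stability.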

 

We have performed an in-depth integrated analysis of the resulting set of differentially expressed genes, producing a plausible “model” of the molecular-level mechanisms of PDAC and its progression.
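Because the dataset consists of cancer-normal pairs, differential expression can be assessed per gene on the within-pair differences. The sketch below shows one simple way to do this with a paired t-statistic on log2 intensities. It is an illustration under stated assumptions, not the procedure actually used: the function names, the placeholder gene labels, the toy values and the two thresholds are hypothetical, and a real analysis would also need a multiple-testing correction.

```python
import math
from statistics import mean, stdev


def paired_t(tumor, normal):
    """Return (mean log2 fold change, paired t-statistic) for one gene.
    tumor[i] and normal[i] are log2 intensities from the same patient."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)
    if d_sd == 0:
        # All pairs agree exactly: the statistic is unbounded (or 0).
        if d_mean == 0:
            return d_mean, 0.0
        return d_mean, math.copysign(math.inf, d_mean)
    return d_mean, d_mean / (d_sd / math.sqrt(len(diffs)))


def call_genes(expr, min_lfc=1.0, min_abs_t=3.0):
    """Label each gene 'up', 'down' or 'unchanged' (thresholds are assumptions)."""
    calls = {}
    for gene, (tumor, normal) in expr.items():
        lfc, t = paired_t(tumor, normal)
        if abs(lfc) >= min_lfc and abs(t) >= min_abs_t:
            calls[gene] = "up" if lfc > 0 else "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Made-up log2 intensities for four hypothetical sample pairs.
expr = {
    "GENE_A": ([9.1, 8.7, 9.4, 9.0], [7.0, 6.9, 7.1, 7.2]),
    "GENE_B": ([5.0, 5.2, 4.9, 5.1], [7.1, 7.0, 7.3, 6.9]),
    "GENE_C": ([6.0, 6.3, 5.9, 6.1], [6.1, 6.0, 6.2, 6.0]),
}
calls = call_genes(expr)
print(calls)
```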

In the framework of a future project, and with the help of a specialized molecular biology lab, we plan to further refine our current understanding of the molecular-level processes responsible for this disease. We intend to use various high-throughput technologies (not just microarrays) to dissect the pathways involved in PDAC and to test them on cell lines and possibly animal models.

Bioinformatic analysis of the lung cancer dataset of Bhattacharjee et al.

-        microarray data analysis (differentially expressed genes, biclustering, metaclustering, gene network inference)

-        literature analysis tools (e.g. extracting co-citations)

 

Microarray data analysis tools capture only transcriptional regulation and are strongly affected by noise and normal biological variability. We therefore use them in conjunction with literature analysis tools for

-        validating certain transcriptional influences, as well as

-        emphasizing the various (signaling) pathways in which these genes operate.
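The co-citation idea behind the literature analysis tools can be sketched as follows: two genes are counted as co-cited once for every document that mentions both, and an inferred transcriptional influence is considered supported when its pair is co-cited often enough. This is a minimal sketch, not the actual tool: the function names, the placeholder gene identifiers, the toy documents and the min_docs threshold are assumptions, and the step that recognizes gene names in text is left out.

```python
from collections import Counter
from itertools import combinations


def cocitations(doc_genes):
    """Count, for every unordered gene pair, the documents mentioning both.
    doc_genes is an iterable of gene-name collections, one per document."""
    counts = Counter()
    for genes in doc_genes:
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    return counts


def supported_edges(inferred, counts, min_docs=2):
    """Keep inferred gene-gene influences co-cited in at least min_docs documents."""
    return [pair for pair in inferred
            if counts[tuple(sorted(pair))] >= min_docs]


# Made-up documents, each reduced to the set of gene names it mentions.
docs = [{"G1", "G2"}, {"G1", "G2", "G3"}, {"G2", "G3"}, {"G1"}]
counts = cocitations(docs)

# Hypothetical influences proposed by a network inference step.
inferred = [("G2", "G1"), ("G1", "G3")]
kept = supported_edges(inferred, counts)
print(counts)
print(kept)
```

Pairs are stored in sorted order so that the direction of an inferred influence does not affect the lookup.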

 

The preliminary results are very encouraging. For example, for squamous cell lung carcinoma we have found essentially two groups of differentially expressed genes:

-        a set of upregulated genes involved, for example, in the cell cycle (e.g. targets of E2F and/or p130/retinoblastoma-like 2) and/or in the structure and organization of the cytoskeleton (e.g. keratin 5 and desmoplakin, which are specific to the squamous cancer subtype)

-        a larger set of downregulated genes, normally involved in certain developmental stages of the lung.

 

This cancer subtype thus appears to be due to a defective re-enactment of normal developmental processes (at the wrong time and place).

 


 

References

·        Liviu Badea, Doina Tilivea. Stable Biclustering of Gene Expression Data with Nonnegative Matrix Factorizations. Proceedings of the International Joint Conference on Artificial Intelligence IJCAI-07, Hyderabad, India, pp. 2651-2656.

·        Liviu Badea. Semantic Web Reasoning for Analyzing Gene Expression Profiles. Proceedings Principles and Practice of Semantic Web Reasoning, PPSWR 2006, LNCS 4187, pp. 78-89, Springer Verlag.

·        Liviu Badea, Doina Tilivea. Meta-clustering Gene Expression Data with Positive Tensor Factorizations. Proceedings European Conference on Artificial Intelligence ECAI-06, p. 787, IOS Press 2006.

·        Liviu Badea. Clustering and Metaclustering with Nonnegative Matrix Decompositions. Proc. of the European Conference on Machine Learning ECML-05. Lecture Notes in Artificial Intelligence, Vol. 3720, pp. 10-20, Springer Verlag, 2005. (C) Springer Verlag.

·        Liviu Badea, Doina Tilivea. Sparse Factorizations of Gene Expression Data guided by Binding Data. Proc. of the Pacific Symposium on Biocomputing PSB-2005.

·        Liviu Badea, Doina Tilivea, Anca Hotaran. Semantic Web Reasoning for Ontology-Based Integration of Resources. Principles and Practice of Semantic Web Reasoning, PPSWR 2004: 61-75, Lecture Notes in Computer Science 3208 Springer 2004.

·        Rolf Backofen, Mike Badea, Pedro Barahona, Liviu Badea, François Bry, Gihan Dawelbait, Andreas Doms, François Fages, Carole Goble, Andreas Henschel, Anca Hotaran, Bingding Huang, Ludwig Krippahl, Patrick Lambrix, Werner Nutt, Michael Schroeder, Sylvain Soliman, Sebastian Will. Towards a semantic web for bioinformatics. (Poster) In: Proceedings of "Bioinformatics 2004", Linköping, Sweden (3rd - 6th June 2004), SocBIN - Society for Bioinformatics in the Nordic countries.

·        Liviu Badea. Determining the Direction of Causal Influence in Large Probabilistic Networks: A Constraint-Based Approach. Proceedings of the European Conference on Artificial Intelligence ECAI 2004, pp. 263-267.

·        Liviu Badea. Extracting networks of influences from microarray data. ISMB-2004 poster.

·        Liviu Badea, Doina Tilivea. Integrating biological process modelling with gene expression data and ontologies for functional genomics (position paper). Proc. of the International Workshop on Computational Methods in Systems Biology, University of Trento, Rovereto, Italy, 24-26 February 2003. (C) Springer Verlag.

·        Liviu Badea. Functional discrimination of gene expression patterns in terms of the Gene Ontology. Proc. of the Pacific Symposium on Biocomputing PSB-2003, pp. 565-576.