We are developing bioinformatics techniques and tools for
uncovering the molecular-level pathways involved in complex diseases such as cancer,
aiming at determining disease markers and therapeutic targets.
· Knowledge discovery and machine learning (especially for extracting
functional information from microarray gene expression data and
reconstructing/refining pathways from large-scale microarray data)
- Biclustering and metaclustering gene expression data
- Gene network inference
· Analysis of real-life microarray datasets
- Lung cancer dataset of Bhattacharjee et al.
- Lung cancer dataset of Garber et al.
- Type 2 diabetes dataset of Mootha et al.
· Intelligent query answering, ontologies (mediator architecture
providing reasoning-aware query answering)
· Determining potential drug targets from a combination of pathways
and gene expression data.
Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest forms of cancer, and
the best known therapeutic options are currently extremely ineffective. In the framework of the GENOPACT project (Research of Excellence
Program CEEX 56/2005), we analyzed a set of 78 pancreatic cancer-normal sample
pairs from the tissue bank of the Fundeni Clinical Institute (ICF), measured with Affymetrix U133 Plus 2.0 microarrays.
This is one of the largest available PDAC datasets, thereby allowing a
statistically reliable identification of the genes
involved in this disease.
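Because each tumor sample has a matched normal sample from the same patient, one common way to score genes is a paired t-statistic on the per-patient differences. The sketch below is only an illustration under stated assumptions, not the pipeline actually used in the project: the gene names and log2 expression values are invented toy data, and `paired_t` and `rank_genes` are hypothetical helper names.

```python
import math
from statistics import mean, stdev

def paired_t(tumor, normal):
    """Paired t-statistic for one gene; tumor[i] and normal[i] share a patient."""
    diffs = [t - n for t, n in zip(tumor, normal)]
    sd = stdev(diffs)
    if sd == 0:
        # Degenerate case: identical differences across all patients.
        return 0.0 if mean(diffs) == 0 else math.copysign(math.inf, mean(diffs))
    return mean(diffs) / (sd / math.sqrt(len(diffs)))

def rank_genes(expression):
    """Sort genes by |t|, strongest paired tumor-vs-normal difference first."""
    scores = {gene: paired_t(t, n) for gene, (t, n) in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Toy log2 expression values (invented): gene -> (tumor values, normal values).
toy_expression = {
    "GENE_A": ([8.1, 7.9, 8.4, 8.0], [6.0, 6.2, 6.1, 5.9]),  # higher in tumor
    "GENE_B": ([5.0, 5.2, 4.9, 5.1], [5.1, 5.0, 5.0, 5.2]),  # essentially unchanged
    "GENE_C": ([3.0, 3.2, 2.9, 3.1], [6.0, 6.1, 6.3, 5.8]),  # lower in tumor
}

ranked = rank_genes(toy_expression)
for gene, t_stat in ranked:
    print(f"{gene}: t = {t_stat:.2f}")
```

With tens of thousands of probe sets on a real array, the resulting statistics would also need a multiple-testing correction before any gene is called differentially expressed.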
We have developed a complex bioinformatic
framework for the analysis of this dataset including:
- various preprocessing algorithms (RMA, dChip, MAS5)
- a number of different clustering algorithms, including widely used ones such as
hierarchical clustering, but also original biclustering
algorithms allowing for overlapping clusters
- promoter analysis for detection of transcription factor binding sites, useful for
determining the regulatory programs of the differentially expressed genes
- a large database of gene interactions and pathways compiled from several
sources, including PubMed literature.
We have performed an in-depth integrated analysis of the
resulting set of differentially expressed genes, producing a plausible “model”
of the molecular-level mechanisms of PDAC and its progression.
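
To make the idea of overlapping biclusters concrete, here is a minimal sketch that uses standard nonnegative matrix factorization with multiplicative updates (Lee and Seung). It is not the original biclustering algorithm referred to above; the toy matrix, the function names and the thresholding rule (a gene or sample joins a bicluster when its loading reaches half of the largest loading in that factor) are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    """Squared reconstruction error ||X - WH||^2."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))

def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a nonnegative genes-by-samples matrix X into W (genes) and H (samples)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

def biclusters(w, h, threshold=0.5):
    """Read one (genes, samples) bicluster off each factor; memberships may overlap."""
    result = []
    for k in range(len(h)):
        gene_loadings = [row[k] for row in w]
        genes = [i for i, v in enumerate(gene_loadings)
                 if v >= threshold * max(gene_loadings)]
        samples = [j for j, v in enumerate(h[k]) if v >= threshold * max(h[k])]
        result.append((genes, samples))
    return result

# Toy data (invented): genes 0-3 are active in samples 0-2, genes 2-5 in samples 3-5,
# so genes 2 and 3 take part in both expression patterns.
x = [[5, 5, 5, 0, 0, 0],
     [5, 5, 5, 0, 0, 0],
     [5, 5, 5, 4, 4, 4],
     [5, 5, 5, 4, 4, 4],
     [0, 0, 0, 4, 4, 4],
     [0, 0, 0, 4, 4, 4]]

w, h = nmf(x, rank=2)
print(biclusters(w, h))
```

Unlike hierarchical clustering, which assigns each gene to exactly one branch, a factorization of this kind lets a gene carry a nonzero loading in several factors, which is what permits overlapping clusters.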

We plan to further refine our current understanding of the
molecular-level processes responsible for this disease in the framework of a
future project and, with the help of a specialized molecular-biology lab, to use
various high-throughput technologies (not just microarrays)
to dissect the pathways involved in PDAC and to test these on cell lines and
possibly animal models.
- microarray data analysis (differentially expressed genes, biclustering, metaclustering,
gene network inference)
- literature analysis tools (e.g. extracting co-citations)
The microarray data analysis tools
reveal only the level of transcription regulation and are strongly affected by
noise and normal biological variability. We are therefore using them in
conjunction with literature analysis tools for
- validating certain transcriptional influences, as well as
- emphasizing the various (signaling) pathways in which these genes
operate.
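As a rough illustration of what extracting co-citations can mean, the sketch below counts how often two gene symbols are mentioned in the same abstract. It is an assumption-laden toy, not the literature analysis tool described here: the abstracts are placeholder strings, `cocitations` is a hypothetical helper, and matching exact symbols ignores the synonym and ambiguity problems of real gene names.

```python
import re
from collections import Counter
from itertools import combinations

def cocitations(abstracts, gene_symbols):
    """Count, for each pair of gene symbols, the abstracts mentioning both."""
    symbols = set(gene_symbols)
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        mentioned = sorted(symbols & tokens)  # sorted so each pair has one canonical order
        counts.update(combinations(mentioned, 2))
    return counts

# Placeholder abstracts (invented text, used only to exercise the counting).
toy_abstracts = [
    "Toy abstract one mentions KRT5 and DSP.",
    "Toy abstract two mentions DSP, KRT5 and E2F1.",
    "Toy abstract three mentions E2F1 and RBL2.",
    "Toy abstract four mentions only RBL2.",
]

pair_counts = cocitations(toy_abstracts, ["KRT5", "DSP", "E2F1", "RBL2"])
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs that are both co-cited and co-expressed in the microarray data are then natural candidates for closer inspection, since each source compensates for some of the noise in the other.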
The partial results are very encouraging. For example, for
squamous cell lung carcinoma we have found
essentially two groups of differentially expressed genes:
- a set of upregulated genes involved, for instance, in the cell
cycle (e.g. E2F and/or p130/retinoblastoma-like 2 targets) and/or in the structure and organization of the cytoskeleton
(e.g. keratin 5, desmoplakin – specific to the squamous cancer subtype)
- a larger set of downregulated genes, normally involved in certain
developmental stages of the lung.
This cancer subtype thus appears to be due to a
defective re-enactment of normal developmental processes (at the wrong time and
place).

· Liviu Badea, Doina Tilivea. Stable Biclustering of Gene Expression Data with Nonnegative
Matrix Factorizations. Proceedings of the International Joint Conference on
Artificial Intelligence IJCAI-07,
· Liviu Badea. Semantic Web Reasoning for Analyzing Gene Expression
Profiles. Proceedings Principles and Practice of Semantic Web Reasoning, PPSWR
2006, LNCS 4187, pp. 78-89, Springer Verlag.
· Liviu Badea, Doina Tilivea. Meta-clustering Gene Expression Data with Positive Tensor Factorizations.
Proceedings European Conference on Artificial Intelligence ECAI-06, p. 787, IOS
Press 2006.
· Liviu Badea. Clustering and Metaclustering with Nonnegative Matrix Decompositions.
Proc. of the European Conference on Machine Learning ECML-05. Lecture Notes in
Artificial Intelligence, Vol. 3720, pp. 10-20, Springer Verlag, 2005.
· Rolf Backofen, Mike Badea, Pedro Barahona, Liviu Badea, François Bry, Gihan Dawelbait,
Andreas Doms, François Fages, Carole Goble, Andreas Henschel, Anca Hotaran, Bingding Huang,
Ludwig Krippahl, Patrick Lambrix, Werner Nutt, Michael Schroeder, Sylvain Soliman,
Sebastian Will. Towards a semantic web for bioinformatics. (Poster) In:
Proceedings of "Bioinformatics 2004",
· Liviu Badea. Extracting networks of influences from microarray data. ISMB-2004 poster.
· Liviu Badea, Doina Tilivea. Integrating biological process modelling with gene expression data
and ontologies for functional genomics (position paper). Proc. of the International Workshop on
Computational Methods in Systems Biology, University of Trento, Rovereto, Italy,
24-26 February 2003. (C) Springer Verlag.