As an interface between modern biology and information technology, Bioinformatics & Computational Biology involves the discovery, development, and implementation of computational algorithms and software tools that facilitate an understanding of biological processes. Its primary goal is to serve the agriculture and healthcare sectors, with several spin-offs. Our lab applies statistical pattern recognition, artificial intelligence, and machine learning to develop novel computational tools and algorithms, primarily in the following areas of research interest:
Criminal investigations of bioterrorism attacks, tracking of disease outbreaks, and medical intelligence operations require bioinformatics applications that are mission-oriented and task-specific. The KBL develops novel computational tools and algorithms for pathogen detection and discrimination, identification of species-specific signatures, and the use of artificial intelligence to predict biosecurity threats. For example, we discriminate pathogen genotypes in a way that is fundamentally different from distance-based and BLAST algorithms: Neural Network, Support Vector Machine, or Decision Tree classifiers build patterns from genome regions that are under selective pressure (e.g. DNA barcodes), and the results are ultimately incorporated into databases and visualization tools for community use.
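As an illustration of alignment-free classification, the following minimal sketch turns each sequence into a k-mer frequency profile and fits a depth-1 decision tree (a decision stump) on those features. It is not the lab's actual pipeline: the function names, the choice of k = 2, and the toy sequences are all assumptions made for the example.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Return normalized k-mer frequencies over ACGT, in a fixed order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def fit_stump(X, y):
    """Fit a depth-1 decision tree on 0/1 labels by minimizing training errors."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate split between adjacent values
            for above in (0, 1):  # label predicted when feature > thr
                errors = sum(
                    (above if row[j] > thr else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, thr, above)
    if best is None:
        raise ValueError("no feature has two distinct values")
    return best[1:]  # (feature index, threshold, label above threshold)


def predict_stump(stump, x):
    """Predict a 0/1 label for one feature vector."""
    j, thr, above = stump
    return above if x[j] > thr else 1 - above


# Hypothetical toy data: genotype 0 is AT-rich, genotype 1 is GC-rich.
train_seqs = ["ATATATATTA", "AATTATATAA", "GCGCGGCCGC", "CGCGCCGGCG"]
train_labels = [0, 0, 1, 1]
stump = fit_stump([kmer_profile(s) for s in train_seqs], train_labels)
predictions = [predict_stump(stump, kmer_profile(s))
               for s in ("ATTATAATAT", "GGCCGCGCGC")]
```

In practice the stump would be replaced by a full decision tree, a Support Vector Machine, or a Neural Network, and the features would come from the selected genome regions rather than toy strings.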
In addition, we model high-consequence pathogens based on epidemiological factors, e.g. by developing tools for disease prediction and forecasting based on significant weather parameters and/or other correlated features.
Organizing metagenomic sequence data (huge in volume):
- Clustering: partitioning a large dataset into distinct subsets based on specific measures (e.g. clustering proteins/genes into families, binning reads into taxa).
- Binning: a clustering method that uses DNA composition (e.g. GC content), codon usage, assembly, phylogenetic analysis, database searches, etc. to assign reads to specific genomes or groups of genomes.
- Gene prediction: identifying genes coding for proteins and RNA, and other regulatory elements.
- Gene annotation: in metagenomics, a substantial percentage of sequences cannot be easily classified, so new algorithms are needed.
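Composition-based binning, as described in the list above, can be sketched with GC content as the only feature. The example below runs a two-bin, one-dimensional k-means over the GC fraction of each read. The function names and the toy reads are assumptions for illustration; real binning tools combine many more signals (codon usage, coverage, phylogeny) than this sketch does.

```python
def gc_content(read):
    """Fraction of G and C among the unambiguous bases of a read."""
    read = read.upper()
    acgt = sum(read.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (read.count("G") + read.count("C")) / acgt


def bin_by_gc(reads, n_iter=20):
    """Two-bin 1-D k-means on GC content; 0 = lower-GC bin, 1 = higher-GC bin."""
    gc = [gc_content(r) for r in reads]
    centers = [min(gc), max(gc)]  # deterministic initialization
    labels = [0] * len(gc)
    for _ in range(n_iter):
        # Assign each read to the nearest center.
        labels = [0 if abs(g - centers[0]) <= abs(g - centers[1]) else 1
                  for g in gc]
        # Recompute each center as the mean GC of its members.
        new_centers = []
        for b in (0, 1):
            members = [g for g, lab in zip(gc, labels) if lab == b]
            new_centers.append(sum(members) / len(members) if members
                               else centers[b])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels


# Hypothetical reads drawn from a low-GC and a high-GC genome.
reads = ["ATATTTAAAT", "GCGCGGGCCA", "AATTAATATC", "CCGGGCGCGT"]
bins = bin_by_gc(reads)
```

Here `bins` assigns the first and third reads to the low-GC bin and the other two to the high-GC bin.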
The interplay between plants and their microbial co-habitants is regulated by extensive chemical signaling. Most of what we know about these complex community interactions has been derived through the study of organisms in pure culture, but it is well known that the vast majority of microbes have not been cultivated. While high-throughput DNA sequence analysis will be an important tool for these studies, the immense richness and diversity of such communities present a strong mandate for functional metagenomics strategies that involve a broad variety of screening methodologies to discover and study the currently unknown key biological processes. Thus, developing computational methods that identify which interactions enable successful host-microbe associations has great implications, especially in metagenomics.
Annotation of sequenced genomes is the important process of attaching biological information to the sequences. In addition to conventional similarity-based approaches to genome annotation, we use Artificial Intelligence techniques (Neural Networks, Support Vector Machines, Decision Trees, etc.) to develop machine-learned algorithms for faster and more accurate structural and functional annotation. The hybrid approach (combining homology and machine learning techniques) has shown relatively better accuracy than the individual approaches, and we are actively developing such hybrid methodologies for the genome annotation process. One such use has been successfully demonstrated in subcellular localization prediction, where a range of prediction tools have been developed.
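One simple way to combine homology and machine learning is sketched below: trust a homology hit when it is strong, and fall back to a machine-learned predictor otherwise. This is only one possible combination rule, not necessarily the one used in the lab's tools. The `HomologyHit` type, the `hybrid_annotate` function, the E-value cutoff of 1e-5, and both stub predictors are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class HomologyHit:
    annotation: str  # label transferred from the best database match
    evalue: float    # significance of that match (smaller is stronger)


def hybrid_annotate(
    seq: str,
    homology_search: Callable[[str], Optional[HomologyHit]],
    ml_predict: Callable[[str], str],
    evalue_cutoff: float = 1e-5,  # assumed cutoff, tune per application
) -> Tuple[str, str]:
    """Return (annotation, evidence source) for one sequence."""
    hit = homology_search(seq)
    if hit is not None and hit.evalue <= evalue_cutoff:
        return hit.annotation, "homology"
    return ml_predict(seq), "machine-learning"


# Stub components standing in for a real similarity search and classifier.
def toy_homology_search(seq):
    known = {"MKKLLPTAAAGLLLLAAQPAMA": HomologyHit("periplasmic", 1e-30)}
    return known.get(seq)


def toy_ml_predict(seq):
    return "cytoplasmic"


with_hit = hybrid_annotate("MKKLLPTAAAGLLLLAAQPAMA",
                           toy_homology_search, toy_ml_predict)
without_hit = hybrid_annotate("MSTNPKPQRK",
                              toy_homology_search, toy_ml_predict)
```

Returning the evidence source alongside the annotation lets downstream users distinguish transferred labels from predicted ones.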