
Rakesh Kaundal

Director, Bioinformatics Facility
Institute for Integrative Genome Biology (IIGB)
Department of Botany and Plant Sciences
1207G Genomics Building
University of California, Riverside CA 92521.
E-mail: rkaundal@ucr.edu
Phone: (951) 888-9835
Fax: (951) 827-5155

Related links: http://genomics.ucr.edu/members/  



Qualifications

Post Doctorate (Bioinformatics), The Samuel Roberts Noble Foundation, Ardmore (Oklahoma), USA
Ph.D. (Plant Breeding & Genetics), Dr. B.R. Ambedkar University, India
M.Sc. (Plant Breeding & Genetics), CSK Himachal Pradesh Agricultural University, India
Post-Graduate Diploma (Bioinformatics), Sikkim Manipal University, India


Area(s) of Expertise

Next-generation sequencing data analysis; computational modeling using supervised (machine learning) and unsupervised (Bayesian-based) learning approaches; data mining; developing and maintaining bioinformatics software and tools; developing algorithms to study intra- and inter-species (host-pathogen) protein-protein interactions; protein function prediction.

Research Interests

In addition to my role as director of UCR's high-performance computing and bioinformatics facility, I am actively engaged in research on computationally mining large, diverse, multi-dimensional -omics datasets by integrating cutting-edge informatics technologies (e.g. statistical pattern recognition, artificial intelligence, and supervised/unsupervised learning) to develop novel computational tools and algorithms, and on applying the knowledge gained toward organismal improvement. Some major scientific areas of interest are:
Systems-based understanding of complex genetic traits (e.g. modeling and visualization of gene regulatory networks); prediction of intra- and inter-species protein interaction networks (host-pathogen interactions); protein function prediction (subcellular localization, prediction of pathways related to lignin degradation/synthesis, classification of other protein functions); metagenomics (e.g. the rhizosphere microbiome interacting with its host); and next-generation sequencing data analysis (developing packages for assembly, alignment, annotation, etc.).

Projects

Some Current / Past Projects:

1. Systems Approaches to Understanding Plant-Microbe Interactions

Pathogenic organisms threaten not only humans: every year they cause billions of dollars' worth of damage to crops and livestock. Studies of the role of pathogen effector proteins in plants are still in their infancy. One way to study the role of effectors in disease development is to identify the plant target proteins with which they interact. To date, there is no automated system to predict genome-wide plant-pathogen interactions, nor a user-friendly way to visualize these networks.

Under an ongoing project on predicting Protein Interaction Networks (PINs) in the model plant host-pathogen system Arabidopsis-Pseudomonas syringae, we are using unsupervised (Bayesian-based) and supervised (machine learning) techniques to develop novel algorithms that predict genome-scale PINs, and to visualize these networks in a Cytoscape environment. We integrate diverse data types into a computational framework: Plant-Associated Microbe Gene Ontology (PAMGO) annotations; protein domain interactions, such as intra-species protein-domain profiles; topology or proximity in intra-species Protein-Protein Interaction (PPI) networks; protein sequence similarity (interologs and orthologs); correlated gene expression; phylogenetic relationships; and the available experimental evidence. With this framework, our host-pathogen interaction predictions reach more than 95% accuracy. The models have been implemented as a web-based resource, AP-iNET, freely accessible at http://apinet.bioinfo.ucr.edu/. Users can analyze their own 'query' host/pathogen sequences with this tool to predict interactions, and the predicted interacting pairs can be visualized in AP-iNET through its Cytoscape plug-in.
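The interolog idea mentioned above (transferring a known interaction to a pair of orthologous proteins) can be illustrated with a short Python sketch. It is not AP-iNET's implementation: the ortholog tables, the protein IDs, and the function name `predict_interologs` are hypothetical, and a real pipeline would also score each transferred pair using the other evidence types listed (domain profiles, network topology, co-expression, etc.).

```python
"""Minimal interolog-transfer sketch (hypothetical data and names)."""
from itertools import product


def predict_interologs(template_ppis, host_orthologs, pathogen_orthologs):
    """Map template interactions onto (host protein, pathogen protein) pairs.

    template_ppis: iterable of (a, b) pairs known to interact in a template species.
    host_orthologs / pathogen_orthologs: dict mapping a template protein to a
    list of its orthologs in the host or the pathogen proteome.
    """
    predicted = set()
    for a, b in template_ppis:
        # Template pairs are undirected, so try both orientations.
        for x, y in ((a, b), (b, a)):
            for host, pathogen in product(host_orthologs.get(x, []),
                                          pathogen_orthologs.get(y, [])):
                predicted.add((host, pathogen))
    return predicted


if __name__ == "__main__":
    template = [("T1", "T2"), ("T3", "T4")]
    host = {"T1": ["AT_H1"], "T4": ["AT_H2"]}
    pathogen = {"T2": ["PS_E1"], "T3": ["PS_E2"]}
    print(sorted(predict_interologs(template, host, pathogen)))
```

Running it prints `[('AT_H1', 'PS_E1'), ('AT_H2', 'PS_E2')]`. Checking both orientations matters because PPI databases list undirected pairs in arbitrary order; the second pair is only found by reversing ("T3", "T4").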

We are also experimentally validating some of the novel interaction pairs in vivo with yeast two-hybrid and BiFC (Bimolecular Fluorescence Complementation) assays, in collaboration with the Noble Foundation (http://www.noble.org/). Our aim is to build similar computational models for agriculturally relevant crop systems, helping the plant science community design cost-effective experimental strategies to detect host-pathogen PPIs and advancing research on how pathogens infect host cells.

2. Novel Algorithms and Tools for Subcellular Localization Prediction

Determining subcellular localization is important for understanding protein function and is a critical step in any genome annotation. In the past, the trend has been to develop 'general' prediction tools applicable to all organisms (e.g. TargetP, LOCtree, PA-SUB, MultiLoc, WoLF PSORT, Plant-PLoc). In my earlier studies on individual proteomes (Arabidopsis, rice), I found unique genome-specific signals for subcellular localization, and consequently that organism-specific prediction tools perform better than general ones. My approach integrates empirical biological knowledge with machine learning methods to automate localization decisions. Two online tools were developed: one for Arabidopsis (AtSubP, http://bioinfo3.noble.org/AtSubP/) and one for rice (RSLpred, http://www.imtech.res.in/raghava/rslpred/).
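Sequence composition is a common input to SVM-based localization predictors (the RSLpred publication title, for instance, mentions compositional information). The Python sketch below computes amino acid composition (20 values) and dipeptide composition (400 values) for a protein. It illustrates the general feature type only; the function names are hypothetical, and this is not the published feature set of AtSubP or RSLpred.

```python
"""Composition features for a protein sequence (illustrative sketch)."""
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Fraction of each of the 20 standard residues; other letters are ignored."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered pairs of adjacent standard residues."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def composition_features(seq):
    """Concatenate into one 20 + 400 = 420-dimensional feature vector."""
    return aa_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    features = composition_features("MALLKVAGLLA")
    print(len(features))
```

The resulting 420-dimensional vector could be passed to any classifier; an organism-specific model would simply be trained on localization-annotated proteins from a single species.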

These subcellular predictors are actively used by researchers; for example, AtSubP has been integrated into the TAIR database (http://www.arabidopsis.org/), a comprehensive resource for Arabidopsis thaliana, to provide genome-scale subcellular annotations. My finding that species-specific predictors outperform generalist predictors has been confirmed experimentally by other groups [New Phytologist 2013, 200(4): 1022-33]. These predictions have also been tested in the lab with Green Fluorescent Protein (GFP) fusions: about 25 proteins of previously unknown localization were randomly picked from the AtSubP predictions, and their localizations were confirmed in planta (unpublished).

My interest is to develop similar novel algorithms and tools for other important species, including classifiers for dual- and multi-targeted proteins within a cell.

3. Bioinformatics for Bioenergy

This is another area in which we apply integrated, innovative computational approaches toward generating new and recombined metabolism in organisms that may lead to useful products such as biofuels. To aid the discovery of novel biomass-degrading enzymes, we have recently developed a comprehensive prediction system for the organism-wide identification and classification of biomass-degrading enzymes. Support Vector Machines (SVMs), a state-of-the-art machine learning technique, were trained on a known set of ligninase classes (~27 enzyme categories) to build computational models that predict novel lignin-degrading enzymes in (meta)genomes. The models show high prediction performance, with an overall accuracy of 98% and a Matthews Correlation Coefficient (MCC) of 0.84. A web-based prediction tool is available at http://pred.bioinfo.ucr.edu/ligpred/.
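Accuracy and MCC can diverge sharply when one class dominates, which is why reporting both is informative. The Python sketch below computes both from a binary confusion matrix. The counts in the example are invented for illustration and are not the LigPred results. For a multi-class problem such as the ~27 ligninase categories, one common approach is to compute these measures per class (one-vs-rest); how the overall figures above were aggregated is not stated here.

```python
"""Accuracy and Matthews correlation coefficient from a binary confusion matrix."""
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal sum is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Invented, imbalanced counts: 55 positives, 945 negatives.
    tp, tn, fp, fn = 45, 935, 10, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")  # 0.98
    print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")       # 0.808
```

These hypothetical counts give 98% accuracy but an MCC of only about 0.81: because MCC uses all four confusion-matrix cells, it is far more sensitive than accuracy to the 10 misclassified positives in the small class.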

In a related study on laccases, one of the most important classes of lignin-related enzymes, we have developed a two-phase classification system that characterizes laccase subtypes using unsupervised and supervised learning approaches. Laccases (EC 1.10.3.2) are multi-copper oxidases that have gained importance in many industries, including biofuels, pulp production, textile dye bleaching, bioremediation, and food production. Our online tool, LacSubPred (http://lacsubpred.bioinfo.ucr.edu/), is designed specifically to characterize novel laccase subtypes from their physicochemical properties.
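As an illustration of turning sequences into physicochemical feature vectors suitable for clustering (an unsupervised phase) or classification (a supervised phase), the Python sketch below computes the fraction of residues falling into a few broad property groups, plus a crude charge proxy. The grouping, the charge proxy, and the names `physchem_descriptors` and `descriptor_vector` are assumptions made for this example; LacSubPred's actual feature set is not described here.

```python
"""Hypothetical physicochemical descriptors for a protein sequence."""

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# Broad residue groups (an illustrative choice; many other groupings exist).
GROUPS = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "aromatic": set("FWY"),
    "aliphatic": set("AILV"),
    "polar_uncharged": set("STNQ"),
    "sulfur": set("CM"),
}
NAMES = [*GROUPS, "net_charge_proxy"]


def physchem_descriptors(seq):
    """Return group fractions and a crude net-charge proxy, normalized by length."""
    residues = [c for c in seq.upper() if c in STANDARD]
    n = len(residues)
    if n == 0:
        return {name: 0.0 for name in NAMES}
    desc = {name: sum(c in members for c in residues) / n
            for name, members in GROUPS.items()}
    # Crude charge proxy near neutral pH: (K + R) - (D + E); His is ignored.
    positive = sum(c in "KR" for c in residues)
    negative = sum(c in "DE" for c in residues)
    desc["net_charge_proxy"] = (positive - negative) / n
    return desc


def descriptor_vector(seq):
    """Fixed-order list form, convenient as input to clustering or classifiers."""
    d = physchem_descriptors(seq)
    return [d[name] for name in NAMES]


if __name__ == "__main__":
    print(physchem_descriptors("MKKLDEWFHA"))
```

Normalizing by length keeps descriptors comparable between full-length and fragmentary sequences, although short fragments will naturally give noisier values.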

We are interested in applying other computational approaches, e.g. Self-Organizing Maps (SOMs) and k-means clustering, to refine the models so that they apply to full-length as well as metagenomic sequences; in classifying enzymes by the pathways they are involved in; and in seeking collaborations to validate these models under lab and field conditions.
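For reference, the k-means algorithm mentioned above fits in a few lines of Python using Lloyd's iteration: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat until assignments stop changing. This sketch is generic and is not the group's implementation; Euclidean distance and a seeded random initialization are assumptions, and SOMs are not covered.

```python
"""Generic k-means (Lloyd's algorithm) sketch in plain Python."""
import random


def sq_dist(p, q):
    """Squared Euclidean distance (enough for finding the nearest centroid)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length numeric sequences into k groups.

    Returns (labels, centroids). Initial centroids are k input points sampled
    without replacement by a seeded RNG, so results are reproducible.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(pts, 2))
```

In practice, sequence-derived feature vectors would typically be standardized first, and k chosen by comparing cluster quality across several values.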

4. Metagenomics for Crop Improvement

The rapidly emerging field of metagenomics examines the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. One area of interest is relating microbial communities to plant productivity and/or disease. However, characterizing the microbiome's contribution to plant phenotypic traits is challenging because of the difficulty of working in the soil environment, the numerical and functional complexity of microbial communities, and the fact that most microorganisms cannot easily be isolated.

In one of our collaborative projects, we are developing bioinformatics approaches to understand how wheat productivity is associated with its microbiome. Using statistical approaches, our collaborators identified ~800 Operational Taxonomic Units (OTUs) that are positively or negatively associated with wheat productivity. From a bioinformatics perspective, our aim is to develop machine learning algorithms that identify patterns among the positively and negatively associated OTUs, and then to build a web-based database and prediction software to identify and classify OTUs in 'unknown' data. The overall goal is to computationally optimize the productivity potential of a given agricultural soil system based on its microbial community structure and soil characteristics.
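The statistical method our collaborators used is not specified here. As one plausible illustration of labeling OTUs by the direction of their association, the Python sketch below computes the Spearman rank correlation between each OTU's abundance and yield across samples, then labels the OTU positive, negative, or neutral. The threshold, the function names, and the toy data are hypothetical; a real analysis would also need significance testing with multiple-testing correction across hundreds of OTUs.

```python
"""Label OTUs by the sign of their Spearman correlation with yield (illustrative)."""
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def label_otus(abundance, yields, threshold=0.5):
    """abundance: OTU id -> per-sample abundances, in the same sample order as yields."""
    labels = {}
    for otu, counts in abundance.items():
        rho = spearman(counts, yields)
        if rho >= threshold:
            labels[otu] = "positive"
        elif rho <= -threshold:
            labels[otu] = "negative"
        else:
            labels[otu] = "neutral"
    return labels


if __name__ == "__main__":
    yields = [2.1, 2.8, 3.0, 3.6, 4.2]  # toy per-plot yields
    otus = {"OTU_1": [5, 9, 12, 20, 31],
            "OTU_2": [40, 33, 21, 18, 6],
            "OTU_3": [7, 7, 7, 7, 7]}
    print(label_otus(otus, yields))
```

A rank-based measure was chosen for the sketch because OTU abundances are typically skewed and sparse, which makes linear correlation on raw counts fragile.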

Tools Developed

  1. AP-iNET (http://apinet.bioinfo.ucr.edu/): a bioinformatics system for predicting and visualizing genome-wide Protein Interaction Networks (PINs) in the Arabidopsis-Pseudomonas syringae model interaction system.
  2. AtSubP (http://bioinfo3.noble.org/AtSubP/): a highly accurate Arabidopsis Subcellular Localization predictor.
  3. DoBlast (http://bioinfo.okstate.edu:8080/doblast/): a parallelized BLAST server for genome-scale annotation; automated parallel computing lets large-scale sequence analyses finish in minutes.
  4. LacSubPred (http://lacsubpred.bioinfo.ucr.edu/): a two-phase classification system to characterize various laccase subtypes using unsupervised and supervised learning approaches, a useful resource to the biofuel community.
  5. LigPred (http://pred.bioinfo.ucr.edu/ligpred/): a comprehensive prediction system for the identification and classification of enzymes related to the synthesis and degradation of lignin.
  6. PLpred (http://pred.bioinfo.ucr.edu/PLpred/): this online tool first classifies a query protein as plastid or non-plastid, and then assigns the identified plastid proteins to one of four categories: chloroplast, chromoplast, amyloplast, or etioplast proteins.
  7. RSLpred (www.imtech.res.in/raghava/rslpred/): a highly accurate Rice Subcellular Localization predictor.
  8. RB-Pred (www.imtech.res.in/raghava/rbpred/): the first server of its kind worldwide, it forecasts rice leaf blast severity from weather parameters, for general use by plant pathologists and the farming community.
  9. Project (http://www.imtech.res.in/raghava/rslpred/project.html): given a protein sequence or accession number, this tool searches the query sequence for high-hydrophobicity windows that match a user-defined search pattern (e.g. AL???LW). Windows are scored with the Kyte-Doolittle hydropathy scale, using a user-customizable window size and score threshold.
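The pattern-constrained hydrophobicity search described in the last entry can be sketched in Python as follows. The exact semantics of the original tool are not documented here, so this sketch makes loud assumptions: '?' is a single-residue wildcard, a window qualifies only if its mean Kyte-Doolittle score meets the threshold and it fully contains a pattern match, and positions are reported 1-based. The function names are hypothetical.

```python
"""Hypothetical sketch: pattern-constrained Kyte-Doolittle window search."""
import re

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pattern_to_regex(pattern):
    """Assumption: '?' matches any one residue; other letters match literally."""
    return "".join("." if c == "?" else re.escape(c) for c in pattern.upper())


def hydrophobic_windows(seq, pattern, window=19, threshold=1.6):
    """Return (1-based start, segment, mean KD score) for qualifying windows."""
    seq = seq.upper()
    regex = re.compile(pattern_to_regex(pattern))
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        if any(c not in KD for c in segment):
            continue  # skip windows with non-standard residues
        score = sum(KD[c] for c in segment) / window
        # Keep windows that are hydrophobic enough AND contain the pattern.
        if score >= threshold and regex.search(segment):
            hits.append((start + 1, segment, round(score, 2)))
    return hits


if __name__ == "__main__":
    print(hydrophobic_windows("DDDDALLLLLWDDDD", "AL???LW", window=7, threshold=1.0))
```

The example prints `[(5, 'ALLLLLW', 2.84)]`. The defaults of a 19-residue window and a 1.6 threshold follow a choice often used for spotting candidate transmembrane segments; both remain parameters, mirroring the user-customizable settings described above.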

Publications   [ Google Scholar ]   [ PubMed ]   [ ResearchGate ]

    1. Weirick, T., Sahu, S.S., Mahalingam, R. and Kaundal, R.* 2014. LacSubPred: predicting subtypes of Laccases, an important lignin metabolism-related enzyme class, using in silico approaches. BMC Bioinformatics 15(S11): S15.

    2. Sahu, S.S., Weirick, T. and Kaundal, R.* 2014. Predicting genome-scale Arabidopsis-Pseudomonas syringae interactome using domain and interolog-based approaches. BMC Bioinformatics 15(S11): S13.

    3. Kaundal, R.*, Sahu, S.S., Verma, R. and Weirick, T. 2013. Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning. BMC Bioinformatics 14(S14): S7.

    4. Ahmed, F., Kaundal, R. and Raghava, G.P.S. 2013. PHDcleav: A SVM based method for predicting human Dicer cleavage sites using sequence and secondary structure of miRNA precursors. BMC Bioinformatics 14(S14): S9.

    5. Kaundal, R., Saini, R. and Zhao, P.X. 2010. Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis. Plant Physiology 154(1): 36-54.

    6. Benedito, V.A., Li, H., Dai, X., Wandrey, M., He, J., Kaundal, R., Torres-Jerez, I., Gomez, S.K., Harrison, M.J., Tang, Y., Zhao, P.X. and Udvardi, M.K. 2010. Genomic inventory and transcriptional analysis of Medicago truncatula transporters. Plant Physiology 152(3): 1716-1730.

    7. Kaundal, R. and Raghava, G.P.S. 2009. RSLpred: predicting subcellular localization of rice proteins combining compositional and evolutionary information. Proteomics 9(9): 2324-2342.

    8. Kaundal, R., Kapoor, A.S. and Raghava, G.P.S. 2006. Machine learning techniques in disease forecasting: a case study on rice blast prediction. BMC Bioinformatics 7(1): 485.

    9. Kaundal, R.* and Sharma, B.K. 2006. Genotype x environment interaction and stability analysis for yield and other quantitative traits in maize (Zea mays L.) under rainfed and high rainfall valley areas of the sub-montane. Research on Crops 7(1): 171-180.

    10. Kapoor, A.S. and Kaundal, R. 2007. Development of weather based forewarning systems for rice blast. Himachal Journal of Agricultural Research 33(2): 211-217.

    11. Kaundal, R. and Kapoor, A.S. 2005. Virulence pattern of Pyricularia grisea in district Kangra of Himachal Pradesh. Himachal Journal of Agricultural Research 31(2): 170-172.

    12. Kaundal, R.* and Sharma, B.K. 2005. Genetic variability and association studies for different yield components over the environments in elite cultivars of Zea mays L. Himachal Journal of Agricultural Research 31(1): 31-38.

      * Corresponding author    

    Editorials:

    13. Wren, J.D., Dozmorov, M.G., Burian, D., Perkins, A., Zhang, C., Hoyt, P. and Kaundal, R. 2014. Proceedings of the 2014 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 15(S11): S1.

    14. Wren, J.D., Dozmorov, M.G., Burian, D., Kaundal, R., Perkins, A., Perkins, E., Kupfer, D.M. and Springer, G.K. 2013. Proceedings of the 2013 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 14(S14): S1.

    15. Wren, J.D., Dozmorov, M.G., Burian, D., Kaundal, R., Bridges, S. and Kupfer, D.M. 2012. Proceedings of the 2012 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) Conference. BMC Bioinformatics 13(S15): S1.

    Book Chapters / Bulletins:

    16. Azad, R.K., Mishra, N., Ahmed, F. and Kaundal, R.* 2013. Bioinformatics approaches to deciphering alien gene transfer: a comprehensive analysis. In: Pratap, A. and Kumar, J. (Eds.) Alien Gene Transfer in Crop Plants: Innovations, Methods and Risk Assessment, Vol I. Springer Science+Business Media, USA (invited chapter).

    17. Reddy, C.S., Susheela, K., Kapoor, A.S., Kaundal, R., Krishnaiah, N.V., Mishra, B., Ramakrishna, Y.S., Prasad, Y.G., Reddy, D.Y. and Prabhakar, M. 2004. Forewarning Rice Blast in India, Technical Bulletin No. 9, 2004-2005, Directorate of Rice Research, Rajendranagar, Hyderabad (AP), India, 46 pp.

    18. Contributed full chapter on rice blast forewarning in the book entitled “Weather based forewarning for crop pests and diseases”; Kalyani Publishers, New Delhi, India.
