Susmita Datta

Professor

Department: PHHP-COM BIOSTATISTICS
Business Phone: (352) 294-5923
Business Email: susmita.datta@ufl.edu

About Susmita Datta

Professional Biography

Susmita Datta received her PhD in Statistics from the University of Georgia, Athens, Georgia, USA, followed by postdoctoral training in Biostatistics at Emory University. She joined the Department of Biostatistics at the University of Florida in 2015 as a tenured Full Professor under a Preeminent hire. Before that, she was a Distinguished Scholar and tenured Full Professor at the University of Louisville and a tenured Associate Professor at Georgia State University.

She is a fellow of the American Statistical Association (ASA), an elected member of the International Statistical Institute (ISI), and a fellow of the American Association for the Advancement of Science (AAAS). She is one of the three elected members of the Board of Trustees of the International Indian Statistical Association (IISA), an elected RECOMB member of ENAR of the Biometric Society, and was the elected President of the Caucus for Women in Statistics in 2013.

Her research areas include Biostatistics and Bioinformatics/Computational Biology. Her research contributions span all ‘omics’-related high-dimensional data, such as RNA sequencing, single-cell RNA sequencing, mass spectrometry data for proteomics, lipidomics, and metabolomics, and good old microarray data. In addition, her computing laboratory is involved in methodological and software development in clustering and classification techniques, statistical issues in population biology, systems biology, survival analysis, multi-state models, and big data analytics.

She published the book “Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry” with Springer. Dr. Datta has published widely, with more than 100 articles in peer-reviewed journals. The National Science Foundation and the National Institutes of Health have continuously funded her work. Her constant involvement with big and fat data made her interested in data science. She has guided more than 47 students through their theses and dissertations. She promotes women in STEM fields.

Accomplishments

Elected RECOMB Member
2021-2023 · ENAR Biometric Society
Elected member of the Board of Trustees
2020-2023 · International Indian Statistical Association (IISA)
Fellow
2014 · American Association for the Advancement of Science (AAAS)
Elected President
2013 · Caucus for Women in Statistics
Fellow
2012 · American Statistical Association (ASA)
Fellow
2010 · International Statistical Institute (ISI)

Research Profile

Methodological: Bioinformatics, Clustering and Classification, Genomics, Proteomics, Lipidomics, Single-cell RNA sequencing data analysis, Network analysis, Infectious Disease Modeling, Non-linear Regression modeling for Systems Biology, Statistical Issues in Population Biology, Statistical Genetics, Systems Biology, Survival Analysis, and Multi-state models.

Disease: Cancer, Autism, Alzheimer’s, Parkinson’s, and infectious diseases such as AIDS, COVID-19, and Zika-related diseases.

Expertise:
• Biostatistics
• Bioinformatics/Computational Biology
• Genomics
• Proteomics
• Metabolomics/Lipidomics
• Clustering and Classification
• Population Biology
• Survival Analysis
• Nonparametrics
• Personalized Medicine
• Complex disease modeling and Biomarker identification in Cancer, Alzheimer’s, Pain, and infectious diseases

Open Researcher and Contributor ID (ORCID)

0000-0002-7408-699X

Publications

2021
A hypolipoprotein sepsis phenotype indicates reduced lipoprotein antioxidant capacity, increased endothelial dysfunction and organ failure, and worse clinical outcomes.
Critical care (London, England). 25(1) [DOI] 10.1186/s13054-021-03757-5. [PMID] 34535154.
2021
Body fatness and breast cancer risk in relation to phosphorylated mTOR expression in a sample of predominately Black women.
Breast cancer research : BCR. 23(1) [DOI] 10.1186/s13058-021-01458-z. [PMID] 34330319.
2021
Magnesium dietary intake and physical activity in Type 2 diabetes by gender in White, African‐American and Mexican American: NHANES 2011‐2014
Endocrinology, Diabetes & Metabolism. 4(1) [DOI] 10.1002/edm2.203. [PMID] 33532626.
2021
SAREV: A review on statistical analytics of single‐cell RNA sequencing data
WIREs Computational Statistics. [DOI] 10.1002/wics.1558.
2021
Unraveling City-Specific Microbial Signatures and Identifying Sample Origins for the Data From CAMDA 2020 Metagenomic Geolocation Challenge.
Frontiers in genetics. 12 [DOI] 10.3389/fgene.2021.659650. [PMID] 34421984.
2021
Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge
Biology Direct. 16(1) [DOI] 10.1186/s13062-020-00284-1. [PMID] 33397406.
2020
A sparse Bayesian factor model for the construction of gene co-expression networks from single-cell RNA sequencing count data.
BMC bioinformatics. 21(1) [DOI] 10.1186/s12859-020-03707-y. [PMID] 32811424.
2020
Body fatness and mTOR pathway activation of breast cancer in the Women’s Circle of Health Study.
NPJ breast cancer. 6 [DOI] 10.1038/s41523-020-00187-4. [PMID] 33024820.
2020
COVID-19: Reduced Lung Function and Increased Psycho-emotional Stress.
Bioinformation. 16(4):293-296 [DOI] 10.6026/97320630016293. [PMID] 32773987.
2020
Does Community- or University-Based Residency Sponsorship Affect Graduate Perceived Preparation or Performance?
Journal of graduate medical education. 12(5):583-590 [DOI] 10.4300/JGME-D-19-00907.1. [PMID] 33149828.
2020
Early administration of steroids in the ambulance setting: Protocol for a type I hybrid effectiveness-implementation trial with a stepped wedge design.
Contemporary clinical trials. 97 [DOI] 10.1016/j.cct.2020.106141. [PMID] 32931918.
2020
Meta-analysis of cardiomyopathy-associated variants in troponin genes identifies loci and intragenic hot spots that are associated with worse clinical outcomes.
Journal of molecular and cellular cardiology. 142:118-125 [DOI] 10.1016/j.yjmcc.2020.04.005. [PMID] 32278834.
2020
Topical doxycycline monohydrate hydrogel 1% targeting proteases/PAR2 pathway is a novel therapeutic for atopic dermatitis.
Experimental dermatology. 29(12):1171-1175 [DOI] 10.1111/exd.14201. [PMID] 32997843.
2019
Bayesian Hierarchical Model for Protein Identifications.
Journal of applied statistics. 46(1):30-46 [DOI] 10.1080/02664763.2018.1454893. [PMID] 31105371.
2019
Detection of differentially expressed genes in discrete single-cell RNA sequencing data using a hurdle model with correlated random effects.
Biometrics. 75(4):1051-1062 [DOI] 10.1111/biom.13074. [PMID] 31009065.
2019
Identification of city specific important bacterial signature for the MetaSUB CAMDA challenge microbiome data.
Biology direct. 14(1) [DOI] 10.1186/s13062-019-0243-z. [PMID] 31340852.
2019
Medical schools, primary care and family medicine: clerkship directors’ perceptions of the current environment.
Family practice. 36(6):680-684 [DOI] 10.1093/fampra/cmz015. [PMID] 31329866.
2019
Membrane proteomic analysis reveals overlapping and independent functions of Streptococcus mutans Ffh, YidC1, and YidC2.
Molecular oral microbiology. 34(4):131-152 [DOI] 10.1111/omi.12261. [PMID] 31034136.
2019
What Are the Characteristics of Fourth-Year Medical Students With Higher Levels of Resilience?
PRiMER (Leawood, Kan.). 3 [DOI] 10.22454/PRiMER.2019.150381. [PMID] 32537593.
2018
A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-Generation Sequencing Count Data.
IEEE/ACM transactions on computational biology and bioinformatics. 15(3):760-773 [DOI] 10.1109/TCBB.2017.2665495. [PMID] 28186904.
2018
Pilot Study of Metabolomics and Psychoneurological Symptoms in Women With Early Stage Breast Cancer.
Biological research for nursing. 20(2):227-236 [DOI] 10.1177/1099800417747411. [PMID] 29258398.
2018
Predicting survival times for neuroblastoma patients using RNA-seq expression profiles.
Biology direct. 13(1) [DOI] 10.1186/s13062-018-0213-x. [PMID] 29848365.
2018
Profiling the effects of short time-course cold ischemia on tumor protein phosphorylation using a Bayesian approach.
Biometrics. 74(1):331-341 [DOI] 10.1111/biom.12742. [PMID] 28742267.
2018
Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles.
Biology direct. 13(1) [DOI] 10.1186/s13062-018-0215-8. [PMID] 29789016.
2017
A novel statistical approach for identification of the master regulator transcription factor.
BMC bioinformatics. 18(1) [DOI] 10.1186/s12859-017-1499-x. [PMID] 28148240.
2017
EAMA: Empirically adjusted meta-analysis for large-scale simultaneous hypothesis testing in genomic experiments
PLOS ONE. 12(10) [DOI] 10.1371/journal.pone.0187287. [PMID] 29088275.
2017
Monotonic single-index models to assess drug interactions.
Statistics in medicine. 36(4):655-670 [DOI] 10.1002/sim.7158. [PMID] 27804146.
2017
optCluster: An R Package for Determining the Optimal Clustering Algorithm.
Bioinformation. 13(3):101-103 [DOI] 10.6026/97320630013101. [PMID] 28584451.
2017
Temporal Prediction of Future State Occupation in a Multistate Model from High-Dimensional Baseline Covariates via Pseudo-Value Regression.
Journal of statistical computation and simulation. 87(7):1363-1378 [DOI] 10.1080/00949655.2016.1263992. [PMID] 29217870.
2016
Exploring the importance of cancer pathways by meta-analysis of differential protein expression networks in three different cancers.
Biology direct. 11(1) [PMID] 27993151.
2016
Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms.
Briefings in bioinformatics. 17(2):262-9 [DOI] 10.1093/bib/bbv043. [PMID] 26141827.
2016
Inter-platform concordance of gene expression data for the prediction of chemical mode of action.
Biology direct. 11(1) [PMID] 27993158.
2014
Differential network analysis in human cancer research.
Current pharmaceutical design. 20(1):4-10 [PMID] 23530503.
2014
dna: An R package for differential network analysis.
Bioinformation. 10(4):233-4 [DOI] 10.6026/97320630010233. [PMID] 24966526.
2013
Feature selection and machine learning with mass spectrometry data.
Methods in molecular biology (Clifton, N.J.). 1007:237-62 [DOI] 10.1007/978-1-62703-392-3_10. [PMID] 23666729.
2013
Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.
Nature genetics. 45(9):984-94 [DOI] 10.1038/ng.2711. [PMID] 23933821.
2013
svapls: an R package to correct for hidden factors of variability in gene expression studies.
BMC bioinformatics. 14 [DOI] 10.1186/1471-2105-14-236. [PMID] 23883280.
2012
Identification and characterization of nucleolin as a COUP-TFII coactivator of retinoic acid receptor β transcription in breast cancer cells.
PloS one. 7(5) [DOI] 10.1371/journal.pone.0038278. [PMID] 22693611.
2012
Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies.
Bioinformatics (Oxford, England). 28(6):799-806 [DOI] 10.1093/bioinformatics/bts022. [PMID] 22238271.
2011
Meta analysis of Chronic Fatigue Syndrome through integration of clinical, gene expression, SNP and proteomic data.
Bioinformation. 6(3):120-4 [PMID] 21584188.
2011
Modeling microRNA-mRNA interactions using PLS regression in human colon cancer.
BMC medical genomics. 4 [DOI] 10.1186/1755-8794-4-44. [PMID] 21595958.
2011
pkDACLASS: Open source software for analyzing MALDI-TOF data.
Bioinformation. 6(1):45-7 [PMID] 21464846.
2011
Statistical inference methods for sparse biological time series data.
BMC systems biology. 5 [DOI] 10.1186/1752-0509-5-57. [PMID] 21518445.
2010
A statistical framework for differential network analysis from microarray data.
BMC bioinformatics. 11 [DOI] 10.1186/1471-2105-11-95. [PMID] 20170493.
2010
An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data.
BMC bioinformatics. 11 [DOI] 10.1186/1471-2105-11-427. [PMID] 20716381.
2010
Feature selection and machine learning with mass spectrometry data.
Methods in molecular biology (Clifton, N.J.). 593:205-29 [DOI] 10.1007/978-1-60327-194-3_11. [PMID] 19957152.
2010
Statistical Analyses of Next Generation Sequence Data: A Partial Overview.
Journal of proteomics & bioinformatics. 3(6):183-190 [PMID] 21113236.
2009
Computational biology touches all bases. A report of the 6th Annual Rocky Mountain Bioinformatics Conference, Aspen, USA, 4-7 December 2008.
Genome biology. 10(2) [DOI] 10.1186/gb-2009-10-2-303. [PMID] 19232078.
2009
RankAggreg, an R package for weighted rank aggregation.
BMC bioinformatics. 10 [DOI] 10.1186/1471-2105-10-62. [PMID] 19228411.
2008
Fetal alcohol syndrome (FAS) in C57BL/6 mice detected through proteomics screening of the amniotic fluid.
Birth defects research. Part A, Clinical and molecular teratology. 82(4):177-86 [DOI] 10.1002/bdra.20440. [PMID] 18240165.
2008
Finding common genes in multiple cancer types through meta-analysis of microarray experiments: a rank aggregation approach.
Genomics. 92(6):400-3 [DOI] 10.1016/j.ygeno.2008.05.003. [PMID] 18565726.
2008
Reconstruction of genetic association networks from microarray data: a partial least squares approach.
Bioinformatics (Oxford, England). 24(4):561-8 [DOI] 10.1093/bioinformatics/btm640. [PMID] 18204062.
2007
Incorporation of biological knowledge into distance for clustering genes.
Bioinformation. 1(10):396-405 [PMID] 17597929.
2007
Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO.
Biometrics. 63(1):259-71 [PMID] 17447952.
2007
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
Bioinformatics (Oxford, England). 23(13):1607-15 [PMID] 17483500.
2006
Biologically supervised hierarchical clustering algorithms for gene expression data.
Conference proceedings : … Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference. 2006:5515-8 [PMID] 17947147.
2006
Evaluation of clustering algorithms for gene expression data.
BMC bioinformatics. 7 Suppl 4 [PMID] 17217509.
2006
Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.
BMC bioinformatics. 7 [PMID] 16945146.
Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge
[DOI] 10.21203/rs.2.20675/v1.

Grants

Sep 2021 ACTIVE
Informing the Emergency Care of Septic Shock Patients: A Novel Application of Data-Driven Analytics
Role: Other
Funding: NATL INST OF HLTH NIGMS
Jul 2021 ACTIVE
A novel mechanism of virulence control in Porphyromonas gingivalis
Role: Co-Investigator
Funding: NATL INST OF HLTH NIDCR
Feb 2021 ACTIVE
Energy Balance, mTOR pathway signaling, and breast cancer prognosis
Role: Co-Investigator
Funding: NATL INST OF HLTH NCI
Oct 2020 ACTIVE
Health Services Utilization of Autistic Youth: Are Therapeutic Services Associated with Reduced Acute Psychiatric Care?
Role: Principal Investigator
Funding: UNIV OF SOUTHERN CALIFORNIA via AMERICAN OCCUPATIONAL THERAPY FOUNDATION
Jul 2020 ACTIVE
Circulating microbiome and premature mortality in hemodialysis patients
Role: Principal Investigator
Funding: UNIV OF TENNESSEE KNOXVILLE via NATL INST OF HLTH NIDDK
Apr 2020 ACTIVE
The Role and Mechanisms of Lipid and Lipoprotein Dysregulation in Sepsis
Role: Co-Investigator
Funding: NATL INST OF HLTH NIGMS
Mar 2018 ACTIVE
OA Pathogenesis beyond Cartilage: A preclinical study of the sources of OA pain
Role: Co-Investigator
Funding: NATL INST OF HLTH NIAMS
Aug 2017 ACTIVE
Mechanisms of airway protection dysfunction in Parkinson's disease
Role: Project Manager
Funding: NATL INST OF HLTH NICHD
Dec 2015 ACTIVE
MEMBRANES OF THE DENTAL PATHOGEN STREPTOCOCCUS MUTANS
Role: Project Manager
Funding: NATL INST OF HLTH NIDCR
Sep 2015 – Feb 2017
Identification of Proteins from Mass Spectrometry Data: A Statistical Approach
Role: Principal Investigator
Funding: UNIV OF LOUISVILLE via NATL INST OF HLTH
Sep 2015 – Aug 2016
Novel biomarker validation and dosing algorithms for anemia management in ESRD
Role: Principal Investigator
Funding: UNIVERSITY OF LOUISVILLE RES FOU via NATL INST OF HLTH NIDDK
Jul 2015 – Jun 2021
Finding Good TEMporal PostOperative pain Signatures (TEMPOS)
Role: Project Manager
Funding: NATL INST OF HLTH NIGMS

Education

PhD
1995 · University of Georgia
Postdoctoral Associate
1995 · Emory University

Teaching Profile

Courses Taught
2015-2021
PHC7979 Advanced Research
2016-2021
PHC7091 Advanced Biostatistical Methods II
2016-2017, 2020
PHC7980 Research for Doctoral Dissertation
2018
CHM7980 Research for Doctoral Dissertation
2018
PHC6905 Independent Study
2016
PHC6917 Supervised Research Project

Contact Details

Phones:
Business: (352) 294-5923
Emails:
Business: susmita.datta@ufl.edu