Susmita Datta

Professor

Department: PHHP-COM BIOSTATISTICS
Business Phone: (352) 294-5923
Business Email: susmita.datta@ufl.edu

About Susmita Datta

Professional Biography
Susmita Datta received her PhD in Statistics from the University of Georgia, Athens, Georgia, USA, followed by postdoctoral training in Biostatistics at Emory University. She joined the Department of Biostatistics at the University of Florida in 2015 as a Preeminent hire and tenured Full Professor. Before that, she was a Distinguished Scholar and tenured Full Professor at the University of Louisville and a tenured Associate Professor at Georgia State University. She is a fellow of the American Statistical Association (ASA), an elected member of the International Statistical Institute (ISI), and a fellow of the American Association for the Advancement of Science (AAAS). She is one of the three elected members of the Board of Trustees of the International Indian Statistical Association (IISA), an elected RECOMB member of ENAR of the Biometric Society, and was the elected President of the Caucus for Women in Statistics in 2013. Her research areas include Biostatistics and Bioinformatics/Computational Biology. Her research contributions span all ‘omics’-related high-dimensional data, such as RNA sequencing, single-cell RNA sequencing, and mass spectrometry data for proteomics, lipidomics and metabolomics, as well as good old microarray data. In addition, her computing laboratory is involved in methodological and software development in clustering and classification techniques, statistical issues in population biology, systems biology, survival analysis, multi-state models and big data analytics. She has published a book, “Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry”, with Springer. Dr. Datta has published more than 100 articles in peer-reviewed journals. The National Science Foundation and the National Institutes of Health have continuously funded her work. Her constant involvement with big and fat data led to her interest in data science. She has guided more than 47 students through their theses and dissertations. She promotes women in STEM fields.

Accomplishments

Affiliated Faculty Appointment
2023 · Department of Statistics at the University of Florida
Elected RECOMB Member
2021-2023 · ENAR Biometric Society
Elected member of the Board of Trustees
2020-2023 · International Indian Statistical Association (IISA)
Fellow
2014 · American Association for the Advancement of Science (AAAS)
Elected President
2013 · Caucus for Women in Statistics
Fellow
2012 · American Statistical Association (ASA)
Fellow
2010 · International Statistical Institute (ISI)

Teaching Profile

Courses Taught
2015-2024
PHC7979 Advanced Research
2016-2024
PHC7091 Advanced Biostatistical Methods II
2016-2017,2020-2023,2022-2024
PHC7980 Research for Doctoral Dissertation
2018
CHM7980 Research for Doctoral Dissertation
2018,2022-2023
PHC6905 Independent Study
2016
PHC6917 Supervised Research Project

Research Profile

Methodological: Bioinformatics, Clustering and Classification, Genomics, Proteomics, Lipidomics, Single-cell RNA sequencing data analysis, Network analysis, Infectious Disease Modeling, Non-linear Regression modeling for Systems Biology, Statistical Issues in Population Biology, Statistical Genetics, Systems Biology, Survival Analysis and Multi-state models.
Disease: Cancer, Autism, Alzheimer’s, Parkinson’s, and infectious diseases such as AIDS, COVID-19 and Zika-related diseases.
Expertise:
• Biostatistics
• Bioinformatics/Computational Biology
• Genomics
• Proteomics
• Metabolomics/Lipidomics
• Clustering and Classification
• Population Biology
• Survival Analysis
• Nonparametrics
• Personalized Medicine
• Complex disease modeling and Biomarker identification in Cancer, Alzheimer’s, Pain and infectious diseases

Open Researcher and Contributor ID (ORCID)

0000-0002-7408-699X

Publications

2023
Adaptive Sparse Multi-Block PLS Discriminant Analysis: An Integrative Method for Identifying Key Biomarkers from Multi-Omics Data.
Genes. 14(5) [DOI] 10.3390/genes14050961. [PMID] 37239321.
2023
asmbPLS: Adaptive Sparse Multi-block Partial Least Square for Survival Prediction using Multi-Omics Data.
bioRxiv : the preprint server for biology. [DOI] 10.1101/2023.04.03.535442. [PMID] 37066143.
2023
Association Between Recreational Physical Activity and mTOR Signaling Pathway Protein Expression in Breast Tumor Tissue
Cancer Research Communications. 3(3):395-403 [DOI] 10.1158/2767-9764.crc-22-0405.
2023
Association of air pollution with postmenopausal breast cancer risk in UK Biobank.
Breast cancer research : BCR. 25(1) [DOI] 10.1186/s13058-023-01681-w. [PMID] 37443054.
2023
Clustering single-cell multimodal omics data with jrSiCKLSNMF.
Frontiers in genetics. 14 [DOI] 10.3389/fgene.2023.1179439. [PMID] 37359367.
2023
DHCR7 Expression Predicts Poor Outcomes and Mortality From Sepsis.
Critical care explorations. 5(6) [DOI] 10.1097/CCE.0000000000000929. [PMID] 37332366.
2023
DHCR7 Expression Predicts Poor Outcomes and Mortality from Sepsis.
Research square. [DOI] 10.21203/rs.3.rs-2500497/v1. [PMID] 36778468.
2023
Inferring Cell–Cell Communications from Spatially Resolved Transcriptomics Data Using a Bayesian Tweedie Model
Genes. 14(7) [DOI] 10.3390/genes14071368. [PMID] 37510272.
2023
mTOR pathway candidate genes and energy intake interaction on breast cancer risk in Black women from the Women’s Circle of Health Study.
European journal of nutrition. 62(6):2593-2604 [DOI] 10.1007/s00394-023-03176-y. [PMID] 37209192.
2023
mTOR pathway candidate genes and obesity interaction on breast cancer risk in black women from the Women’s Circle of Health Study.
Cancer causes & control : CCC. 34(5):431-447 [DOI] 10.1007/s10552-022-01657-9. [PMID] 36790512.
2023
mTOR pathway candidate genes and physical activity interaction on breast cancer risk in black women from the women’s circle of health study.
Breast cancer research and treatment. 199(1):137-146 [DOI] 10.1007/s10549-023-06902-6. [PMID] 36882608.
2022
Factors affecting live birth rates in donor oocytes from commercial egg banks vs. program egg donors: an analysis of 40,485 cycles from the Society for Assisted Reproductive Technology registry in 2016-2018.
Fertility and sterility. 117(2):339-348 [DOI] 10.1016/j.fertnstert.2021.10.006. [PMID] 34802685.
2022
Impact of Skin Biopsy and Clinical-Pathologic Correlation in Dermatology Inpatient Consults.
Cureus. 14(8) [DOI] 10.7759/cureus.28534. [PMID] 36185900.
2022
Intraindividual Reliability of Opportunistic Computed Tomography-Assessed Adiposity and Skeletal Muscle Among Breast Cancer Patients.
JNCI cancer spectrum. 6(6) [DOI] 10.1093/jncics/pkac068. [PMID] 36222575.
2022
MarZIC: A Marginal Mediation Model for Zero-Inflated Compositional Mediators with Applications to Microbiome Data.
Genes. 13(6) [DOI] 10.3390/genes13061049. [PMID] 35741811.
2022
mTOR pathway gene expression in association with race and clinicopathological characteristics in Black and White breast cancer patients.
Discover. Oncology. 13(1) [DOI] 10.1007/s12672-022-00497-y. [PMID] 35608730.
2022
SAREV: A review on statistical analytics of single‐cell RNA sequencing data
WIREs Computational Statistics. 14(4) [DOI] 10.1002/wics.1558. [PMID] 36034329.
2022
Unraveling T Cell Responses for Long Term Protection of SARS-CoV-2 Infection.
Frontiers in genetics. 13 [DOI] 10.3389/fgene.2022.871164. [PMID] 35601483.
2021
A hypolipoprotein sepsis phenotype indicates reduced lipoprotein antioxidant capacity, increased endothelial dysfunction and organ failure, and worse clinical outcomes.
Critical care (London, England). 25(1) [DOI] 10.1186/s13054-021-03757-5. [PMID] 34535154.
2021
Body fatness and breast cancer risk in relation to phosphorylated mTOR expression in a sample of predominately Black women.
Breast cancer research : BCR. 23(1) [DOI] 10.1186/s13058-021-01458-z. [PMID] 34330319.
2021
Flexible Nasal Endoscopic Procedures in Family Medicine: Indications and Effectiveness.
Family medicine. 53(10):886-889 [DOI] 10.22454/FamMed.2021.332061. [PMID] 34780657.
2021
Magnesium dietary intake and physical activity in Type 2 diabetes by gender in White, African‐American and Mexican American: NHANES 2011‐2014
Endocrinology, Diabetes & Metabolism. 4(1) [DOI] 10.1002/edm2.203. [PMID] 33532626.
2021
Single-Cell Differential Network Analysis with Sparse Bayesian Factor Models.
Frontiers in genetics. 12 [DOI] 10.3389/fgene.2021.810816. [PMID] 35186014.
2021
Unraveling City-Specific Microbial Signatures and Identifying Sample Origins for the Data From CAMDA 2020 Metagenomic Geolocation Challenge.
Frontiers in genetics. 12 [DOI] 10.3389/fgene.2021.659650. [PMID] 34421984.
2021
Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge
Biology Direct. 16(1) [DOI] 10.1186/s13062-020-00284-1. [PMID] 33397406.
2020
A sparse Bayesian factor model for the construction of gene co-expression networks from single-cell RNA sequencing count data.
BMC bioinformatics. 21(1) [DOI] 10.1186/s12859-020-03707-y. [PMID] 32811424.
2020
Body fatness and mTOR pathway activation of breast cancer in the Women’s Circle of Health Study.
NPJ breast cancer. 6 [DOI] 10.1038/s41523-020-00187-4. [PMID] 33024820.
2020
COVID-19: Reduced Lung Function and Increased Psycho-emotional Stress.
Bioinformation. 16(4):293-296 [DOI] 10.6026/97320630016293. [PMID] 32773987.
2020
Does Community- or University-Based Residency Sponsorship Affect Graduate Perceived Preparation or Performance?
Journal of graduate medical education. 12(5):583-590 [DOI] 10.4300/JGME-D-19-00907.1. [PMID] 33149828.
2020
Early administration of steroids in the ambulance setting: Protocol for a type I hybrid effectiveness-implementation trial with a stepped wedge design.
Contemporary clinical trials. 97 [DOI] 10.1016/j.cct.2020.106141. [PMID] 32931918.
2020
Meta-analysis of cardiomyopathy-associated variants in troponin genes identifies loci and intragenic hot spots that are associated with worse clinical outcomes.
Journal of molecular and cellular cardiology. 142:118-125 [DOI] 10.1016/j.yjmcc.2020.04.005. [PMID] 32278834.
2020
Topical doxycycline monohydrate hydrogel 1% targeting proteases/PAR2 pathway is a novel therapeutic for atopic dermatitis.
Experimental dermatology. 29(12):1171-1175 [DOI] 10.1111/exd.14201. [PMID] 32997843.
2019
Bayesian Hierarchical Model for Protein Identifications.
Journal of applied statistics. 46(1):30-46 [DOI] 10.1080/02664763.2018.1454893. [PMID] 31105371.
2019
Detection of differentially expressed genes in discrete single-cell RNA sequencing data using a hurdle model with correlated random effects.
Biometrics. 75(4):1051-1062 [DOI] 10.1111/biom.13074. [PMID] 31009065.
2019
Identification of city specific important bacterial signature for the MetaSUB CAMDA challenge microbiome data.
Biology direct. 14(1) [DOI] 10.1186/s13062-019-0243-z. [PMID] 31340852.
2019
Medical schools, primary care and family medicine: clerkship directors’ perceptions of the current environment.
Family practice. 36(6):680-684 [DOI] 10.1093/fampra/cmz015. [PMID] 31329866.
2019
Membrane proteomic analysis reveals overlapping and independent functions of Streptococcus mutans Ffh, YidC1, and YidC2.
Molecular oral microbiology. 34(4):131-152 [DOI] 10.1111/omi.12261. [PMID] 31034136.
2019
What Are the Characteristics of Fourth-Year Medical Students With Higher Levels of Resilience?
PRiMER (Leawood, Kan.). 3 [DOI] 10.22454/PRiMER.2019.150381. [PMID] 32537593.
2018
A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-Generation Sequencing Count Data.
IEEE/ACM transactions on computational biology and bioinformatics. 15(3):760-773 [DOI] 10.1109/TCBB.2017.2665495. [PMID] 28186904.
2018
Pilot Study of Metabolomics and Psychoneurological Symptoms in Women With Early Stage Breast Cancer.
Biological research for nursing. 20(2):227-236 [DOI] 10.1177/1099800417747411. [PMID] 29258398.
2018
Predicting survival times for neuroblastoma patients using RNA-seq expression profiles.
Biology direct. 13(1) [DOI] 10.1186/s13062-018-0213-x. [PMID] 29848365.
2018
Profiling the effects of short time-course cold ischemia on tumor protein phosphorylation using a Bayesian approach.
Biometrics. 74(1):331-341 [DOI] 10.1111/biom.12742. [PMID] 28742267.
2018
Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles.
Biology direct. 13(1) [DOI] 10.1186/s13062-018-0215-8. [PMID] 29789016.
2017
A novel statistical approach for identification of the master regulator transcription factor.
BMC bioinformatics. 18(1) [DOI] 10.1186/s12859-017-1499-x. [PMID] 28148240.
2017
EAMA: Empirically adjusted meta-analysis for large-scale simultaneous hypothesis testing in genomic experiments
PLOS ONE. 12(10) [DOI] 10.1371/journal.pone.0187287. [PMID] 29088275.
2017
Monotonic single-index models to assess drug interactions.
Statistics in medicine. 36(4):655-670 [DOI] 10.1002/sim.7158. [PMID] 27804146.
2017
optCluster: An R Package for Determining the Optimal Clustering Algorithm.
Bioinformation. 13(3):101-103 [DOI] 10.6026/97320630013101. [PMID] 28584451.
2017
Temporal Prediction of Future State Occupation in a Multistate Model from High-Dimensional Baseline Covariates via Pseudo-Value Regression.
Journal of statistical computation and simulation. 87(7):1363-1378 [DOI] 10.1080/00949655.2016.1263992. [PMID] 29217870.
2016
Exploring the importance of cancer pathways by meta-analysis of differential protein expression networks in three different cancers.
Biology direct. 11(1) [PMID] 27993151.
2016
Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms.
Briefings in bioinformatics. 17(2):262-9 [DOI] 10.1093/bib/bbv043. [PMID] 26141827.
2016
Inter-platform concordance of gene expression data for the prediction of chemical mode of action.
Biology direct. 11(1) [PMID] 27993158.
2014
Differential network analysis in human cancer research.
Current pharmaceutical design. 20(1):4-10 [PMID] 23530503.
2014
dna: An R package for differential network analysis.
Bioinformation. 10(4):233-4 [DOI] 10.6026/97320630010233. [PMID] 24966526.
2013
Feature selection and machine learning with mass spectrometry data.
Methods in molecular biology (Clifton, N.J.). 1007:237-62 [DOI] 10.1007/978-1-62703-392-3_10. [PMID] 23666729.
2013
svapls: an R package to correct for hidden factors of variability in gene expression studies.
BMC bioinformatics. 14 [DOI] 10.1186/1471-2105-14-236. [PMID] 23883280.
2012
Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies.
Bioinformatics (Oxford, England). 28(6):799-806 [DOI] 10.1093/bioinformatics/bts022. [PMID] 22238271.
2011
Meta analysis of Chronic Fatigue Syndrome through integration of clinical, gene expression, SNP and proteomic data.
Bioinformation. 6(3):120-4 [PMID] 21584188.
2011
Modeling microRNA-mRNA interactions using PLS regression in human colon cancer.
BMC medical genomics. 4 [DOI] 10.1186/1755-8794-4-44. [PMID] 21595958.
2011
pkDACLASS: Open source software for analyzing MALDI-TOF data.
Bioinformation. 6(1):45-7 [PMID] 21464846.
2011
Statistical inference methods for sparse biological time series data.
BMC systems biology. 5 [DOI] 10.1186/1752-0509-5-57. [PMID] 21518445.
2010
A statistical framework for differential network analysis from microarray data.
BMC bioinformatics. 11 [DOI] 10.1186/1471-2105-11-95. [PMID] 20170493.
2010
An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data.
BMC bioinformatics. 11 [DOI] 10.1186/1471-2105-11-427. [PMID] 20716381.
2010
Feature selection and machine learning with mass spectrometry data.
Methods in molecular biology (Clifton, N.J.). 593:205-29 [DOI] 10.1007/978-1-60327-194-3_11. [PMID] 19957152.
2010
Statistical Analyses of Next Generation Sequence Data: A Partial Overview.
Journal of proteomics & bioinformatics. 3(6):183-190 [PMID] 21113236.
2009
Computational biology touches all bases. A report of the 6th Annual Rocky Mountain Bioinformatics Conference, Aspen, USA, 4-7 December 2008.
Genome biology. 10(2) [DOI] 10.1186/gb-2009-10-2-303. [PMID] 19232078.
2009
RankAggreg, an R package for weighted rank aggregation.
BMC bioinformatics. 10 [DOI] 10.1186/1471-2105-10-62. [PMID] 19228411.
2008
Fetal alcohol syndrome (FAS) in C57BL/6 mice detected through proteomics screening of the amniotic fluid.
Birth defects research. Part A, Clinical and molecular teratology. 82(4):177-86 [DOI] 10.1002/bdra.20440. [PMID] 18240165.
2008
Finding common genes in multiple cancer types through meta-analysis of microarray experiments: a rank aggregation approach.
Genomics. 92(6):400-3 [DOI] 10.1016/j.ygeno.2008.05.003. [PMID] 18565726.
2008
Reconstruction of genetic association networks from microarray data: a partial least squares approach.
Bioinformatics (Oxford, England). 24(4):561-8 [DOI] 10.1093/bioinformatics/btm640. [PMID] 18204062.
2007
Incorporation of biological knowledge into distance for clustering genes.
Bioinformation. 1(10):396-405 [PMID] 17597929.
2007
Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO.
Biometrics. 63(1):259-71 [PMID] 17447952.
2007
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
Bioinformatics (Oxford, England). 23(13):1607-15 [PMID] 17483500.
2006
Biologically supervised hierarchical clustering algorithms for gene expression data.
Conference proceedings : … Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference. 2006:5515-8 [PMID] 17947147.
2006
Evaluation of clustering algorithms for gene expression data.
BMC bioinformatics. 7 Suppl 4(Suppl 4) [PMID] 17217509.
2006
Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.
BMC bioinformatics. 7 [PMID] 16945146.
Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge
[DOI] 10.21203/rs.2.20675/v1.

Grants

Aug 2023 ACTIVE
Clinical, Biochemical, and Microbiological Effects of Constipation Treatment in Patients with Chronic Kidney Disease: A Pilot Feasibility Trial
Role: Principal Investigator
Funding: UNIV OF TENNESSEE HEALTH SCIENCE CTR via NATL INST OF HLTH NIDDK
Aug 2023 ACTIVE
Effects of deep brain stimulation (DBS) on laryngeal function and associated behaviors in Parkinson Disease
Role: Co-Investigator
Funding: NATL INST OF HLTH NIDCD
May 2022 ACTIVE
Energy balance, mTOR pathway signaling, and breast cancer prognosis
Role: Co-Investigator
Funding: OHIO STATE UNIV via NATL INST OF HLTH NCI
Sep 2021 ACTIVE
Informing the Emergency Care of Septic Shock Patients: A Novel Application of Data-Driven Analytics
Role: Other
Funding: NATL INST OF HLTH NIGMS
Jul 2021 ACTIVE
A novel mechanism of virulence control in Porphyromonas gingivalis
Role: Co-Investigator
Funding: NATL INST OF HLTH NIDCR
Feb 2021 – Mar 2022
Energy Balance, mTOR pathway signaling, and breast cancer prognosis
Role: Co-Investigator
Funding: NATL INST OF HLTH NCI
Oct 2020 – Dec 2022
Health Services Utilization of Autistic Youth: Are Therapeutic Services Associated with Reduced Acute Psychiatric Care?
Role: Principal Investigator
Funding: UNIV OF SOUTHERN CALIFORNIA via AMERICAN OCCUPATIONAL THERAPY FOUNDATION
Jul 2020 – Feb 2024
Circulating microbiome and premature mortality in hemodialysis patients
Role: Principal Investigator
Funding: UNIV OF TENNESSEE via NATL INST OF HLTH NIDDK
Apr 2020 ACTIVE
The Role and Mechanisms of Lipid and Lipoprotein Dysregulation in Sepsis
Role: Co-Investigator
Funding: NATL INST OF HLTH NIGMS
Mar 2018 ACTIVE
OA Pathogenesis beyond Cartilage: A preclinical study of the sources of OA pain
Role: Co-Investigator
Funding: NATL INST OF HLTH NIAMS
Aug 2017 – May 2023
Mechanisms of airway protection dysfunction in Parkinson's disease
Role: Project Manager
Funding: NATL INST OF HLTH NICHD
Dec 2015 – Jun 2022
Membranes of the dental pathogen Streptococcus mutans
Role: Project Manager
Funding: NATL INST OF HLTH NIDCR
Sep 2015 – Feb 2017
Identification of Proteins from Mass Spectrometry Data: A Statistical Approach
Role: Principal Investigator
Funding: UNIV OF LOUISVILLE via NATL INST OF HLTH
Sep 2015 – Aug 2016
Novel biomarker validation and dosing algorithms for anemia management in ESRD
Role: Principal Investigator
Funding: UNIVERSITY OF LOUISVILLE RES FOU via NATL INST OF HLTH NIDDK
Jul 2015 – Jun 2021
Finding Good TEMporal PostOperative pain Signatures (TEMPOS)
Role: Project Manager
Funding: NATL INST OF HLTH NIGMS

Education

PhD
1995 · University of Georgia
Postdoctoral Associate
1995 · Emory University

Contact Details

Phones:
Business:
(352) 294-5923
Emails:
Business:
susmita.datta@ufl.edu
Addresses:
Business Mailing:
PO BOX 117450
GAINESVILLE FL 32611-0001
Business Street:
CTRB
2004 MOWRY RD., 5TH FLR.
DEPARTMENT OF BIOSTATISTICS
GAINESVILLE FL 32610