Somnath Datta

Professor

Department: PHHP-COM BIOSTATISTICS
Business Phone: (352) 294-5920
Business Email: somnath.datta@ufl.edu

About Somnath Datta

Somnath Datta received his undergraduate and master’s degrees in statistics from the Indian Statistical Institute, followed by a doctoral degree in Statistics and Probability from Michigan State University. He joined the Department of Biostatistics at the University of Florida as a tenured full professor in July 2015 under the preeminence initiative. Prior to that, he was a Professor in the Statistics department at the University of Georgia and in the Bioinformatics & Biostatistics department at the University of Louisville. He has published more than 160 research papers in peer-reviewed Statistics & Biostatistics journals. He develops novel statistical methods for analyzing public health, dental and biomedical data. His collaborative research interests include bioinformatics, spinal cord injury research, plant pathology and informatics-based materials science research. His research projects have been supported by grants from the US National Institutes of Health, the National Science Foundation, and the National Security Agency. He is an Elected Member of the International Statistical Institute, an Elected Fellow of the American Statistical Association, and an Elected Fellow of the Institute of Mathematical Statistics. He has served as director of more than 20 doctoral dissertation committees.

Accomplishments

CDC ATSDR 2019 Statistical Science Award: Best Theoretical Paper, “Multisample adjusted U-Statistics that account for confounding covariates” by Satten, G. A., Kong, Maiying, and Datta, Somnath, Statistics in Medicine, 37, 3357-3372 (2018).
2019 · Centers for Disease Control and Prevention
President, International Indian Statistical Association (IISA)
2018 · IISA
Dean’s Citation Paper Award, College of Public Health and Health Professions, University of Florida.
2017 · College of Public Health and Health Professions, University of Florida.
Preeminence Hire in Genomic Medicine
2015 · University of Florida
University Scholar
2014 · University of Louisville
President's Distinguished Faculty Award in Research for Career Achievement
2013 · University of Louisville
CDC ATSDR 2011 Statistical Science Award: Best Theoretical Paper, “Inverse Probability of Censoring Weighted U-statistics for Right-Censored Data with an Application to Testing Hypotheses”, Datta, Somnath, Bandyopadhyay, Dipankar and Satten, Glen A., Scandinavian Journal of Statistics, 37, 680-700 (2010).
2011 · CDC
2010-2011 Faculty Favorite, “An Outstanding Professor Nominated by Students”
2011 · Delphi Center for Teaching and Learning, University of Louisville.
Elected Fellow, Institute of Mathematical Statistics. Citation: “For contributions to compound decision theory, bootstrap inference for Markov chains and time series, survival analysis and counting processes, and biostatistics and bioinformatics; and for editorial services to the profession.”
2010 · Institute of Mathematical Statistics.
Elected member, International Statistical Institute
2009 · International Statistical Institute
Elected Fellow, American Statistical Association. Citation: “For outstanding research in theoretical and applied statistics including decision theory, bootstrap theory, survival analysis and analysis of microarray data.”
2006 · American Statistical Association
CDC ATSDR 2005 Statistical Science Award: Best Application Paper, “Standardization and denoising algorithms for mass spectra to classify whole-organism bacterial specimens” by Satten, G. A., Datta, S., Moura, H., Woolfitt, A., Carvalho, G., De, B. K, Pavlopoulos, A., Carlone, G. M., and Barr, J., Bioinformatics, 20, 3128-3136 (2004).
2005 · CDC
CDC ATSDR 2004 Statistical Science Award: Best Theoretical Paper, “Marginal analyses of clustered data when cluster size is informative” by Williamson, J. M., Datta, S. and Satten, G. A, Biometrics, 59, 36-42 (2003).
2004 · CDC
CDC ATSDR 2001 Statistical Science Award: Best Theoretical Paper, “A simulate-update algorithm for missing data problems” by Satten, G. A. and Datta, S., Computational Statistics, 15, 243-277 (2000).
2001 · CDC
CDC ATSDR 1999 Statistical Science Award: Best Theoretical Paper, “A semiparametric approach to the proportional hazards model for interval censored data” by Satten, G. A., Datta, S. and Williamson, J. M., Journal of the American Statistical Association, 93, 318-327 (1998).
1999 · CDC

Research Profile

I have received broad training in theoretical as well as applied and computational statistics and have published extensively in several areas of statistics and bioinformatics. In 2014, I co-edited a book on statistical analysis of next generation sequencing data, which has been favorably reviewed (Biometrics 72, p. 1008). Over the years, I have collaborated with scientists and statisticians from the CDC in Atlanta, and with scientists and engineers from UAB, CSIRO (Australia), U. of Iowa, Iowa State, UNC Chapel Hill, U. of Kentucky, U. of Louisville, and locally at U. of Florida. Besides serving as a PI on multiple NIH/NIDCR grants, I have had numerous federal PI-level grants from CDC, NSA and NSF.

Open Researcher and Contributor ID (ORCID)

0000-0003-4381-1842

Areas of Interest
  • Biostatistics

Publications

2021
Analyzing longitudinal clustered count data with zero inflation: Marginal modeling using the Conway-Maxwell-Poisson distribution.
Biometrical journal. Biometrische Zeitschrift. [DOI] 10.1002/bimj.202000061. [PMID] 33393147.
2020
Variance estimation in tests of clustered categorical data with informative cluster size.
Statistical methods in medical research. 29(11):3396-3408 [DOI] 10.1177/0962280220928572. [PMID] 32513073.
2020
Selection of the optimal personalized treatment from multiple treatments with multivariate outcome measures.
Journal of biopharmaceutical statistics. 30(3):462-480 [DOI] 10.1080/10543406.2019.1684304. [PMID] 31691633.
2020
A longitudinal Bayesian mixed effects model with hurdle Conway-Maxwell-Poisson distribution.
Statistics in medicine. [DOI] 10.1002/sim.8844. [PMID] 33368533.
2020
Innovative Long-Dose Neurorehabilitation for Balance and Mobility in Chronic Stroke: A Preliminary Case Series
Brain Sciences. 10(8) [DOI] 10.3390/brainsci10080555. [PMID] 32824012.
2019
A joint overdispersed marginalized random-effects model for analyzing two or more longitudinal ordinal responses.
Statistical methods in medical research. 28(1):50-69 [DOI] 10.1177/0962280217714616. [PMID] 28657455.
2019
A probability based method for selecting the optimal personalized treatment from multiple treatments.
Statistical methods in medical research. 28(3):749-760 [DOI] 10.1177/0962280217735701. [PMID] 29145777.
2019
Adjustments of multi-Sample U-statistics to right censored data and confounding covariates
. 135:1-14
2019
Estimation of average treatment effects among multiple treatment groups by using an ensemble approach.
Statistics in medicine. 38(15):2828-2846 [DOI] 10.1002/sim.8146. [PMID] 30941812.
2019
Integrating gene regulatory pathways into differential network analysis of gene expression data
Scientific Reports. 9(1) [DOI] 10.1038/s41598-019-41918-3. [PMID] 30940863.
2019
Personalized treatment selection using data from crossover designs with carry-over effects.
Statistics in medicine. 38(28):5391-5412 [DOI] 10.1002/sim.8372. [PMID] 31637762.
2018
Rank-based inference for covariate and group effects in clustered data in presence of informative intra-cluster group size.
Statistics in medicine. 37(30):4807-4822 [DOI] 10.1002/sim.7979. [PMID] 30232808.
2018
Multisample adjusted U-statistics that account for confounding covariates.
Statistics in medicine. 37(23):3357-3372 [DOI] 10.1002/sim.7825. [PMID] 29923344.
2018
Inferring marginal association with paired and unpaired clustered data.
Statistical methods in medical research. 27(6):1806-1817 [DOI] 10.1177/0962280216669184. [PMID] 27655806.
2018
Flexible semi-parametric regression of state occupational probabilities in a multistate model with right-censored data.
Lifetime data analysis. 24(3):464-491 [DOI] 10.1007/s10985-017-9403-6. [PMID] 28819787.
2018
Blood Pressure Signature Genes and Blood Pressure Response to Thiazide Diuretics: Results From the PEAR and PEAR-2 Studies
BMC Medical Genomics. 11(1) [DOI] 10.1186/s12920-018-0370-x. [PMID] 29925376.
2018
Unraveling Bacterial Fingerprints of City Subways From Microbiome 16S Gene Profiles
Biology direct. 13(1) [DOI] 10.1186/s13062-018-0215-8. [PMID] 29789016.
2018
A marginalized overdispersed location scale model for clustered ordinal data
. 80:S103-S134
2018
Analyzing clustered count data with a cluster specific random effect zero-inflated Conway-Maxwell-Poisson distribution.
Journal of applied statistics. 45(5):799-814 [DOI] 10.1080/02664763.2017.1312299. [PMID] 31080303.
2018
A Bayesian approach for analyzing zero-inflated clustered count data with dispersion.
Statistics in medicine. 37(5):801-812 [DOI] 10.1002/sim.7541. [PMID] 29108124.
2018
A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks From Next-Generation Sequencing Count Data
IEEE/ACM transactions on computational biology and bioinformatics. 15(3):760-773 [DOI] 10.1109/TCBB.2017.2665495. [PMID] 28186904.
2018
Predicting Survival Times for Neuroblastoma Patients Using RNA-Seq Expression Profiles
Biology Direct. 13(1) [DOI] 10.1186/s13062-018-0213-x. [PMID] 29848365.
2018
A log rank test for clustered data with informative within-cluster group size.
Statistics in medicine. 37(27):4071-4082 [DOI] 10.1002/sim.7899. [PMID] 30003565.
2017
EAMA: Empirically adjusted meta-analysis for large-scale simultaneous hypothesis testing in genomic experiments.
PloS one. 12(10) [DOI] 10.1371/journal.pone.0187287. [PMID] 29088275.
2017
Whole transcriptome sequencing analyses reveal molecular markers of blood pressure response to Thiazide Diuretics
Scientific reports. 7(1) [DOI] 10.1038/s41598-017-16343-z. [PMID] 29167564.
2017
Temporal Prediction of Future State Occupation in a Multistate Model from High-Dimensional Baseline Covariates via Pseudo-Value Regression.
Journal of statistical computation and simulation. 87(7):1363-1378 [DOI] 10.1080/00949655.2016.1263992. [PMID] 29217870.
2017
Tests for informative cluster size using a novel balanced bootstrap scheme.
Statistics in medicine. 36(16):2630-2640 [DOI] 10.1002/sim.7288. [PMID] 28324913.
2017
optCluster: An R Package for Determining the Optimal Clustering Algorithm.
Bioinformation. 13(3):101-103 [DOI] 10.6026/97320630013101. [PMID] 28584451.
2017
Non-parametric regression in clustered multistate current status data with informative cluster size.
Statistica Neerlandica. 71(1):31-57 [DOI] 10.1111/stan.12099. [PMID] 28798498.
2017
Propensity scores based methods for estimating average treatment effect and average treatment effect among treated: A comparative study.
Biometrical journal. Biometrische Zeitschrift. 59(5):967-985 [DOI] 10.1002/bimj.201600094. [PMID] 28436047.
2016
A rank-sum test for clustered data when the number of subjects in a group within a cluster is informative.
Biometrics. 72(2):432-40 [DOI] 10.1111/biom.12447. [PMID] 26575695.
2016
Cluster adjusted regression for displaced subject data (CARDS): Marginal inference under potentially informative temporal cluster size profiles.
Biometrics. 72(2):441-51 [DOI] 10.1111/biom.12456. [PMID] 26682911.
2016
Inter-platform concordance of gene expression data for the prediction of chemical mode of action.
Biology direct. 11(1) [PMID] 27993158.
2016
Marginal regression models for clustered count data based on zero-inflated Conway-Maxwell-Poisson distribution with applications.
Biometrics. 72(2):606-18 [DOI] 10.1111/biom.12436. [PMID] 26575079.
2016
Applications of feature selection and regression techniques in materials design: A Tutorial
. 224-251
2015
GEE type inference for clustered zero-inflated negative binomial regression with application to dental caries.
Computational statistics & data analysis. 85:54-66 [PMID] 25620827.
2014
Inference on the marginal distribution of clustered data with informative cluster size.
Statistical papers (Berlin, Germany). 55(1):71-92 [PMID] 25878396.
2014
Differential network analysis in human cancer research
Current Pharmaceutical Design. 20(1):4-10 [DOI] 10.2174/138161282001140113122316. [PMID] 23530503.
2014
dna: an R package for differential network analysis
Bioinformation. 10(4):233-234 [DOI] 10.6026/97320630010233. [PMID] 24966526.
2014
Informatics guided discovery of surface structure-chemistry relationships in catalytic nanoparticles
Journal of Chemical Physics. 140(9) [DOI] 10.1063/1.4867010. [PMID] 24606374.
2014
Robust estimation of marginal regression parameters in clustered data.
Statistical modelling. 14(6):489-501 [PMID] 25848345.
2014
Informatics-aided band gap engineering for solar materials
Computational Materials Science. 83:185-195
2013
Cluster analysis: Finding groups in data
. 53-70
2013
Nonparametric regression of state occupation, entry, exit, and waiting times with multistate right-censored data.
Statistics in medicine. 32(17):3006-19 [DOI] 10.1002/sim.5703. [PMID] 23225570.
2013
svapls: an R package to correct for hidden factors of variability in gene expression studies.
BMC bioinformatics. 14 [DOI] 10.1186/1471-2105-14-236. [PMID] 23883280.
2012
A General Class of Signed Rank Tests for Clustered Data when the Cluster Size is Potentially Informative.
Journal of nonparametric statistics. 24(3):797-808 [PMID] 23074359.
2012
Dynamic longitudinal evaluation of the utility of the Berg Balance Scale in individuals with motor incomplete spinal cord injury.
Archives of physical medicine and rehabilitation. 93(9):1565-73 [DOI] 10.1016/j.apmr.2012.01.026. [PMID] 22920453.
2012
Longitudinal patterns of functional recovery in patients with incomplete spinal cord injury receiving activity-based rehabilitation.
Archives of physical medicine and rehabilitation. 93(9):1541-52 [DOI] 10.1016/j.apmr.2012.01.027. [PMID] 22920451.
2012
Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies.
Bioinformatics (Oxford, England). 28(6):799-806 [DOI] 10.1093/bioinformatics/bts022. [PMID] 22238271.
2011
Inference for marginal linear models for clustered longitudinal data with potentially informative cluster sizes.
Statistical methods in medical research. 20(4):347-67 [DOI] 10.1177/0962280209347043. [PMID] 20223781.
2011
Marginal association measures for clustered data.
Statistics in medicine. 30(27):3181-91 [DOI] 10.1002/sim.4368. [PMID] 21953204.
2010
Statistical Analyses of Next Generation Sequence Data: A Partial Overview.
Journal of proteomics & bioinformatics. 3(6):183-190 [PMID] 21113236.
2010
Non-parametric estimation of state occupation, entry and exit times with multistate current status data.
Statistical methods in medical research. 19(2):147-65 [DOI] 10.1177/0962280208094278. [PMID] 18765503.
2010
A statistical framework for differential network analysis from microarray data using partial least squares
BMC bioinformatics. 11 [DOI] 10.1186/1471-2105-11-95. [PMID] 20170493.
2010
An adaptive optimal ensemble classifier via bagging and rank aggregation with applications to high dimensional data.
BMC bioinformatics. 11 [DOI] 10.1186/1471-2105-11-427. [PMID] 20716381.
2010
Comparison of state occupation, entry, exit and waiting times in two or more groups based on current status data in a multistate model.
Statistics in medicine. 29(7-8):906-14 [DOI] 10.1002/sim.3803. [PMID] 20213707.
2009
A multivariate examination of temporal changes in Berg variables for patients with AIS C and D spinal cord injuries
Archives of Physical Medicine and Rehabilitation. 90(7):1208-1217 [DOI] 10.1016/j.apmr.2008.09.577. [PMID] 19577035.
2009
Computational biology touches all bases. A report of the 6th Annual Rocky Mountain Bioinformatics Conference, Aspen, USA, 4-7 December 2008.
Genome biology. 10(2) [DOI] 10.1186/gb-2009-10-2-303. [PMID] 19232078.
2009
RankAggreg, an R package for weighted rank aggregation.
BMC bioinformatics. 10 [DOI] 10.1186/1471-2105-10-62. [PMID] 19228411.
2008
Testing Equality of Survival Distributions when the Population Marks are Missing.
Journal of statistical planning and inference. 138(6):1722-1732 [PMID] 19844606.
2008
Reconstruction of genetic association networks from microarray data: a partial least squares approach.
Bioinformatics (Oxford, England). 24(4):561-8 [DOI] 10.1093/bioinformatics/btm640. [PMID] 18204062.
2008
clValid, an R package for cluster validation
Journal of Statistical Software. 25(4)
2008
A signed-rank test for clustered data.
Biometrics. 64(2):501-7 [PMID] 17970820.
2008
Finding common genes in multiple cancer types through meta-analysis of microarray experiments: a rank aggregation approach.
Genomics. 92(6):400-3 [DOI] 10.1016/j.ygeno.2008.05.003. [PMID] 18565726.
2007
Predicting patient survival from microarray data by accelerated failure time modeling using partial least squares and LASSO.
Biometrics. 63(1):259-71 [PMID] 17447952.
2007
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
Bioinformatics (Oxford, England). 23(13):1607-15 [PMID] 17483500.
2006
Biologically supervised hierarchical clustering algorithms for gene expression data.
Conference proceedings : … Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference. 2006:5515-8 [PMID] 17947147.
2006
Evaluation of clustering algorithms for gene expression data.
BMC bioinformatics. 7 Suppl 4 [PMID] 17217509.
2006
Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.
BMC bioinformatics. 7 [PMID] 16945146.
2006
Nonparametric estimation of stage occupation probabilities in a multistage model with current status data.
Biometrics. 62(3):829-37 [PMID] 16984326.
2005
Rank-sum tests for clustered data
Journal of the American Statistical Association. 100:908-915
2004
Standardization and denoising algorithms for mass spectra to classify whole-organism bacterial specimens.
Bioinformatics (Oxford, England). 20(17):3128-36 [PMID] 15217815.
2004
An empirical bayes adjustment to increase the sensitivity of detecting differentially expressed genes in microarray experiments.
Bioinformatics (Oxford, England). 20(2):235-42 [PMID] 14734315.
2003
Marginal analyses of clustered data when cluster size is informative.
Biometrics. 59(1):36-42 [PMID] 12762439.
2003
Comparisons and validation of statistical clustering techniques for microarray gene expression data
Bioinformatics (Oxford, England). 19(4):459-466 [DOI] 10.1093/bioinformatics/btg025. [PMID] 12611800.
2002
Estimation of integrated transition hazards and stage occupation probabilities for non-Markov systems under dependent censoring.
Biometrics. 58(4):792-802 [PMID] 12495133.
2002
Marginal estimation for multi-stage models: waiting time distributions and competing risks analyses.
Statistics in medicine. 21(1):3-19 [PMID] 11782047.
2001
The Kaplan-Meier Estimator as an Inverse-Probability-of-Censoring Weighted Average.
The American statistician. 55(3):207-210 [DOI] 10.1198/000313001317098185. [PMID] 28845048.
1991
On the consistency of posterior mixtures and its application
Annals of Statistics. 19:338-353

Grants

Oct 2020 ACTIVE
VA IPA_FY21_ Somnath Datta
Role: Principal Investigator
Funding: US DEPT OF VET AFF GAINESVILLE MED CTR
Sep 2020 ACTIVE
FY21 VA IPA Anyaso-Samuel
Role: Other
Funding: US DEPT OF VET AFF GAINESVILLE MED CTR
Sep 2019 – Jun 2020
VA IPA_FY20_ Somnath Datta
Role: Principal Investigator
Funding: US DEPT OF VET AFF GAINESVILLE MED CTR
Sep 2019 – Aug 2020
VA IPA – Tyler Grimes
Role: Other
Funding: US DEPT OF VET AFF GAINESVILLE MED CTR
Sep 2018 – Aug 2019
VA IPA_FY19_ Somnath Datta
Role: Principal Investigator
Funding: US DEPT OF VET AFF GAINESVILLE MED CTR
Sep 2018 – Aug 2019
VA IPA- Tyler Grimes
Role: Other
Funding: US DEPT OF VET AFF GAINESVILLE MED CTR
Aug 2018 ACTIVE
A Novel Analysis Plan for the Caries and Fluorosis Data from the Iowa Fluoride Study
Role: Principal Investigator
Funding: NATL INST OF HLTH NIDCR
Apr 2018 – Mar 2020
IISA 2018: From Data to Knowledge, Working for a Better World
Role: Principal Investigator
Funding: NATL SCIENCE FOU
Sep 2017 – Aug 2018
VA IPA- Datta Yr. 2
Role: Principal Investigator
Funding: US DEPT OF VET AFF GAINESVILLE MED CTR
Aug 2017 ACTIVE
Optimizing AAV Vectors for Central Nervous System transduction
Role: Co-Investigator
Funding: NATL INST OF HLTH NINDS
Apr 2017 – Mar 2018
STATISTICAL INFERENCE FOR BIOMEDICAL BIG DATA: THEORY, METHODS AND TOOLS
Role: Co-Investigator
Funding: NATL SCIENCE FOU
Mar 2017 – Feb 2019
Interactions between microglia and dopaminergic neurons regulates dopamine neurotransmission
Role: Co-Investigator
Funding: NATL INST OF HLTH NIDA
Sep 2016 – Aug 2017
VA IPA- Datta
Role: Principal Investigator
Funding: US DEPT OF VET AFF GAINESVILLE MED CTR
Aug 2016 – Jul 2020
Exploratory Statistical Analysis of Differential Network Behaviors Based on Gene Expression Atlas of Palate Development
Role: Principal Investigator
Funding: NATL INST OF HLTH NIDCR

Education

PhD
1988 · Michigan State University, East Lansing

Teaching Profile

Courses Taught
2018-2021
PHC7980 Research for Doctoral Dissertation
2016-2018, 2020-2021
PHC7979 Advanced Research
2016-2021
PHC7066 Large Sample Theory

Contact Details

Phones:
Business:
(352) 294-5920
Emails:
Business:
somnath.datta@ufl.edu