Statistical Models and Diagnostics Tools for Spatially Correlated Skewed and Heterogeneous Data

Project: Research Project

Project details

Description

Skewed and heterogeneous data are common in applied research, e.g., the abundance of a species in relation to habitat suitability in ecological studies, or the number of hospitalizations in health services research. Modelling such data is further complicated when the data are geographically clustered due to unmeasured regional characteristics, or are collected repeatedly over time. Logarithmic or square-root transformations are often used to achieve normality so that a linear regression can be applied to the transformed data. However, in some contexts the data contain both an abundance of zeros and extreme high values, so normality may not be achievable by any form of transformation. Moreover, the transformed response variable is analyzed on a different scale, which can mask important information in the original response variable. Survival data are also often highly skewed and can involve a cure fraction (i.e., a substantial portion of subjects never fail) as well as complex event types (i.e., multi-state or competing-risk events). In recent years, there has been growing interest in capturing spatial patterns in survival times to determine the factors that contribute to such variability. Traditional parametric survival models cannot account for a cure fraction, multiple event types, and spatial correlation at the same time. To fill this gap in analytical methods for handling such disparate data, one key focus of my research program is to develop models for spatially correlated skewed outcomes that do not require transformation of the data but can instead be applied to the data on their original scale. The proposed modelling methods will improve model prediction and the accuracy of parameter estimates.
Model diagnostics are an essential step in ensuring the validity of a model, but diagnosing models for skewed and heterogeneous data with discreteness or incomplete information due to censoring has been very challenging, partly because traditional residuals have complicated reference distributions that depend on the model parameters. To fill this gap, we recently extended randomized quantile residuals for diagnosing zero-inflated mixed-effects models and parametric survival models. This method will be further developed to diagnose spatial, spatio-temporal, and spatial survival models. The extended model diagnostic methods will guide researchers in developing better models and drawing more reliable conclusions from their data. The proposed theoretical work is driven by real-life projects and is applicable to various areas of applied research. The research outcomes are anticipated to contribute to statistical theory and practice by providing feasible, efficient, and robust approaches, and the associated training will produce highly qualified statisticians. R packages will be made available to assist in the dissemination and implementation of the proposed models to a wide audience.
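As a rough illustration of the randomized quantile residual idea mentioned above, the following Python sketch computes such residuals for a zero-inflated Poisson model. It is not the project's implementation: the function names (`zip_cdf`, `sample_zip`, `randomized_quantile_residual`) and the parameter values are hypothetical, the parameters are assumed known rather than estimated, and there are no random effects or spatial terms. For a discrete response y with model CDF F, the residual is the standard normal quantile of a value drawn uniformly between F(y - 1) and F(y); if the model is correct, the residuals should look approximately standard normal whatever the parameter values.

```python
import math
import random
import statistics


def zip_cdf(y, lam, pi_zero):
    """P(Y <= y) for a zero-inflated Poisson with rate lam and extra-zero probability pi_zero."""
    if y < 0:
        return 0.0
    pois_cdf = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(y + 1))
    return pi_zero + (1.0 - pi_zero) * pois_cdf


def sample_zip(lam, pi_zero, rng):
    """Draw one zero-inflated Poisson count (Knuth's multiplication method for the Poisson part)."""
    if rng.random() < pi_zero:
        return 0  # structural zero
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def randomized_quantile_residual(y, lam, pi_zero, rng):
    """Normal quantile of a uniform draw between F(y - 1) and F(y)."""
    lower = zip_cdf(y - 1, lam, pi_zero)
    upper = zip_cdf(y, lam, pi_zero)
    u = rng.uniform(lower, upper)
    # Clamp away from 0 and 1 so the normal quantile stays finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return statistics.NormalDist().inv_cdf(u)


# Illustrative (assumed) parameter values; data are simulated from the same model.
LAM, PI_ZERO = 3.0, 0.3
rng = random.Random(42)
counts = [sample_zip(LAM, PI_ZERO, rng) for _ in range(2000)]
residuals = [randomized_quantile_residual(y, LAM, PI_ZERO, rng) for y in counts]

print("residual mean:", statistics.fmean(residuals))
print("residual sd:  ", statistics.stdev(residuals))
```

Because the data here are simulated from the same model used to compute the residuals, their mean and standard deviation should be close to 0 and 1. In practice the fitted parameters would replace `LAM` and `PI_ZERO`, and departures from normality (for example in a Q-Q plot) would point to model misfit.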

Status: Active
Start date/End date: 1/1/23 → …

Funding

  • Natural Sciences and Engineering Research Council of Canada: US$ 23,714.00

ASJC Scopus Subject Areas

  • Statistics and Probability