Missing data imputation using Bayesian data modelling

15th October 2024

4/2024 scientific articles

Roman Pavelka, Statistical Office of the Slovak Republic, Slovak Republic

Type of article: scientific article
Pages: 21 – 41

Abstract

Why are missing data a problem? Common statistical methods and software assume that every variable in the data matrix is observed for every unit participating in the statistical survey. The default way of dealing with nonresponse in most statistical software is simply to delete the cases with missing values for the indicators of interest (listwise deletion). The most obvious disadvantage of listwise (unit) deletion is that it often discards a large portion of the collected sample, which can lead to a serious loss of statistical power in the analyses. Researchers are understandably reluctant to discard data that they have spent a lot of time, money and effort collecting, so various methods of ‘rescuing’ cases with missing data have become popular. Over the last few decades, Bayesian inference has become a modern method for completing incomplete data. Bayesian probability and statistics are far more than the well-known Bayes’ formula and its occasional use in illustrative examples that explain how to operate with probabilities of random phenomena. Bayes’ formula (also called the law of inverse probability) is primarily used to make judgments about an unknown model based on known data. This opens the way to using Bayes’ formula for imputing unobserved data (the unknown model) based on observed data.
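The two ideas in the abstract, the sample loss caused by listwise deletion and the use of Bayes’ formula to fill in unobserved values, can be illustrated with a minimal sketch. It is not the method developed in the article. The simulated survey, the missing-completely-at-random mechanism, the normal model with known variance, the prior settings and the helper name `impute_normal` are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical survey: 200 units, 5 indicators (all values simulated).
n_units, n_vars = 200, 5
data = rng.normal(loc=50.0, scale=10.0, size=(n_units, n_vars))

# Assumed missingness mechanism: each value is missing independently
# with probability 0.10 (missing completely at random).
missing = rng.random((n_units, n_vars)) < 0.10
data[missing] = np.nan

# Listwise deletion keeps only units with no missing value at all.
complete = data[~np.isnan(data).any(axis=1)]
share_kept = complete.shape[0] / n_units  # about 0.9**5 = 0.59 in expectation


def impute_normal(column, prior_mean, prior_var, noise_var, rng):
    """Fill NaNs with draws from the posterior predictive distribution.

    Assumed model: x_i ~ Normal(mu, noise_var) with noise_var known,
    and a conjugate prior mu ~ Normal(prior_mean, prior_var).
    Bayes' formula then gives a normal posterior for mu.
    """
    is_missing = np.isnan(column)
    observed = column[~is_missing]
    n = observed.size

    # Posterior of mu given the observed values (conjugate update).
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + observed.sum() / noise_var)

    # Posterior predictive draw: first draw mu, then a value given mu.
    n_missing = int(is_missing.sum())
    mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=n_missing)
    draws = rng.normal(mu_draws, np.sqrt(noise_var))

    filled = column.copy()
    filled[is_missing] = draws
    return filled


# Impute the first indicator with a weak (hypothetical) prior.
filled = impute_normal(data[:, 0], prior_mean=50.0, prior_var=100.0**2,
                       noise_var=10.0**2, rng=rng)

print(f"share of units kept by listwise deletion: {share_kept:.2f}")
print(f"missing values left after imputation: {int(np.isnan(filled).sum())}")
```

Drawing from the posterior predictive distribution, rather than inserting the posterior mean, keeps the uncertainty about the unobserved values in the completed data; repeating the draw several times would give multiple completed data sets.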

Keywords: Bayesian inference, data imputation, inverse probability, mechanism of missingness

STATISTICAL OFFICE OF THE SR