Consumer price index from web-scraped data: analysis of specific product category

Peter Knížat, Helena Glaser-Opitzová, Statistical Office of the Slovak Republic, University of Economics in Bratislava, Slovak Republic

Type of article: scientific article
Pages: 37 – 49

Abstract

As a consequence of changes in consumer behaviour, consumers now prefer to shop online. Statistical institutions responsible for collecting the prices of goods and services for price statistics are obliged to reconsider traditional price collection and, in some cases, potentially replace it with automated collection of prices over the internet, also called web scraping. Implementing this type of data source entails various challenges, from methodological questions to significant changes in data processing. This involves the processing of big data, including the evaluation of its quality, the selection of representative items, and the determination of prices of individual goods, which are usually scraped on a daily basis. Another challenge is the selection of a methodology for estimating the consumer price index (CPI), which can be fundamentally different from the CPI estimation methodology used with traditional data collection. The aim of this study is to present a theoretical framework for the implementation of web-scraped data in the production of price statistics. In the case study, we used data for the product category of refrigerators, scraped from the comparison website heureka.sk.
