Silvia Komara, Michal Páleš, Faculty of Economic Informatics, University of Economics in Bratislava, Slovak Republic
Type of article: informative article
Pages: 55 – 68
Abstract
The paper presents the basic attributes of web scraping in the context of currently used terms such as new sources of statistics, big data, machine learning, artificial intelligence, and business intelligence. It describes the options that the Python language offers for downloading data from the Internet and the modules with which this process can be carried out. It also specifically addresses the connection between machine learning and web scraping. In a practical demonstration, we show how Python can be used to scrape data from PDF documents.