The use of the Python language in web scraping

Silvia Komara, Michal Páleš, Faculty of Economic Informatics, University of Economics in Bratislava, Slovak Republic


Type of article: informative article

Pages: 55 – 68

Abstract

The paper presents the basic attributes of web scraping in the context of currently used terms such as new sources of statistics, big data, machine learning, artificial intelligence, and Business Intelligence. It describes the options the Python language offers for downloading data from the Internet and the modules in which this process can be carried out. It also specifically addresses the connection between machine learning and web scraping. In a practical demonstration, we present the functionality of the Python language for scraping data from PDF documents.
