To collect data from a public resource, we need to establish a connection with it first. Most of the web is served over the HTTP protocol, which is rather simple: we (the client) send a request for a specific document to the website (the server), and once the server processes our request, it replies with the requested document - a very straightforward exchange!

As you can see in this illustration: we send a request object, which consists of a method (aka type), location and headers; in turn, we receive a response object, which consists of a status code, headers and the document content itself.

When it comes to web scraping, we don't exactly need to know every little detail about HTTP requests and responses; however, it's good to have a general overview and to know which parts of this protocol are especially useful in web scraping. Let's take a quick look at each of these components, what they mean and how they are relevant in web scraping.

Request Method

HTTP requests are conveniently divided into a few types that perform distinct functions: for example, GET requests retrieve a document, POST requests submit data (such as a form), and HEAD requests retrieve only a document's metadata.
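To make this concrete, here is a sketch of what such an exchange looks like on the wire; the URL, host and header values below are illustrative placeholders, not taken from the article:

```
GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1256

<!doctype html>
<html>...</html>
```

The first block is the request (method, location and headers); the second is the response (status code, headers and the document body).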
A vital part of web scraping is establishing a connection with our web targets, and for that we'll need an HTTP client. NodeJS has many HTTP clients; however, by far the most popular one is axios, so in this section we'll stick with it, as it provides most of the functions necessary for web scraping: cookie tracking and easy form/JSON requests.
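As a minimal sketch of what this looks like in practice (https://httpbin.org is a public testing service used here purely as a placeholder target):

```javascript
const axios = require('axios');

// A simple GET request - retrieve a document.
axios.get('https://httpbin.org/html')
  .then((response) => {
    console.log(response.status); // e.g. 200
    console.log(response.data);   // the HTML document as a string
  })
  .catch((error) => console.error('request failed:', error.message));

// An easy JSON POST request - axios serializes the object
// and sets the Content-Type header for us.
axios.post('https://httpbin.org/post', { query: 'scraping' })
  .then((response) => console.log(response.data));
```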
Setup
In this article we'll focus on a few tools in particular: for connection we'll be using the axios HTTP client, and for parsing we'll focus on the cheerio HTML tree parser. Let's create a project directory and install them using these command line instructions:

$ mkdir scrapfly-etsy-scraper
$ cd scrapfly-etsy-scraper
$ npm init -y
$ npm install axios cheerio
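With both packages installed, a minimal sketch of how they fit together might look like this; the URL and the CSS selector are illustrative assumptions, not the article's actual scraper:

```javascript
const axios = require('axios');
const cheerio = require('cheerio');

async function scrape() {
  // 1. Use axios to retrieve the HTML document.
  const response = await axios.get('https://www.example.com/');
  // 2. Load the HTML into cheerio to get a queryable tree.
  const $ = cheerio.load(response.data);
  // 3. Extract data with CSS selectors, e.g. the page title.
  console.log($('title').text());
}

scrape();
```

This split - one library for transport, one for parsing - is the basic shape every scraper in this article builds on.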