6.3.1.2. Request Section

This section contains settings related to the HTTP requests that are made to the target site to retrieve information.

6.3.1.2.1. Cookies

You can provide cookies [1] that will be attached to every request sent to the target site. For example, you can provide a session cookie to crawl a site as a logged-in user. Each cookie should have two values:

Cookie name
Name of the cookie.
Cookie content
Value of the cookie.

Note

The cookies will be added to every request sent to the target site, including the requests sent by clicking the button and by the Visual Inspector.

You can see the cookies that a website has stored on your computer by using your browser. For example, in Chrome, you can click the lock button located to the left of the address bar and then click Cookies (see Fig. 6.5) to see all of the cookies stored by that website on your computer.

../../../_images/chrome-cookies.png

Fig. 6.5 Opening Cookies window in Chrome browser.

Next, you can see all of the cookies and their details by just selecting them.

../../../_images/chrome-cookie-details.png

Fig. 6.6 Showing cookie details in Chrome browser.

Fig. 6.6 shows the details of a cookie whose name starts with wordpress_. When that cookie is selected, its details are displayed at the bottom of the window. Among the details, you only need Name and Content. Enter these values into the Cookie name and Cookie content inputs of this setting, respectively.

Tip

If you use a browser other than Chrome, you can find out how to display cookies in it by searching for how to see cookies in X in your favorite search engine, where X is your browser’s name, e.g. how to see cookies in Firefox.

How to know which cookies to use

Because cookies are created by the site you visit, their names, contents, and purposes are defined by that site. Therefore, there is no general answer to this question. If you are not sure which cookies to use, you can try using all of them.

6.3.1.2.1.1. Importing all cookies

You can import all cookies used by a site. Follow these steps:

  1. Open the Developer Tools [2] of your browser while viewing the site whose cookies you want, as shown in Fig. 6.7.
  2. Go to the Network tab of the Developer Tools.
  3. Refresh the current browser tab.
  4. The Network tab now shows a list of requests, as shown in Fig. 6.7. Click the one at the top. A panel opens next to the list. Go to its Headers tab, as shown in Fig. 6.7.
  5. In the Headers tab, find the header named Cookie, right-click it, and copy its value.
  6. Go to the Cookies setting of the plugin, click its button, and paste the text you copied into the input shown under the button, as shown in Fig. 6.7.
  7. Click the button again. The cookies will be imported as shown in Fig. 6.8. If the setting already has other cookies, they will not be removed.
  8. Optionally, remove the cookies that you do not want from the Cookies setting.

../../../_images/cookies-copy-network-tab.png

Fig. 6.7 Copying all cookies from the Network tab of the Chrome developer tools and pasting them into the Import area. Open the image in a new tab to view it at a larger size.

../../../_images/cookies-import.png

Fig. 6.8 Cookies imported after the button is clicked.
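
To make it clearer what the copied text looks like, the following minimal Python sketch splits the value of a Cookie header into name and content pairs. It is only an illustration and not the plugin's actual import code. The function name parse_cookie_header and the sample cookie values are hypothetical.

```python
def parse_cookie_header(header_value: str) -> list[tuple[str, str]]:
    """Split a copied Cookie header value into (name, content) pairs."""
    pairs = []
    # Cookies in a Cookie request header are separated by semicolons.
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first "=" only, because the content may contain "=".
        name, sep, content = part.partition("=")
        if not sep:
            continue  # skip fragments that have no "=" at all
        pairs.append((name.strip(), content.strip()))
    return pairs


# Hypothetical value copied from the Headers tab.
copied = "wordpress_test_cookie=WP+Cookie+check; session_id=abc123=="
for name, content in parse_cookie_header(copied):
    print(f"Cookie name: {name} | Cookie content: {content}")
```

Each printed pair corresponds to one row of the Cookies setting: the first item goes into Cookie name and the second into Cookie content.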

6.3.1.2.2. Headers

Enter the HTTP request headers [3] that will be attached to every request that is sent to the target site.

Header name
Enter the name of the HTTP header
Header value
Enter the value of the HTTP header

Note

If you define a Cookie header in this setting, its value will be overridden by the cookies defined in the Cookies setting.
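
The precedence described in this note can be sketched in Python as follows. This is a simplified illustration, not the plugin's implementation. The function name build_request_headers is hypothetical, and the sketch assumes that a Cookie header defined in this setting is kept when the Cookies setting is empty, which the note does not state.

```python
def build_request_headers(headers, cookies):
    """Combine the Headers and Cookies settings into one header mapping.

    headers: list of (header name, header value) pairs
    cookies: list of (cookie name, cookie content) pairs
    """
    result = {}
    for name, value in headers:
        # Header names are case-insensitive, so match "cookie" in any case.
        # Assumption: the Cookie header is only dropped when cookies exist.
        if cookies and name.lower() == "cookie":
            continue
        result[name] = value
    if cookies:
        # All cookies are sent in a single Cookie header, joined by "; ".
        result["Cookie"] = "; ".join(f"{n}={v}" for n, v in cookies)
    return result


# Hypothetical settings: the Cookie header below is overridden.
merged = build_request_headers(
    headers=[("Accept-Language", "en"), ("Cookie", "old=1")],
    cookies=[("sid", "abc"), ("theme", "dark")],
)
print(merged)
```

In this sketch, the value old=1 never reaches the target site because the cookies from the Cookies setting replace it.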

Request headers serve a variety of purposes. For example, a site might require a specific request header, such as an authentication header, before it sends a valid response. A site might use a request header to determine which language to serve. A site might also reject requests that lack a specific request header in order to block crawlers. These are just a few use cases. Whenever you need to send a request header to the target site, define it in this setting.

6.3.1.2.2.1. Importing all request headers

You can import all request headers that your browser sends to a site into the Headers setting. Follow these steps:

  1. Open the Developer Tools [2] of your browser while viewing the site whose request headers you want, as shown in Fig. 6.9.
  2. Go to the Network tab of the Developer Tools.
  3. Refresh the current browser tab.
  4. The Network tab now shows a list of requests, as shown in Fig. 6.9. Click the one at the top. A panel opens next to the list. Go to its Headers tab, as shown in Fig. 6.9.
  5. In the Headers tab, select and copy the values shown under the Request headers section, as shown in Fig. 6.9.
  6. Go to the Headers setting of the plugin, click its button, and paste the text you copied into the input shown under the button, as shown in Fig. 6.9.
  7. Click the button again. The request headers will be imported as shown in Fig. 6.10. If the setting already has other headers, they will not be removed.
  8. If there is a Cookie header, remove it from the Headers setting. The cookies defined in the Cookies setting override the Cookie header defined in this setting, so use the Cookies setting to define cookies.
  9. Optionally, remove the headers that you do not want from the Headers setting.

../../../_images/headers-copy-network-tab.png

Fig. 6.9 Copying all request headers from the Network tab of the Chrome developer tools and pasting them into the Import area. Open the image in a new tab to view it at a larger size.

../../../_images/headers-import.png

Fig. 6.10 Request headers imported after the button is clicked.
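
The following minimal Python sketch shows how a copied block of request headers can be split into header name and header value pairs, with the Cookie header dropped as suggested in step 8. It is an illustration only and not the plugin's import code. The function name parse_request_headers and the sample headers are hypothetical, and the sketch assumes that the copied text has one "Name: value" header per line.

```python
def parse_request_headers(raw: str, drop_cookie: bool = True) -> list[tuple[str, str]]:
    """Split copied request headers into (name, value) pairs."""
    pairs = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority",
        # which browsers may list but which are not regular headers.
        if not line or line.startswith(":"):
            continue
        # Split on the first ":" only, because values (e.g. URLs) contain ":".
        name, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. a request line such as "GET / HTTP/1.1"
        name, value = name.strip(), value.strip()
        if drop_cookie and name.lower() == "cookie":
            continue  # cookies belong in the Cookies setting instead
        pairs.append((name, value))
    return pairs


# Hypothetical text copied from the Request headers section.
copied = """:authority: example.com
Accept: text/html
Accept-Language: en-US,en;q=0.9
Cookie: sid=abc
Referer: https://example.com/
"""
for name, value in parse_request_headers(copied):
    print(f"Header name: {name} | Header value: {value}")
```

Each printed pair corresponds to one row of the Headers setting.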

Footnotes

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies
[2] https://developers.google.com/web/tools/chrome-devtools#open
[3] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers