6.3.1.2. Request Section
This section contains settings related to the HTTP requests that are made to the target site to retrieve information.
6.3.1.2.1. Cookies
You can provide cookies [1] that will be attached to every request sent to the target site. For example, you can provide a session cookie to crawl a site as a logged-in user. Each cookie should have two values:
- Cookie name: Name of the cookie.
- Cookie content: Value of the cookie.
Note
The cookies will be added to every request sent to the target site, including the requests sent by clicking the button and by the Visual Inspector.
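As a rough illustration of what attaching cookies to a request means at the HTTP level, the sketch below joins name/content pairs into a single Cookie request header value. The function name and the cookie values are hypothetical, and this is not the plugin's own code.

```python
def build_cookie_header(cookies):
    """Join (name, content) pairs into one Cookie request header value."""
    return "; ".join(f"{name}={content}" for name, content in cookies)


# Hypothetical cookie values, for illustration only.
cookies = [("wordpress_logged_in_example", "abc123"), ("lang", "en")]
print(build_cookie_header(cookies))
# prints: wordpress_logged_in_example=abc123; lang=en
```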
You can see the cookies that a website stores on your computer by using your Internet browser. For example, in Chrome, you can click the lock button located to the left of the address bar and then click Cookies (see Fig. 6.5) to see all of the cookies stored by that website on your computer.
Next, you can see the details of any cookie by selecting it.
Fig. 6.6 shows the details of a cookie whose name starts with wordpress_. When that cookie is selected, its details are displayed at the bottom of the window. Among the details, you only need Name and Content. You can enter these values into the Cookie name and Cookie content inputs of this setting, respectively.
Tip
If you use a browser other than Chrome, you can find out how to display cookies in your browser by searching for "how to see cookies in X" in your favorite search engine, where X should be replaced with your browser’s name, e.g. "how to see cookies in Firefox".
6.3.1.2.1.1. Importing all cookies
You can import all cookies used by a site by following these steps:
- Open the Developer Tools [2] of your browser while viewing the site whose cookies you want, as shown in Fig. 6.7.
- Go to the Network tab of the Developer Tools.
- Refresh the current browser tab.
- In the Network tab of the Developer Tools, you will see a list of requests as shown in Fig. 6.7. Click the one at the top. A panel will open next to it. Go to the Headers tab as shown in Fig. 6.7.
- In the Headers tab, find the header named Cookie, right-click it, and copy its value.
- Go to the Cookies setting of the plugin, click its button, and paste the text you copied into the input shown under the button, as shown in Fig. 6.7.
- Click the button again. The cookies will be imported as shown in Fig. 6.8. If the setting already has other cookies, they will not be removed.
- Optionally, remove the cookies that you do not want from the Cookies setting.
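The import described in the steps above amounts to splitting the copied Cookie header value into name/content pairs and adding them to the cookies that are already defined. A minimal sketch follows, assuming the copied value uses the usual "name=value; name=value" form. The function names are hypothetical and do not come from the plugin.

```python
def parse_cookie_header(value):
    """Split a copied Cookie header value into (name, content) pairs."""
    pairs = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # ignore empty segments, e.g. after a trailing ";"
        # Split on the first "=" only, since the content may contain "=".
        name, _, content = part.partition("=")
        pairs.append((name.strip(), content.strip()))
    return pairs


def import_cookies(existing, header_value):
    """Append imported cookies; cookies already in the setting are kept."""
    return list(existing) + parse_cookie_header(header_value)


# Hypothetical values, for illustration only.
existing = [("lang", "en")]
copied = "wordpress_test_cookie=WP+Cookie+check; session=abc=123"
print(import_cookies(existing, copied))
```

Splitting on the first "=" only keeps cookie contents that themselves contain "=" characters intact.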
6.3.1.2.2. Headers
Enter the HTTP request headers [3] that will be attached to every request that is sent to the target site.
- Header name: Enter the name of the HTTP header.
- Header value: Enter the value of the HTTP header.
Note
If you define a Cookie header in this setting, its value will be overridden by the cookies defined in the Cookies setting.
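The override described in this note can be pictured with the sketch below. It assumes that the Cookie header is replaced only when at least one cookie is defined in the Cookies setting. That assumption, like the function name, is illustrative and is not confirmed by the plugin.

```python
def effective_headers(headers, cookies):
    """Merge header pairs with cookie pairs; cookies win over a Cookie header."""
    merged = dict(headers)
    if cookies:
        # Drop any Cookie header defined by hand, whatever its letter case.
        for name in [n for n in merged if n.lower() == "cookie"]:
            del merged[name]
        merged["Cookie"] = "; ".join(f"{n}={v}" for n, v in cookies)
    return merged


# Hypothetical values, for illustration only.
headers = [("Accept-Language", "en"), ("Cookie", "old=1")]
cookies = [("session", "abc123")]
print(effective_headers(headers, cookies))
# prints: {'Accept-Language': 'en', 'Cookie': 'session=abc123'}
```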
Request headers are used for a variety of purposes. For example, a website might require a specific request header, such as an authentication header, before it sends a valid response. A site might also use a request header to determine the language in which it serves its content. Or a site might reject requests that do not contain a specific request header in order to block crawlers. These are just a few use cases. Whenever you need to send a request header to the target site, you can define the header in this setting.
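To make the use cases above concrete, the sketch below builds a request object that carries one header of each kind, using Python's standard urllib module. The URL, the token, and the choice of X-Requested-With as a header that a site might check are hypothetical. No request is actually sent.

```python
import urllib.request


def build_request(url, headers):
    """Create a request object carrying the given (name, value) header pairs."""
    request = urllib.request.Request(url)
    for name, value in headers:
        request.add_header(name, value)
    return request


# Hypothetical header values, for illustration only.
headers = [
    ("Authorization", "Bearer EXAMPLE_TOKEN"),   # authentication
    ("Accept-Language", "en-US"),                # preferred language
    ("X-Requested-With", "XMLHttpRequest"),      # a header a site might check
]
request = build_request("https://example.com/", headers)

# urllib stores header names with only the first letter capitalized.
print(request.header_items())
```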
6.3.1.2.2.1. Importing all request headers
You can import all request headers sent to a site by your browser into the Headers setting by following these steps:
- Open the Developer Tools [2] of your browser while viewing the site whose request headers you want, as shown in Fig. 6.9.
- Go to the Network tab of the Developer Tools.
- Refresh the current browser tab.
- In the Network tab of the Developer Tools, you will see a list of requests as shown in Fig. 6.9. Click the one at the top. A panel will open next to it. Go to the Headers tab as shown in Fig. 6.9.
- In the Headers tab, select and copy the values shown under the Request headers section, as shown in Fig. 6.9.
- Go to the Headers setting of the plugin, click its button, and paste the text you copied into the input shown under the button, as shown in Fig. 6.9.
- Click the button again. The request headers will be imported as shown in Fig. 6.10. If the setting already has other headers, they will not be removed.
- If there is a Cookie header, remove it from the Headers setting. The cookies defined in the Cookies setting will override the Cookie header defined in this setting. You can use the Cookies setting to define the cookies.
- Optionally, remove the headers that you do not want from the Headers setting.
Footnotes
[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies
[2] https://developers.google.com/web/tools/chrome-devtools#open
[3] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers