7.8.1. Advanced
This tab of the general settings contains advanced settings for the plugin.
7.8.1.1. Always use UTF-8 encoding?
Check this to crawl all pages using UTF-8 [1] encoding. With UTF-8, you should not get any character encoding problems in your language.
Although UTF-8 [1] is suitable for almost any site, some sites prefer not to use it. This setting ensures that the encoding for HTML is UTF-8 so that you do not get character encoding problems. If you still get character encoding problems, such as characters shown as squares or question marks, you can try to convert the encoding to UTF-8 by using the Convert encoding to UTF-8 when it is not UTF-8 setting.
7.8.1.2. Convert encoding to UTF-8 when it is not UTF-8
This setting is visible only if Always use UTF-8 encoding? is checked. When you check this, the plugin will try to convert the character encoding of the target web site's source code to UTF-8 [1] when it is not already UTF-8.
Note
This setting does not guarantee that the encoding problems will be fixed, because the character encodings of certain web sites cannot be fixed just by converting them to UTF-8. This setting only tries to fix the problems. A web site whose character encoding problems cannot be fixed even by converting its source code to UTF-8 is most likely badly designed.
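To illustrate why conversion helps in some cases but not in others, here is a minimal Python sketch. It is not the plugin's code; the function name to_utf8 and the sample strings are assumptions made for this example. It re-encodes a page as UTF-8 when the declared encoding is something else, and shows that a page declaring the wrong encoding stays garbled after conversion.

```python
def to_utf8(raw: bytes, declared_encoding: str) -> bytes:
    """Re-encode raw page bytes as UTF-8 when they are declared as another encoding.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    normalized = declared_encoding.strip().lower().replace("_", "-")
    if normalized in ("utf-8", "utf8"):
        return raw  # already declared as UTF-8, nothing to convert
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising
    text = raw.decode(declared_encoding, errors="replace")
    return text.encode("utf-8")


# Case 1: the declared encoding matches the actual bytes, so conversion works.
good_page = "café".encode("iso-8859-1")
print(to_utf8(good_page, "ISO-8859-1").decode("utf-8"))  # café

# Case 2: the bytes are really UTF-8 but the site declares ISO-8859-1.
# The conversion "succeeds" but the text comes out garbled.
bad_page = "café".encode("utf-8")
print(to_utf8(bad_page, "ISO-8859-1").decode("utf-8"))  # cafÃ©
```

The second case is the kind of site the note describes: the conversion runs without errors, but the result is still wrong because the site's own declaration is wrong.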
7.8.1.3. HTTP User Agent
You can set the value of the HTTP User-Agent [2] header that is sent with the request made to the target web site to retrieve its source code.
Tip
Web sites can use this information when showing you their pages. They can also use it to withhold certain pages from you or to limit your access to their web site.
If you are looking for User-Agent strings, you can take a look at this page.
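As a rough illustration of what this setting controls, the following Python sketch builds a request that carries a custom User-Agent header. It uses only the standard library and does not send anything. The URL and the User-Agent string are placeholder assumptions, not values used by the plugin.

```python
import urllib.request

# Both values below are placeholders chosen for this example.
TARGET_URL = "https://example.com/"
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleCrawler/1.0"

# Attach the header to the request object; nothing is sent over the network here.
request = urllib.request.Request(TARGET_URL, headers={"User-Agent": USER_AGENT})

# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

If no User-Agent is set, Python's urllib identifies itself as Python-urllib followed by its version, which some web sites treat differently from a browser. That is the kind of behavior the tip above refers to.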
7.8.1.4. HTTP Accept
You can set the value of the HTTP Accept [3] header in this setting. The default value of this setting is:
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
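To help read the default value, here is a small Python sketch that splits an Accept header into media types and their quality values (the q parameter, which defaults to 1.0 when omitted). The function name parse_accept is an assumption made for this example, and the parser is simplified: it ignores parameters other than q.

```python
DEFAULT_ACCEPT = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
)


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media type, quality) pairs in the order they appear in the header."""
    entries = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type, params = pieces[0], pieces[1:]
        quality = 1.0  # a missing q parameter means full preference
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(value)
        entries.append((media_type, quality))
    return entries


for media_type, quality in parse_accept(DEFAULT_ACCEPT):
    print(f"{media_type}: {quality}")
```

Read this way, the default value says that HTML, XHTML and WebP images are preferred, generic XML is slightly less preferred (0.9), and anything else is accepted as a last resort (0.8).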
7.8.1.6. Disable SSL verification?
If you do not want to verify the SSL certificate of the target sites, check this.
Warning
If you enable this setting, you can no longer trust that the responses are retrieved from the target site. Before you enable this setting, it is recommended that you research what SSL certificates are used for, to better understand the implications of disabling SSL verification.
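To make the warning more concrete, here is a Python sketch of what disabling certificate verification usually means in an HTTP client. It is an illustration with Python's standard ssl module, not the plugin's implementation, and the function name make_ssl_context is an assumption.

```python
import ssl


def make_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Build a TLS context, optionally with certificate checks turned off."""
    context = ssl.create_default_context()
    if not verify:
        # Hostname checking must be switched off before verify_mode can be relaxed.
        context.check_hostname = False
        # CERT_NONE accepts any certificate, including a forged or expired one.
        context.verify_mode = ssl.CERT_NONE
    return context


secure = make_ssl_context(verify=True)
insecure = make_ssl_context(verify=False)
print(secure.verify_mode == ssl.CERT_REQUIRED)   # True
print(insecure.verify_mode == ssl.CERT_NONE)     # True
```

With the insecure context, the connection is still encrypted, but nothing confirms who is on the other end. This is why the responses can no longer be trusted to come from the target site.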
7.8.1.7. Connection timeout (in seconds)
Maximum number of seconds within which the target server should respond. You can enter 0 to disable the limit.
With this setting, you can limit the response time so that your server is not kept busy waiting for a response from the target web page for a long time. When a web site does not send a response within the defined number of seconds, the plugin will simply stop waiting for a response. For example, if the request is for a post page, then that post will not be saved because its source code could not be retrieved.
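The behavior described above can be sketched in Python as follows. This is an illustration under assumptions, not the plugin's code: the names effective_timeout and fetch_source are hypothetical, and Python's socket timeout applies to each blocking network operation, not to the total duration of the request.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional


def effective_timeout(seconds: float) -> Optional[float]:
    """Map the setting's value to a socket timeout; 0 means no limit."""
    return None if seconds == 0 else seconds


def fetch_source(url: str, timeout_seconds: float) -> Optional[bytes]:
    """Return the page source, or None when it could not be retrieved in time."""
    try:
        with urllib.request.urlopen(
            url, timeout=effective_timeout(timeout_seconds)
        ) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout):
        # Stop waiting; the caller has no source code to save for this page.
        return None


print(effective_timeout(0))   # None
print(effective_timeout(30))  # 30
```

A caller would treat a None result the same way the plugin treats a timed-out post page: there is no source code, so nothing is saved.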
Footnotes
[1] UTF-8 is the standard encoding of HTML. For more information, see the Wikipedia page. To learn how many languages Unicode supports, see its FAQ page.
[2] HTTP User-Agent is a header that sends certain information, such as application type, operating system, software vendor or software version, with the request made to the target site. See User-Agent at Mozilla for more information.
[3] HTTP Accept is a header that tells the server which content types the client is able to understand. See Accept at Mozilla for more information.
[4] An HTTP cookie is a piece of information that is saved on your computer by a web site to, for example, remember your user login or track you. See HTTP Cookie at Mozilla for more information.