3.50. Dealing with character encoding problems

If characters in the content retrieved from the target site are not displayed correctly, try the following steps:

  1. First, make sure your server’s language settings are configured properly. If the problem persists, continue with the next steps.
  2. Go to the General Settings Page and activate the Advanced Tab. To apply these changes to a specific site only, use custom general settings for that site (See: Using custom general settings for a site) and perform the following steps in the Settings Tab of the Site Settings Page.
  3. If the Always use UTF-8 encoding? checkbox is not checked, check it; if it is already checked, uncheck it. Save the settings and test your site settings (See: Testing site settings) to see whether the problem is fixed. If it is not, proceed to the next step.
  4. Again under the Advanced Tab, check both the Always use UTF-8 encoding? checkbox and the Convert encoding to UTF-8 when it is not UTF-8 checkbox. Save the settings and test the site settings again to see whether the problem is fixed.

Note

You can also try changing the character set defined in the page’s HTML code. Checking the Always use UTF-8 encoding? checkbox already replaces the charset defined in the target page’s HTML code, but you can make this change manually in case the plugin fails to do so.

The site you are trying to crawl may have a meta tag that defines a charset other than UTF-8. For instance, there may be a meta tag like this one in the page’s HTML code:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />
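Before editing anything, it can help to check which charset a page declares. The Python sketch below uses a hypothetical `declared_charset` helper that scans the raw HTML for a charset inside a meta tag. It is a simplified, regex-based check rather than a full HTML parser, and it ignores the HTTP Content-Type header, which can also declare a charset.

```python
import re
from typing import Optional

# Finds charset=... inside a <meta> tag. It covers both the http-equiv
# Content-Type form shown above and the shorter <meta charset="..."> form.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?\s*([A-Za-z0-9_:.\-]+)",
    re.IGNORECASE,
)


def declared_charset(html: bytes) -> Optional[str]:
    """Return the lowercased charset from the first matching <meta> tag, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).decode("ascii").lower() if match else None


page = b'<head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" /></head>'
charset = declared_charset(page)
if charset is not None and charset not in ("utf-8", "utf8"):
    print("Page declares a non-UTF-8 charset:", charset)
```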

Here, the charset value is iso-8859-9, whereas we need it to be UTF-8. In this case, you can use the Find and replace in HTML at first load setting under the Post Tab to replace the charset with UTF-8.

In some cases, removing the charset definition entirely may solve the issue, so you can also try removing the charset=iso-8859-9 part completely.
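In the plugin, these edits are made through the Find and replace in HTML at first load setting. The Python sketch below only illustrates the effect of the two options described above (replacing the charset with UTF-8, or removing the charset definition) on the example tag. The helper names are hypothetical, the edits are limited to meta tags, and only the unquoted charset=... form used in the http-equiv tag is handled; a `<meta charset="...">` tag is left unchanged.

```python
import re

# Any <meta ...> tag; the edits below are applied only inside these tags.
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
# "charset=" followed by its value, e.g. "charset=iso-8859-9".
CHARSET_VALUE = re.compile(r"(charset\s*=\s*)[A-Za-z0-9_:.\-]+", re.IGNORECASE)
# The whole "; charset=..." parameter, including the separating semicolon.
CHARSET_PARAM = re.compile(r";?\s*charset\s*=\s*[A-Za-z0-9_:.\-]+", re.IGNORECASE)


def replace_charset(html: str, new_charset: str = "UTF-8") -> str:
    """Hypothetical helper: change the declared charset inside <meta> tags."""
    return META_TAG.sub(
        lambda tag: CHARSET_VALUE.sub(lambda m: m.group(1) + new_charset, tag.group(0)),
        html,
    )


def remove_charset(html: str) -> str:
    """Hypothetical helper: drop the charset parameter from <meta> tags."""
    return META_TAG.sub(lambda tag: CHARSET_PARAM.sub("", tag.group(0)), html)


tag = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-9" />'
print(replace_charset(tag))
print(remove_charset(tag))
```

For the example tag, the first call produces content="text/html; charset=UTF-8" and the second produces content="text/html". Restricting the edits to meta tags keeps any "charset=" text in the page body untouched.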

If the problem persists after all of these steps, unfortunately the plugin cannot fix the character encoding problem.