5.1. Creating a New Site

TL;DR

  • In the sidebar, click Content Crawler > Add New
  • Fill in the two mandatory settings: Main > Site URL and Category > Category URLs
  • Fill in the other settings depending on what you want to crawl and how
  • Click the Publish button

To create a new site, click Content Crawler > Add New in the sidebar of your WordPress admin panel. Alternatively, click Content Crawler > All Sites and then, on the page that opens, click the Add New button next to the page’s title at the top. Click one of these buttons and let’s get started.

You now see the site settings page with lots of empty fields, and you may wonder whether you have to fill in every one of them. You do not. In fact, only 2 settings must be configured:

Main > Site URL
Enter the main URL of the target site in this setting. For example, if you want to crawl the https://wordpress.org/plugins/ page, enter https://wordpress.org/ here. The URL must start with http. The plugin uses this setting to create full URLs from relative URLs. For example, if the target site contains a relative URL such as plugins/akismet, the value entered in this setting is prepended to it, which produces https://wordpress.org/plugins/akismet. If you had entered https://wordpress.org/plugins/ instead, the result would be https://wordpress.org/plugins/plugins/akismet, which is not the correct address of the page. Therefore, make sure to remove everything that comes after the domain of the target site.
Category > Category URLs
Enter at least one category URL in this setting. This is required because, for automatic crawling to work, the plugin must be able to find post URLs in the target web site. The plugin searches the pages at the URLs you define here for post URLs, using the settings you define under the Category tab.
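
The way the Site URL value is prepended to relative URLs can be illustrated with a short sketch. This is not the plugin's own code; it is a minimal Python illustration, and the function names site_root and to_full_url are hypothetical. It assumes the simple prepending behavior described above and shows why the path after the domain should be removed.

```python
from urllib.parse import urlsplit


def site_root(url: str) -> str:
    """Reduce a URL to its scheme and domain (hypothetical helper)."""
    if not url.startswith("http"):
        raise ValueError("The Site URL must start with http")
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def to_full_url(site_url: str, relative_url: str) -> str:
    """Prepend the Site URL to a relative URL (assumed behavior, for illustration)."""
    return site_url.rstrip("/") + "/" + relative_url.lstrip("/")


# Correct Site URL: only the scheme and the domain.
good = to_full_url("https://wordpress.org/", "plugins/akismet")

# Site URL that still contains a path: the path is duplicated.
bad = to_full_url("https://wordpress.org/plugins/", "plugins/akismet")

# Stripping the path first gives the value that belongs in Main > Site URL.
root = site_root("https://wordpress.org/plugins/")

print(good)  # https://wordpress.org/plugins/akismet
print(bad)   # https://wordpress.org/plugins/plugins/akismet
print(root)  # https://wordpress.org/
```

If you are unsure what to enter, take the address of any page on the target site and keep only the part up to and including the domain, as site_root does here.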

All other settings are optional. Configure them only if you need them. For example, if you need to crawl the post title, configure the Post > Post Title Selectors setting. If you want to save post tags, configure the Post > Post Tag Selectors setting. If you want to find and replace something in the post title, configure the Templates > Find and replace in post's title setting. If you want the plugin to send cookies with its requests, configure the Main > Cookies setting.

Each setting also has its own documentation on the settings page. To see it, click the i icon next to the setting’s name; an explanation of the setting is shown. These explanations cover the most important things you need to know to use the setting properly and are meant as quick references. Rather than guessing how a setting works, take a look at its quick reference so that you do not spend too much time figuring it out yourself.

After configuring the settings, click the Publish button to save and publish them (See: Saving The Settings). Sites that are not published, i.e. that are in draft status, are not crawled even if they are activated for scheduling. Therefore, make sure the site is published.