5.1. Creating a New Site
TL;DR
- In the sidebar, click Content Crawler > Add New
- Fill in the two mandatory settings: Main > Site URL and Category > Category URLs
- Fill in other settings depending on what you want to crawl and how
- Click the Publish button
To create a new site, click Content Crawler > Add New in the sidebar of your WordPress admin panel. Alternatively, click Content Crawler > All Sites and then click the Add New button next to the page’s title at the top of the opened page. Click either of these buttons and let’s get started.
You now see the site settings page with many empty fields, and you may wonder whether you have to fill in every one of them. You do not. Only two settings must be configured:
- Main > Site URL
  - Enter the main URL of the target site in this setting. For example, if you want to crawl the https://wordpress.org/plugins/ page, enter https://wordpress.org/ into this setting. The URL must start with http. This setting is used to create full URLs from relative URLs. For example, if a URL is defined as plugins/akismet, the value of this setting is prepended to it, which turns the relative URL into https://wordpress.org/plugins/akismet. If you entered https://wordpress.org/plugins/ instead, the URL would become https://wordpress.org/plugins/plugins/akismet, which is not the intended URL. Therefore, make sure to remove any part of the URL that comes after the domain of the target site.
- Category > Category URLs
  - Enter at least one category URL in this setting. This is required because, for automatic crawling to work, the plugin must be able to find post URLs in the target website. The URLs you define in this setting are searched for post URLs using the settings you define under the Category tab.
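The plugin handles both settings internally; the following is only a minimal Python sketch of the behavior described above, not the plugin's actual code. It shows how a page URL can be reduced to a Site URL, how the Site URL is prepended to relative URLs (including the doubled-path result of entering a full page URL), and how post URLs might be collected from a category page. The function names, and the simple substring filter that stands in for the selectors you configure under the Category tab, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def site_url_from(page_url):
    """Keep only the scheme and domain of a page URL (hypothetical helper)."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("The Site URL must start with http")
    return f"{parts.scheme}://{parts.netloc}/"


def to_absolute(site_url, url):
    """Prepend the Site URL to a relative URL, as the setting's description says."""
    if url.startswith(("http://", "https://")):
        return url  # already a full URL, nothing to prepend
    # Plain concatenation: whatever path the Site URL contains is kept.
    return site_url.rstrip("/") + "/" + url.lstrip("/")


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_post_urls(category_html, site_url, must_contain):
    """Return unique absolute URLs of links whose href contains must_contain.

    must_contain is a hypothetical stand-in for the Category tab selectors.
    """
    collector = _LinkCollector()
    collector.feed(category_html)
    urls = []
    for href in collector.hrefs:
        if must_contain in href:
            full = to_absolute(site_url, href)
            if full not in urls:
                urls.append(full)
    return urls


if __name__ == "__main__":
    site = site_url_from("https://wordpress.org/plugins/")
    print(site)  # https://wordpress.org/
    print(to_absolute(site, "plugins/akismet"))
    # https://wordpress.org/plugins/akismet
    print(to_absolute("https://wordpress.org/plugins/", "plugins/akismet"))
    # https://wordpress.org/plugins/plugins/akismet (the mistake to avoid)
    page = '<a href="plugins/akismet">Akismet</a> <a href="/about/">About</a>'
    print(find_post_urls(page, site, "plugins/"))
    # ['https://wordpress.org/plugins/akismet']
```

The sketch uses plain concatenation rather than full URL resolution so that it mirrors the prepending behavior the Site URL description explains.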
Settings other than these two are optional. Whether to configure them is completely up to you; configure them when you need them. For example, if you need to crawl the post title, configure the Post > Post Title Selectors setting. If you want to save post tags, configure the Post > Post Tag Selectors setting. If you want to find and replace something in the post title, configure the Templates > Find and replace in post's title setting. If you want the plugin to send cookies with its requests, configure the Main > Cookies setting.
The plugin’s settings are also documented on the settings page itself. Click the i icon next to the name of any setting to show an explanation of that setting. These explanations cover the most important things you need to know to use each setting properly; they are quick references. Rather than figuring out a setting by trial and error, check its quick reference first so that you do not spend too much time on it.
After configuring the settings, click the Publish button to save and publish them (See: Saving The Settings). Sites that are not published, i.e. sites in draft status, are not crawled even if they are activated for scheduling. Therefore, make sure the site is published.