7.1.1. Scheduling
Scheduling consists of two automated events: collecting post URLs from the target sites' category pages, and saving those posts into your WordPress site.
- URL Collection: crawling the category pages of the sites, which are defined in the Category > Category URLs setting, extracting post URLs from them, and saving those post URLs into the database.
- Post Crawling: crawling one of the post URLs saved into your database by the URL collection event and creating a post in your WordPress site from the source code of that post URL.
7.1.1.1. Scheduling is active?
Enable this to activate automatic crawling. When it is enabled, the plugin collects post URLs from the categories entered into the Category > Category URLs setting and automatically crawls the collected post URLs.
Note
For the posts to be crawled, both this setting and Active for scheduling? setting of the site must be enabled.
7.1.1.2. Post URL Collection Interval
The time interval set in this setting defines how often the URL collection event runs.
Note
This setting is visible only if Scheduling is active? is enabled.
Note
The collected URLs will be crawled (saved as posts) in the order they are collected.
For example, let's say you selected Every 5 minutes. In this case, every 5 minutes, the plugin will collect URLs from one of the categories of one of the active sites. Let's assume you have 3 active sites configured as follows:
Site Name | Categories |
---|---|
Site 1 | Category 1.1 (2 pages), Category 1.2 (1 page) |
Site 2 | Category 2.1 (1 page), Category 2.2 (2 pages), Category 2.3 (2 pages) |
Site 3 | Category 3.1 (1 page) |
Let's also assume that you do not limit the number of category pages that can be crawled, i.e. Maximum number of pages to crawl per category is set to 0.
Finally, let's assume that you set the value of Maximum number of pages to check to find new post URLs to 1.
Here is how the plugin will collect the URLs with this configuration:
Time (minute) | Site - Category - Page |
---|---|
0 | Site 1 - Category 1.1 - Page 1 |
5 | Site 2 - Category 2.1 - Page 1 |
10 | Site 3 - Category 3.1 - Page 1 |
15 | Site 1 - Category 1.1 - Page 2 |
20 | Site 2 - Category 2.2 - Page 1 |
25 | Site 3 - Category 3.1 - Page 1 |
30 | Site 1 - Category 1.2 - Page 1 |
35 | Site 2 - Category 2.2 - Page 2 |
40 | Site 3 - Category 3.1 - Page 1 |
45 | Site 1 - Category 1.1 - Page 1 |
50 | Site 2 - Category 2.3 - Page 1 |
55 | Site 3 - Category 3.1 - Page 1 |
60 | Site 1 - Category 1.2 - Page 1 |
65 | Site 2 - Category 2.3 - Page 2 |
70 | Site 3 - Category 3.1 - Page 1 |
75 | Site 1 - Category 1.1 - Page 1 |
80 | Site 2 - Category 2.1 - Page 1 |
85 | Site 3 - Category 3.1 - Page 1 |
90 | Site 1 - Category 1.2 - Page 1 |
95 | Site 2 - Category 2.2 - Page 1 |
100 | Site 3 - Category 3.1 - Page 1 |
105 | Site 1 - Category 1.1 - Page 1 |
110 | Site 2 - Category 2.3 - Page 1 |
115 | Site 3 - Category 3.1 - Page 1 |
120 | Site 1 - Category 1.2 - Page 1 |
125 | Site 2 - Category 2.1 - Page 1 |
130 | Site 3 - Category 3.1 - Page 1 |
135 | Site 1 - Category 1.1 - Page 1 |
140 | Site 2 - Category 2.2 - Page 1 |
145 | Site 3 - Category 3.1 - Page 1 |
150 | Site 1 - Category 1.2 - Page 1 |
155 | Site 2 - Category 2.3 - Page 1 |
160 | Site 3 - Category 3.1 - Page 1 |
165 | Site 1 - Category 1.1 - Page 1 |
170 | Site 2 - Category 2.1 - Page 1 |
175 | Site 3 - Category 3.1 - Page 1 |
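To make the rotation in the table easier to follow, here is a minimal simulation sketch. It is not the plugin's code; it only encodes the behaviour the example tables show, under these assumptions: each event serves one active site in round-robin order, a site's categories are visited in turn, a category's first pass covers all of its pages (or up to Maximum number of pages to crawl per category, where 0 means no limit), and later passes check only Maximum number of pages to check to find new post URLs. All names (`Category`, `SiteState`, `collect_once`, `simulate`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    pages: int  # total number of pages in the category


@dataclass
class SiteState:
    name: str
    categories: list  # Category objects, in the order they are visited
    cat_index: int = 0  # category to visit next
    next_page: int = 1  # page to visit next within that category
    first_pass_done: set = field(default_factory=set)  # categories fully crawled once


def page_limit(site, cat, max_crawl, max_check):
    """How many pages of `cat` the current pass may visit."""
    if cat.name in site.first_pass_done:
        return min(cat.pages, max_check)  # later passes only look for new URLs
    return cat.pages if max_crawl == 0 else min(cat.pages, max_crawl)  # 0 = no limit


def collect_once(site, max_crawl, max_check):
    """One run: visit a single category page and advance the site's position."""
    cat = site.categories[site.cat_index]
    visited = f"{site.name} - {cat.name} - Page {site.next_page}"
    if site.next_page >= page_limit(site, cat, max_crawl, max_check):
        # This pass over the category is finished; move on to the next category.
        site.first_pass_done.add(cat.name)
        site.next_page = 1
        site.cat_index = (site.cat_index + 1) % len(site.categories)
    else:
        site.next_page += 1
    return visited


def simulate(sites, interval, events, run_count=1, max_crawl=0, max_check=1):
    """Return (minute, [pages visited]) for each URL collection event."""
    schedule = []
    for i in range(events):
        site = sites[i % len(sites)]  # assumption: one site per event, round robin
        runs = [collect_once(site, max_crawl, max_check) for _ in range(run_count)]
        schedule.append((i * interval, runs))
    return schedule


if __name__ == "__main__":
    sites = [
        SiteState("Site 1", [Category("Category 1.1", 2), Category("Category 1.2", 1)]),
        SiteState("Site 2", [Category("Category 2.1", 1), Category("Category 2.2", 2),
                             Category("Category 2.3", 2)]),
        SiteState("Site 3", [Category("Category 3.1", 1)]),
    ]
    for minute, pages in simulate(sites, interval=5, events=9):
        print(minute, "|", ", ".join(pages))
```

Calling `simulate(sites, interval=5, events=36)` with the three sites above yields the same rows as the table. Passing `run_count=2` with only Site 1 and Site 2 yields the same sequence of visited pages as the example in Run count for URL collection event.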
7.1.1.3. Post Crawl Interval
The time interval set in this setting defines how often the post crawling event runs.
Note
This setting is visible only if Scheduling is active? is enabled.
Let's say you set this to Every minute. Then, every minute, the plugin will crawl a URL that has already been collected and is available in your database. URLs that have been collected but not yet saved as posts are said to be in the queue. So, the plugin takes the first URL in a site's queue, asks the target server to send its source code, applies your settings to extract data from the source code, and saves the result as a post in your site.
Let's also assume that you have 3 active sites with different numbers of URLs in their queues, as shown in the following table.
Note
Each time, the oldest URL in a site's queue is saved. You can think of this as a bank queue: whoever arrives first is served first.
Site Name | Post URLs in the Queue (the oldest first) |
---|---|
Site 1 | URL 1.1, URL 1.2, URL 1.3, URL 1.4, URL 1.5 |
Site 2 | URL 2.1, URL 2.2, URL 2.3, URL 2.4, URL 2.5, URL 2.6, URL 2.7, URL 2.8, URL 2.9, URL 2.10 |
Site 3 | URL 3.1, URL 3.2 |
Let's finally assume that you have set Run count for post crawling event to 2. The plugin will crawl the URLs in the queue and save them as posts as shown in the following table.
Time (minute) | Site | Crawled URLs | Remaining URLs in the queue |
---|---|---|---|
0 | Site 1 | URL 1.1, URL 1.2 | 3 |
1 | Site 2 | URL 2.1, URL 2.2 | 8 |
2 | Site 3 | URL 3.1, URL 3.2 | 0 |
3 | Site 1 | URL 1.3, URL 1.4 | 1 |
4 | Site 2 | URL 2.3, URL 2.4 | 6 |
5 | Site 1 | URL 1.5 | 0 |
6 | Site 2 | URL 2.5, URL 2.6 | 4 |
7 | Site 2 | URL 2.7, URL 2.8 | 2 |
8 | Site 2 | URL 2.9, URL 2.10 | 0 |
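The following sketch models the queue behaviour in the table as a first-in, first-out queue per site. It is a simplified model under assumptions inferred from the table, not the plugin's implementation: each event serves one site, rotating through the sites and skipping any site whose queue is empty (as at minute 5, where Site 3 is skipped), and each run crawls one URL, so multi-page posts are ignored. The function name `simulate_post_crawling` is hypothetical.

```python
from collections import deque


def simulate_post_crawling(queues, run_count, interval, events):
    """Return (minute, site, crawled URLs, URLs left) for each post crawling event.

    queues maps a site name to its queued URLs, oldest first.
    """
    names = list(queues)
    pending = {name: deque(urls) for name, urls in queues.items()}
    last = -1  # index of the site served by the previous event
    schedule = []
    for event in range(events):
        # Assumption: pick the next site after the previous one that still has URLs.
        for step in range(1, len(names) + 1):
            idx = (last + step) % len(names)
            if pending[names[idx]]:
                break
        else:
            break  # every queue is empty, nothing left to crawl
        last = idx
        queue = pending[names[idx]]
        # Each run crawls the oldest URL (FIFO); a site may run out before run_count.
        crawled = [queue.popleft() for _ in range(min(run_count, len(queue)))]
        schedule.append((event * interval, names[idx], crawled, len(queue)))
    return schedule


if __name__ == "__main__":
    queues = {
        "Site 1": [f"URL 1.{i}" for i in range(1, 6)],
        "Site 2": [f"URL 2.{i}" for i in range(1, 11)],
        "Site 3": ["URL 3.1", "URL 3.2"],
    }
    for row in simulate_post_crawling(queues, run_count=2, interval=1, events=10):
        print(row)
```

A `deque` is used because taking the oldest URL with `popleft()` is the direct counterpart of the "bank queue" described in the note above.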
Note
For multi-page posts, each page of a post is handled in a separate run. You can assume that the plugin makes only 1 request to the target site in each run of the event. Since each page of a post has a different URL, the plugin crawls/recrawls the pages in different runs. The plugin will not crawl/recrawl another post until all pages of the current post are crawled/recrawled.
7.1.1.4. Maximum number of pages to crawl per category
This setting defines from how many pages of a category post URLs can be collected. When you set this setting's value to 0, all pages of the category will be crawled to collect post URLs.
For example, let's assume that you entered a category URL having 100 pages into the Category > Category URLs setting. If you set this setting's value to 50, the plugin will first collect URLs from the first 50 pages. After all 50 pages are crawled, the plugin will start again from the first page of the category to find new post URLs. This time, however, the plugin will not go through the first 50 pages. Instead, it will check only the number of pages defined in the Maximum number of pages to check to find new post URLs setting.
7.1.1.5. Maximum number of pages to check to find new post URLs
This setting defines how many pages of a category, starting from its first page, should be checked for new post URLs. New post URLs are URLs that have not been added to the database (i.e. the queue) before.
This setting is applied after the number of pages defined in Maximum number of pages to crawl per category has been crawled. For an example, see Maximum number of pages to crawl per category.
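The interplay of the two page limits can be summarized in a small helper. This is a sketch of the rule described in the two settings above, not the plugin's code; the function name `pages_to_visit` and its parameters are hypothetical, and `first_pass_done` stands for "the pages allowed by Maximum number of pages to crawl per category have already been crawled".

```python
def pages_to_visit(total_pages, max_pages_to_crawl, max_pages_to_check, first_pass_done):
    """Return the page numbers of a category that the next pass will visit."""
    if first_pass_done:
        # After the first pass, only look for new post URLs on the first pages.
        limit = min(total_pages, max_pages_to_check)
    elif max_pages_to_crawl == 0:
        limit = total_pages  # 0 means every page of the category is crawled
    else:
        limit = min(total_pages, max_pages_to_crawl)
    return list(range(1, limit + 1))


# The example above: a 100-page category with the limit set to 50 and pages to check set to 1.
print(pages_to_visit(100, 50, 1, first_pass_done=False)[-1])  # last page of the first pass
print(pages_to_visit(100, 50, 1, first_pass_done=True))       # pages checked afterwards
```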
7.1.1.6. Run count for URL collection event
This setting defines how many times the URL collection event runs in each interval defined in Post URL Collection Interval.
Let's assume the following case. You have set Post URL Collection Interval to Every 5 minutes and set the value of this setting to 2. You have 2 active sites as shown in the following table:
Site Name | Categories |
---|---|
Site 1 | Category 1.1 (2 pages), Category 1.2 (1 page) |
Site 2 | Category 2.1 (1 page), Category 2.2 (2 pages), Category 2.3 (2 pages) |
Let's also assume that the number of category pages that can be crawled is not limited. Finally, let's assume that Maximum number of pages to check to find new post URLs is set to 1. Here is how the plugin will collect the URLs with this configuration:
Time (minute) | Site - Category - Page |
---|---|
0 | Site 1 - Category 1.1 - Page 1, Site 1 - Category 1.1 - Page 2 |
5 | Site 2 - Category 2.1 - Page 1, Site 2 - Category 2.2 - Page 1 |
10 | Site 1 - Category 1.2 - Page 1, Site 1 - Category 1.1 - Page 1 |
15 | Site 2 - Category 2.2 - Page 2, Site 2 - Category 2.3 - Page 1 |
20 | Site 1 - Category 1.2 - Page 1, Site 1 - Category 1.1 - Page 1 |
25 | Site 2 - Category 2.3 - Page 2, Site 2 - Category 2.1 - Page 1 |
30 | Site 1 - Category 1.2 - Page 1, Site 1 - Category 1.1 - Page 1 |
35 | Site 2 - Category 2.2 - Page 1, Site 2 - Category 2.3 - Page 1 |
40 | Site 1 - Category 1.2 - Page 1, Site 1 - Category 1.1 - Page 1 |
45 | Site 2 - Category 2.1 - Page 1, Site 2 - Category 2.2 - Page 1 |
50 | Site 1 - Category 1.2 - Page 1, Site 1 - Category 1.1 - Page 1 |
55 | Site 2 - Category 2.3 - Page 1, Site 2 - Category 2.1 - Page 1 |
Tip
You can increase this number to quickly collect all URLs from all pages of the categories at first. For example, if a category has 30 pages, you can set this number to 3 until URLs of all 30 pages are collected. After that, you can decrease this value because the plugin will collect only new URLs from the category pages from that point on.
Note
All the runs are executed as soon as the event is run, each run starting as soon as the previous one finishes. For example, if an event's run count is set to 10 and its interval is set to Every 5 minutes, the 10 runs are executed back to back; the 5-minute interval will not be divided into 10 equal intervals.
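To make the note concrete, here is a small sketch that computes when each run of a single event starts if the runs execute back to back. The run durations are hypothetical; real durations depend on the target site's response time and your settings. The function name `run_start_times` is also hypothetical.

```python
def run_start_times(event_start, run_durations):
    """Start time (in seconds) of each run when one event's runs execute back to back."""
    starts = []
    t = event_start
    for duration in run_durations:
        starts.append(t)
        t += duration  # the next run starts as soon as this one finishes
    return starts


# Hypothetical: run count 10, each run taking 3 seconds, event fired at second 0.
print(run_start_times(0, [3] * 10))
```

With these assumed durations, all 10 runs finish within the first 30 seconds of the 300-second (5-minute) interval, rather than starting every 30 seconds across the whole interval.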
7.1.1.7. Run count for post crawling event
This setting defines how many times the post crawling event runs in each interval defined in Post Crawl Interval. For an example case, see Post Crawl Interval.
Note
All the runs are executed as soon as the event is run, each run starting as soon as the previous one finishes. For example, if an event's run count is set to 10 and its interval is set to Every 5 minutes, the 10 runs are executed back to back; the 5-minute interval will not be divided into 10 equal intervals.