10.1. Manual Crawling Tool

This tool lets you crawl one or more post URLs. Optionally, you can also save the post URLs into the database so that the plugin crawls them using your scheduling settings (See: Scheduling Tab).

The inputs of this tool are explained as follows.

Important

You should provide at least one post URL.

Site

Select the site whose settings should be used to save the posts.

Note

Only the published sites are shown.

Category

Select a category into which the posts will be saved.

Important

You should select a category that belongs to the post type defined in the Post Type setting. For example, if the Post Type setting is set to product, you should select a WooCommerce product category.

Post URLs (optional)

Enter the URLs of the posts that you want to crawl. To enter multiple URLs, write each URL on a new line.

Note

Make sure you enter full URLs. In other words, they should start with http (for example, http:// or https://).
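As a rough illustration of the two rules above (one URL per line, full URLs only), the following Python sketch splits the entered text into lines and separates full URLs from incomplete ones. The function `parse_post_urls` is hypothetical and is not part of the plugin; the plugin’s own validation may be stricter or looser.

```python
def parse_post_urls(text: str) -> tuple[list[str], list[str]]:
    """Split textarea input into (valid, rejected) URLs, one URL per line.

    Hypothetical helper: it only models the rules described in this manual.
    """
    valid: list[str] = []
    rejected: list[str] = []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # ignore blank lines between URLs
        # Full URLs start with "http" (http:// or https://)
        if url.startswith(("http://", "https://")):
            valid.append(url)
        else:
            rejected.append(url)
    return valid, rejected


entered = """https://example.com/post-1
example.com/post-2

http://example.com/post-3
"""
valid, rejected = parse_post_urls(entered)
print(valid)     # ['https://example.com/post-1', 'http://example.com/post-3']
print(rejected)  # ['example.com/post-2']
```

Here `example.com/post-2` is rejected because it lacks the `http://` or `https://` prefix, which is exactly the mistake the note warns about.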

Post and Featured Image URLs (optional)

This setting lets you define post URLs together with their featured image URLs. If you configured the settings under the Featured Image Section of the Post Tab, you do not need this setting, because the featured images will be retrieved from the post pages. In that case, just use the Post URLs setting. This setting is only for cases where the featured images are not available inside the post page. For example, if you configured the settings under the Featured Images Section of the Category Tab but did not configure the settings under the Featured Image Section of the Post Tab, use this setting if you want the featured images to be saved as well.
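The guidance above boils down to a simple decision. The hypothetical function below encodes it as a sketch; it models only the advice given in this manual, not how the plugin itself decides which featured image to use.

```python
def should_use_featured_image_urls(post_tab_featured_image_configured: bool,
                                   want_featured_images: bool) -> bool:
    """Hypothetical helper restating the guidance above.

    Returns True when "Post and Featured Image URLs" is the input to use,
    False when the plain "Post URLs" setting is enough.
    """
    if not want_featured_images:
        # No featured images needed: Post URLs is enough.
        return False
    # If the Post Tab already finds the featured image inside the post page,
    # Post URLs is enough; otherwise the image URL must be given explicitly.
    return not post_tab_featured_image_configured
```

For instance, a setup where featured images are only configured under the Category Tab corresponds to `should_use_featured_image_urls(False, True)`, which returns `True`.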

This setting has two input fields as explained below:

Post URL
Enter the URL of the post that should be saved.
Featured image URL
Enter the URL of the featured image that belongs to the post URL. The plugin will save the post and assign this image to the post as its featured image.

Note

Make sure you enter full URLs. In other words, they should start with http (for example, http:// or https://).

Retrieve post URLs from these category URLs (optional)

Enter the URLs of the categories from which the post URLs should be collected. This setting automatically finds the post URLs available in the given category pages and crawls them, so that you do not have to enter them manually. Use this setting if you want to save all of the posts available on a category page. The post URLs are found by using the settings you defined under the Category Tab.
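The idea behind this setting can be sketched as: take a category page, pick out the links that point to posts, and turn relative links into full URLs. In the plugin, the post links are chosen by the settings you configure under the Category Tab; the sketch below replaces those with a single hypothetical CSS class name (`post-link`) and works on HTML that is already downloaded, so it makes no network requests. It uses only Python’s standard `html.parser` and `urllib.parse` modules, and all class and function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PostLinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class.

    Hypothetical stand-in for the Category Tab settings: one class name
    replaces the selectors you would configure in the plugin.
    """

    def __init__(self, base_url: str, link_class: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if href and self.link_class in classes:
            # Resolve relative links against the category page URL
            full_url = urljoin(self.base_url, href)
            if full_url not in self.urls:  # skip duplicate links
                self.urls.append(full_url)


def collect_post_urls(category_url: str, html: str, link_class: str) -> list[str]:
    collector = PostLinkCollector(category_url, link_class)
    collector.feed(html)
    return collector.urls


page = """
<div class="posts">
  <a class="post-link" href="/news/first-post">First</a>
  <a class="post-link" href="https://example.com/news/second-post">Second</a>
  <a class="nav" href="/news/page/2">Next page</a>
</div>
"""
print(collect_post_urls("https://example.com/news/", page, "post-link"))
# ['https://example.com/news/first-post', 'https://example.com/news/second-post']
```

The pagination link is ignored because it does not match the post-link rule, which mirrors why the Category Tab settings need to target post links specifically.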

Note

Make sure you enter full URLs. In other words, they should start with http (for example, http:// or https://).

Pause after crawling this number of posts (optional)

Enter a number. The tool will pause crawling after this number of posts has been crawled. This option applies only when you click the Crawl now button. Entering 0 or leaving this empty means there is no limit.

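The rule for this option can be written as a tiny function. The name `posts_before_pause` is hypothetical; it only restates the behaviour described above, where an empty value or 0 means no limit.

```python
from typing import Optional


def posts_before_pause(limit: Optional[int], total_urls: int) -> int:
    """How many posts "Crawl now" processes before pausing (hypothetical model).

    `limit` is the value of "Pause after crawling this number of posts":
    None (left empty) or 0 means no limit.
    """
    if not limit:  # None or 0: crawl everything
        return total_urls
    return min(limit, total_urls)
```

For example, with 25 URLs and a value of 10, the tool would crawl 10 posts and then pause; with a value of 0 it would crawl all 25.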
Maximum parallel crawling count (optional)

Enter a number. This number of posts will be crawled at the same time. You can use this setting to increase the speed of crawling. The default number is 1.

Let’s say a single post is saved in 3 seconds and you entered 10 post URLs. When this setting is set to 1, crawling will finish in about 30 seconds. If you set it to 10, all 10 posts are crawled at the same time, so crawling will finish in about 3 seconds. Actual times depend on the response time of the target site.
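The arithmetic in the example above can be generalized, assuming every post takes the same time: with n URLs, t seconds per post, and a parallel count of k, the posts are crawled in ⌈n / k⌉ rounds, so the total time is roughly ⌈n / k⌉ × t. This is a simplified model rather than a measurement of the plugin, and the function name is hypothetical.

```python
import math


def estimated_crawl_seconds(url_count: int, seconds_per_post: float,
                            parallel: int = 1) -> float:
    """Rough crawl-time estimate (simplified model, not plugin code).

    Assumes every post takes the same time and `parallel` posts are
    crawled at once, giving ceil(url_count / parallel) rounds.
    """
    rounds = math.ceil(url_count / max(parallel, 1))
    return rounds * seconds_per_post


print(estimated_crawl_seconds(10, 3, parallel=1))   # 30
print(estimated_crawl_seconds(10, 3, parallel=10))  # 3
print(estimated_crawl_seconds(10, 3, parallel=4))   # 9
```

The last line shows why the gain is not always linear: with a parallel count of 4, the 10 URLs still need three rounds (4 + 4 + 2), so the estimate is about 9 seconds rather than 7.5.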

Clear entered URLs after I click submit button (optional)

When this is checked, the URLs you have entered into the inputs will be cleared after you click one of the submit buttons.

After you configure the inputs, click one of the submit buttons. The submit buttons are explained as follows.

Crawl now

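The URLs you entered start being crawled as soon as you click this button. The Maximum parallel crawling count setting determines how many of them are crawled at the same time.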

Important

Your browser needs to stay open until all URLs have been crawled.

Add to the database

The URLs you entered will be added to the database. They will be crawled using your scheduling settings (See: Scheduling Tab).

The following video shows how this tool is used.

10.1.1. URLs for Manual Crawling Section

After you click the Crawl now button, you can follow the status of the URLs in this section. All of the URLs are shown here. After a URL is crawled, the link to the saved post on your site is displayed. You can also see how many posts are in the queue and how many of them are being saved, and you can pause or continue crawling.

You can click the pause button to pause the crawling of the URLs in the queue. Note that URLs that are already in progress will not be stopped; the pause button only affects the URLs that are waiting in the queue. After you click the pause button, it turns into a continue button. Click the continue button to resume crawling the URLs in the queue.
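The pause/continue behaviour described above can be modelled with a small queue. In this hypothetical sketch, pausing only prevents new URLs from starting; URLs already in progress still finish. The class and method names are illustrative and do not come from the plugin (the method is called `resume` because `continue` is a reserved word in Python).

```python
from collections import deque


class CrawlQueue:
    """Hypothetical model of pause/continue: pausing stops new URLs from
    starting, but URLs already in progress are allowed to finish."""

    def __init__(self, urls, parallel=1):
        self.queued = deque(urls)
        self.in_progress = []
        self.done = []
        self.parallel = parallel
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):  # the "continue" button
        self.paused = False

    def start_next(self):
        """Move URLs from the queue into progress, unless paused."""
        while (not self.paused and self.queued
               and len(self.in_progress) < self.parallel):
            self.in_progress.append(self.queued.popleft())

    def finish(self, url):
        """Mark an in-progress URL as crawled (this happens even while paused)."""
        self.in_progress.remove(url)
        self.done.append(url)


q = CrawlQueue(["url-1", "url-2", "url-3", "url-4"], parallel=2)
q.start_next()      # url-1 and url-2 start
q.pause()
q.finish("url-1")   # an in-progress URL still completes while paused
q.start_next()      # nothing new starts because the queue is paused
print(q.in_progress, list(q.queued))  # ['url-2'] ['url-3', 'url-4']
q.resume()
q.start_next()
print(q.in_progress, list(q.queued))  # ['url-2', 'url-3'] ['url-4']
```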

The URLs are shown as rows. A URL row displays the following information about the URL:

Status
The status of the URL: already crawled, being crawled, or not crawled.
Site
The name of the site whose settings are used to crawl the URL.
Category
The category of your site into which the post will be saved.
Image
Featured image of the post. If a featured image is not provided when adding the URL, then no image will be shown here.
Post URL
URL of the post. You can click this URL to go to the target page.

Each URL row shown in this section has the following buttons:

A button that removes the URL row from this section. It does not delete the post; it only removes the URL row.
A button that recrawls the URL.

After a URL is crawled, the results are shown just below the URL row. Any errors are shown in this part as well. You can also see the link to the saved post, so you can click it and check whether the post was saved as you want.

Tip

Click the row that shows a URL to toggle the visibility of that URL’s results.

Tip

The URLs that are currently being saved are shown with a green background.

At the top of the URL rows, there are the following links:

Show all results
Clicking this link displays the results of all URL rows.
Hide all results
Clicking this link hides the results of all URL rows.
Remove all
Clicking this link removes all of the URLs from the list. The posts will not be deleted; only the rows showing the URLs will be removed.

At the top of this section, the URLs that are currently being crawled are listed. You can also see how many of the total URLs have already been crawled.
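The information in a URL row, together with the progress summary at the top of this section, can be modelled with a small data structure. The sketch below is hypothetical: the class, field, and function names are illustrative and do not come from the plugin.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    NOT_CRAWLED = "not crawled"
    BEING_CRAWLED = "being crawled"
    CRAWLED = "already crawled"


@dataclass
class UrlRow:
    """Hypothetical model of one URL row; field names are illustrative."""
    status: Status
    site: str
    category: str
    post_url: str
    image_url: Optional[str] = None  # None when no featured image was given


def progress(rows: list[UrlRow]) -> tuple[list[str], str]:
    """Return the URLs currently being crawled and a 'crawled/total' summary."""
    in_progress = [r.post_url for r in rows if r.status is Status.BEING_CRAWLED]
    crawled = sum(1 for r in rows if r.status is Status.CRAWLED)
    return in_progress, f"{crawled}/{len(rows)}"


rows = [
    UrlRow(Status.CRAWLED, "My Site", "News", "https://example.com/a"),
    UrlRow(Status.BEING_CRAWLED, "My Site", "News", "https://example.com/b",
           "https://example.com/b.jpg"),
    UrlRow(Status.NOT_CRAWLED, "My Site", "News", "https://example.com/c"),
]
print(progress(rows))  # (['https://example.com/b'], '1/3')
```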