3.59. Deleting a post when it is not found when recrawling

You may want to remove posts that no longer exist on the target website. You can achieve this by creating a Filter.

Important

This works only if the target website returns a 404 response for pages that are not found. The HTTP status code a website returns depends entirely on that website, so it might not return a 404 response. For example, it might redirect a nonexistent page to another page. In that case, you can add a condition command that checks whether an Element’s Value Exists or Does not exist to decide whether the post was found. You might need to improvise a condition by using the available Condition Commands.
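As an illustration of the fallback described above, the following Python sketch shows the idea behind an "element exists" check for sites that redirect instead of returning a 404. It is not part of the plugin: the names `ElementFinder` and `post_exists`, and the `article` tag and `post-content` class, are hypothetical and would need to match an element that appears only on real post pages of the target site.

```python
from html.parser import HTMLParser


class ElementFinder(HTMLParser):
    """Records whether a tag carrying a given CSS class appears in the HTML."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.found = True


def post_exists(html, tag="article", css_class="post-content"):
    """Hypothetical check: treat the post as found only if the element exists."""
    finder = ElementFinder(tag, css_class)
    finder.feed(html)
    return finder.found
```

If the site redirects a missing post to, say, its home page, the home page would not contain the post element, so the check reports the post as not found even though the status code is not 404.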

  1. Go to the Site Settings Page and activate the Main Tab

  2. Check the checkbox of the Active for recrawling? setting to enable recrawling

  3. Activate the Post Tab and go to the Filters Section

  4. Add a new Filter to the Post request filters setting. Make sure:

  5. Click the Add condition command button of the filter and configure the new condition command as shown below. This checks whether the post is being recrawled.

    Subject: Crawling
    Property: Value
    Command: Is recrawling

  6. Click the Add condition command button of the filter again and configure the new condition command as shown below. This checks whether the target site returned a 404 (not found) response.

    Subject: Request
    Property: HTTP status code
    Command: Equal to
    Value: 404

  7. Click the Add action command button of the filter and configure the new action command as shown below. This command deletes the post.

    Subject: Crawling
    Property: Value
    Command: Stop and delete the post
    Reason: Optional. You can enter your reason here, for example: Target post is not found when recrawling
    Delete URL?: Optional. See Stop and delete the post.

  8. Optionally, you can enter a Filter title/description such as “Delete the post if it is not found when recrawling”.

  9. Save the site settings (See: Saving The Settings)

The following figure shows the filter that is described above.

Fig. 3.1 The filter that is described in this guide.
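
The logic of the filter configured in this guide can be summarized in a short Python sketch. This is only an illustration, not the plugin's code: `CrawlContext` and `should_delete_post` are hypothetical names, and the sketch assumes that the action runs only when both condition commands are satisfied.

```python
from dataclasses import dataclass


@dataclass
class CrawlContext:
    """Hypothetical stand-in for the state the filter's conditions inspect."""
    is_recrawling: bool      # Crawling > Value > Is recrawling
    http_status_code: int    # Request > HTTP status code


def should_delete_post(ctx: CrawlContext) -> bool:
    # Assumption: both conditions must hold before the action
    # "Stop and delete the post" is run.
    return ctx.is_recrawling and ctx.http_status_code == 404
```

Under this assumption, a post that returns a 404 during its first crawl is left alone, because the recrawling condition is not met; only a post that is being recrawled and returns a 404 is deleted.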