3.61. Deleting a post when it is not found when recrawling
You may want to remove posts that no longer exist on the target website. You can achieve this by creating a Filter.
Important
This works only if the target website returns a 404 response for pages that are not found. The HTTP status code a website returns is entirely up to that website, so it might not return a 404 response. For example, the website might redirect a nonexistent page to another page. In that case, you can add a condition command that checks whether an Element’s Value Exists or Does not exist to decide whether the post is found. You might need to improvise a condition by using the available Condition Commands.
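The reasoning in this note can be sketched in Python. This is only an illustration of the decision, not the plugin's code: `is_post_missing` and its parameters are hypothetical names, and the assumption that every existing post has a non-empty title element is made up for the example.

```python
from typing import Optional


def is_post_missing(status_code: int, title_text: Optional[str]) -> bool:
    """Guess whether the target post no longer exists (hypothetical helper).

    status_code: HTTP status code returned for the post URL.
    title_text:  text of an element that every existing post is assumed
                 to contain, or None if the element was not found.
    """
    # Case 1: the website reports the missing page with a 404 response.
    if status_code == 404:
        return True
    # Case 2: the website answered with another status (for example after
    # redirecting to a different page), so fall back to checking whether
    # the expected element's value exists.
    return title_text is None or title_text.strip() == ""
```

The second case corresponds to an "Element's Value Does not exist" style condition: it is only as reliable as the assumption that the chosen element appears on every real post page.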
Go to Site Settings Page and activate Main Tab
Check Active for recrawling? setting’s checkbox to enable recrawling
Activate Post Tab and go to Filters Section
Add a new Filter to Post request filters setting. Make sure:
- Its Condition operator is set as “and”
- Its Event of the if part and Event of the then part are set as “After post request is made”
- It is enabled (See: Enable/disable filter)
Click the Add condition command button of the filter and configure the new condition command as shown below. This checks whether the post is being recrawled.
Subject: Crawling
Property: Value
Command: Is recrawling
Click the Add condition command button of the filter again and configure the new condition command as shown below. This checks whether the target site returned a 404 (not found) response.
Subject: Request
Property: HTTP status code
Command: Equal to
Value: 404
Click the Add action command button of the filter and configure the new action command as shown below. This command deletes the post.
Subject: Crawling
Property: Value
Command: Stop and delete the post
Reason: Optional. You can enter your reason here. For example: Target post is not found when recrawling
Delete URL?: Optional. See Stop and delete the post.
Optionally, you can enter a Filter title/description such as “Delete the post if it is not found when recrawling”.
Save the site settings (See: Saving The Settings)
The following figure shows the filter that is described above.
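As a rough mental model, the filter configured in this section behaves like the Python sketch below. It is not how the plugin is implemented; `CrawlContext`, `should_delete_post`, and `run_filter` are hypothetical names used only to show how the “and” condition operator combines the two condition commands before the action command runs.

```python
from dataclasses import dataclass


@dataclass
class CrawlContext:
    """Hypothetical snapshot of the state after a post request is made."""
    is_recrawling: bool
    http_status_code: int


def should_delete_post(ctx: CrawlContext) -> bool:
    # Condition operator "and": every condition command must be true.
    conditions = [
        ctx.is_recrawling,            # Subject: Crawling, Command: Is recrawling
        ctx.http_status_code == 404,  # Subject: Request, HTTP status code Equal to 404
    ]
    return all(conditions)


def run_filter(ctx: CrawlContext,
               reason: str = "Target post is not found when recrawling") -> str:
    # "Then" part: the action command runs only if the "if" part holds.
    if should_delete_post(ctx):
        return f"stop and delete the post: {reason}"
    return "continue"
```

With this model, a first-time crawl that hits a 404 is left alone, and a recrawl that gets a 200 response continues normally; only a recrawl that receives a 404 triggers the delete action.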