3.59. Deleting a post when it is not found when recrawling
You may want to remove posts that no longer exist on the target website. You can achieve this by creating a Filter.
This works only if the target website returns a 404 response for pages that are not found. The HTTP status code is chosen entirely by the website, so it might not return a 404 response. For example, it might redirect a non-existent page to another page. In that case, you can add a condition command that checks whether an Element’s Value Exists or Does not exist to decide whether the post was found. You might need to improvise a condition from the available Condition Commands.
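The decision described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the plugin's code: the `CrawlResponse` class, its fields, and the `post_is_missing` function are hypothetical names, and the sketch assumes you have picked an element that every existing post on the target site contains.

```python
from dataclasses import dataclass


@dataclass
class CrawlResponse:
    """Hypothetical summary of what the target site returned for a post URL."""
    status_code: int        # HTTP status code of the final response
    has_post_element: bool  # True if an element that every real post has was found


def post_is_missing(resp: CrawlResponse) -> bool:
    """Decide whether the post should be treated as not found."""
    # Case 1: the site reports the page as not found.
    if resp.status_code == 404:
        return True
    # Case 2: the site answered with some other page (for example after a
    # redirect), so the element that every real post contains is absent.
    return not resp.has_post_element


# A redirect to the home page typically ends in a 200 response without the post element.
print(post_is_missing(CrawlResponse(status_code=200, has_post_element=False)))
```

The second case corresponds to a condition command of the "Does not exist" kind; which element to check depends entirely on the target website.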
Check the Active for recrawling? setting’s checkbox to enable recrawling.
Click the Add condition command button of the filter and configure the new condition command like below. This checks if the post is being recrawled.
Click the Add condition command button of the filter again and configure the new condition command like below. This checks if the target site returned a 404, i.e. not found, response.
Click the Add action command button of the filter and configure the new action command like below. This command deletes the post.
Optionally, you can enter a Filter title/description such as “Delete the post if it is not found when recrawling”.
Save the site settings (See: Saving The Settings).
The following figure shows the filter that is described above.
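As a rough mental model, the filter built in the steps above behaves like the following Python sketch. It is not how the plugin is implemented; the context keys `is_recrawling` and `status_code` and the function `run_filter` are hypothetical, and the sketch assumes that the action runs only when every condition command of the filter passes.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]

# The two condition commands of the filter, in the order they were added.
conditions: List[Callable[[Context], bool]] = [
    lambda ctx: ctx["is_recrawling"] is True,  # the post is being recrawled
    lambda ctx: ctx["status_code"] == 404,     # the target site returned "not found"
]


def run_filter(ctx: Context) -> str:
    """Return "delete" if all conditions pass, otherwise "keep"."""
    if all(condition(ctx) for condition in conditions):
        return "delete"  # stands in for the action command that deletes the post
    return "keep"


print(run_filter({"is_recrawling": True, "status_code": 404}))   # delete
print(run_filter({"is_recrawling": False, "status_code": 404}))  # keep
```

The recrawl condition matters: without it, a post whose first crawl hits a 404 response would also match the filter.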