To prevent duplicate documents from getting into the search collection, enable Remove Duplicates under the Collections > Settings tab.
SearchBlox treats documents as duplicates only when their content is 100% identical. The content compared includes the title, keywords, description, and other meta fields, as well as the content of the page.
When this feature is enabled, only one document is indexed if two or more documents have the same content.
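The exact-match rule described above can be illustrated with a minimal sketch. This is not SearchBlox's implementation. The field names, the use of a SHA-256 fingerprint, the exclusion of the URL from the comparison, and the choice to keep the first document encountered are all assumptions made for illustration.

```python
import hashlib

# Hypothetical field names for illustration; SearchBlox's internal
# document model may differ.
COMPARED_FIELDS = ("title", "keywords", "description", "meta", "content")


def fingerprint(doc):
    """Hash every compared field so only 100% identical documents match."""
    digest = hashlib.sha256()
    for name in COMPARED_FIELDS:
        value = doc.get(name, "")
        if isinstance(value, dict):
            # Sort meta fields so key order does not affect the hash.
            value = "|".join(f"{k}={v}" for k, v in sorted(value.items()))
        digest.update(name.encode("utf-8"))
        digest.update(b"\x00")  # separator between field name and value
        digest.update(str(value).encode("utf-8"))
        digest.update(b"\x00")  # separator between fields
    return digest.hexdigest()


def remove_duplicates(docs):
    """Keep one document per fingerprint (assumption: the first one seen)."""
    seen = set()
    unique = []
    for doc in docs:
        fp = fingerprint(doc)
        if fp not in seen:
            seen.add(fp)
            unique.append(doc)
    return unique


docs = [
    {"url": "https://example.com/a", "title": "Pricing",
     "keywords": "plans", "description": "Our plans",
     "meta": {"author": "web"}, "content": "Plans start at $10."},
    # Same content served from a different URL: a duplicate in this sketch.
    {"url": "https://example.com/a?ref=home", "title": "Pricing",
     "keywords": "plans", "description": "Our plans",
     "meta": {"author": "web"}, "content": "Plans start at $10."},
    # Differs by a single character in the title: not a duplicate.
    {"url": "https://example.com/b", "title": "Pricing!",
     "keywords": "plans", "description": "Our plans",
     "meta": {"author": "web"}, "content": "Plans start at $10."},
]

indexed = remove_duplicates(docs)
```

In this sketch, `indexed` holds two documents: the first and the third. Because the comparison is an exact match, any difference in a compared field, even a single character, means both documents are kept.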
To learn more about the Remove Duplicates setting for HTTP collections, see: Remove Duplicates in HTTP Collection