To keep the crawler within a single domain or website, enter a matching pattern in the Allow Paths box, found under Collections > Paths > Allow Paths.
For example, to index all URLs within edition.cnn.com and keep the spider from crawling outside that website, enter edition.cnn.com in the Allow Paths box.
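To illustrate the effect of such a pattern, here is a minimal Python sketch of how an allow list might filter discovered links. It is not SearchBlox's actual implementation: the function name `is_allowed`, the sample URLs, and the assumption that each allow path is treated as a regular expression searched anywhere in the URL are all hypothetical and used only for illustration.

```python
import re

def is_allowed(url, allow_patterns):
    """Return True if the URL matches at least one allow pattern.

    Assumption: each pattern is a regular expression that may match
    anywhere in the URL. The real crawler's matching rules may differ.
    """
    return any(re.search(pattern, url) for pattern in allow_patterns)

# The value entered in the Allow Paths box in the example above.
allow_paths = ["edition.cnn.com"]

# Hypothetical links a spider might discover while crawling.
candidates = [
    "https://edition.cnn.com/world",
    "https://edition.cnn.com/business/markets",
    "https://www.cnn.com/",
    "https://www.bbc.com/news",
]

# Only links that match an allow pattern would be followed and indexed.
in_scope = [url for url in candidates if is_allowed(url, allow_paths)]
print(in_scope)
```

Under these assumptions, only the two edition.cnn.com links remain in scope, so the spider would not follow links that lead to other websites.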
Related Topics:
What is the syntax used by SearchBlox for Allow and Disallow paths in HTTP Collection?