Can I customize the settings for the SearchBlox HTTP crawler?

Yes. Log in to SearchBlox Admin and open the Settings tab under Collections; the crawler parameters appear there for HTTP-based collections.

The following parameters can be set:

* User Agent Name
* Spider Delay (in milliseconds)
* Referrer URL
* Spider Depth
* Follow Robots
* Follow Sitemaps
* Redirects
* HTTP Basic Authentication information (user/password)
* Form Authentication (Form URL, Name/Value, Form Action)
* HTTP Proxy Server Settings
* Boosting
* Removal of Duplicates
* Stemming
* Spelling Suggestions
* Enable Logging
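
To make a few of these settings concrete, here is a minimal sketch in Python of how a generic crawler might apply a user agent, spider delay, spider depth, and robots.txt handling. It is not SearchBlox code and does not use any SearchBlox API: the `CrawlerSettings` class, its field names, the `plan_crawl` function, and the example URLs are all hypothetical. It also assumes that depth 0 means the start page, which may differ from how SearchBlox counts depth. The crawl runs over an in-memory link map, so no network access is needed.

```python
from collections import deque
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlerSettings:
    # Hypothetical field names mirroring a few settings from the list above.
    user_agent: str = "ExampleBot"
    spider_delay_ms: int = 500   # pause between consecutive requests
    spider_depth: int = 2        # assumption: depth 0 is the start page
    follow_robots: bool = True


def plan_crawl(start, links, robots_txt, settings):
    """Return (url, depth, earliest_start_ms) for each page that would be fetched."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    plan = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        # Skip pages that robots.txt disallows for this user agent.
        if settings.follow_robots and not robots.can_fetch(settings.user_agent, url):
            continue
        # Each fetched page is scheduled one delay interval after the previous one.
        plan.append((url, depth, len(plan) * settings.spider_delay_ms))
        # Only follow links while we are above the configured depth limit.
        if depth < settings.spider_depth:
            for nxt in links.get(url, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return plan


# A tiny in-memory "site": page URL -> links found on that page.
SITE = {
    "http://example.com/": ["http://example.com/docs", "http://example.com/private/a"],
    "http://example.com/docs": ["http://example.com/docs/deep"],
    "http://example.com/docs/deep": ["http://example.com/docs/deeper"],
}
ROBOTS = "User-agent: *\nDisallow: /private/\n"

plan = plan_crawl("http://example.com/", SITE, ROBOTS, CrawlerSettings(spider_depth=1))
for url, depth, start_ms in plan:
    print(f"{start_ms:>5} ms  depth={depth}  {url}")
```

In this sketch, lowering `spider_depth` shrinks the set of pages reached, raising `spider_delay_ms` spreads the requests out over time, and turning `follow_robots` off lets the crawl enter paths that robots.txt disallows.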

Related Discussions:

Can I restrict the crawler from indexing certain folder/url paths?

Can the SearchBlox crawler access documents over HTTPS?

How can I see detailed spider/crawler activity?

Can I set a time delay on the crawler/spider between making requests to a website?

To learn more about SearchBlox, please visit our developer documentation site.
