Yes, the following preferences can be customized for the crawler. Log in to SearchBlox Admin; when you use an HTTP-based collection, the parameters can be found under Collections > Settings in the Admin tab.
The following parameters can be set (an illustrative sketch of how several of them affect a crawl follows the list):
* User Agent Name
* Spider Delay (in milliseconds)
* Referrer URL
* Spider Depth
* Robots
* Redirects
* HTTP Basic Authentication information (user/password)
* Form Authentication (Form URL, Name/Value, Form Action)
* HTTP Proxy Server Settings
* Boosting
* Removal of Duplicates
* Stemming
* Spelling Suggestions
* Enable Logging
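
The minimal sketch below is not SearchBlox code and does not use its API; it only illustrates, with Python's standard `requests` and `urllib.robotparser` libraries, what several of the connection-level settings above typically govern when a single page is fetched: User Agent Name, Spider Delay, Referrer URL, Robots, Redirects, HTTP Basic Authentication, and the HTTP proxy. All URLs, credentials, and values shown are hypothetical placeholders.

```python
# Illustrative sketch only -- not SearchBlox code or its API.
# Shows how common crawler settings map onto a single HTTP fetch.
import time
from typing import Optional
from urllib import robotparser

import requests

USER_AGENT = "MyCrawler/1.0"                          # "User Agent Name" setting
SPIDER_DELAY_MS = 1000                                # "Spider Delay (in milliseconds)" setting
REFERRER_URL = "https://example.com/"                 # "Referrer URL" setting
FOLLOW_REDIRECTS = True                               # "Redirects" setting
BASIC_AUTH = ("user", "password")                     # "HTTP Basic Authentication" setting
PROXIES = {"http": "http://proxy.example.com:8080"}   # "HTTP Proxy Server Settings"


def allowed_by_robots(url: str) -> bool:
    """Honor robots.txt, as the "Robots" setting does when enabled."""
    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # hypothetical site
    rp.read()
    return rp.can_fetch(USER_AGENT, url)


def fetch(url: str) -> Optional[requests.Response]:
    """Fetch one URL while respecting the settings defined above."""
    if not allowed_by_robots(url):
        return None
    time.sleep(SPIDER_DELAY_MS / 1000.0)  # spider delay between requests
    return requests.get(
        url,
        headers={"User-Agent": USER_AGENT, "Referer": REFERRER_URL},
        allow_redirects=FOLLOW_REDIRECTS,
        auth=BASIC_AUTH,
        proxies=PROXIES,
        timeout=30,
    )


if __name__ == "__main__":
    response = fetch("https://example.com/somepage.html")
    if response is not None:
        print(response.status_code)
```

In SearchBlox these values are set in the Admin UI for the collection rather than in code; the sketch is only meant to clarify what each setting controls.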
Related Discussions:
Can I restrict the crawler from indexing certain folder/url paths?
Can the SearchBlox crawler access documents over HTTPS?
How can I see detailed spider/crawler activity?
Can I set a time delay on the crawler/spider between making requests to a website?