For a Web Collection, the crawler works as follows:
- The crawler first checks robots.txt. If a robots.txt file is available for the root URL, the crawler starts from the root URL and crawls only the URLs permitted by the rules in that file.
- Next, the crawler checks the other conditions set in the collection settings, such as remove duplicates, spider depth, and redirects. URLs that match these criteria are indexed.
- Robots meta tags, if present on an HTML page, are also taken into account during crawling.
- Once the first page (the root URL) has been crawled, the process continues recursively with the links found on each page, up to the spider depth set in the collection settings.
- When a page is indexed, its content is stored in the content field, except for any elements enclosed between the stopindex and startindex tags.
- Meta fields such as title, description, keywords, and URL are indexed as SearchBlox fields. Because they are included in the context field, they can be searched directly. The content and these SearchBlox fields are used to generate the context for search.
- Other custom meta tags are also indexed. These fields appear in the JSON search response alongside the other SearchBlox fields and can be searched using fielded search and filters. They can also be added as facet filters or included in context search.
- Read: Fielded Search in SearchBlox
- Read: Custom Fields in Search
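The crawl order described in the steps above (robots.txt check, duplicate removal, and recursion up to the spider depth) can be illustrated with a small breadth-first sketch. It is not SearchBlox's implementation: the in-memory `SITE` link graph, the `crawl` function, and the `ExampleBot` user agent are hypothetical, and only Python's standard `urllib.robotparser` is used to evaluate the robots.txt rules. Depth is counted from the root URL at depth 0.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical site: each page URL maps to the links found on that page.
# A real crawler would fetch and parse HTML instead.
SITE = {
    "https://example.com/": ["/about", "/private/report", "/blog"],
    "https://example.com/about": ["/", "/team"],
    "https://example.com/blog": ["/blog/post-1", "/about"],
    "https://example.com/team": [],
    "https://example.com/blog/post-1": ["/blog/post-2"],
    "https://example.com/blog/post-2": [],
    "https://example.com/private/report": [],
}

# Hypothetical robots.txt for the root URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(root, spider_depth, robots_lines, user_agent="ExampleBot"):
    """Breadth-first crawl from root, honoring robots.txt and a depth limit."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    seen = {root}               # duplicate removal: each URL is queued once
    queue = deque([(root, 0)])  # (url, depth); the root URL is depth 0
    indexed = []
    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue            # disallowed by robots.txt: skip the page
        indexed.append(url)
        if depth >= spider_depth:
            continue            # do not follow links beyond the spider depth
        for href in SITE.get(url, []):
            link = urljoin(url, href)
            same_host = urlparse(link).netloc == urlparse(root).netloc
            if same_host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return indexed


if __name__ == "__main__":
    print(crawl("https://example.com/", 1, ROBOTS_TXT.splitlines()))
```

With `spider_depth=1`, this sketch returns the root URL plus `/about` and `/blog`; `/private/report` is skipped because the sample robots.txt disallows `/private/`.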
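Robots meta tags are read from the page itself rather than from robots.txt. The sketch below, using Python's standard `html.parser`, shows how the common `noindex`, `nofollow`, and `none` directives could be interpreted. The helper `robots_meta` is hypothetical, and the exact set of directives SearchBlox honors is an assumption here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                self.directives.add(token.strip().lower())


def robots_meta(html):
    """Hypothetical helper: return (may_index, may_follow) for one page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    d = parser.directives
    may_index = "noindex" not in d and "none" not in d
    may_follow = "nofollow" not in d and "none" not in d
    return may_index, may_follow


print(robots_meta('<head><meta name="robots" content="noindex, follow"></head>'))
```

A page without any robots meta tag is treated as indexable and followable in this sketch.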
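Excluding the region between the stopindex and startindex tags amounts to dropping that span before the page text is stored in the content field. This sketch assumes the tags are written as the HTML comments `<!--stopindex-->` and `<!--startindex-->`; confirm the exact syntax in the SearchBlox documentation. `strip_excluded` is a hypothetical helper.

```python
import re

# Assumed marker syntax (verify against SearchBlox docs):
# <!--stopindex--> ... <!--startindex-->
# The non-greedy .*? makes each stopindex pair with the nearest startindex.
EXCLUDED_REGION = re.compile(
    r"<!--\s*stopindex\s*-->.*?<!--\s*startindex\s*-->",
    re.IGNORECASE | re.DOTALL,
)


def strip_excluded(html):
    """Hypothetical helper: drop every stopindex...startindex region."""
    return EXCLUDED_REGION.sub("", html)


page = "<p>keep</p><!--stopindex--><nav>menu</nav><!--startindex--><p>also keep</p>"
print(strip_excluded(page))
```

In this version, a stopindex tag with no later startindex tag is left untouched rather than excluding the rest of the page.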
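The split between standard SearchBlox fields and custom meta tags can be sketched as follows. The helper `extract_fields` and the `department` and `product` tags are hypothetical; the assumption is that `title`, `description`, `keywords`, and the page URL map to standard fields, while any other named meta tag (other than `robots`) becomes a custom field.

```python
from html.parser import HTMLParser

# Assumed mapping: these meta names become standard fields.
STANDARD_META = {"description", "keywords"}


class MetaFieldParser(HTMLParser):
    """Collects the <title> text and all named <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name and name != "robots" and attrs.get("content") is not None:
                self.meta[name] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_fields(url, html):
    """Hypothetical helper: return (standard_fields, custom_fields)."""
    parser = MetaFieldParser()
    parser.feed(html)
    parser.close()
    standard = {"url": url, "title": parser.title.strip()}
    custom = {}
    for name, value in parser.meta.items():
        target = standard if name in STANDARD_META else custom
        target[name] = value
    return standard, custom
```

In this sketch, custom fields such as `department` are kept separate so they could be exposed for fielded search or as facet filters.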
Additional Pointers:
- In the JSON response, you can view both the SearchBlox fields and the meta fields. The page content itself does not appear in the response.
- Content to be indexed must be static. Content generated dynamically through JavaScript is not picked up by the indexer.
- Search results can be tuned for relevancy as described in this help article: https://developer.searchblox.com/docs/relevancy-tuning-in-search
- Review all the topics on that page to learn about boosting specific search results and relevancy tuning.
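To see why dynamically generated content is missed, consider an indexer that reads the HTML as served without executing JavaScript (the behavior assumed here). The hypothetical `static_text` helper below collects text from the raw markup with Python's standard `html.parser` and skips `<script>` and `<style>` bodies, so text that a script would insert at runtime never reaches it.

```python
from html.parser import HTMLParser


class StaticTextParser(HTMLParser):
    """Collects text nodes from raw HTML, skipping <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Hypothetical helper: text visible in the served HTML, no JS executed."""
    parser = StaticTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


html = ('<body><p>Shipping policy</p><div id="faq"></div>'
        '<script>document.getElementById("faq").textContent = '
        '"Returns within 30 days";</script></body>')
print(static_text(html))
```

Here `static_text(html)` returns only `Shipping policy`; the FAQ text exists only inside the script and would appear on the page only after a browser ran it.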
To learn more about Web Collections, read: Web Collection in SearchBlox