The "robots.txt" file is a plain-text file located at the root of a domain (e.g., http://www.mydomain.com/robots.txt). The file tells a robot (crawler) which files it may download and index. This convention is known as the Robots Exclusion Standard; it lets a site ask cooperating web crawlers and other web robots not to access all or part of a website that is otherwise publicly viewable. Search engines look for this file at the root of the domain and, if it is present, follow its rules. In the absence of this file, crawlers will attempt to index everything on the website.
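As a minimal sketch of how a cooperating crawler applies these rules, Python's standard-library urllib.robotparser can parse a robots.txt and answer, per URL, whether fetching is allowed. The rules, bot name, and URLs below are illustrative examples, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content: allow everything except /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler checks each URL against the rules before downloading it.
print(parser.can_fetch("MyBot", "http://www.mydomain.com/index.html"))      # allowed
print(parser.can_fetch("MyBot", "http://www.mydomain.com/private/a.html"))  # disallowed
```

Because `User-agent: *` applies to all crawlers, only URLs under /private/ are refused; every other path on the site remains fetchable.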
To learn more about how the robots.txt file affects an HTTP Collection, read: Using Robots.txt in SearchBlox