Robot Exclusion Policies let you define which search engine spiders/bots are allowed to crawl your site.
The generated robots.txt file blocks everything by default. The following default policy is appended to the end of every robots.txt file, so if no entries are created in Robot Exclusions, all spiders/bots will be blocked:
- User-agent: *
- Disallow: /
Note: You must explicitly allow bots.
Note: As part of a site build project, TownNews.com creates two initial robot exclusion policies that allow bots from the major search engines to crawl your site.
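For example, a policy that grants a single search engine full access renders in robots.txt as an entry like the following (the user-agent shown, Googlebot, is just an illustration; in a raw robots.txt file, an empty Disallow line grants full access):
- User-agent: Googlebot
- Disallow: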
The default can be overridden, however, by creating an entry for "User-agent: *" and adding a URL that does not exist on your site (Robot Exclusions does not accept a blank URL). Example:
- User-agent: *
- Disallow: /non_existent_directory
Note: If you specify a URL (e.g. /calendar), all URLs beneath that URL will also be blocked.
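For example, a policy that disallows /calendar blocks /calendar itself and everything beneath it, such as /calendar/events/ (the sub-paths here are illustrative):
- User-agent: *
- Disallow: /calendar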
To access Robot Exclusions, from the dropdown in the upper left, choose Settings / Robot Exclusions. (Not sure how to access BLOX Applications? Click here.)
This will open the Robot Exclusion Policies window. There are four tasks you can perform: Add, Edit Policies As Text, Edit, and Remove. Two of these, Edit Policies As Text and Edit, let you modify existing policies.
To create a new robot exclusion policy, select the Add button in the upper left corner of the Robot Exclusion Policies window. This will open the Edit Robot Exclusion Policy window.
Edit Policies As Text
This opens a Text Editing window so you can enter the policy manually. This makes it easy to import (or export) an existing robots.txt file into the admin via the copy and paste method, and it also serves as an interface for those individuals who understand the standard robots.txt format.
Edit
Selecting a policy and choosing Edit opens the Edit Robot Exclusion Policy window, which provides a graphical interface for editing existing policies.
1 Robots: This window allows you to manage Robots and Site Access. Robots are the user-agent strings of the robots to which this policy applies. Site Access is the level of access granted to robots matching this policy.
2 Add: To add all common search engines as robots for a policy, select the Common Search Engines button. To specify user-agents (the actual names of the robots), select Specific User-Agents, then select Add to create a new user-agent for this policy. Select Clear All to remove all of the user-agents from the list, or select Remove to remove the user-agent that is currently selected.
3 Site Access: In the Site Access area, select Disallow Access To Site to prevent the user-agents from accessing any section of the site; select Allow Full Access To Site to grant full access; or select Allow Full Access Except For Specific URLs to grant access to every section of the site except those listed in the menu below.
4 Add: To add URLs that should not be accessed by the user-agents, select Add and enter the URL. Make sure it begins with a "/"; for example, a URL might look like /content/tncms/ads/. Select Clear All to remove all of the URLs in the list, or Remove to remove only the selected URLs from the list.
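Assuming, for illustration, a policy that grants a single user-agent (Googlebot here, as an example) full access except for the /content/tncms/ads/ URL mentioned above, the resulting robots.txt entry would look like:
- User-agent: Googlebot
- Disallow: /content/tncms/ads/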
5 Crawl Delay: This is the time, in seconds, between page requests from the user-agents included in this policy. Crawl Delay is used to slow down crawlers and reduce server load. Use only positive whole numbers between 1 and 10; values higher than 10 are not recommended. Some crawlers ignore this value.
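For example, a policy with a 10-second crawl delay adds a Crawl-delay line to its robots.txt entry (Crawl-delay is a nonstandard directive, which is why some crawlers ignore it):
- User-agent: *
- Crawl-delay: 10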
6 OK: When you are done adding or editing the Robot Exclusion Policy, select OK.
To delete a Robot Exclusion Policy, highlight the policy you want to remove and click Remove. Select multiple policies by holding down the Ctrl key.