Robot, Web Mining Spider - Online Reference Manual
info@semantic-knowledge.com
To access these options, click the [Advanced] button under Scan settings on the [Settings] tabsheet in the right frame:

Use the Advanced scan settings parameters to set:
Use the Maximum concurrent downloads parameter to set the number of HTML page downloads running simultaneously (a value between 4 and 20 is a good range, depending on your ISP speed and your processor). Please note that a high concurrent downloads value may cause server bottlenecks and/or a shortage of computer resources.
Some computers need a Proxy server to access the Internet. Please contact your network administrator to set these options.
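As a rough illustration of what a concurrent downloads limit does, the following Python sketch caps the number of simultaneous fetches with a thread pool. It is not the product's implementation: `crawl_concurrently`, `SimulatedFetcher` and the example URLs are hypothetical names, and the fetcher only sleeps instead of downloading so that the example runs offline.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_concurrently(urls, fetch, max_concurrent=4):
    """Call fetch(url) for every URL, with at most max_concurrent calls in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))


class SimulatedFetcher:
    """Hypothetical stand-in for a real download; records the peak concurrency."""

    def __init__(self, delay=0.01):
        self.delay = delay
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, url):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(self.delay)  # pretend to wait on the network
        with self._lock:
            self._active -= 1
        return (url, 200)


if __name__ == "__main__":
    fetcher = SimulatedFetcher()
    pages = [f"http://example.com/page{i}" for i in range(20)]
    results = crawl_concurrently(pages, fetcher, max_concurrent=4)
    print(len(results), "pages fetched, peak concurrency:", fetcher.peak)
```

Raising `max_concurrent` shortens the total run only while the network and the remote server can keep up; beyond that point the extra workers mostly consume local resources and load the server.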

If necessary, use the [Proxy] tabsheet to set the proxy options:
If the Use proxy parameter is checked, you must set these options:
By using the [Policy] tabsheet parameters, you can decide whether or not to respect the "Robot exclusion standard":
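For comparison, proxy settings in a script usually come down to a host, a port and optional credentials. The Python sketch below illustrates this with the standard library only; it does not describe this product's dialog. The helper names `proxy_url` and `build_proxy_opener`, the host `proxy.example.com` and the credentials are all hypothetical.

```python
import urllib.request
from urllib.parse import quote


def proxy_url(host, port, user=None, password=None):
    """Build an http:// proxy URL, percent-encoding any credentials."""
    credentials = ""
    if user is not None:
        credentials = f"{quote(user, safe='')}:{quote(password or '', safe='')}@"
    return f"http://{credentials}{host}:{port}"


def build_proxy_opener(host, port, user=None, password=None):
    """Return a urllib opener that routes HTTP and HTTPS requests via the proxy."""
    url = proxy_url(host, port, user, password)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Hypothetical values; the real ones come from your network administrator.
    opener = build_proxy_opener("proxy.example.com", 8080, "alice", "p@ss")
    print(proxy_url("proxy.example.com", 8080, "alice", "p@ss"))
```

Requests made through the returned opener with `opener.open(...)` would go through the proxy; nothing is contacted until such a call is made.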

Important:
1 - The "Robot exclusion standard" is set by webmasters to limit the directories and files a search engine is allowed to "harvest" (copy and index). See http://www.robotstxt.org for more information.
2 - If you choose to crawl every Web page, you do so under your own responsibility.
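To make the effect of this policy concrete, here is a minimal Python sketch of how a crawler might filter URLs against a robots.txt file. It uses the standard `urllib.robotparser` module and is not this product's implementation; the function `allowed_urls`, the `respect_robots` flag, the user agent `ExampleBot` and the sample rules are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a webmaster might publish it.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls, respect_robots=True):
    """Return the URLs the crawler may fetch under the given policy."""
    if not respect_robots:
        # Policy switched off: every URL is crawled, at the user's own responsibility.
        return list(urls)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/index.html",
        "http://example.com/private/data.html",
    ]
    print(allowed_urls(SAMPLE_ROBOTS_TXT, "ExampleBot", candidates))
    print(allowed_urls(SAMPLE_ROBOTS_TXT, "ExampleBot", candidates, respect_robots=False))
```

With the policy respected, only the page outside `/private/` is kept; with it disabled, both pages are returned.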
Copyright Acetic and Semantic Knowledge, all rights reserved
www.semantic-knowledge.com