Zoom, Semantic Search Engine - Online Reference Manual
We recommend that you organize the files of your main folder into sub-folders, for the following reasons:
- Current Windows versions cannot deal correctly with a large number of files within a single folder (considerable fall-off in performance, clogging of file management tools, etc.)
- You can use the names of the sub-folders of a main folder to carry additional information (for instance, by grouping together the texts by source, by geographical area, by type of population, etc.) that can be reused when collecting statistics on your files (see Indexing, corresponding chapter).
Note that the Robot supplied with Zoom automatically builds a tree structure of sub-folders (in which Web pages are stored) so as to offer you the best possible performance for indexing the folders and consulting results. Structuring data in folders and sub-folders has only advantages, and no disadvantages, if you use Tropes Zoom for desktop search.
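As a rough illustration of the restructuring recommended above, the following Python sketch moves the files of a flat main folder into numbered sub-folders. It is not part of Zoom: the function name `split_into_subfolders`, the `part_0001` naming scheme and the default of 1,000 files per sub-folder are assumptions chosen for the example, and it is meant to be run on a folder that has not been indexed yet.

```python
from pathlib import Path
import shutil


def split_into_subfolders(main_folder, max_files=1000, prefix="part"):
    """Move the files sitting directly in main_folder into numbered sub-folders.

    Hypothetical helper (not a Zoom feature). Returns the sub-folders used.
    """
    main = Path(main_folder)
    # Only regular files at the top level are moved; existing sub-folders are left alone.
    files = sorted(p for p in main.iterdir() if p.is_file())
    used = []
    for start in range(0, len(files), max_files):
        # part_0001, part_0002, ... (naming scheme is an assumption)
        target = main / f"{prefix}_{start // max_files + 1:04d}"
        target.mkdir(exist_ok=True)
        for f in files[start:start + max_files]:
            shutil.move(str(f), str(target / f.name))
        used.append(target)
    return used
```

In practice you would probably group files by a meaningful criterion (source, geographical area, type of population) rather than by count, so that the sub-folder names carry information you can reuse when collecting statistics.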
To group documents together inside a single folder or, on the contrary, to disperse your documents into several sub-folders, while retaining the coherence of the search index and taking advantage of the incremental-build performance of the Zoom search engine, you must:
Since the semantic information collected in the course of the indexing is stored inside a small file ([.IDT]) associated with each analyzed document, you do not have to make a “global” build of the index to merge or disperse folders, provided that you have chosen to enable incremental build (see Indexing parameters).
These remarks matter only if you are processing huge folders (more than 10,000 documents) and using incremental build, which lets you rebuild a search index very quickly without re-analyzing each file.
Six files, specific to the software, are generated during the indexing of a folder:
| Name of the file | Function | Description |
| --- | --- | --- |
| BASEDOC.MFT | Global index | Information Retrieval index |
| BASEDOC.MIT | Global index | Information Retrieval index |
| BASEDOC.MWL | Global index | Information Retrieval index |
| BASEDOC.MVI | Version | Information on the search index version |
| BASEDOC.SCN | Scenario | Scenario used when indexing |
| “Filename.IDT” | File index | List of the equivalent classes of a text; these files remain on the hard drive only if you activate Incremental Build |
Do not modify these files: they are managed automatically by the software. However, you can delete them from your folders if you wish to erase all trace of the documentary index.
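If you do want to erase the documentary index, the deletion can be scripted. The sketch below is an illustration only, not a tool shipped with Zoom: it assumes that the index files are exactly the ones named in the table above (the five `BASEDOC.*` files plus any file whose name ends in `.IDT`), and the function names `find_index_files` and `erase_index` are hypothetical. By default it only lists the files, so that you can review them before deleting anything.

```python
from pathlib import Path

# Global index files named in the table above.
GLOBAL_INDEX_FILES = {
    "BASEDOC.MFT", "BASEDOC.MIT", "BASEDOC.MWL", "BASEDOC.MVI", "BASEDOC.SCN",
}


def find_index_files(folder):
    """Return every index file found under folder, sub-folders included."""
    found = []
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue
        name = path.name.upper()
        # Assumption: per-document index files are recognised by their .IDT suffix.
        if name in GLOBAL_INDEX_FILES or name.endswith(".IDT"):
            found.append(path)
    return sorted(found)


def erase_index(folder, dry_run=True):
    """Delete the index files; with dry_run=True, only report them."""
    files = find_index_files(folder)
    if not dry_run:
        for path in files:
            path.unlink()
    return files
```

Calling `erase_index(folder)` returns the list of files that would be removed; calling `erase_index(folder, dry_run=False)` actually deletes them. The documents themselves are never touched.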
The use of numerous files is essential for performance reasons, particularly because current Windows versions cannot deal correctly with files of more than 2 GB, while Zoom needs to store search indexes for folders that can contain more than one million documents.
When you are indexing, the software follows these two stages:
In the course of “incremental” indexing, stage 1 is carried out only when necessary.
Stage 2 is always carried out. With the incremental option, it merges the existing file indexes, which reduces the indexing time.
Reminder: to carry out incremental build, you must check the [Enable incremental build] box when indexing the base.
Copyright Acetic and Semantic Knowledge, all rights reserved
www.semantic-knowledge.com