Textractor Enterprise 2e
Our expertise is embodied in the Textractor Information Extraction toolkit, an environment for the easy and rapid development of text classification and information extraction systems. Textractor learns superficial text interpretation from examples. A Textractor implementation is a custom-tailored blend of text classification, string extraction, and section labeling.
Text classification
Text classification means labeling text with categories from a predefined set. These category codes provide a meta-representation of the meaning of the text, mapping the wide variety of natural language onto a consistent controlled vocabulary that makes the text easier to access and manage later on. Texts are classified on the basis of a large, automatically derived set of features. Unlike many keyword-based approaches, this prevents the system from erroneously triggering on out-of-context occurrences of meaningful words. Moreover, because the relevant features are derived automatically from the training examples, the amount of manual knowledge acquisition is minimized.
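The idea of learning a feature-based classifier from labeled examples, instead of hand-writing keyword rules, can be illustrated with a minimal Python sketch. Textractor's own learning algorithm and feature set are not described here, so this sketch swaps in multinomial Naive Bayes over automatically derived word unigram and bigram features; the names `features` and `NaiveBayesClassifier` and the training texts are hypothetical. Bigrams give the classifier some local context, so evidence can come from word pairs rather than from isolated keywords alone.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Derive features automatically: lowercase word unigrams plus adjacent-word bigrams."""
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in, not Textractor's learner)."""

    def fit(self, examples):
        # examples: list of (text, category) pairs used as training data
        self.doc_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, category in examples:
            self.doc_counts[category] += 1
            feats = features(text)
            self.feature_counts[category].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(fc.values()) for c, fc in self.feature_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for category in self.doc_counts:
            # Start from the log prior of the category.
            score = math.log(self.doc_counts[category] / n_docs)
            for f in features(text):
                if f in self.vocab:  # features never seen in training carry no evidence
                    count = self.feature_counts[category][f]
                    score += math.log((count + 1) / (self.totals[category] + v))
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical training examples for illustration only.
training = [
    ("The quarterly revenue grew by ten percent", "finance"),
    ("Shares fell after the earnings report", "finance"),
    ("The striker scored twice in the final", "sports"),
    ("The team won the championship match", "sports"),
]
clf = NaiveBayesClassifier().fit(training)
print(clf.predict("earnings and revenue beat forecasts"))  # prints: finance
```

A production system would use far more training data and a richer feature set; the point here is only that the features come from the examples rather than from a manually curated keyword list.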
What has text classification meant to our clients?
Tangram
“Text classification means labeling text with a predefined set of categories. These codes provide a meta-representation of the meaning of the text, mapping the large variety of language onto.”
Buzzcapture
“Consistent controlled vocabulary, that allows easier access to and management of the text later on. Texts are classified on the basis of a large automatically derived set of features.”
Textkernel Textractor Toolkit Components
- Textractor Server: Converts texts from a flat, unstructured format into a structured XML format. Driven by an easily manageable configuration, it coordinates a set of specialized classification and extraction components.
- Classifier: The component responsible for classifying coded representations of texts or parts of texts.
- Extractor: The component responsible for string extraction.
- Batch Processor: Manages batch processing and batch planning.
- Annotation Manager: A suite of tools for semi-automated labeling of training data.
- Training Toolkit: A suite of tools for training and tuning the system from example data.
- System Management: Tools to manage the servers and to maintain and tune the knowledge structures involved.
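The Server's role described in the list above (turning flat text into XML by running whichever components the configuration names) can be sketched as a small Python pipeline. Everything in it is hypothetical: the component registry `COMPONENTS`, the `{"components": [...]}` configuration shape, the element names in the output XML, and the two toy components, which use fixed regular expressions where real Textractor components would be trained from examples. Only the standard-library `xml.etree.ElementTree` module is assumed.

```python
import re
import xml.etree.ElementTree as ET


# Hypothetical stand-in components; real Textractor components learn from examples.
def classify(text):
    """Toy classifier: returns one category label for the whole text."""
    return "job_posting" if re.search(r"\b(vacancy|hiring)\b", text, re.I) else "other"


def extract_emails(text):
    """Toy extractor: returns the e-mail address strings found in the text."""
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)


# Hypothetical registry: the configuration refers to components by these names.
COMPONENTS = {"classifier": classify, "email_extractor": extract_emails}


def process(text, config):
    """Run the configured components over flat text and return structured XML."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = text
    for name in config["components"]:
        result = COMPONENTS[name](text)
        if isinstance(result, str):
            # A classifier yields a single label for the document.
            ET.SubElement(root, "category", source=name).text = result
        else:
            # An extractor yields zero or more extracted strings.
            for value in result:
                ET.SubElement(root, "field", source=name).text = value
    return ET.tostring(root, encoding="unicode")


config = {"components": ["classifier", "email_extractor"]}
print(process("We are hiring! Send your CV to jobs@example.com", config))
```

Keeping the components in a registry keyed by name means the configuration alone decides which components run and in what order, so adding or removing a component requires no change to `process` itself.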

