Class CompleteLabelFilter
java.lang.Object
org.carrot2.attrs.AttrComposite
org.carrot2.text.preprocessing.filter.ContextLabelFilter
org.carrot2.text.preprocessing.filter.CompleteLabelFilter
All Implemented Interfaces:
AcceptingVisitor
A filter that removes "incomplete" labels.
For example, in a collection of documents related to Data Mining, the phrase Conference on Data is incomplete in the sense that it most likely should be Conference on Data Mining, or even Conference on Data Mining in Large Databases. When truncated phrase removal is enabled, the algorithm tries to remove "incomplete" phrases like the former and keep only the more informative variants.
See this document, page 31, for a definition of a complete phrase.
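The notion of a complete phrase can be illustrated with a small sketch. It assumes a common formulation: a phrase is right-complete if at least one occurrence ends the text or its occurrences are followed by at least two different words, left-complete by the mirror-image rule, and complete if it is both. The definition in the referenced document may differ in detail, and this is not Carrot2's implementation. The class and method names (`CompletePhraseCheck`, `isLeftComplete`, `isRightComplete`, `isComplete`) are hypothetical, and the input is simplified to a single token sequence with no document or sentence boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the complete-phrase test (not Carrot2 code).
public class CompletePhraseCheck {

    // Start indices of every occurrence of `phrase` in `tokens`.
    static List<Integer> occurrences(List<String> tokens, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                starts.add(i);
            }
        }
        return starts;
    }

    // Right-complete: some occurrence ends the text, or the occurrences
    // are followed by at least two different words.
    static boolean isRightComplete(List<String> tokens, List<String> phrase) {
        boolean atEnd = false;
        Set<String> following = new HashSet<>();
        for (int start : occurrences(tokens, phrase)) {
            int next = start + phrase.size();
            if (next == tokens.size()) {
                atEnd = true;
            } else {
                following.add(tokens.get(next));
            }
        }
        return atEnd || following.size() > 1;
    }

    // Left-complete: the mirror image, looking at the preceding word.
    static boolean isLeftComplete(List<String> tokens, List<String> phrase) {
        boolean atStart = false;
        Set<String> preceding = new HashSet<>();
        for (int start : occurrences(tokens, phrase)) {
            if (start == 0) {
                atStart = true;
            } else {
                preceding.add(tokens.get(start - 1));
            }
        }
        return atStart || preceding.size() > 1;
    }

    static boolean isComplete(List<String> tokens, List<String> phrase) {
        return isLeftComplete(tokens, phrase) && isRightComplete(tokens, phrase);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "conference on data mining call conference on data mining papers".split(" "));
        // "conference on data" is always followed by "mining", so it is incomplete.
        System.out.println(isComplete(tokens, List.of("conference", "on", "data")));           // false
        System.out.println(isComplete(tokens, List.of("conference", "on", "data", "mining"))); // true
    }
}
```

In the sample text, both occurrences of "conference on data" continue with "mining", so the shorter phrase can be extended without losing an occurrence and is flagged as incomplete. "conference on data mining" is followed by "call" and "papers", so it cannot be extended further.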
Field Summary
Fields
labelOverrideThreshold: Determines the strength of the truncated label filter.
Fields inherited from class org.carrot2.text.preprocessing.filter.ContextLabelFilter
enabled
Fields inherited from class org.carrot2.attrs.AttrComposite
attributes
Constructor Summary
Constructors
CompleteLabelFilter()
Method Summary
void filter(PreprocessingContext context, boolean[] acceptedStems, boolean[] acceptedPhrases): Marks incomplete labels.
Methods inherited from class org.carrot2.text.preprocessing.filter.ContextLabelFilter
isEnabled
Methods inherited from class org.carrot2.attrs.AttrComposite
accept
Field Details
labelOverrideThreshold
Determines the strength of the truncated label filter. The lowest value gives the strongest elimination of truncated labels, which may lead to overlong cluster labels and many unclustered documents. The highest value effectively disables the filter, which may result in short or truncated labels.
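The effect of a threshold like labelOverrideThreshold can be sketched with a relaxed, threshold-based version of the completeness test. This is an assumption for illustration only: here a phrase is treated as truncated on the right when its single most frequent following word accounts for at least `threshold` of its occurrences. Under that rule a lower threshold flags more phrases (stronger elimination) and a threshold above 1.0 flags none (filter effectively disabled), which matches the behavior described above. The exact rule, value range, and default used by Carrot2 are not stated here and may differ; `TruncationThresholdSketch` and `isRightTruncated` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a threshold-controlled truncation test (not Carrot2 code).
public class TruncationThresholdSketch {

    // Assumed rule: a phrase is "truncated on the right" if the most common
    // following word accounts for at least `threshold` of its occurrences.
    static boolean isRightTruncated(List<String> tokens, List<String> phrase, double threshold) {
        Map<String, Integer> followCounts = new HashMap<>();
        int occurrences = 0;
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (!tokens.subList(i, i + phrase.size()).equals(phrase)) {
                continue;
            }
            occurrences++;
            int next = i + phrase.size();
            if (next < tokens.size()) {
                followCounts.merge(tokens.get(next), 1, Integer::sum);
            }
        }
        if (occurrences == 0) {
            return false; // a phrase that never occurs is not flagged
        }
        int top = followCounts.values().stream().max(Integer::compare).orElse(0);
        return (double) top / occurrences >= threshold;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "data mining tools data mining papers data mining tools data warehouse".split(" "));
        List<String> data = List.of("data");
        List<String> dataMining = List.of("data", "mining");
        // Lower thresholds flag more phrases; a threshold above 1.0 flags none.
        for (double t : new double[] {0.5, 0.7, 1.01}) {
            System.out.printf("threshold %.2f: 'data' truncated=%b, 'data mining' truncated=%b%n",
                t, isRightTruncated(tokens, data, t), isRightTruncated(tokens, dataMining, t));
        }
    }
}
```

In the sample tokens, "data" is followed by "mining" in 3 of its 4 occurrences (0.75) and "data mining" is followed by "tools" in 2 of its 3 occurrences (about 0.67). A threshold of 0.5 therefore flags both phrases, 0.7 flags only "data", and 1.01 flags neither.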
Constructor Details
CompleteLabelFilter
public CompleteLabelFilter()
Method Details
filter
public void filter(PreprocessingContext context, boolean[] acceptedStems, boolean[] acceptedPhrases)
Marks incomplete labels.
Specified by:
filter in class ContextLabelFilter
Parameters:
context - contains words and phrases to be filtered
acceptedStems - the filter should set to false those elements that correspond to the stems to be filtered out
acceptedPhrases - the filter should set to false those elements that correspond to the phrases to be filtered out
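The parameter descriptions imply a simple contract: a filter receives shared flag arrays and clears (sets to false) the entries it rejects. The sketch below illustrates how several filters written this way compose over one array. It is a simplification under stated assumptions: `PhraseFilter`, `MIN_TWO_WORDS`, `rejecting`, and `applyAll` are hypothetical, the real method also receives a PreprocessingContext and an acceptedStems array, and the rule that a filter never sets a flag back to true is inferred from the wording "should set to false" rather than confirmed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the accepted-flags contract (not Carrot2 code).
public class FilterContractSketch {

    // Simplified stand-in for filter(...): implementations only clear flags.
    interface PhraseFilter {
        void filter(List<String> phrases, boolean[] acceptedPhrases);
    }

    // Example filter: rejects single-word phrases.
    static final PhraseFilter MIN_TWO_WORDS = (phrases, accepted) -> {
        for (int i = 0; i < phrases.size(); i++) {
            if (phrases.get(i).split(" ").length < 2) {
                accepted[i] = false;
            }
        }
    };

    // Example filter: rejects an explicit list of phrases (e.g. known truncations).
    static PhraseFilter rejecting(List<String> rejected) {
        return (phrases, accepted) -> {
            for (int i = 0; i < phrases.size(); i++) {
                if (rejected.contains(phrases.get(i))) {
                    accepted[i] = false;
                }
            }
        };
    }

    // Starts with every phrase accepted and lets each filter narrow the set.
    static boolean[] applyAll(List<String> phrases, List<PhraseFilter> filters) {
        boolean[] accepted = new boolean[phrases.size()];
        Arrays.fill(accepted, true);
        for (PhraseFilter f : filters) {
            f.filter(phrases, accepted);
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> phrases = List.of("data", "conference on data", "conference on data mining");
        boolean[] accepted = applyAll(phrases,
            List.of(MIN_TWO_WORDS, rejecting(List.of("conference on data"))));
        System.out.println(Arrays.toString(accepted)); // [false, false, true]
    }
}
```

Because each filter can only turn flags off, the final array is the logical AND of all filters' decisions, so the order in which filters run does not change the result.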