Package org.carrot2.language
Class GlobDictionary
java.lang.Object
org.carrot2.language.GlobDictionary
All Implemented Interfaces:
Predicate<CharSequence>
This dictionary implementation is a middle ground between the complexity of regular expressions
and the sheer speed of plain-text matching. It offers case-sensitive and case-insensitive matching,
as well as globs (wildcards matching any token sequence).
The following wildcards are available:
* - matches zero or more tokens (possessive match),
*? - matches zero or more tokens (reluctant match),
+ - matches one or more tokens (possessive match),
+? - matches one or more tokens (reluctant match),
? - matches exactly one token (possessive).
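A minimal, self-contained sketch of token-level glob matching, assuming whitespace-separated pattern tokens, case-insensitive comparison of literal tokens, and a match that must cover the whole input. The class TokenGlob and its methods are hypothetical and are not part of Carrot2. The sketch only decides whether some match exists, so it does not model the possessive/reluctant distinction: *? and +? are handled exactly like * and +.

```java
import java.util.Locale;

// Hypothetical token-level glob matcher (a sketch, not Carrot2's implementation).
// Supports "*" and "*?" (zero or more tokens), "+" and "+?" (one or more tokens)
// and "?" (exactly one token). Possessive and reluctant variants are treated the
// same because this sketch only checks whether the whole input can match.
public class TokenGlob {

  // Returns true if the pattern tokens match the entire input token sequence.
  public static boolean matches(String[] pattern, String[] input) {
    return match(pattern, 0, input, 0);
  }

  private static boolean match(String[] p, int pi, String[] in, int ii) {
    if (pi == p.length) {
      return ii == in.length; // pattern consumed: the input must be consumed too
    }
    switch (p[pi]) {
      case "*":
      case "*?":
        // Try spans of 0, 1, 2, ... tokens for the wildcard.
        for (int next = ii; next <= in.length; next++) {
          if (match(p, pi + 1, in, next)) {
            return true;
          }
        }
        return false;
      case "+":
      case "+?":
        // Same as above, but the span must contain at least one token.
        for (int next = ii + 1; next <= in.length; next++) {
          if (match(p, pi + 1, in, next)) {
            return true;
          }
        }
        return false;
      case "?":
        // Exactly one token of any content.
        return ii < in.length && match(p, pi + 1, in, ii + 1);
      default:
        // Literal token, compared case-insensitively (an assumption of this sketch).
        return ii < in.length
            && p[pi].toLowerCase(Locale.ROOT).equals(in[ii].toLowerCase(Locale.ROOT))
            && match(p, pi + 1, in, ii + 1);
    }
  }

  public static void main(String[] args) {
    System.out.println(matches("foo * bar".split(" "), "foo bar".split(" ")));     // true
    System.out.println(matches("foo * bar".split(" "), "Foo x y bar".split(" "))); // true
    System.out.println(matches("foo + bar".split(" "), "foo bar".split(" ")));     // false
    System.out.println(matches("? bar".split(" "), "foo bar".split(" ")));         // true
  }
}
```

The recursive backtracking search tries every possible span length for each * or +, which keeps the sketch short but can become slow for patterns with many wildcards.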
In addition, token type matching is provided in the form of:
{name} - matches a token with flags named name.
Token flags are an int bitfield.
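The {name} syntax relies on per-token flags stored as an int bitfield. The sketch below shows one way such flags can be tested, assuming (from the constant name GlobDictionary.MatchType.ANY_OF_TYPE) that a token matches when it carries any of the requested bits. The class TokenFlags and the flag names NUMERIC, PUNCTUATION and ACRONYM are hypothetical.

```java
// Hypothetical token-type flags stored in an int bitfield. The flag names and
// values are illustrative only; they are not constants defined by GlobDictionary.
public class TokenFlags {
  public static final int NUMERIC = 1;          // bit 0
  public static final int PUNCTUATION = 1 << 1; // bit 1
  public static final int ACRONYM = 1 << 2;     // bit 2

  // Returns true if the token's flags share at least one bit with the mask.
  public static boolean hasAnyOf(int tokenFlags, int mask) {
    return (tokenFlags & mask) != 0;
  }

  public static void main(String[] args) {
    // One flags entry per token, similar in shape to the int[] types argument of find(...).
    int[] types = {NUMERIC, 0, ACRONYM | PUNCTUATION};
    System.out.println(hasAnyOf(types[0], NUMERIC));           // true
    System.out.println(hasAnyOf(types[1], NUMERIC | ACRONYM)); // false
    System.out.println(hasAnyOf(types[2], ACRONYM));           // true
  }
}
```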
Nested Class Summary
Modifier and Type | Class | Description
static enum | |
static class | |
static final class | |
static final class | |
Constructor Summary
Constructor | Description
GlobDictionary(Stream<GlobDictionary.WordPattern> patterns) |
GlobDictionary(Stream<GlobDictionary.WordPattern> patterns, Function<String, String> tokenNormalization, Function<CharSequence, String[]> termSplitter) |
Method Summary
Modifier and Type | Method | Description
static GlobDictionary | compilePatterns(Stream<String> entries) |
static Function<CharSequence, String[]> | |
boolean | find(String[] inputTerms, String[] normalizedTerms, int[] types, Predicate<GlobDictionary.WordPattern> earlyAbort) | Find all matching patterns, optionally aborting prematurely.
String[] | |
String[] | split(CharSequence input) |
boolean | test(CharSequence input) |
String | toString() |
Constructor Details
GlobDictionary
public GlobDictionary(Stream<GlobDictionary.WordPattern> patterns, Function<String, String> tokenNormalization, Function<CharSequence, String[]> termSplitter)
GlobDictionary
public GlobDictionary(Stream<GlobDictionary.WordPattern> patterns)
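The three-argument constructor takes a token normalizer (Function<String, String>) and a term splitter (Function<CharSequence, String[]>). The sketch below shows hypothetical functions of those types: a lower-casing normalizer, which is one way to obtain case-insensitive matching, and a whitespace splitter. The class NormalizationSketch and its fields are illustrations only; they do not describe the behavior of defaultTokenNormalization or defaultTermSplitter.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical normalizer and splitter with the same functional types as the
// GlobDictionary constructor parameters. These are NOT the library defaults.
public class NormalizationSketch {
  // Function<String, String>: lower-cases a token so comparisons ignore case.
  public static final Function<String, String> LOWERCASE = s -> s.toLowerCase(Locale.ROOT);

  // Function<CharSequence, String[]>: splits on runs of whitespace.
  public static final Function<CharSequence, String[]> WHITESPACE_SPLITTER =
      cs -> cs.toString().trim().split("\\s+");

  public static void main(String[] args) {
    String[] terms = WHITESPACE_SPLITTER.apply("  Data   Mining ");
    // The same normalizer must be applied to input terms and to pattern tokens.
    String[] normalized = Arrays.stream(terms).map(LOWERCASE).toArray(String[]::new);
    System.out.println(Arrays.toString(terms));      // [Data, Mining]
    System.out.println(Arrays.toString(normalized)); // [data, mining]
  }
}
```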
Method Details
defaultTermSplitter
test
Specified by: test in interface Predicate<CharSequence>
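Because GlobDictionary implements Predicate<CharSequence>, an instance can be passed wherever a Predicate<CharSequence> is accepted, for example to Stream.filter. In the sketch below, a small hypothetical predicate (STOP_LABELS) stands in for a compiled dictionary, and the hypothetical helper filterLabels keeps only the strings it does not match.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Shows how any Predicate<CharSequence> plugs into standard stream filtering.
// STOP_LABELS is a hypothetical stand-in for a compiled dictionary.
public class PredicateUsage {
  // Matches inputs that, after trimming and lower-casing, equal one of these words.
  public static final Predicate<CharSequence> STOP_LABELS = new Predicate<CharSequence>() {
    private final Set<String> words = Set.of("home", "page", "contact");

    @Override
    public boolean test(CharSequence input) {
      return words.contains(input.toString().trim().toLowerCase(Locale.ROOT));
    }
  };

  // Keeps only the labels that the dictionary predicate does not match.
  public static List<String> filterLabels(List<String> labels, Predicate<CharSequence> dictionary) {
    return labels.stream().filter(dictionary.negate()).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> labels = List.of("Home", "Data Mining", "contact", "Clustering");
    System.out.println(filterLabels(labels, STOP_LABELS)); // [Data Mining, Clustering]
  }
}
```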
find
public boolean find(String[] inputTerms, String[] normalizedTerms, int[] types, Predicate<GlobDictionary.WordPattern> earlyAbort)
Find all matching patterns, optionally aborting prematurely.
Parameters:
inputTerms - Input terms (verbatim).
normalizedTerms - Normalized terms (must use the same normalizer as the dictionary).
types - Token types (bitfield) used in GlobDictionary.MatchType.ANY_OF_TYPE.
earlyAbort - A predicate that indicates the early abort condition.
Returns:
true if at least one match was found, false otherwise.
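The earlyAbort parameter lets a caller stop the search once it has seen enough. The sketch below assumes that the predicate is invoked for each matching pattern and that returning true ends the search; the documentation above does not spell this out. NamedPattern, samplePatterns and this find method are hypothetical stand-ins for GlobDictionary.WordPattern and GlobDictionary.find.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of a find(...) search with an early-abort callback.
// Assumption: earlyAbort is called for each matching pattern, and returning
// true stops the search. NamedPattern stands in for GlobDictionary.WordPattern.
public class EarlyAbortSketch {
  public static final class NamedPattern {
    public final String name;
    public final Predicate<String[]> matcher;

    public NamedPattern(String name, Predicate<String[]> matcher) {
      this.name = name;
      this.matcher = matcher;
    }
  }

  // Returns true if at least one pattern matched the normalized terms.
  public static boolean find(List<NamedPattern> patterns, String[] normalizedTerms,
                             Predicate<NamedPattern> earlyAbort) {
    boolean found = false;
    for (NamedPattern p : patterns) {
      if (p.matcher.test(normalizedTerms)) {
        found = true;
        if (earlyAbort.test(p)) {
          break; // the caller asked to stop after this match
        }
      }
    }
    return found;
  }

  public static List<NamedPattern> samplePatterns() {
    return List.of(
        new NamedPattern("starts-with-data", t -> t.length > 0 && t[0].equals("data")),
        new NamedPattern("two-tokens", t -> t.length == 2),
        new NamedPattern("ends-with-mining", t -> t.length > 0 && t[t.length - 1].equals("mining")));
  }

  public static void main(String[] args) {
    String[] terms = {"data", "mining"};

    List<String> all = new ArrayList<>();
    find(samplePatterns(), terms, p -> { all.add(p.name); return false; }); // never abort
    System.out.println(all); // [starts-with-data, two-tokens, ends-with-mining]

    List<String> first = new ArrayList<>();
    find(samplePatterns(), terms, p -> { first.add(p.name); return true; }); // abort on first match
    System.out.println(first); // [starts-with-data]
  }
}
```

Returning false from the callback collects every match; returning true stops at the first one, which is enough when only a yes/no answer is needed.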
split
normalize
toString
defaultTokenNormalization
compilePatterns