Class PhraseExtractor

java.lang.Object
org.carrot2.text.preprocessing.PhraseExtractor

public class PhraseExtractor extends Object
Extracts frequent phrases from the provided documents. A frequent phrase is a sequence of words that appears in the documents more than once. This phrase extractor aggregates the different inflection variants of phrase words into one phrase and returns the most frequent variant. For example, if the phrase "computing science" appears 2 times and "computer sciences" appears 4 times, the latter will be returned with an aggregated frequency of 6.
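
The aggregation rule described above can be illustrated with a minimal, self-contained sketch. It is not the Carrot2 implementation: the class `VariantAggregationSketch`, the `Aggregated` holder and the toy suffix-stripping `stem` method are all hypothetical names introduced for illustration, and the real class relies on the stems produced earlier by LanguageModelStemmer. The sketch only shows how variants that share the same stemmed form might be merged, keeping the most frequent surface form and the summed frequency.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of Carrot2.
class VariantAggregationSketch {

    /** Most frequent surface variant plus the frequency summed over all variants. */
    static final class Aggregated {
        final String variant;
        final int variantFrequency;
        final int totalFrequency;

        Aggregated(String variant, int variantFrequency, int totalFrequency) {
            this.variant = variant;
            this.variantFrequency = variantFrequency;
            this.totalFrequency = totalFrequency;
        }
    }

    /** Toy stemmer (assumption): strips a few English suffixes. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "er", "es", "s", "e"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Builds the grouping key of a phrase: its words replaced by their stems. */
    static String stemKey(String phrase) {
        StringBuilder key = new StringBuilder();
        for (String word : phrase.trim().split("\\s+")) {
            if (key.length() > 0) {
                key.append(' ');
            }
            key.append(stem(word));
        }
        return key.toString();
    }

    /** Merges phrases sharing a stem key; keeps the most frequent variant. */
    static Map<String, Aggregated> aggregate(Map<String, Integer> phraseCounts) {
        Map<String, Aggregated> byKey = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : phraseCounts.entrySet()) {
            String key = stemKey(e.getKey());
            int count = e.getValue();
            Aggregated prev = byKey.get(key);
            if (prev == null) {
                byKey.put(key, new Aggregated(e.getKey(), count, count));
            } else {
                boolean better = count > prev.variantFrequency;
                byKey.put(key, new Aggregated(
                        better ? e.getKey() : prev.variant,
                        Math.max(count, prev.variantFrequency),
                        prev.totalFrequency + count));
            }
        }
        return byKey;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("computing science", 2);
        counts.put("computer sciences", 4);
        for (Aggregated a : aggregate(counts).values()) {
            // prints "computer sciences -> 6"
            System.out.println(a.variant + " -> " + a.totalFrequency);
        }
    }
}
```

With the two counts from the example, both phrases reduce to the same key under the toy stemmer, so a single entry is produced with the variant "computer sciences" and a total frequency of 6.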

This class saves the following results to the PreprocessingContext:

This class requires that Tokenizer, CaseNormalizer and LanguageModelStemmer be invoked first.

Method Details

extractPhrases

public void extractPhrases(PreprocessingContext context)
Performs phrase extraction and saves the results to the provided context.