Class BisectingKMeansClusteringAlgorithm

java.lang.Object
org.carrot2.attrs.AttrComposite
org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm
All Implemented Interfaces:
AcceptingVisitor, ClusteringAlgorithm

public class BisectingKMeansClusteringAlgorithm extends AttrComposite implements ClusteringAlgorithm
A very simple implementation of bisecting k-means clustering. Unlike other algorithms in Carrot2, this one produces a hard clustering: each document belongs to exactly one cluster. On the other hand, clusters are labeled only with individual words, which may not describe every document in the cluster.
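The general technique can be sketched in plain Java as follows. This is an illustrative sketch only, not the Carrot2 implementation: the class name BisectingKMeansSketch, its methods, the choice to always split the largest cluster, and the farthest-point seeding are all assumptions made for the example, and it works on dense numeric vectors rather than a term-document matrix. It shows the two properties described above: every input point ends up in exactly one cluster, and the result may contain fewer clusters than requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of bisecting k-means; hypothetical, not Carrot2 code. */
final class BisectingKMeansSketch {

  /** Returns one cluster index per point (hard clustering). */
  static int[] cluster(double[][] points, int clusterCount, int maxIterations) {
    List<List<Integer>> open = new ArrayList<>();  // clusters that may still be split
    List<List<Integer>> done = new ArrayList<>();  // clusters that cannot be split
    List<Integer> all = new ArrayList<>();
    for (int i = 0; i < points.length; i++) all.add(i);
    open.add(all);

    while (open.size() + done.size() < clusterCount && !open.isEmpty()) {
      // Assumption: always bisect the largest remaining cluster.
      int largest = 0;
      for (int i = 1; i < open.size(); i++) {
        if (open.get(i).size() > open.get(largest).size()) largest = i;
      }
      List<Integer> target = open.remove(largest);
      List<List<Integer>> halves = bisect(points, target, maxIterations);
      if (halves == null) done.add(target); else open.addAll(halves);
    }
    open.addAll(done);

    int[] assignment = new int[points.length];
    for (int c = 0; c < open.size(); c++) {
      for (int member : open.get(c)) assignment[member] = c;
    }
    return assignment;
  }

  /** Splits one cluster in two with 2-means; returns null if it cannot be split. */
  static List<List<Integer>> bisect(double[][] points, List<Integer> members, int maxIterations) {
    if (members.size() < 2) return null;
    // Seed with the first member and the member farthest from it.
    double[] a = points[members.get(0)].clone();
    int farthest = members.get(0);
    for (int m : members) {
      if (dist(points[m], a) > dist(points[farthest], a)) farthest = m;
    }
    if (dist(points[farthest], a) == 0) return null; // all points are identical
    double[] b = points[farthest].clone();

    List<Integer> left = new ArrayList<>();
    List<Integer> right = new ArrayList<>();
    for (int iter = 0; iter < Math.max(1, maxIterations); iter++) {
      List<Integer> newLeft = new ArrayList<>();
      List<Integer> newRight = new ArrayList<>();
      for (int m : members) {
        if (dist(points[m], a) <= dist(points[m], b)) newLeft.add(m); else newRight.add(m);
      }
      boolean stable = newLeft.equals(left);
      left = newLeft;
      right = newRight;
      if (stable || left.isEmpty() || right.isEmpty()) break;
      a = mean(points, left);
      b = mean(points, right);
    }
    if (left.isEmpty() || right.isEmpty()) return null;
    return List.of(left, right);
  }

  /** Squared Euclidean distance. */
  static double dist(double[] x, double[] y) {
    double sum = 0;
    for (int i = 0; i < x.length; i++) sum += (x[i] - y[i]) * (x[i] - y[i]);
    return sum;
  }

  /** Centroid of the given members. */
  static double[] mean(double[][] points, List<Integer> members) {
    double[] c = new double[points[members.get(0)].length];
    for (int m : members) {
      for (int i = 0; i < c.length; i++) c[i] += points[m][i];
    }
    for (int i = 0; i < c.length; i++) c[i] /= members.size();
    return c;
  }

  public static void main(String[] args) {
    double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}, {30, 0}};
    System.out.println(Arrays.toString(cluster(points, 3, 15)));
  }
}
```

In this sketch, a cluster whose points are all identical is never split, which is one way a request for N clusters can yield fewer than N.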
  • Field Details

    • NAME

      public static final String NAME
    • clusterCount

      public final AttrInteger clusterCount
      Number of clusters to create. The algorithm will create at most the specified number of clusters.
    • maxIterations

      public final AttrInteger maxIterations
      Maximum number of k-means iterations to perform.
    • partitionCount

      public final AttrInteger partitionCount
      Number of partitions to create at each k-means clustering iteration.
    • labelCount

      public final AttrInteger labelCount
      Minimum number of labels to return for each cluster.
    • queryHint

      public final AttrString queryHint
      Query terms used to retrieve documents. The query is used as a hint to avoid trivial clusters.
    • useDimensionalityReduction

      public final AttrBoolean useDimensionalityReduction
      If enabled, k-means will be applied to the dimensionality-reduced term-document matrix. The number of dimensions will be equal to twice the number of requested clusters. If the number of dimensions is lower than the number of input documents, reduction will not be performed. If disabled, k-means will be performed directly on the original term-document matrix.
    • matrixBuilder

      public TermDocumentMatrixBuilder matrixBuilder
      Configuration of the size and contents of the term-document matrix.
    • matrixReducer

      public TermDocumentMatrixReducer matrixReducer
      Configuration of the matrix decomposition method to use for clustering.
    • preprocessing

      public BasicPreprocessingPipeline preprocessing
      Configuration of the text preprocessing stage.
    • dictionaries

      public EphemeralDictionaries dictionaries
      Per-request overrides of language components (dictionaries).
      Since:
      4.1.0
  • Constructor Details

    • BisectingKMeansClusteringAlgorithm

      public BisectingKMeansClusteringAlgorithm()
  • Method Details