Interface ClusteringAlgorithm

All Superinterfaces:
AcceptingVisitor
All Known Implementing Classes:
BisectingKMeansClusteringAlgorithm, LingoClusteringAlgorithm, STCClusteringAlgorithm

public interface ClusteringAlgorithm extends AcceptingVisitor
  • Method Details

    • requiredLanguageComponents

      Set<Class<?>> requiredLanguageComponents()
      Returns:
      A set of classes required to be present in the LanguageComponents instance provided for clustering.
    • optionalLanguageComponents

      default Set<Class<?>> optionalLanguageComponents()
      Returns:
      A set of classes that the algorithm uses if they are present in the LanguageComponents instance provided for clustering, but does not require.
    • supports

      default boolean supports(LanguageComponents languageComponents)
      Verify whether a given LanguageComponents instance contains all the required components for the algorithm to run.
      Parameters:
      languageComponents - LanguageComponents to check against.
      Returns:
      true if the provided LanguageComponents instance is sufficient for clustering.
    • cluster

      <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> documents, LanguageComponents languageComponents)
      Cluster a set of documents.
      Type Parameters:
      T - Any subtype of Document. The returned clusters contain documents of this same type.
      Parameters:
      documents - A stream of documents for clustering.
      languageComponents - LanguageComponents with a set of suppliers for the required language-specific components.
      Returns:
      A list of top-level clusters (clusters can form a hierarchy via Cluster.getClusters()).
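The relationship between requiredLanguageComponents, optionalLanguageComponents and supports can be sketched in self-contained Java. This is a minimal sketch, not the library's code. ComponentSet is a hypothetical stand-in for LanguageComponents, and Stemmer, Tokenizer, StopwordFilter, SketchAlgorithm and ExampleAlgorithm are hypothetical names. It assumes that supports only checks whether every required class is available, as the description of supports states, and that optionalLanguageComponents defaults to an empty set.

```java
import java.util.Set;

// Hypothetical stand-in for LanguageComponents: it only models the set of
// component classes an instance can supply.
final class ComponentSet {
  private final Set<Class<?>> available;

  ComponentSet(Set<Class<?>> available) {
    this.available = Set.copyOf(available);
  }

  Set<Class<?>> components() {
    return available;
  }
}

// Hypothetical marker types standing in for language-specific components.
interface Stemmer {}
interface Tokenizer {}
interface StopwordFilter {}

// Hypothetical mirror of the three methods documented above.
interface SketchAlgorithm {
  Set<Class<?>> requiredLanguageComponents();

  // Assumption: no optional components unless an implementation overrides this.
  default Set<Class<?>> optionalLanguageComponents() {
    return Set.of();
  }

  // Sufficient when every required class is available; optional ones are not checked.
  default boolean supports(ComponentSet components) {
    return components.components().containsAll(requiredLanguageComponents());
  }
}

// Hypothetical algorithm that needs a stemmer and a tokenizer and can
// additionally use a stopword filter when one is supplied.
final class ExampleAlgorithm implements SketchAlgorithm {
  @Override
  public Set<Class<?>> requiredLanguageComponents() {
    return Set.of(Stemmer.class, Tokenizer.class);
  }

  @Override
  public Set<Class<?>> optionalLanguageComponents() {
    return Set.of(StopwordFilter.class);
  }
}
```

Under this logic, an instance that lacks only an optional component is still supported, while one that lacks any required component is not.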
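Because cluster returns only the top-level clusters, a caller that needs the whole hierarchy has to recurse through getClusters(). The sketch below walks such a hierarchy depth-first. It does not use the real Cluster class: SketchCluster<T> and ClusterWalk are hypothetical names, and SketchCluster carries only a label, its documents and its sub-clusters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-in for Cluster<T>: a label, the documents
// assigned to it, and its sub-clusters.
final class SketchCluster<T> {
  final String label;
  final List<T> documents;
  private final List<SketchCluster<T>> clusters;

  SketchCluster(String label, List<T> documents, List<SketchCluster<T>> clusters) {
    this.label = label;
    this.documents = documents;
    this.clusters = clusters;
  }

  // Mirrors the role of Cluster.getClusters(): direct children only.
  List<SketchCluster<T>> getClusters() {
    return clusters;
  }
}

final class ClusterWalk {
  // Depth-first walk over the top-level clusters and all their descendants.
  // Each line holds the label and document count, indented by nesting depth.
  static <T> List<String> outline(List<SketchCluster<T>> topLevel) {
    List<String> lines = new ArrayList<>();
    for (SketchCluster<T> cluster : topLevel) {
      visit(cluster, 0, lines);
    }
    return lines;
  }

  private static <T> void visit(SketchCluster<T> cluster, int depth, List<String> out) {
    out.add("  ".repeat(depth) + cluster.label + " (" + cluster.documents.size() + ")");
    for (SketchCluster<T> child : cluster.getClusters()) {
      visit(child, depth + 1, out);
    }
  }
}
```

Recursion keeps the walk short; for very deep hierarchies an explicit stack would avoid deep call chains, though clustering hierarchies are usually shallow.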