The open source clustering algorithms available in the Carrot2 framework are an alternative to Lingo3G. Here is how they compare.
| Feature | Carrot2 Lingo | Carrot2 STC | Lingo3G |
|---|---|---|---|
| Hierarchical clustering | Not available | Not available | Available |
| Cluster label quality | high | fair | very high |
| Customizable stop word list | Available | Available | Available |
| Label filtering (suppressing specific words or phrases in the output cluster labels) | Available | Available | Available |
| Label boosting (promoting specific words or phrases in the output cluster labels) | Not available | Not available | Available |
| Synonyms (defining groups of words or phrases to be treated as synonymous) | Not available | Not available | Available |
| Structured data (guiding text clustering by numeric or nominal data) | Not available | Not available | Available |
| Document-to-cluster misassignment (share of documents in a cluster that are irrelevant to the cluster label) | medium | medium | low |
| Further development | only critical bug fixes | only critical bug fixes | new features planned |
| Time of clustering 100 snippets [s] (*) | 0.054 | 0.003 | 0.005 (**) |
| Time of clustering 500 snippets [s] | 0.490 | 0.015 | 0.018 |
| Time of clustering 1000 snippets [s] | 0.847 | 0.026 | 0.029 |
| Time of clustering 10000 snippets [s] | 121.5 (***) | 0.218 | 0.154 |
| Time of clustering 100000 snippets [s] | — (***) | 3.802 | 1.561 |
(*) Clustering speed was measured on Open Directory Project site descriptions from the Top/Computers category. Benchmark environment: Intel Core i7-2600K 3.4 GHz, 12 GB RAM, Windows 7. Java Virtual Machine: Sun JDK 1.7.0_04 64-bit, JVM switches: -server -Xmx1024m -Xms1024m. Each time in the table is the average of 100 timed runs; for each algorithm, the timed runs were preceded by 100 untimed warm-up runs.
(**) Lingo3G times refer to hierarchical clustering. Carrot2 times refer to flat clustering, because the Carrot2 algorithms do not support hierarchical clustering.
(***) The open source edition is not scalable enough to reliably cluster very large numbers of documents.
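The measurement procedure described in footnote (*), untimed warm-up runs followed by timed runs whose durations are averaged, can be sketched as a small harness. This is a minimal Python sketch under stated assumptions, not the harness used to produce the figures in the table: `benchmark` and the `cluster_snippets` stand-in are hypothetical names, and the JVM-specific parts of the setup (heap switches, JIT compilation during warm-up) are not reproduced.

```python
import statistics
import time
from typing import Callable


def benchmark(task: Callable[[], object], warmup_runs: int = 100, timed_runs: int = 100) -> float:
    """Return the mean wall-clock time of `task` in seconds.

    Hypothetical helper: `task` is first executed `warmup_runs` times without
    timing (on a JVM this gives the JIT compiler a chance to optimize hot code),
    then `timed_runs` times with timing. The mean of the timed runs is returned.
    """
    for _ in range(warmup_runs):
        task()  # untimed warm-up run

    durations = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


if __name__ == "__main__":
    # Hypothetical stand-in for a clustering call; replace with a real algorithm.
    snippets = [f"snippet {i}" for i in range(1000)]

    def cluster_snippets():
        return sorted(snippets)

    print(f"mean time: {benchmark(cluster_snippets):.6f} s")
```

Separating warm-up from measurement keeps one-off startup costs (class loading, caching, JIT compilation on a JVM) out of the averaged time, so the reported numbers reflect steady-state clustering speed.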