The open source clustering algorithms available in the Carrot2 framework are an alternative to Lingo3G. The table below shows how they compare.
| Feature | Lingo open source | STC open source | Lingo3G commercial |
|---|---|---|---|
| Hierarchical clustering | Not available | Not available | Available |
| Cluster label quality | high | fair | very high |
| Customizable stop word list | Available | Available | Available |
| Label filtering (suppressing specific words or phrases in the output cluster labels) | Available | Available | Available |
| Label boosting (promoting specific words or phrases in the output cluster labels) | Not available | Not available | Available |
| Synonyms (defining groups of words or phrases to be treated as synonymous) | Not available | Not available | Available |
| Document-to-cluster misassignment (ratio of documents in a cluster that are irrelevant to the cluster label) | medium | medium | low |
| Number of unclustered documents | medium | medium | low |
| Results tuning | basic | basic | advanced |
| Further development | only critical bug fixes | only critical bug fixes | new features planned |
| Time* of clustering 100 documents [s] | 0.051 | 0.012 | 0.010** |
| Time of clustering 200 documents [s] | 0.163 | 0.021 | 0.018 |
| Time of clustering 500 documents [s] | 0.329 | 0.051 | 0.039 |
| Time of clustering 1000 documents [s] | 51.550*** | 1.861 | 0.520 |
| Time of clustering 100000 documents [s] | —*** | 80.771 | 11.709 |
(*) Clustering speed was measured on Open Directory Project data from the Top/Computers category. Benchmark environment: Intel Core2 Duo E8400 3 GHz, 3 GB RAM, Windows XP. Java Virtual Machine: Sun JDK 1.6.0, JVM switches: -server -Xmx1024m -Xms1024m. Each time in the table is the average of 100 runs; for each algorithm, the timed runs were preceded by 100 untimed warm-up runs.
(**) Lingo3G times refer to hierarchical clustering. The times reported for the Carrot2 algorithms refer to flat clustering, because those algorithms do not support hierarchical clustering.
(***) The open source edition is not scalable enough to reliably cluster very large numbers of documents.
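The measurement procedure described in the first footnote (100 untimed warm-up runs followed by 100 timed runs, averaged) can be sketched roughly as below. This is only an illustration of the general warm-up-then-average technique, not the harness actually used for the table: the class name `WarmupBenchmark`, the method `averageSeconds`, and the placeholder workload are all hypothetical, and a real measurement would substitute a call to the clustering algorithm under test.

```java
import java.util.function.Supplier;

public class WarmupBenchmark {

    // Results are written here so the JIT compiler cannot discard the measured work.
    private static volatile Object sink;

    /**
     * Runs the task warmupRuns times without timing it, then timedRuns times
     * with timing, and returns the mean wall-clock time of one timed run in seconds.
     */
    public static double averageSeconds(Supplier<?> task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }

        // Untimed warm-up runs give the JVM a chance to compile the hot code paths.
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.get();
        }

        // Timed runs: accumulate the elapsed nanoseconds of each run.
        long totalNanos = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            sink = task.get();
            totalNanos += System.nanoTime() - start;
        }

        return totalNanos / (double) timedRuns / 1e9;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder workload standing in for one clustering call.
        Supplier<Long> placeholderTask = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i % 7;
            }
            return sum;
        };

        double average = averageSeconds(placeholderTask, 100, 100);
        System.out.printf("Average time per run: %.6f s%n", average);
    }
}
```

To mirror the JVM switches listed in the footnote, such a class could be started with `java -server -Xmx1024m -Xms1024m WarmupBenchmark`. Setting the initial and maximum heap sizes to the same value avoids heap resizing during the timed runs.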
Easy to integrate, many tuning options, very fast and lightweight.
Stephan Schmid, CEO at Comcepta, Switzerland
Our evaluation found overwhelming support for using Lingo3G.
Dr James Thomas, Associate Director, EPPI-Centre, Social Science Research Unit, Institute of Education, London
René de Vries, Managing Director at HowardsHome