Lingo3G or Carrot2?
Use Lingo3G if you need high performance, advanced tuning or hierarchical clusters.
If you require source code or prefer a free-of-charge solution, try Carrot2.
Feature | Lingo3G | STC | Lingo
---|---|---|---
License | Commercial | BSD, available in Carrot2 | BSD, available in Carrot2
Hierarchical clustering | Supported. Lingo3G outputs multi-level clusters. | Not supported. STC outputs flat cluster lists. | Not supported. Lingo outputs flat cluster lists.
Cluster label quality | Very high. Compact and meaningful labels, noise limited to a minimum. | Fair. Meaningful, but often one-word labels. | High. Meaningful labels, some noise words visible.
Customizable stop words | Supported | Supported | Supported
Label filtering: suppressing specific words or phrases in the output cluster labels. | Supported. Very fast hash-based filtering, flexible regexp-based filtering. | Supported. Regexp-based filtering. | Supported. Regexp-based filtering.
Label boosting: promoting specific words or phrases in the output cluster labels. | Supported. Very fast hash-based boosting, flexible regexp-based boosting. | Not supported | Not supported
Synonyms: defining groups of words or phrases to be treated as synonymous. | Supported. Very fast hash-based synonym marking. | Not supported | Not supported
Structured data: guiding text clustering by numeric or nominal data. | Supported | Not supported | Not supported
Document-to-cluster misassignment: ratio of documents in a cluster that are irrelevant to the cluster label. | Low, especially with the precise document assignment option enabled. | Medium, some irrelevant documents may slip in. | Medium, some irrelevant documents may slip in.
Results tuning | Advanced, many parameters for tuning different aspects of clusters and labels. | Basic set of parameters. | Basic set of parameters.
Further development | New features and performance improvements. | Bug fixes only. | Bug fixes only.

Performance
Time taken to cluster search result snippets*.

Number of snippets | Lingo3G** | STC | Lingo
---|---|---|---
100 | 0.005 s | 0.003 s | 0.054 s
500 | 0.018 s | 0.015 s | 0.490 s
1,000 | 0.029 s | 0.026 s | 0.847 s
10,000 | 0.154 s | 0.218 s | 121.5 s***
100,000 | 1.561 s | 3.802 s | n/a***

*) Clustering speed was measured on Open Directory Project site descriptions from the Top/Computers category. Benchmark environment: Intel Core i7-2600K 3.4 GHz, 12 GB RAM, Windows 7. Java Virtual Machine: Sun JDK 1.7.0_04 64-bit, JVM switches: -server -Xmx1024m -Xms1024m. Each time in the table is an average of 100 runs; for each algorithm, the timed runs were preceded by 100 untimed warm-up runs.
**) Lingo3G times refer to hierarchical clustering, while for Carrot2 algorithms the times reported refer to flat clustering because Carrot2 algorithms do not support hierarchical clustering.
***) The open source edition is not scalable enough to reliably cluster very large numbers of documents.
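The measurement procedure from the first footnote (untimed warm-up runs followed by an average over timed runs) can be sketched roughly as below. This is only an illustration, not the harness used for the published numbers: the class name `ClusteringBenchmark`, the method `averageSeconds` and the placeholder workload are assumptions, and a real measurement would call the clustering algorithm where the placeholder runs.

```java
final class ClusteringBenchmark {

    /**
     * Runs the task warmupRuns times without timing it, then returns the
     * mean wall-clock time of timedRuns further runs, in seconds.
     */
    static double averageSeconds(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        // Warm-up phase: gives the JVM a chance to JIT-compile hot code paths.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        // Timed phase: accumulate elapsed nanoseconds over all runs.
        long totalNanos = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) timedRuns / 1e9;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for a clustering call (assumption).
        Runnable placeholder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        // 100 warm-up runs and 100 timed runs, as in the footnote.
        double seconds = averageSeconds(placeholder, 100, 100);
        System.out.printf("Average time per run: %.6f s%n", seconds);
    }
}
```

Running with fixed heap settings, as the footnote does with -Xmx1024m -Xms1024m, helps keep heap resizing out of the timed runs.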
Questions and answers
I integrated my code with Carrot2. How difficult will it be to switch to Lingo3G?
Lingo3G plugs into the API framework defined by Carrot2, so switching to Lingo3G will be a matter of adding a few JARs to your project and changing the source code to use the Lingo3G algorithm identifier.
As for tuning, each algorithm has its own set of parameters, so after switching to Lingo3G you might need to adjust parameter values and lexical resources.
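The idea behind the answer above is that calling code depends only on an algorithm identifier, so swapping algorithms touches one value. The sketch below illustrates that pattern in plain Java. It does not use the Carrot2 or Lingo3G API: the class `AlgorithmSwitchSketch`, the registry, the identifiers `"lingo"` and `"lingo3g"` and the stand-in algorithms are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class AlgorithmSwitchSketch {

    // Hypothetical registry mapping an algorithm identifier to an implementation.
    // The lambdas are stand-ins; they only describe what a real algorithm would return.
    private static final Map<String, Function<List<String>, String>> ALGORITHMS = new HashMap<>();

    static {
        ALGORITHMS.put("lingo", docs -> "flat clusters for " + docs.size() + " documents");
        ALGORITHMS.put("lingo3g", docs -> "hierarchical clusters for " + docs.size() + " documents");
    }

    /** Looks up the algorithm by identifier and applies it to the documents. */
    static String cluster(String algorithmId, List<String> documents) {
        Function<List<String>, String> algorithm = ALGORITHMS.get(algorithmId);
        if (algorithm == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithmId);
        }
        return algorithm.apply(documents);
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("first snippet", "second snippet");
        // Switching algorithms means changing only this identifier.
        String algorithmId = args.length > 0 ? args[0] : "lingo";
        System.out.println(cluster(algorithmId, documents));
    }
}
```

In a real integration the identifier would select an algorithm provided by the JARs on the classpath, and algorithm-specific parameters would still need separate tuning.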
Still in doubt? Get a Lingo3G trial.
Use Lingo3G for 2 months free of charge and compare it with Carrot2 on your own data.
Get a trial