Carrot Search: Lingo3G: Lingo3G vs Carrot2

The open source clustering algorithms available in the Carrot2 framework are an alternative to Lingo3G. Here is how they compare.

| Feature | Lingo (open source) | STC (open source) | Lingo3G (commercial) |
|---|---|---|---|
| Hierarchical clustering | Not available | Not available | Available |
| Cluster label quality | high | fair | very high |
| Customizable stop word list | Available | Available | Available |
| Label filtering (suppressing specific words or phrases in the output cluster labels) | Available | Available | Available |
| Label boosting (promoting specific words or phrases in the output cluster labels) | Not available | Not available | Available |
| Synonyms (defining groups of words or phrases to be treated as synonymous) | Not available | Not available | Available |
| Structured data (guiding text clustering by numeric or nominal data) | Not available | Not available | Available |
| Document-to-cluster misassignment (ratio of documents in a cluster that are irrelevant to the cluster label) | medium | medium | low |
| Results tuning | basic | basic | advanced |
| Further development | only critical bug fixes | only critical bug fixes | new features planned |
| Time (*) of clustering 100 snippets [s] | 0.054 | 0.003 | 0.005 (**) |
| Time of clustering 500 snippets [s] | 0.490 | 0.015 | 0.018 |
| Time of clustering 1000 snippets [s] | 0.847 | 0.026 | 0.029 |
| Time of clustering 10000 snippets [s] | 121.5 (***) | 0.218 | 0.154 |
| Time of clustering 100000 snippets [s] | (***) | 3.802 | 1.561 |

(*) Clustering speed was measured on Open Directory Project site descriptions from the Top/Computers category. Benchmark environment: Intel Core i7-2600K 3.4 GHz, 12 GB RAM, Windows 7. Java Virtual Machine: Sun JDK 1.7.0_04 64-bit, JVM switches: -server -Xmx1024m -Xms1024m. Each time in the table is an average of 100 runs; for each algorithm, the timed runs were preceded by 100 untimed warm-up runs.
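
The measurement protocol described above (untimed warm-up runs followed by an average over timed runs) can be sketched as follows. This is only an illustration in Python, not the harness used for the published numbers, which ran on the JVM. The names `benchmark` and `toy_cluster` are hypothetical, and `toy_cluster` is a trivial stand-in for a real clustering algorithm.

```python
import time
from statistics import mean


def benchmark(cluster_fn, documents, warmup_runs=100, timed_runs=100):
    """Return the mean wall-clock time (seconds) of cluster_fn(documents).

    Hypothetical helper: runs `warmup_runs` untimed calls first, then
    averages the duration of `timed_runs` timed calls.
    """
    # Untimed warm-up runs: results and durations are discarded.
    for _ in range(warmup_runs):
        cluster_fn(documents)

    # Timed runs: record each call's duration, then average.
    timings = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        cluster_fn(documents)
        timings.append(time.perf_counter() - start)
    return mean(timings)


def toy_cluster(documents):
    """Hypothetical stand-in for a clustering algorithm.

    Groups snippets by their lowercased first word; it exists only to
    give the harness something to time.
    """
    clusters = {}
    for doc in documents:
        words = doc.split()
        key = words[0].lower() if words else ""
        clusters.setdefault(key, []).append(doc)
    return clusters


if __name__ == "__main__":
    snippets = ["java clustering library", "java search engine", "python text mining"]
    print(f"mean time: {benchmark(toy_cluster, snippets):.6f} s")
```

On a JVM, the warm-up phase presumably matters more than in this sketch, because it gives the just-in-time compiler a chance to optimize the hot code paths before timing starts.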

(**) Lingo3G times refer to hierarchical clustering. The times reported for the Carrot2 algorithms refer to flat clustering, because those algorithms do not support hierarchical clustering.

(***) The Open Source edition is not scalable enough to reliably cluster very large numbers of documents.

Easy to integrate, many tuning options, very fast and lightweight.

Stephan Schmid, CEO at Comcepta, Switzerland

Our evaluation found overwhelming support for using Lingo3G.

Dr James Thomas, Associate Director, EPPI-Centre, Social Science Research Unit, Institute of Education, London

I’ve shown two board members of our client company what our FoamTree-powered app does. Amazing what a good visualization can accomplish :-)

René de Vries, Managing Director at HowardsHome