Lingo3G or Carrot2?

Use Lingo3G if you need high performance, extended tuning or hierarchical clusters.

If you require source code or prefer free-of-charge solutions, try Carrot2.

The open source clustering algorithms available in the Carrot2 framework, STC and Lingo, are an alternative to Lingo3G. Here is how they compare.

(A similar comparison of Lingo4G and Lingo3G is also available.)
Feature comparison: Lingo3G vs. STC vs. Lingo
License
Lingo3G: Commercial.
STC and Lingo: BSD, available in Carrot2.

Hierarchical clustering
Lingo3G: Supported. Lingo3G outputs multi-level clusters.
STC: Not supported. STC outputs flat cluster lists.
Lingo: Not supported. Lingo outputs flat cluster lists.

Cluster label quality
Lingo3G: Very high. Compact and meaningful labels, with noise kept to a minimum.
STC: Fair. Meaningful, but often one-word labels.
Lingo: High. Meaningful labels, though some noise words are visible.

Customizable stop words
Lingo3G: Supported.
STC: Supported.
Lingo: Supported.

Label filtering
Suppressing specific words or phrases in the output cluster labels.
Lingo3G: Supported. Very fast hash-based filtering and flexible regexp-based filtering.
STC: Supported. Regexp-based filtering.
Lingo: Supported. Regexp-based filtering.

Label boosting
Promoting specific words or phrases in the output cluster labels.
Lingo3G: Supported. Very fast hash-based boosting and flexible regexp-based boosting.
STC: Not supported.
Lingo: Not supported.

Synonyms
Defining groups of words or phrases to be treated as synonymous.
Lingo3G: Supported. Very fast hash-based synonym marking.
STC: Not supported.
Lingo: Not supported.

Structured data
Guiding text clustering by numeric or nominal data.
Lingo3G: Supported.
STC: Not supported.
Lingo: Not supported.

Document-to-cluster misassignment
Proportion of documents in a cluster that are irrelevant to the cluster label.
Lingo3G: Low, especially with the precise document assignment option enabled.
STC: Medium, some irrelevant documents may slip in.
Lingo: Medium, some irrelevant documents may slip in.

Results tuning
Lingo3G: Advanced, with many parameters for tuning different aspects of clusters and labels.
STC: Basic set of parameters.
Lingo: Basic set of parameters.

Further development
Lingo3G: New features and performance improvements.
STC: Bug fixes only.
Lingo: Bug fixes only.
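
The label filtering, label boosting, synonym and stop word rows in the comparison above describe lexical resources applied to candidate cluster labels. As a rough illustration of why hash-based matching is cheap while regexp-based matching is more flexible, here is a minimal Java sketch. Everything in it is hypothetical: the LabelLexicon class and its methods, the boost factor, single-word synonyms, and the rule that a label may not start or end with a stop word are assumptions made for illustration. They are not the Lingo3G or Carrot2 API, nor their actual algorithms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only: not the Lingo3G or Carrot2 API.
final class LabelLexicon {
    private final Set<String> stopWords = new HashSet<>();
    private final Map<String, String> synonyms = new HashMap<>();      // single word -> canonical word
    private final Set<String> excludedLabels = new HashSet<>();        // hash-based (exact match) filter
    private final List<Pattern> excludedPatterns = new ArrayList<>();  // regexp-based filter
    private final Set<String> boostedLabels = new HashSet<>();         // hash-based (exact match) boost
    private final double boostFactor;

    LabelLexicon(double boostFactor) {
        this.boostFactor = boostFactor;
    }

    LabelLexicon stopWords(String... words) {
        for (String w : words) stopWords.add(w.toLowerCase(Locale.ROOT));
        return this;
    }

    // Configure synonyms first: exclude() and boost() store labels in normalized form.
    LabelLexicon synonyms(String canonical, String... variants) {
        for (String v : variants) {
            synonyms.put(v.toLowerCase(Locale.ROOT), canonical.toLowerCase(Locale.ROOT));
        }
        return this;
    }

    LabelLexicon exclude(String label) {
        excludedLabels.add(normalize(label));
        return this;
    }

    LabelLexicon excludeRegex(String regex) {
        excludedPatterns.add(Pattern.compile(regex));
        return this;
    }

    LabelLexicon boost(String label) {
        boostedLabels.add(normalize(label));
        return this;
    }

    // Lower-cases the label and replaces every word with its canonical synonym, if any.
    String normalize(String label) {
        String[] words = label.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            words[i] = synonyms.getOrDefault(words[i], words[i]);
        }
        return String.join(" ", words);
    }

    // Returns the adjusted score, or an empty result if the label is rejected.
    OptionalDouble score(String label, double baseScore) {
        String norm = normalize(label);
        String[] words = norm.split(" ");
        // Assumed rule: a label may not start or end with a stop word.
        if (stopWords.contains(words[0]) || stopWords.contains(words[words.length - 1])) {
            return OptionalDouble.empty();
        }
        // Hash-based filtering: a single set lookup, however many entries the set holds.
        if (excludedLabels.contains(norm)) {
            return OptionalDouble.empty();
        }
        // Regexp-based filtering: every pattern is tried, so cost grows with the pattern count.
        for (Pattern p : excludedPatterns) {
            if (p.matcher(norm).find()) {
                return OptionalDouble.empty();
            }
        }
        // Hash-based boosting: exact matches get their score multiplied.
        return OptionalDouble.of(boostedLabels.contains(norm) ? baseScore * boostFactor : baseScore);
    }

    public static void main(String[] args) {
        LabelLexicon lexicon = new LabelLexicon(2.0)
                .stopWords("the", "of", "and")
                .synonyms("car", "automobile", "auto")
                .exclude("click here")
                .excludeRegex("^free\\b")
                .boost("car insurance");
        for (String label : List.of("Automobile Insurance", "Click Here", "Free Downloads",
                "History of", "Search Engines")) {
            System.out.println(label + " -> " + lexicon.score(label, 1.0));
        }
    }
}
```

In the usage example in main, "Automobile Insurance" is normalized to "car insurance" through the synonym map and then boosted. "Click Here" is rejected by the exact-match filter and "Free Downloads" by the regular expression. "History of" is rejected because it ends with a stop word, and "Search Engines" keeps its base score. The sketch is meant to show the trade-off: an exact-match entry costs one hash lookup regardless of dictionary size, while each regular expression must be evaluated against every candidate label. Regexp dictionaries therefore slow down as they grow, but a single entry can cover a whole family of labels.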

Performance

Time taken to cluster search result snippets*.

Number of snippets   Lingo3G      STC        Lingo
100                  0.005 s**    0.003 s    0.054 s
500                  0.018 s      0.015 s    0.490 s
1,000                0.029 s      0.026 s    0.847 s
10,000               0.154 s      0.218 s    121.5 s***
100,000              1.561 s      3.802 s    ***

*) Clustering speed was measured on Open Directory Project site descriptions from the Top/Computers category. Benchmark environment: Intel Core i7-2600K 3.4 GHz, 12 GB RAM, Windows 7. Java Virtual Machine: Sun JDK 1.7.0_04 64-bit, JVM switches: -server -Xmx1024m -Xms1024m. Each time in the table is an average of 100 runs. For each algorithm, the timed runs were preceded by 100 untimed warm-up runs.

**) Lingo3G times refer to hierarchical clustering, while the Carrot2 times refer to flat clustering, because the Carrot2 algorithms do not support hierarchical clustering.

***) The open source Lingo algorithm is not scalable enough to reliably cluster very large numbers of documents.
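
Footnote * describes a common JVM benchmarking scheme. Untimed warm-up runs come first, so that class loading and JIT compilation do not distort the results. The durations of the timed runs that follow are then averaged. Below is a minimal Java sketch of that scheme. The class WarmupBenchmark, the method averageMillis and the array-sorting workload are hypothetical stand-ins, not the harness actually used to produce the table above.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical sketch of the footnoted methodology: untimed warm-up runs, then an average over timed runs.
final class WarmupBenchmark {
    // Volatile sink so the JIT cannot eliminate the measured work as dead code.
    static volatile Object sink;

    static double averageMillis(Supplier<?> task, int warmupRuns, int timedRuns) {
        if (warmupRuns < 0 || timedRuns <= 0) {
            throw new IllegalArgumentException("need warmupRuns >= 0 and timedRuns > 0");
        }
        // Warm-up: give the JVM a chance to load classes and JIT-compile hot code paths.
        for (int i = 0; i < warmupRuns; i++) {
            sink = task.get();
        }
        // Timed runs: System.nanoTime() is intended for measuring elapsed time.
        long totalNanos = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            sink = task.get();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) timedRuns / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for a single clustering call.
        int[] data = new Random(42).ints(100_000).toArray();
        double ms = averageMillis(() -> {
            int[] copy = data.clone();
            Arrays.sort(copy);
            return copy;
        }, 100, 100);
        System.out.printf("average over 100 timed runs: %.3f ms%n", ms);
    }
}
```

For more rigorous measurements, a dedicated harness such as JMH (Java Microbenchmark Harness) also handles JVM forking, dead-code elimination and statistical reporting.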

Questions and answers

I integrated my code with Carrot2. How difficult will it be to switch to Lingo3G?

Lingo3G plugs into the API framework defined by Carrot2, so switching to Lingo3G will be a matter of adding a few JARs to your project and changing the source code to use the Lingo3G algorithm identifier.

When it comes to tuning the clustering, each algorithm has its own set of parameters, so after switching to Lingo3G you may need to adjust parameter values and lexical resources.

Can I use Carrot2 algorithms in a commercial project?

Absolutely! Carrot2 is distributed under the Apache 2.0 license, which is very commercially friendly.

Still in doubt? Get a Lingo3G trial.

Use Lingo3G for 2 months free of charge and compare it with Carrot2 on your own data.
