Real-time clustering of thousands of docs and megabytes of text

Lingo3G Clustering Workbench showing web search results clustering.

Lingo3G Workbench is a desktop application you can use to quickly try, tune and experiment with Lingo3G clustering. Out of the box, Workbench will let you apply Lingo3G clustering to web search results, PubMed and Solr search results, contents of a Lucene index and custom data in XML format.
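This page does not spell out the custom XML format. The sketch below assumes a Carrot2-style document format: a `searchresult` root holding a `query` element and `document` elements, each with `title`, `snippet` and `url` children. The element names are an assumption, so check the Lingo3G Manual for the exact schema your version expects. The snippet converts a list of records into such a file with Python's standard library.

```python
import xml.etree.ElementTree as ET

def to_carrot_xml(query, records):
    """Build an XML string in an assumed Carrot2-style schema
    from (title, snippet, url) tuples."""
    root = ET.Element("searchresult")
    ET.SubElement(root, "query").text = query
    for i, (title, snippet, url) in enumerate(records):
        # Each document gets a sequential id attribute.
        doc = ET.SubElement(root, "document", id=str(i))
        ET.SubElement(doc, "title").text = title
        ET.SubElement(doc, "snippet").text = snippet
        ET.SubElement(doc, "url").text = url
    return ET.tostring(root, encoding="unicode")

records = [
    ("Data mining overview", "Data mining is the process of ...", "https://example.com/dm"),
    ("Cluster analysis", "Grouping objects so that ...", "https://example.com/ca"),
]
xml_text = to_carrot_xml("data mining", records)
print(xml_text)
```

The resulting file could then be loaded into Workbench as custom XML data, provided the assumed schema matches.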

Workbench can visualize the clusters using Carrot Search FoamTree and Circles visualizations along with the legacy Aduna cluster map view. You can use Workbench to experiment with Lingo3G parameters and observe the impact on the results in real time. There is also an option to run a simple performance benchmark with the current parameter values.

This screenshot shows search results for the query "data mining" clustered into a two-level hierarchy. The clusters are also visualized in the Carrot Search Circles and Aduna cluster map views.

Instant analysis of small-to-medium quantities of text

Lingo3G organizes collections of text documents into clearly labeled hierarchical folders. It does so in real time, fully automatically and without external knowledge bases.

Instant overview

Get a concise summary of the subjects discussed in a set of documents.

More efficient browsing

Navigate straight to the documents you need using clearly labeled folders.

Query refinement

Refine the initial query and "drill down" on a specific subject based on cluster labels.


Painless integration into any environment using the Java API, C# API, REST API, CLI tools, or the Solr and Elasticsearch plugins.

Accurate, blazing-fast, stateless

Useful hierarchical clusters

Lingo3G aims to produce clusters with concise, varied, relevant and human-readable labels.


No external taxonomies or knowledge bases are needed: Lingo3G categorizes documents based only on their text.

High performance

On a desktop machine, Lingo3G clusters 100 search results in about 5 ms. Clustering 10,000 abstracts takes about 1 s.
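The two timings quoted above translate into rough throughput figures. This is back-of-the-envelope arithmetic only; real numbers depend on hardware, document length and parameter settings.

```latex
\frac{100\ \text{docs}}{5\ \text{ms}} = \frac{100\ \text{docs}}{0.005\ \text{s}} = 20{,}000\ \text{docs/s},
\qquad
\frac{10{,}000\ \text{docs}}{1\ \text{s}} = 10{,}000\ \text{docs/s}
```

The larger job runs at roughly half the per-document rate, but abstracts are typically longer than search-result snippets, so the two figures are not directly comparable.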


You can add synonym definitions, such as photos = pictures = pics = photographs, to improve clustering quality.
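The synonym syntax above reads as a group of equivalent words. A minimal sketch of how such a group could be applied, assuming the first word of each group acts as the canonical form (an illustrative choice, not necessarily what Lingo3G does internally): each token is mapped to its group's canonical word before clustering, so documents that say "pics" and documents that say "photographs" share a feature.

```python
def parse_synonym_groups(lines):
    """Map every word in an 'a = b = c' line to the group's first word
    (assumed to be the canonical form)."""
    canonical = {}
    for line in lines:
        words = [w.strip().lower() for w in line.split("=") if w.strip()]
        for w in words:
            canonical[w] = words[0]
    return canonical

def normalize(text, canonical):
    """Lower-case, split on whitespace and replace each token
    with its canonical synonym (unknown tokens pass through)."""
    return [canonical.get(tok, tok) for tok in text.lower().split()]

synonyms = parse_synonym_groups(["photos = pictures = pics = photographs"])
print(normalize("Holiday pics and photographs", synonyms))
# ['holiday', 'photos', 'and', 'photos']
```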

Label filtering

You can boost or suppress specific cluster labels to highlight product names or remove abusive language.
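This page does not describe Lingo3G's actual label-filtering configuration. The sketch below only illustrates the idea under stated assumptions: each candidate label carries a score, labels matching a boost pattern have their score multiplied by a hypothetical `boost_factor`, and labels matching a suppress pattern are dropped. The function name, parameters and sample labels are all hypothetical.

```python
import re

def filter_labels(scored_labels, boost=(), suppress=(), boost_factor=2.0):
    """Re-rank (label, score) pairs: drop labels matching any `suppress`
    regex and multiply the score of labels matching any `boost` regex."""
    boost_res = [re.compile(p, re.IGNORECASE) for p in boost]
    suppress_res = [re.compile(p, re.IGNORECASE) for p in suppress]
    result = []
    for label, score in scored_labels:
        if any(r.search(label) for r in suppress_res):
            continue  # suppressed: never shown as a cluster label
        if any(r.search(label) for r in boost_res):
            score *= boost_factor  # boosted: e.g. a product name
        result.append((label, score))
    # Highest-scoring labels first.
    return sorted(result, key=lambda pair: pair[1], reverse=True)

candidates = [("Lingo3G Workbench", 0.4), ("Spam Offers", 0.7), ("Data Mining", 0.6)]
out = filter_labels(candidates, boost=[r"\blingo3g\b"], suppress=[r"\bspam\b"])
print(out)
```

Regular expressions keep the boost and suppress lists compact. A plain word list would work just as well for exact-match filtering.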

Tuning application

Tune clustering characteristics and performance in a dedicated GUI application called Lingo3G Workbench.

Non-textual attributes

Numeric or enumerated document fields, such as price or tags, can optionally be used to guide clustering.
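This page does not detail how Lingo3G consumes such fields. One common preprocessing step, shown here as a hypothetical sketch, is to discretize a numeric field such as price into a few enumerated ranges so that it can be treated like a tag. The bucket boundaries and the `price_range` field name are arbitrary illustration values.

```python
from bisect import bisect_right

# Hypothetical price-range boundaries; choose them to fit your own data.
PRICE_EDGES = [20, 100, 500]
PRICE_LABELS = ["<20", "20 to <100", "100 to <500", "500+"]

def price_bucket(price):
    """Return the enumerated range label for a numeric price.
    bisect_right counts how many edges are <= price."""
    return PRICE_LABELS[bisect_right(PRICE_EDGES, price)]

docs = [{"title": "USB cable", "price": 9.5},
        {"title": "Headphones", "price": 149.0},
        {"title": "Laptop", "price": 1299.0}]
for d in docs:
    d["price_range"] = price_bucket(d["price"])  # hypothetical enumerated field
```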

19 languages supported

Supported languages include English, German, French, Chinese (Simplified), Thai and Arabic, all with automatic language detection.

100% stateless

Lingo3G is a stateless system: data-in, clusters-out. This makes horizontal scaling a breeze.
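Each request carries all the data it needs, so any replica can serve any request and no session affinity is required. A minimal sketch of that property follows. The hypothetical toy `cluster` function stands in for Lingo3G. It only groups documents by the first matching keyword from a fixed list, which is not Lingo3G's algorithm. A round-robin dispatcher spreads identical requests across identical replicas.

```python
from itertools import cycle

def cluster(docs, keywords=("java", "solr", "search")):
    """Toy stand-in for a stateless clusterer: the output depends only on
    the input. Groups document indices by the first listed keyword each
    document contains, or "other" if none match."""
    groups = {}
    for i, text in enumerate(docs):
        words = set(text.lower().split())
        key = next((k for k in keywords if k in words), "other")
        groups.setdefault(key, []).append(i)
    return groups

class RoundRobin:
    """Send each request to the next replica in turn."""
    def __init__(self, replicas):
        self._replicas = cycle(replicas)

    def submit(self, docs):
        return next(self._replicas)(docs)

# In a real deployment these would be separate servers.
lb = RoundRobin([cluster, cluster, cluster])
request = ["Java clustering API", "Solr plugin setup",
           "Enterprise search tips", "Misc notes"]
results = [lb.submit(request) for _ in range(3)]
```

Every replica returns the same clusters for the same input, so adding servers increases capacity without any coordination between them.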

Pure Java library

Lingo3G works on any system supporting Java 1.8 or higher and has no platform-specific dependencies.

Native C# API

No Java runtime is needed to integrate and call Lingo3G through its C# API.

Open source foundation

Lingo3G is based on the Carrot2 framework. If you've used Carrot2, switching to Lingo3G will be a breeze.

Dozens of happy customers around the globe

Questions & Answers

What are the applications of Lingo3G?

Lingo3G is stateless and processes all data in-memory. This makes it particularly suitable for clustering data coming from highly-dynamic collections, such as search results or social conversations.

That said, Lingo3G is appropriate for processing any collection of texts whose total size does not exceed a few tens of megabytes.

What is the largest collection Lingo3G can handle?

Lingo3G was designed to perform real-time, in-memory clustering of small and medium collections of documents, which corresponds to roughly 5,000 documents of a few kilobytes each.
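As a rough consistency check against the "few tens of megabytes" guideline in the previous answer, assume an average document size of 4 KB. This is an illustrative value within "a few kilobytes", not a measured one.

```latex
5{,}000\ \text{docs} \times 4\ \text{KB} = 20{,}000\ \text{KB} \approx 20\ \text{MB}
```

The estimate sits within the few-tens-of-megabytes range quoted above.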

The upper limit very much depends on the characteristics of your documents. Some of our customers report that they successfully use Lingo3G with as many as 100,000 documents. Please contact us for an evaluation license and performance tuning advice.

For collections spanning millions of documents and gigabytes of text, consider Lingo4G.

Does Lingo3G come with end-user applications?

No. Lingo3G is a software component intended for use and embedding in other applications. Some programming experience is required to apply Lingo3G to custom data.

Lingo3G does come with a clustering tuning GUI called Clustering Workbench, which may serve end-user needs to a limited extent. However, our development efforts concentrate primarily on the core clustering algorithms; developing and supporting user-facing applications has lower priority.

How is Lingo3G licensed?

We require one Lingo3G license per physical or virtual server that runs Lingo3G binaries, regardless of the number of cores on the server or the number of users or requests it handles.

For large-scale or non-typical deployment scenarios, such as OEM distribution, please get in touch.

What is the cost of a Lingo3G license?

The cost of a license depends on the edition; please contact us for a quote.

Is there a free version?

Yes, you can use the algorithms available in the Carrot2 open source framework. Please see the comparison for more details.

Can I get a trial license?

Absolutely! Please get in touch for a free evaluation package.

Where can I learn more about Lingo3G?

The best place to start is the Lingo3G Manual. For an in-depth introduction to search results clustering algorithms and engines, please see:

A survey of Web clustering engines. ACM Computing Surveys (CSUR), Volume 41, Issue 3 (July 2009), Article No. 17, ISSN: 0360-0300.

The paper reports on the evaluation of a number of search results clustering engines, including Lingo3G.
