Clustering engine for millions of documents and gigabytes of text


Lingo4G Explorer is an application you can use to get started and experiment with Lingo4G large-scale text clustering.

This screenshot shows topics discovered for StackExchange Super User questions matching the query "office". The central panel presents all topics as a treemap. The panel on the right shows the top questions matching the "Excel Macro" topic. The panel on the left lets you change topic discovery parameters and re-run the processing.

Meaningful insights from large quantities of text

Lingo4G identifies clearly labelled topics in millions of documents and gigabytes of text. It does so in near-real-time, fully automatically, and without external knowledge bases.


This screenshot shows StackExchange Super User questions matching the query "office", divided into non-overlapping clusters. A cluster related to printing is highlighted, and the top questions from that cluster are displayed in the right-hand panel.

Explore topics and clusters

Get a bird's-eye view of the topics discussed in your documents. Use the topics and clusters to plan, execute and refine your research.

Build custom apps

Use the REST API to build more complex apps, such as finding content-wise similar documents or nearest-neighbor classification.

Fast, automatic, easy to integrate

Near-real-time processing

Once Lingo4G indexes your collection, it can extract topics, themes and document clusters within seconds.


Topic discovery takes seconds whether you're processing a hundred documents, hundreds of thousands of documents, or the whole indexed collection.

No external taxonomies

Lingo4G processes documents based only on their textual content. No external dictionaries, taxonomies or databases are required.

Stop word discovery

Lingo4G will automatically identify the meaningless words and phrases specific to your data, such as "present invention" in patent data.
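
To illustrate the idea of data-specific stop phrases, here is a minimal Python sketch. It does not show how Lingo4G works internally, which this page does not describe. It assumes a simple document-frequency heuristic: a two-word phrase that appears in nearly every document carries little information about any one of them. The function name `frequent_phrases`, the `min_doc_ratio` threshold and the sample texts are all made up for this example.

```python
import re
from collections import Counter


def frequent_phrases(docs, min_doc_ratio=0.8):
    """Return two-word phrases that occur in at least min_doc_ratio of the documents."""
    doc_freq = Counter()
    for doc in docs:
        words = re.findall(r"[a-z]+", doc.lower())
        # Count each phrase once per document (document frequency, not term frequency).
        doc_freq.update(set(zip(words, words[1:])))
    threshold = min_doc_ratio * len(docs)
    return sorted(" ".join(pair) for pair, n in doc_freq.items() if n >= threshold)


# Hypothetical patent-like snippets, invented for illustration.
patents = [
    "The present invention relates to a battery charger.",
    "The present invention provides a method for data compression.",
    "In the present invention a valve controls fluid flow.",
]
print(frequent_phrases(patents))
```

On these three snippets, the only phrases that pass the threshold are the ones shared by every document, which include "present invention".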

Easy integration

Lingo4G exposes a JSON REST API you can call from any programming language.


The Lingo4G Explorer application will let you get started and tune clustering quickly.

Questions & Answers

What are the applications of Lingo4G?

The natural use case is exploration of large volumes of fairly static human-readable text, such as scientific papers, business or legal documents.

Out of the box, Lingo4G can give an instant overview of the topics discussed in the whole collection, or in a requested subset of it, and so help analysts plan, execute and report on their research.

You can also use the Lingo4G REST API to build more complex applications, such as recommendation of content-wise similar documents or nearest-neighbor classification.
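
For readers unfamiliar with nearest-neighbor classification, the sketch below shows the general technique in plain Python. It does not call the Lingo4G REST API, whose endpoints are not described on this page. It assumes bag-of-words vectors and cosine similarity as the document similarity measure, and the helper names (`vectorize`, `cosine`, `classify`) and sample documents are hypothetical. In a real application the similarity search would be delegated to the engine.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, labeled_docs, k=3):
    """Label text by majority vote among its k most similar labeled documents."""
    query = vectorize(text)
    ranked = sorted(
        labeled_docs,
        key=lambda doc: cosine(query, vectorize(doc[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples, invented for illustration.
labeled_docs = [
    ("excel macro stops running after update", "spreadsheets"),
    ("excel formula returns wrong value", "spreadsheets"),
    ("printer driver not found on network", "printing"),
    ("printer prints blank pages", "printing"),
]
print(classify("how do I record a macro in excel", labeled_docs, k=3))
```

The new document receives the label most common among its nearest neighbors, so classification quality depends directly on how well the similarity measure reflects content.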

What is the largest collection Lingo4G can handle?

The early adopters of Lingo4G have been successfully using it with collections of millions of documents spanning over 100 GB of text. If your collection is larger than that, please do get in touch for an evaluation license to see if Lingo4G can handle your data.

One important factor to consider is that currently Lingo4G does not offer distributed processing. This means that the maximum reasonable size of the project will be limited by the amount of RAM, disk space and processing power available on a single virtual or physical server.

Which languages does Lingo4G support?

Currently, Lingo4G can only process English text. If you'd like to apply Lingo4G to content written in a different language, please contact us.

What are the system requirements for Lingo4G?

Lingo4G can run on any platform supporting Java 1.8 or later. While processing cannot currently be distributed to multiple machines, a high-end workstation with fast SSD storage should be capable of handling collections of several tens of gigabytes. For most data sets not exceeding gigabytes, any computer with 4 GB of memory and some disk space will be sufficient. We strongly recommend using SSD drives to store Lingo4G indices. Please see the Requirements section of the Lingo4G manual for more details.

How is Lingo4G licensed?

We require one Lingo4G license per physical or virtual server that runs Lingo4G binaries, regardless of the number of cores on the server, the number of users and the number of collections handled by the server.

For large-scale or non-typical deployment scenarios, such as OEM distribution, please get in touch.

How many collections can I process on one server?

There are no restrictions on the number of Lingo4G instances running on one physical or virtual server. The only limit may be the capacity of the server, including RAM size, disk space and the number of CPUs.

What is the cost of a Lingo4G license?

The cost of a license depends on the edition. Please contact us for a quote.

Can I get a trial license?

Absolutely! Please get in touch for a free evaluation package.

I have a Lingo3G license, will I receive Lingo4G as an upgrade?

No. Lingo3G and Lingo4G are two separate products we intend to offer and maintain independently. Lingo3G will remain an engine for real-time clustering of small and medium collections, while Lingo4G will address clustering of large data sets. Therefore, Lingo4G is not an upgrade to Lingo3G, but a complementary offering.

Having said that, if you would like to switch from Lingo3G to Lingo4G, we offer a license trade-in option and count the initial Lingo3G license purchase fee towards the Lingo4G license fee.

Next steps