Clustering engine for millions of documents and gigabytes of text
Meaningful insights from large quantities of text
Lingo4G enables interactive exploration of millions of documents and gigabytes of text in near-real-time, fully automatically and without external knowledge bases.
Bird's eye view
Get an overview of the topics covered in thousands of documents, within seconds.
Quickly identify documents of interest and visualize relationships between them.
Combine topics, clusters and 2D document maps into powerful visualizations.
Near-real-time topic discovery
Lingo4G can extract the topics discussed in hundreds of thousands of documents, along with lexical relationships between them, within seconds.
Topical phrases in context
Lingo4G can highlight selected topical phrases in the document text, putting them in context and bringing up the relevant passages.
Lingo4G can arrange hundreds of thousands of documents into non-overlapping clusters and 2D maps to help you plan, execute and refine your research.
Data slicing and filtering
Choose the document subset to analyze by typing a query, picking an area from the document map or selecting a topic or cluster to drill down on.
Lingo4G exposes fine-grained parameters for adjusting the number of topics and clusters, editing stop lists to exclude unwanted topic labels and more.
Combine the Lingo4G JSON-based REST API with visualization components, such as FoamTree or Circles, to build interactive text exploration tools.
Fast, automatic, easy to integrate
Once Lingo4G indexes your collection, it can extract topics, themes and document clusters within seconds.
Topic discovery takes seconds regardless of whether you're processing a hundred or a hundred thousand documents.
No external taxonomies
Lingo4G processes documents based only on their textual content; no external dictionaries or taxonomies are required.
Stop word discovery
Lingo4G automatically identifies meaningless phrases specific to your data, such as “present invention” in patent data.
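To illustrate the idea, here is a minimal Python sketch of one simple heuristic. It is not necessarily how Lingo4G does it: a phrase that occurs in nearly every document of a collection, like “present invention” in patents, carries little topical information and becomes a stop-label candidate. The function names and the `min_doc_ratio` threshold are hypothetical, and the sketch only considers two-word phrases.

```python
from collections import Counter
from typing import Iterable


def bigrams(text: str) -> set[tuple[str, str]]:
    """Return the set of lowercase word bigrams in one document (each counted once)."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def stop_phrase_candidates(docs: Iterable[str], min_doc_ratio: float = 0.8) -> list[str]:
    """Flag bigrams that occur in at least `min_doc_ratio` of all documents.

    Hypothetical heuristic for illustration only: a phrase present in nearly
    every document says little about any particular topic.
    """
    doc_freq = Counter()
    n_docs = 0
    for doc in docs:
        n_docs += 1
        doc_freq.update(bigrams(doc))  # document frequency, not raw term count
    if n_docs == 0:
        return []
    return sorted(
        " ".join(bigram)
        for bigram, count in doc_freq.items()
        if count / n_docs >= min_doc_ratio
    )
```

Applied to three short patent-like snippets that all contain “the present invention”, the sketch flags both “present invention” and “the present”. A practical version would likely also discount ordinary stop words such as “the”.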
Full text search
Should you need good old full-text search over your collection, Lingo4G can do that too.
The Lingo4G Explorer application will let you get started quickly and tune every aspect of topic extraction and clustering.
On modern hardware, Lingo4G can index 200–500 MB of text per minute. Adding or updating documents does not require reindexing.
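A back-of-the-envelope estimate follows from the quoted throughput. Assuming the rate stays constant and using decimal units (1 GB = 1000 MB), the indexing time t for a collection of size S at rate r for an illustrative 10 GB collection is:

```latex
t = \frac{S}{r}, \qquad
\frac{10\,000\ \text{MB}}{500\ \text{MB/min}} = 20\ \text{min}, \qquad
\frac{10\,000\ \text{MB}}{200\ \text{MB/min}} = 50\ \text{min}.
```

Actual times will depend on the hardware and the data.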
Lingo4G exposes a JSON-based REST API you can call from any programming language to get analysis results.
Use the REST API to build more complex applications, such as finding textually similar documents or nearest-neighbor classification.
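The nearest-neighbor idea can be sketched in a few lines of Python. This is a hypothetical, self-contained illustration. Instead of calling the Lingo4G REST API (whose endpoints are not shown here), it scores similarity with a plain term-frequency cosine, then assigns a new document the majority label of its k most similar labeled documents. In an application built on Lingo4G, you would presumably replace the `tf_vector`/`cosine` step with similarity results obtained from the API. All names here are illustrative.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Term-frequency vector of a lowercase, whitespace-tokenized document."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(query: str, labeled: list[tuple[str, str]], k: int = 3) -> str:
    """Return the majority label among the k labeled documents most similar to `query`.

    `labeled` holds (document_text, label) pairs. Ties in similarity keep the
    input order because Python's sort is stable.
    """
    q = tf_vector(query)
    ranked = sorted(labeled, key=lambda pair: cosine(q, tf_vector(pair[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

An odd `k` avoids tied votes in two-class problems. For large collections you would retrieve only the top-k neighbors rather than sorting every labeled document.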
Questions & Answers
What are the typical use cases for Lingo4G?
The natural use case is exploration of large volumes of human-readable text, such as scientific papers and business or legal documents.
Out of the box, Lingo4G gives an instant overview of the topics discussed in the whole collection or in a selected subset of it, helping analysts plan, execute and report on their research.
You can use the Lingo4G REST API to build more complex applications, such as recommending documents with similar content or nearest-neighbor classification.
You can also combine the Lingo4G REST API with visualization components, such as FoamTree, to build interactive text exploration applications.
How large a collection can Lingo4G handle?
Some of our clients have been successfully using Lingo4G with collections of millions of documents spanning over 300 GB of text. If your collection is larger than that, please get in touch for an evaluation license to see if Lingo4G can handle your data.
One important factor to consider is that Lingo4G currently does not offer distributed processing. The maximum practical size of a project is therefore limited by the RAM, disk space and processing power available on a single virtual or physical server.
Which languages does Lingo4G support?
Currently, Lingo4G can only process English text. If you'd like to apply Lingo4G to content written in another language, please contact us.
What are the hardware requirements?
Lingo4G can run on any platform supporting Java 1.8 or later. While processing cannot currently be distributed across multiple machines, a high-end workstation with fast SSD storage should be capable of handling collections of several tens of gigabytes. For most data sets not exceeding a few gigabytes, any computer with 4 GB of memory and some free disk space will be sufficient. We strongly recommend storing Lingo4G indices on SSD drives. Please see the Requirements section of the Lingo4G manual for more details.
How is Lingo4G licensed?
We require one Lingo4G license per physical or virtual server that runs Lingo4G binaries, regardless of the number of cores on the server, the number of users or the number of collections the server handles.
For large-scale or non-typical deployment scenarios, such as OEM distribution, please get in touch.
There are no restrictions on the number of Lingo4G instances running on one physical or virtual server. The only limit is the server's capacity, including RAM size, disk space and the number of CPUs.
Can I try Lingo4G before purchasing a license?
Absolutely! Please get in touch for a free evaluation package.
Is Lingo4G an upgrade to Lingo3G?
No. Lingo3G and Lingo4G are two separate products that we intend to offer and maintain independently. Lingo3G will remain an engine for real-time clustering of small and medium collections, while Lingo4G addresses clustering of large data sets. Lingo4G is therefore not an upgrade to Lingo3G, but a complementary offering.
Having said that, if you would like to switch from Lingo3G to Lingo4G, we offer a license trade-in option and count the initial Lingo3G license purchase fee towards the Lingo4G license fee.
Can I use the dotAtlas visualization component in my own software?
The dotAtlas map visualization component shipping with Lingo4G Explorer is currently pre-release software. It has been battle-tested for months by early adopters, but its API and documentation are not yet finalized.
If you'd like to try integrating dotAtlas into your software, please let us know. We'll be happy to share the pre-release version along with code examples and initial guidance.
We will not charge any extra fees for the pre-release versions of dotAtlas. Once it enters the official product suite, the use of dotAtlas will require a license fee similar to the one that applies for Carrot Search FoamTree.