Lingo3G: text document clustering engine

Lingo3G is a software component that organizes collections of text documents into clearly labeled, hierarchical thematic folders called clusters. It does this in real time, fully automatically and without external knowledge bases.

Lingo3G helps users of your applications overcome information overload with:

  • Instant overview of the document set. Lingo3G concisely summarizes the subjects dealt with in the documents.
  • Fast access to relevant documents. With clearly labeled folders, users can navigate straight to the documents they need and easily skip the irrelevant ones.
  • More efficient browsing. With the folder view, users can browse an order of magnitude more documents than with a traditional ranked list of search results.
  • Query refinement. Folder labels generated by Lingo3G can help your users refine their initial query, for example to "drill down" on a specific subject.

Lingo3G powers innovative applications built by our customers in areas such as search results clustering, text mining and e-discovery, visual search, and search engine optimization.

  • Meaningful hierarchical clusters with useful, concise and varied labels.
  • No external taxonomies or knowledge bases needed. Lingo3G categorizes documents based only on their text.
  • High-performance, real-time clustering. On a desktop machine, Lingo3G clusters 100 search results in about 5 ms.
  • Scalability. Lingo3G can cluster thousands of abstracts or full-text documents. Clustering 10,000 abstracts takes 1 s.
  • Easy integration. Lingo3G can be called through its native Java and C# APIs. Any other technology can call Lingo3G as a REST service. Lingo3G integrates with Apache Solr and Elasticsearch.
  • Tuning of clustering characteristics and performance in a dedicated GUI application called Lingo3G Workbench.
  • 100% pure Java library with no platform-specific dependencies. Lingo3G works on any platform supporting Java 1.7 or higher, including Windows, Linux and Mac.
  • The Lingo3G C# API requires .NET Framework 3.5 or later and does not need a Java runtime.
  • Synonyms such as photos = pictures = pics = photographs can be defined to further increase the quality of clustering.
  • Label filtering can boost or suppress specified content, for example to highlight product names or remove abusive language.
  • Non-textual document attributes such as numbers can be used to guide clustering.
  • 19 languages supported, including English, German, French, Italian, Spanish, Russian, Chinese Simplified, Thai and Arabic. Lingo3G can automatically detect the language of the documents.
  • Based on the open source architecture of the Carrot2 framework. Lingo3G can use Carrot2 components for fetching data from sources such as Bing, Google, Lucene, Solr or a Google Desktop index. If you have integrated your code with Carrot2, switching to Lingo3G will be a breeze.
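
To make the synonym feature above more concrete, here is a minimal Java sketch of the general idea: variants such as photos, pictures, pics and photographs are folded into one canonical term before terms are counted, so documents using different wording reinforce the same topic. This is not the Lingo3G API. The class name SynonymSketch, the methods canonical and termCounts, and the synonym table are hypothetical names made up for this illustration, and the sketch assumes Java 9 or later for Map.of and List.of.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a hypothetical helper, not part of Lingo3G.
class SynonymSketch {
    // Hypothetical synonym table: each variant maps to one canonical term.
    static final Map<String, String> SYNONYMS = Map.of(
            "pictures", "photos",
            "pics", "photos",
            "photographs", "photos");

    // Lower-case the token and replace it with its canonical form, if any.
    static String canonical(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return SYNONYMS.getOrDefault(t, t);
    }

    // Count canonical terms across all documents, sorted alphabetically.
    static Map<String, Integer> termCounts(List<String> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            for (String token : doc.split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                counts.merge(canonical(token), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Holiday pics from Rome",
                "Rome photographs",
                "Best photos of Rome");
        // All three documents now contribute to the single term "photos".
        System.out.println(termCounts(docs));
    }
}
```

Without the synonym table, "pics", "photographs" and "photos" would each be counted once and none would stand out. With it, the shared topic becomes as frequent as "rome", which is the kind of signal a clustering engine can use when choosing labels.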


Easy to integrate, many tuning options, very fast and lightweight.

Stephan Schmid, CEO at Comcepta, Switzerland

Our evaluation found overwhelming support for using Lingo3G.

Dr James Thomas, Associate Director, EPPI-Centre, Social Science Research Unit, Institute of Education, London

I’ve shown two board members of our client company what our FoamTree-powered app does. Amazing what a good visualization can accomplish :-)

René de Vries, Managing Director at HowardsHome