Lingo3G: text document clustering engine

Lingo3G is a software component that organizes collections of text documents into clearly labeled, hierarchical thematic folders called clusters. It works in real time, fully automatically and without external knowledge bases.

Benefits

Lingo3G can help users of your applications overcome information overload by providing:

  • Instant overview of the document set: Lingo3G concisely summarises the subjects dealt with in the documents.
  • Fast access to relevant documents: with clearly labelled folders, users can navigate straight to the documents they need and easily skip the irrelevant ones.
  • More efficient browsing: with the folder view, users can browse an order of magnitude more documents than with a traditional ranked-list search.
  • Query refinement: folder labels generated by Lingo3G can help your users refine their initial query and, for example, "drill down" on a specific subject.

Lingo3G powers innovative applications built by our customers in areas such as search results clustering, text mining and e-discovery, visual search and search engine optimization.

Features

  • Meaningful hierarchical clusters with useful, concise and varied labels.
  • No external taxonomies or knowledge bases needed: Lingo3G categorizes documents based only on their text.
  • High-performance, real-time clustering: on a desktop machine, Lingo3G clusters 100 search results in less than 10 ms.
  • Scalability: Lingo3G can cluster thousands of abstracts or full-text documents. Processing 10,000 abstracts takes 530 ms.
  • Easy integration: Lingo3G can be called through its native Java and C# APIs. Any other technology can call Lingo3G as a REST service. Lingo3G integrates with Apache Solr and Nutch.
  • Tuning of clustering characteristics and performance in a dedicated GUI application called Lingo3G Workbench.
  • 100% pure Java library with no platform-specific dependencies. Lingo3G works on any platform supporting Java 1.6 or higher, including Windows, Linux and Mac.
  • The Lingo3G C# API requires .NET Framework 3.5 or later; a Java runtime is not needed for the C# API.
  • Synonyms such as photos = pictures = pics = photographs can be defined to further increase the quality of clustering.
  • Label filtering can boost or suppress specified content, for example to highlight product names or remove abusive language.
  • 19 languages supported including English, German, French, Italian, Polish, Russian, Spanish, Chinese Simplified, Korean and Arabic. Lingo3G can automatically detect the language of the documents.
  • Based on the open source architecture of the Carrot2 framework: Lingo3G can use Carrot2 components for fetching data from sources such as Yahoo!, MSN Live, Google, Lucene, Solr or a Google Desktop index. If you have integrated your code with Carrot2, switching to Lingo3G will be a breeze.
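
To illustrate the synonyms feature listed above, here is a minimal Java sketch of the general idea: variants such as photos, pictures, pics and photographs are folded into one canonical term before terms are counted, so they support a single label instead of four weaker ones. This is not the Lingo3G API; the class SynonymSketch, its methods and the sample titles are hypothetical and only demonstrate the concept.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is NOT how Lingo3G is configured or implemented.
public class SynonymSketch {
    // Assumed synonym table: every variant maps to one canonical term.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        for (String variant : Arrays.asList("photos", "pictures", "pics", "photographs")) {
            CANONICAL.put(variant, "photos");
        }
    }

    // Returns the canonical form of a word, or the lower-cased word if no synonym is defined.
    public static String canonical(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        return CANONICAL.getOrDefault(lower, lower);
    }

    // Counts term occurrences across titles after synonym normalization.
    public static Map<String, Integer> countTerms(List<String> titles) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String title : titles) {
            for (String token : title.split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(canonical(token), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "holiday pics", "Wedding photographs", "pictures of cats");
        // All three variants are counted under the single term "photos".
        System.out.println(countTerms(titles));
    }
}
```

Without the synonym table, the three sample titles would share no term at all; with it, each contributes to the same candidate label.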

APIs & Tools

  • Lingo3G Document Clustering Workbench
  • Lingo3G Document Clustering Server
  • Lingo3G Command Line Interface

When we analyzed the market for text clustering software, we chose Lingo3G because it exceeded our expectations in several ways: easy to integrate, many configuration options and extremely fast and lightweight.

Stephan Schmid, CEO at Comcepta, Switzerland

Our evaluation found overwhelming support for using Lingo3G, enabling users to make connections that they had not been able to predict in advance, "broadening understanding", and so leading them to important new places.

Dr James Thomas, Associate Director, EPPI-Centre, Social Science Research Unit, Institute of Education, London

Carrot Search team is supporting us in an excellent way and, as a leading consultancy company, we consider Carrot Search a perfect partner to provide cutting-edge solutions to our clients.

Giorgio Pezzuto, Unit Manager at D'Appolonia, Italy