Carrot Search: Lingo3G: FAQ

What is the cost of a Lingo3G license?

Lingo3G comes in three editions. Please contact us for pricing.

Can I get an evaluation license for Lingo3G?

Absolutely! You can get a free trial license.

Can Lingo3G be integrated with C#?

Yes, you can use the Lingo3G C# API to call clustering from your C# / .NET code. A Java runtime is not required for the C# API.

What are the Lingo3G licensing terms?

Please see the Lingo3G editions summary for information on licensing, updates and support terms.

What are the system requirements for Lingo3G?

Lingo3G can run on any platform that supports Java 1.7 or later. Lingo3G Workbench is available for Windows, Linux and Mac OS, all in 32-bit and 64-bit versions. Memory and CPU requirements of Lingo3G are application-specific; a typical desktop machine should be more than enough for medium-traffic on-line search results clustering. Please contact us for an evaluation license and specific performance advice.

What is the maximum number of documents Lingo3G can cluster?

Lingo3G was designed to perform real-time in-memory clustering of small-to-medium collections of medium-length documents, which roughly corresponds to about 5,000 documents of a few kilobytes each. The upper limit depends heavily on the characteristics of your documents; some of our customers report that they successfully use Lingo3G with as many as 100,000 documents. Please contact us for an evaluation license and performance tuning advice.
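To put these figures in perspective, the short sketch below estimates the raw text volume of a collection. The 4 KB average document size is an assumption chosen to stand in for "a few kilobytes", and the function name is hypothetical. The result is the size of the input text only, not a measurement of Lingo3G's memory use.

```python
def estimate_collection_mb(num_docs: int, avg_doc_kb: float) -> float:
    """Return the raw text size of a collection in megabytes (1 MB = 1024 KB)."""
    return num_docs * avg_doc_kb / 1024


# Assumption: "a few kilobytes each" is taken as 4 KB per document.
typical = estimate_collection_mb(5_000, 4)
large = estimate_collection_mb(100_000, 4)

print(f"Typical collection: {typical:.1f} MB of raw text")  # 19.5 MB
print(f"Large collection:   {large:.1f} MB of raw text")    # 390.6 MB
```

Under that assumption, the typical case is on the order of tens of megabytes of text, while the 100,000-document case is roughly twenty times larger.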

Which languages does Lingo3G support?

Lingo3G supports clustering in 19 languages: English, Chinese Simplified (experimental), Arabic (experimental), Danish, Dutch, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish and Thai.

Can Lingo3G be customized for my application?

Yes, various aspects of Lingo3G can be customized:

  • Tuning of lexical resources: You can tune Lingo3G's stop word lists, boosted label lists and synonyms to match e.g. the specific domain of documents being clustered.
  • Tuning of clustering: Lingo3G offers a large number of attributes that influence the characteristics of the clusters (e.g. depth of cluster hierarchy, desired size and number of clusters, length of cluster labels). We will be happy to help you tune them for your specific application.
  • Language support: If Lingo3G does not support your language and you have the required resources available (word segmentation algorithm, stop words, stemming algorithm, expertise in evaluating the results), we can try to extend Lingo3G to support the language as part of Carrot Search consulting services.
  • Document sources: If there is a need to fetch documents from some external sources, open source components from Carrot2 can be reused and customized as needed as part of Carrot Search consulting services.
  • Custom features: If Lingo3G lacks a feature you need, we can arrange to have it implemented as part of Carrot Search consulting services, following a feasibility analysis.

Can Lingo3G crawl and index my documents?

No. Lingo3G is a software component for clustering text documents. Fetching of the documents is outside the scope of Lingo3G and should be done by your integration code. Having said that, Lingo3G works within the Carrot2 framework, which provides components for fetching search results from public search engines, querying Apache Lucene and Apache Solr, and loading documents from XML files.

What is Carrot2 and how does it relate to Lingo3G?

Carrot2 is an open source search results clustering engine created and maintained by the founders of Carrot Search. Apart from two specialized clustering algorithms, it offers:

  • a common framework and API for document clustering algorithms,
  • components for fetching search results from various sources, such as public search engines, Apache Solr or Open Search,
  • Document Clustering Workbench application for tuning of clustering,
  • Document Clustering Server application for accessing clustering as a REST service,
  • Search results clustering web application,
  • Command Line Interface applications,
  • Apache Solr plugin.

Lingo3G seamlessly plugs into Carrot2 and extends it with a very fast and tunable hierarchical clustering algorithm. While Lingo3G remains a proprietary piece of software, all Carrot2 components and applications it plugs into are open source and can be re-used free of charge.

How does Lingo3G compare to the open source algorithms from Carrot2?

Compared to the algorithms distributed with Carrot2, Lingo3G offers hierarchical clustering, better cluster labels, fewer unclustered documents and much higher processing speed.

What is the difference between Lingo3G and Lingo?

While their names are very similar, Lingo3G and Lingo are two completely different clustering algorithms. Lingo is available as part of the open source Carrot2 framework and offers decent clustering quality for small collections of documents. Lingo3G was built from the ground up to combine high quality of clustering with high processing performance. See also: Comparison.

What data sources does Lingo3G support?

Lingo3G uses Carrot2 components for fetching documents from external sources. Currently, the following sources are supported:

  • Custom XML stream or local file
  • Public search engines: Google, Bing, PubMed
  • Open Search
  • Apache Solr
  • Apache Lucene
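For the custom XML option, integration code typically only needs to serialize the documents it has already fetched. The sketch below shows one way to do that in Python. The element names (searchresult, query, document, title, url, snippet) are an assumption based on the Carrot2 XML input format and should be checked against the Lingo3G Manual for your version. The function name and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_carrot2_xml(query, documents):
    """Serialize documents to an XML string in the assumed Carrot2 input layout."""
    root = ET.Element("searchresult")
    ET.SubElement(root, "query").text = query
    for doc_id, doc in enumerate(documents):
        # Each document gets a sequential id attribute and three text fields.
        node = ET.SubElement(root, "document", id=str(doc_id))
        ET.SubElement(node, "title").text = doc["title"]
        ET.SubElement(node, "url").text = doc.get("url", "")
        ET.SubElement(node, "snippet").text = doc["snippet"]
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data standing in for documents fetched by your own code.
docs = [
    {"title": "Data mining basics", "url": "http://example.com/1",
     "snippet": "An introduction to data mining techniques."},
    {"title": "Text clustering", "url": "http://example.com/2",
     "snippet": "Grouping similar documents by topic."},
]

xml_text = build_carrot2_xml("data mining", docs)
print(xml_text)
```

The resulting string can be written to a local file or sent as a stream, whichever your integration uses. ElementTree escapes special characters such as & and < in titles and snippets, so the output remains well-formed XML.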

Where can I learn more about Lingo3G?

The best place to start would be the Lingo3G Manual. For an in-depth introduction to search results clustering algorithms and engines, please see:

A survey of Web clustering engines. ACM Computing Surveys (CSUR), Volume 41, Issue 3 (July 2009), Article No. 17, ISSN: 0360-0300 (PDF).

The paper reports on the evaluation of a number of search results clustering engines, including Lingo3G.

Easy to integrate, many tuning options, very fast and lightweight.

Stephan Schmid, CEO at Comcepta, Switzerland

Our evaluation found overwhelming support for using Lingo3G.

Dr James Thomas, Associate Director, EPPI-Centre, Social Science Research Unit, Institute of Education, London

I’ve shown two board members of our client company what our FoamTree-powered app does. Amazing what a good visualization can accomplish :-)

René de Vries, Managing Director at HowardsHome