What are the system requirements for Lingo3G?
Lingo3G can run on any platform that supports Java 1.7 or later. Java
is available for Windows, Linux and Mac OS, all in 32-bit and 64-bit versions.
Memory and CPU requirements of Lingo3G are application-specific; a typical
desktop machine should be more than enough for medium-traffic
on-line search results clustering. Please contact us
for an evaluation license and specific performance advice.
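To check the Java requirement on a target machine, a minimal sketch along the following lines may help. It reads the standard `java.specification.version` system property and compares it with the Java 1.7 minimum mentioned above. The class and method names are hypothetical and are not part of Lingo3G.

```java
// Hypothetical helper, not part of Lingo3G: checks the running JVM against
// the Java 1.7 minimum stated in this FAQ.
final class JavaVersionCheck {

    /**
     * Parses a "java.specification.version" value into a major version number.
     * Legacy values look like "1.7" or "1.8"; Java 9 and later report "9", "11", "17".
     */
    static int parseMajor(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // In the legacy "1.x" scheme, the second component is the major version.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true if the given specification version is at least minimumMajor. */
    static boolean meetsMinimum(String specVersion, int minimumMajor) {
        return parseMajor(specVersion) >= minimumMajor;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        String verdict = meetsMinimum(spec, 7) ? "meets" : "does not meet";
        System.out.println("Java " + spec + " " + verdict + " the Java 1.7 minimum");
    }
}
```

Running the `main` method on the machine where Lingo3G will be deployed prints whether the installed Java runtime satisfies the requirement.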
What is the maximum number of documents Lingo3G can cluster?
Lingo3G was designed to perform real-time in-memory clustering of
small-to-medium collections of medium-length documents, which roughly
corresponds to about 5,000 documents, a few kilobytes each. The upper
limit very much depends on the characteristics of your documents;
some of our customers report that they successfully use Lingo3G with
as many as 100,000 documents. Please contact us
for an evaluation license and performance advice.
Which languages does Lingo3G support?
Lingo3G supports clustering in 19 languages: English, Chinese
Simplified (experimental), Arabic (experimental), Danish, Dutch,
Finnish, French, German, Hungarian, Italian,
Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish,
Turkish and Thai.
Can Lingo3G be customized for my application?
Yes, various aspects of Lingo3G can be customized:
- Tuning of lexical resources: You can tune Lingo3G's stop word lists, boosted label lists and synonyms to match, for example, the specific domain of the documents being clustered.
- Tuning of clustering: Lingo3G offers a large number of attributes that influence the characteristics of the clusters (e.g. depth of cluster hierarchy, desired size and number of clusters, length of cluster labels). We will be happy to help you tune them for your specific application.
- Language support: If Lingo3G does not support your language and you have the required resources available (word segmentation algorithm, stop words, stemming algorithm, expertise in evaluating the results), we can try to extend Lingo3G to support the language as part of Carrot Search consulting services.
- Document sources: If you need to fetch documents from external sources, open source components from Carrot2 can be reused and customized as needed as part of Carrot Search consulting services.
- Custom features: If you miss a feature in Lingo3G, we can arrange to have it implemented as part of Carrot Search consulting services, following a feasibility analysis.
Can Lingo3G crawl and index my documents?
No, Lingo3G is a software component for clustering text documents.
Fetching of the documents is outside the scope of Lingo3G and should
be done by your integration code. Having said that, Lingo3G works
within the Carrot2 framework, which provides components for fetching
search results from public search engines, querying Apache Lucene and
Apache Solr, and loading documents from XML files.
What is Carrot2 and how does it relate to Lingo3G?
Carrot2 is an open source search results clustering engine created
and maintained by the founders of Carrot Search. Apart from two
specialized clustering algorithms, it offers:
- a common framework and API for document clustering algorithms,
- components for fetching search results from various sources, such as public search engines, Apache Solr or Open Search,
- Document Clustering Workbench application for tuning of clustering,
- Document Clustering Server application for accessing clustering as a REST service,
- Search results clustering web application,
- Command Line Interface applications,
- Apache Solr plugin.
Lingo3G seamlessly plugs into Carrot2 and extends it with a very
fast and tunable hierarchical clustering algorithm. While Lingo3G
remains a proprietary piece of software, all Carrot2 components and
applications it plugs into are open source and can be re-used free of charge.
How does Lingo3G compare to the open source algorithms from Carrot2?
What is the difference between Lingo3G and Lingo?
While their names are very similar, Lingo3G and Lingo are two
completely different clustering algorithms. Lingo is available as
part of the open source Carrot2 framework
and offers decent
clustering quality for small collections of documents. Lingo3G was
built from the ground up to combine high quality of clustering with
high processing performance. See also: Comparison
What data sources does Lingo3G support?
Lingo3G uses Carrot2
components for fetching documents from external sources. Currently, the following
sources are supported:
- Custom XML stream or local file
- Public search engines: Google, Bing, PubMed
- Open Search
- Apache Solr
- Apache Lucene
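For the custom XML option, your integration code typically has to serialize the documents it has fetched. The sketch below shows one way to do that with the JDK's StAX writer. The element names (`searchresult`, `query`, `document`, `title`, `url`, `snippet`) are an assumption based on the Carrot2 XML input format and should be checked against the Carrot2 manual for your version. The `ClusteringInputXml` and `Doc` classes are hypothetical and are not part of Lingo3G or Carrot2.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: serializes in-memory documents to an XML string.
// The element names are assumed, not confirmed by this FAQ.
final class ClusteringInputXml {

    /** Minimal document holder, for illustration only. */
    static final class Doc {
        final String title;
        final String url;
        final String snippet;

        Doc(String title, String url, String snippet) {
            this.title = title;
            this.url = url;
            this.snippet = snippet;
        }
    }

    /** Builds the XML; the StAX writer escapes special characters in text and attributes. */
    static String toXml(String query, List<Doc> docs) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("searchresult");
            writeTextElement(w, "query", query);
            int id = 0;
            for (Doc d : docs) {
                w.writeStartElement("document");
                w.writeAttribute("id", Integer.toString(id++));
                writeTextElement(w, "title", d.title);
                writeTextElement(w, "url", d.url);
                writeTextElement(w, "snippet", d.snippet);
                w.writeEndElement(); // document
            }
            w.writeEndElement(); // searchresult
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not serialize documents", e);
        }
        return out.toString();
    }

    private static void writeTextElement(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text);
        w.writeEndElement();
    }
}
```

The resulting string can be written to a local file or streamed to whichever component reads the XML input; how it is passed on depends on your integration.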
Where can I learn more about Lingo3G?
The best place to start would be the Lingo3G Manual.
For an in-depth introduction to search results clustering algorithms and engines, please see:
A survey of Web clustering engines. ACM Computing Surveys (CSUR), Volume 41, Issue 3
(July 2009), Article No. 17, ISSN: 0360-0300 (PDF).
The paper reports on the evaluation of a number of search results clustering engines,