What are the system requirements for Lingo3G?
Lingo3G can run on any platform that supports Java 1.6 or later. Lingo3G Workbench is available for Windows, Linux and Mac OS, in both 32-bit and 64-bit versions. The memory and CPU requirements of Lingo3G are application-specific; a typical desktop machine should be more than enough for clustering on-line search results in a medium-traffic application. Please contact us for an evaluation license and specific performance advice.
What is the maximum number of documents Lingo3G can cluster?
Lingo3G was designed to perform real-time, in-memory clustering of small to medium collections of medium-length documents, which roughly corresponds to about 5,000 documents of a few kilobytes each. The upper limit depends heavily on the characteristics of your documents; some of our customers report using Lingo3G successfully with as many as 100,000 documents. Please contact us for an evaluation license and performance tuning advice.
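To put the figures above in perspective, the sketch below estimates the raw text volume of a document set. It assumes an average document size of 3,000 bytes as a stand-in for "a few kilobytes each" (an assumption; measure your own data) and uses decimal megabytes. It only sizes the input text and says nothing about Lingo3G's own memory use, which depends on your documents and settings.

```java
// Back-of-the-envelope estimate of the raw text volume of a document set.
// The 3,000-byte average document size is an assumption standing in for
// "a few kilobytes each"; substitute a figure measured on your own data.
public class RawVolumeEstimate {

    /** Total raw text size in bytes for docCount documents of avgBytesPerDoc each. */
    static long totalBytes(long docCount, long avgBytesPerDoc) {
        return docCount * avgBytesPerDoc;
    }

    /** Converts bytes to decimal megabytes (1 MB = 1,000,000 bytes), rounded down. */
    static long toMegabytes(long bytes) {
        return bytes / 1_000_000L;
    }

    public static void main(String[] args) {
        long avgBytes = 3_000L; // assumed average document size
        System.out.println("5,000 documents:   ~" + toMegabytes(totalBytes(5_000, avgBytes)) + " MB");   // ~15 MB
        System.out.println("100,000 documents: ~" + toMegabytes(totalBytes(100_000, avgBytes)) + " MB"); // ~300 MB
    }
}
```

In other words, going from the typical 5,000-document case to the 100,000-document case reported by some customers means handling roughly twenty times more input text.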
Which languages does Lingo3G support?
Lingo3G supports clustering in 19 languages: English, Chinese Simplified (experimental), Arabic (experimental), Danish, Dutch, Finnish, French, German, Hungarian, Italian, Korean (experimental), Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish.
Can Lingo3G be customized for my application?
Yes, various aspects of Lingo3G can be customized:
- Tuning of lexical resources: You can tune Lingo3G's stop word lists, boosted label lists and synonyms to match, for example, the specific domain of the documents being clustered.
- Tuning of clustering: Lingo3G offers a large number of attributes that influence the characteristics of the clusters (e.g. depth of the cluster hierarchy, desired size and number of clusters, length of cluster labels). We will be happy to help you tune them for your specific application.
- Language support: If Lingo3G does not support your language and you have the required resources available (word segmentation algorithm, stop words, stemming algorithm, expertise in evaluating the results), we can try to extend Lingo3G to support the language as part of Carrot Search consulting services.
- Document sources: If you need to fetch documents from external sources, open source components from Carrot2 can be reused and customized as needed as part of Carrot Search consulting services.
- Custom features: If Lingo3G lacks a feature you need, we can arrange, after a feasibility analysis, to have it implemented as part of Carrot Search consulting services.
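As a rough illustration of what tuning lexical resources means, the sketch below applies a domain-specific stop word list and a synonym dictionary to a few candidate cluster labels: labels made only of stop words are dropped, and labels that differ only by synonyms are merged. This is a generic illustration, not Lingo3G's API or resource file format; the class, the method names and the word lists are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustration only: how a stop word list and a synonym dictionary could affect
// candidate cluster labels. NOT Lingo3G's API or resource format; all names and
// word lists here are hypothetical. Requires Java 9+ (Set.of, Map.of, List.of).
public class LabelFilterSketch {

    /** Hypothetical domain-specific stop words, e.g. for a news archive. */
    static final Set<String> STOP_WORDS = Set.of("said", "news", "report");

    /** Hypothetical synonyms, each mapped to a canonical form. */
    static final Map<String, String> SYNONYMS = Map.of(
            "automobile", "car",
            "cars", "car");

    /** Lower-cases a label and replaces each word with its canonical synonym. */
    static String normalize(String label) {
        StringBuilder sb = new StringBuilder();
        for (String word : label.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(SYNONYMS.getOrDefault(word, word));
        }
        return sb.toString();
    }

    /** True if every word of an already normalized label is a stop word. */
    static boolean onlyStopWords(String normalized) {
        for (String word : normalized.split(" ")) {
            if (!STOP_WORDS.contains(word)) return false;
        }
        return true;
    }

    /** Drops stop-word-only labels and merges labels that normalize identically. */
    static List<String> filterLabels(List<String> candidates) {
        Set<String> kept = new LinkedHashSet<>(); // keeps first-seen order, removes duplicates
        for (String candidate : candidates) {
            String normalized = normalize(candidate);
            if (!onlyStopWords(normalized)) kept.add(normalized);
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("Car Prices", "Automobile Prices", "News", "Electric Cars");
        System.out.println(filterLabels(candidates)); // [car prices, electric car]
    }
}
```

Here "Automobile Prices" collapses into "car prices" and the generic "News" label disappears, which is the kind of effect domain-specific lexical resources are meant to have.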
Can Lingo3G crawl and index my documents?
No, Lingo3G is a software component for clustering text documents. Fetching the documents is outside the scope of Lingo3G and should be done by your integration code. Having said that, Lingo3G works within the Carrot2 framework, which provides components for fetching search results from public search engines, querying Apache Lucene and Apache Solr, and loading documents from XML files.
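For illustration, integration code that loads documents from an XML file before clustering might look like the sketch below. It assumes a simple layout of `<searchresult>` containing `<document>` elements with `<title>`, `<snippet>` and `<url>` children, modeled on the Carrot2 XML input format (check the Carrot2 documentation for the exact schema). `Doc` and `XmlDocumentLoader` are hypothetical names, not Lingo3G or Carrot2 classes; only the standard JDK XML API is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of integration code that loads documents before clustering.
// The XML layout (searchresult/document/title/snippet/url) is an assumption;
// Doc is a hypothetical holder, not a Lingo3G or Carrot2 class.
public class XmlDocumentLoader {

    /** Hypothetical in-memory representation of one document to cluster. */
    static final class Doc {
        final String title, snippet, url;
        Doc(String title, String snippet, String url) {
            this.title = title; this.snippet = snippet; this.url = url;
        }
    }

    /** Text of the first descendant element with the given name, or "" if absent. */
    private static String childText(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    /** Parses all <document> elements from an XML string. */
    static List<Doc> load(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations to guard against XXE when input is untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        org.w3c.dom.Document dom = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList docs = dom.getElementsByTagName("document");
        List<Doc> result = new ArrayList<>();
        for (int i = 0; i < docs.getLength(); i++) {
            Element e = (Element) docs.item(i);
            result.add(new Doc(childText(e, "title"), childText(e, "snippet"), childText(e, "url")));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<searchresult><query>data mining</query>"
                + "<document><title>Intro to clustering</title>"
                + "<snippet>Clustering groups similar documents.</snippet>"
                + "<url>http://example.com/1</url></document>"
                + "</searchresult>";
        for (Doc d : load(xml)) {
            System.out.println(d.title + " | " + d.url);
        }
    }
}
```

The resulting list of titles and snippets is what your code would then hand over to the clustering component; how exactly that hand-over looks depends on the Lingo3G/Carrot2 API version you use.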
What is Carrot2 and how does it relate to Lingo3G?
Carrot2 is an open source search results clustering engine created and maintained by the founders of Carrot Search. Apart from two specialized clustering algorithms, it offers:
- a common framework and API for document clustering algorithms,
- components for fetching search results from various sources, such as public search engines, Apache Solr or Open Search,
- Document Clustering Workbench application for tuning of clustering,
- Document Clustering Server application for accessing clustering as a REST service,
- Search results clustering web application,
- Command Line Interface application,
- Apache Solr plugin.
Lingo3G seamlessly plugs into Carrot2 and extends it with a very fast and tunable hierarchical clustering algorithm. While Lingo3G remains a proprietary piece of software, all Carrot2 components and applications it plugs into are open source and can be re-used free of charge.
How does Lingo3G compare to the open source algorithms from Carrot2?
What is the difference between Lingo3G and Lingo?
While their names are very similar, Lingo3G and Lingo are two completely different clustering algorithms. Lingo is available as part of the open source Carrot2 framework and offers decent clustering quality for small collections of documents. Lingo3G was built from the ground up to combine high-quality clustering with high processing performance. See also: Comparison
What data sources does Lingo3G support?
Lingo3G uses Carrot2 components for fetching documents from external sources. Currently, the following sources are supported:
- Public search engines: Google, Bing, PubMed
- Open Search
- Custom XML stream or local file
- Apache Solr
- Apache Lucene
- Google Desktop
Where can I learn more about Lingo3G?
The best place to start would be the Lingo3G Manual. For an in-depth introduction to search results clustering algorithms and engines, please see:
A survey of Web clustering engines. ACM Computing Surveys (CSUR), Volume 41, Issue 3 (July 2009), Article No. 17, ISSN: 0360-0300.
The paper reports on the evaluation of a number of search results clustering engines, including Lingo3G.