
Conference Report


D-Lib Magazine
November/December 2008

Volume 14 Number 11/12

ISSN 1082-9873

Cross-Language Evaluation Forum - CLEF 2008

 


Carol Peters
Istituto di Scienza e Tecnologie dell'Informazione
Consiglio Nazionale delle Ricerche, Pisa
<carol.peters@isti.cnr.it>

The Cross-Language Evaluation Forum (CLEF) has been running for nine years now. The results of the CLEF 2008 campaign were presented at a two-and-a-half day workshop held in Aarhus, Denmark, 17-19 September, immediately following the twelfth European Conference on Digital Libraries (ECDL 2008).

The objective of the Cross-Language Evaluation Forum is to promote research in the field of multilingual system development. This is done through the organisation of annual evaluation campaigns offering tasks designed to test different aspects of mono- and cross-language information retrieval (IR) systems. The intention is to encourage experimentation with all kinds of multilingual information access, from the development of systems for monolingual retrieval operating on many languages to the implementation of complete multilingual multimedia search services, and thereby to stimulate the development of next generation multilingual IR systems.

This year 100 groups, mainly but not only from academia, participated in the campaign. Most of the groups were from Europe, but there was also a good contingent from North America and Asia plus a few participants from South America and Africa.

CLEF 2008 Tracks

CLEF 2008 offered seven tracks designed to evaluate the performance of systems for:

  • multilingual textual document retrieval (Ad Hoc)
  • mono- and cross-language information retrieval on structured scientific data (Domain-Specific)
  • interactive cross-language retrieval (iCLEF)
  • multiple language question answering (QA@CLEF)
  • cross-language retrieval in image collections (ImageCLEF)
  • multilingual retrieval of Web documents (WebCLEF)
  • cross-language geographical information retrieval (GeoCLEF)

Two new tracks were offered as pilot tasks:

  • cross-language video retrieval (VideoCLEF)
  • multilingual information filtering (INFILE@CLEF)

In addition, MorphoChallenge 2008, an activity of the EU Network of Excellence Pascal, was organized in collaboration with CLEF.

Test Collections

Most of the tracks adopt a corpus-based automatic scoring method for the assessment of system performance. The test collections consist of sets of statements representing information needs known as topics (queries) and collections of documents (corpora). System performance is evaluated by judging the documents retrieved in response to a topic with respect to their relevance (relevance assessments) and computing recall and precision measures.
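The scoring described above can be illustrated with a minimal sketch. It assumes a hypothetical setup in which the relevance assessments for each topic are a set of document identifiers and a system's result list for that topic is a list of unique identifiers; the topic name, document identifiers and function name are invented for illustration and are not taken from the CLEF collections or tools.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Compute set-based precision and recall for one topic.

    precision = relevant documents retrieved / documents retrieved
    recall    = relevant documents retrieved / relevant documents in the collection
    """
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance assessments: topic -> documents judged relevant.
assessments: Dict[str, Set[str]] = {"topic-1": {"d1", "d3", "d7"}}

# Hypothetical system output: topic -> retrieved documents, no duplicates.
run: Dict[str, List[str]] = {"topic-1": ["d1", "d2", "d3", "d4"]}

for topic, retrieved in run.items():
    p, r = precision_recall(retrieved, assessments[topic])
    print(f"{topic}: precision={p:.2f} recall={r:.2f}")
```

In this invented example two of the four retrieved documents are relevant, and two of the three relevant documents are found. Campaign evaluations typically also report rank-sensitive measures, which this sketch does not cover.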

A number of document collections were used to build the test collections for CLEF 2008:

  • CLEF multilingual corpus of more than 3 million news documents in 14 European languages
  • Hamshahri Persian newspaper corpus
  • Library catalog records belonging to The European Library and derived from the archives of the British Library, the Austrian National Library and the Bibliothèque Nationale de France
  • English/German and Russian social science data
  • Collections used by the ImageCLEF track for both general photographic and medical image retrieval:
    • IAPR TC-12 photo database; INEX Wikipedia image collection
    • ARRS Goldminer database of radiographs; IRMA collection for medical image annotation
  • Dutch and English documentary television programs provided by Sound & Vision, The Netherlands
  • Agence France-Presse (AFP) comparable newswire stories in Arabic, French and English

Diverse sets of topics or queries were prepared in many languages according to the needs of the various tracks. At the end of the campaign, the result is a number of valuable and reusable test collections.

Workshop

The CLEF Workshops play an important role by giving all the groups that have participated in the evaluation campaign the opportunity to get together to compare approaches and exchange ideas. The Workshop was held in Aarhus, Denmark, this year and was attended by 150 researchers and system developers. The schedule was divided between plenary track overviews and parallel, poster and breakout sessions. There were also several invited talks. Noriko Kando, National Institute of Informatics, Tokyo, reported on the activities of NTCIR-7 (NTCIR is an evaluation initiative focussed on testing IR systems for Asian languages), while John Tait of the Information Retrieval Facility (IRF), Vienna, presented a proposal for an Intellectual Property track that would focus on cross-language retrieval of legal patents in CLEF 2009.

Photo taken at the final session of CLEF 2008


The presentations given at the CLEF Workshops and detailed reports on the experiments of CLEF 2008 and previous years can be found on the CLEF website. The preliminary agenda for CLEF 2009 will be available from mid-November.

CLEF and TrebleCLEF

CLEF 2008 was organized under the auspices of TrebleCLEF, a Coordination Action of the Seventh Framework Programme. Over the years, CLEF has done much to promote the development of multilingual IR systems. However, the focus has been on building and testing research prototypes rather than developing fully operational systems. TrebleCLEF is building on and extending the results achieved by CLEF. The objective is to support the development and consolidation of expertise in the multidisciplinary research area of multilingual information access and to promote a dissemination action in the relevant application communities.

TrebleCLEF thus has three main goals:

  • To promote high standards of evaluation in Multilingual Information Access (MLIA) systems using three approaches: test collections; user evaluation; and log file analysis
  • To sustain an evaluation community by providing high quality access to past evaluation results
  • To disseminate know-how, tools, resources and best practice guidelines, enabling DL creators to make content and knowledge accessible, usable and exploitable over time, over media and over language boundaries.

The aim is to enable applications that need multilingual search solutions to identify the most appropriate technology. For this purpose, a series of best practice workshops has been organised:

  • Workshop on Best Practices for the Development of Multilingual Information Access Systems, Segovia, Spain, June 2008
  • Workshop on Best Practices for System Developers: Bringing Multilingual Information Access to Operational Systems, Winterthur, Switzerland, October 2008
  • Workshop on Best Practices in Query Log Analysis, Spring 2009

A Summer School on Multilingual Information Access is also being organised for June 2009 in Pisa. The focus of the Summer School will be on "How to build effective MLIA systems and how to evaluate them".

More information on the activities of TrebleCLEF can be found on the website.

Links Referenced in this report

CLEF: <http://www.clef-campaign.org>.

NTCIR: <http://research.nii.ac.jp/ntcir/>.

TrebleCLEF: <http://www.trebleclef.eu>.

IRF: <http://www.ir-facility.org/>.

Copyright © 2008 Carol Peters

doi:10.1045/november2008-peters