The 13th European Conference on Digital Libraries (ECDL 2009) was hosted this year on the beautiful Greek island of Corfu, from the 27th of September to the 2nd of October. Lively and crammed with information and events as always, the conference was attended by a large number of librarians and computer scientists, faculty and master's students, IT specialists and data curators, repository administrators and web designers (nearly 500 participants in total). The main conference was preceded by four tutorials: "Aggregation and reuse of digital objects' metadata from distributed digital libraries"; "Knowledge organization system (KOS) in digital libraries"; "Digital preservation: logical bit-stream preservation using Plato, Eprints and the Cloud"; and "Designing user interfaces for interactive information retrieval systems & digital libraries".
The technical program of the main conference was scheduled from Monday through Wednesday and included two keynote addresses, two panel sessions, two special sessions (the first on Services, the second on Infrastructures), eight parallel sessions on the topics of Interaction, Knowledge organization, Interfaces, Resource Discovery, Architectures, Information Retrieval, Preservation, and Evaluation, and one poster and demo session, which was introduced by the Minute Madness and held on Monday evening. This year, 181 submissions were received (128 full papers, 15 short papers, and 38 demos or posters). Of these, 28 were accepted as full papers, 17 as short papers and 22 as demos or posters, an acceptance rate of 22% for full papers (28 of 128).
A new conference feature, "special tracks", was introduced this year, with four separate calls for papers (on the topics of infrastructures, content, services, and foundations). Also this year, the posters and demos will be put on Second Life (see http://www.ionio.gr/conferences/ecdl2009/secondlife.php), together with virtual lectures and education events on digital libraries and preservation.
An impressive number of workshops followed the main conference programme, and reports from these are available in this issue of D-Lib Magazine.
On Monday 28 September, the opening keynote address (entitled "Digital Libraries as Phenotypes for Digital Societies") was given by Gary Marchionini, Cary C. Boshamer Professor in the School of Information and Library Science at the University of North Carolina at Chapel Hill (USA). Marchionini advocated a new notion of digital libraries as social and cultural hubs within the Web 2.0 context that reveal and preserve the phenotypes of societies as they evolve. The next challenge for digital library administrators will therefore be to preserve the culturally enhanced content that is growing up around digital collections. As an example, Marchionini used the VidArch Project (http://ils.unc.edu/vidarch/), which is supported by the National Science Foundation and the Library of Congress (as one of the NDIIPP research projects). The main goal of the VidArch Project is to develop a preservation framework for YouTube video objects.
The first parallel session of the conference was devoted to the topic of "Interaction". Interactivity was in fact a thread running through the whole conference: interactive digital documents, human-computer interaction (HCI), author-reader interaction, and document-user interaction. Three papers were presented in this session.
In the first paper, Parisa Eslambolchilar, Fernando Loizides (Swansea University, UK), and George Buchanan (City University, UK) described the development of a sound-enhanced navigation tool for PDA devices, built by refining Speed-Dependent Automatic Zooming (SDAZ), software that automatically varies the zoom level during document browsing. The authors studied 24 students to evaluate the impact of sonic SDAZ. Test participants were split into two groups: one using SDAZ without sonic cues, and the other using SDAZ with sonic cues. The students were asked to perform eight tasks on document headings. Study results indicated that audio cues improved readers' performance in locating specific headings and helped them find a greater number of headings in the text.
George Buchanan was also one of the three authors of the second paper presented in the session on interaction. Together with Jennifer Pearson and Harold Thimbleby (FIT Lab, Swansea University, UK), he introduced the Visual Index System (VIS) software, which improves on linear indexing by helping users create their own digital indexes in documents. In addition to traditional indexing functions, VIS provides three different types of visual index: a colour tag (different colours show different numbers of occurrences of the words), a tag cloud, and a graph.
Thomas Gottron (Institut für Informatik, University of Mainz, Germany) gave a striking presentation on tag clouds. On the Net, "information resources are abundant, user attention is scarce".1 Therefore, web designers strive to facilitate and enhance the experience for readers as they scan e-documents. To support quick visual reading, the Institut für Informatik of the University of Mainz developed a desktop HTTP proxy server that analyzes web documents on the fly and converts words in the text into tag clouds. Words are weighted with a very simple TF-IDF formula, a term-weighting scheme from the vector space model of information retrieval that reflects both the frequency and the relevance of words in the text. A basic user test showed that word tag clouds are effective in helping users to identify relevant terms in a document more quickly (about 0.32 seconds).
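To make the weighting idea concrete, here is a minimal sketch of TF-IDF term weighting in Python. It is not the formula used in the Mainz proxy, which this report does not give. The function names, the whitespace tokenizer, the smoothed IDF variant, and the sample texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_weights(document, corpus):
    """Return {term: weight} for one document.

    Assumptions (not from the paper): whitespace tokenization, raw term
    counts as TF, and a smoothed IDF of ln((1 + N) / (1 + df)) + 1.
    """
    tokenized_corpus = [set(text.lower().split()) for text in corpus]
    n_docs = len(tokenized_corpus)
    term_counts = Counter(document.lower().split())

    weights = {}
    for term, count in term_counts.items():
        # Number of corpus documents that contain the term at least once.
        doc_freq = sum(1 for tokens in tokenized_corpus if term in tokens)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        weights[term] = count * idf
    return weights


def tag_cloud(document, corpus, top_n=5):
    """Return the top_n (term, weight) pairs, heaviest first."""
    weights = tf_idf_weights(document, corpus)
    # Sort by descending weight, then alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical three-document collection.
    corpus = [
        "the library is digital",
        "the conference is on corfu",
        "the tag cloud shows the tag weights",
    ]
    for term, weight in tag_cloud(corpus[2], corpus, top_n=3):
        print(f"{term}: {weight:.2f}")
```

In a tag cloud, the resulting weight would typically drive the font size of each word. A word such as "the", which occurs in every document, receives a low IDF and is therefore outranked by a rarer word with the same count.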
The first special session, which was on "Services", opened with a talk on e-document text classification, entitled "Leveraging the legacy of conventional libraries for organizing digital libraries," given by Arash Joorabchi and Abdulhussain E. Mahdi (University of Limerick, Ireland). The authors have developed a new method for automatic text categorization (ATC) based on references cited in a document. Assuming that materials cited in a given document belong to the same classification category as that of the citing document, the Bibliography Based ATC uses a three-step process:
Joorabchi and Mahdi implemented the Bibliography Based ATC to classify one hundred electronic syllabus documents archived in the Irish National Syllabus Repository according to the Dewey Decimal Classification scheme.
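The assumption behind this approach, that a document shares the classification of the works it cites, can be illustrated with a small Python sketch. This is not the authors' three-step process, which is not reproduced here. It only shows one plausible way to turn the classes of cited works into a class for the citing document, using a simple majority vote. The function name, the vote rule, the tie-breaking rule, and the sample class numbers are all hypothetical.

```python
from collections import Counter


def classify_by_citations(cited_classes):
    """Pick a class for a document from the classes of the works it cites.

    cited_classes: list of class numbers (as strings), one per cited work;
    None marks a cited work whose class could not be determined.
    Returns the most frequent class, or None if no class is known.
    Ties are broken by taking the smallest class string (an arbitrary,
    hypothetical rule chosen only to keep the result deterministic).
    """
    known = [cls for cls in cited_classes if cls]
    if not known:
        return None
    counts = Counter(known)
    best_count = max(counts.values())
    return min(cls for cls, n in counts.items() if n == best_count)


if __name__ == "__main__":
    # Hypothetical syllabus citing four works, one of them unclassified.
    print(classify_by_citations(["004", "025", "004", None]))
```

A real system would first have to resolve each reference to a catalogue record to obtain its class number; that lookup is outside the scope of this sketch.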
Next, Nicola Ferro of the University of Padua (Italy) presented the "FAST way"2 to enhance digital document retrieval. The Flexible Annotations Service Tool (FAST) is software that allows users to search for and retrieve both annotations and correlated documents according to structured and unstructured queries. FAST has been integrated into the DELOS Digital Libraries management system, the prototype of the new generation digital library developed by DELOS, the EC-funded Network of Excellence on Digital Libraries (http://www.delos.info/). An improved version of FAST has recently been released.
The last presentation of the special session on services was entitled "wikiSearch: from access to use" and was given by Elaine G. Toms, Lori McCay-Peet, and Tayze Mackenzie of the Centre for Management Informatics, Dalhousie University, Canada. Their paper was about enriching the wiki user's workflow by developing an interface, wikiSearch, that supports users working on multiple tasks. In particular, wikiSearch allows a user to integrate search activity into the wiki work process. The wikiSearch interface is divided into three core components: the Task section, which contains the contents of the experimental task that participants are assigned; the Search section; and the Information object section, which is the scrollable wiki page loaded from the Search results. WikiSearch was tested in an evaluation study that provided evidence of its high usability.
The first conference panel was entitled "Digital libraries, personalisation, and network effects: unpicking the paradoxes", and participants discussed the question: "Can libraries get data from users' behaviour?" The panel discussion was lively.
The final parallel session on Monday was devoted to interface developments. Fernando Loizides (Swansea University, UK) and George Buchanan (City University, UK) gave an interesting talk on document triage, i.e., the moment in the information seeking process when a user first decides the relevance of a document to his or her information need. The authors performed an empirical assessment of readers' navigation patterns by carrying out a laboratory-based observational study of 20 users, aged 21 to 38. The aim of the study was to evaluate the impact of visual document features (in PDF files) on user behaviour. Preliminary findings show that content that doesn't appear on the first page is very unlikely to gain user attention. Even the conclusion sections of the documents are seldom scrutinized by readers.
Of special interest to those involved in the area of digital humanities was the presentation given by Federico Boschetti, Matteo Romanello, Alison Babeu, David Bamman, and Gregory Crane (Perseus Digital Library, Tufts University, USA). Their paper was entitled "Improving OCR accuracy for classical critical editions". The authors are drawn from the Perseus Digital Library (http://www.perseus.tufts.edu/hopper/), a huge digital collection of classics, mostly Greek and Latin. In their presentation, Boschetti et al. focused on the scalability of the workflow necessary to build a digital library of classical critical editions. For Greek editions (text and critical apparatus), OCR is particularly challenging. While advanced OCR engines can deal with the polytonic Greek fonts used in the 19th and 20th centuries, further improvements can be achieved with post-processing procedures.
The first conference day ended with the Minute Madness, a hectic session where every poster or demo presenter gets one minute to promote his or her work and to convince potential viewers to find out more about it by visiting the presenter's space at the evening's Poster and Demo session.
The second day of the main conference (Tuesday, 29 September) began with a special session on "Infrastructures". The first paper in that session was given by Christoph Becker, Hannes Kulovits, Michael Kraxner, Riccardo Gottardi, Andreas Rauber (Vienna University of Technology, Austria), and Randolph Welte (University of Freiburg, Germany), who discussed how to guarantee the quality of Web services for the migration of digital objects. They have developed a framework that combines monitored migration web services with remote emulation. A series of experiments was run to evaluate the tool's performance and quality. The results of experiments on migration and emulation services show that the implemented tool can significantly reduce the effort that is necessary to preserve digital objects.
The second paper of the session was by Fabio Simeoni (Department of Computer and Information Sciences, University of Strathclyde, UK), Leonardo Candela (Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo", CNR, Pisa, Italy), David Lievens (Department of Computer Science, Trinity College, Dublin 2, Ireland), Pasquale Pagano, and Manuele Simi (Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo"), who discussed how to apply the concept of adaptivity, and more specifically of functional adaptivity, to digital library services. Adaptivity implies developing as many different service interfaces as there are applications, in order to cope with the differences in users' needs and consequent system requirements. The authors presented two such implementations, i.e., fully fledged services and the local components of a single service. Both are discussed in the context of gCube, a service-based system that integrates Grid and Digital Library technologies to deploy, operate, and monitor Virtual Research Environments defined over infrastructural resources. In gCube the notion of functional adaptivity is rendered via the diverse implementation of the central DL functionality.
The last paper of the session, by Marco Dussin and Nicola Ferro (University of Padua, Italy), presented how digital libraries might be adopted in evaluation campaigns, where the DLs may be used as virtual research environments to archive, access, cite, disseminate, and share experimental results. The DIKW (Data, Information, Knowledge, Wisdom) paradigm is applied to such evaluation campaigns and to DIRECT, a DL architecture for scientific data.
The first parallel session on Tuesday was on "Resource Discovery" and started with a paper on the use of semantic technology in DLs by Sascha Tönnies and Wolf-Tilo Balke (L3S Research Center, IFIS TU Braunschweig, Germany). The authors argued that, although semantic techniques do decrease the amount of time necessary to index documents and maintain metadata, they are also responsible for decreasing the quality of a collection because of their statistical nature. In order to cope with this bottleneck, the authors conducted a user study comparing how experts in the field of chemistry assess the relations between keywords and documents automatically generated by semantic techniques. As a result of their study, they set up a roadmap for the further development of metrics.
The second paper of the "Resource Discovery" session, by Hamed Alhoori, Omar Álvarez, Richard Furuta, Miguel Muñiz and Eduardo Urbina (Texas A&M University, USA), dealt with online reputation-based social collaboration. In this case, too, a user study was performed to validate the hypothesis that good quality scholarly bibliographies can be generated via online social and reputation-based collaboration. This hypothesis was tested on the Cervantes project, by generating both closed and open bibliographies. The results of the study show that socially built bibliographies can produce better and more precise search outcomes than closed bibliographies can.
Finally, a very interesting paper (nominated for the best paper award) was the one given by Elaine Toms and Lori McCay-Peet (Centre for Management Informatics, Dalhousie University, Canada) on "Chance Encounters in the Digital Library". The central questions this paper was trying to answer are: "how to induce serendipity in science?" and "how to test it?" The authors have developed an interface that suggests to users other related pages they might want to look at, depending on their search history. Despite the fact that only limited use has been made of the interface so far, most users who have tried the interface found it useful and potentially valuable.
The parallel session on "Architectures" held on Tuesday included two papers on open source digital library systems. The first one, "Stress-testing General Purpose Digital Library Software" by David Bainbridge, Ian Witten (Department of Computer Science, University of Waikato, New Zealand), Stefan Boddie, and John Thompson (DL Consulting, Innovation Park, Hamilton, New Zealand) reported on scalability tests performed on DSpace, Fedora and Greenstone. The authors presented as a case study the production of a large collection in Greenstone, i.e., the Papers Past collection of the New Zealand National Library. The main goal of the study was to discover how the Greenstone software performs when used to produce fully searchable newspaper collections containing in excess of 20 GB of raw text (2 billion words, with 60 million unique terms), 50 GB of metadata, and 570 GB of images.
The second paper of the "Architectures" session was entitled "eSciDoc Infrastructure: a Fedora-based e-Research Framework". The paper by Matthias Razum, Frank Schwichtenberg, Steffen Wagner, and Michael Hoppe (FIZ Karlsruhe, Germany) discussed the challenges of e-Research infrastructure in the context of the Fedora-based eSciDoc Infrastructure, a framework jointly created by the German Max Planck Society and FIZ Karlsruhe. A set of requirements must be taken into account in order to build powerful e-Research infrastructures: providing reliable citations; maintaining both data and publications; building different solutions for different researchers' needs; supporting the veracity and the fidelity of research and the re-use of data; supporting collaborative work; and endorsing the long term preservation of complex compound objects in e-Research.
The last two papers of the session were given by Nicola Ferro and Gianmaria Silvello (Department of Information Engineering, University of Padua, Italy) on how to build (and to handle) hierarchical data structures, and by Pauline Ngimwa, Anne Adams (Institute of Educational Technology (IET), Open University, UK), and Josh Underwood (London Knowledge Lab, Institute of Education, University of London, UK) on the usage patterns of educational DLs.
The parallel session on "Information Retrieval" was devoted to discussions on information retrieval (IR) in several topic areas: information retrieval in digital music libraries (David Damm, Department of Computer Science III, University of Bonn, Germany; Frank Kurth, Research Establishment for Applied Science, Wachtberg, Germany; Christian Fremerey and Michael Clausen, Department of Computer Science III, University of Bonn), with a focus on appropriate query formulation and appropriately ranked lists of results; the identification of fish species (Uma Murthy, Edward Fox, and Yinlin Chen, Department of Computer Science, Virginia Tech, Blacksburg, USA; Eric Hallerman, Department of Fisheries and Wildlife Sciences, Virginia Tech; Ricardo Torres, Evandro Ramos, and Tiago Falcao, Institute of Computing, University of Campinas, Brazil), as an alternative to dichotomous keys for the identification of species; an architecture for indexing (Ndapandula Nakashole and Hussein Suleman, Max-Planck Institute, Germany and Department of Computer Science, University of Cape Town, South Africa); and a "Compressed Self-Indexed Representation of XML Documents" (Nieves R. Brisaboa and Ana Cerdeira-Pena, Database Lab, University of Coruña, Spain, and Gonzalo Navarro, Department of Computer Science, University of Chile).
On Wednesday, 30 September, the session on "Preservation" began with a talk by Angela Dappert and Adam Farquhar of the British Library. They argued that in the preservation context trying to retain every aspect of the original object can be costly, infeasible and sometimes undesirable. Therefore, in preservation actions it is necessary to establish the most significant characteristics of the content to be preserved. Dappert and Farquhar stated that significance is not inherent in the file formats of digital objects but is determined by the needs and requirements of stakeholders. Representation information may or may not be significant depending on the needs of different communities. They concluded that "a well-designed archival format profile will support properties that are of interest to a substantial community of stakeholders and appear in a substantial subset of content in the full file format."
The second presentation of the "Preservation" session, by Louis Martinez-Uribe and Stuart MacDonald (University of Oxford, University of Edinburgh, UK), dealt with the problem of research data curation. Martinez-Uribe and MacDonald's view is that a key to developing efficient, powerful and above all useful data repositories is the engagement of the different research communities in the process of data curation. According to the authors, curation activities need to start very early in the research lifecycle and must involve researchers. In the UK the JISC-funded DISC-UK DataShare project (http://www.disc-uk.org/datashare.html) has explored a number of technical, legal and cultural issues surrounding research data in repository environments.
The last two presentations of the "Preservation" session dealt respectively with the topic of file format robustness (Volker Heydegger, Historisch-Kulturwissenschaftliche Informationsverarbeitung (HKI), University of Köln, Germany), and with the topic of digital rights (Claudio Prandoni, Marlis Valentini, and Martin Doerr, Metaware, Pisa, Italy and Institute of Computer Science, FORTH, Crete, Greece). The latter presented an innovative domain ontology of Intellectual Property Rights.
As mentioned at the beginning of this report, ECDL was crammed with events, and we could not attend every one of them; some paper presentations are therefore not covered here. For information about those program items, please visit the conference web site (http://www.ecdl2009.org), and also see the workshop reports in this issue of D-Lib Magazine.
ECDL 2009 was rich in content and very successful. All participants are already looking forward to ECDL 2010, which will be held in September in Glasgow (http://www.ecdl2010.org/).
2. Nicola Ferro's presentation was entitled "Annotation search: the FAST way".
Copyright © 2009 Maria Cassella and Licia Calvi