Leonid A. Kalinichenko
Digital Library (DL) research is the field of research and development that aims to advance the theory and practice of processing, disseminating, storing, searching and analyzing various digital data. Digital Libraries, acting as knowledge depositories, can be considered complex information systems whose development and use require the solution of numerous scientific, technological, methodological, economic, legal and other issues. Digital Library technologies are developing rapidly. Challenges in semantics, integration of information, and perception and presentation of various kinds of data call for significant innovations. Development of Digital Library technologies is becoming more and more essential for raising the standards of health care, education, science and the economy, as well as the quality of life as a whole. Projects devoted to digitizing the information accumulated by humanity about the Earth, Universe, Literature, Art, Environment and Humans are examples of the intensive development of global information repositories.
RCDL'2002, the Fourth All-Russian Scientific Conference "Digital Libraries: Advanced Methods and Technologies", took place at the Joint Institute for Nuclear Research, Dubna, on 15-17 October 2002. RCDL'2002 was the fourth conference on this subject (earlier conferences were held in St. Petersburg in 1999, Protvino in 2000, and Petrozavodsk in 2001). The principal objective of the RCDL conference series is to promote the establishment of a community of Russian experts involved in research and development related to Digital Libraries. The Conference offers this community the opportunity to discuss ideas and outcomes and to make contacts for closer cooperation. Much attention is focused on advanced applications and technologies. In addition, pilot applications and digital collections developed within the framework of the RFBR grants on digital libraries and other programs are discussed at the RCDL conferences. The Conference also promotes the study of international experience and the development of international cooperation on Digital Libraries. The Conference is conducted in both Russian and English. In 2002, Yannis Ioannidis (University of Athens, Greece) helped prepare the Conference and acted as its European Coordinator. Christine Borgman (UCLA, USA) provided liaison with ACM SIGIR. The Program Committee of RCDL'2002 included members from many nations.
Ninety-seven extended abstracts were submitted for this year's conference. The Program Committee reviewed all of them and selected 59 submissions for regular sessions and 13 additional ones for poster presentations. The Conference Proceedings, which includes all the full papers, was published prior to the conference. RCDL'2002 was supported by grants from the Russian Foundation for Basic Research and the Ministry for Science of the Russian Federation.
One hundred and four specialists from 16 Russian cities and 15 foreign attendees from Germany, Hungary, Latvia, Moldova, Ukraine and the USA took part in the conference.
2. The Program of the Conference
2.1 Conference Structure
2.2 General Features of the Program
Digital libraries are coming into increasingly active use within scientific organizations and universities. At present, one can hardly imagine a western university without a well-developed digital library. Digital libraries created on the basis of licensing agreements with publishing houses frequently displace subscriptions to print journals. At the same time, most university digital libraries functionally resemble traditional libraries or are combined with them to become "hybrid libraries". In the long term, digital libraries should become depositories of knowledge.
The program of RCDL'2002 reflected the wide range of interpretations of the "digital library" concept. Problems of hybrid library development were mixed in the program with presentations of digital libraries and information systems for the organization of science. The examples of specialized scientific collections that were considered showed that, from the informational, structural and functional viewpoints, they go far beyond traditional library capabilities. The problems of developing a virtual astronomical observatory were considered in detail in the context of international cooperation. Significant attention in the RCDL'2002 program was given to the creation of integrated depositories of scientific information (global ones in particular areas of knowledge) formed by means of advanced technologies (mediators) and traditional approaches. Problems regarding the creation of digital archives were also discussed. Specialized research papers were collected in the sessions on semantics of information resources and on methods of representation, retrieval and indexing of documents. The conference emphasized digital libraries in education as elements of the virtual educational environment. Particular expectations are placed on them in connection with the global transformation of education under the influence of information technologies. Besides the special session devoted to these issues, the International Expert Meeting of the UNESCO Institute for Information Technologies in Education was organized at the conference in the form of a round-table discussion on the state of digital libraries in education.
2.3 Characterization of Presentations
Session 1 featured talks given by representatives of several large Russian libraries who discussed the status, applied technologies and perspectives regarding the development of a digital component in their hybrid libraries.
Session 2 was about digital libraries for education. Mary Marlino (UCAR, USA) reviewed the professional community aspects of creating narrowly profiled digital libraries for education, using DLESE (the Digital Library for Earth System Education) as her example. She explained the role of the professional community in the development of such a subject-oriented library and the mutual influence of the community and the library. A new approach to organizing educational courses on the basis of materials accumulated in digital libraries was presented in the talk given by Alex Ushakov on behalf of a team from the University of California at Santa Barbara working on the well-known Alexandria Digital Library Project. This approach emphasizes the conceptual medium of a subject domain. Applying this hypothesis and using a physical geography course as an example, the talk showed how to build a teaching environment that emphasizes the leading role of concepts and digital libraries in training. The engineering aspects of creating a digital library for airspace education were surveyed in the talk given by E.B. Kudashev.
Session 3 focused on semantic aspects of information resources. A talk given by L.A. Kalinichenko and N.A. Skvortsov focused on using the DAML + OIL ontological model draft standard, developed by W3C, for subject mediation by applying a reversible mapping of this ontology into a canonical model of the mediator. The ideas of a "visual thesaurus", "visual" metadata and indexing of visual data in the paper presented by I.M. Zatsmann appeared debatable and were not sufficiently motivated.
In Session 4, brief overviews of the problems of developing digital libraries for the organization of science were presented by JINR, by a group of physical institutes of the Russian Academy of Sciences (RAS) and by Kazan State University. E.N. Filinov and A.V. Boychenko once again attempted to consider standards for the representation of digital library resources simultaneously for science, culture, and education. The presented material was not well enough synchronized with the actual state of resource representations used in modern digital libraries, reflecting a gap with the world community in this area of intensively developing technologies.
In Session 5, Guenther Eichhorn presented his paper on the large digital library of publications in the field of astronomy, the Astrophysics Data System (ADS). It is an impressive collection. O.B. Dluzhnevskaya and O. Yu. Malkov shared plans for integrating the Russian scientific astronomy community into the international movement toward the Virtual Astronomical Observatory (VAO). For this purpose, the project of the Russian Virtual Observatory is being worked out as a component for integration into the International VAO. The talk by V.V. Vitkovsky and his colleagues presented information on the contribution of the Special Astrophysical Observatory of RAS to the VAO. (Talks on astronomical collections were also presented at Sessions 7 and 9.)
Session 6 played an important role in the structure of the conference and was devoted to Data Grid and perspectives of using this architecture in digital libraries. The invited talk by Ilya Zaslavsky from the San Diego Supercomputer Center contained a brief survey of the technologies developed at this center: the Storage Resource Broker (SRB), a representative of Data Grid, and MIX, the mediator implementing the "Global as View" approach to integration of heterogeneous data sources. These architectures are still considered separately, though their integration is expected in the future. The talk given by V.V. Korenkov explained the structure of the large project of the European Union on Data Grid and involvement of Russia in this project. These two talks allowed the conference attendees to compare various Data Grid architectures being developed by the global community.
At Session 7 the talks from Russia and Ukraine were devoted to the creation of databases of archives of photographic plates accumulated in Pulkovskaya and Crimean observatories.
Session 9 dealt with the application of various technologies to the creation of astronomical collections: object models for pulsar data in an object-oriented environment (A.E. Avramenko's paper) and XML for various observational data (V.V. Vitkovsky's paper). The first talk characterized the use of object interoperability on the basis of CORBA/DCOM; the second was devoted to the use of Web services and their interoperation on the basis of SOAP, WSDL, and UDDI technologies.
Sessions 10 and 13 were devoted to the methods of representation and retrieval of documents. Benjamin M. Gross (UIUC, USA) analyzed procedures of work with e-mail (modes of choosing addresses, sorting messages into categories, etc.) and proposed a prototype system that uses a relational database as the message store at a lower layer and a set of services at an upper layer (for example, a text and metadata indexing service) to improve the organization of message storage, selection, addressing and navigation. Many of the proposed solutions can also be applied to the organization of digital collections of other types. Of note were the talks presented by specialists from St. Petersburg State University on activities supported by RFBR grants. These were devoted to research on automatically detecting HTML documents with similar structure (i.e., obtaining information that facilitates the creation of wrappers) and on using information about the content of documents in the neighborhood of identified Web pages to improve retrieval quality. The paper presented by B.V. Dobrov and N.V. Lukashevich was dedicated to the development of multilingual information systems, including facilities for automatic processing, indexing and retrieval of documents in "multilingual" collections. Principles of developing the Integrated Distributed Information System (IDIS) of the Siberian Branch of RAS and filling it with scientific information (in various areas of science), applying an extended document object model (DOM), were presented in the talk given by Yu. I. Shokin, A.M. Fedotov and Yu. V. Leonov. The paper by M.V. Gubin presented the results of research aimed at choosing a compression method for inverted files (the basic index structure used for text retrieval).
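To illustrate what compressing the index structure for text retrieval can involve, the following is a minimal sketch of one widely used technique: storing each posting list as gaps between sorted document identifiers and encoding the gaps with a variable-byte code. This is an illustration only; the report does not say which methods the paper compared, and the function names `encode_postings` and `decode_postings` are hypothetical.

```python
def encode_postings(doc_ids):
    """Encode a sorted list of document ids as variable-byte coded gaps.

    Assumption for this sketch: ids are non-negative integers in ascending
    order. Each gap is split into 7-bit groups, low-order group first; the
    last byte of a gap has its high bit set as a terminator.
    """
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        prev = doc_id
        while gap >= 128:
            out.append(gap & 0x7F)   # continuation byte, high bit clear
            gap >>= 7
        out.append(gap | 0x80)       # final byte of this gap, high bit set
    return bytes(out)


def decode_postings(data):
    """Invert encode_postings: rebuild the sorted list of document ids."""
    doc_ids = []
    prev = 0
    current = 0
    shift = 0
    for byte in data:
        if byte & 0x80:              # terminator byte: the gap is complete
            current |= (byte & 0x7F) << shift
            prev += current
            doc_ids.append(prev)
            current = 0
            shift = 0
        else:                        # continuation byte
            current |= byte << shift
            shift += 7
    return doc_ids


postings = [3, 130, 131, 1000]
packed = encode_postings(postings)
restored = decode_postings(packed)
```

Small gaps fit in a single byte, so posting lists for frequent terms (where consecutive document ids are close together) shrink the most.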
Session 11 was on integrated scientific repositories. Professor Bernd Wegner (Institute of Mathematics of the Technical University in Berlin) noted that in case of development of knowledge bases having the form of a digital library, the creation of global repositories is required. This, in its turn, is related to three aspects:
The paper was devoted to some details of this activity, in particular EMANI - Electronic Mathematics Archives Network Initiative (international project) and ERAM - Electronic Research Archive in Mathematics (German project), both projects applying distributed network architecture. Besides that, a plan for development of the global Electronic Library on Mathematics (DML and RusDML) was characterized.
Two other papers presented at Session 11 (with involvement of specialists from the Institute of Mathematics of RAS, the Institute of Informatics Problems of RAS, the Institute of Cytology and Genetics of Siberian Branch of RAS and the Institute of Computational Mathematics and Mathematical Geophysics of Siberian Branch of RAS) were devoted to various issues of implementation of distributed digital libraries in the areas of molecular biology, biotechnology and medicine and, in particular, to implementation of the Gene Discovery / GeneExpress system involving the TRRD, SWISSPROT databases (structure and functions of proteins, their classification, etc.), EMBL/GenBank (sequences of DNA and RNA), and Medline. Regretfully, the form of presentation of the material heavily relied on familiarity with the terminology and concepts of the related research area.
Session 12 focused on integration of heterogeneous collections. Yu.S. Zatuliveter underlined a forthcoming problem of transforming the Internet into a programmable metacomputer by activation of functionalities of the network computers for global system (suppression of information noise, structuring and integration of information resources, automatic control over computing resources) and user tasks. It was noted that Grid-technology is the first serious step in this direction.
Two other papers in Session 12 (presented by V.A. Kapustin and O.L. Zhizhimov with their co-authors) were devoted to possibilities and tools of applying Z39.50 protocol for creation of profiled distributed information systems (standardization of metadata, schemes of data). Last but not least, the session demonstrated the Library Subsystem of the Integrated system of information resources of RAS as a medium of the library registries providing access to the materials of the libraries of RAS Institutes (this is a joint work of Computing Centre of RAS and the Centre of scientific telecommunications and information technologies of RAS).
In Session 14 (Archives), the talk given by Paul Braslavsky (Ural Branch of RAS) and Tomas Krichel (USA) should be noted. It was devoted to a technology for organizing repositories accessible through the Web, and to formats and usage of the Dublin Core metadata standard in accordance with the OAI (Open Archives Initiative) protocol for academic organizations, their documents and collections. The papers by the team of authors from the RAS Institute that deals with problems of information transfer and the Institute of Informatics Systems of the Siberian Branch of RAS characterized the technologies for creating and using a text-graphics database on the history of Russian fundamental science, drawing on the archive of RAS and personal archives.
Session 15 covered document indexing. In this session, two papers from St. Petersburg were presented. A. Koryavko and I. Nekrestyanov considered the problem of building retrieval systems on the Web with alternative approaches to rating the "usefulness" of Web pages for a particular user, applying not only the content of a document but also meta-information on both the document and the user (including the user's previous queries, which documents were retrieved, how much time the user spent reading them after the query, etc.). This approach provides for more effective ranking of documents. The capabilities of one of the page ranking methods based on information about relationships between Web pages (the Kleinberg algorithm) were analyzed and extended. Facilities for searching in the environment of semi-structured data were discussed in the talk given by B.S. Khvostichenko and B.A. Novikov.
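For readers unfamiliar with link-based ranking, the following is a minimal sketch of the basic hubs-and-authorities iteration commonly associated with Kleinberg's algorithm (HITS). It shows only the textbook form, not the extension discussed in the paper; the function names and the toy link graph are assumptions made for illustration.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Basic hubs-and-authorities iteration.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    Returns (hub_scores, authority_scores).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to a page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score: sum of authority scores of the pages a page links to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, []))
            for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Toy graph: pages "a" and "b" both link to "c".
hub_scores, auth_scores = hits({"a": ["c"], "b": ["c"], "c": []})
```

In the toy graph, "c" ends up as the only authority and "a" and "b" as equally good hubs, which matches the intuition that good hubs point to good authorities.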
3. Expert Meeting "Digital Libraries in Education" (UNESCO IITE)
On October 15, the Institute for Information Technologies in Education of UNESCO, in cooperation with RCDL'2002, JINR and IPI RAS organized an International Expert Meeting "Digital libraries in Education". IITE UNESCO develops projects on applying digital libraries in education. The purpose of the Expert Meeting was to present the Analytical Survey "Digital libraries in Education" prepared by an international group of experts. The content of the Survey was presented at the Meeting by Professor L.A. Kalinichenko.
In the presentation of survey results, some technological aspects of developing digital libraries were considered and selected projects in the USA and Europe were summarized. For instance, in the USA, a national digital library in the field of science, technology, engineering and mathematics (NSDL) is being developed for use in education and science. NSDL (a first version of the system became operational in December 2002) is being developed as an integrated distributed information environment. NSDL provides access to a large volume of heterogeneous digital objects including multimedia, geo-referenced objects, objects representing measurement data, samples under study and even expensive instruments for remote access (such as an electron microscope). In view of such a variety of information objects, NSDL supports multiple metadata standards. Interfaces of such systems are evolving from traditional keyword-based ones toward more semantic interfaces (for example, the use in queries of benchmarks of the Atlas of Literacy developed recently in the USA). Quite intensive development of NSDL is planned, including the possible positioning of this library as a sub-structure of the federal government.
CITIDEL, an example of an NSDL component, is an interactive digital library in the field of computer science and information technologies. Another example is a networked digital library of theses and dissertations (NDLTD). These are distributed infrastructures with multilingual access, support of multiple methods of information retrieval, and metadata acquisition. NDLTD is supported at a state level in a number of countries including Australia, Brazil, Germany, India, Korea, and the USA, as well as by some national libraries (including the British Library). An interesting example of the high-quality educational library in the area of a specific subject domain is DLESE (the Digital Library for Earth System Education).
It is important to note that along with supporting traditional digital objects, infrastructures are developed in which information objects become the data streams measured in real time (for example, results of atmospheric measurements on the Earth surface, in higher layers of the atmosphere, radar measurements, satellite observations, etc.). In the USA, data networks have been developed that deliver such real time measurements to hundreds of universities. There are projects intended to make such data streams a part of the NSDL information. New "cyber-infrastructures" are progressing for science and education, providing new approaches to the creation of digital libraries. In the data grids the term xGrid, where x means a data domain, designates a structure (for example, BioGrid), joining the experts, information and tools in this subject area. One of the purposes of such grids is the open publication of scientific information.
The Analytical Survey considers the evolution of these projects, at least in a five-year perspective. Besides that, the advanced approaches of development of such libraries for educational purposes are analyzed. The Survey is completed by guidelines for the next stage of the UNESCO project: the development of the educational modules oriented on various groups of learners in developing countries.
The following conference attendees took part in the discussion of the Analytical Survey: Dr. Mary Marlino (UCAR, Boulder, CO, USA), Alex Ushakov (UC in Santa Barbara, CA, USA), Prof. Bernd Wegner (TU Berlin, Germany), Dr. Stephan Koernig (TU Darmstadt, Germany), Prof. V.P. Shirikov (JINR, Dubna, Russia), Dr. S.A. Khristochevsky (IITE UNESCO), Prof. A.G. Marchuk (ISI SB RAS, Novosibirsk, Russia), Dr. V.N. Zakharov (IPI RAS), and others. The Meeting recommended publishing and widely disseminating the Analytical Survey and proceeding to the next phase of the UNESCO project.
4. The Conference Recommendations
At the closing session of the conference, participants approved the following recommendations:
Copyright © Leonid A. Kalinichenko, Vladimir V. Korenkov, Vladislav P. Shirikov, Alexey N. Sissakian, and Oleg V. Sunturenko