Report for the ERPANET/CODATA Seminar on the Selection, Appraisal and Retention of Digital Scientific Data Released
Rapid advances in technology are changing the way scientists work, allowing greater amounts of digital data to be produced in the majority of scientific disciplines. These advances are also changing the way scientists interact, creating opportunities for collaboration across disciplines, institutions, and countries. The ever-increasing volume of data generated through these advances requires active curation to ensure its longevity. The international ERPANET/CODATA seminar examined the current state of practice in selection, appraisal and retention across diverse scientific communities and discussed how archival concepts can best be applied to the management and long-term preservation of digital data.
The seminar, held from 15-17 December 2003 at the Biblioteca Nacional in Lisbon, brought together more than sixty-five researchers, data managers, information specialists, archivists, and librarians from thirteen countries to discuss the issues involved in making critical decisions regarding the long-term preservation of the scientific record. One of the major aims for this seminar was to provide an international forum to exchange information about data archiving policies and practices across different scientific, institutional, and national contexts. The seminar proved to be extremely successful in enabling discussions between scientific and archival communities. The seminar also highlighted some conceptual hurdles to overcome before effective collaboration between the diverse communities can take place.
This seminar was an important first step in the journey towards openness and collaboration between scientific disciplines, archivists, and other information specialists in the area of data curation and preservation. The seminar illustrated areas where each can learn from the others in establishing common frameworks and guidelines that will enable the effective selection, appraisal and long-term retention of digital scientific data.
The Electronic Resource Preservation and Access Network (ERPANET) and the Committee on Data for Science and Technology (CODATA) are pleased to announce the release of the final report for this seminar and invite all stakeholders involved in the creation and curation of digital scientific data to review the results at http://www.erpanet.org/www/products/lisbon/LisbonReportFinal.pdf.
On May 11, 2004, the German part of the Sun Center of Excellence for Trusted Digital Repositories (http://www.coe.hu-berlin.de) was launched at Humboldt University Berlin, Germany, during a colloquium of the Computer and Media Services. The Austrian part will be initiated on June 24 in Graz.
The Sun Center of Excellence programme establishes cooperation among leading research and education institutions, external partners, and Sun Microsystems in order to investigate current information technology on the basis of Sun technology and products. The results of the investigations will be made available to other institutions.
The Sun Center of Excellence for Trusted Digital Repositories is a joint project of the Austrian Literature Online Consortium (ALO) (Austria), Humboldt University Berlin (Germany), XiCrypt GmbH (Austria) and Sun Microsystems (USA). The goal of the Center of Excellence is the development and implementation of a "Trusted Digital Repository" based on the ideas of the "Reference Model for an Open Archival Information System (OAIS)" and the RLG report on "Trusted Repositories".
The partners of the Center of Excellence for Trusted Digital Repositories are convinced that Trusted Digital Repositories are capable of solving the key issues of long-term preservation of digital information. The Center of Excellence is based on the previous work and implementations of the partners, such as the Humboldt University document and publication server, the Xades implementation for long-term preservation of digital signatures developed at XiCrypt, and the digital repository software implemented by ALO.
Within the Center of Excellence, three components are being developed: a framework to handle preservation mechanisms within a trusted digital archive, a user interface for accessing the trusted digital archives, and methods to apply digital signatures and time stamps to the digital documents.
For more information, contact:
Search and Retrieve Web Service (SRW) and Search and Retrieve URL Service (SRU) are Web Services-based protocols for querying databases and returning search results. SRW and SRU requests and results are very similar. The difference between them lies in the ways the queries and results are encapsulated and transmitted between client and server applications. The canonical URL for SRW and SRU is: <http://www.loc.gov/z3950/agency/zing/srw/>.
Both protocols define three and only three basic "operations": explain, scan, and searchRetrieve.
Differences in operation
The differences between SRW and SRU lie in the way operations are encapsulated and transmitted between client and server, as well as in how results are returned. SRW is essentially a SOAP-ful Web service. Operations are encapsulated by clients as SOAP requests and sent to the server. Likewise, responses by servers are encapsulated using SOAP and returned to clients. Since SOAP is used in SRW, HTTP is not a necessary transport protocol.
SRU, on the other hand, is essentially a REST-ful Web service. Operations are encoded as name/value pairs in the query string of a URL. As such, operations sent by SRU clients can only be transmitted via HTTP GET requests. The results of SRU requests are XML streams, the same streams returned via SRW requests sans the SOAP envelope.
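As a rough illustration of the name/value encoding described above, the following Python sketch builds an SRU request URL. The server address is hypothetical, and the parameter names (`operation`, `version`, `query`, `maximumRecords`) and the version value are assumptions taken from the SRU specification rather than from this article; consult the canonical SRW/SRU site for the authoritative list.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# The three operations named in the text.
OPERATIONS = ("explain", "scan", "searchRetrieve")


def build_sru_url(base_url, operation, **params):
    """Encode an SRU operation as name/value pairs in a URL query string."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown SRU operation: {operation}")
    # "operation" and "version" are assumed parameter names (see lead-in).
    pairs = {"operation": operation, "version": "1.1"}
    pairs.update(params)
    return f"{base_url}?{urlencode(pairs)}"


# Hypothetical server address and query, used only for illustration.
url = build_sru_url(
    "http://sru.example.org/catalog",
    "searchRetrieve",
    query='dc.title = "digital preservation"',
    maximumRecords=10,
)
print(url)

# Decoding the query string recovers the original name/value pairs.
decoded = parse_qs(urlsplit(url).query)
print(decoded["operation"], decoded["query"])
```

A client would send the resulting URL in an ordinary HTTP GET request and parse the XML that comes back; an SRW client would carry the same operation and parameters inside a SOAP envelope instead.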
SRW and SRU are "brother and sister" standardized protocols for accomplishing the task of querying databases and returning search results. If index providers were to expose their services via SRW and/or SRU, then access to these services would become more ubiquitous.
Comments are requested on the following study commissioned by CLIR into the feasibility of an image retrieval benchmarking service, and its possible role in speeding the development and deployment of image retrieval technology for the digital library.
Image Retrieval Benchmark Database Service: A Needs Assessment and Preliminary Development Plan
A Report Prepared for the Council on Library and Information Resources and the Coalition for Networked Information
The rapid increase in the quantity of visual materials in digital libraries, supported by significant advances in digital imaging technologies, has not been matched by a corresponding advance in image retrieval technologies and techniques. Digital librarians sense that much could be done to improve access to visual collections and hope, perhaps vainly, that users' needs to identify relevant digital visual resources might be met more satisfactorily through search strategies based on visual characteristics rather than on textual metadata associated with the image, which are expensive to produce. However, digital librarians currently have no tools for evaluating either content-based or metadata-based image retrieval systems. Consequently, they have difficulty assessing existing systems of image access, evaluating proposed changes in these systems, or comparing metadata-based and content-based image retrieval.
Some have proposed benchmarking as a solution to this problem. An image retrieval benchmark database could provide a controlled context within which various approaches could be tested. Equally important, it might provide a focus for image retrieval research and help bridge the significant divide between researchers exploring these two search paradigms: metadata-based vs. content-based image retrieval. If so, such a database could spur advances in research, as comparative results make it possible to evaluate the effectiveness of particular strategies and thereby add value to studies supported by many funding agencies.
Creating an image retrieval benchmarking service would be a significant undertaking. A benchmarking database is more than a collection of images. Benchmarking requires a set of queries to be put to that test collection. For each query, each image in the test collection must be assessed to determine whether it is relevant. Assessing the performance of systems requires a set of evaluation metrics that make it possible to compare one system with another and to rank results. Developing a test collection requires an investment in data collection, documentation, enhancement, and distribution. Most significantly, maintaining an image reference benchmarking service requires that a community of researchers make a long-term commitment to its use. Without a community vested in the development of the database, and publishing research based on it, the collection remains a chimerical solution to advancing the state of research and improving the retrieval of visual materials in the digital library.
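To make the role of queries, relevance assessments, and evaluation metrics concrete, here is a minimal Python sketch that scores two systems on one query using precision and recall. These two metrics are a common choice in retrieval evaluation, not necessarily the ones the report proposes, and the image identifiers and system results are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: image ids returned by a system (assumed free of duplicates)
    relevant:  image ids judged relevant to the query by assessors
    """
    retrieved = list(retrieved)
    relevant = set(relevant)
    hits = sum(1 for image_id in retrieved if image_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented relevance judgments for a single benchmark query.
relevant = {"img03", "img07", "img12", "img20"}

# Invented result lists from two hypothetical retrieval systems.
system_a = ["img03", "img05", "img07", "img09"]
system_b = ["img03", "img07", "img12", "img01"]

score_a = precision_recall(system_a, relevant)
score_b = precision_recall(system_b, relevant)
print("system A:", score_a)
print("system B:", score_b)
```

Because both systems are scored against the same queries and the same relevance judgments, their numbers are directly comparable, which is the point of a shared benchmark collection.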
Please forward your comments to Jennifer Trant <firstname.lastname@example.org> or to the Council on Library and Information Resources (CLIR) c/o <email@example.com>.
You are asked to share this request for comments widely. The issues cut across many communities, and breadth of interest and commitment is critical if the concept is to be successfully developed.
Excerpts from Recent Press Releases and Announcements
International Internet Preservation Consortium
Announced by Julian Masanes, IIPC Programme Manager.
May 12, 2004 - "In acknowledgement of the importance of international collaboration for preserving internet content for future generations, the International Internet Preservation Consortium was formed in 2003."
"Led by the Bibliothèque nationale de France, the Consortium also comprises the national libraries of Australia, Canada, Denmark, Finland, Iceland, Italy, Norway, Sweden, and the United Kingdom, the Library of Congress (USA), and the Internet Archive."
"The Consortium has identified a number of key objectives which inform and shape its work. These include:
"To achieve these objectives, the Consortium will:
"The detailed work of the Consortium will be carried out through working groups to define Policy, Requirements, Methods, Standards and Tools for Internet archiving. By this means projects will be developed and defined and will ultimately lead to the creation and provision of the necessary tools to fulfil the vision of universal coverage of internet archive collections."
For more information, please see <http://netpreserve.org>.
The Berkeley Electronic Press and Engineering Conferences International Launch Major Research Site
May 5, 2004: "The Berkeley Electronic Press and Engineering Conferences International (ECI) are pleased to announce the launch of an innovative new publication site - The Engineering Conferences International Symposium Series (see http://services.bepress.com/eci). The Series electronically publishes presented papers, peer-reviewed articles, and other materials such as presentations, data sets, and video files associated with ECI conferences. "
"Four conference proceedings, including one refereed volume, have been published to date. Each series contains a wealth of materials presented in association with recently held engineering conferences. Materials are approved by the meeting organizers, and the entire Series is overseen by coordinating editor Franco Berruti, Dean of Engineering at the University of Western Ontario. According to Dr. Berruti, 'ECI conferences produce tremendously important research in both established and emerging engineering fields. Until now, papers from these meetings were difficult to find or slow to appear in the literature. The Engineering Conferences International Symposium Series provides a highly visible platform for this cutting edge research.'"
"The Engineering Conferences International Symposium Series is a joint project of Engineering Conferences International and The Berkeley Electronic Press. It provides conference organizers with a highly visible, rapidly disseminated publication outlet for conference materials. Each conference has its own branded site (with links back to past conferences or to sponsoring bodies, for example) and can customize its policies (e.g., whether materials will be peer reviewed). ECI manages two dozen engineering conferences annually and is the premier organizer of meetings in this field. The Berkeley Electronic Press is a leader in the implementation of innovative academic e-publications, including born-digital journals and institutional repositories. For more information concerning ECI, visit http://www.engconfintl.org. For more information on The Berkeley Electronic Press, visit http://www.bepress.com."
NISO, IMLS Announce Update of Framework for Good Digital Collections
April 30, 2004 - Washington, DC: "A new version of the Framework of Guidance for Building Good Digital Collections is now freely available for download from the National Information Standards Organization (NISO). The Institute of Museum and Library Services (IMLS) transferred maintenance of the Framework to NISO in September 2003; the update is the first product of NISO's advisory group formed to contribute to the document's further development."
"The Framework provides a set of high-level principles for identifying, organizing, and applying existing knowledge and resources to collections of digital resources. For each category of Collections, Objects, Metadata, and Projects, the Framework defines general principles relating to quality and provides a list of supporting resources such as standards, guidelines, best practices, explanations, discussions, clearinghouses, and case studies. Originally prepared in 2001 under the auspices of the IMLS, the Framework has earned wide recognition in the library and museum communities and the endorsement of the Chief Officers of State Library Associations as well as the Digital Library Federation."
"NISO's advisory group that developed the update is composed of experts from the digital resources community: Priscilla Caplan, chair (Florida Center for Library Automation), Grace Agnew (Rutgers University), Rebecca Guenther (Library of Congress), Ingrid Hsieh-Yee (Catholic University), and Leonard Steinbach (Cleveland Museum of Art). The Advisory Group will continue to aggressively reexamine the Framework. Readers are invited to send their comments and suggestions to <firstname.lastname@example.org> on how to improve and expand the Framework."
New UK Centre to Help Make Sense of Text
April 29, 2004 - "Imagine a future in which databases are populated with accurate, valid, exhaustive, rapidly updated data where users find what they want all the time; where drug discovery costs and development time are slashed and animal experimentation is reduced through early identification of unpromising paths; where new insights are gained through integration and exploitation of experimental results, databases, and scientific knowledge; where product development archives and patents yield new directions for R&D; and where searching yields facts rather than documents to read. This is the potential of text mining."
"The JISC, BBSRC and EPSRC announced today funding of some £1m to establish a National Centre for Text Mining. The remit of the Centre, the first publicly funded centre in the world, is to contribute to the associated national and international research agenda, to establish a service for the wider academic community, and to make connections with industry."
"Text mining attempts to discover new, previously unknown information by applying techniques from natural language processing, data mining, and information retrieval:
"Text mining finds applications in many diverse areas of wide interest such as drug discovery and predictive toxicology, protein interaction, competitive intelligence, protection of the citizen, identification of new product possibilities, detection of links between lifestyle and states of health, and many more."
"Led by UMIST, the National Centre for Text Mining will be run by an internationally leading consortium. The consortium has four UK partner institutions: UMIST, the Victoria University of Manchester, the University of Liverpool, and the University of Salford. These core partners are extended by international partners: the University of California Berkeley, the University of Geneva, the San Diego Supercomputing Centre, and the University of Tokyo, with the European Bioinformatics Institute having presence on the Technical Directorate. It is anticipated that the Centre will engage as part of the related emerging networks of excellence."
For more information, please see the full press release at <http://www.jisc.ac.uk/>.
IMLS Updates National Leadership Grant Program: New structure helps libraries and museums better serve their communities
April 26, 2004, Washington, DC - "The Institute of Museum and Library Services (IMLS) has updated its National Leadership Grant program. Program categories have been renamed and clarified to improve cross-agency consistency. The new structure helps build the capacity of libraries and museums to extend learning throughout the lifetime."
"Under the National Leadership Grant, the three categories for museums, three categories for libraries, and one joint category have been streamlined to three parallel categories across the museum and library programs. These are 'advancing learning communities,' 'building digital resources,' and 'research and demonstration.'"
"Dr. Robert Martin, Director of IMLS, said, 'This change will allow IMLS to open up new opportunities as well as encourage the exemplary projects supported in the past.'"
For more information, please see the full press release at <http://www.imls.gov/whatsnew/current/042604.htm>.
Library of Congress Prints and Photographs Division: New Online Collections
April 22, 2004, announcement from Laura Gottsman, Library of Congress.
"The Library of Congress's Prints and Photographs Division is pleased to announce that between January and March 2004, it added thousands of catalog records and images to the Library's Prints and Photographs Online Catalog (PPOC) <http://www.loc.gov/rr/print/catalog.html>, bringing the number of images in the catalog to nearly 1 million."
New materials that will be of interest to a wide variety of researchers include:
National Child Labor Committee Collection (NCLC):
To view or search the collection, go to the Prints and Photographs Online Catalog <http://www.loc.gov/rr/print/catalog.html>, select the blue button labeled: "Search the Catalog," and then scroll down the alphabetical list of collections to "National Child Labor Committee Collection." Further information about the collection may be found at <http://lcweb2.loc.gov/pp/nclchtml/nclcabt.html>.
U.S. News and World Report Magazine Photograph Collection:
The Library of Congress's Prints and Photographs Online Catalog contains catalog records and digital images representing a rich cross-section of still pictures held by the Prints & Photographs Division and other units of the Library of Congress.
For information on new collections and upcoming programs in the Prints and Photographs Division, see the division's "What's New" page <http://www.loc.gov/rr/print/whatsnew.html>.
For questions about PPOC or the holdings and services of the Prints and Photographs Division, consult the division's Ask a Librarian service: <http://www.loc.gov/rr/askalib/ask-print.html>.
Registry of Institutional Open Access Archives
Announced April 22, 2004, by Stevan Harnad.
"Tim Brody has created a Registry of Institutional OA Archives that lists the known archives by Country, Type, and Software (Eprints, Dspace, or other), harvested from celestial."
"But there are more OA Archives out there!
Please register yours, or any you know of at:
"All 182 registered archives can be viewed, or they can be browsed by Country, Archive Type, or Archive Software."
"Suggestions and corrections are welcome:
For more information, please see <http://archives.eprints.org/>.
mod_oai Project Aims at Optimizing Web Crawling
April 21, 2004, Norfolk VA & Los Alamos NM - "The Computer Science Department of Old Dominion University and the Research Library of the Los Alamos National Laboratory announce the launch of the 'mod_oai' project. The aim of the project is to create the mod_oai Apache software module that will expose content accessible from Apache Web servers via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The mod_oai project is generously funded by the Andrew W. Mellon Foundation."
"Apache is an open-source Web server that is used by 63% (approximately 27 million) of the Websites in the world. The OAI-PMH is a protocol to selectively harvest from data repositories. The protocol has had a considerable impact in the field of digital libraries but it has yet to be embraced by the general Web community. The mod_oai project hopes to achieve such broader acceptance by making the power and efficiency of the OAI-PMH available to Web servers and Web crawlers. For example, the planned OAI-PMH interface to Apache Web servers should allow responding to requests to collect all files added or changed since a specified date, or all files that are of a specified MIME-type."
"The Apache Web server defines an extensible module format that allows specific functionality to be incorporated directly into the Web server. The mod_oai project will build such an Apache module that is able to respond to OAI-PMH requests pertaining to files made accessible by the Apache server. The mod_oai module will be developed under the GNU Public License (GPL) and distributed through sourceforge.net upon completion."
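The kind of selective harvesting request described above can be sketched in Python as follows. The `verb`, `metadataPrefix`, `from`, and `set` arguments are standard OAI-PMH request parameters; the server address is hypothetical, and because the announcement does not say how mod_oai would express a MIME-type filter, the set name used here is a made-up placeholder.

```python
from urllib.parse import urlencode


def oai_list_identifiers(base_url, since=None, set_spec=None,
                         metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListIdentifiers request URL.

    since:    datestamp such as "2004-04-01"; only items added or
              changed on or after this date are requested
    set_spec: optional set name used to narrow the harvest
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = since  # "from" is the OAI-PMH argument name
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint: everything changed since 1 April 2004.
changed_url = oai_list_identifiers("http://www.example.org/modoai",
                                   since="2004-04-01")

# Placeholder set name standing in for a MIME-type filter (an assumption).
jpeg_url = oai_list_identifiers("http://www.example.org/modoai",
                                set_spec="mime:image:jpeg")

print(changed_url)
print(jpeg_url)
```

A crawler using requests of this shape would ask the server what has changed instead of re-fetching every page to find out.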
For more information about mod_oai, see <http://www.modoai.org/> or contact Michael Nelson at <email@example.com> or Herbert Van de Sompel at <firstname.lastname@example.org>.
'Turn' the pages of world cultures, science, history - landmark step as Turning the Pages hits the web
April 20, 2004 - "Now anyone, worldwide, with access to the internet will be able to 'turn' the pages of electronic images from these major items in the Library's collection and experience Turning the Pages by use of a computer mouse in a similar way as visitors to the Library's Exhibition Galleries in London can run their fingers over the computer screen and 'turn' images from these works."
"Turning the Pages brings together on the web treasures of several of the world cultures that are represented in Britain today: the Diamond Sutra (Buddhism), Sultan Baybars' Qur'an (Islam), the Golden Haggadah (Judaism), plus the Lindisfarne Gospels, Sherborne Missal, Luttrell Psalter and Sforza Hours (Christianity), along with scientific works (Leonardo da Vinci's Notebook, Elizabeth Blackwell's Herbal and Andreas Vesalius's De Humani Corporis Fabrica, a rare sixteenth-century treatise on anatomy)."
"All ten treasures are celebrated for their superb illustrations which, for example in the case of the Luttrell Psalter, show unique depictions of contemporary fourteenth-century life prior to the devastation caused by pestilence in the 1340s, of interest to historians and art lovers. The British Library has now made images from these freely available to everyone around the globe so that internet users can experience Turning the Pages to the same extent as visitors to the Library."
"Web users can, using a mouse, virtually 'turn' a selection of images from the pages of these rare books and manuscripts in a highly realistic way, using touch-screen technology and animation. They can zoom in on the high-quality digitised images (for example, to see a realistic cat depicted on the opening page of St Luke's Gospel in the Lindisfarne Gospels) and read or listen to notes explaining the significance of each page. There are other features specific to the individual books. In the Leonardo Notebook, for example, a mirror button turns the text round so visitors can read his famous mirror handwriting."
For more information, please see <http://www.bl.uk/cgi-bin/press.cgi?story=1420>.
Cleveland Public Library is First to Lend Popular eBooks for Smartphones and all PDAs
April 20, 2004, Cleveland, Ohio - "When Cleveland Public Librarian Cynthia Orr checks her new Motorola cell phone, in addition to making calls, it's to read new eBook titles available from the CLEVNET Digital Library Connection (http://dlc.clevnet.org). Cleveland Public Library was first among a national network of public libraries to add Mobipocket eBooks to their download libraries. With the free Mobipocket Reader software, patrons can download and read titles on Motorola, Samsung, and Nokia Smartphones, virtually all PDAs, and on personal and notebook computers."
"...Library eBooks are available from dozens of leading publishers including HarperCollins Perfectbound, McGraw-Hill, Time Warner, RosettaBooks, eReads, and Fictionwise. Cleveland Public Library was followed by Cuyahoga County Public Library (OH) (http://ebcd.cuyahogalibrary.org); Essex County Library (UK), (http://essex.bookaisle.com/) and dozens of others that now offer titles in Mobipocket format."
For more information, please contact Jennifer Jackson, OverDrive <email@example.com>.
One million images on PictureAustralia
April 20, 2004 - "If every picture tells a story, then PictureAustralia, the nation's premier image bank, now tells a million as it reaches a major milestone."
"The millionth PictureAustralia image, which comes from the Australian War Memorial's collection, is that of heroic Australian army nurse Vivian Bullwinkel, the sole survivor of the Banka (Sumatra) massacre of World War II."
"PictureAustralia, a collaborative Internet-based service hosted by the National Library, allows users to search the online pictorial collections of many cultural agencies from the one website. The service commenced in 1998 with five participating organisations and 470,000 images; it has grown to 34 organisations and one million images."
For more information, please see the full press release at <http://www.nla.gov.au/pressrel/2004/picture.html>.
Museums and the Web 2004: Best of the Web Winners
April 19, 2004: Announced by Jennifer Trant, Archives & Museum Informatics
"Each year, nominations are solicited from the Museum community and nominated sites are evaluated by a committee of peers. Full details including the list of judges, category definitions, judges' comments and a list of finalists in each category can be found on the Museums and the Web conference site at <http://www.archimuse.com/mw2004/best/>.
The MW2004 Best of the Web are:
Glendale Public Library Service Area Study Successfully Completed: Geographic Information Systems (GIS) Application from CIVIC Technologies, Inc. Provided Key Demographic Data
April 12, 2004 - "The Glendale Public Library and CIVIC Technologies, Inc., a Pasadena, CA-based software solutions provider, announced the successful completion of the Glendale Public Library Service Area Study, a library development strategic plan that recommends the establishment of three new branch libraries that will add approximately 60,000 square feet of library space, a 50 percent increase, to meet the City's growing and diverse population to the year 2025. CIVIC Technologies, who led the consulting team for the study, customized a geographic information systems (GIS) application called the 'Glendale Public Library GIS' that analyzed a diverse range of data (demographic, school, geographic, population growth projections, and library information) in order to help Glendale officials visualize and plan for the future."
"The library services strategy builds upon the strengths of the existing library system such as the large, 92,000 square foot Central Library; recognizes deficiencies such as small branch libraries located on small sites that will not accommodate expansion and the lack of branch service in the most densely populated southeastern part of the City; accounts for current improvements such as the recent opening of Pacific Park Branch Library; takes into account projected population growth; and recommends the preparation of a library facilities master plan to guide implementation. The plan was presented to Glendale City Council at a Study Session in December 2003. Library staff will work with other City Divisions in considering citywide priorities and funding options for the future...."
"...Utilizing the GIS, new library service areas are established to better meet the unique needs of neighborhoods and communities citywide, and will serve as the basis for future library development. GIS is a computer software application that integrates database operations with the visualization and analytical benefits of mapping."
For more information, please contact Cindy Cleary, Glendale Public Library, <firstname.lastname@example.org> or Marc Futterman, CIVIC Technologies, <email@example.com>.
Copyright 2004 © Corporation for National Research Initiatives