Meeting Report of the NISO Linking Workshop
February 11, 1999, Washington D.C.
by Mark Needleman, with contributions from
the Workshop Steering Committee

I. Background
II. Introduction
III. The Problem from Various Points of View
IV. Some Current Solutions
V. Discussion and Conclusions

I. Background

The National Information Standards Organization (NISO), the Digital Library Federation (DLF), the National Federation of Abstracting and Information Services (NFAIS), and the Society for Scholarly Publishing (SSP) sponsored a one-day invitational workshop to discuss issues surrounding reference linking, specifically linking from citations to electronic journal literature. The meeting proposal and scope statement described the purpose of the meeting as follows:

End users in search of information want to go directly from a citation in electronic form to the cited journal or journal article in electronic form. The citation may appear in any number of places, including an online catalog, an online index, or among the references in an online text. In the simplest case, the user may achieve a link by clicking on the citation and connecting to a document located on a web page identified by a URL. Increasingly, however, the simplest case does not apply. The identifier embedded in the citation may be old and out of date. The cited object may be behind a firewall or available only through an online service which uses purely internal identifiers. The identifier may be an indirect reference, such as a Digital Object Identifier (DOI), which requires access to a resolver service. To further add to the complexity, a number of copies and/or versions of the cited object may exist, forcing the user to discover which ones he or she is authorized to use.

As multiple solutions emerge to provide linkages from citations to digital objects, the linking problem becomes more complex and of growing concern to a broad set of interested parties: scholars and researchers, who expect seamless access to network-accessible materials; authors and publishers, who want wide awareness and use of their intellectual property; repositories and vendors, which require mechanisms to facilitate linking access both into and out of their systems; and libraries, which must provide and manage a wide range of search tools and information sources containing links and serve those who use them.

The purpose of the workshop on linkage from citations to electronic journal literature is to bring together publishers, librarians, abstracting and indexing services, repositories, vendors of information services, and end users, in order to open a dialog on the issues. The workshop will begin with prepared statements of the problem from the perspectives of the library, publisher, abstracting and indexing service, and end user. This will be followed by presentations of some current approaches to linking taken by selected vendors and organizations. Participants can explore emerging solutions in the light of the requirements of various communities. This workshop will attempt to build a common awareness of a broad range of needs and to improve understanding of the strengths and limitations of current approaches. In addition, the workshop will seek to identify and stimulate actions needed to improve the facilities for linking citations and digital objects in the digital environment.

II. Introduction

Don Waters, the meeting facilitator, posed three questions to the group:

III. The Problem from Various Points of View

The end-user perspective: Dr. Peter Boyce

Dr. Peter Boyce, representing the American Astronomical Society, addressed the end user perspective. Boyce made the point that journal information is just one component of the scholarly information used in astronomy research. Other components include historical literature, searchable abstracts, published data tables, raw observations, and refined and calibrated data. Users of astronomy information have rated links amongst data as the most important feature of an information system. Boyce went on to describe one of the major components of the linking system in astronomy, an identifier known as the Bibcode. The Bibcode, which has been in use for fifteen years, is assigned via a distributed system with no central authority. Dr. Boyce stressed the importance of the human interface in identification systems. The Bibcode is "smart", that is, it is derived from bibliographic data and can function as a brief citation. However, because it uses a volume/page/year format, it is reaching the end of its useful lifespan, as information published electronically does not always have page numbers.
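
To make the "smart", volume/page/year character of the Bibcode concrete, here is a minimal Python sketch that assembles one from bibliographic fields. It assumes the commonly documented 19-character layout (4-digit year, 5-character journal abbreviation, 4-character volume, 1-character qualifier, 4-character page, initial of the first author), padded with dots; the function name `make_bibcode` is hypothetical and the sketch ignores special cases such as very long page numbers. The hard dependence on a page field shows why the scheme strains when electronic articles have no page numbers.

```python
def make_bibcode(year: int, journal: str, volume: str, page: str,
                 first_author: str, qualifier: str = ".") -> str:
    """Assemble a Bibcode-style identifier under an assumed 19-character layout.

    Assumed layout: YYYY JJJJJ VVVV Q PPPP A
      - journal abbreviation: left-justified, dot-padded to 5 characters
      - volume and page: right-justified, dot-padded to 4 characters
      - qualifier: one character (e.g. "L" for a letter), "." if none
      - final character: initial of the first author's surname
    """
    if not page:
        # No page number means the page field cannot be filled: the weakness
        # the report notes for electronically published articles.
        raise ValueError("a volume/page/year identifier needs a page number")
    if len(journal) > 5 or len(volume) > 4 or len(page) > 4 or not first_author:
        raise ValueError("field does not fit this simplified layout")
    return (
        f"{year:04d}"
        + journal.ljust(5, ".")
        + volume.rjust(4, ".")
        + (qualifier or ".")[0]
        + page.rjust(4, ".")
        + first_author[0].upper()
    )


# Illustrative fields: 1970, "ApJ", volume 161, letter qualifier "L", page 77.
print(make_bibcode(1970, "ApJ", "161", "77", "K", qualifier="L"))
# 1970ApJ...161L..77K
```

Because every field comes from the printed citation, a reader can often reconstruct the identifier by hand, which is the human-interface advantage Boyce stressed.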

Boyce described some of the desired features in linking systems.

From the end user point of view, the system should provide:
•seamless and painless interconnectivity. User authorizations for access should be passed on automatically without intervention by the user.
•The system needs to provide several types of links and different levels of granularity, including queryable as well as static links.

From the publisher perspective the system should:
•use standardized identifiers
•provide automatic assignment of identifiers
•be able to handle links in the pre-publication phase.

In conclusion, Boyce advised against waiting for the perfect system to emerge. He stressed the need to achieve reasonable levels of interoperability, to provide a system that handles both the assignment of identifiers and a query mechanism, and to work with all concerned parties to ensure that any system addresses a wide range of issues and concerns, not just a limited subset for a single audience.

The library perspective: Dale Flecker

Dale Flecker (Harvard University) addressed linking from the library perspective. Flecker reminded the group that a user on the network using "any old system" wants to link directly from a found citation to the information resource it points to. To the user it does not matter if the citation is in a commercial abstracting and indexing service, an electronic reserves system, an electronic bibliography, another ejournal article or any other source. Solutions must be generally useful, and not limited to any single, specific environment.

Flecker identified five major needs that any linking solution must encompass:

  1. Libraries have many suppliers for both abstracting and indexing services and for electronic journals; there are hundreds of sources today and there will be thousands tomorrow. Solutions must work in an environment where the resources licensed and the services used are constantly changing.
  2. Library collections are intended for long-term use. Links must be designed to work over timeframes of 25 or 50 years or more.
  3. In many cases there will be multiple copies of the same article available. For example, an Elsevier journal may be available in Science Direct, in Michigan's PEAK database, through OhioLink, etc. Many legitimate reasons for multiple copies exist, including performance (caching), different service models, archival needs, and competition. The system must be able to find the right copy for the user, which, in the end, may have more to do with who the user is than what the journal is.
  4. Delivery formats for a given article will vary with technological change and the environment of the user. Linking should be to a given content, not to a given format.
  5. The linking system should take the user directly to the content, not to the superstructure (such as the publisher's search system) in which the content might reside.

The publisher's perspective: Pieter Bolman

Pieter Bolman (Academic Press) discussed linking from the publisher perspective. Bolman made the point that if making information available electronically could be thought of as the first revolution for publishers, linking was the second. In fact, making information available electronically had created user demand for immediate access, which raises the whole linking issue. Bolman described the enormous size of the problem, estimating that about 1 million journal articles are published yearly, with an average of 20 cited references per article. He discussed various forms of linking, such as linking from an abstracting database to the content, or linking directly from one journal article to another. He then reviewed some of the issues involved in linking, such as versioning, links to unofficial or preprint material, access and privacy issues, and the question of granularity. Finally, Bolman stressed the need for cooperation among all affected parties and the importance of standards.
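
Multiplying Bolman's two estimates gives a rough sense of how many cited references would need links each year; this is simply the product of the quoted figures, not a separate number reported at the workshop:

```latex
10^{6}\ \tfrac{\text{articles}}{\text{year}} \times 20\ \tfrac{\text{references}}{\text{article}} = 2 \times 10^{7}\ \tfrac{\text{references}}{\text{year}}
```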

The abstracting and indexing service providers' perspective: Helen Atkins

The final presentation in the initial set was given by Helen Atkins (Institute for Scientific Information), who discussed the problem from the abstracting and indexing (A&I) service perspective. Atkins described some of the types of links being made in A&I databases, such as links to full text, patents, preprints and websites. She also described some of the identifiers, in use or under development, that are used for linking. These include the Uniform Resource Name (URN), the Serial Item and Contribution Identifier (SICI), the Publisher Item Identifier (PII), and the Digital Object Identifier (DOI) system. She discussed some characteristics of linking systems:

Atkins stressed the need for cooperation between original publishers and secondary content providers. She also discussed some of the problems that content providers such as A&I vendors have to deal with, including the data entry problem of getting links built in, the problems associated with unreliability of links that may change or no longer point at content, and the problem for A&I vendors of deciding what to index when a publication may exist in multiple formats (print and electronic) where the content may not be the same in both formats.

IV. Some Current Solutions

The first four presentations, having attempted to address the issues and lay out the scope of the problem, were followed by a series of presentations describing current work in the area of linking and linking systems.

The DOI: Norman Paskin

Norman Paskin, the Director of the International Digital Object Identifier Foundation (IDF), discussed the Digital Object Identifier (DOI) System and issues surrounding it. He began by recapping some of the problems raised in the previous presentations and went on to discuss the URN work in the Internet community. He described the DOI system and the importance of structured metadata to that system. In order for the DOI system to function, DOIs will need to be accompanied by good descriptive metadata, so the IDF is focusing on developing a metadata schema. The model defines a core or "key" set of metadata elements required for all resource types, supplemented by a number of compulsory extensions for particular resource types or "genres", such as STM journals or technical papers. The framework is being based on work done in the Interoperability of Data in E-commerce Systems (INDECS) initiative, with the intent that the underlying data model be mappable to other models such as the International Federation of Library Associations (IFLA) data model and the Dublin Core. The metadata will be expressed using the Resource Description Framework (RDF), with the Extensible Markup Language (XML) as its syntax.
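
The kernel-plus-extensions structure Paskin described can be illustrated as a simple completeness check on a metadata record. This is a minimal Python sketch under stated assumptions: the element names, the genre labels, and the function `missing_elements` are hypothetical placeholders, since the report does not list the actual elements of the IDF schema.

```python
from typing import Dict, Set

# Hypothetical kernel: elements every record must carry, whatever its genre.
KERNEL_ELEMENTS: Set[str] = {"identifier", "title", "creator", "genre"}

# Hypothetical compulsory extensions for particular genres.
GENRE_EXTENSIONS: Dict[str, Set[str]] = {
    "stm_journal_article": {"journal_title", "issn", "volume", "issue", "first_page"},
    "technical_paper": {"report_number", "issuing_body"},
}


def missing_elements(record: Dict[str, str]) -> Set[str]:
    """Return the required elements that a metadata record lacks.

    Every record needs the kernel; a record that names a genre must also
    supply that genre's compulsory extension elements.
    """
    required = set(KERNEL_ELEMENTS)
    genre = record.get("genre")
    if genre:
        if genre not in GENRE_EXTENSIONS:
            raise ValueError(f"unknown genre: {genre!r}")
        required |= GENRE_EXTENSIONS[genre]
    # Elements with empty values count as missing.
    present = {name for name, value in record.items() if value}
    return required - present


record = {
    "identifier": "10.1000/example-1",  # illustrative DOI-style string
    "title": "An Example Article",
    "creator": "A. Author",
    "genre": "stm_journal_article",
    "journal_title": "Journal of Examples",
    "volume": "12",
}
print(sorted(missing_elements(record)))  # ['first_page', 'issn', 'issue']
```

Keeping the kernel small and pushing genre-specific detail into extensions is what would let such a model map onto other schemes, such as Dublin Core, at the kernel level.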

UMI SiteBuilder: Todd Fagan

Todd Fagan (UMI) described the ProQuest system, which allows searching and retrieval of articles from over 5,000 publications, with full text and/or page images available for many of them. He then described UMI's SiteBuilder software, a set of tools that allows a customer to link into ProQuest from external web pages and applications. Links can be made to journal titles, issues, or individual articles, or can be set up to execute predefined searches giving dynamic result sets. Links used in SiteBuilder contain parameters that define the function requested and provide for user authorization into the system. They may also include mechanisms for expiring links and for encryption. Custom searching, electronic reserve rooms and course packs are applications that can be built using SiteBuilder.
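
Links that carry their own authorization and expiry are commonly built by signing the link parameters with a key shared by the customer and the service. The following Python sketch illustrates that general technique with an HMAC signature; it is not UMI's actual SiteBuilder link format, and the endpoint, parameter names (`fn`, `target`, `acct`, `exp`, `sig`) and key are hypothetical. It signs rather than encrypts: the parameters remain readable, but they cannot be altered, or the expiry extended, without invalidating the signature.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlparse

SECRET = b"key-shared-by-customer-and-service"   # hypothetical shared key
BASE_URL = "https://service.example/link"        # hypothetical endpoint


def _signature(params: dict) -> str:
    # Sign the parameters in sorted order so the same set always gives the same signature.
    message = urlencode(sorted(params.items())).encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def make_link(function: str, target: str, account: str,
              ttl_seconds: int, now: Optional[int] = None) -> str:
    """Build a link naming the requested function, the target, the customer
    account and an expiry time, plus a signature over all of them."""
    now = int(time.time()) if now is None else now
    params = {"fn": function, "target": target, "acct": account,
              "exp": str(now + ttl_seconds)}
    params["sig"] = _signature(params)
    return f"{BASE_URL}?{urlencode(params)}"


def check_link(url: str, now: Optional[int] = None) -> bool:
    """Accept a link only if its parameters are unaltered and it has not expired."""
    now = int(time.time()) if now is None else now
    params = dict(parse_qsl(urlparse(url).query))
    sig = params.pop("sig", "")
    if not hmac.compare_digest(sig.encode(), _signature(params).encode()):
        return False  # parameters changed, or signed with a different key
    return now <= int(params["exp"])  # reject links past their expiry time
```

In this arrangement a customer's web page would embed the output of `make_link`, and the service would run `check_link` before honouring the request.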

Dawson Information Quest: Mary Grace Palumbo

Mary Grace Palumbo (Dawson Information Quest) described Dawson's Information Quest system, which provides access to material held internally at Dawson and serves as a gateway to material at other sites, including publisher sites and other information providers such as A&I services. She described the tension between the ideal goal of building systems where everything links to everything seamlessly and the real-world problems of building links, communicating with other sources and using identifiers. Palumbo discussed the need for standard link identifiers, so that customized links to individual publishers become unnecessary. She also touched on other issues such as identifying multiple data sources, version control, and dealing with multiple languages and formats. She made the point that communication and discussion among all parties involved are needed, and that identifiers are a major key in the linking process.

National Center for Biotechnology Information: Jim Ostell

Jim Ostell of the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) described some of the systems and work being done at NLM on linking. He discussed the Entrez retrieval system, which provides multidimensional linking between bibliographic citation databases and genetic sequence and molecular biology databases. He distinguished between MEDLINE, PubMed and PubRef, three services which are often confused:
•MEDLINE selectively indexes journals to include biomedical literature only.
•PubMed includes all MEDLINE records as well as prepublication indexing and comprehensive indexing of all journals included in MEDLINE, reference linking, and the Entrez retrieval system.
•PubRef offers PubMed content plus additional, non-biomedical journals, but does not include Entrez.

PubMed and PubRef include a citation matching service which allows publishers and other users to match their citations to PubMed entries using standard bibliographic information. This allows publishers to easily link the references in their published articles to entries in PubMed and PubRef. Because of PubRef's ability to do citation matching and return a unique PubRef identifier, PubRef came up in the general discussion which followed as a possible system that could be used to prototype a more general citation-to-unique-identifier resolution system. Finally, Mr. Ostell described ongoing development work with a system called LinkOut, which allows customization of what links are returned when multiple links exist for the same item, so that users can be sent to a particular version of a resource based on a variety of criteria.

SLinkS: Eric Hellman

Eric Hellman (Openly Informatics) described the SLinkS (Scholarly Links Specification) framework, which enables publishers of scholarly journals to specify and communicate the information that other publishers need to build links to their material. SLinkS is a database of journals; for every journal, it stores information about how to construct links to that journal's content. SLinkS also provides linking metadata, so that publishers can filter the types of links they want to include and users can determine what sort of information they will get if they follow a link. SLinkS is intended to be an openly available specification, allowing anyone to develop software and services based on it.
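
The idea of a per-journal link specification can be sketched as a table of URL templates plus link metadata that a citing publisher fills in and filters. This is a minimal Python sketch: the record layout, the journal key `jex`, the metadata fields `type` and `free`, and the example.com URLs are hypothetical, since the report does not describe the actual SLinkS format.

```python
from string import Template
from typing import Dict, Iterable, List, Optional

# Hypothetical link specifications, keyed by a journal code.
# Each journal lists the links that can be built to its articles.
JOURNALS: Dict[str, dict] = {
    "jex": {
        "title": "Journal of Examples",
        "links": [
            {"type": "abstract", "free": True,
             "template": "https://publisher-a.example/abs/$volume/$page"},
            {"type": "fulltext", "free": False,
             "template": "https://publisher-a.example/full/$volume/$page"},
        ],
    },
}


def build_links(journal: str, volume: str, page: str,
                wanted_types: Optional[Iterable[str]] = None) -> List[dict]:
    """Fill in a journal's link templates, keeping only the wanted link types.

    The returned metadata ("type", "free") lets the citing publisher filter
    links and tells the user what following each link will give them.
    """
    spec = JOURNALS.get(journal)
    if spec is None:
        return []  # journal not in the database: no link can be built
    wanted = None if wanted_types is None else set(wanted_types)
    links = []
    for link in spec["links"]:
        if wanted is not None and link["type"] not in wanted:
            continue
        url = Template(link["template"]).substitute(volume=volume, page=page)
        links.append({"type": link["type"], "free": link["free"], "url": url})
    return links
```

For instance, a publisher that only wants to offer free abstracts would call `build_links("jex", "12", "345", ["abstract"])` for each matching reference and receive a single abstract link.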

V. Discussion and Conclusions

Following these presentations, the remainder of the meeting was given over to general discussion.

Major themes and points that came up included:

In the end a consensus emerged that the problem of linking could be broken into three related, yet distinct, components:

The identifiers. Journal articles need identifiers that identify the work at the level of its intellectual content. Manifestations and copies may need to be identified separately, but for reference linking the work is the key. The group recognized that distinguishing between a work and its manifestations is a thorny problem, but concluded that this is a relatively well-understood requirement. Systems that provide identification, such as the PubMed identifier, are in place or being developed, and perhaps all that is needed is to allow these systems and their accompanying infrastructure to develop.

Metadata and lookup systems. A critical problem for linking systems is how to find the identifier for a resource when only the citation for it is known. To solve this problem, metadata that describes the resource must accompany the identifier; one open question is how much metadata is necessary. Lookup mechanisms are needed to take information from a citation, match it against the metadata accompanying the identifiers, and return the associated identifier. These mechanisms need to be invocable both in batch mode, as articles with citations are being published, and in interactive mode, for individual citations. The PubRef system provides a working model of such a lookup mechanism, and the IDF is planning to develop a similar mechanism. The group felt that further prototyping and experimentation are necessary to get a sense of what works and what the operational considerations of running such systems might be.
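
A lookup mechanism of this kind can be sketched as an index from normalized citation fields to identifiers, with one entry point for interactive use and one for batch use. This is a minimal Python sketch assuming exact matching on journal name, volume, first page and year after light normalization; the class, function and field names are hypothetical, and production matching services such as PubRef's presumably tolerate far more variation in citation data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Citation:
    journal: str
    volume: str
    first_page: str
    year: str


def match_key(c: Citation) -> Tuple[str, str, str, str]:
    # Journal names are cited inconsistently ("J. Example Res." vs "J Example Res"),
    # so drop case, punctuation and spaces before comparing.
    journal = "".join(ch for ch in c.journal.lower() if ch.isalnum())
    return (journal, c.volume.strip(), c.first_page.strip(), c.year.strip())


class LookupService:
    """Maps citation metadata to identifiers (hypothetical, exact-key matching)."""

    def __init__(self, records: Iterable[Tuple[Citation, str]]):
        # Each record pairs the metadata deposited with an identifier and the identifier.
        self._index = {match_key(meta): ident for meta, ident in records}

    def lookup(self, citation: Citation) -> Optional[str]:
        """Interactive mode: one citation in, its identifier (or None) out."""
        return self._index.get(match_key(citation))

    def lookup_batch(self, citations: Iterable[Citation]) -> List[Optional[str]]:
        """Batch mode: e.g. every reference in an article at publication time."""
        return [self.lookup(c) for c in citations]
```

In batch mode a publisher would pass the whole reference list of a new article; unmatched references come back as `None` and could be set aside for manual review.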

Resolution of identifier to actual content. This area encompasses the problem of how to take an identifier for abstract content, or the work, and resolve it to the identifier or location of the particular manifestations or copies to which a user has access. Among the difficulties in constructing a resolution system is understanding who a user is, with what organizations a user is affiliated, to what manifestations or copies of the content (and under what arrangements) those institutions may have access, where the user is located, how performance expectations affect the specific content selected for use, what formats and versions of content the user is capable of viewing, how much the user is willing to pay, and similar issues. It was felt that the number of variables made the problem of resolving identifiers to content too difficult for concrete solutions to be developed at this point. In order to develop models and solutions, a detailed walk-through and scoping of the problem and its parameters is required.
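
To illustrate why resolution depends as much on the user as on the work, here is a minimal Python sketch that resolves a work identifier to one copy using only three of the variables listed above: institutional affiliation, viewable formats, and willingness to pay. The registry layout, the class names, and the selection rule (cheapest accessible copy, then the user's preferred format) are hypothetical; the full problem also involves location, performance expectations and versions, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Copy:
    location: str                  # where this manifestation lives (hypothetical URL)
    fmt: str                       # delivery format, e.g. "pdf" or "html"
    licensed_to: frozenset         # institutions with a licence for this copy
    price: Optional[float] = None  # pay-per-view price; None means not for sale


@dataclass(frozen=True)
class User:
    affiliations: frozenset        # institutions the user belongs to
    viewable_formats: tuple        # formats the user can view, most preferred first
    max_price: float = 0.0         # most the user is willing to pay


def resolve(work_id: str, registry: Dict[str, List[Copy]], user: User) -> Optional[Copy]:
    """Pick one accessible copy of a work for this user, or None."""
    candidates = []
    for item in registry.get(work_id, []):
        if item.fmt not in user.viewable_formats:
            continue                          # user cannot view this format
        if item.licensed_to & user.affiliations:
            cost = 0.0                        # covered by an institutional licence
        elif item.price is not None:
            cost = item.price                 # available, but only by paying
        else:
            continue                          # no arrangement gives this user access
        if cost > user.max_price:
            continue                          # more than the user will pay
        preference = user.viewable_formats.index(item.fmt)
        candidates.append((cost, preference, item))
    if not candidates:
        return None
    # Cheapest first, then the user's preferred format.
    return min(candidates, key=lambda entry: (entry[0], entry[1]))[2]
```

Under this rule, a user affiliated only with a university that licenses an HTML copy is sent to that copy, even if a PDF copy licensed to some other institution exists; the same identifier resolves differently for different users.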

The meeting concluded with agreement that a subset of those present at the Workshop would be asked to walk through the complex resolution issues and prepare a report for the group. Norman Paskin agreed to activate a listserv to give the members of the group a forum for continuing discussions on this important topic.

Next Steps: Reference Linking Working Group Set-up



Copyright © 1999 National Information Standards Organization