W3C

Harvesting RDF Statements from XLinks

W3C Note 21 July 2000

This version:
http://www.w3.org/XML/2000/07/xlink2rdf.htm
Editor:
Ron Daniel Jr. (Metacode Technologies Inc.) <rdaniel@metacode.com>

Abstract

Both XLink [XLink] and RDF [RDF] provide a way of asserting relations between resources. RDF is primarily for describing resources and their relations, while XLink is primarily for specifying and traversing hyperlinks. However, the overlap between the two is sufficient that a mapping from XLink links to statements in an RDF model can be defined. Such a mapping allows XLink elements to be harvested as a source of RDF statements. XLink links (hereafter, "links") thus provide an alternate syntax for RDF information that may be useful in some situations.

This Note specifies such a mapping, so that links can be harvested and RDF statements generated. The purpose of this harvesting is to create RDF models that, in some sense, represent the intent of the XML document. The purpose is not to represent the XLink structure in enough detail that a set of links could be round-tripped through an RDF model.

Status of This Document

This Note is made available by the W3C XML Linking Working Group for discussion only. It has not been reviewed yet. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members.

Please send comments to the authors.

Table of Contents

1 Introduction
    1.1 Terminology
    1.2 Notation and Document Conventions
2 Principles of the Mapping
3 Mapping Specification
    3.1 Synthesizing XPointers
    3.2 Simple Linking Elements
    3.3 Extended XML Links
        3.3.1 arc-Type Element
        3.3.2 locator-Type Element
        3.3.3 resource-Type Element
        3.3.4 title-Type Element
    3.4 Linkbases
4 References


1 Introduction

The XLink specification [XLink] defines ways for XML documents to establish hyperlinks between resources. The Resource Description Framework specification [RDF] provides machine-understandable information about web resources.

Both XLink and RDF provide a way of asserting relations between resources. RDF is primarily for describing resources and their relations, while XLink is primarily for specifying and traversing hyperlinks. However, the overlap between the two is sufficient that a mapping from XLink links to statements in an RDF model can be defined. Such a mapping allows XLink elements to be harvested as a source of RDF statements. XLink links (hereafter, "links") thus provide an alternate syntax for RDF information that may be useful in some situations.

This Note specifies such a mapping, so that links can be harvested and RDF statements generated. The purpose of this harvesting is to create RDF models that, in some sense, represent the intent of the XML document. The purpose is not to represent the XLink structure in enough detail that a set of links could be round-tripped through an RDF model.

Readers of this Note are assumed to be familiar with [XLink] and [RDF]. Terms that are defined in those specifications will not be defined here. Readers should also be familiar with XML Base [XMLBase]. Familiarity with the RDF Schema Candidate Recommendation [RDFSchema] will be necessary for those who wish to make use of the mappings provided here that use RDF Schema Classes.

1.1 Terminology

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [IETF RFC 2119].]

Some special terms are defined here in order to clarify their relationship to similar terms used in the technologies on which the mapping is based. Refer to [XLink] and [RDF] for definitions of other technical terms used here.

[Definition: harvesting]

The process of generating RDF statements from XLink elements.

[Definition: Resource]

A "resource" is anything identified by a URI.

[Definition: Participating resource]

A resource that has been identified in a link to serve as a potential starting or ending point of traversal.

1.2 Notation and Document Conventions

The xlink: and rdf: prefixes are used throughout to stand for the declaration of the XLink and RDF namespaces, respectively, on elements in whose scope the so-marked element or attribute appears (on the same element or on some ancestor element), whether or not a namespace declaration is present in the example. The use of specific namespace prefixes is an editorial convenience; as dictated by the Names in XML Recommendation [XML-Names], any prefix may be used as long as the URI it maps to is the correct one.

2 Principles of the Mapping

Simple RDF statements are composed of a subject, a predicate, and an object. The subject and predicate are identified by URIs, and the object may be a URI or a literal string. To map an XLink link into an RDF statement, we need to be able to determine the URIs of the subject and predicate. We must also be able to determine the object, be it a URI or a literal.

The general principle behind the mapping specified here is that each arc in a link gives rise to one RDF statement. The starting resource of the arc is mapped to the subject of the RDF statement. The ending resource of the arc is mapped to the object of the RDF statement. The arc role is mapped to the predicate of the RDF statement. However, a number of corner cases arise, described in 3 Mapping Specification.

RDF statements are typically collected together into "models." The details of how models are structured are implementation dependent. This Note assumes that harvested statements are added to "the current model," which is the model being constructed when the statement was harvested. But this Note, like [RDFSchema], does not specify exactly how models must be structured.

3 Mapping Specification

The following sections describe the mapping in detail.

3.1 Synthesizing XPointers

RDF is based on the use of URIs for identifying resources. In XLink, the linking element itself (in the case of a simple link) or a subelement of the linking element (in the case of an extended link) often serves as a participating resource in the link. This requires that we be able to define URIs that identify those linking elements. In order that different implementations harvest equivalent RDF statements from an XLink, the procedure in this section should be used when synthesizing XPointers for such linking elements.

The general approach used is for the synthesized XPointer to do element-wise navigation down the tree to reach the linking element. The navigation begins at the nearest identified point in the tree.

More formally, the base of the synthesized URI reference shall be specified as defined in [XMLBase].

Note:

Feedback on whether the synthesized URI references should be required to be absolute, or may be relative, is particularly sought.

The fragment identifier of the synthesized URI reference shall be delimited from the URI by the '#' character, as required by RFC 2396 [RFC 2396].

The fragment identifier of the synthesized URI reference shall be an XPointer [XPTR]. The initial locator term of the XPointer shall be an ID reference to the nearest ancestor of the linking element, including the linking element itself, that bears an attribute of type ID. If no such attribute exists on the linking element or any of its ancestors, the '/' character shall be the first locator term, indicating that navigation shall be from the document element.

Subsequent locator terms shall provide the element type and index of the navigation path down the tree of XML elements to reach the desired element.

As an example, consider a document that contains the following simple link:

In heavy trading, <org
  xlink:type='simple'
  xlink:href="http://www.foo.com/"
  xml:base="http://www.bar.com/report1"
  ID="com231"
>Foo Manufacturing</org> closed sharply ...

The synthesized XPointer for this linking element is:

http://www.bar.com/report1#xpointer(id('com231'))
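
The synthesis procedure can be sketched in Python as follows. This is only an illustration under several assumptions that are not part of this Note: the function name synthesize_xpointer is hypothetical; the base URI is passed in by the caller rather than computed per [XMLBase]; any attribute named "ID" is treated as being of type ID (a real implementation would consult the DTD or schema); and the child steps are written in an XPath-like "/name[index]" form, since this Note does not fix their exact syntax. When no ID is found, the sketch starts the path with the document element's name.

```python
import xml.etree.ElementTree as ET


def synthesize_xpointer(root, target, base_uri, id_attr="ID"):
    """Return a URI reference identifying `target` inside the tree at `root`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    # Walk upward until an element with an ID attribute (or the root) is reached.
    while node.get(id_attr) is None and node in parent_of:
        parent = parent_of[node]
        same_type = [c for c in parent if c.tag == node.tag]
        position = next(i for i, c in enumerate(same_type, 1) if c is node)
        steps.append(f"{node.tag}[{position}]")
        node = parent

    if node.get(id_attr) is not None:
        start = f"id('{node.get(id_attr)}')"   # nearest ID-bearing element
    else:
        start = f"/{node.tag}"                 # assumed form: start at document element

    path = "".join("/" + step for step in reversed(steps))
    return f"{base_uri}#xpointer({start}{path})"


doc = ET.fromstring(
    "<report><p>In heavy trading, <org ID='com231'>Foo Manufacturing</org></p></report>"
)
print(synthesize_xpointer(doc, doc.find(".//org"), "http://www.bar.com/report1"))
```

Because the org element carries its own ID, no child steps are needed and the sketch reproduces the URI reference shown in the example above.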

3.2 Simple Linking Elements

If a simple link's xlink:arcrole attribute has the value "http://www.w3.org/1999/xlink/properties/linkbase", the link shall be harvested according to the procedure described in section 3.4 Linkbases. Otherwise the mapping defined in this section shall be used.

All simple links define zero or one traversal arcs. No traversal arc is specified if the xlink:href attribute is not specified. Therefore, harvesting software shall generate zero or one RDF statements, depending on whether the xlink:href attribute is specified. If it is specified, the single traversal arc shall be harvested to form an RDF statement. The starting resource of the simple link shall be mapped to the subject of the RDF statement. Note that the starting resource of a simple link is the linking element itself. Therefore, the harvesting software must synthesize a URI reference that identifies the linking element. The harvesting software should use the XPointer synthesis procedure specified in section 3.1 Synthesizing XPointers.

The ending resource of the simple link shall be mapped to the object of the RDF statement. Note that the ending resource of a simple link is always a URI reference, provided as the value of the xlink:href attribute.

The value of the xlink:arcrole attribute, if one is given, shall be mapped to the predicate of the RDF statement. Note that the value of the xlink:arcrole attribute is already required, by the XLink specification, to be a URI reference.

If no xlink:arcrole attribute is specified, harvesting software may generate no RDF statement, or it may map the element type of the linking element to the predicate of the RDF statement. This shall only be done if the element type is namespace qualified, so that an absolute URI reference may be constructed from the namespace URI and the local part. In this case the namespace name and the local part are concatenated using the approach documented in [RDF] in order to synthesize the absolute URI reference for the predicate.

If an xlink:role attribute is specified on the simple link, it shall result in at least one additional statement being added to the model. The subject of that statement is the ending resource of the simple link, its predicate is "rdf:type", and its object is the resource identified by the role attribute. Harvesting software may also generate a statement whose subject is the resource identified by the role attribute, whose predicate is "rdf:type" and whose object is the resource "rdfs:Class". This statement shall only be added to the model if an equivalent statement is not already part of the model.
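
The rules for simple links can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function name harvest_simple_link is hypothetical, the subject URI is assumed to have been synthesized already (section 3.1), relative xlink:href values are not resolved, and statements are represented as plain (subject, predicate, object) tuples. The rdf:type statement takes the ending resource as its subject and the role as its object, which is the direction used for locators and resources in sections 3.3.2 and 3.3.3.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
LINKBASE_ARCROLE = XLINK + "/properties/linkbase"


def harvest_simple_link(element, subject_uri):
    """Return the statements harvested from one simple linking element."""
    href = element.get(f"{{{XLINK}}}href")
    arcrole = element.get(f"{{{XLINK}}}arcrole")
    role = element.get(f"{{{XLINK}}}role")

    # No href means no traversal arc; linkbase arcs are handled per section 3.4.
    if href is None or arcrole == LINKBASE_ARCROLE:
        return []

    predicate = arcrole
    if predicate is None and element.tag.startswith("{"):
        # Optional fallback: namespace name + local part of the element type.
        namespace, local = element.tag[1:].split("}", 1)
        predicate = namespace + local

    statements = []
    if predicate is not None:
        statements.append((subject_uri, predicate, href))
    if role is not None:
        statements.append((href, "rdf:type", role))
    return statements
```

An element whose type is not namespace qualified and that has no xlink:arcrole yields no arc statement in this sketch, which corresponds to the "may generate no RDF statement" option.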

An example of such an element is

... In a <x:extRef
  xlink:type="simple"
  xlink:href="http://www.foo.com/papers/crops.txt"
  xlink:arcrole="http://links.org/namespace/cite"
  xlink:role="http://links.org/namespace/screed"
>recent paper</x:extRef>, Dr. Taylor assumes that ...

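Mapping that link according to this specification (and assuming it was the fourth extRef element within the third chap element) results in an RDF model with two statements. The first has the synthesized XPointer for the extRef element as its subject, "http://links.org/namespace/cite" as its predicate, and "http://www.foo.com/papers/crops.txt" as its object. The second has the predicate "rdf:type" and relates "http://www.foo.com/papers/crops.txt" to the role "http://links.org/namespace/screed".

If the arc role had not been specified, then the first statement would either have been omitted or have taken as its predicate the URI reference formed by concatenating the namespace name bound to the x prefix with the local part "extRef".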

3.3 Extended XML Links

We first describe the rules for harvesting the components of an extended link (arcs, locators, and resources). Then we describe the rules for the extended link as a whole.

3.3.1 arc-Type Element

If an arc contains an xlink:arcrole attribute whose value is "http://www.w3.org/1999/xlink/properties/linkbase", it shall be harvested according to the procedure in section 3.4 Linkbases. Otherwise the procedures in this section shall be used.

XLink elements of the arc type use the xlink:to and xlink:from attributes to specify the endpoints of zero or more possible traversals by referencing, not URIs, but rather labels that have been defined in the xlink:label attributes of locator-type and resource-type elements.

The number of RDF statements harvested from a single arc-type element is equal to the number of possible traversals specified by that element. That quantity is the product of the number of resource and/or locator elements identified by the xlink:from attribute and the number identified by the xlink:to attribute. Each RDF statement will correspond to one and only one of the traversals.

The starting resources of the traversals shall be mapped to the subject of the RDF statement(s). The ending resources of the traversals shall be mapped to the object of the RDF statement(s). The value of the xlink:arcrole attribute, if one is specified, shall be mapped to the predicate of each RDF statement.

If no xlink:arcrole attribute is specified, harvesting software may generate no RDF statement, or it may map the element type of the linking element to the predicate of the RDF statement. This shall only be done if the element type is namespace qualified, so that an absolute URI reference may be constructed from the namespace URI and the local part. In this case the namespace name and the local part are concatenated using the approach documented in [RDF] in order to synthesize the absolute URI reference for the predicate.

Note that any element content of an arc is not harvested.
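
The expansion of one arc-type element into statements can be sketched as a cross product. The Python below is an illustration only: the function name harvest_arc and the dictionary that maps each xlink:label value to the URIs of the elements carrying it are hypothetical, and the optional fallback to the element type as predicate is left out. Treating a missing xlink:from or xlink:to as standing for all labels follows the XLink specification rather than this Note.

```python
from itertools import product


def harvest_arc(labelled, from_label, to_label, arcrole):
    """Return one (subject, predicate, object) tuple per traversal of the arc.

    `labelled` maps each xlink:label value to the list of URIs of the
    locator-type and resource-type elements that carry that label.
    """
    def resolve(label):
        if label is None:
            # Per XLink, an absent from/to stands for every label in the link.
            return [uri for uris in labelled.values() for uri in uris]
        return labelled.get(label, [])

    if arcrole is None:
        # This sketch takes the "generate no RDF statement" option.
        return []

    return [
        (start, arcrole, end)
        for start, end in product(resolve(from_label), resolve(to_label))
    ]
```

With two elements labelled "author" and three labelled "paper", an arc from "author" to "paper" yields 2 x 3 = 6 statements, one per traversal.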

3.3.2 locator-Type Element

Each XLink locator-type element gives rise to zero or more statements in the RDF model. The subject of all of those statements is the value of the xlink:href attribute of the locator, except as noted below.

If the locator element provides an xlink:role attribute, one additional statement shall be added to the model. The value of the locator's xlink:href attribute shall be mapped to the subject of the statement. The value of the xlink:role attribute shall be mapped to the object, and the predicate shall be "rdf:type". Harvesting software may generate an additional statement whose subject is the value of the xlink:role attribute, whose predicate is "rdf:type" and whose object is "rdfs:Class". The second statement shall not be added to the RDF model if an equivalent statement already exists in the model.

If the locator element provides an xlink:label attribute, an RDF statement shall be added to the model. The value of the xlink:href attribute shall be mapped to the subject of the statement. The predicate of the statement shall be "xlink:label". The object of the statement shall be the value of the xlink:label attribute.

If the locator element provides an xlink:title attribute, an RDF statement shall be added to the model. The value of the xlink:href attribute shall be mapped to the subject of the statement. The predicate of the statement shall be "xlink:title". The object of the statement shall be the value of the title attribute.

If the locator element contains one or more title elements, they are harvested as described in section 3.3.4 title-Type Element.

3.3.3 resource-Type Element

Each resource (XLink resource-type element) gives rise to zero or more statements in the RDF model. Unless noted otherwise, the subject of all of those statements is the resource element itself, identified by an XPointer synthesized according to the procedure described in section 3.1 Synthesizing XPointers.

If the resource element provides an xlink:role attribute, one RDF statement shall be added to the model, and a second RDF statement may be added to the model. The subject of the first statement is the synthesized URI reference for the resource. The value of the xlink:role attribute is mapped to the object of the statement. The predicate of the statement is "rdf:type". A second statement may be added to the model if the software supports the RDF Schema specification [RDFSchema]. The value of the xlink:role attribute is mapped to the subject of the optional statement. The predicate of the statement is "rdf:type" and the object is "rdfs:Class". The second statement shall not be added to the model if an identical statement already exists in the model.

If the resource element provides an xlink:label attribute, another RDF statement shall be added to the model. The subject of the statement is the synthesized URI reference for the resource. The predicate of the statement is "xlink:label". The object of the statement is the value of the label attribute.

If the resource element provides an xlink:title attribute, another RDF statement shall be added to the model. The subject of the statement is the synthesized URI reference for the resource. The predicate of the statement is "xlink:title". The object of the statement is the value of the title attribute.

If the resource element contains one or more title elements, they are harvested as described in section 3.3.4 title-Type Element.
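
The attribute rules for locator-type elements (section 3.3.2) and resource-type elements (this section) differ only in how the subject is chosen: the xlink:href value for a locator, a synthesized XPointer for a resource. Under that reading, both can be sketched with one Python function. The name harvest_participant and its parameters are hypothetical, predicates are kept as the prefixed names used in this Note, and title-type child elements are not handled.

```python
def harvest_participant(subject_uri, role=None, label=None, title=None,
                        model=(), use_rdf_schema=False):
    """Return the new statements for one locator- or resource-type element.

    `model` holds the statements already harvested; it is consulted only to
    avoid adding a duplicate rdfs:Class statement.
    """
    new = []
    if role is not None:
        new.append((subject_uri, "rdf:type", role))
        class_statement = (role, "rdf:type", "rdfs:Class")
        # Optional second statement, only for software that supports RDF Schema.
        if use_rdf_schema and class_statement not in model:
            new.append(class_statement)
    if label is not None:
        new.append((subject_uri, "xlink:label", label))
    if title is not None:
        new.append((subject_uri, "xlink:title", title))
    return new
```

An element with none of the three attributes yields an empty list, matching the "zero or more statements" wording.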

3.3.4 title-Type Element

XLink title-type elements have an XLink-defined meaning only if they appear as a child element within an extended, locator, or resource element.

If an XLink extended-, locator-, or resource-type element contains one or more title-type elements, one RDF statement shall be added to the model for each title element. The subject of each statement shall be either the value of the xlink:href attribute (in the case of a locator element) or a synthesized XPointer identifying the extended or resource element. The predicate of each statement shall be "xlink:title". For each RDF statement, the object of the statement shall be a synthesized XPointer identifying the title element. (Identifying the title element, rather than just its content, allows attributes such as xml:lang to be captured along with the title.)

As an example, consider the following fragment of an extended link:

<annotation xlink:type='extended' ID='genid22'>
  <caption xlink:type='title' ID='genid23'>Recent comments</caption>
  <link xlink:type='arc' ...

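The RDF statement harvested from the title has the synthesized XPointer for the annotation element (with fragment identifier xpointer(id('genid22'))) as its subject, "xlink:title" as its predicate, and the synthesized XPointer for the caption element (with fragment identifier xpointer(id('genid23'))) as its object.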

3.4 Linkbases

A linkbase is an XML document that contains one or more extended links. It functions like a "database" of links. A linkbase arc is an XLink element (simple- or arc-type) whose xlink:arcrole attribute takes the value "http://www.w3.org/1999/xlink/properties/linkbase". The ending resource of a linkbase arc is a linkbase.

When harvesting software encounters a linkbase arc, it shall not generate an RDF statement for the arc. It should traverse the arc to retrieve the linkbase, and harvest the links from the linkbase to add to the current model using the methods specified in this Note.

Note:

Different applications might make different tradeoffs on depth of traversal in light of varying network conditions. This Note does not mandate specific behavior, but does recommend that all harvesting applications attempt to obtain at least the immediately referenced linkbase.
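
One way an application might bound linkbase traversal is sketched below in Python. Everything here is an assumption made for illustration: the function name harvest_with_linkbases, the max_depth parameter, and the caller-supplied load callable, which is presumed to retrieve and parse a document and return both its harvested statements and the ending resources of its linkbase arcs. The default depth of 1 corresponds to obtaining only the immediately referenced linkbases.

```python
def harvest_with_linkbases(uri, load, max_depth=1, seen=None):
    """Harvest `uri` and follow its linkbase arcs up to `max_depth` levels.

    `load(uri)` is a hypothetical callable returning a pair
    (statements, linkbase_uris) for the document at `uri`.
    """
    seen = set() if seen is None else seen
    if uri in seen:
        # Already harvested; this also guards against cyclic linkbase references.
        return []
    seen.add(uri)

    statements, linkbase_uris = load(uri)
    model = list(statements)  # no statement is generated for the linkbase arc itself
    if max_depth > 0:
        for linkbase_uri in linkbase_uris:
            model.extend(
                harvest_with_linkbases(linkbase_uri, load, max_depth - 1, seen)
            )
    return model
```

Passing the retrieval step in as a callable keeps the traversal policy separate from network access, so an application can substitute caching or timeouts without changing the depth logic.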

4 References

XLink
Steve DeRose, Eve Maler, David Orchard, and Ben Trafford, editors. XML Linking Language (XLink). World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xlink.)
XML-Names
Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. Textuality, Hewlett-Packard, and Microsoft. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names.)
IETF RFC 2119
S. Bradner, editor. Key words for use in RFCs to Indicate Requirement Levels. March 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
RDF
Ora Lassila and Ralph Swick, editors. Resource Description Framework (RDF) Model and Syntax Specification. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-rdf-syntax.)
XPTR
Ron Daniel, Steve DeRose, and Eve Maler, editors. XML Pointer Language (XPointer) V1.0. Metacode Technologies, Brown University, and Sun Microsystems. Burlington, Seekonk, et al.: World Wide Web Consortium, 1998. (See http://www.w3.org/TR/xptr.)
RFC 2396
RFC 2396. More info to be inserted (See .)
RDFSchema
RDF Schema spec, more info TBD. (See .)
XMLBase
Jonathan Marsh, editor. XML Base (XBase). World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xmlbase.)