#rdfms-uri-substructure from Sergey Melnik on 2001-07-20 (w3c-rdfcore-wg@w3.org from July 2001)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Fri, 20 Jul 2001 11:41:36 -0700
To: RDFCore WG <w3c-rdfcore-wg@w3.org>
Message-ID: <3B587B60.A4464FDE@db.stanford.edu>
With this posting I'd like to open the discussion of the issue
http://www.w3.org/2000/03/rdf-tracking/#rdfms-uri-substructure and
provide some reference material for the upcoming F2F.

The issue is whether namespaces are part of the formal model /
abstract syntax, or just an abbreviation mechanism used in the RDF/XML
serialization.

Analysis of M&S
---------------

Let me start with summarizing what M&S says about namespaces:

        RDF also requires the XML namespace facility
        (http://www.w3.org/TR/REC-xml-names/) to precisely associate
        each property with the schema that defines the property; see
        Section 2.2.3., Schemas and Namespaces.

Remark: the XML namespace recommendation treats namespaces as integral
parts of qualified names (QNames). In many XML parsers and tools,
QNames are implemented as pairs of strings that are accessible in
APIs. In particular, DOM Level 2
(http://www.w3.org/TR/DOM-Level-2-Core/core.html#Namespaces-Considerations)
identifies elements and attributes by their namespaceURI and
localName. Furthermore, namespaces are explicitly included in the XML
Infoset model (http://www.w3.org/TR/xml-infoset/).

Continuing with M&S:

        Property names must be associated with a schema. This can be
        done by qualifying the element names with a namespace prefix
        to unambiguously connect the property definition with the
        corresponding RDF schema or by declaring a default namespace
        as specified in [NAMESPACES]. [...]

        Namespaces are simply a way to tie a specific use of a word in
        context to the dictionary (schema) where the intended
        definition is to be found. In RDF, each predicate used in a
        statement must be identified with exactly one namespace, or
        schema.

Aha! Namespaces (at least those of properties) *must* identify
schemas. These schemas *must* (or *should*?) contain the definitions
of the corresponding vocabulary elements.

        Each propertyElt E contained by a Description element results
        in the creation of a triple {p,r,v} where (1) p is the
        expansion of the namespace-qualified tag name (Generic
        Identifier) of E. This expansion is generated by concatenating
        the namespace name given in the namespace declaration with the
        LocalPart of the qualified name.

Look here: p is a concatenation of the QName namespace and its local
part. By doing so namespaces are dropped and won't appear in the
model.
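
To make the consequence concrete, here is a minimal Python sketch of
the expansion rule quoted above. The function name expand() and the
example.org namespace names are assumptions made for illustration;
neither comes from M&S.

```python
def expand(namespace_name: str, local_part: str) -> str:
    """Expand a QName the way M&S describes: plain string concatenation."""
    return namespace_name + local_part


# Two different (namespace name, local part) pairs; both are hypothetical.
pair_a = ("http://example.org/vocab/", "MyTitle")
pair_b = ("http://example.org/vocab/My", "Title")

uri_a = expand(*pair_a)
uri_b = expand(*pair_b)

# The pairs differ, yet the expanded property URIs are identical, so the
# original split point cannot be recovered from the URI alone.
print(pair_a == pair_b)  # False
print(uri_a == uri_b)    # True
```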

	(1) p is the expansion of the namespace-qualified attribute
	name of A. This expansion is generated by concatenating the
	namespace name given in the namespace declaration with the
	LocalPart of the qualified name and then resolving this URI
	according to the algorithm in Section 5.2., Resolving Relative
	References to Absolute Form, in [URI].

The above refers to relative URIs. Notice that DOM Level 2 and Infoset
cautiously avoid dealing with relative URIs in namespaces!
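
As a rough illustration of what that resolution step involves, the
sketch below concatenates a relative namespace name with a local part
and resolves the result against a base URI. It uses Python's
urllib.parse.urljoin, which follows the later RFC 3986 rules rather
than the [URI] text that M&S cites, so treat it as an approximation.
The base URIs and the namespace name are made up.

```python
from urllib.parse import urljoin

# Hypothetical relative namespace name and local part.
namespace_name = "schema#"
local_part = "title"

# Step 1: concatenate namespace name and local part.
relative_ref = namespace_name + local_part

# Step 2: resolve the relative reference against the document's base URI.
property_uri = urljoin("http://example.org/docs/data.rdf", relative_ref)
print(property_uri)

# The same document served from another (hypothetical) location
# yields a different property URI.
moved_uri = urljoin("http://example.com/other/data.rdf", relative_ref)
print(moved_uri)
```

With a relative namespace name, the identity of the property ends up
depending on where the document lives, which may be one reason the
other specs steer clear of the case.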

        Note: Schema developers may be tempted to declare the values
        of certain properties to use a syntax corresponding to the XML
        Namespace qualified name abbreviation. We advise against using
        these qualified names inside property values as this may cause
        incompatibilities with future XML datatyping
        mechanisms. Furthermore, those fully versed in XML 1.0
        features may recognize that a similar abbreviation mechanism
        exists in user-defined entities. We also advise against
        relying on the use of entities as there is a proposal to
        define a future subset of XML that does not include
        user-defined entities.

Alright, namespace prefixes are a no-no inside property values.

My conclusion from the above excerpts from M&S is that in M&S
namespaces are *not* part of the model, but are a syntactic
artifact. Now let's turn to the question of whether we need namespaces
in the model.

Implementation issues
---------------------

A number of implementation issues raised on the RDF Interest and Core
lists suggest that explicit treatment of namespaces is required. Here
are some, just to name a few:

- Michael Sintek, who was working on a new version of Protege last year,
  expressed serious concerns that namespaces of resources could not
  be identified in RDF API at that time. In fact, in a schema editor it
  is of paramount importance to be able to create a schema in a given
  namespace, translate all resources into a new namespace when a
  subsequent version of the schema is defined, display namespaces,
  identify them properly in parsed RDF content, save, etc.

- Perry A. Caro writes:

        It's this business about concatanating that worries me. The
        XML namespace spec never mentions concatanation as a valid
        mechanism. Indeed, the non-normative appendices seem to imply
        that the expansion of qualified names should be treated as
        ordered pairs.
        (http://www.mailbase.ac.uk/lists/rdf-dev/1999-07/0012.html)

- Jonathan Borden points out that XML Schema datatypes cannot be used if
  concatenation is deployed:

        http://www.w3.org/2000/03/rdf-tracking/#rdfms-qname-uri-mapping

Procedural issues / semantics
-----------------------------

There are several procedural issues that arise from M&S. The spec
states that the namespaces of all resources that are properties can be
used for retrieving the definitions of the properties. Must these
namespaces be URLs or would URIs also do? Can we make the same case
for other vocabulary elements like classes? How do we know if a given
resource refers to a piece of vocabulary?

The issue of namespaces is tightly intertwined with many other issues
in need of resolution. For example, the same trick of using namespaces
as schema/definition locators can potentially be used for
datatypes. Thus, given a namespace-prefixed resource
(http://iso.org/datatypes/integer32:,12345) we would know where to
fetch more information about 32-bit integers (similarly to XML Schema).
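
A hedged Python sketch of this idea follows: the namespace half of the
pair selects how the local part is interpreted. The registry, the
function names, and the range check are all assumptions for
illustration; a local dictionary stands in for actually fetching a
definition from the namespace location.

```python
def parse_int32(lexical: str) -> int:
    """Interpret a lexical form as a 32-bit signed integer."""
    value = int(lexical)
    if not -2**31 <= value < 2**31:
        raise ValueError(f"{lexical!r} is out of range for a 32-bit integer")
    return value


# Hypothetical registry: namespace string -> interpretation of the local part.
DATATYPES = {
    "http://iso.org/datatypes/integer32:": parse_int32,
}


def interpret(namespace: str, local_part: str):
    """Return a typed value if the namespace is a known datatype,
    otherwise return the local part unchanged."""
    parser = DATATYPES.get(namespace)
    return parser(local_part) if parser else local_part


value = interpret("http://iso.org/datatypes/integer32:", "12345")
print(value)  # 12345
```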

If namespaces are made explicit in the model, they could be used for
assigning special (denotational) semantics to certain resources, e.g.
"anonymous" resources, variables, etc.

Further relevant issues are:

        http://www.w3.org/2000/03/rdf-tracking/#rdfms-fragments
        http://www.w3.org/2000/03/rdf-tracking/#rdfms-qnames-cant-represent-all-uris

Syntactic namespaces vs. vocabulary namespaces
----------------------------------------------

Another important point is the distinction between "syntactic" and
"vocabulary" namespaces. For example, rdf:Description is what I call a
"syntactic" element, since it does not make it into the model in any
way. In contrast, rdf:type is a vocabulary element. Both belong to the
same namespace. xml:lang is an even more mysterious case. It is
unclear whether such a distinction is important.

Related issue:
        http://www.w3.org/2000/03/rdf-tracking/#rdfms-rdf-names-use

Introducing namespaces
----------------------

A suggestion of how namespaces can be introduced in the formal model
can be found at:

        http://www-db.stanford.edu/~melnik/rdf/formal-model.txt

In the above proposal, entity is a hypernym for resources and
literals, whereas an entity (constant) consists of two Unicode
strings. In this way, resources and literals are handled uniformly.

Additional evidence for making namespaces *Unicode strings* rather
than URIs is provided in the DOM Level 2 spec:

        Absolute URI references are treated as strings and compared
        literally.

If xml:lang is resolved by attaching language labels, applications
need to manage a pair of strings like ("fr", "chat"), which is very
similar to, but more limited than, ("http://iso.org/1988/639/de",
"Rat"). For the curious: "Rat" is German for "council". A special
namespace could also be used for identifying XML structure in
literals. Along with getNamespace() and getLocalName(),
implementations could provide methods like getObject() to return a
Java (C++, etc.) object corresponding to the resource.
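
A minimal sketch of such a pair-based entity, written in Python rather
than Java for brevity. The class name Entity and the snake_case method
names are assumptions that mirror getNamespace() and getLocalName();
this is not the interface of RDF API or of the linked proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A constant in the model: an ordered pair of Unicode strings."""
    namespace: str
    local_name: str

    def get_namespace(self) -> str:
        return self.namespace

    def get_local_name(self) -> str:
        return self.local_name


# A resource: namespace plus local name (example.org is hypothetical).
title = Entity("http://example.org/vocab/", "title")

# Literals with language labels, using the pairs from the text above.
chat = Entity("fr", "chat")
rat = Entity("http://iso.org/1988/639/de", "Rat")

# Resources and literals are handled through the same two accessors.
for entity in (title, chat, rat):
    print(repr(entity.get_namespace()), repr(entity.get_local_name()))
```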

Relevant issues:
        http://www.w3.org/2000/03/rdf-tracking/#rdfms-xmllang
        http://www.w3.org/2000/03/rdf-tracking/#rdfms-literal-is-xml-structure

Another major question that arises is when two resources (or resource
constants) are considered equivalent. It seems that DOM Level 2
followed the approach that two ordered pairs are equal iff their
corresponding elements are equal (I did not find explicit evidence for
that in the spec though).
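
Under that reading, equivalence could be sketched in Python as below.
The function name same_entity() and the example.org URIs are
assumptions; the comparison is literal, string by string, with no URI
normalization.

```python
from typing import Tuple

Pair = Tuple[str, str]  # (namespace, local name)


def same_entity(a: Pair, b: Pair) -> bool:
    """Two ordered pairs are equal iff their corresponding elements are
    equal, each compared as a plain string."""
    return a[0] == b[0] and a[1] == b[1]


x = ("http://example.org/vocab/", "title")
y = ("http://example.org/vocab/", "title")
# Differs from x only in the case of the scheme and host, which URI
# rules treat as case-insensitive.
z = ("HTTP://EXAMPLE.ORG/vocab/", "title")

print(same_entity(x, y))  # True
print(same_entity(x, z))  # False
```

Literal comparison is cheap and predictable, at the price of treating
some URI-equivalent namespaces as distinct.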

In the M&S serialization, it would not be quite trivial to figure out
what namespaces of resources are, since resources are often referred
to by using expanded/concatenated URIs. RDF API uses the explicit
namespace declarations to guess the namespaces of other resources by
looking at their prefixes.
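
One way such guessing could work is a longest-prefix match against the
declared namespace names. The Python sketch below is an illustration
under that assumption, not the actual RDF API code; guess_namespace()
and the example.org namespaces are hypothetical.

```python
from typing import List, Optional, Tuple


def guess_namespace(uri: str, declared: List[str]) -> Optional[Tuple[str, str]]:
    """Split uri into (namespace, local name) using the longest declared
    namespace name that is a proper prefix of uri; None if none matches."""
    candidates = [
        ns for ns in declared
        if uri.startswith(ns) and len(uri) > len(ns)
    ]
    if not candidates:
        return None
    namespace = max(candidates, key=len)
    return namespace, uri[len(namespace):]


declared = [
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://example.org/vocab/",        # hypothetical
    "http://example.org/vocab/terms#",  # hypothetical
]

# Matches two declared namespaces; the longer one wins.
print(guess_namespace("http://example.org/vocab/terms#title", declared))
# No declared namespace is a prefix, so the split stays unknown.
print(guess_namespace("http://example.org/other/thing", declared))
```

The second call shows the limitation: a resource whose namespace was
never declared in the document cannot be split this way.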
Received on Friday, 20 July 2001 14:14:44 UTC