What to do about namespace derived URI refs... (long)

Hey folks,

I've been thinking a lot about the namespace URI reference issue, which
suggests an inherent incompatibility between XML Schema and RDF. I have
been following the various discussions here and there, and would like to
share some thoughts on the matter and offer some proposals towards a
solution.

The discussions about whether to concatenate fragment refs with namespace
URIs with or without intermediate punctuation, such as '#', seem to me to
miss the whole point of the problem. Modifying the RDF spec to have an
algorithm for concatenation that syncs with that of XML Schema is simply
treating the symptom, not curing the disease.
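
To make the two conventions concrete, here is a minimal Python sketch. The
namespace URI http://example.org/vocab and the name 'status' are
hypothetical, chosen only for illustration. It shows that the two rules
yield different identifiers for the same name, and that the '#' form turns
the name into a URI fragment:

```python
from urllib.parse import urldefrag

# Hypothetical namespace URI and local name, for illustration only.
NAMESPACE = "http://example.org/vocab"
LOCAL_NAME = "status"

def concat_plain(namespace: str, name: str) -> str:
    """Concatenate with no intermediate punctuation."""
    return namespace + name

def concat_hash(namespace: str, name: str) -> str:
    """Concatenate with '#' between the namespace URI and the name."""
    return namespace + "#" + name

plain = concat_plain(NAMESPACE, LOCAL_NAME)
hashed = concat_hash(NAMESPACE, LOCAL_NAME)

# The same (namespace, name) pair gives two different identifiers.
print(plain)
print(hashed)
print(plain == hashed)

# With '#', the name becomes a fragment identifier on the namespace URI.
base, fragment = urldefrag(hashed)
print(base, fragment)
```

Either rule is only a convention. With the '#' form, what the fragment
denotes depends on the content type of whatever the base URI resolves to.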

The root of the problem is this. Namespaces use URIs to achieve a set of
unique identifiers, which then serve as prefixes for names to in turn
achieve a set of unique names for a global scope. But namespace URIs are
neither expected nor required to resolve to any actual data stream, nor,
if they do correspond to a data stream, are they required to resolve to
the same MIME content type for all namespace URIs.

URI reference fragment identifiers are tied to a specific MIME content
type, and namespaces are not. Moreover, a given namespace might have
definitions in a number of different MIME content types (DTD, XML Schema,
RDF Schema, or any other arbitrary schema encoding). There therefore
cannot be any single, consistent, reliable algorithm for deriving the
correct URI reference of a name defined within some namespace: any such
URI reference will be tied to one of possibly many definitions based on
that namespace, and thus will not be representative of the abstract
namespace itself.

Furthermore, because the fragment reference syntax varies between MIME
content types (e.g. the latest XML Schema spec vs. XML/RDF, etc.), it is
to be expected that URI references to the definitions of named resources
within schemas will vary from schema encoding to schema encoding, and
thus be unable to address the fact that, despite the different schema
content types, we are talking about the *same* resources!

This confusion has apparently arisen from the (unfortunate) use of HTTP
URIs as namespace URIs. Although namespace URIs are themselves not
expected to resolve to a content stream, URLs *are* (that's what makes
them URLs!), and an HTTP URI is a URL; therefore IMO it is an error if it
does *not* resolve to a content stream. Note that the error is not that
the namespace does not resolve to a content stream, but that the HTTP URL
used to define the namespace does not. However, since the
vocabulary/ontology corresponding to a given namespace can be defined by
numerous schema encodings (and might have several in use), one cannot
share a common HTTP URI namespace prefix across all schema encodings, as
they may have incompatible URI fragment syntax due to being different
MIME content types!

IMO, what is needed to solve this mess is an explicit and standardized
notation for global universal identifiers, based on a mechanism such as a
URN scheme, which provides for the global specification of
vocabularies/taxonomies that can be used as the basis of common reference
in various schemas and applications based on those vocabularies. The root
or partial prefix of instances of such a URN scheme would serve as the
namespace prefix, and below that would be defined the vocabulary terms,
hierarchically arranged. There would then simply need to be a mapping
from this single, standardized notation to/from the various MIME content
types such as XML, XML DTD, XML Schema, RDF, etc., but this would be
explicit and regular.

A proposal for discussion: Hierarchical Resource Names URN scheme

(the following is provided as a rough example for discussion only; please
 no nits about minor flaws, etc. There are surely errors and shortcomings,
 as will always be the case in contexts of high caffeine and sleep
 deprivation ;-)

HRN = urn:hrn:<authority>/<path>
authority = (<rfc2732 host> | <user>)
user = <rfc2396 userinfo>@<rfc2732 host>
path = (<name> (/<name>)*)
name = /[a-zA-Z0-9]([-_.]?[a-zA-Z0-9])*/
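
As a rough check of the grammar above, here is a minimal Python sketch of
a parser for it. The 'name' pattern is taken directly from the grammar;
the HOST and USERINFO patterns are simplified stand-ins I have assumed for
the RFC productions, not the full RFC grammars, and the names HRN and
parse_hrn are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple

# 'name' production, copied from the grammar above.
NAME = r"[a-zA-Z0-9](?:[-_.]?[a-zA-Z0-9])*"

# Simplified stand-ins for the RFC host and userinfo productions
# (assumptions for illustration, not the full RFC grammars).
LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?"
HOST = rf"{LABEL}(?:\.{LABEL})*"
USERINFO = r"[a-zA-Z0-9._-]+"

HRN_RE = re.compile(
    rf"urn:hrn:(?:(?P<user>{USERINFO})@)?(?P<host>{HOST})"
    rf"/(?P<path>{NAME}(?:/{NAME})*)"
)

class HRN(NamedTuple):
    user: Optional[str]      # None when the authority is a bare host
    host: str
    names: Tuple[str, ...]   # path segments, in order

def parse_hrn(text: str) -> Optional[HRN]:
    """Return the parsed HRN, or None if text does not fit the grammar."""
    match = HRN_RE.fullmatch(text)
    if match is None:
        return None
    names = tuple(match.group("path").split("/"))
    return HRN(match.group("user"), match.group("host"), names)

print(parse_hrn("urn:hrn:metia.nokia.com/MARS/2.1/status/draft"))
print(parse_hrn("urn:hrn:patrick.stickler@nokia.com/myCalendarOntology/date"))
print(parse_hrn("urn:hrn:metia.nokia.com/MARS//status"))  # empty name: None
```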

E.g. (examples based on MARS metadata ontology)

urn:hrn:metia.nokia.com/MARS/2.1                 ;MARS 2.1 Vocabulary
urn:hrn:metia.nokia.com/MARS/2.1/coverage        ;MARS 'coverage' property
urn:hrn:metia.nokia.com/MARS/2.1/coverage/fi     ;MARS 'coverage' property value 'fi' (Finland)
urn:hrn:metia.nokia.com/MARS/2.1/language        ;MARS 'language' property
urn:hrn:metia.nokia.com/MARS/2.1/language/fi     ;MARS 'language' property value 'fi' (Finnish)
urn:hrn:metia.nokia.com/MARS/2.1/status          ;MARS 'status' property
urn:hrn:metia.nokia.com/MARS/2.1/status/draft    ;MARS 'status' property value 'draft'
urn:hrn:metia.nokia.com/MARS/2.1/status/approved ;MARS 'status' property value 'approved'
urn:hrn:metia.nokia.com/MARS/2.1/status/retired  ;MARS 'status' property value 'retired'

urn:hrn:patrick.stickler@nokia.com/myCalendarOntology
urn:hrn:patrick.stickler@nokia.com/myCalendarOntology/date
urn:hrn:patrick.stickler@nokia.com/myCalendarOntology/time
urn:hrn:patrick.stickler@nokia.com/myCalendarOntology/event
...

Note:

* The property values 'coverage/fi' and 'language/fi' are not the same
concept/resource, even though they have the same ISO defined name. One is
a country, the other a language. Thus, if we are to assign e.g. labels or
other properties and relations for these resources for various
languages/regions, we must be able to differentiate between them
explicitly.

* By requiring that the authority be a valid host or email address
according to RFC 2396 and RFC 2732, the issue of registering authority
identifiers is avoided, as the registries for internet domain names and
address spaces, as well as per-domain, per-server user management, can be
utilized. It further serves to ground the resource identities in known
web resources.

* By allowing the authority to be not only a host but a user, an individual
is able to define and publish personal ontologies without having to first
secure a domain name, etc.
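
The hierarchical reading of the example HRNs, on which the first note
relies, can be sketched in Python as follows. This is only an
illustration under my own assumptions: the helper names is_within and
local_path are hypothetical, and the comparison is done on whole path
segments.

```python
from typing import Optional

def is_within(namespace: str, hrn: str) -> bool:
    """True if hrn is the namespace itself or lies below it."""
    # Compare whole segments, so ".../2.1" does not capture ".../2.10".
    return hrn == namespace or hrn.startswith(namespace + "/")

def local_path(namespace: str, hrn: str) -> Optional[str]:
    """Return the part of hrn below namespace, or None if there is none."""
    if hrn != namespace and is_within(namespace, hrn):
        return hrn[len(namespace) + 1:]
    return None

MARS = "urn:hrn:metia.nokia.com/MARS/2.1"

print(local_path(MARS, MARS + "/coverage/fi"))   # coverage/fi
print(local_path(MARS, MARS + "/language/fi"))   # language/fi

# Same final name 'fi', but it belongs to a different property.
print(is_within(MARS + "/coverage", MARS + "/language/fi"))  # False
```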

For RDF/RDF Schema/DAML/etc., one would simply use the HRN URNs in all
statements. E.g.:

...

<Property      rdf:about    ="urn:hrn:metia.nokia.com/MARS/2.1/status">
   <rdfs:label rdf:value    ="Status" xml:lang="en"/>
   <rdfs:range rdf:resource ="#Status"/>
   <count      rdf:resource ="#Single"/>
   <range      rdf:resource ="#Bounded"/>
   <ranking    rdf:resource ="#Strict"/>
   <default    rdf:resource ="urn:hrn:metia.nokia.com/MARS/2.1/status/draft"/>
</Property>

<rdfs:Class rdf:ID="Status" .../>

<Status rdf:about="urn:hrn:metia.nokia.com/MARS/2.1/status/draft">
   <rdfs:label rdf:value="Draft" xml:lang="en"/>
   <rank       rdf:value="1"/>
</Status>

<Status rdf:about="urn:hrn:metia.nokia.com/MARS/2.1/status/approved">
   <rdfs:label rdf:value="Approved" xml:lang="en"/>
   <rank       rdf:value="2"/>
</Status>

...

In an XML Schema, one would use part of the HRN URN path as a namespace
URI, and define the mapping of element/attribute names from the XML
Schema encoding to the HRN URN representation. E.g.

<schema ...
        xmlns:mars="urn:hrn:metia.nokia.com/MARS/2.1"
        targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1"
        ...>

...

<!-- urn:hrn:metia.nokia.com/MARS/2.1/status -->
<element name="status" substitutionGroup="mars:property">
   <complexType base="mars:Property" derivedBy="restriction">
      ...
      <simpleType base="mars:TokenString">
         <choice>
            <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/draft -->
            <enumeration value="draft"/>
            <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/draft_approved -->
            <enumeration value="draft_approved"/>
            <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/approved -->
            <enumeration value="approved"/>
            <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/retired -->
            <enumeration value="retired"/>
         </choice>
      </simpleType>
   </complexType>
</element>

Presuming that the above XML Schema is being used to parse/validate
the following content: 

   <mars:status>approved</mars:status>
 
then what remains to be resolved is how the literal 'approved', according
to the serialization schema, is associated with the HRN URN
"urn:hrn:metia.nokia.com/MARS/2.1/status/approved", etc., so that
the RDF statements above regarding label, rank, etc. apply.

I.e., without such a mapping, we get the triple:

   ("...", "urn:hrn:metia.nokia.com/MARS/2.1/status", "approved")

but what we need/want is:

   ("...", "urn:hrn:metia.nokia.com/MARS/2.1/status",
           "urn:hrn:metia.nokia.com/MARS/2.1/status/approved")

It would be *really* icky (for lack of a more technical term ;-) to
have to define the XML Schema as follows, simply to achieve a reliable
and explicit intersection between the XML Schema, XML serialized instance,
and RDF Schema...

<!-- urn:hrn:metia.nokia.com/MARS/2.1/status -->
<element name="status" substitutionGroup="mars:property">
   <complexType base="mars:Property" derivedBy="restriction">
      <simpleType base="mars:HRN">
         <choice>
            <enumeration value="urn:hrn:metia.nokia.com/MARS/2.1/status/draft"/>
            <enumeration value="urn:hrn:metia.nokia.com/MARS/2.1/status/draft_approved"/>
            <enumeration value="urn:hrn:metia.nokia.com/MARS/2.1/status/approved"/>
            <enumeration value="urn:hrn:metia.nokia.com/MARS/2.1/status/retired"/>
         </choice>
      </simpleType>
   </complexType>
</element>

and have to encode the serialization as:

   <mars:status>urn:hrn:metia.nokia.com/MARS/2.1/status/approved</mars:status>

or

   <mars:status rdf:resource="urn:hrn:metia.nokia.com/MARS/2.1/status/approved"/>

An alternate approach would be to use empty elements to represent
members of controlled value sets, e.g.

   <mars:status><mars_status:approved/></mars:status>

but as the value name set of each property having a controlled value set
and the property name set itself should correspond to different namespaces,
one must resort to separate XML Schema definitions for each property value
set, which is cumbersome, both for specification and for markup.

As it is common to use simple enumerations of controlled value sets (e.g.
xml:lang taking an ISO-639 value, etc.) there needs to be, in addition to
the schema encoding neutral identity of such values, a consistent way to
map to that identity from their literal representations, based on the schema
defining the serialization. 

One possible solution would be to permit a targetNamespace attribute to
be specified for enumeration declarations which would define the namespace
to which the literal name value belongs. E.g.

<!-- urn:hrn:metia.nokia.com/MARS/2.1/status -->
<element name="status" substitutionGroup="mars:property">
   <complexType base="mars:Property" derivedBy="restriction">
      <simpleType base="mars:Token">
         <choice>
            <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/draft -->
            <enumeration value="draft"
                         targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
            <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/draft_approved -->
            <enumeration value="draft_approved"
                         targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
            <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/approved -->
            <enumeration value="approved"
                         targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
            <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/retired -->
            <enumeration value="retired"
                         targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
         </choice>
      </simpleType>
   </complexType>
</element>
 
Now it is explicit for each literal enumerated value what its HRN URI
reference should be, namely the literal name appended to the namespace
path as a further path segment, and the simple serialization

   <mars:status>approved</mars:status>

results in the desired triple:

   ("...", "urn:hrn:metia.nokia.com/MARS/2.1/status",
           "urn:hrn:metia.nokia.com/MARS/2.1/status/approved")

There are likely numerous better ways to accomplish this mapping from
literal value to qualified name, and I've not tried to ponder at length
about the precise mechanism by which this ultimately would be
accomplished (as it would in any case vary from MIME content type to
type), but have simply tried to illustrate where the hole is and one
possible path around it.
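
For what it is worth, the mapping from literal value to HRN described
above can be sketched in Python. This is an illustration only: the schema
fragment is a trimmed copy of the enumeration example, the per-enumeration
targetNamespace attribute is the proposal made here and not part of XML
Schema, and the function names literal_map and to_triple are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Trimmed copy of the enumeration declarations from the example schema.
# The targetNamespace attribute on <enumeration> is the proposed extension.
SCHEMA_FRAGMENT = """
<simpleType base="mars:Token">
  <choice>
    <enumeration value="draft"
                 targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
    <enumeration value="approved"
                 targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
    <enumeration value="retired"
                 targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
  </choice>
</simpleType>
"""

def literal_map(fragment: str) -> Dict[str, str]:
    """Map each enumerated literal to namespace + '/' + literal."""
    root = ET.fromstring(fragment)
    mapping: Dict[str, str] = {}
    for enum in root.iter("enumeration"):
        namespace = enum.get("targetNamespace")
        value = enum.get("value")
        if namespace is not None and value is not None:
            mapping[value] = namespace + "/" + value
    return mapping

def to_triple(subject: str, prop: str, literal: str,
              mapping: Dict[str, str]) -> Tuple[str, str, str]:
    """Replace a known literal with its HRN; leave other literals as-is."""
    return (subject, prop, mapping.get(literal, literal))

STATUS = "urn:hrn:metia.nokia.com/MARS/2.1/status"
mapping = literal_map(SCHEMA_FRAGMENT)
print(to_triple("...", STATUS, "approved", mapping))
```

A literal with no declared namespace simply stays a literal, which is the
behaviour one gets today without any such mapping.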

It is likely that the semantics of the targetNamespace attribute will
preclude its use as per the examples above. The precise attribute used is
irrelevant so long as it is possible to achieve the necessary namespace
declaration for the literal values.

--

The benefit of having a global identifier scheme such as the HRN scheme
defined above is that one need not worry about the particulars of various
schema or other encoding mechanisms when referring to an abstract concept,
such as within the context of RDF/DAML/etc. I.e. an XML Schema declaration
for an element "foo" does not define or represent the concept "foo", only
one possible serialization of the concept "foo". We should be able to talk
about "foo" regardless of how statements about it might be serialized
in one encoding or another. And the same scheme then works for concepts,
vocabularies, etc. which have no specification in any MIME content type
or which are encoded in a MIME content type for which there is no fragment
syntax (e.g. IETF RFCs encoded as text/plain ;-)

Please, let's abandon the use of HTTP URIs for namespace identity!
Namespaces, vocabularies, ontologies, etc. are *abstract* resources and
thus should be defined using non-URL URIs! If one wishes to then specify
one or more URLs for schemas or other content streams which provide
explicit definition of, information about, realizations of, or
constraints upon those abstract resources, great, but let's stop using
URI schemes intended for identifying content streams to identify abstract
resources!

In this regard, Topic Maps got it right, by separating the reification of
abstract (or even concrete) resources from their occurrences (realization,
expression, use, description, etc.). We can learn a lesson or two there.

I look forward to hearing the comments and discussion of the above from 
others in this forum. Sorry for the length.

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
 

Received on Wednesday, 6 June 2001 05:14:00 UTC