- From: <Patrick.Stickler@nokia.com>
- Date: Wed, 6 Jun 2001 12:13:41 +0300
- To: www-rdf-interest@w3.org
- Cc: Ora.Lassila@nokia.com
Hey folks,

I've been thinking a lot about this namespace URI reference issue that suggests an inherent incompatibility between XML Schema and RDF, and have been following various discussions here and there. I would like to share some thoughts on the matter for discussion and offer some proposals towards a solution.

The discussions about whether to concatenate fragment refs with namespace URIs with or without intermediate punctuation, such as '#', seem to me to miss the whole point of the problem. Modifying the RDF spec to have an algorithm for concatenation that syncs with that of XML Schema is simply treating the symptom, not curing the disease.

The root of the problem is this: even though namespaces use URIs to achieve a set of unique identifiers, which then serve as prefixes for names to in turn achieve a set of unique names for a global scope, namespace URIs are neither expected nor required to resolve to any actual data stream. Nor, if they do correspond to a data stream, are they required to resolve to the same MIME content type for all namespace URIs.

Since URI reference fragment identifiers are tied to a specific MIME content type, and namespaces are not, *and* because a given namespace might have definitions in a number of different MIME content types (DTD, XML Schema, RDF Schema, or any other arbitrary schema encoding), there cannot be any single, consistent, reliable algorithm for deriving the correct URI reference of a name defined within some namespace. Any such URI reference will be tied to one of possibly many definitions based on that namespace, and is thus not representative of the abstract namespace itself.

Furthermore, because the fragment reference syntax for different MIME content types varies (e.g. the latest XML Schema spec vs. XML/RDF, etc.)
it is to be expected that URI references to the definitions of named resources within schemas will vary from schema encoding to schema encoding -- and thus be unable to address the fact that despite the different schema content types, we are talking about the *same* resources!

This confusion has apparently arisen from the (unfortunate) use of HTTP URIs as namespace URIs. Although namespace URIs are themselves not expected to resolve to a content stream, URLs *are* (that's what makes them URLs!), and an HTTP URI is a URL; therefore IMO it is an error if it does *not* resolve to a content stream. Note that the error is not that the namespace does not resolve to a content stream, but that the HTTP URL used to define the namespace does not.

However, since the vocabulary/ontology corresponding to a given namespace can be defined by numerous schema encodings (and might have several in use), one cannot share a common HTTP URI namespace prefix with all schema encodings, as they may have incompatible URI fragment syntax due to being different MIME content types!

IMO, what is needed to solve this mess is an explicit and standardized notation for global universal identifiers, based on a mechanism such as a URN scheme, which provides for the global specification of vocabularies/taxonomies and which can be used as the basis of common reference in various schemas and applications based on those vocabularies. The root or partial prefix of instances of such a URN scheme would serve as the namespace prefix, and below that one would define the vocabulary terms, hierarchically arranged. There would then simply need to be a mapping from this single, standardized notation to/from the various MIME content types such as XML, XML DTD, XML Schema, RDF, etc., but this would be explicit and regular.

A proposal for discussion: Hierarchical Resource Names URN scheme

(the following is provided as a rough example for discussion only, please no nits about minor flaws, etc.
there are surely errors and shortcomings, as will always be the case in contexts of high caffeine and sleep deprivation ;-)

   HRN       = urn:hrn:<authority>/<path>
   authority = (<rfc2732 host> | <user>)
   user      = <rfc2396 userinfo>@<rfc2732 host>
   path      = (<name> (/<name>)*)
   name      = /[a-zA-Z0-9]([-_.]?[a-zA-Z0-9])*/

E.g. (examples based on MARS metadata ontology)

   urn:hrn:metia.nokia.com/MARS/2.1
      ;MARS 2.1 Vocabulary
   urn:hrn:metia.nokia.com/MARS/2.1/coverage
      ;MARS 'coverage' property
   urn:hrn:metia.nokia.com/MARS/2.1/coverage/fi
      ;MARS 'coverage' property value 'fi' (Finland)
   urn:hrn:metia.nokia.com/MARS/2.1/language
      ;MARS 'language' property
   urn:hrn:metia.nokia.com/MARS/2.1/language/fi
      ;MARS 'language' property value 'fi' (Finnish)
   urn:hrn:metia.nokia.com/MARS/2.1/status
      ;MARS 'status' property
   urn:hrn:metia.nokia.com/MARS/2.1/status/draft
      ;MARS 'status' property value 'draft'
   urn:hrn:metia.nokia.com/MARS/2.1/status/approved
      ;MARS 'status' property value 'approved'
   urn:hrn:metia.nokia.com/MARS/2.1/status/retired
      ;MARS 'status' property value 'retired'

   urn:hrn:patrick.stickler@nokia.com/myCalendarOntology
   urn:hrn:patrick.stickler@nokia.com/myCalendarOntology/date
   urn:hrn:patrick.stickler@nokia.com/myCalendarOntology/time
   urn:hrn:patrick.stickler@nokia.com/myCalendarOntology/event
   ...

Note:

* The property values 'coverage/fi' and 'language/fi' are not the same concept/resource, even though they have the same ISO defined name. One is a country, the other a language. Thus, if we are to assign e.g. labels or other properties and relations for these resources for various languages/regions, we must be able to differentiate between them explicitly.

* By requiring that the authority be a valid host or email address according to RFC 2396 and 2732, the issue of registering authority identifiers is avoided, as the registries for internet domain names and address spaces as well as per-domain, per-server user management can be utilized.
  It further serves to ground the resource identities in known web resources.

* By allowing the authority to be not only a host but a user, an individual is able to define and publish personal ontologies without having to first secure a domain name, etc.

For RDF/RDF Schema/DAML/etc., one would simply use the HRN URNs in all statements. E.g.:

   ...
   <Property rdf:ID="urn:hrn:metia.nokia.com/MARS/2.1/status">
      <rdf:label rdf:value="Status" xml:lang="en"/>
      <rdfs:range rdf:resource="#Status"/>
      <count rdf:resource="#Single"/>
      <range rdf:resource="#Bounded"/>
      <ranking rdf:resource="#Strict"/>
      <default rdf:resource="urn:hrn:metia.nokia.com/MARS/2.1/status/draft"/>
   </Property>

   <rdf:Class rdf:ID="Status" .../>

   <Status rdf:ID="urn:hrn:metia.nokia.com/MARS/2.1/status/draft">
      <rdf:label rdf:value="Draft" xml:lang="en"/>
      <rank rdf:value="1"/>
   </Status>

   <Status rdf:ID="urn:hrn:metia.nokia.com/MARS/2.1/status/approved">
      <rdf:label rdf:value="Approved" xml:lang="en"/>
      <rank rdf:value="2"/>
   </Status>
   ...

In an XML Schema, one would use part of the HRN URN path as a namespace URI, and define the mapping of element/attribute names from the XML Schema encoding to the HRN URN representation. E.g.

   <schema ...
      xmlns:mars="urn:hrn:metia.nokia.com/MARS/2.1"
      targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1"
      ...>
   ...
   <!-- urn:hrn:metia.nokia.com/MARS/2.1/status -->
   <element name="status" substitutionGroup="mars:property">
      <complexType base="mars:Property" derivedBy="restriction">
         ...
         <simpleType base="mars:TokenString">
            <choice>
               <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/draft -->
               <enumeration value="draft"/>
               <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/draft_approved -->
               <enumeration value="draft_approved"/>
               <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/approved -->
               <enumeration value="approved"/>
               <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/retired -->
               <enumeration value="retired"/>
            </choice>
         </simpleType>
      </complexType>
   </element>

Presuming that the above XML Schema is being used to parse/validate the following content:

   <mars:status>approved</mars:status>

then what remains to be resolved is how the literal 'approved' according to the serialization schema is associated with the HRN URN "urn:hrn:metia.nokia.com/MARS/2.1/status/approved", etc. so that the RDF statements above regarding label, rank, etc. apply. I.e., without such a mapping, we get the triple:

   ("...",
    "urn:hrn:metia.nokia.com/MARS/2.1/status",
    "approved")

but what we need/want is:

   ("...",
    "urn:hrn:metia.nokia.com/MARS/2.1/status",
    "urn:hrn:metia.nokia.com/MARS/2.1/status/approved")

It would be *really* icky (for lack of a more technical term ;-) to have to define the XML Schema as follows, simply to achieve a reliable and explicit intersection between the XML Schema, XML serialized instance, and RDF Schema...

   <!-- urn:hrn:metia.nokia.com/MARS/2.1/status -->
   <element name="status" substitutionGroup="mars:property">
      <complexType base="mars:Property" derivedBy="restriction">
         <simpleType base="mars:HRN">
            <choice>
               <enumeration value="urn:hrn:metia.nokia.com/MARS/2.1/status/draft"/>
               <enumeration value="urn:hrn:metia.nokia.com/MARS/2.1/status/draft_approved"/>
               <enumeration value="urn:hrn:metia.nokia.com/MARS/2.1/status/approved"/>
               <enumeration value="urn:hrn:metia.nokia.com/MARS/2.1/status/retired"/>
            </choice>
         </simpleType>
      </complexType>
   </element>

and have to encode the serialization as:

   <mars:status>urn:hrn:metia.nokia.com/MARS/2.1/status/approved</mars:status>

or

   <mars:status rdf:resource="urn:hrn:metia.nokia.com/MARS/2.1/status/approved"/>

An alternate approach would be to use empty elements to represent members of controlled value sets, e.g.

   <mars:status><mars_status:approved/></mars:status>

but as the value name set of each property having a controlled value set and the property name set itself should correspond to different namespaces, one must resort to separate XML Schema definitions for each property value set, which is cumbersome, both for specification and for markup.

As it is common to use simple enumerations of controlled value sets (e.g. xml:lang taking an ISO-639 value, etc.), there needs to be, in addition to the schema-encoding-neutral identity of such values, a consistent way to map to that identity from their literal representations, based on the schema defining the serialization.

One possible solution would be to permit a targetNamespace attribute to be specified for enumeration declarations, which would define the namespace to which the literal name value belongs. E.g.

   <!-- urn:hrn:metia.nokia.com/MARS/2.1/status -->
   <element name="status" substitutionGroup="mars:property">
      <complexType base="mars:Property" derivedBy="restriction">
         <simpleType base="mars:Token">
            <choice>
               <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/draft -->
               <enumeration value="draft"
                  targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
               <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/draft_approved -->
               <enumeration value="draft_approved"
                  targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
               <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/approved -->
               <enumeration value="approved"
                  targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
               <!-- urn:hrn:metia.nokia.com/MARS/2.1/status/retired -->
               <enumeration value="retired"
                  targetNamespace="urn:hrn:metia.nokia.com/MARS/2.1/status"/>
            </choice>
         </simpleType>
      </complexType>
   </element>

Now, it is explicit for each literal enumerated value what its HRN URI reference should be: the literal name appended to the namespace path with a '/' separator. The simple serialization

   <mars:status>approved</mars:status>

results in the desired triple:

   ("...",
    "urn:hrn:metia.nokia.com/MARS/2.1/status",
    "urn:hrn:metia.nokia.com/MARS/2.1/status/approved")

There are likely numerous better ways to accomplish this mapping from literal value to qualified name, and I've not tried to ponder at length about the precise mechanism by which this ultimately would be accomplished (as it would in any case vary from MIME content type to type), but have simply tried to illustrate where the hole is and one possible path around it.

It is likely that the semantics of the targetNamespace attribute will preclude its use as per the examples above. The precise attribute used is irrelevant so long as it is possible to achieve the necessary namespace declaration for the literal values.
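
As a rough illustration only, the literal-to-HRN mapping described above might be sketched in Python as follows. Everything here is an assumption layered on the proposal: the HOST and USER patterns are simplified stand-ins for the RFC 2732 host and RFC 2396 userinfo productions, the function names (is_hrn, literal_to_hrn, to_triple) are hypothetical, and the ENUM_NAMESPACES table merely stands in for whatever per-enumeration namespace declaration a schema language would eventually provide.

```python
import re

# 'name' production taken from the HRN grammar in the proposal.
NAME = r"[a-zA-Z0-9](?:[-_.]?[a-zA-Z0-9])*"
# Assumption: simplified approximations of the RFC host/userinfo rules.
HOST = r"[a-zA-Z0-9](?:[-.]?[a-zA-Z0-9])*"
USER = r"[a-zA-Z0-9._-]+@" + HOST

HRN_RE = re.compile(
    rf"urn:hrn:(?P<authority>{USER}|{HOST})/(?P<path>{NAME}(?:/{NAME})*)"
)


def is_hrn(value: str) -> bool:
    """Return True if value matches the (approximated) HRN grammar."""
    return HRN_RE.fullmatch(value) is not None


def literal_to_hrn(target_namespace: str, literal: str) -> str:
    """Append a literal enumeration value to its namespace path with '/'."""
    candidate = f"{target_namespace}/{literal}"
    if not is_hrn(candidate):
        raise ValueError(f"not a valid HRN: {candidate!r}")
    return candidate


def to_triple(subject, prop, literal, enum_namespaces):
    """Build a triple, qualifying the literal when a namespace is declared.

    enum_namespaces maps (property HRN, literal) to the namespace that the
    schema declares for that enumerated value. Undeclared literals are
    passed through unchanged, which reproduces the unwanted triple.
    """
    namespace = enum_namespaces.get((prop, literal))
    if namespace is None:
        return (subject, prop, literal)
    return (subject, prop, literal_to_hrn(namespace, literal))


STATUS = "urn:hrn:metia.nokia.com/MARS/2.1/status"
# Hypothetical table mirroring the targetNamespace attributes in the example.
ENUM_NAMESPACES = {
    (STATUS, value): STATUS
    for value in ("draft", "draft_approved", "approved", "retired")
}

if __name__ == "__main__":
    print(to_triple("...", STATUS, "approved", ENUM_NAMESPACES))
```

With the table populated, the literal 'approved' is expanded to its full HRN; with an empty table, the same call yields the bare literal, which is the gap the targetNamespace idea is meant to close.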
--

The benefit of having a global identifier scheme such as HRN defined above is that one need not worry about the particulars of various schema or other encoding mechanisms when referring to an abstract concept, such as within the context of RDF/DAML/etc. I.e. an XML Schema declaration for an element "foo" does not define or represent the concept "foo", only one possible serialization of the concept "foo". We should be able to talk about "foo" regardless of how statements about it might be serialized in one encoding or another.

And the same scheme then works for concepts, vocabularies, etc. which have no specification in any MIME content type, or which are encoded in a MIME content type for which there is no fragment syntax (e.g. IETF RFCs encoded as text/plain ;-)

Please, let's abandon the use of HTTP URIs for namespace identity! Namespaces, vocabularies, ontologies, etc. are *abstract* resources and thus should be defined using non-URL URIs!

If one wishes to then specify one or more URLs for schemas or other content streams which provide explicit definition of, information about, realizations of, or constraints upon those abstract resources, great, but let's stop using URI schemes intended for identifying content streams to identify abstract resources!

In this regard, Topic Maps got it right, by separating the reification of abstract (or even concrete) resources from their occurrences (realization, expression, use, description, etc.). We can learn a lesson or two there.

I look forward to hearing the comments and discussion of the above from others in this forum. Sorry for the length.

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Wednesday, 6 June 2001 05:14:00 UTC