- From: <Patrick.Stickler@nokia.com>
- Date: Fri, 10 Aug 2001 14:46:50 +0300
- To: scranefield@infoscience.otago.ac.nz, www-rdf-interest@w3.org
> The major problem is that the existing mechanism in RDF for mapping
> QNames to URIs (via concatenation) would produce a URI that doesn't
> correspond to a formal public ID (which must end with a language
> indicator). For example, the RDF class BungyJump in the above
> namespace would have the following odd-looking URI:

Yup. In short, the question of which URI scheme to use is only about the management and persistence of uniqueness, and thus doesn't really impact the actual deployment of interoperable SW tools and agents, *but* the QName to resource URI problem is a "show stopper" (i.e. a critical problem that has to be fixed, and soon).

>
> urn:publicid:-:University+of+Otago:NONSGML+Tourism+ontology+v1.0:ENBungyJump
>
> One answer to this would be to make the QName to URI mapping be
> dependent on the URI scheme used for the namespace. For urn:publicid
> the algorithm might be to insert ";" and the local name before the
> language specifier (":EN"). This would give:

This isn't scalable. There can potentially be an infinite number of URI schemes, and we can expect at least thousands or even millions in actual practice. We cannot expect SW agents to grok all of those schemes -- especially when the scheme involved is entirely *irrelevant* to the role the URI is serving, which is simply that of a unique universal identifier.

> ...
> Therefore an unofficial syntax could be used to identify namespaces.
> One possibility is to use the formal public ID syntax but with the
> language part omitted and a convention to end the ID with "::" (so
> that concatenation will work for QName to URI mapping):
>
> urn:publicid:-:University+of+Otago:NONSGML+Tourism+ontology+v1.0;
>
> corresponding to:
>
> -//University of Otago//NONSGML Tourism ontology v1.0::

But this is really just the same as adopting a single URI scheme with a single allowed MIME type and an explicit fragment syntax.
> Alternatively some other syntax could be used, such as the one
> proposed for the tag scheme (http://www.taguri.org/):
>
> urn:publicid:infoscience.otago.ac.nz,2001-08-10:TourismOntology;
>
> which corresponds to the following public identifier:
>
> infoscience.otago.ac.nz,2001-08-10//TourismOntology::

But again, this imposes a single URI scheme on the SW in an effort to fix what is essentially a missing interface between the syntactic (QName) and semantic (resource URI) realms.

Furthermore, there is still the issue of mapping literal values in serialized XML data to resources in the semantic space. To take a simple example: "xml:lang='en'" defines not only a property corresponding to 'Language' but also a resource corresponding to the language 'English'. If the value 'en' remains a literal in the RDF/SW space, then we can't specify that it happens to be the same as e.g. "urn:kielet:englanti" (a resource in a Finnish-language ontology of language names, 'englanti' being the Finnish name for 'English').

What is needed is a simple, explicit, standardised mechanism for defining the mappings from syntactic representations (QNames and literal property values) to resource URIs.

For the sake of anyone who missed it a few months ago, I append my earlier posting to the group, which proposes a way to solve this mapping problem.

Regards,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com

-------------------------

(repost)

The following is a proposal for extensions/refinements of RDF and RDF Schema. Its aim is modular, scalable and generic interchange of knowledge for the Semantic Web. I believe the proposal is fully backward compatible with all existing RDF applications.
If the ideas embodied in this proposal have been suggested before by others, my apologies in advance for my ignorance of any prior work that intersects with what is expressed below.

First, I will outline some claims that motivate this proposal. I consider each of these claims to be true. Even so, rejecting any or all of them would not IMO reduce the inherent value of the proposal, only the absolute necessity of some solution like it. I will then summarize the problem and describe a possible solution, with examples.

=== Claims ===

Claim 1: A namespace and name pair does not constitute any kind of universal semantic identity. It is only a unique syntactic form, which can be associated with some semantic identity.

Names within namespaces do serve to differentiate content that is attributed meaning. That meaning is typically (though not necessarily) suggested by the linguistic properties of the name. However, the syntactic form selected for any particular serialization is local to that serialization, and many syntactic forms may map to the same common semantics. The syntactic form gives us a mechanism for defining a mapping to that universal meaning, but it does not itself serve as the universal identifier of that meaning.

Likewise, a namespace does not officially identify any ontology or semantic space, even if it is often used to do so. It is only a syntactic mechanism that avoids name collisions when arbitrary syntactic forms are combined in a given serialization. There is no requirement whatsoever that a namespace provide any semantic identity.

Claim 2: A name within a given namespace does not equate to a URI reference of that name within any content dereferenceable from the namespace URI reference. I.e. "namespace" + "name" != "namespace#name".
One might, by coincidence, be able to dereference a name as a fragment within a MIME stream retrievable from the namespace URI. One might then get something that defines, describes or otherwise relates to that name within the namespace. However, no such relationship is defined to exist between a namespace URI and a name.

Furthermore, a given namespace may have serializations defined in various schema formalisms. Each of these may have a different MIME content type with a different fragment syntax, yet all of them define the same namespace URI and name. There is then potentially a one-to-many mapping from a namespace and name pair to URI references into each of those schema instances. Moreover, the XML Namespaces spec states that a namespace URI reference need not be dereferenceable to any content. Therefore no particular fragment syntax can be deduced or inferred for an unknown or undefined MIME content type.

Claim 3: We cannot obtain a compound URI reference by concatenation, suffixation, insertion or any other method of combining a name with a namespace URI reference. Any such method risks violating the sanctity of the URI scheme and/or of some MIME content type's fragment syntax.

This would not be a problem if rdf:about or rdf:resource values simply needed to be unique strings. However, they are required to be valid URI references, and there is never a guarantee that combining an arbitrary namespace URI and name will produce a valid URI.

Likewise, it is not possible to reliably re-partition a merged namespace-plus-name URI reference back into its namespace and name components. Re-partitioning is necessary for re-serialization of knowledge (see the discussion and examples below regarding bi-directional serialization mapping).

Claim 4: RDF currently tries to create a semantic resource identity by direct concatenation of namespace and name. This method does not preserve the uniqueness of namespace-qualified names. E.g.
Both of the following valid yet distinct syntactic forms are mapped to the same semantic resource URI, resulting in an RDF-internal naming collision:

   <x:varovasti xmlns:x="http://x.com/z#aja">  -> "http://x.com/z#ajavarovasti"
   <x:rovasti xmlns:x="http://x.com/z#ajava">  -> "http://x.com/z#ajavarovasti" !

The example is contrived. This in no way invalidates the fact that the present RDF methodology is unreliable and can produce unintended semantic ambiguity from distinct syntactic forms.

This example, together with the discussion in claim 3 about the unreliable re-partitioning of combined URI references, shows that the uniqueness of a namespace and name pair has three elements: (1) the unique namespace, (2) the unique name within that namespace, and (3) a distinct boundary between the two.

=== Summary of the Problem ===

We must have an explicit mapping between a namespace and name pair and the universal semantic identity it is intended to correspond to or represent. Neither RDF nor RDF Schema (nor DAML) currently provides this mapping.

=== Proposed Solution to Problem ===

Step 1: Clarify (refine) the interpretation of rdf:ID and rdf:about as follows:

   a) rdf:ID equates to a name within a namespace, corresponding to the serialization (syntactic form) of a semantic resource (meaning). It only equates to the name of an element within the RDF serialization and not to any semantic resource.

   b) rdf:about equates to a resource within the semantic space, either abstract or concrete.

Thus, rdf:about values (URI references) become (or already are) the soul of the Semantic Web. rdf:ID values (namespace and name pairs) are just a means to an end, serving only to map syntactic constructs to semantics.

Step 2: Provide for explicit mapping between syntactic forms and semantic resources, i.e. for mapping rdf:ID values to rdf:about values.
This is achieved by the following two methods:

Mapping method 1: RDF

Add an element rdf:Map to RDF that is used to map syntactic forms to semantic resources. E.g.

   <rdf:Map rdf:resource="http://purl.org/dc/elements/1.0"
            rdf:ID="date"
            rdf:about="http://dublincore.org/1.0/Date"/>

will result in a syntactic form such as

   <x:date xmlns:x="http://purl.org/dc/elements/1.0">

being equated with the semantic resource

   "http://dublincore.org/1.0/Date"

Thus, the namespace given as rdf:resource and the name given as rdf:ID form a pair. That pair is mapped to the semantic resource given in rdf:about.

The new construct can also map serialized literals to semantic resources, by adding an rdf:value attribute alongside rdf:resource and rdf:ID. Consider a literal value that occurs as the sole PCDATA of an element with the specified name in the specified namespace. Such a value is then mapped to the resource given in rdf:about. E.g.

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:value="en"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/en"/>

will result in a syntactic form such as

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">en</...>

being equated with the predicate and object semantic resources

   pred: "name:metia.nokia.com/MARS/2.1/language"
   obj:  "name:metia.nokia.com/MARS/2.1/language/en"

Note that no namespace prefix is ever used in an rdf:Map definition. A prefix is unnecessary there, and using one would also be contrary to the XML Namespaces spec, which limits the significance of prefixes to a given serialized instance. All that is needed is the namespace URI reference (rdf:resource) and the element (or global attribute) name (rdf:ID). A literal PCDATA string (rdf:value) may optionally be added.
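Here is a minimal Python sketch of the lookup. It shows one way a parser might hold rdf:Map declarations in memory and apply them to a parsed element. The dictionaries, the `map_element` function and the choice to return `None` for unmapped elements are all illustrative assumptions, not part of the proposal. The URIs are the ones from the examples above.

```python
# Hypothetical in-memory form of the rdf:Map declarations shown above.
MARS = "http://metia.nokia.com/MARS/2.1"

# rdf:Map with rdf:resource + rdf:ID: (namespace URI, local name) -> semantic resource
NAME_MAP = {
    ("http://purl.org/dc/elements/1.0", "date"): "http://dublincore.org/1.0/Date",
    (MARS, "language"): "name:metia.nokia.com/MARS/2.1/language",
}

# rdf:Map that also carries rdf:value: (namespace URI, local name, PCDATA) -> resource
VALUE_MAP = {
    (MARS, "language", "en"): "name:metia.nokia.com/MARS/2.1/language/en",
}


def map_element(namespace, name, text=None):
    """Return (predicate, object) for one serialized element, or None if unmapped.

    Only the namespace URI and local name are used; the prefix never matters.
    The object is a resource when a value mapping exists, else the literal text.
    """
    predicate = NAME_MAP.get((namespace, name))
    if predicate is None:
        return None  # no mapping declared for this syntactic form
    return predicate, VALUE_MAP.get((namespace, name, text), text)
```

For instance, `map_element(MARS, "language", "en")` yields the predicate/object pair given above. The Dublin Core `date` element keeps its PCDATA as a literal, because no value mapping is declared for it.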
Mapping method 2: RDF Schema

For all rdfs:Class declarations, rdf:about becomes mandatory as the identity of the semantic resource. As with the rdf:Map construct, rdf:ID and (optionally) rdf:resource and rdf:value define a mapping from syntactic form to semantic resource. These attributes can be omitted from the rdfs:Class declaration if no serialization mapping is needed, or if the mapping is defined elsewhere (the expected usual practice). Thus, in addition to reifying a semantic resource, rdfs:Class serves as a synonym for rdf:Map. E.g.

   <rdfs:Class rdf:resource="http://metia.nokia.com/MARS/2.1"
               rdf:ID="language"
               rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

maps a syntactic form such as

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">

to the resource

   "name:metia.nokia.com/MARS/2.1/language"

and is equivalent to the following two constructs

   <rdfs:Class rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

Alternatively, the default namespace may be defined and used for serialization names, as follows:

   <rdf:RDF xmlns="http://metia.nokia.com/MARS/2.1" ...>
     <rdfs:Class rdf:ID="language"
                 rdf:about="name:metia.nokia.com/MARS/2.1/language"/>
   </rdf:RDF>

This defines the very same syntactic-form-to-semantic-resource mapping as the first rdfs:Class example above. The instance's default namespace is used as the rdf:resource value of the rdfs:Class declaration. (Per the XML Namespaces spec, the default namespace applies to unprefixed element names.)

Given this refinement to RDFS, an rdfs:Class declaration is in error if it has an rdf:ID value but neither an rdf:resource value nor a default namespace. Such a declaration would fail to ground the serialization name in any namespace.
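The rdfs:Class shorthand can be read as sugar for a class declaration plus an rdf:Map. The Python sketch below expands one declaration and applies the default-namespace rule and the error condition described above. It assumes declarations arrive as plain attribute dictionaries, which is a hypothetical representation.

```python
MARS = "http://metia.nokia.com/MARS/2.1"


def expand_class(attrs, default_ns=None):
    """Split an rdfs:Class declaration into its class URI and an optional rdf:Map entry.

    `attrs` maps attribute names such as "rdf:ID" to their values (a hypothetical
    representation). The map entry, if any, is ((namespace, name, value), about).
    """
    about = attrs.get("rdf:about")
    if about is None:
        raise ValueError("rdf:about is mandatory on rdfs:Class")
    name = attrs.get("rdf:ID")
    if name is None:
        return about, None  # no serialization mapping given here
    # An explicit rdf:resource wins; otherwise fall back to the default namespace.
    namespace = attrs.get("rdf:resource", default_ns)
    if namespace is None:
        raise ValueError("rdf:ID given without rdf:resource or a default namespace")
    return about, ((namespace, name, attrs.get("rdf:value")), about)
```

Read this way, the explicit rdf:resource form and the default-namespace form from the examples expand to the same (class, mapping) pair.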
=== Discussion and Examples ===

Under this refined interpretation, rdf:ID values and rdfs:Class constructs together simply give RDF and RDF Schema a built-in serialization schema mechanism. It covers cases where stricter serialization (such as for literal content values) is not needed. Yet defining a serialization does not, by itself, satisfy the need to map syntactic forms to semantic resources.

The rdf:Map element provides such an explicit mapping. It can also map literal PCDATA values to abstract semantic resources. This makes controlled value sets easier to use in serializations, which both simplifies serialized instances and makes them less verbose.

[Example 1: Given the following definition in an RDF Schema reifying an abstract semantic concept within a given ontology

   <rdfs:Class rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

and, in a separate RDF Schema, the following mapping from a particular syntactic form to that semantic resource

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

then from the following syntactic form

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">

we get the predicate

   "name:metia.nokia.com/MARS/2.1/language"

Yet from another serialization model, mapped to the same semantics for the same ontology

   <rdf:Map rdf:resource="mailto:patrick.stickler@nokia.com"
            rdf:ID="lang"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

then from the following syntactic form

   <x:lang xmlns:x="mailto:patrick.stickler@nokia.com">

we get the *same* predicate

   "name:metia.nokia.com/MARS/2.1/language"

Or, as yet another alternative, a localized Finnish-language serialization of the same ontology can be mapped to the common (language-neutral, or unified English-language) semantics

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="kieli"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

then from the following localized syntactic form

   <x:kieli xmlns:x="http://metia.nokia.com/MARS/2.1/fi">

we still get the *same* predicate

   "name:metia.nokia.com/MARS/2.1/language"
]

[Example 2: With the approach of example 1, a literal PCDATA value such as 'fi' needs a separate mapping for each serialization context (element) in which it maps to the same semantic resource. If all literals for all serializations are expected/required to come from the same controlled set, a more global mapping can be used instead. It uses rdf:resource alone, naming the semantic resource, rather than an rdf:resource and rdf:ID pair. I.e.

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/language"
            rdf:value="fi"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/fi"/>

will now suffice for any element context that is mapped to the specified resource and has the sole literal value 'fi'. The above does the work of both of the following mappings. It also covers any other serialization of the value 'fi' within any element mapped to the semantic resource "name:metia.nokia.com/MARS/2.1/language":

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:value="fi"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/fi"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="kieli"
            rdf:value="fi"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/fi"/>

Thus from both of the following alternative serializations

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">fi</...>
   <x:kieli xmlns:x="http://metia.nokia.com/MARS/2.1/fi">fi</...>

we get the *same* predicate and object pair

   pred: "name:metia.nokia.com/MARS/2.1/language"
   obj:  "name:metia.nokia.com/MARS/2.1/language/fi"
]

The interpretation of the rdf:resource value within an rdf:Map declaration depends on whether or not an rdf:ID value exists.
If it does, the rdf:resource value is a namespace (syntactic) resource. Otherwise it is a semantic resource, and one or more other mappings from syntactic forms to that same resource may exist. Any such syntactic form is a valid context for mapping the specified literal value string to the specified resource.

An element in a serialization that no rdf:Map or rdfs:Class construct identifies is not necessarily flagged as an error. It is, however, ambiguous, so the parsing process should ignore it and not map it into any triple.

Literal PCDATA content that no rdf:Map declaration with a matching namespace, name and value identifies may be flagged as an error if the enclosing property has a non-Literal range.

In both cases, the parser could also be instructed to issue warnings. This gives exceptionally strict data validation for all controlled vocabularies whose values are serialized as data strings (e.g. 'en' for English). It still provides no true validation for genuine literals such as integer, float or date formats, unless additional mechanisms are added -- but read on ;-)

The proposed rdf:Map construct lets different folks use different namespaces for equivalent semantics. When namespace URIs change over time, it still lets one unify all syntactic variants to a single consistent semantics -- AND semantics is no longer inseparably bound to syntactic forms. Each system could map any serialization to a local set of semantic terms (a custom, non-standardized ontology of resource URIs), and then use RDF Schema to map that ontology to other ontologies. Finally, rdf:Map allows RDF interpretations of non-RDF and legacy serializations. This needs no cooperation from the authority that defines the serialization, and no modification of serialized content.
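The lookup rules above can be sketched minimally in Python. Context-specific value mappings (those with rdf:ID) are tried first. Global ones keyed by the semantic resource, as in Example 2, come second. Unmapped elements produce no triple. The precedence order is an assumption, since the proposal does not say which mapping wins if both exist. All names are hypothetical.

```python
MARS = "http://metia.nokia.com/MARS/2.1"
MARS_FI = "http://metia.nokia.com/MARS/2.1/fi"
LANG = "name:metia.nokia.com/MARS/2.1/language"

# rdf:Map with rdf:ID: (namespace, name) -> predicate resource
NAME_MAP = {(MARS, "language"): LANG, (MARS_FI, "kieli"): LANG}

# rdf:Map with rdf:ID and rdf:value: (namespace, name, literal) -> resource (none needed here)
CONTEXT_VALUES = {}

# rdf:Map without rdf:ID (Example 2): (semantic resource, literal) -> resource
GLOBAL_VALUES = {(LANG, "fi"): LANG + "/fi"}


def resolve(namespace, name, text):
    """Return (predicate, object), or None for an unmapped (ignored) element."""
    predicate = NAME_MAP.get((namespace, name))
    if predicate is None:
        return None
    key = (namespace, name, text)
    if key in CONTEXT_VALUES:  # assumed: a context-specific mapping takes precedence
        return predicate, CONTEXT_VALUES[key]
    # Otherwise use a mapping shared by every serialization of this predicate,
    # leaving the value as a plain literal if there is none.
    return predicate, GLOBAL_VALUES.get((predicate, text), text)
```

Both the English-named and the Finnish-named element resolve 'fi' to the same object through the single global entry, which is the point of Example 2. A stricter parser could warn or fail instead of passing an unmapped value such as 'sv' through as a literal when the predicate's range is non-Literal.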
=== Regular expression constraints on syntactic literals ===

One final addition to the above methodology supplies the last functionality missing for strict literal data typing (within the limits of regular expressions). It also lets pattern constraints be associated with more than the syntactic form or the resource it is directly mapped to. In conjunction with RDF Schema, they can be associated with any instance of that class, or of any subclass of that target resource.

[Example 3: Suppose suffixes are allowed on language values (for dialects). The following definitions map all regional dialects to the same common language resource. In this case we don't care about dialectal differences, even if the serialized data specifies them:

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/language"
            rdf:regex="en(-.*)?"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/en"/>

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/language"
            rdf:regex="fi(-.*)?"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/fi"/>

thus both of the following syntactic forms

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">en</...>
   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">en-us</...>

map to the same pair of semantic resources

   pred: "name:metia.nokia.com/MARS/2.1/language"
   obj:  "name:metia.nokia.com/MARS/2.1/language/en"
]

Granted, one could define separate mappings from each possible literal value to the same resource, but the above is more concise and clearer.

Still, the real use of the rdf:regex extension is for serialized literal values that stay literals in the triples but need some degree of validation. If the rdf:Map construct has no rdf:about value, the pattern is simply a constraint on the literal value, and the serialized value still becomes a literal in the triple.
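Example 3 can be sketched in Python with `re.fullmatch`. One assumption matters here: each pattern is taken to match the whole literal, since otherwise "en(-.*)?" would also accept 'english'. The rule list and `map_value` are hypothetical names. A rule with no `about` only constrains a literal.

```python
import re

LANG = "name:metia.nokia.com/MARS/2.1/language"

# (resource, pattern, about) for each rdf:Map carrying rdf:regex;
# about=None would mean the pattern only constrains a literal value.
REGEX_MAPS = [
    (LANG, r"en(-.*)?", LANG + "/en"),
    (LANG, r"fi(-.*)?", LANG + "/fi"),
]


def map_value(predicate, text):
    """Map a literal via the first rule whose pattern matches it in full."""
    for resource, pattern, about in REGEX_MAPS:
        if resource == predicate and re.fullmatch(pattern, text):
            return about if about is not None else text
    return text  # assumed: no matching rule leaves the value as a literal
```

With full-string matching, both "en" and "en-us" map to the English resource, while "english" matches neither rule and stays a literal.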
[Example 4: Let's make sure that integer values for count properties really are integers (with no irrelevant zero padding, of course ;-):

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/count"
            rdf:regex="(0)|([1-9][0-9]*)"/>
]

[Example 5: A percentage is an integer between 0 and 100:

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/percentage"
            rdf:regex="(100)|([1-9][0-9])|[0-9]"/>
]

[Example 6: For the DAML folks ;-) "over 17":

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/over17"
            rdf:regex="(1[89])|([2-9][0-9])|([1-9][0-9][0-9]+)"/>

thus, given the additional mapping and definition

   <rdf:Map rdf:resource="http://foo.org"
            rdf:ID="age"
            rdf:about="http://foo.org#age"/>

   <rdfs:Class rdf:about="http://foo.org#age"
               rdfs:subClassOf="name:metia.nokia.com/MARS/2.1/over17"/>

and the serialization

   <x:age xmlns:x="http://foo.org">87</x:age>

we get the following predicate and literal

   pred:  "http://foo.org#age"
   value: "87"

yet from the following

   <x:age xmlns:x="http://foo.org">10</x:age>

we get an error. "10" is a value attributed to the predicate "http://foo.org#age", which is a subclass of "name:metia.nokia.com/MARS/2.1/over17". The regular expression constraint in the mapping applies to all instances of "name:metia.nokia.com/MARS/2.1/over17" and of any of its subclasses, and "10" does not pass it.
]

Admittedly, for some constraints regular expressions are not as elegant or readable as XML Schema range constraints. They should still suffice for all RDF serialization needs except the truly esoteric (bordering on bizarre). For those, custom validation functions are likely available, or desirable anyway. Thus, with this final extension to RDF, we don't need XML Schema (or any other schema solution) for RDF/DAML serialization or data type validation in most (even nearly all) applications!
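Examples 4 to 6 can be checked with a short Python sketch. It collects every constraint inherited along rdfs:subClassOf links and requires the literal to match each one in full. The dictionaries and function names are hypothetical, and full-string matching is again an assumption. The last branch of the "over 17" pattern uses `+`, so it only covers numbers of three or more digits and cannot accept "10".

```python
import re

BASE = "name:metia.nokia.com/MARS/2.1/"

# Constraint-only rdf:Map entries (rdf:regex, no rdf:about): resource -> pattern
CONSTRAINTS = {
    BASE + "count": r"(0)|([1-9][0-9]*)",
    BASE + "percentage": r"(100)|([1-9][0-9])|[0-9]",
    # "+" in the last branch: three or more digits, so "10" is not accepted
    BASE + "over17": r"(1[89])|([2-9][0-9])|([1-9][0-9][0-9]+)",
}

# rdfs:subClassOf links from Example 6: resource -> direct superclasses
SUPERCLASSES = {
    "http://foo.org#age": [BASE + "over17"],
}


def with_superclasses(resource):
    """The resource plus all of its transitive superclasses."""
    seen, stack = {resource}, [resource]
    while stack:
        for parent in SUPERCLASSES.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_valid(resource, literal):
    """True if the literal fully matches every constraint the resource inherits."""
    return all(
        re.fullmatch(CONSTRAINTS[r], literal) is not None
        for r in with_superclasses(resource)
        if r in CONSTRAINTS
    )
```

Here `is_valid("http://foo.org#age", "87")` holds through the inherited "over 17" constraint, while "10" and "17" are rejected. A resource with no constraints anywhere in its superclass chain accepts any literal.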
In conjunction with RDF Schema, such constraints could be specified once for a superclass and then used by each subclass. They would not need to be redefined for each class that a serialized element maps to directly. E.g. any literal value in a serialization that corresponds to a subclass of "name:metia.nokia.com/MARS/2.1/percentage" must conform to the regex constraint given above. This lets one define one's data types in RDF, rather than resort to subclassing some XML Schema data type (and just what does that *mean* to an RDF parser anyway?!)

The rdf:regex extension could be an optional feature of an RDF parser/validator. Without it, everything still works; erroneous/invalid literal values are simply not trapped (as is the case now with present-day RDF parsers).

=== Backwards compatibility ===

The above extensions explicitly define mappings from arbitrary namespace + name pairs to resources. They can be made fully backward compatible with existing RDF practice by keeping the current (imperfect/insufficient) direct concatenation of namespace and name whenever no other mapping is defined. Thus, systems that have so far worked by luck, and by the convenience of HTML fragment syntax and http: URLs, will keep working without changes to data or schemas. Yet new systems, or revised versions of existing ones, can use the new extensions to meet these mapping needs more reliably and explicitly. Where data relies on the concatenation fallback, two problems will of course remain: the risk of collisions between the resulting semantic resource URI references, and the inability to re-partition them for serialization.

=== RDF as stand-alone bi-directional solution for serialization ===

Mappings with either rdf:value or rdf:ID are fully bi-directional, so they can also be used to serialize semantics according to one or more namespaces! The only ambiguity that can arise is whether an rdf:ID represents an element or a global attribute.
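Bi-directionality depends on the contrast with concatenation, which a few lines of Python can show. In this sketch (hypothetical names throughout), `resource_for` applies an explicit rdf:Map if one exists. Otherwise it falls back to concatenation, as the backwards-compatibility rule describes. `partitions` then shows, using the collision from Claim 4, why a concatenated URI cannot always be split back into namespace and name. `name_for` shows that an explicit mapping can be reversed.

```python
# Hypothetical rdf:Map table: (namespace, name) -> resource
EXPLICIT = {}


def resource_for(namespace, name):
    """Use an explicit mapping when declared, else the legacy concatenation."""
    return EXPLICIT.get((namespace, name), namespace + name)


def partitions(uri, namespaces):
    """Every (namespace, name) split of `uri` over the known namespaces."""
    return [
        (ns, uri[len(ns):])
        for ns in namespaces
        if uri.startswith(ns) and len(uri) > len(ns)
    ]


def name_for(resource):
    """Reverse an explicit mapping; concatenated URIs have no stored split.

    Several syntactic forms may map to one resource (Example 1), so a full
    reverse lookup would also need a target namespace; here an ambiguous
    or missing match simply yields None.
    """
    matches = [key for key, res in EXPLICIT.items() if res == resource]
    return matches[0] if len(matches) == 1 else None
```

Under the fallback, both forms from Claim 4 yield "http://x.com/z#ajavarovasti", and `partitions` finds two valid splits of it. Once an explicit entry is declared (say, to a hypothetical `urn:example:varovasti`), the pair round-trips through `name_for`.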
[Example 7: A non-RDF-savvy agent can ask an RDF-savvy agent what it knows about a given resource. It can request a custom serialization for the results by naming the namespace(s) to use, provided that rdf:Map definitions exist for those namespace(s) to map between semantic and syntactic forms. I.e. rdf:Map definitions provide for the following mappings

   ns+name         -> resource -> ns+name
   ns+name+PCDATA  -> resource -> PCDATA

So, given the following SPO triples in our knowledge base

   ("http://foo.com/bar.html",
    "name:metia.nokia.com/MARS/2.1/created",
    '2001-01-29')

   ("http://foo.com/bar.html",
    "name:metia.nokia.com/MARS/2.1/language",
    "name:metia.nokia.com/MARS/2.1/language/en")

   ("http://foo.com/bar.html",
    "http://purl.org/dc/elements/1.1/title",
    'The Tao of Bar')

   ("http://purl.org/dc/elements/1.1/title",
    "http://www.w3.org/2000/01/rdf-schema#subPropertyOf",
    "name:metia.nokia.com/MARS/2.1/title")

and the following mappings

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="created"
            rdf:about="name:metia.nokia.com/MARS/2.1/created"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:value="en"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/en"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="title"
            rdf:about="name:metia.nokia.com/MARS/2.1/title"/>

then, with the requested target namespace "http://metia.nokia.com/MARS/2.1", we get the desired serialization of that knowledge:

   <rdf:RDF xmlns:ns1="http://metia.nokia.com/MARS/2.1" ...>
     <rdf:Description rdf:about="http://foo.com/bar.html">
       <ns1:created>2001-01-29</ns1:created>
       <ns1:language>en</ns1:language>
       <ns1:title>The Tao of Bar</ns1:title>
     </rdf:Description>
   </rdf:RDF>
]

Note that in the example above the following triple is inferred from the defined relation between the MARS and DC ontologies via
rdfs:subPropertyOf and a query derived from the serialization mapping definitions for the target namespace:

   ("http://foo.com/bar.html",
    "name:metia.nokia.com/MARS/2.1/title",
    "The Tao of Bar")

Thus, the target namespace(s) named in the query select a set of mappings. From these, a number of RDF queries for the subject of interest are derived. All knowledge about that subject that those queries can retrieve is then included in the serialized response, according to the mapping definitions.

[Example 8: Suppose the target namespace is "http://metia.nokia.com/MARS/2.1/fi" (the Finnish-language version of the above serialization). Then, with the following alternative serialization mappings to/from the same semantics

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="luotu"
            rdf:about="name:metia.nokia.com/MARS/2.1/created"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="kieli"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="kieli"
            rdf:value="en"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/en"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="nimi"
            rdf:about="name:metia.nokia.com/MARS/2.1/title"/>

we get the alternative serialization of the same knowledge

   <rdf:RDF xmlns:ns1="http://metia.nokia.com/MARS/2.1/fi" ...>
     <rdf:Description rdf:about="http://foo.com/bar.html">
       <ns1:luotu>2001-01-29</ns1:luotu>
       <ns1:kieli>en</ns1:kieli>
       <ns1:nimi>The Tao of Bar</ns1:nimi>
     </rdf:Description>
   </rdf:RDF>
]

Note that it doesn't matter which prefix is associated with a namespace in an instance, so prefixes can simply be enumerated ns1, ns2, ns3 as the serialization needs them. Any parsing application that identifies content by its literal QNames (i.e. by prefix) rather than by namespace URI and local name is "broken" and does not conform to the XML Namespaces spec. The cost of that short-cut hack will quickly become apparent.
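The reverse direction used in Examples 7 and 8 can be sketched with Python's standard `xml.etree.ElementTree`. The target namespace selects the applicable rdf:Map entries. These are inverted to turn predicates back into element names and resources back into controlled literals. The sketch assumes the inference step that yields the MARS title triple has already run. `MAPS` and `serialize` are hypothetical names.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MARS_FI = "http://metia.nokia.com/MARS/2.1/fi"
BASE = "name:metia.nokia.com/MARS/2.1/"

# rdf:Map entries from Example 8: (namespace, name, value) -> resource
MAPS = [
    ((MARS_FI, "luotu", None), BASE + "created"),
    ((MARS_FI, "kieli", None), BASE + "language"),
    ((MARS_FI, "kieli", "en"), BASE + "language/en"),
    ((MARS_FI, "nimi", None), BASE + "title"),
]


def serialize(subject, triples, target_ns):
    """Serialize what is known about `subject` using only mappings for `target_ns`."""
    # Invert the mappings that belong to the requested namespace.
    names = {res: name for (ns, name, val), res in MAPS if ns == target_ns and val is None}
    # Simplification: value mappings are keyed by resource only, not per element.
    literals = {res: val for (ns, name, val), res in MAPS if ns == target_ns and val is not None}
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject})
    for s, p, o in triples:
        if s == subject and p in names:  # predicates without a mapping are left out
            child = ET.SubElement(desc, f"{{{target_ns}}}{names[p]}")
            child.text = literals.get(o, o)  # controlled resource -> literal, else as-is
    # ElementTree chooses the prefixes itself; they carry no meaning anyway.
    return ET.tostring(root, encoding="unicode")
```

Called with the three triples about "http://foo.com/bar.html" and the Finnish target namespace, this produces a Description with `luotu`, `kieli` and `nimi` children. Any predicate that has no mapping in that namespace is skipped.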
=== Making it all work auto-magically ===

(This final section is not part of the proposal proper and is not essential for adopting the mapping solution described above. Combined with that solution, however, it would greatly benefit the SW.)

Suppose RDDL instances were dereferenceable from namespace URI references. Suppose also that they linked to RDF instances defining rdf:Map mappings from serializations grounded in that namespace to one or more standardized ontologies. Then any SW agent could (potentially) eat any serialized input whatsoever, from any namespace. It could dynamically acquire the knowledge needed to map serializations from an arbitrary namespace to some known set of semantic resources.

[Use Case 1: A SW agent receives an XML instance that includes the property element

   <ao:päivä xmlns:ao="http://sisu.hut.fi/termit/aikaonto.rddl">

The agent is not familiar with that namespace. It therefore retrieves the bootstrapping RDDL instance from the namespace URI, hoping to learn enough about the namespace to use the data. Fortunately, the RDDL instance provides a URL to an RDF Schema for that namespace. From that schema the agent learns that the syntactic form

   <ao:päivä xmlns:ao="http://sisu.hut.fi/termit/aikaonto.rddl">

corresponds to the semantic resource "http://sisu.hut.fi/termit/ao/pv". It also learns that this resource is an rdfs:subPropertyOf the semantic resource "http://purl.org/dc/elements/2.1/date". Great. The agent's own knowledge base records that one of the properties in its primary ontology, 'urn:partax:foo(created)', is an rdfs:subPropertyOf "http://purl.org/dc/elements/2.1/date".
Ahhh, now the receiving agent knows what that input content "means", *and* it can save what it has learned about how the semantic resources of the various ontologies relate. The next time it encounters any of them, it knows what they mean!
]

The key points here are that (a) names defined within namespaces have a consistent representation, regardless of any schema or content serialization format, and (b) an agent can retrieve information about a namespace in a consistent, generic manner, with no prior knowledge of that namespace whatsoever.

One could conceive of other ways to tie a bootstrapping instance, such as a RDDL instance, to a namespace. Using the URL of the RDDL instance as the namespace URI, however, works with existing web mechanisms and needs no additional infrastructure. It also makes namespace URI references more "logical", since they then relate to a consistent content type.

Folks can still use any arbitrary URI as a namespace identifier without tying a bootstrapping instance such as RDDL to it. In that case, agents that want to handle data serialized according to that namespace must either have the knowledge hard-coded or obtain it by some other means.

The SW cannot benefit the world at large unless it achieves a critical mass of interchangeable knowledge. RDF is one step towards that goal. But the knowledge must not only be encoded in a consistent manner; it must also be *accessible* and *retrievable* in a consistent manner, to meet the distributed, chaotic scalability requirements that the nature of the web imposes.

This architecture is like "DNS for the SW".
Without such an architecture, and a set of mechanisms by which any agent can find the information it needs about any namespace (and hence any ontology), we are like a network using only /etc/hosts files with no DNS. Every agent must then know what every other agent knows if it is to interact with maximal effectiveness!

There are two options. The URI of the namespace may be the URL of a RDDL instance that bootstraps the namespace, or some other (URN-like) mechanism may resolve the namespace URI to the RDDL instance. Which option is used is a secondary issue. What is crucial is that such a mapping exists, directly or indirectly, and that agents can access the RDDL instance as needed for any namespace.

This architecture serves semantic needs. Per the purpose of RDDL, it also serves syntactic needs, because the agent can obtain the serialization schemas needed to validate incoming information according to the authority of the namespace. E.g. a high-capacity, high-quality agent is not going to just trust any old bit of data coming to it. It will want to get the serialization schema and make sure that a date, say, really is a valid date according to the schema. The data could instead be program code meant to insert a virus into the system, or merely invalid data of some kind, either of which might cause the agent's internal processes to fail -- as those processes depend on the agents to serve as "data integrity firewalls" for incoming information.

Such an architecture, combined with the mapping solution of this proposal, would achieve massive scalability and true generalized interoperability for the Semantic Web.
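The reasoning step in Use Case 1 can be sketched in Python. The fetch of the RDDL instance and schema is not shown. The hypothetical `learn` function simply merges into the agent's store what such a schema would have supplied. After that, the agent finds that the incoming element's property and its own 'created' property share the Dublin Core date super-property. The data structures are illustrative assumptions.

```python
# Use Case 1 knowledge: property -> direct rdfs:subPropertyOf parents.
DC_DATE = "http://purl.org/dc/elements/2.1/date"
SUB_PROPERTY_OF = {
    "urn:partax:foo(created)": [DC_DATE],  # already in the agent's own ontology
}

# Syntactic mappings the agent knows: (namespace, name) -> property
NAME_MAP = {}


def learn(fetched_maps, fetched_sub_props):
    """Merge mappings and relations obtained from a namespace's schema."""
    NAME_MAP.update(fetched_maps)
    for prop, parents in fetched_sub_props.items():
        SUB_PROPERTY_OF.setdefault(prop, []).extend(parents)


def ancestors(prop):
    """The property itself plus all of its transitive super-properties."""
    seen, stack = {prop}, [prop]
    while stack:
        for parent in SUB_PROPERTY_OF.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def common_meaning(namespace, name, own_prop):
    """Shared super-properties of an incoming element's property and a local one."""
    prop = NAME_MAP.get((namespace, name))
    return set() if prop is None else ancestors(prop) & ancestors(own_prop)
```

Before `learn` runs, the unfamiliar `ao:päivä` element yields nothing. Afterwards it links to the agent's own property through the shared date super-property, and the merged store persists for the next encounter.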
Received on Friday, 10 August 2001 07:47:02 UTC