A proposed solution to the RDF syntactic/semantic mapping problem (long)

(My apologies in advance for the long posting...)


The following is a proposal for extensions/refinements of RDF
and RDF Schema with the aim of achieving modular, scalable and
generic interchange of knowledge for the Semantic Web.

It is believed that this proposal is fully backward compatible
with all existing RDF applications.

If the ideas embodied in this proposal have been suggested before
by others, my apologies in advance for my ignorance of any prior
work intersecting with that expressed below.

First, I will outline some claims, which are the motivation for
this proposal. Although I consider each of these claims to be true,
the rejection of any one or all of them will not IMO reduce the
inherent value of this proposal, only the absolute necessity for
some solution such as that proposed. I will then summarize the
problem and describe a possible solution to the problem with
examples.


=== Claims ===

Claim 1: A namespace and name pair does not constitute any kind of
universal semantic identity, only a unique syntactic form which
can be associated with some semantic identity.

Although names within namespaces do serve to differentiate content
which is attributed meaning, and that meaning is typically (though
not necessarily) suggested by the linguistic properties of that name,
the syntactic form selected for any particular serialization is
local to that serialization and many syntactic forms may map to the
same common semantics. The syntactic form provides a mechanism
by which we may define a mapping to that universal meaning, but it
does not serve itself as the universal identifier of that meaning. 

Likewise, a namespace does not officially identify any ontology or
semantic space, even if it is often used to do so, but is only a
syntactic mechanism by which name collisions are avoided in
the syndication of arbitrary syntactic forms in a given serialization.
There is no requirement whatsoever that a namespace provide any
semantic identity.

Claim 2: A name within a given namespace does not equate to a URI
reference of that name within any content dereferencable from the
namespace URI reference.

I.e. "namespace" + "name" != "namespace#name".

Although one might by coincidence be able to dereference a name
as a fragment within a MIME stream retrievable from the namespace
URI and get something that defines, describes or otherwise relates
to that name within the namespace, no such relationship is defined
to exist between a namespace URI and a name. Furthermore, as a
given namespace may have serializations defined in various schema
formalisms, each potentially having different MIME content types
with potentially different fragment schemes, yet all defining
the same namespace URI and name, there is then potentially a many to
one mapping from namespace and name pair to URI reference into each
of those schema instances.

Furthermore, the XML Namespace spec states that a namespace URI reference
need not be dereferencable to any content, and therefore no particular
fragment syntax can be deduced or inferred for an unknown or undefined
MIME content type.

Claim 3: We cannot use concatenation, suffixation, insertion or
any other method of combining a name with a namespace URI reference
to obtain a compound URI reference without violating the sanctity of
either the URI scheme and/or some MIME content type fragment syntax
space.

This would not be a problem if rdf:about or rdf:resource values
simply needed to be unique strings. However, they are required to
be valid URI references, and therefore there is never a guarantee
that any combination of namespace URI and name will not produce
an invalid URI.

Likewise, it is not possible to reliably re-partition any merged namespace
plus name URI reference back into its namespace and name components
which is necessary for re-serialization of knowledge (see discussion
and examples below regarding bi-directional serialization mapping).

Claim 4: The current methodology employed by RDF to attempt to create
a semantic resource identity by direct concatenation of namespace
and name does not ensure the preservation of the uniqueness of namespace
qualified names.

E.g. Both of the following valid yet distinct syntactic forms
are mapped to the same semantic resource URI, resulting in an
RDF-internal naming collision:

   <x:varovasti xmlns:x="http://x.com/z#aja">
   -> "http://x.com/z#ajavarovasti"

   <x:rovasti xmlns:x="http://x.com/z#ajava">
   -> "http://x.com/z#ajavarovasti"!

The fact that the above example is contrived does in no way
invalidate the fact that the present RDF methodology is 
unreliable and can result in unintended semantic ambiguity
from distinct syntactic forms.

This example, along with the discussion in claim 3 about unclear
re-partitioning of combined URI references, demonstrates the fact that
the uniqueness of a namespace and name pair has three elements:
(1) the unique namespace,
(2) the unique name within that namespace,
and
(3) a distinct boundary between the two.
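
The collision and the re-partitioning problem can both be seen in a
few lines of Python. This is only a sketch: concat_uri and
candidate_splits are names invented here for illustration and are not
part of any RDF toolkit.

```python
from typing import List, Tuple

def concat_uri(namespace: str, name: str) -> str:
    """Naive mapping: append the local name to the namespace URI."""
    return namespace + name

# The two distinct (namespace, name) pairs from the example above.
pair_a = ("http://x.com/z#aja", "varovasti")
pair_b = ("http://x.com/z#ajava", "rovasti")

uri_a = concat_uri(*pair_a)
uri_b = concat_uri(*pair_b)

def candidate_splits(uri: str, namespaces: List[str]) -> List[Tuple[str, str]]:
    """Every (namespace, name) pair that could have produced this URI."""
    return [(ns, uri[len(ns):]) for ns in namespaces if uri.startswith(ns)]

splits = candidate_splits(uri_a, [pair_a[0], pair_b[0]])

print(pair_a == pair_b)  # the pairs themselves are distinct
print(uri_a == uri_b)    # yet the concatenated URIs are identical
print(splits)            # and the URI alone can be split back in two ways
```

Keeping the pair as a pair preserves the boundary; once the two
strings are merged, that third element of uniqueness is gone.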


=== Summary of the Problem ===

We must have an explicit mapping defined between a namespace
and name pair and the universal semantic identity they are intended
to correspond to or represent. Neither RDF nor RDF Schema (nor DAML)
currently provide this mapping.


=== Proposed Solution to Problem ===

Step 1: Clarify (refine) the interpretation of rdf:ID and rdf:about
as follows:

a) rdf:ID equates to a name within a namespace correlating to the
serialization (syntactic form) of a semantic resource (meaning).
It only equates to the name of an element within the RDF serialization
and not to any semantic resource. 

b) rdf:about equates to a resource within the semantic
space, either abstract or concrete.

Thus, rdf:about values (URI references) become (or already are) the
soul of the Semantic Web, whereas rdf:ID values (namespace and name
pairs) are just a means to an end, serving only the mapping of syntactic
constructs to semantics.

Step 2: Provide for explicit mapping between syntactic forms and
semantic resources. I.e. for mapping rdf:ID values to rdf:about values.

This is achieved by the following two methods:

Mapping method 1: RDF

Add an element rdf:Map to RDF that is used to map syntactic
forms to semantic resources. E.g.

   <rdf:Map rdf:resource="http://purl.org/dc/elements/1.0"
            rdf:ID="date"
            rdf:about="http://dublincore.org/1.0/Date"/>

will result in a syntactic form such as

   <x:date xmlns:x="http://purl.org/dc/elements/1.0">

being equated with the semantic resource 

   "http://dublincore.org/1.0/Date"

Thus, the pair of the namespace specified as the rdf:resource and the
name specified as the rdf:ID are mapped to the semantic resource
specified in rdf:about.

This new construct also provides for mapping of serialized literals
to semantic resources by the inclusion of an rdf:value attribute
along with rdf:resource and rdf:ID, mapping any such literal value
occurring as the sole PCDATA of an element of the specified name
within the specified namespace to the resource specified in the
rdf:about value. E.g.

<rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
         rdf:ID="language"
         rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

<rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
         rdf:ID="language"
         rdf:value="en"
         rdf:about="name:metia.nokia.com/MARS/2.1/language/en"/>

will result in a syntactic form such as

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">en</...>

being equated with the predicate and object semantic resources

   pred: "name:metia.nokia.com/MARS/2.1/language"
   obj:  "name:metia.nokia.com/MARS/2.1/language/en"

Note that no namespace prefix is ever used in a rdf:Map definition
as that is unnecessary, and also contrary to the XML Namespace spec
which constrains the significance of prefixes to within a given
serialized instance. All that is needed is the pair of namespace
URI reference (rdf:resource) and element (or global attribute)
name (rdf:ID), and optionally, a literal PCDATA string (rdf:value).
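
As a rough sketch of how a parser might hold such definitions, the
rdf:Map entries can be viewed as a lookup table keyed on the triple of
namespace URI, name and optional literal value. The Python below
assumes that reading; the table layout and the function name
to_predicate_object are hypothetical, not part of the proposal.

```python
from typing import Dict, Optional, Tuple

# Key: (namespace URI, name, literal value or None) -> semantic resource.
MapKey = Tuple[str, str, Optional[str]]

MARS = "http://metia.nokia.com/MARS/2.1"

maps: Dict[MapKey, str] = {
    (MARS, "language", None): "name:metia.nokia.com/MARS/2.1/language",
    (MARS, "language", "en"): "name:metia.nokia.com/MARS/2.1/language/en",
}

def to_predicate_object(namespace: str, name: str, pcdata: Optional[str]):
    """Return (predicate, object, object_is_resource), or None if unmapped."""
    predicate = maps.get((namespace, name, None))
    if predicate is None:
        return None  # no rdf:Map for this element
    mapped = maps.get((namespace, name, pcdata)) if pcdata is not None else None
    if mapped is not None:
        return predicate, mapped, True   # literal mapped to a resource
    return predicate, pcdata, False      # literal stays a literal

print(to_predicate_object(MARS, "language", "en"))
```

Note that the prefix never appears in the key, only the namespace URI.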

Mapping method 2: RDF Schema

For all rdfs:Class declarations, rdf:about becomes mandatory as
the identity of the semantic resource and as per the rdf:Map
construct, rdf:ID and (optionally) rdf:resource and rdf:value
are used to define a mapping from syntactic form to semantic
resource. They need not be specified in the rdfs:Class declaration
if either no serialization mapping is needed or it is defined
elsewhere (the expected usual practice).

Thus, in addition to reifying a semantic resource, the rdfs:Class
functions as a synonymous construct for rdf:Map. 

E.g. 

   <rdfs:Class rdf:resource="http://metia.nokia.com/MARS/2.1"
               rdf:ID="language"
               rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

maps a syntactic form such as

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">

to the resource

   "name:metia.nokia.com/MARS/2.1/language"

and is equivalent to the following two constructs

   <rdfs:Class rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

Or, if the default namespace is defined and is to be used for
serialization names, the following

   <rdf:RDF xmlns="http://metia.nokia.com/MARS/2.1" ...>
      <rdfs:Class rdf:ID="language"
                  rdf:about="name:metia.nokia.com/MARS/2.1/language"/>
   </rdf:RDF>

defines the very same syntactic form to semantic resource mapping
as the first rdfs:Class example given above. The value of the default
namespace of the instance is, per the normal XML Namespace behavior,
used as the rdf:resource value of the rdfs:Class declaration. Given
this refinement to RDFS, it is an error if no default namespace is
defined yet an rdf:ID value is defined in a rdfs:Class declaration;
as this would fail to ground the serialization name within any
namespace.


=== Discussion and Examples ===

With this refined interpretation of RDF and RDF Schema, the combined use
of rdf:ID values and rdfs:Class constructs by RDF and RDF Schema simply
provides a built in serialization schema mechanism for cases when stricter
serialization (such as for literal content values) is not needed. Yet
the definition of any serialization does not satisfy the need for mapping
syntactic forms to semantic resources.

The addition of the rdf:Map element provides for such an explicit
mapping; and furthermore, given the additional ability to map literal
PCDATA values to abstract semantic resources, it provides for easier
use of controlled value sets in serializations, which greatly simplifies
as well as reduces the verbosity of serialized instances.

[Example 1:

Given the following definition in an RDF Schema reifying an abstract
semantic concept within a given ontology

   <rdfs:Class rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

and in a separate RDF Schema the following mapping from a particular
syntactic form to that semantic resource

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

then from the following syntactic form

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">

we get the predicate

   "name:metia.nokia.com/MARS/2.1/language"

Yet from another serialization model, mapped to the same semantics
for the same ontology

   <rdf:Map rdf:resource="mailto:patrick.stickler@nokia.com"
            rdf:ID="lang"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>
   
then from the following syntactic form

   <x:lang xmlns:x="mailto:patrick.stickler@nokia.com">

we get the *same* predicate

   "name:metia.nokia.com/MARS/2.1/language"

Or, yet another alternate, such as for a localized Finnish language
serialization of the same ontology to the common (language neutral,
or unified English language) semantics

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="kieli"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>

then from the following localized syntactic form

   <x:kieli xmlns:x="http://metia.nokia.com/MARS/2.1/fi">

we still get the *same* predicate

   "name:metia.nokia.com/MARS/2.1/language"
]

[Example 2:

In example 1 above, a separate mapping was defined for each serialization
context (element) of the literal PCDATA value 'fi' to the same semantic
resource.

If all literals for all serializations are expected/required to be from
the same controlled set of literals, then a more global mapping can be
defined using rdf:resource rather than rdf:ID and rdf:resource value
pairs. I.e.

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/language"
            rdf:value="fi"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/fi"/>

will now suffice for any element context mapped to the specified
resource and having the sole literal value of 'fi'. I.e. the above does
the work of both of the following, plus any other serialization of
the value 'fi' within any element serialization mapped to the semantic
resource "name:metia.nokia.com/MARS/2.1/language":

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:value="fi"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/fi"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="kieli"
            rdf:value="fi"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/fi"/>

Thus from both of the following alternate serializations

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">fi</...>
   <x:kieli xmlns:x="http://metia.nokia.com/MARS/2.1/fi">fi</...>

we get the *same* predicate and object pair

   pred: "name:metia.nokia.com/MARS/2.1/language"
   obj:  "name:metia.nokia.com/MARS/2.1/language/fi"

]

The interpretation of the rdf:resource value within an rdf:Map
declaration depends on whether or not an rdf:ID value exists. If it
does, then the rdf:resource value is a namespace (syntactic)
resource, otherwise it is a semantic resource for which there might
exist one or more other mappings from syntactic forms to that same
rdf:resource value and any such syntactic form acts as a valid context
for the mapping of specified literal value string to the specified
resource.
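
One possible reading of this rule, sketched in Python: an entry with
an rdf:ID is matched against the element's namespace and name, while
an entry without one is matched against the predicate the element has
already been mapped to. The Map tuple and the resolve function are
assumptions made for this illustration only.

```python
from typing import NamedTuple, Optional

class Map(NamedTuple):
    resource: str               # namespace URI if id is set, else semantic resource
    about: str
    id: Optional[str] = None
    value: Optional[str] = None

MARS = "http://metia.nokia.com/MARS/2.1"
LANG = "name:metia.nokia.com/MARS/2.1/language"

MAPS = [
    Map(resource=MARS, id="language", about=LANG),
    Map(resource=MARS + "/fi", id="kieli", about=LANG),
    # No id: applies in every element context that is mapped to LANG.
    Map(resource=LANG, value="fi", about=LANG + "/fi"),
]

def resolve(namespace: str, name: str, pcdata: str):
    """Return (predicate, object) for an element, or None if it is unmapped."""
    predicate = next(
        (m.about for m in MAPS
         if m.id == name and m.resource == namespace and m.value is None),
        None,
    )
    if predicate is None:
        return None
    for m in MAPS:
        if m.value is None or m.value != pcdata:
            continue
        element_specific = m.id == name and m.resource == namespace
        global_for_predicate = m.id is None and m.resource == predicate
        if element_specific or global_for_predicate:
            return predicate, m.about
    return predicate, pcdata  # no value mapping: the literal is kept

print(resolve(MARS, "language", "fi"))
print(resolve(MARS + "/fi", "kieli", "fi"))
```

Both calls go through the single id-less mapping, which is the point
of Example 2.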

Any element in a serialization that is not identified by a mapping
(either by an rdf:Map or rdfs:Class construct) is not necessarily
flagged as an error, but it is ambiguous and therefore should be
ignored by the parsing process and not mapped into any triple.

Any literal PCDATA content that is not identified by an rdf:Map
declaration with matching namespace, name and value may be flagged
as an error if the enclosing property has a non-Literal range.

However, in both cases, the parser could be instructed to issue warnings
about all such cases, thus providing exceptionally strict data validation
for all controlled vocabularies with values serialized as data strings
(e.g. 'en' for English, etc.) -- though still not providing any true
validation mechanisms for true literals such as integer, float, date
formats, etc. without additional mechanisms -- though read on ;-)

This proposed rdf:Map construct allows different folks to use different
namespaces for equivalent semantics or when ns URI's change over
time one can still unify all syntactic variants to a single 
consistent semantics -- AND semantics is no longer inseparably
bound to syntactic forms but each system could then map any
serialization to a local set of semantic terms (custom, non-standardized
ontology of resource URIs) and then utilize RDF Schema to map that ontology
to other ontologies. Finally, it allows for RDF interpretations of
non-RDF and legacy serializations with no cooperation of the
defining authority of that serialization nor modification of serialized
content.


=== Regular expression constraints on syntactic literals ===

A final addition to the above methodology provides both the last
missing functionality needed for strict literal data typing
(within the limits of regular expressions) as well as allows
for pattern constraints to be associated not only with the
syntactic form or the resource it is directly mapped to, but
(in conjunction with RDF Schema) with any instance of that
class or any subclass of that target resource.

[Example 3:

If suffixes are possibly allowed for language values (for
dialects), the following definitions map all regional dialects
to the same common language resource (i.e. in this case, we don't
care about dialectal differences, even if specified in the
serialized data):

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/language"
            rdf:regex="en(-.*)?"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/en"/>

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/language"
            rdf:regex="fi(-.*)?"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/fi"/>

thus both of the following syntactic forms

   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">en</...>
   <x:language xmlns:x="http://metia.nokia.com/MARS/2.1">en-us</...>

map to the same pair of semantic resources

   pred: "name:metia.nokia.com/MARS/2.1/language"
   obj:  "name:metia.nokia.com/MARS/2.1/language/en"
]

Granted, one could define separate mappings for all of the possible
literal values to the same resource, but the above is much more
concise and clearer. Still, the real use of the rdf:regex extension
is for literal values in serialization that will remain literals
in the triples but for which some degree of validation is needed.

If no rdf:about value is specified in the rdf:Map construct, then the
pattern is simply interpreted as a constraint on the literal value, and
the serialized value still becomes a literal in the triple.

[Example 4:

Let's make sure that integer values for count properties really
are integers (with no irrelevant multi-zero padding of course ;-):

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/count"
            rdf:regex="(0)|([1-9][0-9]*)"/>
]

[Example 5:

A percentage is an integer between 0 and 100:

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/percentage"
            rdf:regex="(100)|([1-9][0-9])|[0-9]"/>
]
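
The two patterns above can be tried out directly. The sketch below
assumes that a pattern must match the whole literal (the proposal does
not say which regex dialect or anchoring rule applies), and the
CONSTRAINTS table and is_valid function are hypothetical names.

```python
import re

# Semantic resource -> pattern, taken from Examples 4 and 5.
CONSTRAINTS = {
    "name:metia.nokia.com/MARS/2.1/count": r"(0)|([1-9][0-9]*)",
    "name:metia.nokia.com/MARS/2.1/percentage": r"(100)|([1-9][0-9])|[0-9]",
}

def is_valid(resource: str, literal: str) -> bool:
    """True if the literal satisfies the constraint, or if there is none."""
    pattern = CONSTRAINTS.get(resource)
    if pattern is None:
        return True
    # Assumption: the whole literal must match, not just a prefix.
    return re.fullmatch(pattern, literal) is not None

COUNT = "name:metia.nokia.com/MARS/2.1/count"
PERCENT = "name:metia.nokia.com/MARS/2.1/percentage"

for resource, literal in [(COUNT, "42"), (COUNT, "007"),
                          (PERCENT, "100"), (PERCENT, "101")]:
    print(resource, literal, is_valid(resource, literal))
```

Without whole-value matching, "101" would slip through on the
strength of its leading "10", so the anchoring rule matters.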

[Example 6:

For the DAML folks ;-) "over 17":

   <rdf:Map rdf:resource="name:metia.nokia.com/MARS/2.1/over17"
            rdf:regex="(1[89])|([2-9][0-9])|([1-9][0-9][0-9]+)"/>

thus given the additional mapping and definition

   <rdf:Map rdf:resource="http://foo.org"
            rdf:ID="age"
            rdf:about="http://foo.org#age"/>

   <rdfs:Class rdf:about="http://foo.org#age"
               rdfs:subClassOf="name:metia.nokia.com/MARS/2.1/over17"/>

and the serialization 

   <x:age xmlns:x="http://foo.org">87</x:age>

we get the following predicate and literal

   pred:  "http://foo.org#age"
   value: "87"

yet from the following

   <x:age xmlns:x="http://foo.org">10</x:age>

we get an error, as "10" is a value attributed to the predicate
"http://foo.org#age" and that is a subclass of
"name:metia.nokia.com/MARS/2.1/over17" and "10" does not pass the
mapping constraint regular expression defined for all instances of
the resource "name:metia.nokia.com/MARS/2.1/over17" or instances
of any of its subclasses.
]

It will be admitted that regular expressions are not as elegant
and readable for some constraints as XML Schema range constraints,
but they should suffice for all RDF serialization needs except
for the truly esoteric (bordering on bizarre) for which there are
likely custom validation functions available or desirable anyway.

Thus, with this final extension to RDF, we don't need XML Schema
(or any other schema solution) for RDF/DAML serialization or data
type validation for most (even nearly all) applications!

In conjunction with RDF Schema, such constraints could be specified
once for a superclass, and then utilized by each subclass without
the need for redefinition for each class that is directly mapped
to from a serialized element. Thus, e.g. any literal value in
a serialization that corresponds to a subclass of 
"name:metia.nokia.com/MARS/2.1/percentage" per the constraint
above, must conform to the specified regex constraint.
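
A minimal sketch of such inheritance, assuming a single
rdfs:subClassOf parent per class to keep the walk simple. The subclass
URI "http://example.org/terms#completion" is invented for the example,
as are the function names.

```python
import re
from typing import Dict, List

PERCENT = "name:metia.nokia.com/MARS/2.1/percentage"

# Hypothetical subclass relation: child -> parent.
SUBCLASS_OF: Dict[str, str] = {
    "http://example.org/terms#completion": PERCENT,
}

# Constraint defined once, on the superclass only.
REGEX: Dict[str, str] = {PERCENT: r"(100)|([1-9][0-9])|[0-9]"}

def inherited_patterns(resource: str) -> List[str]:
    """Collect patterns from the resource and all of its superclasses."""
    patterns, seen = [], set()
    current = resource
    while current is not None and current not in seen:
        seen.add(current)  # guards against cycles in the class graph
        if current in REGEX:
            patterns.append(REGEX[current])
        current = SUBCLASS_OF.get(current)
    return patterns

def conforms(resource: str, literal: str) -> bool:
    """A literal must satisfy every inherited pattern (whole-value match)."""
    return all(re.fullmatch(p, literal) for p in inherited_patterns(resource))

print(conforms("http://example.org/terms#completion", "85"))
print(conforms("http://example.org/terms#completion", "185"))
```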

This permits one to define one's data types in RDF, rather than
resort to subclassing some XML Schema data type (and just what
does that *mean* to an RDF parser anyway?!)

This rdf:regex extension could be an optional functionality
of an RDF parser/validator, such that without it, everything
works but you just don't trap erroneous/invalid literal values
(as is the case now with present day RDF parsers).


=== Backwards compatibility ===

The above extensions for explicitly defining mappings from
arbitrary namespace + name pairs to resources can be made fully
backward compatible with existing RDF practice by retaining the
current (imperfect/insufficient) mapping of direct concatenation
of namespace to name if no other mapping is defined. Thus,
systems that have thus far worked by luck and the convenience of HTML
fragment syntax and use of http: URLs will continue to work without
modification to data or schemas -- yet new systems or revised
versions of existing systems can take advantage of the new extensions
to more reliably and explicitly address these mapping needs.

The risk of collisions between such semantic resource URI references
and the inability to re-partition them for serialization will of
course remain.
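
In code terms the compatibility rule is simply "explicit mapping
first, concatenation otherwise". A small sketch, with resolve_name as
a hypothetical helper:

```python
from typing import Dict, Tuple

def resolve_name(namespace: str, name: str,
                 maps: Dict[Tuple[str, str], str]) -> str:
    """Prefer an explicit rdf:Map; otherwise fall back to concatenation."""
    explicit = maps.get((namespace, name))
    if explicit is not None:
        return explicit
    # Legacy behaviour, with all the collision risks described earlier.
    return namespace + name

maps = {
    ("http://purl.org/dc/elements/1.0", "date"): "http://dublincore.org/1.0/Date",
}

mapped = resolve_name("http://purl.org/dc/elements/1.0", "date", maps)
legacy = resolve_name("http://www.w3.org/2000/01/rdf-schema#", "subClassOf", maps)
print(mapped)
print(legacy)
```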


=== RDF as stand-alone bi-directional solution for serialization ===

Mappings having either rdf:value or rdf:ID are fully bi-directional and
can also be used to serialize semantics according to one or more
namespaces!

The only ambiguity that can arise is whether an rdf:ID represents an
element or global attribute.

[Example 7:

A non-RDF savvy agent can request of another RDF savvy agent what
it knows about a given resource and can specify a custom serialization
in which to encode the results by specifying the namespace(s) to use and
there being defined for the namespace(s) the rdf:Map definitions
mapping between semantic and syntactic forms.

I.e. rdf:Map definitions provide for the following mappings

   ns+name         -> resource  ->  ns+name
   ns+name+PCDATA  -> resource  ->  PCDATA

So, given the following SPO triples in our knowledge base

   ("http://foo.com/bar.html",
    "name:metia.nokia.com/MARS/2.1/created",
    '2001-01-29')

   ("http://foo.com/bar.html",
    "name:metia.nokia.com/MARS/2.1/language",
    "name:metia.nokia.com/MARS/2.1/language/en")

   ("http://foo.com/bar.html",
    "http://purl.org/dc/elements/1.1/title",
    'The Tao of Bar')

   ("name:metia.nokia.com/MARS/2.1/title",
    "http://www.w3.org/2000/01/rdf-schema#subPropertyOf",
    "http://purl.org/dc/elements/1.1/title")

and the following mappings

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="created"
            rdf:about="name:metia.nokia.com/MARS/2.1/created"/>
   
   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>
   
   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="language"
            rdf:value="en"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/en"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1"
            rdf:ID="title"
            rdf:about="name:metia.nokia.com/MARS/2.1/title"/>
   
then with the requested target namespace "http://metia.nokia.com/MARS/2.1"
we get the desired serialization for that knowledge as

   <rdf:RDF xmlns:ns1="http://metia.nokia.com/MARS/2.1" ...>
      <rdf:Description rdf:about="http://foo.com/bar.html">
         <ns1:created>2001-01-29</ns1:created>
         <ns1:language>en</ns1:language>
         <ns1:title>The Tao of Bar</ns1:title>
      </rdf:Description>
   </rdf:RDF>
]

Note that in the example above the following triple is inferred from
the defined relation between the MARS and DC ontologies via
rdfs:subPropertyOf and a query derived from the serialization mapping
definition for the target namespace

   ("http://foo.com/bar.html",
    "name:metia.nokia.com/MARS/2.1/title",
    "The Tao of Bar")

Thus, the target namespace(s) specified in the query select
a set of mappings, from which are derived a number of RDF
queries for the subject of interest, and all knowledge
about that subject which is retrievable based on those queries
is then included in the serialized response, according to
the mapping definitions.
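
The reverse direction can be sketched as follows, under the assumption
that the mappings are held as simple tuples. Only the created and
language triples are handled here; the rdfs:subPropertyOf inference
for the title is left out, and serialize is a hypothetical function
name.

```python
from xml.sax.saxutils import escape

MARS = "http://metia.nokia.com/MARS/2.1"
BASE = "name:metia.nokia.com/MARS/2.1/"

# (namespace, name, value or None, semantic resource), as in the rdf:Map examples.
MAPS = [
    (MARS, "created", None, BASE + "created"),
    (MARS, "language", None, BASE + "language"),
    (MARS, "language", "en", BASE + "language/en"),
]

TRIPLES = [
    ("http://foo.com/bar.html", BASE + "created", "2001-01-29"),
    ("http://foo.com/bar.html", BASE + "language", BASE + "language/en"),
]

def serialize(subject: str, target_ns: str):
    """Return property elements for the subject in the target namespace."""
    lines = []
    for s, p, o in TRIPLES:
        if s != subject:
            continue
        # resource -> name: find the element name mapped to this predicate.
        name = next((n for ns, n, v, r in MAPS
                     if ns == target_ns and v is None and r == p), None)
        if name is None:
            continue  # no syntactic form for this predicate in the target namespace
        # resource -> PCDATA: use the mapped literal if one exists, else the object as is.
        text = next((v for ns, n, v, r in MAPS
                     if ns == target_ns and n == name and v is not None and r == o), o)
        lines.append(f"<ns1:{name}>{escape(text)}</ns1:{name}>")
    return lines

for line in serialize("http://foo.com/bar.html", MARS):
    print(line)
```

Swapping in the Finnish namespace and its mappings would change only
the MAPS table, not the procedure.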

[Example 8:

If the target namespace is e.g. "http://metia.nokia.com/MARS/2.1/fi"
(the Finnish language version of the above serialization) then with
the alternate serialization mappings to/from the same semantics

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="luotu"
            rdf:about="name:metia.nokia.com/MARS/2.1/created"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="kieli"
            rdf:about="name:metia.nokia.com/MARS/2.1/language"/>
   
   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="kieli"
            rdf:value="en"
            rdf:about="name:metia.nokia.com/MARS/2.1/language/en"/>

   <rdf:Map rdf:resource="http://metia.nokia.com/MARS/2.1/fi"
            rdf:ID="nimi"
            rdf:about="name:metia.nokia.com/MARS/2.1/title"/>
   
we get the alternate serialization of the same knowledge

   <rdf:RDF xmlns:ns1="http://metia.nokia.com/MARS/2.1/fi" ...>
      <rdf:Description rdf:about="http://foo.com/bar.html">
         <ns1:luotu>2001-01-29</ns1:luotu>
         <ns1:kieli>en</ns1:kieli>
         <ns1:nimi>The Tao of Bar</ns1:nimi>
      </rdf:Description>
   </rdf:RDF>
]

Note that it doesn't matter what prefix is associated with a
namespace in an instance. So they can just be enumerated
ns1, ns2, ns3 as needed for the serialization. Any parsing
application that identifies content based on
QNames is "broken" and not conformant to the NS spec; and
the cost of that short-cut hack will quickly become apparent.
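
For instance, a namespace-aware parser such as Python's standard
xml.etree.ElementTree reports the expanded name (namespace URI plus
local name) and drops the prefix, so two instances that differ only in
prefix compare equal:

```python
import xml.etree.ElementTree as ET

doc_a = '<x:language xmlns:x="http://metia.nokia.com/MARS/2.1">en</x:language>'
doc_b = '<ns1:language xmlns:ns1="http://metia.nokia.com/MARS/2.1">en</ns1:language>'

# ElementTree exposes tags in "{namespace-uri}localname" form.
tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag

print(tag_a)
print(tag_a == tag_b)
```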


=== Making it all work auto-magically ===

(This final section is not part of the above proposal proper and
is not essential for adoption of the mapping solution as described
above -- but if combined with the above solution would result in
enormous benefit to the SW)

If RDDL instances were dereferencable from namespace URI references
and provided links to RDF instances defining rdf:Map mappings to one
or more standardized ontologies from serializations grounded in that
namespace, then any arbitrary SW agent would have the ability to
(potentially) eat any serialized input whatsoever from any namespace
because it would
be able (potentially) to dynamically acquire the knowledge necessary to
map serializations from any arbitrary namespace to (potentially) some
known set of semantic resources.

[Use Case 1:

A SW agent receives an XML instance that includes the property element
<ao:päivä xmlns:ao="http://sisu.hut.fi/termit/aikaonto.rddl">, but
it is not familiar with that namespace, so it retrieves the
bootstrapping RDDL instance from the namespace URI in the hope that it
can find out enough about that namespace to make use of the data.

Fortunately, the RDDL instance provides a URL to an RDF Schema for
that namespace, and in that schema, the agent learns that the syntactic
form <ao:päivä xmlns:ao="http://sisu.hut.fi/termit/aikaonto.rddl">
corresponds to a semantic resource "http://sisu.hut.fi/termit/ao/pv"
and that that semantic resource is a rdfs:subPropertyOf the
semantic resource "http://purl.org/dc/elements/2.1/date".

Great. Within its own knowledge base, it knows that one of the
properties in its own primary ontology 'urn:partax:foo(created)'
is a rdfs:subPropertyOf "http://purl.org/dc/elements/2.1/date".

Ahhh, now the receiving agent knows what that input content "means"
*and* it can save the new knowledge that it has learned about the
relations of the various semantic resources from the different ontologies
so that next time it encounters any of them, it knows what they mean!
]
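
The reasoning step in this use case amounts to finding a property that
both the foreign term and a local term lead up to via
rdfs:subPropertyOf. A minimal sketch, with the relation held in a
plain dictionary and hypothetical function names:

```python
from typing import Dict, Set

# child property -> set of direct superproperties, as learned by the agent.
SUB_PROPERTY_OF: Dict[str, Set[str]] = {
    "http://sisu.hut.fi/termit/ao/pv": {"http://purl.org/dc/elements/2.1/date"},
    "urn:partax:foo(created)": {"http://purl.org/dc/elements/2.1/date"},
}

def ancestors(prop: str) -> Set[str]:
    """All direct and indirect superproperties of prop."""
    found, todo = set(), [prop]
    while todo:
        current = todo.pop()
        for parent in SUB_PROPERTY_OF.get(current, ()):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found

def shared_superproperties(a: str, b: str) -> Set[str]:
    """Properties that both a and b are (or specialize)."""
    return (ancestors(a) | {a}) & (ancestors(b) | {b})

common = shared_superproperties("http://sisu.hut.fi/termit/ao/pv",
                                "urn:partax:foo(created)")
print(common)
```

What the agent does with a shared superproperty is of course up to
the agent; the sketch only shows that the link can be found
mechanically.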

The key issue here is that (a) there is a consistent representation for
names defined within namespaces regardless of any schema or
content serialization format, and (b) an agent is able to retrieve in
a consistent generic manner information about a namespace with
no prior knowledge about that namespace whatsoever.

Though one could conceive of other methods of tying a bootstrapping
instance such as a RDDL instance to a namespace, the use of a URL
pointing to that RDDL instance as the namespace instance works with
existing web mechanisms and needs no additional infrastructure to
be put to use -- and it makes namespace URI references more "logical"
as they relate to a consistent content type. Folks can still use any
arbitrary URI as a namespace identifier and not tie a bootstrapping
instance such as RDDL to it, but then agents that wish to understand
how to deal with data serialized according to that namespace will either
have to have the knowledge hard-coded or have other additional means
of obtaining that knowledge.

The SW cannot benefit the world at large unless it achieves a
critical mass of interchangeable knowledge. RDF is one step towards
that goal, but not only must the knowledge be encoded in a consistent
manner it must be *accessible* and *retrievable* in a consistent 
manner in order to meet the distributed, chaotic, scalability requirements
which the nature of the web imposes.

This architecture is like "DNS for the SW". Without such an architecture
and set of mechanisms by which any arbitrary agent can find the information
it needs about any arbitrary namespace (and hence any arbitrary ontology)
it is like using only /etc/hosts files with no DNS in that every agent
must then know what every other agent knows if it is to interact with
maximal effectiveness!

Whether or not the URI of the namespace is the URL of a RDDL instance
bootstrapping that namespace or whether there is some other (URN like)
mechanism for resolving the namespace URI to the RDDL instance is a
secondary issue. What is crucial is that such a mapping exists, either
directly or indirectly, and that agents can access the RDDL instance
as needed for any arbitrary namespace. 

This architecture serves not only semantic needs but, per the
purpose of RDDL, also syntactic needs, by also allowing the agent to
obtain the necessary serialization schemas to validate incoming information
according to the authority of the namespace. E.g. a high capacity,
high quality agent is not going to just trust any old bit of data coming
to it. It will want to go and get the serialization schema and make
sure that e.g. that date is really a valid date according to the schema
and not some bit of program code to insert a virus into the system
or merely insert invalid data of any kind, which might cause its internal
processes to fail -- as they depend on the agents to serve as "data
integrity firewalls" for incoming information.  

Such an architecture, combined with the mapping solution of this
proposal would achieve massive scalability and true generalized
interoperability for the Semantic Web.

-- 

Thththththat's all folks ;-)


I look forward to comments and discussion regarding the above proposal.

Regards,

Patrick


--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
 

Received on Sunday, 10 June 2001 15:23:19 UTC