RE: A proposed solution to the RDF syntactic/semantic mapping problem (long) from Patrick.Stickler@nokia.com on 2001-06-12 (www-rdf-interest@w3.org from June 2001)

From: <Patrick.Stickler@nokia.com>
Date: Tue, 12 Jun 2001 14:44:48 +0300
To: jborden@mediaone.net, Patrick.Stickler@nokia.com, www-rdf-interest@w3.org
Message-ID: <6D1A8E7871B9D211B3B00008C7490AA507958773@treis03nok>
> -----Original Message-----
> From: ext Jonathan Borden [mailto:jborden@mediaone.net]
> Sent: 11 June, 2001 18:36
> To: Patrick.Stickler@nokia.com; www-rdf-interest@w3.org
> Subject: Re: A proposed solution to the RDF syntactic/semantic mapping
> problem (long)
> 
> 
> Patrick:
> 
> >
> > === Claims ===
> >
> > Claim 1: A namespace and name pair does not constitute any kind of
> > universal semantic identity, only a unique syntactic form which
> > can be associated with some semantic identity.
> 
> Since I have no idea what a "universal semantic identity" is, nor do I
> know if one exists (I strongly suspect that there does not exist
> 'universal' agreement on any of these issues) this statement is
> probably true.

Yeah. Point taken. Sorry for the imperfect choice of terms. I was
trying to achieve a somewhat discipline-neutral definition.

By "universal semantic identity" I mean the "concept" for which the 
namespace + name pair serves as potentially one of many possible 
identifying "signs".

It is unlikely that we wish to restrict the set of URIs which can 
act as signs of concepts to only those URIs which can be constructed
by the concatenation of namespace URI and name -- simply because
we use XML as our serialization mechanism. Furthermore, any given
serialization model (DTD/Schema) may represent a localized or
custom syntactic representation for an agreed intersection of 
semantics shared by other differing syntactic representations.

> >
> > Although names within namespaces do serve to differentiate content
> > which is attributed meaning, and that meaning is typically (though
> > not necessarily) suggested by the linguistic properties of that name,
> > the syntactic form selected for any particular serialization is
> > local to that serialization and many syntactic forms may map to the
> > same common semantics.
> 
> Yet XML Schema, for example, uses QNames not URIs to denote types, so
> really depending on the application, either a QName or a URI may be
> the primary means of identifying some 'thing'.

I don't see how you interpret the use of QNames as not being 
the use of "namespace#name" forms, as that is the interpretation
imposed upon QNames by XML Schema. Furthermore, one can argue
that XML Schema has an implicit assumption that any QName 
referenced in a schema is in fact a URI reference into the same
or some other XML Schema instance, and *not* into any arbitrary
resource dereferencable from some combination of namespace URI
and name.
 
Furthermore, QName prefixes only have meaning within a single
instance (or within the scope of a single element if defined
for that element) and therefore cannot serve as identifiers
beyond such syntactic boundaries.
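
To make the scoping point concrete, here is a minimal Python sketch. The
document and the example.org namespace URIs are made up for illustration;
only the standard library's ElementTree parser is assumed. The same prefix
"x" is bound to two different namespaces in sibling elements, so the
string "x:item" alone identifies nothing outside its element scope.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the prefix "x" is bound to a different
# namespace URI in each of the two sibling elements.
DOC = """
<root>
  <a xmlns:x="http://example.org/ns1"><x:item/></a>
  <b xmlns:x="http://example.org/ns2"><x:item/></b>
</root>
"""

def expanded_names(xml_text):
    """Return the expanded '{namespace}local' tags of all 'item' elements."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if el.tag.endswith("}item")]

names = expanded_names(DOC)
for name in names:
    print(name)
```

The parser resolves each "x:item" against the binding in scope, so the two
elements come out with different expanded names even though the QName text
is identical.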

> > The syntactic form provides a mechanism
> > by which we may define a mapping to that universal meaning, but it
> > does not serve itself as the universal identifier of that meaning.
> 
> Again, not sure how a URI universally identifies a "meaning". It
> identifies a resource but isn't it the point of an ontology/schema etc
> to define a "meaning"?

Sorry that this argument isn't so clear. I'll try again...

There are vocabularies and then there are vocabularies. The set of
vocabularies which can be encoded using arbitrary URI references
is a superset of the vocabularies that can be encoded using
namespace plus name pairs -- *if* those are to be concatenated into
a single URI (reference); because there are URI schemes which do
not lend themselves to direct concatenation, nor is direct
concatenation guaranteed to produce a valid URI according to the
URI scheme syntax or possible MIME content type fragment syntax.

Thus, a large consortium of persons/organizations may wish to use a 
URI scheme (e.g. a URN scheme) that is not compatible with namespace
plus name concatenation in order to define a common vocabulary (ontology)
of abstract concepts (semantics) to serve as a point of intersection
between a disparate set of serialization vocabularies, for the purpose
of knowledge interchange and interoperability.

Thus any given namespace plus name pair in any given serialization does not 
constitute the common meaning that that syntactic form serves to 
represent, but must be mapped to that common "sign" associated with
the abstract concept.

> >
> > Claim 2: A name within a given namespace does not equate to a URI
> > reference of that name within any content dereferencable from the
> > namespace URI reference.
> >
> > I.e. "namespace" + "name" != "namespace#name".
> 
> I suppose it depends on what you expect the "name" to reference.
> 
> I consider this a bug not a feature.

Eh? A fragment in a URI reference is specific to the MIME content
type of the data that is accessible from the URI. That means that
any ontology defined using signs which are URI references constructed
by the combination of namespace URI and name with intervening #
are bound to the syntax of a given MIME content type. 

Furthermore, just how do you handle clearly broken URI refs such as
the following:

"http://foo.com/bar.html#boo" + "bas" -> "http://foo.com/bar.html#boo#bas"

Eh?

I again assert: "namespace" + "name" != "namespace#name"
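
The broken reference above can be reproduced with a few lines of Python.
This is only a sketch: naive_join is a hypothetical helper standing in for
the presumed "namespace#name" rule, and urlsplit from the standard library
is used merely to show how a generic URI parser carves up the result.

```python
from urllib.parse import urlsplit

def naive_join(namespace, name):
    """Hypothetical helper: combine namespace and name with an intervening '#'."""
    return namespace + "#" + name

# The namespace URI reference here already carries a fragment.
combined = naive_join("http://foo.com/bar.html#boo", "bas")
parts = urlsplit(combined)

print(combined)        # the combined string contains two '#' characters
print(parts.path)      # the resource part seen by the parser
print(parts.fragment)  # everything after the first '#', including the second one
```

The parser splits at the first '#', so the "fragment" it reports still
contains a '#', which RFC 2396 does not allow inside a fragment. The
boundary between the original namespace and name is no longer where the
naive rule assumed it would be.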
 
> ... Furthermore, as a
> > given namespace may have serializations defined in various schema
> > formalisms, each potentially having different MIME content types
> > with potentially different fragment schemes, yet all defining
> > the same namespace URI and name, there is then potentially a many to
> > one mapping from namespace and name pair to URI reference into each
> > of those schema instances.
> 
> This is a mess.

And the mess arises because most folks equate URI with URL and URL with
HTTP URL. Sincerely wanting and needing namespace URIs to dereference
to something recognizable and concrete, they assumed that
"namespace" + "name" == "namespace#name" and that "namespace" is a URL
and *not* a URL reference.

And to make RDF work, they added the "{URL}#" hack, suffixing a '#' to
the namespace so that concatenation would create (presumably, but
unreliably) a URL reference that might be dereferencable.

Yes. The real situation is a mess -- but only because the presumed
automatic mapping of namespace and name to some combined URI does
not in fact work for arbitrary namespace URI references and arbitrary
URI scheme and MIME content type fragment syntaxes.

We just need to add the explicit mapping mechanism that *does* work.

> >
> > Claim 3: We cannot use concatenation, suffixation, insertion or
> > any other method of combining a name with a namespace URI reference
> > to obtain a compound URI reference without violating the sanctity of
> > either the URI scheme and/or some MIME content type fragment syntax
> > space.
> 
> What sanctity? We need to define practical and interoperable ways of
> dealing with QNames and URIs. The _goal_ is to create systems that
> work, not to maintain URIs and RFC 2396 on a pedestal, even when that
> pedestal is sitting right in the middle of the Santa Monica freeway --
> or I-93.

As I said, if RDF just wants unique strings and doesn't demand
valid URIs, OK, no problem with "invalid" URIs.

*BUT* that means that RDF must provide some *other* means to
ensure unique strings! *AND* no RDF/SW application can presume
that those strings are anything but opaque, and should not
expect them to be dereferencable or meaningful to any web
application or protocol that knows about certain URI schemes.

If you can get the RDF spec changed thus, more power to ya ;-)

I don't think abandonment of URIs for RDF resource identity
would be a good thing (I actually think it would be catastrophic).

I also don't think that maintaining URIs in RDF is blind dogmatism
either.

And the freeway/interstate isn't built yet, so let's not tear
down the pedestal if we can build the road around it, eh?

> > Claim 4: The current methodology employed by RDF to attempt to create
> > a semantic resource identity by direct concatenation of namespace
> > and name does not ensure the preservation of the uniqueness of
> > namespace qualified names.
> 
> agreed.
> 
> >
> > This example, along with the discussion in claim 2 about unclear
> > re-partitioning of combined URI references, demonstrates the fact that
> > the uniqueness of a namespace and name pair has three elements:
> > (1) the unique namespace,
> > (2) the unique name within that namespace,
> > and
> > (3) a distinct boundary between the two.
> 
> agreed.
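
The loss of the boundary can be shown with a small Python sketch. The
two pairs below are made-up illustrations (they are not the example from
the original proposal, which is snipped here), and rdf_concat is a
hypothetical name for plain string concatenation.

```python
def rdf_concat(namespace, name):
    """Direct concatenation: no boundary between namespace and name is kept."""
    return namespace + name

# Two distinct (namespace, name) pairs, chosen for illustration only.
pair_a = ("http://example.org/ab", "c")
pair_b = ("http://example.org/a", "bc")

uri_a = rdf_concat(*pair_a)
uri_b = rdf_concat(*pair_b)

print(pair_a != pair_b)  # the pairs differ
print(uri_a == uri_b)    # yet the concatenated strings are identical
```

Elements (1) and (2) are intact in both pairs; it is element (3), the
distinct boundary, that concatenation throws away.
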
> 
> >
> > Step 2: Provide for explicit mapping between syntactic forms and
> > semantic resources. I.e. for mapping rdf:ID values to rdf:about values.
> >
> > This is achieved by the following two methods:
> >
> > Mapping method 1: RDF
> 
> Why not "daml:equivalentTo" or "rdfs:isDefinedBy"?

Firstly, the syntactic to semantic mapping (i.e. serialization
to triples) is IMO the domain of RDF, not RDF Schema or DAML
and therefore should be fundamental to the RDF spec and the
solution embodied in every compliant RDF parser.

Secondly, we must define a mapping from two distinct (possibly
three, given literal to resource mapping) syntactic components
to a single semantic resource: 

1. namespace
2. name
3. PCDATA

Since the whole problem is that there *isn't* yet a single
resource identifying the "sign" comprised of the above three
components, just how do you use daml:equivalentTo or 
rdfs:isDefinedBy?!

One needs a construct such as the proposed rdf:Map element
that binds the multiple syntactic components to a single
resource identity. Until that is done, RDF Schema and
DAML (or any other valid RDF ontology) are useless. Eh?

RDF Schema (with the exception of convenience overlap with
the proposed rdf:Map construct) and DAML are firmly and
completely within the domain of triples -- not serializations.

If you don't know the URI reference of the resource in question,
you can't say things about it with RDF Schema, DAML or any other
ontology.

My proposal addresses the mapping of complex, multi-component
serialized syntactic forms (signs) for concepts to single,
monolithic (and likely standardized) forms (signs) for those
same concepts within an RDF knowledge base of triples.

This should happen well before RDF Schema and DAML come into play.

> Isn't that the role of an ontology?

An ontology is a vocabulary and (optionally) relations between
members of that vocabulary, right?

Ontologies within the realm of RDF Schema, DAML, etc. require
single, monolithic, valid URI references acting as the members
of the vocabulary of that ontology, right?

But the logical, compound construct (namespace(name)) or 
(namespace(name(PCDATA))) as provided by XML serializations
utilizing XML Namespaces are *not* single, monolithic, valid
URI references.

Therefore, they *cannot* serve as members of any vocabulary
for any ontology that would be suitable for RDF, RDF Schema,
DAML, etc. etc.

What is needed is an explicit, consistent, standardized mechanism
for mapping serialized vocabulary constructs to URI references
for the resources they represent.
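
As a rough illustration of what such a mechanism has to do, and not of
the rdf:Map syntax itself, here is a minimal Python sketch. Everything in
it is an assumption made for the example: the MAPPING table, the resolve
function, the example.org namespace and the urn:example: URIs are all
hypothetical.

```python
# Hypothetical mapping table:
# (namespace, name, PCDATA or None) -> single URI for the resource.
MAPPING = {
    ("http://example.org/ns", "lang", "en"): "urn:example:language:english",
    ("http://example.org/ns", "title", None): "urn:example:property:title",
}

def resolve(namespace, name, pcdata=None):
    """Map a compound syntactic form to a single resource URI.

    A literal-specific entry wins; otherwise fall back to the entry for
    the (namespace, name) pair alone. Returns None when nothing is mapped.
    """
    key = (namespace, name, pcdata)
    if key in MAPPING:
        return MAPPING[key]
    return MAPPING.get((namespace, name, None))

print(resolve("http://example.org/ns", "lang", "en"))
print(resolve("http://example.org/ns", "title", "Some title text"))
print(resolve("http://example.org/ns", "unknown"))
```

The point of the sketch is only that the lookup key keeps the components
separate, so the target URI can come from any scheme and never has to be
derived by gluing strings together.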

Whether or not that mechanism resembles what I proposed, or is
something totally different, it presently does not exist and
absolutely must exist, and soon.
 
> >
> > Mapping method 2: RDF Schema
> 
> [snip]
> 
> I agree that this mapping is needed. I would prefer to see such a
> mechanism within RDFS/DAML.

But, as I argue above, the mapping from serialized forms to
triples is the exclusive domain of RDF and by addressing it
within a higher layer, we further confuse the issue, and
also complicate those higher layers unnecessarily.

> [snip]
> >
> > === Regular expression constraints on syntactic literals ===
> 
> this seems an extension of RDF aboutEachPrefix... an interesting idea
> and I can certainly see how it might be useful but given the problems
> that aboutEachPrefix has had in gaining traction, it would be hard
> getting this accepted.

Yes. I think you may be right on that point. 

The specification of a PCDATA literal in a mapping is clearly
necessary as part of the syntax/semantics interface; but regular
expressions are either simply a syntactic convenience to avoid
having to enumerate multiple literal mappings or constraints
similar to aboutEachPrefix such as for the purposes of data
checking.

This is something that needs more thought/discussion....

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Tuesday, 12 June 2001 07:45:08 UTC