Re: DECIDED: untidy semantics from Patrick Stickler on 2002-09-23 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Mon, 23 Sep 2002 13:19:55 +0300
To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, "Jan Grant" <Jan.Grant@bristol.ac.uk>, "Jos De_Roo" <jos.deroo.jd@belgium.agfa.com>
Cc: "w3c-rdfcore-wg" <w3c-rdfcore-wg@w3.org>
Message-ID: <006201c262ea$c34d0830$d74416ac@NOE.Nokia.com>

[Patrick Stickler, Nokia/Finland, (+358 50) 483 9453, patrick.stickler@nokia.com]

> Or alternatively by use of tidy syntax and tidy semantics at the abstract
> level and a simple transform to add the bnode as we read the RDF/XML (from
> my ...)
> 
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0369.html
> [[[
> Match:
>   ?x  ?y ?z
>   where ?y != rdf:value and
>         ?z a literal node
> 
>   replace with
>   ?x ?y NewNode
>   NewNode rdf:value ?z
> 
>   where NewNode is a newly minted bNode.
> 
> For example:
> 
> <a> <foo> "ss" .
> 
> is transformed to
> 
> <a> <foo> _:b.
> _:b <rdf:value> "ss".
> ]]]
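The quoted rewrite rule can be sketched in Python as a minimal, hypothetical illustration. It assumes that triples are plain (subject, predicate, object) string tuples, that literals can be recognised by their surrounding double quotes, and that `rdf:value` is written as a prefixed name. The function name `split_literals` and the `_:b1`, `_:b2`, ... numbering are made up for the example.

```python
from itertools import count

RDF_VALUE = "rdf:value"  # assumption: prefixed-name spelling of rdf:value

def is_literal(term: str) -> bool:
    # Assumption: literals are written as double-quoted strings, as in "ss".
    return len(term) >= 2 and term.startswith('"') and term.endswith('"')

def split_literals(triples, prefix="_:b"):
    """Rewrite (?x ?y "lit") as (?x ?y NewNode), (NewNode rdf:value "lit")
    whenever ?y is not rdf:value; every match gets a freshly minted bNode."""
    counter = count(1)
    out = []
    for s, p, o in triples:
        if p != RDF_VALUE and is_literal(o):
            node = f"{prefix}{next(counter)}"  # newly minted bNode label
            out.append((s, p, node))
            out.append((node, RDF_VALUE, o))
        else:
            out.append((s, p, o))  # triple does not match the rule
    return out

print(split_literals([("<a>", "<foo>", '"ss"')]))
# → [('<a>', '<foo>', '_:b1'), ('_:b1', 'rdf:value', '"ss"')]
```

Because a new bNode is minted per match, two occurrences of `"ss"` end up hanging off two distinct nodes, which is what makes the semantics untidy.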

Essentially, this is what the implicitly typed literal approach
is doing, in that

   <a> <foo> <xyz:blargh>"ss"

is analogous to

   <a> <foo> _:c .
   _:c rdf:type xyz:blargh .
   _:c rdf:value "ss" .

and for an implicitly typed (inline) literal, the

   <a> <foo> _:b .
   _:b rdf:value "ss" .

is just compressed to

   <a> <foo> _:b"ss" .

to capture the untidy semantics in a single node rather
than an additional triple.
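A rough sketch of that compression, under stated assumptions: `LiteralNode` and `compress` are hypothetical names, the object of an `rdf:value` triple is stored as a bare lexical string, and `rdf:type`/`rdf:value` are written as prefixed names. An explicitly typed literal is labelled by its datatype, an inline one by the bNode that keeps it untidy.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LiteralNode:
    """Hypothetical single-node form of a literal.

    datatype: URIref of an explicit datatype (gives <xyz:blargh>"ss"), or None.
    node_id:  bNode label of an inline literal with an implicit type (_:b"ss").
    """
    lexical: str
    datatype: Optional[str] = None
    node_id: Optional[str] = None

    def label(self) -> str:
        # Explicitly typed literals are prefixed by their datatype URIref,
        # inline literals by the bNode label that keeps them untidy.
        prefix = f"<{self.datatype}>" if self.datatype else self.node_id
        return f'{prefix}"{self.lexical}"'

def compress(triples):
    """Fold `_:n rdf:value lex` (plus an optional `_:n rdf:type T`) into one
    LiteralNode in the object position of the triple that points at _:n."""
    values, types = {}, {}
    for s, p, o in triples:
        if p == "rdf:value":
            values[s] = o
        elif p == "rdf:type":
            types[s] = o
    out = []
    for s, p, o in triples:
        if s in values and p in ("rdf:value", "rdf:type"):
            continue  # these idiom triples are absorbed into the node
        if o in values:
            dt = types.get(o)
            o = LiteralNode(values[o], datatype=dt, node_id=None if dt else o)
        out.append((s, p, o))
    return out

inline = compress([("<a>", "<foo>", "_:b"), ("_:b", "rdf:value", "ss")])
print(inline[0][2].label())  # → _:b"ss"
```

The same function handles both the explicit and the implicit case, which is the consistency being argued for.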

Since one of the key motivations of explicitly typed literal nodes 
was the compression of datatyping idioms into single nodes, I
think it makes sense to do the same for implicitly typed
literals.

After all, with untidy semantics, all literals are typed. It is just
that, for some (inline literals), the datatype is implicit in the
context in which the literal occurs. So why not use a form of
representation that is consistent for all typed literals, whether the
typing is explicit or implicit?

This also means that APIs/software that define node equality based on
label equality, or that merge string-equal labeled nodes together, need
not change at all. Once the parser gives each occurrence of an inline
literal a unique label, the semantic untidiness is "protected" from
legacy presumptions about label-equal inline literals by the now-unique
node labels, and functions will behave as expected (though some earlier
entailments may no longer hold, since this clarification of the
untidiness of inline literal semantics is essentially saying that they
were not valid entailments in the first place).
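A minimal sketch of why label-based code keeps working, assuming a hypothetical parser step `label_inline_literals` that prefixes each occurrence with a fresh systemID of the made-up form `_:g1`, `_:g2`, .... A Python `set` stands in for legacy code that merges string-equal labels.

```python
from itertools import count

def label_inline_literals(lexical_forms, prefix="_:g"):
    """Give every occurrence of an inline literal its own systemID, so that
    two occurrences of "ss" no longer share a label."""
    ids = count(1)
    return [f'{prefix}{next(ids)}"{lex}"' for lex in lexical_forms]

# Legacy behaviour: string-equal labels collapse into one node.
legacy_nodes = {'"ss"', '"ss"'}
# With per-occurrence labels, the same merge-by-label logic keeps them apart.
untidy_nodes = set(label_inline_literals(["ss", "ss"]))
print(len(legacy_nodes), len(untidy_nodes))  # → 1 2
```

Nothing in the merge logic changed; only the labels handed to it did.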

The interpretation of literal nodes, then, is the same whether the typing
is implicit or explicit: for each literal to compare, (a) determine the
datatype, if it is implicit, and (b) if the datatype is supported,
determine the value; then, if both values are obtained, compare them.

If the datatype either cannot be determined or is not supported, then
the comparison cannot be made. That is not the same as inequality, of
course; in such a case, neither equality nor inequality can be
determined.
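The procedure in (a) and (b) amounts to a three-valued comparison. Here is a sketch under stated assumptions: `DATATYPES` is a hypothetical registry of supported datatypes (the `xsd:` keys are illustrative prefixed names), step (a) is assumed to have been done by the caller, passing `None` when the datatype could not be determined, and `None` as a result means "cannot tell", not "different".

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: datatype -> function mapping a lexical form to a value.
DATATYPES: Dict[str, Callable[[str], object]] = {
    "xsd:integer": int,
    "xsd:string": str,
}

def literal_value(lexical: str, datatype: Optional[str]) -> Optional[object]:
    """Step (b): return the value, or None if the datatype is undetermined,
    unsupported, or the lexical form is invalid for it."""
    if datatype is None or datatype not in DATATYPES:
        return None
    try:
        return DATATYPES[datatype](lexical)
    except ValueError:
        return None

def compare(a, b) -> Optional[bool]:
    """a and b are (lexical, datatype) pairs. Returns True or False when both
    values are obtained, otherwise None (equality cannot be determined)."""
    va, vb = literal_value(*a), literal_value(*b)
    if va is None or vb is None:
        return None  # not the same as inequality
    return va == vb

print(compare(("10", "xsd:integer"), ("010", "xsd:integer")))  # → True
print(compare(("10", "foo:bar"), ("10", "foo:bar")))          # → None
```

Returning `None` rather than `False` keeps the "cannot be determined" case from leaking into an inequality conclusion.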

As I've pointed out before, there is a strong parallel between the
syntactic and semantic interpretation of datatyped literals and
case-variant URIrefs. http://foo.com/bar and HTTP://FOO.COM/bar are
syntactically distinct, and semantically distinct to the core RDF MT,
yet applications with specific knowledge about the http: URI scheme can
infer that the two URIrefs denote the same resource and are in fact
semantically equivalent. Likewise, _:a"xx" and _:b"xx", or <foo:bar>"xx"
and <foo:bar>"xxxx" (or even <foo:bar>"xx" and <foo:blargh>"xx"), are
syntactically distinct, and semantically distinct to the core RDF MT,
yet applications with specific knowledge about the datatypes might infer
that they denote the same value and are in fact semantically equivalent.
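The http: half of that parallel can be sketched with Python's standard `urllib.parse`. This is an illustration of scheme-specific knowledge only: `normalize_http` is a hypothetical helper, it only lower-cases the scheme and host (leaving the case-significant path alone), and it assumes there is no userinfo in the authority, since that part would wrongly be lower-cased too.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_http(uri: str) -> str:
    """Lower-case the scheme and host of an http: URI; other URIs are
    returned unchanged. Assumes no userinfo in the authority."""
    parts = urlsplit(uri)  # urlsplit already lower-cases the scheme
    if parts.scheme != "http":
        return uri
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))

a, b = "http://foo.com/bar", "HTTP://FOO.COM/bar"
print(a == b)                                  # → False (syntactically distinct)
print(normalize_http(a) == normalize_http(b))  # → True (same resource)
```

Core RDF only ever sees the first comparison; the second needs knowledge of the URI scheme, just as datatype-aware value comparison needs knowledge of the datatype.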

In both cases, presuming unique systemID node labels for inline literals,
the string-equality of node labels or intersection of nodes in 
the abstract graph syntax does reflect equality of denotation, but the 
string-inequality of node labels does not necessarily reflect inequality
of denotation. Thus, insofar as syntax-based comparisons are concerned,
all that can be determined is equality, but never inequality. And that
is true regardless of the node label: URIref, systemID, datatype+literal,
or systemID+literal.
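A minimal sketch of what a purely syntactic comparison can honestly report, assuming node labels are plain strings of any of the four kinds listed. The function name `syntactically_same` is hypothetical. Equal labels give `True`. Unequal labels give `None` ("unknown"), never `False`.

```python
from typing import Optional

def syntactically_same(label_a: str, label_b: str) -> Optional[bool]:
    """Compare node labels (URIref, systemID, datatype+literal or
    systemID+literal) by string equality alone. Equal labels imply equal
    denotation; unequal labels leave the question open."""
    return True if label_a == label_b else None

print(syntactically_same('<foo:bar>"xx"', '<foo:bar>"xx"'))  # → True
print(syntactically_same('_:a"xx"', '_:b"xx"'))              # → None
```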

APIs which suggest, or applications which presume, that syntactic inequality
means semantic inequality need to be fixed accordingly.

Patrick

Received on Monday, 23 September 2002 06:20:04 UTC