RE: Datatyping Summary from Dan Connolly on 2002-01-29 (w3c-rdfcore-wg@w3.org from January 2002)

From: Dan Connolly <connolly@w3.org>
Date: 29 Jan 2002 09:24:41 -0600
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: Graham Klyne <Graham.Klyne@MIMEsweeper.com>, Brian McBride <bwm@hplb.hpl.hp.com>, RDF core WG <w3c-rdfcore-wg@w3.org>
Message-Id: <1012317882.3709.212.camel@dirk>
On Tue, 2002-01-29 at 06:24, Jeremy Carroll wrote:
> 
> This message addresses the main criticisms of TDL.

Hmm... perhaps it amplifies it ;-)

> I will follow up with more detail concerning query, Brian's B3 & B4.
> 
> The proponents of S furnish us with an implementation of S, and a model
> theory for S (which includes, naturally self-entailment).
> 
> I now can create an implementation of TDL in the following fashion.
> 
> As I read in any RDF graph I apply the following syntactic transformation.
> 
> Match:
>   ?x  ?y ?z
>   where ?y != rdf:value and
>         ?z a literal node
> 
>   replace with
>   ?x ?y NewNode
>   NewNode rdf:value ?z
> 
>   where NewNode is a newly minted bNode.

Yes, exactly; that's an implementation technique we (in SWAD)
considered.
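
A minimal sketch of the quoted transformation, in Python, may make the
shape of the rewrite concrete. It assumes triples are plain tuples and
that literals are marked with a hypothetical Literal class; the _:genN
labels are likewise just an illustrative naming scheme, not anything
specified by TDL or S.

```python
from itertools import count

RDF_VALUE = "rdf:value"


class Literal(str):
    """Hypothetical marker type: a string that stands for an RDF literal."""


def tdl_transform(triples):
    """Rewrite every (s, p, literal) with p != rdf:value into
    (s, p, bnode) plus (bnode, rdf:value, literal)."""
    fresh = count(1)  # source of new bNode labels
    out = []
    for s, p, o in triples:
        # Note: this is a purely syntactic comparison of the predicate term.
        if isinstance(o, Literal) and p != RDF_VALUE:
            node = f"_:gen{next(fresh)}"
            out.append((s, p, node))
            out.append((node, RDF_VALUE, o))
        else:
            out.append((s, p, o))
    return out


graph = [("<a>", "<foo>", Literal("ss"))]
result = tdl_transform(graph)
```

Applying the function a second time leaves the graph unchanged, since the
only remaining literal sits behind rdf:value.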

For one thing, it seems ugly. But that's a matter of taste,
not one I'd expect folks to find compelling.

But for another, it looks like a HUGE change to RDF 1.0!
The whole point of S is that it's *not* a change to RDF 1.0;
it's a set of properties and classes layered on top
of (what I think is) the simplest and most common
reading of RDF 1.0.

If I write
	<http://www.w3.org/> dc:title "W3C".
you propose to hold me accountable for something that I
don't think I said, namely:
	<http://www.w3.org/> dc:title _:x.
	_:x rdf:value "W3C".

At a minimum, we would need to use some new name... something
other than rdf:value. Folks are already using rdf:value to
mean something, and we can't conclude that they mean the above.
(after all, RDF M&S 1.0 specifies that rdf:value is to
be used to model n-ary relations. go figure.)

Another thing: when you say ?y != rdf:value do you mean
the syntactic term isn't rdf:value? or do you mean
that the property isn't the same property as what rdf:value denotes?
If you mean the latter, then it's a theorem-proving
exercise to decide. If you mean the former, then rdf:value
becomes special syntax; i.e. folks can't expect
	my:value rdfs:subPropertyOf rdf:value.
	my:thing my:attribute [ my:value "abc"].
to pass the ?y != rdf:value test.
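
The two readings of the test can be contrasted in a small Python sketch.
It is only an illustration: the "semantic" check below follows nothing
but rdfs:subPropertyOf links, which is an assumed and much-simplified
stand-in for deciding whether two terms denote the same property. The
function names are hypothetical.

```python
RDF_VALUE = "rdf:value"
SUBPROP = "rdfs:subPropertyOf"


def is_value_syntactic(p):
    """Reading 1: the predicate term is literally rdf:value."""
    return p == RDF_VALUE


def is_value_semantic(p, triples):
    """Reading 2 (simplified): rdf:value is reachable from p by
    following rdfs:subPropertyOf statements in the graph."""
    seen, todo = set(), [p]
    while todo:
        cur = todo.pop()
        if cur == RDF_VALUE:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        todo.extend(o for s, pr, o in triples if s == cur and pr == SUBPROP)
    return False


graph = [
    ("my:value", SUBPROP, RDF_VALUE),
    ("my:thing", "my:attribute", "_:b"),
    ("_:b", "my:value", "abc"),
]
```

Under the syntactic reading my:value is not recognised as rdf:value, so
the "abc" literal would be wrapped again; under the semantic reading it
is recognised, but only after some inference over the graph.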

Another thing...

> For example:
> 
> <a> <foo> "ss" .
> 
> is transformed to
> 
> <a> <foo> _:b.
> _:b <rdf:value> "ss".

What does the bit under "transformed to" mean? i.e.
how are we to read
	_:b <rdf:value> "ss".
?

Is it turtles all the way down? ;-)
Seriously: with S, the basic building-blocks for
documents are strings and URIs. That seems pretty fundamental.
Without the ability to know, at a glance, that "abc" = "abc"
and "abc" != "xyz", getting off the ground becomes really
painful.

You write "As I read in any RDF graph..."; perhaps you
mean to change the relationship between RDF/xml and n-triples,
i.e. re-do a bunch of test results?

Or perhaps you're just proving that TDL is implementable,
and we shouldn't take the proof construction method too
seriously?

> We then use the S implementation and S model theory (idiom S-P is the only
> idiom used).
> 
> Hence:
>   If S is implementable then so is TDL

Yes, I'm compelled to accept that...

>   The maximum overhead required for TDL is the same as that for S idiom A
> and/or S idiom P.

All the S idioms are zero-cost to the RDF 1.0 parser implementor.
Idioms A and P cost the document-writers a bit, but they don't
impact parser writers at all.

Perhaps I don't know what you mean by an implementation.
Is it different from an RDF 1.0 parser implementation?
If so, would you please give me a test that a datatypes-capable
implementation is expected to pass that an RDF 1.0
parser isn't required to pass? Maybe something like this?

	ex:age rdfs:range dt:decimal.
	ex:somebody ex:age "abc".

... where a datatypes implementation would be expected to complain
because that entails
	"abc" rdf:type dt:decimal.
but we know that "abc" isn't a decimal literal. Something like that?
Note that it's a theorem-proving exercise to find all such
contradictions. Not that theorem-proving is hard with RDF/RDFS,
but still, it's not dead-simple. And with WebOnt layered on
top, I expect it will be quite onerous.
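
A rough Python sketch of the simplest form of such a test follows. It
checks only directly stated rdfs:range constraints, with no inference,
so it would miss contradictions that take several steps to derive. The
regular expression is an assumed approximation of the lexical form of
dt:decimal, the quoting convention for literals is a choice made for
this example, and the ex:other triple is added purely for illustration.

```python
import re

# Assumed lexical form for dt:decimal: optional sign, digits, optional fraction.
DECIMAL_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def range_violations(triples):
    """Return triples whose literal object does not fit a directly
    declared dt:decimal range. Literals are written with double quotes."""
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    bad = []
    for s, p, o in triples:
        if ranges.get(p) == "dt:decimal" and o.startswith('"'):
            if not DECIMAL_RE.fullmatch(o.strip('"')):
                bad.append((s, p, o))
    return bad


graph = [
    ("ex:age", "rdfs:range", "dt:decimal"),
    ("ex:somebody", "ex:age", '"abc"'),
    ("ex:other", "ex:age", '"10"'),  # hypothetical well-typed triple
]
violations = range_violations(graph)
```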


> All problems to do with entailment, query, implication, etc. are clarified
> and addressed with this process (as long as they are clear and addressed
> with S).

Hmm... self-entailment seems to be handled, but not query.
Consider Sergey's example:

  _:f <rdf:type> <film> .
  _:f <dc:Title> "10" .
  <mary> <age> "10" .

By the transformation, that becomes

  _:f <rdf:type> <film> .
  _:f <dc:Title> _:gen1. _:gen1 rdf:value "10" .
  <mary> <age> _:gen2. _:gen2 rdf:value "10" .

It's no longer valid to conclude that

  (?x <dc:Title> ?y) & (?z <age> ?y)

is satisfiable.
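
The effect on the query can be sketched in Python with a naive matcher
over the two graphs. This is an illustration under stated assumptions:
shared_object is a hypothetical helper that does plain term matching,
roughly what a simple query engine would do, and it says nothing about
what a model theory might additionally entail.

```python
def shared_object(triples, p1, p2):
    """Return every ?y such that (?x p1 ?y) and (?z p2 ?y) both match."""
    ys1 = {o for s, p, o in triples if p == p1}
    ys2 = {o for s, p, o in triples if p == p2}
    return ys1 & ys2


# The example graph as written.
before = [
    ("_:f", "rdf:type", "<film>"),
    ("_:f", "<dc:Title>", '"10"'),
    ("<mary>", "<age>", '"10"'),
]

# The same graph after the syntactic transformation.
after = [
    ("_:f", "rdf:type", "<film>"),
    ("_:f", "<dc:Title>", "_:gen1"),
    ("_:gen1", "rdf:value", '"10"'),
    ("<mary>", "<age>", "_:gen2"),
    ("_:gen2", "rdf:value", '"10"'),
]

match_before = shared_object(before, "<dc:Title>", "<age>")
match_after = shared_object(after, "<dc:Title>", "<age>")
```

Before the transformation the two patterns share the literal "10"; after
it, the objects are the distinct bNodes _:gen1 and _:gen2, so the matcher
finds nothing in common.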


> From an implementators point of view, it is clearly easier to implement the
> syntactic transformation and S-P, than to implement S-A, S-B and S-P.

That is not clear to me at all.

As I say: as an implementor (of a parser and rules engine) I
get all the S idioms for free. The only cost is to the document
authors.

> Graham, does this adequately address your concern about self-entailment?

It does address the self-entailment issue, I suppose.

But the query issue seems just as important.

To me, it comes down to this: In the RDF community, do folks
expect that "abc" always denotes the same thing as "abc"?
I looked at the Jena source, and it seems to.
The squish, rql, rdfdb and other query languages seem to.

That's why I objected to the DAML design; it undermines
a popular assumption in the RDF community. (not to
mention that I find it ugly that we can't use
strings and URIs as the basic building blocks
for knowledge exchange).


> [Small technical detail:
> 
> S-P uses a closed world assumption on data types, whereas TDL uses an open
> world assumption. The two can be made equivalent by using S-P with at least
> two incompatible types in its closed world  both having domain being the
> complete set of unicode strings. Two such types are:
> 
> xsd:string = { < x, x > | for any unicode string x }
> appendA    = { < x, x."A" > | for any unicode string, . being string
> concatenation }
> 
> ]

I don't understand that. I don't see any closed-world assumptions in S.

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Tuesday, 29 January 2002 10:25:42 UTC