Re: datatypes message - draft 2

>On Fri, 2002-06-28 at 00:36, pat hayes wrote:
>>  >On Thu, 2002-06-27 at 01:40, Patrick Stickler wrote:
>>  >>
>>  >>
>>  >>  On 2002-06-26 20:38, "ext Brian McBride" <bwm@hplb.hpl.hp.com> wrote:
>>  >>
>>  >>
>>  >>  In the introduction, it would be good, I think, to add a comment
>>  >>  that the tidy/untidy issue has little to do with implementational
>>  >>  efficiency of triples stores,
>>  >
>>  >not true; see comments from Sergey a long time ago about
>>  >the cost of untidy literals in RDF/database systems.
>>  >
>>  >Sorry, I don't have time to find it.
>>
>>  I recall. But Sergey was being alarmist about a worst-possible case
>>  where every triple contained a literal
>
>ok, density of literals is perhaps not that high, though
>they're quite common, but...
>
>>  and every literal might or
>>  might not be the same as any other.
>
>... that's the definition of untidy literals, no?
>
>You can't tell whether two literals denote the same thing,
>so you have to keep them separate until you know more.
>
>>  That will never happen.
>
>
>>  >The swap/cwm implementation would suffer a significant
>>  >efficiency hit if literals weren't tidy.
>>
>>  I'm not convinced of this. Sure, the code would get a bit more
>>  complicated, but I bet that any efficiency cost could be kept
>>  marginal, way below a linear factor in practice. If the graphs are
>>  encoded in the extended Ntriples convention then there should be no
>>  extra cost at all: literal nodes act just like bnodes, uniquely
>>  specified by their nodeIDs.
>
>???
>
>That looks like tidy literals, to me; i.e. if literals
>work like bnodes, then just as
>
>	:Mary :age _:x.
>	:movie :title _:x.
>
>entails
>
>	:Mary :age _:y.
>	:movie :title _:y.
>
>we would have
>
>	:Mary :age "10".
>	:movie :title "10".
>
>entails
>
>	:Mary :age _:y.
>	:movie :title _:y.

Well, what I meant was that

:Mary :age _:y "10" .
:movie :title _:y "10" .

would entail that (just rub out the literal label), but

:Mary :age _:x "10" .
:movie :title _:y "10" .

would not. Without the extra tags, the Ntriples are ambiguous: you 
have to know the graph structure. My suggestion is that the 
convention we adopt is that 'plain' literals in Ntriples, as in your 
examples, are unique to the triple in which they occur.
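A minimal Python sketch of the convention proposed above, under assumptions of my own: triples are already-tokenized tuples, a literal written `_:y "10"` becomes a node keyed on its nodeID, a plain literal gets a freshly minted node (so it is unique to its triple), and entailment means simple entailment, checked by brute-force search for a blank-node mapping. The helpers `literal_node`, `is_bnode` and `entails` are hypothetical names, not part of any RDF toolkit.

```python
from itertools import count, product

_fresh = count()  # counter used to mint node IDs for plain literals


def literal_node(label, node_id=None):
    """Return a graph node for a literal.

    With a nodeID (the `_:y "10"` form) the node is keyed on that ID, so two
    occurrences sharing an ID are one node. Without one (a 'plain' literal)
    a fresh node is minted, so the literal is unique to its triple.
    """
    if node_id is None:
        node_id = ("plain", next(_fresh))
    return ("lit", node_id, label)


def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")


def entails(graph, pattern):
    """Simple entailment by brute force: True if some mapping of the
    pattern's blank nodes to graph nodes turns every pattern triple
    into a triple of the graph."""
    bnodes = sorted({t for triple in pattern for t in triple if is_bnode(t)})
    nodes = {t for (s, _, o) in graph for t in (s, o)}
    for choice in product(nodes, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, choice))
        if all(tuple(mapping.get(t, t) for t in triple) in graph
               for triple in pattern):
            return True
    return False


pattern = {(":Mary", ":age", "_:y"), (":movie", ":title", "_:y")}

# :Mary :age _:y "10" .  :movie :title _:y "10" .   (shared nodeID)
shared = {(":Mary", ":age", literal_node("10", "y")),
          (":movie", ":title", literal_node("10", "y"))}

# :Mary :age _:x "10" .  :movie :title _:y "10" .   (distinct nodeIDs)
distinct = {(":Mary", ":age", literal_node("10", "x")),
            (":movie", ":title", literal_node("10", "y"))}

# :Mary :age "10" .  :movie :title "10" .   (plain literals, proposed reading)
plain = {(":Mary", ":age", literal_node("10")),
         (":movie", ":title", literal_node("10"))}

# Tidy reading: a literal is identified by its string alone.
tidy = {(":Mary", ":age", "10"), (":movie", ":title", "10")}

print(entails(shared, pattern))    # True
print(entails(distinct, pattern))  # False
print(entails(plain, pattern))     # False
print(entails(tidy, pattern))      # True
```

Only the shared-nodeID graph and the tidy reading entail the bnode pattern; under the proposed convention, plain literals behave like the distinct-nodeID case.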

>If literals aren't tidy, then we have to keep the two
>occurrences of "10" from matching each other.
>
>We end up with nothing in our language that works
>like integer numerals:
>	x=y => s(x) = s(y)
>and
>	0 <> s(x)
>i.e. a (countably) infinite supply of expressions where
>      same expression => same denotation,
>different expression => different denotation

Well, the strings themselves are unique and can be compared, 
differences detected, and so on. The issue is the extent to which we 
can assume that unique expressions denote distinct things. The 
central problem, it seems to me, is that we don't have the freedom 
(which any rational person who knows arithmetic has) to interpret 
"10" as denoting ten. Having to be sensitive to datatyping kind of 
ties our hands here: indeed, we *can't* assume that.
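A small Python illustration of why datatyping blocks the numeral property. The two "datatypes" are illustrative stand-ins chosen for this sketch (a decimal-integer reading and a plain-string reading), not any particular RDF datatype proposal; `DATATYPES` and `denotation` are hypothetical names. It shows both directions failing once a literal's value depends on its datatype: the same lexical form can denote different values, and different lexical forms can denote the same value.

```python
# Hypothetical datatype table: each entry maps a lexical form to a value.
DATATYPES = {
    "integer": int,  # decimal integer reading: "10" -> ten
    "string": str,   # string reading: "10" -> the character string "10"
}


def denotation(lexical, datatype):
    """Value of a literal under the given (hypothetical) datatype."""
    return DATATYPES[datatype](lexical)


# Same expression, different denotation, depending on the datatype:
print(denotation("10", "integer") == denotation("10", "string"))    # False

# Different expressions, same denotation, under a single datatype:
print(denotation("10", "integer") == denotation("010", "integer"))  # True
```

Comparing the strings alone still tells a processor whether two literals are the same string, but not whether they denote the same value.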

>>  If they are encoded in RDF/XML then the
>>  only cases which arise will be those that we introduce deliberately
>>  to handle some XML idioms, which can be recognized during
>>  round-tripping one triple at a time by looking at the property
>>  urirefs. That is just a few extra operations per triple, at worst.
>
>I don't follow that part.

What I meant was cases like the ones I suggested in my long 
message for handling XML lang tags, e.g.

_:x "10" rdf:xmlLang "FR" .
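A rough Python sketch of the "one triple at a time" claim, assuming triples arrive already tokenized as `(subject, property, object)` tuples and that a literal node `_:x "10"` is represented as the pair `("_:x", "10")`; the tuple layout and the function name `attach_lang_tags` are hypothetical. The only extra work per triple is one comparison of the property URIref against `rdf:xmlLang`.

```python
XML_LANG = "rdf:xmlLang"  # property URIref as written in the example above


def attach_lang_tags(triples):
    """Single pass over the triples.

    Triples whose property is rdf:xmlLang are consumed and recorded as a
    language tag on their subject (a literal node); every other triple is
    passed through unchanged. Returns (other_triples, lang_tags).
    """
    lang_tags = {}
    others = []
    for subj, prop, obj in triples:
        if prop == XML_LANG:  # the one extra check per triple
            lang_tags[subj] = obj
        else:
            others.append((subj, prop, obj))
    return others, lang_tags


x10 = ("_:x", "10")  # the literal node _:x "10"
triples = [
    (":movie", ":title", x10),
    (x10, XML_LANG, "FR"),
]
others, tags = attach_lang_tags(triples)
print(others)  # [(':movie', ':title', ('_:x', '10'))]
print(tags)    # {('_:x', '10'): 'FR'}
```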

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

Received on Friday, 28 June 2002 10:46:59 UTC