Re: datatypes message - draft 2

On 2002-06-26 20:38, "ext Brian McBride" <bwm@hplb.hpl.hp.com> wrote:


In the introduction, it would be good, I think, to add a comment
that the tidy/untidy issue has little to do with implementational
efficiency of triples stores, but is a conceptual distinction
relating to the graph semantics and that there are numerous ways
for actual implementations to compress untidyness in the conceptual
graph in an actual triples store, etc.

This issue seems to always crop up when you mention untidyness, and
thus the key issue can get lost in the subsequent noise.

>  _:a     <xsdr:decimal> "10" .

Again, if we could please use 'xsd:'.

'xsdr:' does not help the clarity of the inquiry. Folks
will wonder what the relationship between xsdr:integer
and xsd:integer is, and we'll waste time trying to clarify
the issue.

We're trying to make a determination about the views of the
community on tidyness/untidyness, not whether 'xsd:' can
be used for datatype properties. That is a separate issue
(and one I think has already been decided upon).

Otherwise, the intro looks very good.

> Test A:
> 
>  <Jenny> <ageInYears> "10" .
>  <John>  <ageInYears> "10" .
> 
> Should an RDF processor conclude that the value of the ageInYears
> properties for Jenny and John are the same?

This test should be removed because it is true no matter whether
we have tidy or untidy literals. It does not help clarify the
tidy/untidy issue and thus serves no useful purpose.

> Test B:
> 
>  <Jenny> <ageInYears> "10" .
>  <Jenny> <testScore>  "10" .
> 
> Should an RDF processor conclude that the value of Jenny's ageInYears
> property is the same as the value of Jenny's testScore property?
> 
> Note that this question only relates to the situation where there are no
> range constraints.  Given compatible range constraints on the properties,
> there is no difficulty concluding that the answer is yes.

I still feel that this test does not help clearly and easily decide
between tidy/untidy. There is no technical distinction whatsoever
between Test B and C. Test C does everything that test B does, and
does so in a way that intuitions will see the distinction between
tidy and untidy semantics far more clearly.

Folks that don't fully grok the effect of global typing will intuitively
consider both of the above to denote the integer 10, and therefore
will not be able to make any determination for tidy vs. untidy
semantics.

And then, the decision between B and D could be influenced by the
additional syntactic size of D, with folks choosing B because it
appears to be "simpler" not because they have fully groked the
tidy/untidy distinction.

I recommend that this test case be removed.

> Test C:
> 
>  <Jenny> <ageInYears>   "10" .
>  <Film>  <title>        "10" .
> 
> Should an RDF processor conclude that the value of Jenny's age property is
> the same as the value of the Film's title property?  If the value the
> <ageInYears> property is an integer, and the value of the <title> property
> is a string, they are not the same thing and are thus not equal.
>
> Again this question only relates to the situation where there are no range
> constraints on the properties.

OK to here.

> Given the appropriate range constraints on
> the properties, the answer is clearly no.

Not quite. Who's to say that I can't define the range of ageInYears as
xsd:string? I may consider such a range constraint "appropriate".

This should probably read "Given incompatable range constraints on the
properties, the answer is clearly no."
 
> Test D:
> 
>  <Jenny>      <ageInYears> "10" .
>  <ageInYears> rdfs:range xsd:decimal .
> 
>  <John>  <ageInYears>   _:a .
>  _:a     xsdr:decimal   "10" .
> 
> Should an RDF processor conclude that Jenny and John have the same
> age?  [Note: in this example the range constraint is expressed using
> rdfs:range.  We may have to introduce a special datatyping range property,
> but that is an independent detail for now.]

Good.

> It is not possible to have the answers to Test B, Test C and Test D all be
> yes.  B and C must also have the same answer.  Either B and C can be yes or
> D can be yes.  We have to decide which of these is the most important to
> have; (B and C) or D.

Folks will likely wonder here "What about A? Where does that fit in?"

I think this inquiry will be alot easier for folks to answer if there
is a simple binary choice between C and D, which together sufficiently
capture the issue of tidy vs. untidy semantics.


> WHY THESE TEST CASES MATTER
> ===========================
> 
> The formal semantics can define the meaning of a literal in one of two ways:
> 
>  tidy) the <ageInYears> property takes a value which is a numeral, i.e. a
> string
> 
>  untidy) the <ageInYears> property takes a value which is some datatype
> value whose string  representation is "10", but without further
> information, such as
> a range constraint, we can't tell exactly what the value is, e.g. the
> string might be in octal.

Should we consider using more neutral terms than tidy and untidy, such
as 'constant' and 'contextual'?

'Untidy' has a rather negative connotation.

> If we choose the tidy option, the object of the statement is always a
> string, which means that in:
> 
>  <Jenny> <ageInYears> "10" .
>  <Film>  <title>      "10" .
> 
> the values of the two properties are the same; they are both the STRING "10".
> 
> If we choose the untidy option, the value of the object object of the
> statement is unknown from this statement alone; a range constraint is
> required to determine the value from the literal string:
> 
>  <jenny>      <ageInYears> "10" .
>  <ageInYears> <rdfs:range> <xsd:decimal> .
> 
> With a range constraint, we can know that the object of the property is the
> integer 10.
> 
> CONCLUSION
> ==========
> 
> To end then, please send a message to www-rdf-comments@w3.org (by 12 July
> 2002) indicating whether you believe its more important to have the answer
> to test case B be yes, or test case D be yes:
> 
>  Test B:
> 
>  <Jenny> <ageInYears> "10" .
>  <Jenny> <testScore>  "10" .
> 
> Test D:
> 
>  <Jenny>      <ageInYears> "10" .
>  <ageInYears> <rdfs:range> <xsd:decimal> .
> 
>  <John>  <ageInYears>      _:a .
>  _:a     <xsdr:decimal>   "10" .
> 
> 
> We would also like to know the reasons for this preference.
> 
> Brian McBride
> on behalf of the RDFCore WG

Good (though again, B should be C)

Patrick 

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Thursday, 27 June 2002 02:36:01 UTC