RE: Answer to the question: What is a "value" to RDF from Patrick.Stickler@nokia.com on 2001-11-21 (w3c-rdfcore-wg@w3.org from November 2001)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 21 Nov 2001 09:08:53 +0200
To: phayes@ai.uwf.edu
Cc: w3c-rdfcore-wg@w3.org
Message-ID: <2BF0AD29BC31FE46B78877321144043114C0BE@trebe003.NOE.Nokia.com>
> What the S and DC (and URV) proposals do keep simple is 
> the idea that RDF graphs can be tidy on literal nodes as well as on 
> uriref nodes, which would indeed allow the RDF graph syntax to be 
> stated more concisely since there would be (as Dan C. has noted) no 
> need to bother with the distinction between nodes and labels.

Then I point you to my latest recommendation:

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Nov/0579.html

which, in a nutshell, proposes that a combination of the 
P (not P++), U, and DC (with slightly different vocabulary) 
proposals be adopted as equivalent representations for asserting 
the same pairing of data type with lexical form.

If folks need maximal graph compression such that all typed
literals participate in tidying operations, then use URVs.

The S proposal, however interesting and whatever the apparent
positive qualities, just raises too many questions and issues
and may very well break more things than it fixes.

There is also the risk (a big one IMO) that such a radical
change at this point in time will undermine much of the recent
positive perception and adoption of RDF -- if suddenly folks
have to start thinking differently about how they are doing
data typing -- and have to go and convert all their data
to use e.g. properties rather than anon node idioms or range
constraints (both of which are precluded by the S proposal).

> >such that the S treatment is preferred because it is (supposedly)
> >easier to define in the MT but it does not reflect common usage
> >or present definition of the RDF graph model or intuitions about
> >the purpose and semantics of terms such as rdfs:range,
> >rdfs:subClassOf, or rdfs:subPropertyOf.
> 
> I think it respects all of this except maybe current usage. 

Which is a *huge* issue, no?

> The S 
> proposal doesn't require us to revise ...
> the meanings of any of the rdfs vocabulary.

Sure it does. It says that I can't use rdfs:range to assign type
to the value of a "data type property". Thus the definition of
rdfs:range has to be modified to state for which "types" of properties
it should not or cannot be used.

Earlier, I offered an example like:

   x ex:age "10" .
   ex:age rdfs:subPropertyOf xsd:integer .
   ex:age rdfs:range xsd:integer .

and was told it was "wrong" to use rdfs:range because xsd:integer, according
to the S proposal, is a property, not a type! 

Yet current usage such as the following

   x ex:age [ rdf:value "10" ; rdf:type xsd:integer ] .

or

   x ex:age "10" .
   ex:age rdfs:range xsd:integer .

declares that xsd:integer is a type class and thus a legitimate
value for an rdfs:range constraint -- and if the S proposal
precludes that, then the S proposal is unacceptable on that
point alone.
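
As a rough illustration of the range-based usage above, the following
Python sketch treats triples as plain tuples and reads the data type of
a literal off the rdfs:range of its property. The tuple layout and the
function names are assumptions made for this example only; they are not
part of any RDF specification or API.

```python
# Triples as (subject, predicate, object) tuples; an illustrative layout only.
triples = [
    ("x", "ex:age", "10"),
    ("ex:age", "rdfs:range", "xsd:integer"),
]

def datatype_by_range(triples, predicate):
    """Return the rdfs:range declared for a predicate, or None if absent."""
    for s, p, o in triples:
        if s == predicate and p == "rdfs:range":
            return o
    return None

def typed_objects(triples):
    """Pair each object with the data type given by its property's range."""
    result = []
    for s, p, o in triples:
        if p == "rdfs:range":
            continue  # schema statement, not a data statement
        dtype = datatype_by_range(triples, p)
        if dtype is not None:
            result.append((s, p, o, dtype))
    return result

print(typed_objects(triples))
```

Here xsd:integer is used as a class that constrains the value of ex:age,
which is exactly the usage that would no longer be allowed if
xsd:integer were a property.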

> >And the present idiom based on anonymous nodes is IMO much
> >clearer and accomplishes the same purpose but does so per
> >the present RDF "tradition" without mucking up type and
> >property distinctions:
> >
> >    xxx --rdf:type---> foo:date .
> >    xxx --rdf:value--> "2001-11-29" .
> >
> >Thus, the anonymous node (bNode) denotes the value, and it
> >has properties for type and lexical form, and thus acts
> >as the identity in the graph for that pairing.
> 
> That is the DC idiom. 

Essentially, yes, though with slightly different vocabulary.
The DC idiom as proposed uses rdf:label rather than rdf:value.

> But that has some severe problems of its own, 
> as we have already noted in earlier discussions, and I was under the 
> impression that the S proposal was generally considered superior.

Perhaps superior in some respects, yes, but I don't consider
it a superior solution taking into account issues such as common
perception and usage, or risk of unknown impact or conflict with
current RDF mechanisms -- in which case it is one of the least 
suitable proposals on the table IMO.

(not that my opinion has much value)

> The chief problem with the use of rdf:value to link values to lexical 
> forms is that the link between type and form is too weak, 

Actually, I see the DC form as providing the stronger link, because
even in the context of "careless binding" of values by inference on
subPropertyOf relations, all the type information is carried along
(as is the case for URVs). One can really view URVs as a URI packaging
of the DC idiom.

> and if the 
> same value is specified in two ways, then the link can be completely 
> lost, eg if
> 
> xxx --rdf:type --> xsd:binary
> xxx --rdf:value --> "111"
> 
> and
> 
> xxx --rdf:type --> xsd:integer
> xxx --rdf:value --> "7"
> 
> Notice BTW that if we use rdfs:subClassOf on datatypes then
> xxx rdf:type xsd:integer .
> will be entailed by
> xxx rdf:type xsd:binary .

Well, actually, there is no such type as xsd:binary, but
for the sake of example, let's pretend there is and that
it is a subClassOf xsd:integer and has a lexical space
based on binary notation of integer values (that's what 
you meant, right?)

Thus above, we have two different TDLs (Typed Data Literals; see
link to my last proposal for the full definition). In a nutshell,
a TDL is a pairing of lexical form (literal) with data type (URI),
which denotes a single value in the value space of the data type.

So we have the TDLs ("111",xsd:binary) and ("7",xsd:integer) and 
each anon node that has these pairs of properties denotes *some* 
value in the respective value space defined for the TDL. 

In the first case, the value denoted by the lexical form "111" is in 
the xsd:binary value space. 

In the second case, the value denoted by the lexical form "7" is in 
the xsd:integer value space. 

The xsd:binary value space is a *separate* space from the xsd:integer
space. Right?

The relation of xsd:binary subClassOf xsd:integer states that all
members of the xsd:binary value space (not lexical space) are also
members of the xsd:integer value space -- thus, we can infer from
that relation that the two anon nodes denote the same value.

*BUT* the two anon nodes do *not* constitute the same TDL! Nor
do they denote the *same* value, insofar as the explicitly
declared knowledge is concerned.

Just as one may infer, by means of a daml:equivalentTo relation,
that two resources are the same, so too may one infer, by means
of a subClassOf relation, that two TDLs denote the same value. But
that doesn't mean that the nodes should be merged. Eh?
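
The distinction between "same value" and "same TDL" can be sketched in
a few lines of Python. This is only a sketch under stated assumptions:
the TDL class, the LEXICAL_TO_VALUE table, and the pretend xsd:binary
mapping (base-2 notation) are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical lexical-to-value mappings. "xsd:binary" is the pretend
# type from this thread, assumed to use base-2 notation for integers.
LEXICAL_TO_VALUE = {
    "xsd:integer": lambda lex: int(lex, 10),
    "xsd:binary": lambda lex: int(lex, 2),
}

@dataclass(frozen=True)
class TDL:
    """A pairing of lexical form (literal) with data type (URI)."""
    lexical_form: str
    datatype: str

    def value(self):
        # The mapping is executed in the context of the TDL's own type.
        return LEXICAL_TO_VALUE[self.datatype](self.lexical_form)

a = TDL("111", "xsd:binary")
b = TDL("7", "xsd:integer")

same_value = a.value() == b.value()  # compares denoted values
same_tdl = a == b                    # compares the pairings themselves

print(same_value, same_tdl)
```

The two pairings map to equal values yet compare unequal as pairings,
which is why equality of value is not by itself a reason to merge the
two nodes.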

> The fact that xsd:bin
> and the upward-incompatibility problems that you raised concerning 
> the P proposals would apply here in just the same way. 

I never said that there were upward-incompatibility problems with
the P(++) proposals, per se, only that the subClassOf relation between
data types only applies to value spaces and not lexical spaces.

> That objection 
> applies to *any* datatyping proposal that uses class reasoning on the 
> value spaces of datatypes; 

Exactly. That same issue applies to *ALL* of the proposals! Including S.

> the S proposal escapes it precisely by 
> treating datatypes as properties rather than as classes.

How does that allow S to escape it?!

If xsd:binary is a subPropertyOf xsd:integer, then that
doesn't mean that any value of the xsd:binary property is
a valid value for the xsd:integer property.

   xx xsd:binary "111" .

does not mean that

   xx xsd:integer "111" .

is valid because xsd:binary subPropertyOf xsd:integer! No?

The same issues of inference binding of literal values to
superordinate properties with incompatible lexical spaces
apply just as much to the S proposal as to all the other
proposals -- except the DC or U proposals! Because with the DC
and U proposals, the value being bound to the property is
either an anon node, "carrying along" with it the needed type
information, or a URI in which the type information is
encapsulated.

So in this regard, the S proposal is just as vulnerable as 
the P proposals, and only U and DC are "safe".
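
To make the difference concrete, here is a small Python sketch of
subPropertyOf inference applied first to a bare literal and then to an
anon node that carries its own type. Everything in it is an assumption
for illustration: the lift() helper, the dictionary used for the anon
node, the properties ex:size and ex:measure, and the base-2 reading of
the pretend xsd:binary type.

```python
# Pretend subPropertyOf relations; "ex:size" and "ex:measure" are made-up names.
SUB_PROPERTY_OF = {
    "xsd:binary": "xsd:integer",
    "ex:size": "ex:measure",
}

# Hypothetical lexical-to-value mappings for the two types in the thread.
PARSERS = {
    "xsd:integer": lambda lex: int(lex, 10),
    "xsd:binary": lambda lex: int(lex, 2),
}

def lift(triple):
    """Rebind the object of a triple to the superordinate property."""
    s, p, o = triple
    return (s, SUB_PROPERTY_OF[p], o)

# Property-as-type style: the object is a bare literal, so the only type
# information is the property, and lifting replaces it.
literal_triple = ("xx", "xsd:binary", "111")
lifted_literal = lift(literal_triple)
value_before = PARSERS[literal_triple[1]](literal_triple[2])
value_after = PARSERS[lifted_literal[1]](lifted_literal[2])

# DC style: the object is an anon node that carries type and lexical form.
node = {"rdf:type": "xsd:binary", "rdf:value": "111"}
lifted_node = lift(("xx", "ex:size", node))

def node_value(n):
    """Interpret the lexical form using the type stored on the node itself."""
    return PARSERS[n["rdf:type"]](n["rdf:value"])

print(value_before, value_after, node_value(lifted_node[2]))
```

The bare literal changes meaning once it is bound to the superordinate
property, while the anon node is interpreted the same way before and
after the inference.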

> I can't see any simple way around this problem, by the way. If 
> datatypes are classes and if we expect to be able to use normal class 
> reasoning on them - which includes the use of rdf:type - then normal, 
> valid, RDFS class reasoning is liable to produce wrong datatype 
> answers, in general. 

Not if the reasoning is about value spaces only, and it is accepted
that even if a given value is deemed to be a member of a particular
value space, its lexical representation (literal) may not be a member
of the lexical space for that data type.

> This is a very general and robust problem, and 
> there is no simple way to wriggle past it. 

Then perhaps it should be deferred to a "future working group"
rather than making radical changes such as the S proposal in
the hopes that maybe it *might* be better overall than the present 
common usage as reflected by use of rdfs:range and idioms such
as DC.

> The only ways I can see to 
> get past all involve somehow isolating datatype reasoning from class 
> reasoning, either by removing it completely (the URV and S 
> proposals); or maybe by providing a special subproperty of 
> rdfs:subClassOf  to be used on datatypes , something like 
> rdfs:subDatatypeOf, with its own special semantic conditions; or 
> maybe by declaring that RDF is only guaranteed to give correct 
> answers when used on 'upward compatible' datatyping schemes (ie those 
> for which
> 
> aaa rdf:type rdfs:datatypeClass .
> bbb rdf:type rdfs:datatypeClass .
> aaa rdfs:subClassOf bbb .
> 
> together entail
> 
> aaa rdfs:subDatatypeOf bbb . )

Would it not simply be sufficient to state that subClassOf
relates values, not lexical forms? Thus, any member of
the value space of type 'aaa' is also a member of the value
space of 'bbb' -- though the lexical spaces for these two
types may have no intersection whatsoever, and the 
"execution" of the mapping from lexical form to value
must occur within the context of the specific data type
to which a literal is bound.

This allows one to reason about the relations between
types and equivalences of values without requiring
that there be perfect subsetting of lexical spaces in
an upward compatible manner.

Eh?

Cheers,

Patrick
Received on Wednesday, 21 November 2001 02:10:51 UTC