Re: XML Schema WG comments on RDF documents from pat hayes on 2003-10-06 (www-rdf-comments@w3.org from October to December 2003)

From: pat hayes <phayes@ihmc.us>
Date: Mon, 6 Oct 2003 13:24:18 -0500
To: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Cc: www-rdf-comments@w3.org
Message-Id: <p06001f1ebba74c8592cb@[10.0.100.25]>
>>  xmlsch-01 1.1. Design question, complexity (substantive)
>>  ++++++++++++++++++++++++++++++++++++++++++++++
>>  you said:
>>  [[
>>  1.1. Design question, complexity (substantive)
>>   The introduction of pairs consisting of a lexical form and a type (or,
>>  strictly speaking, a lexical form and a type label) seems at first glance to
>>  complicate the RDF model somewhat. We have had the impression that in other
>>  parts of RDF, typing is handled by adding further arcs and nodes. If the type
>>  of a resource is identified by having an arc labeled rdf:type from it to (the
>>  URI of) its (RDF) type, and if the type of an arc is similarly identified by
>>  an arc, then surely a reason ought to be given for shifting to a different
>>  method for typing literal strings. It seems like a dramatic shift in the
>>  infrastructure of RDF, from "everything is a node, an arc, or a literal
>>  value" to "everything is a node, an arc, or a typed literal value". Perhaps
>>  not quite so dramatic, after all. But the question of design consistency
>>  remains: why not "everything is a typed node, a typed arc, or a typed
>>  literal"?
>>  ]]
>>
>>
>>  Our resolution is:
>>  xmlsch-01 as in 0252 with amendment.
>>  i.e.
>>  [[
>>  The RDF Core WG interprets this comment as two questions and a comment:
>>
>>     1)  Why is the type of a literal not described using a property arc, as
>>  is done for other literals?
>>
>>     2)  Having introduced typed literal nodes, why not introduce typed
>>  resource nodes and typed property arcs as well?
>>
>>     3)  The WG should provide a rationale for this design in the
>>  specifications
>>
>>  Regarding question 1:
>>
>>  This would require that literals be allowed as subjects of RDF
>>  statements.  This is not possible in current RDF/XML and would require
>>  considerable change, beyond the scope of the WG, to support it.  Further
>>  it introduces problems of non-monotonicity in the semantics.  A property
>>  whose value is a plain literal is currently taken to denote a sequence of
>>  characters.  Adding a further statement could change that value to, say, an
>>  integer, invalidating previous inferences and breaking a fundamental tenet
>>  of RDF.
>
>On question 1: Thank you; that helps clarify the design.
>
>>  Regarding question 2:
>>
>>  No requirement justified a change to the notion of a URIREF node or an RDF
>>  arc.
>
>On question 2: In the final analysis this is your call and we don't
>plan to lie down in the road over it. For the record, though, we
>should record that we find your analysis unconvincing.  The
>introduction of typed literals introduces a new idea into RDF, and it
>is obvious that this new idea has possible applications elsewhere in
>the design space. Your response amounts to saying that you chose not
>to work through the design implications of introducing this kind of
>type labeling, because it seemed possible to get by without such
>re-thinking.  The result is that the new idea will continue to feel
>incompletely integrated into RDF; it will feel like a patch added as
>an afterthought rather than an integral part of the design.

Allow me to offer some more exposition which may help clarify this 
issue. I should first say that I am speaking here with my own voice, 
not that of the WG.

I am not sure what you mean by typed literals being a 'new idea'. It 
sounds from your response above as though you see the new idea here 
being the syntactic association of a type (ie a datatype) with a 
literal string, and that you therefore see a natural generalization 
of this associating-of-a-type as a kind of generic syntactic option. 
I do not see the situation in this way, and find the proposed 
generalization unmotivated and close to incoherent. (Our inability to 
understand the motivation for this proposal may be one reason for the 
brevity of our response.)

Although our response did not go into this in detail, it may be 
appropriate to point out that the original comment seems to embody a 
conceptual error, by conflating 'type' as in rdf:type with 'type' as 
in datatype. These are quite distinct ideas: the similarity in fact 
is little better than a pun.  The property rdf:type indicates 
membership in a class, or application of a property. It is what is 
conventionally called 'member' or, in formal set theory, written 
using an infix epsilon, or, in conventional logical notations, 
written as the application of the rdf:type property value (the class 
or predicate) to the subject (the individual in the class). In other 
words, it has to do with membership in a class. Datatyping, in 
contrast, has to do with ways of interpreting lexical forms. These 
are different topics.  RDF blurs this distinction to some extent by 
allowing a datatype name to be used to refer to the value class, so 
that a well-formed typed literal denotes a value which is in - bears 
the rdf:type property to - that type considered as a class. This is 
purely a convention in RDF, however, and in fact itself has been the 
subject of controversy since the 'primary' interpretation of any 
datatype name has to be the lexical-to-value mapping rather than the 
value class.

The role of datatyping seems to be to provide for alternative ways of 
interpreting lexical items which have conventional interpretations in 
widespread use, such as numerals used to indicate numbers, calendar 
conventions used to indicate days and times, and so on. In 
conventional (pre-Web) formal languages these are often thought of as 
fixed, so that numerals always denote numbers using decimal 
conventions, strings are always indicated by enclosing quotations, 
etc. On the Web we need both to allow for a wider range of 
alternatives and to give a specific indication of the 
lexical-to-value mapping intended: hence the utility of the XSD 
structure.  Thus, the combination of a lexical string and a datatype 
can be seen as a kind of 'fixed name' which is required to denote its 
conventional meaning in any interpretation, and this is moreover a 
meaning which can be determined by any processor which has access to 
the datatyping conventions indicated by the type name, which embody 
the conventions being used. The syntactic association of a literal 
string (a lexical form) and a datatype name seems like a natural way 
to encode the use of the convention named by the latter to interpret 
the lexical form which is the former: the Ntriples ^^ convention can 
be read as ", understood as a", e.g.
"123"^^xsd:number
means '123', understood as a number, in contrast with, say, '123', 
understood as a character string. Note that none of this has 
anything in particular to do with class membership.
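
The ", understood as a" reading of ^^ can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the class TypedLiteral, the table LEXICAL_TO_VALUE, and the use of xsd:integer (standing in for the 'number' datatype in the example above) are hypothetical names chosen here, and Python's int and str are rough stand-ins for the real XSD lexical-to-value mappings.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical registry: datatype name -> lexical-to-value mapping.
# int and str only approximate the XSD mappings; they are assumptions
# made for this sketch, not the XSD definitions.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], object]] = {
    "xsd:integer": int,
    "xsd:string": str,
}


@dataclass(frozen=True)
class TypedLiteral:
    """A lexical form paired with the name of a datatype."""
    lexical_form: str
    datatype: str

    def value(self) -> object:
        # "lexical_form, understood as a datatype": apply the mapping
        # named by the datatype to the lexical form.
        return LEXICAL_TO_VALUE[self.datatype](self.lexical_form)


as_number = TypedLiteral("123", "xsd:integer")
as_string = TypedLiteral("123", "xsd:string")
```

The same lexical form yields different values under the two datatypes, and nothing in the sketch refers to class membership: the datatype is used only to interpret the string.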

But all this applies only to those parts of RDF (or indeed any other 
such formalism) which are intended to be understood 'conventionally', 
ie relative to a widely used convention (such as dates and numerals). 
Most names are not conventional in this way: in a programming 
language, typically, identifiers are not; in RDF, URIrefs are not. 
They are simply general-purpose denoting expressions, which follow no 
particular lexical conventions and for which no generic rules can be 
given which relate their lexical form to their intended 
interpretations. Thus, to associate a lexical-to-value mapping with a 
URIref would be meaningless and would provide no useful information 
about the referent, or functionality to a reasoner: to say 
"<ex:aaa>, understood as a number" is otiose: if <ex:aaa> is a number 
then it is vacuous, and if not, meaningless: either way, the datatype 
contributes nothing towards the interpretation of the URIref. For 
this reason, the generalization you suggest to typed nodes and arcs 
seems unmotivated and semantically meaningless; contrary to your 
claim above, it does not have 'obvious' possible applications 
elsewhere in the design space.
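
The contrast between the two kinds of name can be sketched as follows. This is a toy illustration, not the RDF model theory itself: the function names, the two sample interpretations, and the use of Python's int and str as datatype mappings are all assumptions made for the example.

```python
from typing import Dict, Tuple, Union

# A term is either a URIref (a plain string here) or a typed literal,
# represented as a (lexical form, datatype name) pair. Both encodings
# are assumptions of this sketch.
Term = Union[str, Tuple[str, str]]


def literal_value(lexical_form: str, datatype: str) -> object:
    # Hypothetical lexical-to-value mappings; int and str stand in for
    # the real XSD definitions.
    mappings = {"xsd:integer": int, "xsd:string": str}
    return mappings[datatype](lexical_form)


def denotation(term: Term, interpretation: Dict[str, object]) -> object:
    if isinstance(term, tuple):
        # Typed literal: the value is fixed by the datatype alone and
        # ignores the interpretation entirely.
        lexical_form, datatype = term
        return literal_value(lexical_form, datatype)
    # URIref: the referent is whatever this interpretation assigns;
    # nothing in the spelling of the name constrains it.
    return interpretation[term]


# Two interpretations that disagree about what ex:aaa denotes.
interp_a = {"ex:aaa": 7}
interp_b = {"ex:aaa": "some non-numeric resource"}
```

A typed literal denotes the same value under both interpretations, whereas ex:aaa denotes whatever each interpretation says; there is no lexical-to-value step for a datatype to contribute to.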

Seen in this light, therefore, the 'new' idea to which you refer is 
only new in the sense that it provides a syntactic association 
between a datatype and a lexical form. Any way of using datatypes in 
RDF must somehow provide for such an association - that is not new - 
and, as we discovered during a very long and arduous process of 
exploration, a direct syntactic association is one of the very few 
such techniques which does not break either the underlying semantic 
model of RDF or else the underlying graph syntax conventions. We did 
not therefore see this as a particularly new idea, but rather as a 
workable solution to a pressing, but old, problem.

You claim that typed literals are a 'patch', incompletely integrated 
into the RDF design. I disagree; if anything, plain literals with 
language tags are a patch, not properly integrated but required for 
legacy reasons.  In fact, with hindsight, I think it would be fair to 
say that in a sense *all* RDF literals can be viewed as typed: we 
retained the 'plain' style for essentially historical reasons (and to 
satisfy the i18n requirements for a syntactic placeholder for XML 
language tags) but in fact, both semantically and in the central 
syntactic model, those could be considered to be typed with a 
'trivial type' (which is in fact extremely similar to xsd:string, 
though not exactly the same).

You suppose in your answer that we "chose not to work through" the 
design implications. I rather resent this supposition, and reject the 
implication of laziness. If the design implications you refer to are 
the possibility of datatyping URI references, it would be more 
accurate to say that we thought about them and decided that there 
were none. If you are referring to the use of explicit properties to 
describe datatypes, without introducing the 'new' typed-literal 
syntax, then rest assured that we considered many design options in 
detail, as could be discovered from a perusal of the WG email archive.

If I have missed your point entirely and you (or y'all) feel that 
there is some other obvious opportunity we have missed here, I would 
welcome a more detailed correction; particularly if it could be in 
some way related to the RDF model theory.

Pat Hayes

-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 6 October 2003 14:24:48 UTC