Re: Summary of strings, markup, and language tagging in RDF (resend) from pat hayes on 2003-07-03 (w3c-rdfcore-wg@w3.org from July 2003)

From: pat hayes <phayes@ihmc.us>
Date: Thu, 3 Jul 2003 13:43:06 -0500
To: Martin Duerst <duerst@w3.org>
Cc: <w3c-rdfcore-wg@w3.org>
Message-Id: <p0600123bbb2a1f1337b8@[10.0.100.7]>
>>  > [Pat:]4. Regarding Martin's other beef, that some XML without 
>>any markup in
>>  > it is 'really' just plain text,
>>
>>[Patrick:]I'm not 100% sure that this is in fact Martin's position, 
>>but if it is, or
>>is
>>anyone else's position,
>
>{Martin:]It indeed is.
>
>>then my reply is that this is simply wrong.
>>
>>The difference between a plain literal and an XML literal, regardless
>>of the presence of markup, is that an RDF application is free to
>>presume that an XML literal constitutes well-formed XML, whereas
>>a plain literal need not. Period. It's as simple as that.
>
>Wrong. You are confusing some particular representations with
>the abstraction behind it. A plain literal can always be represented
>as a well-formed XML fragment. In RDF/XML, it actually is represented
>that way. In an RDF store, you may choose a different representation,
>but that's an implementation detail, which should not affect the
>design.

Ah, this may reveal the source of some of the heat in this 
discussion. IF one takes the RDF/XML to be the 'real' representation, 
and the RDF graph to be simply an implementation - call this view X - 
then certain identities appear obvious. However, if you take the RDF 
graph to be the 'real' representation and the RDF/XML to be simply an 
implementation (an interchange syntax for the graphs) - call this 
view G - then different identities seem obvious. The entire RDF 
design has for some time now been based on view G rather than view X, 
and on that view the design seems much more intuitive since the XML 
text inside XML literals is just a chunk of text being treated as 
XML; on this view, there is virtually no connection between the XML 
fragment inside a paseType="Literal" (which is just a hunk of text 
being plonked into a literal node in the RDF graph and labelled 'XML 
stuff') and the XML outside it (which is the XML encoding of the RDF 
graph, used only as an interchange syntax). I can  see, and empathize 
with, Patrick's point of view here: on view G, there really is no 
connection between them, and the XML inside the paseType="Literal" in 
a sense isn't even really a part of the surrounding XML document at 
all, and shouldn't inherit such things as lang tags by XML inclusion 
rules.  On the other hand, there is no doubt that an RDF/XML document 
sure *looks* like XML, and I also can see Martin's point that it is 
particularly weird from a position X point of view to have an XML 
document in which the pieces of it which are explicitly labelled as 
being RDF-XML are the only pieces that apparently violate normal XML 
rules. The point of my suggestion was not to deny view G (looking at 
Patrick here) but only to suggest that there might be a low-cost way 
to arrange that the view-X vision of RDF/XML was somewhat less 
peculiar.

.....

>
>>Furthermore, the comparison of XML literals is not
>>based on string-equality. These important distinctions from plain
>>strings are captured semantically in the definition of the datatype
>>rdf:XMLLiteral and in the RDF datatyping model.
>
>Is there any case where
>    <foo:prop>Some text here</foo:prop>
>and
>    <foo:prop rdf:parseType="Literal">Some text here</foo:prop>
>actually behave differently with respect to equality?

Well, yes, of course. For example suppose some app is manipulating 
XML documents by taking them apart into pieces, checking various 
properties of those pieces, maybe substituting other pieces for some 
of them, and then reassembling the XML; and suppose it is using RDF 
to encode the fragments in order to reason about them in some way. It 
would be natural to do a kind of type checking, using rdf:XMLLiteral, 
to ensure that only genuine XML fragments were included. Under these 
circumstances, the distinction between text drawn from an XML source 
and the 'same' text from some other source might be critical, and the 
reasoner might want to carefully distinguish them and treat them as 
distinct. The fact that some piece of XML text happens to not have 
any XML markup in it would be largely irrelevant to this kind of 
application; and to rely on that kind of internal examination of the 
text would be infeasable as an implementation strategy and would 
nullify the point of having the typing information in the first 
place. This is the entire point of having datatypes, for many 
purposes: it frees code from the need to check that is methods are 
appropriate, by labelling the data with a 'type' which guarantees 
that they are.

>I.e. is it possible to construct two text strings A and B,
>both conforming to XML #PCDATA, where
>    <foo:prop>A</foo:prop> and
>    <foo:prop>B</foo:prop> are equal
>but
>    <foo:prop rdf:parseType="Literal">A</foo:prop> and
>    <foo:prop rdf:parseType="Literal">B</foo:prop> are not
>(or the other way round)?
>
>I do not know of any such case. If you know of any, please tell us.
>If there is none, the above argument is moot.
>
>>Though this is not pointed out anywhere explicitly in the RDF specs
>>(which is a shame, but understandable, since it needs a bit more
>>testing to ensure there are no major dragons) the present RDF datatyping
>>solution, with XML Literals modelled as a datatype, allow for us
>>to support the entire range of XML Schema types, including complex
>>types! And thereby, define property ranges to be e.g. xhtml:title,
>>asserting that all property values conform to the content model
>>constraining the lexical space of xhtml:title elements.
>
>First, xhtml:title is really a boring example, as it is currently
>only #PCDATA. But that should change with XHTML2, so let's assume
>we are there for the sake of this example.
>
>It is important to point out that there is a distinction between
>xhtml:title, and the content model of xhtml:title. The former
>looks like
>   <foo:prop rdf:parseType="Literal"><xhtml:title
>       >Your <xhtml:em>Title</xhtml:em> Here</xhtml:title></foo:prop>
>The later looks like
>   <foo:prop rdf:parseType="Literal"
>       >Your <xhtml:em>Title</xhtml:em> Here</foo:prop>
>
>I am very convinced that the later will be much more frequent
>in the context of RDF, because it is better to model the
>fact that this is a title as an RDF property. This is also
>the more important case for I18N.
>
>>Such benefits
>>simply dissappear if XML Literals are not modeled as typed
>>literals
>
>Well, they may.
>
>>-- and we certainly don't want to go back to treating
>>rdf:XMLLiteral as a special case of datatype with lang tag.
>
>Why not? It is special, so why don't you treat it specially?

There are technical reasons (to do with identity substitution on 
datatype names) why it is unworkable to have a 'special' datatype 
which violates the structural assumptions of the datatyping model, 
and it is not feasible or desireable to include lang tags in the 
datatyping model. So *if* we treat XML literals as being typed by an 
XML datatype, it is infeasible to include lang tags as part of the 
literal structure. They could be included as part of the XML literal 
string itself, by requiring all such literals to have a special 
rdf-wrapper onto which the lang tag can be attached by normal XML 
conventions; but then of course the actual XML literal string no 
longer looks like the XML fragment included in the RDF/XML document. 
But on the other hand, from view X, I guess that would just be an 
implementation detail.

Pat
-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 3 July 2003 14:43:10 UTC