Re: Summary of strings, markup, and language tagging in RDF (resend) from pat hayes on 2003-07-03 (w3c-rdfcore-wg@w3.org from July 2003)

From: pat hayes <phayes@ihmc.us>
Date: Wed, 2 Jul 2003 20:52:44 -0500
To: w3c-rdfcore-wg@w3.org, Brian_McBride <bwm@hplb.hpl.hp.com>, jjc@hplb.hpl.hp.com
Cc: Martin Duerst <duerst@w3.org>
Message-Id: <p06001231bb292ae38f02@[10.0.100.7]>
>
>The wrapper is one solution to carry the language information.
>Of course you can choose whatever solution fits you best, but
>you should not forget that there are other solutions. One of
>them would be to handle XML Literals in the same way as plain
>literals, carrying the language information separately. If that
>can be done for plain literals, why can it not be done for
>XML Literals?
>

Martin puts his finger on the key point. It could be, but we chose a 
design for XML literals in which the XML 'label' is treated as a 
built-in datatype; which then puts a strong design constraint on us 
to treat it uniformly with the other datatypes, and that in turn 
requires either that it not have lang tags or that all other datatype 
namespaces have lang tags.  The latter option is unworkable, so we 
chose the former.

Since this issue seems to be so centrally important, and since our 
design now appears to people like Martin to be so completely 
brain-damaged, let me propose that we re-open this issue and change 
our design slightly, by reverting to an older design. The trouble 
seems to arise from our insisting that XML literals are treated 
uniformly with typed literals: so let us abandon that idea, in spite 
of its being very neat, and revert to the state where the XML 
literals are treated as a special syntactic case in the RDF graph, so 
that there would be five kinds of literal: plain and XML with and 
without lang tags, plus datatyped literals.

In detail, the proposal is as follows.

1. There are five kinds of literal in an RDF graph, indicated in 
Ntriples as follows:
"string"                        plain
"string"@tag                    plain plus lang tag
"string"^^rdf:XMLLiteral        XML
"string"@tag^^rdf:XMLLiteral    XML plus lang tag
"string"^^foo:baz               typed, where foo:baz is any URI other than 'rdf:XMLLiteral'

Notice that the Ntriples way of indicating the XML case is just as it 
is now, but that's just a syntactic decision to save work; 
rdf:XMLLiteral isn't a datatype and XML literals are not typed 
literals in this design, so the possibility of having lang tags in 
its lexical space isn't going to cause any headaches.
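
To make the five-way split in point 1 concrete, here is a minimal sketch in Python of how a tool might classify a literal token written in the Ntriples-style notation above. The names `classify`, `Literal` and the kind labels are hypothetical, the regular expression is a simplification and not the real Ntriples grammar, and rejecting a lang tag on a datatyped literal is my reading of the proposal (only plain and XML literals carry tags).

```python
import re
from typing import NamedTuple, Optional

XML_LITERAL = "rdf:XMLLiteral"  # written as in the examples above, not as a full URI


class Literal(NamedTuple):
    kind: str                 # "plain", "plain+lang", "xml", "xml+lang" or "typed"
    text: str
    lang: Optional[str]
    datatype: Optional[str]


# Simplified token shape: "text", optional @tag, optional ^^type.
_LITERAL_RE = re.compile(
    r'^"(?P<text>(?:[^"\\]|\\.)*)"'
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*))?'
    r'(?:\^\^(?P<dt>\S+))?$'
)


def classify(token: str) -> Literal:
    """Sort a literal token into one of the five proposed kinds."""
    m = _LITERAL_RE.match(token.strip())
    if m is None:
        raise ValueError(f"not a literal token: {token!r}")
    text, lang, dt = m.group("text", "lang", "dt")
    if dt is None:
        kind = "plain+lang" if lang else "plain"
    elif dt == XML_LITERAL:
        # Special syntactic case: not a datatype in this design.
        kind = "xml+lang" if lang else "xml"
    else:
        if lang:
            # Assumption: datatyped literals never carry a lang tag.
            raise ValueError("lang tag on a datatyped literal")
        kind = "typed"
    return Literal(kind, text, lang, dt)


if __name__ == "__main__":
    for tok in ['"chat"', '"chat"@en', '"<em>chat</em>"^^rdf:XMLLiteral',
                '"<em>chat</em>"@en^^rdf:XMLLiteral', '"10"^^foo:baz']:
        print(classify(tok).kind, tok)
```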

2. The semantic conditions on the first four are specified in the RDF 
interpretations and spelled out in detail - exactly how I leave to 
others to decide, but it seems to me that we could dispense with the 
wrapper (since we don't need to include the lang tag in a value space 
any more) and could just say that the XML case is treated 
semantically just like the plain case, i.e. the XML literal denotes 
itself (a piece of XML text, perhaps one conforming to Jeremy's 
elaborate conditions in 
http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-XMLLiteral, 
or such a piece of text plus a lang tag); this would simplify the RDF 
MT, in fact.

3. In RDF/XML, rdf:parseType="Literal" maps to an XML literal and any 
enclosing lang tag in the XML document is incorporated into the @tag 
in the RDF graph. This allows RDF/XML to not appear 
XML-brain-damaged, since now

<rdf:Description xml:lang="en">
   <foo:prop rdf:parseType="Literal">
     <em>chat</em>
   </foo:prop>
   <foo:prop>chat</foo:prop>
</rdf:Description>

parses into

_:x foo:prop "<em>chat</em>"@en^^rdf:XMLLiteral .
_:x foo:prop "chat"@en .
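
The mapping in point 3 can be sketched with Python's standard xml.etree.ElementTree. This is only an illustration under several assumptions: the namespace declarations and the example.org URI for foo are added by me so that the fragment parses, surrounding whitespace inside the XML literal is simply stripped (no canonicalization of the kind the Concepts draft describes), Ntriples string escaping is skipped, and only one level of xml:lang inheritance is handled. The function name `literals` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
FOO_NS = "http://example.org/foo#"  # hypothetical namespace for foo:

DOC = f"""
<rdf:Description xmlns:rdf="{RDF_NS}" xmlns:foo="{FOO_NS}" xml:lang="en">
  <foo:prop rdf:parseType="Literal">
    <em>chat</em>
  </foo:prop>
  <foo:prop>chat</foo:prop>
</rdf:Description>
"""


def literals(description: ET.Element) -> list:
    """Return the object literal of each property element, in proposed notation."""
    node_lang = description.get(f"{{{XML_NS}}}lang")
    out = []
    for prop in description:
        # The property's own xml:lang wins; otherwise inherit from the parent.
        lang = prop.get(f"{{{XML_NS}}}lang", node_lang)
        suffix = f"@{lang}" if lang else ""
        if prop.get(f"{{{RDF_NS}}}parseType") == "Literal":
            # Re-serialize the element content; stripping is a simplification.
            inner = (prop.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in prop
            )
            out.append(f'"{inner.strip()}"{suffix}^^rdf:XMLLiteral')
        else:
            out.append(f'"{prop.text or ""}"{suffix}')
    return out


if __name__ == "__main__":
    for lit in literals(ET.fromstring(DOC.strip())):
        print(f"_:x foo:prop {lit} .")
```

The point of the sketch is that the enclosing lang tag ends up on both literals in the same way, with no wrapper element needed in the XML case.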

4. Regarding Martin's other beef, that some XML without any markup in 
it is 'really' just plain text, this design also allows an RDF 
application to deal with this reasonably sensibly, since that 
identification amounts to just stripping off the ^^rdf:XMLLiteral 
flag when the literal string has no XML markup in it. I would vote 
against making that a valid RDF entailment in the semantics, but it 
would be relatively easy for a small app to do this using simple 
scripting on literals and still be a sensible semantic extension, 
without getting into all the datatyping complexity.
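
As a rough illustration of the "simple scripting on literals" in point 4, the following Python sketch strips the ^^rdf:XMLLiteral flag when the literal string contains no markup. The function name `demote_markup_free` is hypothetical, and treating any '<' or '&' as markup is a conservative assumption of mine (a string with an entity reference is left alone), not something the proposal specifies.

```python
import re

XML_LITERAL_SUFFIX = "^^rdf:XMLLiteral"

# Assumption: any '<' or '&' counts as markup, so such literals stay XML.
_MARKUP_RE = re.compile(r"[<&]")


def demote_markup_free(literal: str) -> str:
    """Turn a markup-free XML literal into the corresponding plain literal."""
    if not literal.endswith(XML_LITERAL_SUFFIX):
        return literal  # plain or datatyped: leave untouched
    body = literal[: -len(XML_LITERAL_SUFFIX)]  # '"text"' or '"text"@tag'
    text = body[1 : body.rindex('"')]           # between the outer quotes
    if _MARKUP_RE.search(text):
        return literal                          # real markup: keep the flag
    return body                                 # lang tag, if any, is preserved


if __name__ == "__main__":
    print(demote_markup_free('"chat"@en^^rdf:XMLLiteral'))
    print(demote_markup_free('"<em>chat</em>"@en^^rdf:XMLLiteral'))
```

Because the lang tag is carried outside the string in this design, it survives the rewrite unchanged.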

It would be relatively trivial to make the corresponding changes to 
the Semantics document for this design: I think the changes to 
Ntriples would be simple. Concepts would need some re-wording in 
sections 3.4, 5 and 6, and section 5.1 might need to be 
relocated. The changes are all essentially editorial, however, since 
every document has to treat XML literals as a special case already. 
I'm not sure what the effect on Syntax or Primer would be, but I 
think it would be relatively easy to tweak them. I haven't checked 
test cases.

Pat

-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 2 July 2003 21:52:48 UTC