Re: Summary of strings, markup, and language tagging in RDF (resend) from Graham Klyne on 2003-06-30 (w3c-rdfcore-wg@w3.org from June 2003)

From: Graham Klyne <gk@ninebynine.org>
Date: Mon, 30 Jun 2003 09:42:43 +0100
To: Martin Duerst <duerst@w3.org>, Dan Connolly <connolly@w3.org>
Cc: w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, w3c-rdfcore-wg@w3.org
Message-Id: <5.1.0.14.2.20030630091922.0329bd90@127.0.0.1>

At 08:48 29/06/03 -0400, Martin Duerst wrote:
>Hello Graham,
>
>At 18:53 03/06/27 +0100, Graham Klyne wrote:
>
>>Speaking for myself, and my understanding of our discussion...
>>
>>What I found "distasteful" was the suggestion that one would have to look 
>>*inside* the content of a literal to figure out what type it is.
>
>Obviously, to find out whether it is text with markup or text
>without markup, one way is to look inside. Another way would be
>to disallow rdf:parseType='Literal' on pure text strings.

I think this possibility was mentioned in our discussion, but rejected on 
the grounds of invalidating some (much?) existing RDF, and also making life 
much harder for RDF writers.


>>In discussion, I understood the request to be for:
>>
>>[[
>><dc:title rdf:parseType='Literal'>
>>   A Midsummer Night's Dream
>></dc:title>
>>]]
>>
>>to denote a plain string literal, but
>>
>>[[
>><dc:title rdf:parseType='Literal'>
>>   <em>A Midsummer Night's Dream</em>
>></dc:title>
>>]]
>>
>>to be a completely different kind of literal denoting an XML document in 
>>some way (because of the presence of markup).
>>
>>(I originally read Martin's note to suggest that an XML document is 
>>itself just a string of Unicode characters, not distinguished from 
>>non-XML strings.  That is a position I could support but with which 
>>others have expressed concerns.)
>
>Can we please make sure that we separate syntax and semantics?

I wasn't aware of conflating the two.  This issue seems to be entirely 
syntactic:  is a sequence of Unicode characters used to represent an XML 
document (and conforming to XML syntax) syntactically distinguished from 
any other sequence of Unicode characters?  (Hmmm... maybe the conflation 
here is between concrete syntax and abstract syntax -- I'm thinking of 
abstract syntax here.)

As for the rest of what you say, I really don't want to get into encoding 
tricks here -- to me that is just another layer of complexity we don't 
need, and as such should be left to implementers to deal with in their own 
way.   That is, if the string
    "<a>Some text</a>"
is to be distinct from the XML document encoded as:
    "<a>Some text</a>"
then we should just say so and deal with the consequences.
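
To make that distinction concrete, here is a minimal sketch of what "distinct in the abstract syntax" would mean.  The class names PlainLiteral and XMLLiteral are hypothetical, chosen only for illustration; nothing here is taken from an RDF specification or library.

```python
from dataclasses import dataclass

# Hypothetical literal kinds, for illustration only.
@dataclass(frozen=True)
class PlainLiteral:
    text: str

@dataclass(frozen=True)
class XMLLiteral:
    text: str

chars = "<a>Some text</a>"
plain = PlainLiteral(chars)
xml = XMLLiteral(chars)

# The two literals carry exactly the same Unicode characters...
same_chars = plain.text == xml.text
# ...yet compare unequal, because the kind is part of the literal's identity.
distinct = plain != xml
```

If XML had no distinguished status, there would be only one literal class here and the two values would be indistinguishable.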

Personally, I don't think XML should have this distinguished status in 
RDF.  If it's really necessary to distinguish an XML document literal in 
RDF, then why not use RDF facilities to do so?  e.g.

    <ex:XMLDocument>
       <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
    </ex:XMLDocument>

as distinct from, say:

    <ex:StringData>
       <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
    </ex:StringData>
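
As a rough sketch of the idea, assuming the literal is treated as nothing more than its characters, the two fragments above could be modelled as triples in which the distinction is carried by the node's type.  The tuple representation, the node labels _:n1 and _:n2, and the helper name are all assumptions made for illustration.

```python
RDF_TYPE = "rdf:type"
RDF_VALUE = "rdf:value"

# Two nodes with the same literal value but different types.
graph = {
    ("_:n1", RDF_TYPE, "ex:XMLDocument"),
    ("_:n1", RDF_VALUE, "<a>Some text</a>"),
    ("_:n2", RDF_TYPE, "ex:StringData"),
    ("_:n2", RDF_VALUE, "<a>Some text</a>"),
}

def types_of_nodes_with_value(graph, value):
    """Return the types of every node whose rdf:value equals `value`."""
    nodes = {s for (s, p, o) in graph if p == RDF_VALUE and o == value}
    return {o for (s, p, o) in graph if p == RDF_TYPE and s in nodes}

kinds = types_of_nodes_with_value(graph, "<a>Some text</a>")
```

The literal itself is identical in both cases; an application that cares about the difference consults the type of the node, not the content of the string.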

>XML is defined as a syntax on a sequence of Unicode characters,
>so treating it as such in a particular implementation,... is
>possible. If you are a bit careful with escaping, you can store
>text without markup in the same form. Other implementations are
>easily possible (for example, one could observe that "<>" is illegal
>in XML, and thus use "<>" to escape '<', and not escape &, and
>use '""' to escape '"' in an attribute. This would no longer look
>like XML, but would store the same information).
>
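
For concreteness, the alternative escaping scheme Martin describes above might be sketched as follows.  This is only one reading of his description: the function names are hypothetical, the attribute-quote rule ('""' for '"') is left out, and markup is detected simply as any '<' not immediately followed by '>'.

```python
import re

def encode_text(text: str) -> str:
    """Store plain text: a literal '<' becomes '<>', '&' is left alone."""
    return text.replace("<", "<>")

def decode_text(stored: str) -> str:
    """Recover plain text from a stored form that contains no markup."""
    return stored.replace("<>", "<")

def has_markup(stored: str) -> bool:
    """A '<' not followed by '>' is taken to start a tag."""
    return re.search(r"<(?!>)", stored) is not None

stored_plain = encode_text("1 < 2 & 3")
stored_marked_up = "<em>A Midsummer Night's Dream</em>"
```

Here has_markup(stored_plain) is false and has_markup(stored_marked_up) is true, so text with and without markup can share one stored form.
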
>For RDF to say that XML is *treated* as a string of Unicode characters
>is perfectly okay. For RDF to say that XML *is* nothing but a string
>of Unicode characters is a bad idea.

I don't think the issue here is whether RDF is trying to say anything 
about what an XML document may be; rather, it is to decide whether or 
not RDF embodies special treatment of literals that happen to be XML 
documents.  My position being:  why shouldn't RDF adopt the same techniques 
for talking about XML documents that it uses for talking about any other 
kind of thing in the universe of discourse?

>What is important is that the same semantic things, i.e.:
>- Text (without markup or language information)
>- Text with language information (but no markup)
>- Text with markup (but no language info)
>- Text with markup and language information
>are in each of the above cases recognized as being the same rather
>than being split up in a number of different things based on some
>representational details. On top of that, recognizing the continuity
>between the four variants above and making it easy to deal with
>this continuity would be a definite plus.

Which all seems to be saying that there are different flavours of text for 
which consistent handling is required.  Which seems reasonable to me.  But 
what is confusing me is the suggestion that XML is, on one hand, just 
another flavour of text, yet is also something completely different.  I 
can't make coherent sense of this.

In its way, XML *is* a "representational detail", which happens to be used 
to represent many more things than just text.  I'm not sure what you mean 
by continuity in this case.

This message is in danger of getting longer and longer... the more I think 
about what you seem to be asking for, the less I can see a coherent view of 
it.  So, in summary, I think we have two choices:
(a) XML has no distinguished status in the RDF abstract syntax.  (I like 
this, others don't)
(b) XML does have distinguished status, and we accept the consequences, 
warts and all.

#g


-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Monday, 30 June 2003 06:57:51 UTC