Re: Summary of strings, markup, and language tagging in RDF (resend) from Graham Klyne on 2003-06-30 (w3c-rdfcore-wg@w3.org from June 2003)

From: Graham Klyne <gk@ninebynine.org>
Date: Mon, 30 Jun 2003 21:36:18 +0100
To: Martin Duerst <duerst@w3.org>, Dan Connolly <connolly@w3.org>
Cc: w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, w3c-rdfcore-wg@w3.org
Message-Id: <5.1.0.14.2.20030630213453.02fc3e48@127.0.0.1>

Martin,

I actually disagree with your points here on several levels.  But I 
recognize that I hold a minority opinion, and will leave the field clear 
for others to weigh in.

(Incidentally, with reference to another message, I agree with your reading 
of M&S, but disagree with your conclusion.)

#g
--

At 12:09 30/06/03 -0400, Martin Duerst wrote:
>Hello Graham, others,
>
>
>At 09:42 03/06/30 +0100, Graham Klyne wrote:
>
>>At 08:48 29/06/03 -0400, Martin Duerst wrote:
>
>>>Obviously, to find out whether it is text with markup or text
>>>without markup, one way is to look inside. Another way would be
>>>to disallow rdf:parseType='Literal' on pure text strings.
>>
>>I think this possibility was mentioned in our discussion, but rejected on 
>>the grounds of invalidating some (much?) existing RDF, and also making 
>>life much harder for RDF writers.
>
>I did not want to suggest that this possibility is a good solution.
>I just tried to show what the issue was about: As long as there
>is no markup in the 'XML'-Literal, there is really no semantic
>distinction to plain literals.
>
>
>>>>In discussion, I understood the request to be for:
>>>>
>>>>[[
>>>><dc:title rdf:parseType='Literal'>
>>>>   A Midsummer Night's Dream
>>>></dc:title>
>>>>]]
>>>>
>>>>to denote a plain string literal, but
>>>>
>>>>[[
>>>><dc:title rdf:parseType='Literal'>
>>>>   <em>A Midsummer Night's Dream</em>
>>>></dc:title>
>>>>]]
>>>>
>>>>to be a completely different kind of literal denoting an XML document 
>>>>in some way (because of the presence of markup).
>>>>
>>>>(I originally read Martin's note to suggest that an XML document is 
>>>>itself just a string of Unicode characters, not distinguished from 
>>>>non-XML strings.  That is a position I could support but with which 
>>>>others have expressed concerns.)
>>>
>>>Can we please make sure that we separate syntax and semantics?
>>
>>I wasn't aware of conflating the two.  This issue seems to be entirely 
>>syntactic:  is a sequence of Unicode characters used to represent an XML 
>>document (and conforming to XML syntax) syntactically distinguished from 
>>any other sequence of Unicode characters?  (Hmmm... maybe the conflation 
>>here is between concrete syntax and abstract syntax -- I'm thinking of 
>>abstract syntax here.)
>
>First, we are not dealing with XML documents, we are dealing with
>XML fragments. Second, of course there is a distinction between
>an XML fragment that has actual markup and a string that
>contains nothing but text. But this is not what we are talking about.
>The question is whether there is a difference between an 'XML fragment'
>that contains nothing else but just text and a simple string that
>contains nothing else but text. What I was saying was that there
>may be some syntactic differences (because there may be some need
>for escaping in the first case, but not in the second case),
>but there is no real difference. (I'll try to avoid the word
>'semantic' from now on.)
>
>
>>As for the rest of what you say, I really don't want to get into encoding 
>>tricks here -- to me that is just another layer of complexity we don't 
>>need, and as such should be left to implementers to deal with in their own way.
>
>I fully agree. In the same way, if rdf:parseType='Literal' is irrelevant
>if there is no markup in the literal, then we should just say so and
>let implementations deal with it.
>
>
>>  That is, if the string
>>    "<a>Some text</a>"
>>is to be distinct from the XML document encoded as:
>>    "<a>Some text</a>"
>>then we should just say so and deal with the consequences.
>
>Yes, exactly. The former would turn out in RDF/XML as something
>like <foo>&lt;a&gt;Some text&lt;/a&gt;</foo>, the latter would turn
>out as <foo rdf:parseType='Literal'><a>Some text</a></foo>.
>I think nobody in this discussion claims that these two should
>be the same. What we are discussing is the cases where there is
>only an XML fragment without markup. I.e. if the string
>     "Some text"
>is to be distinct from the XML fragment encoded as:
>     "Some text"
>then we should just say so. But very obviously, they are the same,
>so we should not claim they are different.
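
A minimal sketch of the serialization point made above, using only the Python standard library. The function names `plain_content` and `fragment_content` are hypothetical, chosen for illustration. The sketch only compares the element content each form would produce, and is not a description of how any RDF toolkit behaves.

```python
from xml.sax.saxutils import escape


def plain_content(value: str) -> str:
    """Element content for a plain string: markup characters are escaped."""
    return escape(value)


def fragment_content(fragment: str) -> str:
    """Element content for an XML fragment: written through unchanged."""
    return fragment


marked_up = "<a>Some text</a>"
text_only = "Some text"

# With markup present, the two serializations differ.
print(plain_content(marked_up))     # &lt;a&gt;Some text&lt;/a&gt;
print(fragment_content(marked_up))  # <a>Some text</a>

# With no markup (and nothing that needs escaping), they coincide.
print(plain_content(text_only) == fragment_content(text_only))  # True
```

The text-only case was chosen to avoid characters such as `&`, where the escaping difference mentioned earlier in the thread would show up again.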
>
>
>>Personally, I don't think XML should have this distinguished status in 
>>RDF.  If it's really necessary to distinguish an XML document literal in 
>>RDF, when why not use RDF facilities to do so?  e.g.
>>
>>    <ex:XMLDocument>
>>       <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
>>    </ex:XMLDocument>
>>
>>as distinct from, say:
>>
>>    <ex:StringData>
>>       <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
>>    </ex:StringData>
>
>First, this would be against RDF Model and Syntax. Second,
>as Jeremy pointed out, it would be against all the other
>decisions RDF Core has taken up to last call. Third, it
>would create even more different representations for what's
>exactly the same thing. There would be nothing but syntax
>differences between the following two:
>
><ex:XMLDocument>
>    <rdf:value rdf:parseType='Literal'>Some text</rdf:value>
></ex:XMLDocument>
>
><ex:StringData>
>    <rdf:value rdf:parseType='Literal'>Some text</rdf:value>
></ex:StringData>
>
>And fourth, the second one could easily be seen as yet another
>way to do CDATA Sections, see the parallel between
>
><ex:StringData>
>     <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
></ex:StringData>
>
>and <![CDATA[<a>Some text</a>]]>.
>
>As I18N has worked hard to keep CDATA Sections out of the infoset,
>we wouldn't be pleased about this either :-(.
>
>
>>>For RDF to say that XML is *treated* as a string of Unicode characters
>>>is perfectly okay. For RDF to say that XML *is* nothing but a string
>>>of Unicode characters is a bad idea.
>>
>>I don't think the issue here is that RDF is or is not trying to say 
>>anything about what an XML document may be, but rather to decide whether 
>>or not RDF embodies special treatment of literals that happen to be XML 
>>documents.  My position being:  why shouldn't RDF adopt the same 
>>techniques for talking about XML documents that it uses for talking about 
>>any other kind of thing in the universe of discourse?
>
>So to play devil's advocate, why allow strings? Why not model them the
>RDF way as a sequence of characters?
>
>Seriously, XML fragments got into RDF because they are a natural
>extension of plain literals. The Web has brought us markup, and
>it has proven to be useful. Why go back to plain text if we don't
>have to? And XML fragments cover important internationalization needs,
>such as multilingual strings, ruby, bidirectionality, and so on.
>
>
>>>What is important is that the same semantic things, i.e.:
>>>- Text (without markup or language information)
>>>- Text with language information (but no markup)
>>>- Text with markup (but no language info)
>>>- Text with markup and language information
>>>are in each of the above cases recognized as being the same rather
>>>than being split up in a number of different things based on some
>>>representational details. On top of that, recognizing the continuity
>>>between the four variants above and making it easy to deal with
>>>this continuity would be a definite plus.
>>
>>Which all seems to be saying that there are different flavours of text 
>>for which consistent handling is required.  Which seems reasonable to 
>>me.  But what is confusing me is the suggestion that XML is, on one hand, 
>>just another flavour of text, yet is also something completely 
>>different.  I can't make coherent sense of this.
>
>Marked-up text is just another flavor of text. Of course text with
>markup and text without markup is not exactly the same, otherwise
>we wouldn't need markup in the first place.
>Also, an XML fragment that is only text is just that: only text.
>Anything that is just text can be an xml fragment.
>XML is text + markup. An XML document has to have markup (the root element).
>An XML fragment does not have to have markup. So an XML fragment can
>be just text.
>
>
>>In its way, XML *is* a "representational detail", which happens to be 
>>used to represent many more things than just text.  I'm not sure what you 
>>mean by continuity in this case.
>
>'many more things than just text' may have two different senses.
>In one sense, it refers to the fact that XML is often used for
>representing (structured) data. In that case, it is probably better
>to 'convert' that XML to RDF, either explicitly or by using the
>fact that many XML formats, sometimes with a little bit of help
>such as parseType='Resource', can be interpreted as RDF. This is
>not really the topic of this discussion.
>The other sense may refer to the fact that markup adds value to
>text. It indeed does, but only if actually present.
>
>
>Let me try another approach. RDF says that
>
><foo rdf:parseType="Resource">
>     <rdf:type>Dog</rdf:type>
>     <name>Fifi</name>
></foo>
>
>and
>
><foo>
>   <Dog>
>     <name>Fifi</name>
>   </Dog>
></foo>
>
>(modulo my syntax error) are the same, namely a thing foo with type
>Dog and name Fifi. Why would it then be so difficult for RDF to say that
>
>    <bar rdf:parseType="Literal">Fifi</bar>
>and
>    <bar>Fifi</bar>
>are also the same?
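
The equivalence asked for above can be sketched as a small Python function (standard library only). This illustrates the proposed reading, in which rdf:parseType is ignored when the element holds no markup; it does not claim that any RDF specification or parser behaves this way. The helper `literal_value` and its return convention are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def literal_value(xml_source: str):
    """Classify a property element's content, ignoring rdf:parseType.

    Returns ("text", s) when the element holds only character data,
    otherwise ("markup", s) with the content re-serialized as a string.
    """
    element = ET.fromstring(xml_source)
    if len(element) == 0:  # no child elements: nothing but text
        return ("text", element.text or "")
    inner = (element.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in element
    )
    return ("markup", inner)


with_parse_type = f'<bar xmlns:rdf="{RDF_NS}" rdf:parseType="Literal">Fifi</bar>'
without_parse_type = "<bar>Fifi</bar>"

# Under this reading both forms yield the same plain text value.
print(literal_value(with_parse_type) == literal_value(without_parse_type))
```

Only content that actually contains child elements is reported as markup, which is the distinction the two `bar` examples turn on.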
>
>
>>This message is in danger of getting longer and longer... the more I 
>>think about what you seem to be asking for, the less I can see a coherent 
>>view of it.  So, in summary, I think we have two choices:
>>(a) XML has no distinguished status in the RDF abstract syntax.  (I like 
>>this, others don't)
>>(b) XML does have distinguished status, and we accept the consequences, 
>>warts and all.
>
>What's warty about saying that a text string without markup is the
>same as a text string without markup?
>
>
>Regards,     Martin.

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Monday, 30 June 2003 16:48:26 UTC