- From: Martin Duerst <duerst@w3.org>
- Date: Sun, 29 Jun 2003 08:48:32 -0400
- To: Graham Klyne <gk@ninebynine.org>, Dan Connolly <connolly@w3.org>
- Cc: w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, w3c-rdfcore-wg@w3.org
Hello Graham,

At 18:53 03/06/27 +0100, Graham Klyne wrote:

>Speaking for myself, and my understanding of our discussion...
>
>What I found "distasteful" was the suggestion that one would have to look
>*inside* the content of a literal to figure out what type it is.

Obviously, to find out whether it is text with markup or text
without markup, one way is to look inside. Another way would be
to disallow rdf:parseType='Literal' on pure text strings.

>In discussion, I understood the request to be for:
>
>[[
><dc:title rdf:parseType='Literal'>
>   A Midsummer Night's Dream
></dc:title>
>]]
>
>to denote a plain string literal, but
>
>[[
><dc:title rdf:parseType='Literal'>
>   <em>A Midsummer Night's Dream</em>
></dc:title>
>]]
>
>to be a completely different kind of literal denoting an XML document in
>some way (because of the presence of markup).
>
>(I originally read Martin's note to suggest that an XML document is itself
>just a string of Unicode characters, not distinguished from non-XML
>strings. That is a position I could support but with which others have
>expressed concerns.)

Can we please make sure that we separate syntax and semantics?
XML is defined as a syntax on a sequence of Unicode characters,
so treating it as such in a particular implementation,... is possible.
If you are a bit careful with escaping, you can store text
without markup in the same form.

Other implementations are easily possible (for example, one could
observe that "<>" is illegal in XML, and thus use "<>" to escape '<',
and not escape &, and use '""' to escape '"' in an attribute. This
would no longer look like XML, but would store the same information).

For RDF to say that XML is *treated* as a string of Unicode characters
is perfectly okay. For RDF to say that XML *is* nothing but a string
of Unicode characters is a bad idea.
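To make the escaping point above concrete, here is a minimal sketch in Python of the two storage conventions mentioned: ordinary XML-style escaping, and the alternative that uses "<>" for a literal '<' and '""' for a literal '"' in an attribute. All function names are hypothetical and chosen only for illustration; neither scheme is prescribed by RDF. The sketch simply takes the observation that "<>" does not occur in XML at face value.

```python
from xml.sax.saxutils import escape, unescape


def xml_store(text: str) -> str:
    """Store plain text XML-style: '&' -> '&amp;', '<' -> '&lt;', '>' -> '&gt;'."""
    return escape(text)


def xml_load(stored: str) -> str:
    """Recover the plain text from its XML-style stored form."""
    return unescape(stored)


def alt_store(text: str) -> str:
    """Alternative scheme (assumption from the message): '<>' stands for a
    literal '<', and '&' is left unescaped."""
    return text.replace("<", "<>")


def alt_load(stored: str) -> str:
    """Recover plain text from the alternative stored form."""
    return stored.replace("<>", "<")


def alt_store_attr(value: str) -> str:
    """Attribute variant: additionally double each '"'."""
    return alt_store(value).replace('"', '""')


def alt_load_attr(stored: str) -> str:
    """Undo the attribute variant: collapse '""' first, then '<>'."""
    return alt_load(stored.replace('""', '"'))


if __name__ == "__main__":
    plain = "1 < 2 & 3"
    # Two different representations of the same text.
    print(xml_store(plain))
    print(alt_store(plain))
    # Both round-trip to the original string.
    assert xml_load(xml_store(plain)) == plain
    assert alt_load(alt_store(plain)) == plain
```

The two stored forms differ character for character, yet both recover the same text, which is the sense in which the representation is an implementation detail rather than part of what the literal *is*.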
What is important is that the same semantic things, i.e.:

- Text (without markup or language information)
- Text with language information (but no markup)
- Text with markup (but no language info)
- Text with markup and language information

are in each of the above cases recognized as being the same
rather than being split up in a number of different things
based on some representational details.

On top of that, recognizing the continuity between the four
variants above and making it easy to deal with this continuity
would be a definite plus.

Regards,    Martin.
Received on Sunday, 29 June 2003 09:00:26 UTC