- From: Graham Klyne <gk@ninebynine.org>
- Date: Mon, 30 Jun 2003 09:42:43 +0100
- To: Martin Duerst <duerst@w3.org>, Dan Connolly <connolly@w3.org>
- Cc: w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, w3c-rdfcore-wg@w3.org
At 08:48 29/06/03 -0400, Martin Duerst wrote:
>Hello Graham,
>
>At 18:53 03/06/27 +0100, Graham Klyne wrote:
>
>>Speaking for myself, and my understanding of our discussion...
>>
>>What I found "distasteful" was the suggestion that one would have to look
>>*inside* the content of a literal to figure out what type it is.
>
>Obviously, to find out whether it is text with markup or text
>without markup, one way is to look inside. Another way would be
>to disallow rdf:parseType='Literal' on pure text strings.

I think this possibility was mentioned in our discussion, but rejected on the grounds of invalidating some (much?) existing RDF, and also making life much harder for RDF writers.

>>In discussion, I understood the request to be for:
>>
>>[[
>><dc:title rdf:parseType='Literal'>
>> A Midsummer Night's Dream
>></dc:title>
>>]]
>>
>>to denote a plain string literal, but
>>
>>[[
>><dc:title rdf:parseType='Literal'>
>> <em>A Midsummer Night's Dream</em>
>></dc:title>
>>]]
>>
>>to be a completely different kind of literal denoting an XML document in
>>some way (because of the presence of markup).
>>
>>(I originally read Martin's note to suggest that an XML document is
>>itself just a string of Unicode characters, not distinguished from
>>non-XML strings. That is a position I could support but with which
>>others have expressed concerns.)
>
>Can we please make sure that we separate syntax and semantics?

I wasn't aware of conflating the two. This issue seems to be entirely syntactic: is a sequence of Unicode characters used to represent an XML document (and conforming to XML syntax) syntactically distinguished from any other sequence of Unicode characters?

(Hmmm... maybe the conflation here is between concrete syntax and abstract syntax -- I'm thinking of abstract syntax here.)
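To make the "look inside" concern concrete, here is a minimal sketch in Python using the two dc:title fragments quoted above. It is an illustration under stated assumptions, not a proposal for how an RDF parser should behave: the namespace declarations are added so the fragments parse standalone, and the helper names contains_markup and text_content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace declarations are added here only so each fragment parses on its own.
NS = (
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
)

PLAIN = f"<dc:title {NS} rdf:parseType='Literal'>A Midsummer Night's Dream</dc:title>"
MARKED_UP = (
    f"<dc:title {NS} rdf:parseType='Literal'>"
    "<em>A Midsummer Night's Dream</em></dc:title>"
)


def contains_markup(property_xml: str) -> bool:
    """Hypothetical helper: True if the property element has a child element."""
    element = ET.fromstring(property_xml)
    return len(element) > 0  # len() counts child elements


def text_content(property_xml: str) -> str:
    """Hypothetical helper: the character data with all markup stripped."""
    return "".join(ET.fromstring(property_xml).itertext())


if __name__ == "__main__":
    for fragment in (PLAIN, MARKED_UP):
        print(contains_markup(fragment), repr(text_content(fragment)))
```

Both fragments carry the same attribute, rdf:parseType='Literal', so the only way this sketch can tell them apart is by inspecting the element content, which is exactly the step I find distasteful.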
As for the rest of what you say, I really don't want to get into encoding tricks here -- to me that is just another layer of complexity we don't need, and as such should be left to implementers to deal with in their own way.

That is, if the string
  "<a>Some text</a>"
is to be distinct from the XML document encoded as:
  "<a>Some text</a>"
then we should just say so and deal with the consequences.

Personally, I don't think XML should have this distinguished status in RDF. If it's really necessary to distinguish an XML document literal in RDF, then why not use RDF facilities to do so? e.g.

  <ex:XMLDocument>
    <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
  </ex:XMLDocument>

as distinct from, say:

  <ex:StringData>
    <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
  </ex:StringData>

>XML is defined as a syntax on a sequence of Unicode characters,
>so treating it as such in a particular implementation,... is
>possible. If you are a bit careful with escaping, you can store
>text without markup in the same form. Other implementations are
>easily possible (for example, one could observe that "<>" is illegal
>in XML, and thus use "<>" to escape '<', and not escape &, and
>use '""' to escape '"' in an attribute. This would no longer look
>like XML, but would store the same information).
>
>For RDF to say that XML is *treated* as a string of Unicode characters
>is perfectly okay. For RDF to say that XML *is* nothing but a string
>of Unicode characters is a bad idea.

I don't think the issue here is that RDF is or is not trying to say anything about what an XML document may be, but rather to decide whether or not RDF embodies special treatment of literals that happen to be XML documents.

My position being: why shouldn't RDF adopt the same techniques for talking about XML documents that it uses for talking about any other kind of thing in the universe of discourse?
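A rough sketch of what that would look like to a consumer, again in Python and again only an illustration: the ex: namespace URI and the helpers typed_node and describe are assumptions made up for the example, and the code reads the XML directly rather than going through any real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/ns#"  # hypothetical namespace for the ex: prefix


def typed_node(kind: str) -> str:
    """Hypothetical helper: build one of the two typed-node examples."""
    return (
        f'<ex:{kind} xmlns:ex="{EX_NS}" xmlns:rdf="{RDF_NS}">'
        '<rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>'
        f"</ex:{kind}>"
    )


def describe(node_xml: str) -> tuple[str, str]:
    """Return (local name of the enclosing node, literal characters)."""
    node = ET.fromstring(node_xml)
    value = node.find(f"{{{RDF_NS}}}value")
    # Re-serialise the literal content as a plain character sequence.
    inner = (value.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in value
    )
    kind = node.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    return kind, inner


if __name__ == "__main__":
    for kind in ("XMLDocument", "StringData"):
        print(describe(typed_node(kind)))
```

The literal comes out as the same character sequence in both cases; whether it is to be read as an XML document or as string data is carried by the enclosing node, which is ordinary RDF, rather than by anything inside the literal.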
>What is important is that the same semantic things, i.e.:
>- Text (without markup or language information)
>- Text with language information (but no markup)
>- Text with markup (but no language info)
>- Text with markup and language information
>are in each of the above cases recognized as being the same rather
>than being split up in a number of different things based on some
>representational details. On top of that, recognizing the continuity
>between the four variants above and making it easy to deal with
>this continuity would be a definite plus.

Which all seems to be saying that there are different flavours of text for which consistent handling is required. Which seems reasonable to me.

But what is confusing me is the suggestion that XML is, on one hand, just another flavour of text, yet is also something completely different. I can't make coherent sense of this. In its way, XML *is* a "representational detail", which happens to be used to represent many more things than just text. I'm not sure what you mean by continuity in this case.

This message is in danger of getting longer and longer... the more I think about what you seem to be asking for, the less I can see a coherent view of it.

So, in summary, I think we have two choices:
(a) XML has no distinguished status in the RDF abstract syntax. (I like this, others don't)
(b) XML does have distinguished status, and we accept the consequences, warts and all.

#g

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9 A131 01B9 1C7A DBCA CB5E
Received on Monday, 30 June 2003 06:57:51 UTC