- From: Graham Klyne <gk@ninebynine.org>
- Date: Mon, 30 Jun 2003 21:36:18 +0100
- To: Martin Duerst <duerst@w3.org>, Dan Connolly <connolly@w3.org>
- Cc: w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, w3c-rdfcore-wg@w3.org
Martin,

I actually disagree with your points here on several levels. But I
recognize mine is a minority opinion, and will leave the field clear for
others to weigh in.

(Incidentally, with reference to another message, I agree with your
reading of M&S, but disagree with your conclusion.)

#g
--

At 12:09 30/06/03 -0400, Martin Duerst wrote:
>Hello Graham, others,
>
>
>At 09:42 03/06/30 +0100, Graham Klyne wrote:
>
>>At 08:48 29/06/03 -0400, Martin Duerst wrote:
>
>>>Obviously, to find out whether it is text with markup or text
>>>without markup, one way is to look inside. Another way would be
>>>to disallow rdf:parseType='Literal' on pure text strings.
>>
>>I think this possibility was mentioned in our discussion, but rejected on
>>the grounds of invalidating some (much?) existing RDF, and also making
>>life much harder for RDF writers.
>
>I did not want to suggest that this possibility is a good solution.
>I just tried to show what the issue was about: As long as there
>is no markup in the 'XML'-Literal, there is really no semantic
>distinction to plain literals.
>
>
>>>>In discussion, I understood the request to be for:
>>>>
>>>>[[
>>>><dc:title rdf:parseType='Literal'>
>>>>   A Midsummer Night's Dream
>>>></dc:title>
>>>>]]
>>>>
>>>>to denote a plain string literal, but
>>>>
>>>>[[
>>>><dc:title rdf:parseType='Literal'>
>>>>   <em>A Midsummer Night's Dream</em>
>>>></dc:title>
>>>>]]
>>>>
>>>>to be a completely different kind of literal denoting an XML document
>>>>in some way (because of the presence of markup).
>>>>
>>>>(I originally read Martin's note to suggest that an XML document is
>>>>itself just a string of Unicode characters, not distinguished from
>>>>non-XML strings. That is a position I could support but with which
>>>>others have expressed concerns.)
>>>
>>>Can we please make sure that we separate syntax and semantics?
>>
>>I wasn't aware of conflating the two. This issue seems to be entirely
>>syntactic: is a sequence of Unicode characters used to represent an XML
>>document (and conforming to XML syntax) syntactically distinguished from
>>any other sequence of Unicode characters? (Hmmm... maybe the conflation
>>here is between concrete syntax and abstract syntax -- I'm thinking of
>>abstract syntax here.)
>
>First, we are not dealing with XML documents, we are dealing with
>XML fragments. Second, of course there is a distinction between
>an XML fragment that has actual markup and a string that does
>contain nothing but text. But this is not what we are talking about.
>The question is whether there is a difference between an 'XML fragment'
>that contains nothing else but just text and a simple string that
>contains nothing else but text. What I was saying was that there
>may be some syntactic differences (because there may be some need
>for escaping in the first case, but not in the second case),
>but there is no real difference. (I'll try to avoid the word
>'semantic' from now on.)
>
>
>>As for the rest of what you say, I really don't want to get into encoding
>>tricks here -- to me that is just another layer of complexity we don't
>>need, and as such should be left to implementers to deal with in their own way.
>
>I fully agree. In the same way, if rdf:parseType='Literal' is irrelevant
>if there is no markup in the literal, then we should just say so and
>let implementations deal with it.
>
>
>>  That is, if the string
>>    "<a>Some text</a>"
>>is to be distinct from the XML document encoded as:
>>    "<a>Some text</a>"
>>then we should just say so and deal with the consequences.
>
>Yes, exactly. The former would turn out in RDF/XML as something
>like <foo>&lt;a&gt;Some text&lt;/a&gt;</foo>, the latter would turn
>out as <foo rdf:parseType='Literal'><a>Some text</a></foo>.
>I think nobody in this discussion claims that these two should
>be the same. What we are discussing is the cases where there is
>only an XML fragment without markup. I.e. if the string
>    "Some text"
>is to be distinct from the XML fragment encoded as:
>    "Some text"
>then we should just say so. But very obviously, they are the same,
>so we should not claim they are different.
>
>
>>Personally, I don't think XML should have this distinguished status in
>>RDF. If it's really necessary to distinguish an XML document literal in
>>RDF, then why not use RDF facilities to do so? e.g.
>>
>>   <ex:XMLDocument>
>>     <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
>>   </ex:XMLDocument>
>>
>>as distinct from, say:
>>
>>   <ex:StringData>
>>     <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
>>   </ex:StringData>
>
>First, this would be against RDF Model and Syntax. Second,
>as Jeremy pointed out, it would be against all the other
>decisions RDF Core has taken up to last call. Third, it
>would create even more different representations for what's
>exactly the same thing. There would be nothing but syntax
>differences between the following two:
>
><ex:XMLDocument>
>   <rdf:value rdf:parseType='Literal'>Some text</rdf:value>
></ex:XMLDocument>
>
><ex:StringData>
>   <rdf:value rdf:parseType='Literal'>Some text</rdf:value>
></ex:StringData>
>
>And fourth, the second one could easily be seen as yet another
>way to do CDATA Sections, see the parallel between
>
><ex:StringData>
>   <rdf:value rdf:parseType="Literal"><a>Some text</a></rdf:value>
></ex:StringData>
>
>and <![CDATA[<a>Some text</a>]]>.
>
>As I18N has worked hard to keep CDATA Sections out of the infoset,
>we wouldn't be pleased about this either :-(.
>
>
>>>For RDF to say that XML is *treated* as a string of Unicode characters
>>>is perfectly okay. For RDF to say that XML *is* nothing but a string
>>>of Unicode characters is a bad idea.
>>
>>I don't think the issue here is that RDF is or is not trying to say
>>anything about what an XML document may be, but rather to decide whether
>>or not RDF embodies special treatment of literals that happen to be XML
>>documents. My position being: why shouldn't RDF adopt the same
>>techniques for talking about XML documents that it uses for talking about
>>any other kind of thing in the universe of discourse?
>
>So to play devil's advocate, why allow strings? Why not model them the
>RDF way as a sequence of characters?
>
>Seriously, XML fragments got into RDF because they are a natural
>extension of plain literals. The Web has brought us markup, and
>it has proven to be useful. Why go back to plain text if we don't
>have to? And XML fragments cover important internationalization needs,
>such as multilingual strings, ruby, bidirectionality, and so on.
>
>
>>>What is important is that the same semantic things, i.e.:
>>>- Text (without markup or language information)
>>>- Text with language information (but no markup)
>>>- Text with markup (but no language info)
>>>- Text with markup and language information
>>>are in each of the above cases recognized as being the same rather
>>>than being split up in a number of different things based on some
>>>representational details. On top of that, recognizing the continuity
>>>between the four variants above and making it easy to deal with
>>>this continuity would be a definite plus.
>>
>>Which all seems to be saying that there are different flavours of text
>>for which consistent handling is required. Which seems reasonable to
>>me. But what is confusing me is the suggestion that XML is, on one hand,
>>just another flavour of text, yet is also something completely
>>different. I can't make coherent sense of this.
>
>Marked-up text is just another flavor of text. Of course text with
>markup and text without markup is not exactly the same, otherwise
>we wouldn't need markup in the first place.
>Also, an XML fragment that is just only text is just that, just only text.
>Anything that is just text can be an xml fragment.
>XML is text + markup. An XML document has to have markup (the root element).
>An XML fragment does not have to have markup. So an XML fragment can
>be just text.
>
>
>>In its way, XML *is* a "representational detail", which happens to be
>>used to represent many more things than just text. I'm not sure what you
>>mean by continuity in this case.
>
>'many more things than just text' may have two different senses.
>In one sense, it refers to the fact that XML is often used for
>representing (structured) data. In that case, it is probably better
>to 'convert' that XML to RDF, either explicitly or by using the
>fact that many XML formats, sometimes with a little bit of help
>such as parseType='Resource', can be interpreted as RDF. This is
>not really the topic of this discussion.
>The other sense may refer to the fact that markup adds value to
>text. It indeed does, but only if actually present.
>
>
>Let me try another approach. RDF says that
>
><foo rdf:parseType="Resource">
>   <rdf:type>Dog</rdf:type>
>   <name>Fifi</name>
></foo>
>
>and
>
><foo>
>   <Dog>
>     <name>Fifi</name>
>   </Dog>
></foo>
>
>(modulo my syntax error) are the same, namely a thing foo with type
>Dog and name Fifi. Why would it then be so difficult for RDF to say that
>
>   <bar rdf:parseType="Literal">Fifi</bar>
>and
>   <bar>Fifi</bar>
>are also the same?
>
>
>>This message is in danger of getting longer and longer... the more I
>>think about what you seem to be asking for, the less I can see a coherent
>>view of it. So, in summary, I think we have two choices:
>>(a) XML has no distinguished status in the RDF abstract syntax. (I like
>>this, others don't)
>>(b) XML does have distinguished status, and we accept the consequences,
>>warts and all.
>
>What's warty about saying that a text string without markup is the
>same as a text string without markup?
>
>
>Regards,    Martin.

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B  A2E9 A131 01B9 1C7A DBCA CB5E
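
The distinction argued over in this thread (an XML fragment with markup, an XML fragment that is only text, and a plain string that happens to contain angle brackets) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not how any RDF parser handles rdf:parseType='Literal': the helper names parse_fragment, has_markup and text_value are hypothetical, "markup" is taken to mean child elements only (comments and processing instructions are ignored), and the fragment is wrapped in a dummy root element so that the standard-library parser accepts it.

```python
import xml.etree.ElementTree as ET


def parse_fragment(fragment: str) -> ET.Element:
    """Wrap an XML fragment in a dummy root so a document parser accepts it.

    Assumption: the fragment is well-formed element content.
    """
    return ET.fromstring(f"<wrapper>{fragment}</wrapper>")


def has_markup(fragment: str) -> bool:
    """True if the fragment contains at least one element (simplified notion of markup)."""
    return len(parse_fragment(fragment)) > 0


def text_value(fragment: str) -> str:
    """Character content of the fragment, with character escapes resolved."""
    return "".join(parse_fragment(fragment).itertext())


# A fragment that is only text: no markup, and its value is the plain string.
print(has_markup("Fifi"))            # False
print(text_value("Fifi") == "Fifi")  # True

# A fragment with real markup.
print(has_markup("<em>A Midsummer Night's Dream</em>"))  # True

# A plain string containing angle brackets must be escaped in XML content;
# it still carries no markup, and its value is the unescaped string.
escaped = "&lt;a&gt;Some text&lt;/a&gt;"
print(has_markup(escaped))   # False
print(text_value(escaped))   # <a>Some text</a>
```

Under these assumptions the text-only fragment and the plain string compare equal, which is the case Martin argues should be treated as the same, while the escaped string and the fragment with an element remain distinct, which nobody in the thread disputes.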
Received on Monday, 30 June 2003 16:48:26 UTC