- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 02 Jul 2003 11:50:31 -0400
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- Cc: Graham Klyne <gk@ninebynine.org>, Dan Connolly <connolly@w3.org>, w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, rdf core <w3c-rdfcore-wg@w3.org>
Hello Brian, Many thanks for your efforts to move the discussion forward. At 10:30 03/07/02 +0100, Brian McBride wrote: >Noted. However, turning to the higher priority issue b), I suggest we >lay out the issue (I've taken a stab) and that we analyse the pro's and >con's of the WG's decision. Largely, I suggest we do that with test >cases and use cases. > >To RDFCore I say, Martin and his colleagues on I18N are experts in these >matters. I strongly encourage listening their *arguments* with an open >mind. I will try to do so below, to help the RDF Core WG to understand our concerns. However, just for the record, I would like to point out, as I have explained in previous mail, that I think that the solution that the RDF Core WG has been favoring for the past few weeks is in conflict with RDF M&S, and that as far as I know, the RDF Core WG was not chartered to do such things. >Given as an example: > ><rdf:Description xml:lang="en"> > <foo:prop parseType="Literal"> > <em>chat</em> > </foo:prop> ></rdf:Description> > ><rdf:Description xml:lang="fr"> > <foo:prop parseType="Literal"> > <em>chat</em> > </foo:prop> ></rdf:Description> > >Your contention, I think, is that users familiar with XML will be >surprised that these two statements have the same object; that the outer >xml:lang does not affect the object of the statements. Well, that in and by itself is definitely a problem, but it is not the biggest problem. Take the following example: <rdf:Description xml:lang="en"> <foo:prop parseType="Literal"> <em>chat</em> </foo:prop> <foo:prop>chat</foo:prop> </rdf:Description> <rdf:Description xml:lang="fr"> <foo:prop parseType="Literal"> <em>chat</em> </foo:prop> <foo:prop>chat</foo:prop> </rdf:Description> How do you suggest we will ever be able to explain to users that one of the literals is the same in both cases, because xml:lang doesn't count for the XML Literal, but is different for the other case, because in that case, xml:lang counts? That just for the thing that the spec calls an XML Literal, the language inheritance rules of XML 1.0 are put out of force? The main reason we (i18n) got into this business with XML literals in the first place was that there are quite a few cases where some languages (or even most languages) are satisfied with simple text strings, but in some cases, something more would be needed. The best example for this are book titles. Some book titles may need some markup because they are multilingual, they may need bidirectional markup, they may need ruby markup, they may need MathML markup, and so on. Now assume we have: <rdf:Description xml:lang="en"> <foo:title>RDF for dummies</foo:title> </rdf:Description> and another one, <rdf:Description xml:lang="en"> <foo:title parseType="Literal" ><html:span xml:lang="de">Deutsch</html:span> for dummies</foo:title> </rdf:Description> Now this requires the addition of the <span> for the multilingual markup, and parseType="Literal" to say that there is markup. But why should these changes suddenly remove the fact that 'for dummies' is English? Why should it be necessary to restate information that is already there and was already used? In discussion, some people have said that once we have markup, we can add as much markup as we want, and this should solve the problem. But why should it be necessary to write: <rdf:Description xml:lang="en"> <foo:title parseType="Literal" ><html:span xml:lang="en"><html:span xml:lang="de" >Deutsch</html:span> for dummies</html:span></foo:title> </rdf:Description> This solution would have various problems, including: - Not all the things in XML literals will be HTML, or will otherwise have a somewhat 'neutral' element such as <span> at their disposal. And <span> is not usable in all contexts, e.g. <span><p></p></span> is illegal in HTML. [By the way, the I18N WG a few years ago tentatively asked the XML (whatever that was at the time) WG about defining a 'totally neutral' element <xml:span>, but that never got anywhere.] - Applications creating RDF out of other data will need complex logic to do the right thing. Assume a converter between some traditional XML format and RDF/XML. Assume that the XML has a <title> element for booktitles, which is mapped to a title property. Adding parseType can be handled rather generically (even with a default attribute in an internal DTD subset). Transferring the language information from the outside to the inside of the property will be much more application-specific and complicated. And the reverse transformation won't be possible without knowing the specifics of the first transformation. - Adding an additional element is a change of the markup, it effectively creates something different (see my answers to Pat for details) - It can get overly voluminous even if (almost) everything is in one language. - It makes adding language information very tedious even in the simplest cases. It is definitely not so that adding language information is the first thing that people creating data are thinking about. If it gets too complicated, they will just ignore it (might work out somehow) or they may do it in a haphazard fashion (very bad). >Martin: is that the sum of your objection? Can you provide better >examples that clearly indicate the force of your argument? I hope I have done so above. >RDFCore considered a number of options, but the relevant ones were that >the object of the statement should either be one of the following xml >fragments: > ><wrapper xml:lang="en"> > <em>chat</em> ></wrapper> > >or > ><em>chat</em> > >The former carries the enclosing lang tag, the latter does not. The WG >preferred the latter, though, as I recall, this was largely an aesthetic >judgement. The wrapper is one solution to carry the language information. Of course you can choose whatever solution fits you best, but you should not forget that there are other solutions. One of them would be to handle XML Literals in the same way as plain literals, carrying the language information separately. If that can be done for plain literals, why can it not be done for XML Literals? >I'm going to make a trawl of the email archives this morning and see if >I can lay out the various pro's and con's, but I'd sure be happy if >someone beat me to it. One of the cons that some people have mentioned is that having language inherit may make it more difficult to integrate data from different sources. This was the case when there was no way to 'reset' xml:lang, but the XML Core WG has now defined that xml:lang="" serves to do that. Also, of course, this argument is spurious, because the same problem would have applied to plain literals. Regards, Martin.
Received on Wednesday, 2 July 2003 13:55:29 UTC