Re: Summary of strings, markup, and language tagging in RDF (resend) from Martin Duerst on 2003-07-02 (w3c-rdfcore-wg@w3.org from July 2003)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 02 Jul 2003 11:50:31 -0400
To: Brian McBride <bwm@hplb.hpl.hp.com>
Cc: Graham Klyne <gk@ninebynine.org>, Dan Connolly <connolly@w3.org>, w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, rdf core <w3c-rdfcore-wg@w3.org>
Message-Id: <4.2.0.58.J.20030702105147.03550810@localhost>
Hello Brian,

Many thanks for your efforts to move the discussion forward.

At 10:30 03/07/02 +0100, Brian McBride wrote:

>Noted.  However, turning to the higher priority issue b), I suggest we
>lay out the issue (I've taken a stab) and that we analyse the pro's and
>con's of the WG's decision.  Largely, I suggest we do that with test
>cases and use cases.
>
>To RDFCore I say, Martin and his colleagues on I18N are experts in these
>matters.  I strongly encourage listening their *arguments* with an open
>mind.

I will try to do so below, to help the RDF Core WG to understand our
concerns. However, just for the record, I would like to point out,
as I have explained in previous mail, that I think that the solution
that the RDF Core WG has been favoring for the past few weeks is
in conflict with RDF M&S, and that as far as I know, the RDF Core
WG was not chartered to do such things.


>Given as an example:
>
><rdf:Description xml:lang="en">
>   <foo:prop parseType="Literal">
>     <em>chat</em>
>   </foo:prop>
></rdf:Description>
>
><rdf:Description xml:lang="fr">
>   <foo:prop parseType="Literal">
>     <em>chat</em>
>   </foo:prop>
></rdf:Description>
>
>Your contention, I think, is that users familiar with XML will be
>surprised that these two statements have the same object; that the outer
>xml:lang does not affect the object of the statements.

Well, that in and by itself is definitely a problem, but
it is not the biggest problem. Take the following example:

<rdf:Description xml:lang="en">
   <foo:prop parseType="Literal">
     <em>chat</em>
   </foo:prop>
   <foo:prop>chat</foo:prop>
</rdf:Description>

<rdf:Description xml:lang="fr">
   <foo:prop parseType="Literal">
     <em>chat</em>
   </foo:prop>
   <foo:prop>chat</foo:prop>
</rdf:Description>

How do you suggest we will ever be able to explain to users
that one of the literals is the same in both cases, because
xml:lang doesn't count for the XML Literal, but is different
for the other case, because in that case, xml:lang counts?
That just for the thing that the spec calls an XML Literal,
the language inheritance rules of XML 1.0 are put out of force?

The main reason we (i18n) got into this business with XML
literals in the first place was that there are quite a few
cases where some languages (or even most languages) are
satisfied with simple text strings, but in some cases,
something more would be needed. The best example for this
are book titles. Some book titles may need some markup
because they are multilingual, they may need bidirectional
markup, they may need ruby markup, they may need MathML
markup, and so on.

Now assume we have:

<rdf:Description xml:lang="en">
   <foo:title>RDF for dummies</foo:title>
</rdf:Description>

and another one,

<rdf:Description xml:lang="en">
   <foo:title parseType="Literal"
     ><html:span xml:lang="de">Deutsch</html:span> for dummies</foo:title>
</rdf:Description>

Now this requires the addition of the <span> for the multilingual
markup, and parseType="Literal" to say that there is markup. But
why should these changes suddenly remove the fact that 'for dummies'
is English? Why should it be necessary to restate information
that is already there and was already used?

In discussion, some people have said that once we have markup,
we can add as much markup as we want, and this should solve
the problem. But why should it be necessary to write:

<rdf:Description xml:lang="en">
   <foo:title parseType="Literal"
     ><html:span xml:lang="en"><html:span xml:lang="de"
       >Deutsch</html:span> for dummies</html:span></foo:title>
</rdf:Description>

This solution would have various problems, including:
- Not all the things in XML literals will be HTML, or will otherwise
   have a somewhat 'neutral' element such as <span> at their disposal.
   And <span> is not usable in all contexts, e.g. <span><p></p></span>
   is illegal in HTML. [By the way, the I18N WG a few years ago tentatively
   asked the XML (whatever that was at the time) WG about defining a
   'totally neutral' element <xml:span>, but that never got anywhere.]
- Applications creating RDF out of other data will need complex logic
   to do the right thing. Assume a converter between some traditional XML
   format and RDF/XML. Assume that the XML has a <title> element for
   booktitles, which is mapped to a title property. Adding parseType
   can be handled rather generically (even with a default attribute
   in an internal DTD subset). Transferring the language information
   from the outside to the inside of the property will be much more
   application-specific and complicated. And the reverse transformation
   won't be possible without knowing the specifics of the first
   transformation.
- Adding an additional element is a change of the markup, it effectively
   creates something different (see my answers to Pat for details)
- It can get overly voluminous even if (almost) everything is in one
   language.
- It makes adding language information very tedious even in the
   simplest cases. It is definitely not so that adding language
   information is the first thing that people creating data are
   thinking about. If it gets too complicated, they will just
   ignore it (might work out somehow) or they may do it in
   a haphazard fashion (very bad).


>Martin: is that the sum of your objection?  Can you provide better
>examples that clearly indicate the force of your argument?

I hope I have done so above.



>RDFCore considered a number of options, but the relevant ones were that
>the object of the statement should either be one of the following xml
>fragments:
>
><wrapper xml:lang="en">
>   <em>chat</em>
></wrapper>
>
>or
>
><em>chat</em>
>
>The former carries the enclosing lang tag, the latter does not.  The WG
>preferred the latter, though, as I recall, this was largely an aesthetic
>judgement.

The wrapper is one solution to carry the language information.
Of course you can choose whatever solution fits you best, but
you should not forget that there are other solutions. One of
them would be to handle XML Literals in the same way as plain
literals, carrying the language information separately. If that
can be done for plain literals, why can it not be done for
XML Literals?


>I'm going to make a trawl of the email archives this morning and see if
>I can lay out the various pro's and con's, but I'd sure be happy if
>someone beat me to it.

One of the cons that some people have mentioned is that
having language inherit may make it more difficult to
integrate data from different sources. This was the case
when there was no way to 'reset' xml:lang, but the XML Core WG
has now defined that xml:lang="" serves to do that. Also, of
course, this argument is spurious, because the same problem
would have applied to plain literals.


Regards,    Martin.
Received on Wednesday, 2 July 2003 13:55:29 UTC