Re: Summary of strings, markup, and language tagging in RDF (resend) from Martin Duerst on 2003-07-03 (w3c-rdfcore-wg@w3.org from July 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 03 Jul 2003 10:24:32 -0400
To: "Patrick Stickler" <patrick.stickler@nokia.com>, "ext Brian McBride" <bwm@hplb.hpl.hp.com>
Cc: "Graham Klyne" <gk@ninebynine.org>, "Dan Connolly" <connolly@w3.org>, <w3c-i18n-ig@w3.org>, "Ralph R. Swick" <swick@w3.org>, <misha.wolf@reuters.com>, "Tim Berners-Lee" <timbl@w3.org>, "rdf core" <w3c-rdfcore-wg@w3.org>
Message-Id: <4.2.0.58.J.20030703093904.04ee3660@localhost>
Hello Patrick,

At 12:29 03/07/03 +0300, Patrick Stickler wrote:

> > RDFCore considered a number of options, but the relevant ones were that
> > the object of the statement should either be one of the following xml
> > fragments:

>It may be useful to consider the following viewpoint regarding the
>significance of xml:lang in the processing of  RDF/XML.
>
>RDF/XML is an intermediate tool for the interchange of
>knowledge between disparate systems, not for the direct
>modelling and consumption of knowledge. RDF/XML
>could be seen as providing instructions of a sort for how to
>construct an RDF graph.
>
>It is the resultant graph that matters to RDF applications,
>not the RDF/XML serialization.

Yes. Things go through a parser and into the graph.
That does not mean that we have to make a mess
when that happens.


>Alot of the discussion that I've reviewed since returning from
>vacation seems to center around a presumed requirement that
>RDF applications must interpret RDF/XML according to
>the same conventions as XML applications, and that if there
>is a consistent means in XML to deal with language qualification
>of both plain text content and marked up content, then RDF
>applications must also follow such conventions.

It would at least be nice if RDF/XML would interpret
the pieces that RDF treats as XML according to XML conventions.


>It is clearly specified what parts of an RDF/XML serialization
>are relevant to, and are reflected in an RDF graph. If some
>parts of the XML serialization, such as xml:lang scoping, or
>qualified names, or namespace declarations, or such are
>not reflected in the RDF graph, that is not "wrong".

It is not wrong if you just look at mechanics. It is wrong
if you look at the actual requirements.


>It is not "wrong" that RDF users would qualify
>their plain literals for language in one way and  yet qualify
>XML literals for language in  a different way.

If it's not wrong that RDF users qualify their plain
literals the XML way, yet don't qualify their XML
literals the XML way, then it definitely still looks
extremely weird.

Also, please note that unnecessary variability is a bad idea,
in particular in areas such as i18n, where we have to try
and conserve the user's efforts for those many things where
there is actual variability in languages and cultures.


>There are also practical reasons why XML literals should,
>by default, not be infected with the language scope defined
>elsewhere in an RDF/XML instance -- primarily because RDF
>is an ideal tool for modular content management which also
>has powerful mechanisms for dynamic syndication of knowledge
>which may be interchanged (via RDF/XML) intensively during
>processing and the risk for inadvertantly and erroneously infecting
>XML literals during complex combinations of syndication, interchange
>serialization, and reserialization can result in huge headaches for
>both users and developers.

The actual solution in that case is very easy: xml:lang="".
And if you need it, you will need it for plain literals, too.


>I would in fact go so far as to assert that the infection of
>xml:lang values on plain literals is nearly as bad, though
>the risk for inadvertant infection is far less in their case;

Can you explain why you think there is less of a risk?


>yet since the treatment of xml:lang attributes with regards
>to plain literals is fairly clear in M&S in contrast to XML
>literals,

Sorry, but M&S does not make any distinction whatsoever
regarding xml:lang. It does not even make a distinction
between plain literals and XML literals (and despite
claims to the contrary, there is actually no absolute
need to make such a distinction, there are other ways
to deal with escaping problems).


>it would be too great an affront to the charter
>to remove the affect of xml:lang entirely from the mapping
>of RDF/XML to the RDF graph. So it remains.

So following the charter is some cases, but not in others,
is okay? Please note that the charter would have allowed
the RDF Core WG to ignore the M&S treatment of xml:lang,
on the grounds that this is one option that M&S gives to
applications. But we would then have asked you to find
another way to represent language information, and I'm
very sure that that would have included taking the information
from xml:lang in the case of RDF/XML.


>Though
>it should be noted that there are other (and IMO better)
>ways to model language scoping than xml:lang tags. In
>the future, I wouldn't be surprised to see the use of xml:lang
>in RDF/XML become deprecated entirely, at least in practice
>if not in future specs (but I digress...)

If the RDF Core WG had approached the I18N WG in Cannes
with a different way of handling language tagging in the
RDF graph (maybe allowing literals as resources, or some
other idea), then we would definitely have considered this.
But I don't think we would have agreed to just throwing
away xml:lang information.


>The WG, after *alot* of discussion, decided to model
>XML literals using an RDF datatype.

We are not objecting to this decision as far as it does
not create problems. For example, the way this decision
was implemented in the last call drafts was mostly fine
with us.


>There are many solid reasons for modelling XML literals
>as typed literals. Two of the most prevalent are (1) it allows
>us to define the relation between lexical forms of expression
>with canonical forms of comparison in an elegant and
>precise manner, and (2) it provides a solid foundation for
>the subsequent support of XML Schema complex types
>as subclasses of rdf:XMLLiteral.

This is definitely a very nice idea. But see my other mail.


>The original solution allowed for XML literals to be special
>and have an associate datatype. But there were dragons...
>(particularly green and red ones, though also a few blue
>ones here and there... or was that the tequila...)
>
>The WG then decided, based on yet more discussion,
>and clear community feedback, to treat XML literals as
>a normal datatype, without lang tag, killing a whole lot
>of dragons in the MT and elsewhere (ahhh... what a
>barbaque that was...).
>
>The cost for this decision was that content producers
>would not be able to have global language definitions for
>all XML literals within an RDF/XML instance. Well,
>a similar situation exists for datatyping as well, where
>one cannot have global datatyping definitions for typed
>literals. That seems to be now then a consistent characteristic
>of RDF/XML. That one must be explicit about what one
>means to say. All in all, that's probably a very good thing.

Sorry, but there is an important difference. The datatype
is in the element that carries the name of the property.
The xml:lang has nowhere to go.


>And, harking back to discussions in Cannes, there is not
>really any technical difference between ignoring lang tags
>for typed literals than for ignoring them for XML literals.
>After all, XML Schema defines things such as names, tokens,
>etc. all of which usually contain linguistic data.

'Names' and 'tokens' are in many cases words taken from
natural languages, but their use is clearly as simple
tokens.


>And content providers are still able to express
>language qualification of all XML content in XML literals.

This is wrong. They are not able to express
the language qualification of xml fragments (mixed
content).

If I have to change
   <foo:prop rdf:parseType="Literal"
      >Hello <html:em>World</html:em>!</foo:prop>
to
   <foo:prop rdf:parseType="Literal"
      ><html:span xml:lang="en">Hello <html:em
       >World</html:em>!</html:span></foo:prop>
then I'm not language-qualifying the XML fragment
   Hello <html:em>World</html:em>!

but I'm changing that fragment. Here goes your "modular content
management with powerful mechanisms for dynamic syndication of knowledge".
Do you prefer to add xml:lang="" when combining stuff from different
sources (which is constant, and external to the actual literal),
or do you prefer to have to hand-code something like
<html:span> separately for each application? What is the bigger headache?


Regards,    Martin.
Received on Thursday, 3 July 2003 11:23:42 UTC