Re: Summary of strings, markup, and language tagging in RDF (resend)

----- Original Message -----
From: "ext Brian McBride" <bwm@hplb.hpl.hp.com>
To: "Martin Duerst" <duerst@w3.org>
Cc: "Graham Klyne" <gk@ninebynine.org>; "Dan Connolly" <connolly@w3.org>;
<w3c-i18n-ig@w3.org>; "Ralph R. Swick" <swick@w3.org>;
<misha.wolf@reuters.com>; "Tim Berners-Lee" <timbl@w3.org>; "rdf core"
<w3c-rdfcore-wg@w3.org>
Sent: 02 July, 2003 12:30
Subject: Re: Summary of strings, markup, and language tagging in RDF
(resend)


> RDFCore considered a number of options, but the relevant ones were that
> the object of the statement should either be one of the following xml
> fragments:

Martin, et. al,

In the hopes of offering something useful to this discussion, I'll
summarize my own thoughts on this issue, which I'll do with the
caveat (or possibly benefit ;-) of just having a month long vacation 
from all this stuff:

It may be useful to consider the following viewpoint regarding the
significance of xml:lang in the processing of  RDF/XML.

RDF/XML is an intermediate tool for the interchange of
knowledge between disparate systems, not for the direct
modelling and consumption of knowledge. RDF/XML 
could be seen as providing instructions of a sort for how to 
construct an RDF graph. 

It is the resultant graph that matters to RDF applications, 
not the RDF/XML serialization.

XML is not the "model of consumption" for RDF knowledge, 
and as such, many of the constraints and conventions relevant for 
the consumption of XML documents where the XML encoding 
*is* the model of consumption simply do not apply to RDF
applications. The model of consumption for RDF knowledge
is the RDF graph, not the RDF/XML serialization.

Alot of the discussion that I've reviewed since returning from
vacation seems to center around a presumed requirement that
RDF applications must interpret RDF/XML according to
the same conventions as XML applications, and that if there
is a consistent means in XML to deal with language qualification
of both plain text content and marked up content, then RDF 
applications must also follow such conventions.

This is simply not the case. RDF applications
do not interpret RDF/XML at all. Only RDF parsers do so.

RDF applications operate on an RDF graph, not on XML.

It is clearly specified what parts of an RDF/XML serialization
are relevant to, and are reflected in an RDF graph. If some
parts of the XML serialization, such as xml:lang scoping, or
qualified names, or namespace declarations, or such are
not reflected in the RDF graph, that is not "wrong".

It is not "wrong" that RDF users would qualify
their plain literals for language in one way and  yet qualify 
XML literals for language in  a different way. 
The role of RDF/XML is different from the
role of most other XML serializations where language is
relevant

There are also practical reasons why XML literals should,
by default, not be infected with the language scope defined
elsewhere in an RDF/XML instance -- primarily because RDF
is an ideal tool for modular content management which also
has powerful mechanisms for dynamic syndication of knowledge
which may be interchanged (via RDF/XML) intensively during
processing and the risk for inadvertantly and erroneously infecting 
XML literals during complex combinations of syndication, interchange
serialization, and reserialization can result in huge headaches for 
both users and developers.

I would in fact go so far as to assert that the infection of 
xml:lang values on plain literals is nearly as bad, though
the risk for inadvertant infection is far less in their case;
yet since the treatment of xml:lang attributes with regards
to plain literals is fairly clear in M&S in contrast to XML
literals, it would be too great an affront to the charter
to remove the affect of xml:lang entirely from the mapping
of RDF/XML to the RDF graph. So it remains. Though
it should be noted that there are other (and IMO better)
ways to model language scoping than xml:lang tags. In
the future, I wouldn't be surprised to see the use of xml:lang
in RDF/XML become deprecated entirely, at least in practice 
if not in future specs (but I digress...)

The WG, after *alot* of discussion, decided to model
XML literals using an RDF datatype.

There are many solid reasons for modelling XML literals
as typed literals. Two of the most prevalent are (1) it allows
us to define the relation between lexical forms of expression
with canonical forms of comparison in an elegant and
precise manner, and (2) it provides a solid foundation for
the subsequent support of XML Schema complex types
as subclasses of rdf:XMLLiteral.

The original solution allowed for XML literals to be special
and have an associate datatype. But there were dragons...
(particularly green and red ones, though also a few blue
ones here and there... or was that the tequila...)

The WG then decided, based on yet more discussion,
and clear community feedback, to treat XML literals as
a normal datatype, without lang tag, killing a whole lot
of dragons in the MT and elsewhere (ahhh... what a
barbaque that was...).

The cost for this decision was that content producers 
would not be able to have global language definitions for
all XML literals within an RDF/XML instance. Well, 
a similar situation exists for datatyping as well, where
one cannot have global datatyping definitions for typed
literals. That seems to be now then a consistent characteristic
of RDF/XML. That one must be explicit about what one
means to say. All in all, that's probably a very good thing.

Also, consider again my comments above about the 
inherent problems in global infection of language on XML
literals in the context of processing involving a high
degree of syndication and interchange between 
heterogenous agents (the expected norm on the SW). 
In that regard, the cost for this decision is actually
a benefit.

And, harking back to discussions in Cannes, there is not
really any technical difference between ignoring lang tags
for typed literals than for ignoring them for XML literals.
After all, XML Schema defines things such as names, tokens,
etc. all of which usually contain linguistic data.

In the end, it's all about tradeoffs. The benefits derived
for modelling XML literals as typed literals, and the benefits
of having a consistent datatyping model without any
special cases, and the benefits for safe modular content 
management in a distributed SW environment far outweight 
IMO any  loss of convenience or inconsistency from the
general XML world from xml:lang attributes not applying
to XML literals.

While I am myself a vocal proponent for consistency and
unity between standards, and appreciate the fact that
the RDF treatment of xml:lang with regards to XML literal
content is not fully in sync with the XML world, I think that
the present design is more than sufficiently grounded in the
real-world (and unique) needs of the RDF community to 
justify such a departure.

And content providers are still able to express
language qualification of all XML content in XML literals.
We are not preventing the rich and powerful use of 
language tagging in XML content. We're simply drawing
the line about where it can be done a bit more tightly than
is done for more traditional XML serializations, but for
very good reasons.

In these matters, the WG has faced many "least of all evils"
decisions, and no solution will be perfect. Cest la vie.

I hope that my comments above will serve to help all those
who have been dissatisfied with the present solution at least
come to a point where they can live with it.

We really do need to get this stuff done, and what we have 
now is a pretty damn good solution, all things considered.

Cheers,

Patrick

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
 

Received on Thursday, 3 July 2003 06:06:00 UTC