Re: Related comment from XML Schema

----- Original Message ----- 
From: "ext Martin Duerst" <duerst@w3.org>
To: <w3c-rdfcore-wg@w3.org>
Cc: <w3c-i18n-ig@w3.org>; <msm@w3.org>
Sent: 10 July, 2003 23:58
Subject: Related comment from XML Schema


> 
> Here is something related I have found: While the I18N WG
> unfortunately didn't find the bandwidth for a serious review
> of the various RDF last call documents, the XML Schema WG
> pointed out very clearly that markup was an important
> component for many natural language 'utterances'
> (and pointed to the I18N WG for more expertise).
> 
> If anything, this makes it more unclear to me why the RDF
> Core WG decided to remove language information from XML
> literals.

Martin,

Several members of the WG have tried to make the following
point clear to you, yet, judging from your posts, you seem
(at least to me) to keep missing it.

The RDF Core WG has not "removed language information
from XML literals", as if we don't care about the
ability to associate language with XML encoded content,
but rather have made a very necessary clarification of the 
boundary between an encapsulating markup language and 
any encapsulated markup language.

An XML literal within an RDF/XML serialization is NOT the
same as e.g. a MathML fragment within an XHTML instance.
And this is the crux of the problem, and the point of tension.

XML markup languages are free to express their own semantics,
including what information embodied in a given infoset is relevant
to the application for which the markup language was designed.

The XML specs specify what information is available, and
specify how that information is made available in a consistent
fashion, but they do not mandate what information must be 
used by a given application.

What we have done is no different from some XML model
defining a FIXED attribute value for xml:lang as '' for a
particular element type which is, by design, intended to be
an encapsulation of data that has no direct interpretation in
terms of the markup language used to encapsulate it.

The fact that this cannot be done explicitly for RDF/XML in
a DTD or XML Schema, since the RDF serialization model is 
more of a set of architectural forms than an explicit schema, is 
beside the point. What the present treatment does is essentially
to define a fixed attribute value xml:lang='' for every property
element having an attribute+value rdf:parseType="Literal".
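
As a rough illustration of that treatment (a sketch only, not code
from any RDF parser): the hypothetical helper literal_languages()
below walks an RDF/XML tree, inherits xml:lang in the usual way, and
treats any element carrying rdf:parseType="Literal" as if xml:lang=''
were fixed on it. The ex: vocabulary and the sample document are
invented for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
PARSE_TYPE = f"{{{RDF_NS}}}parseType"
XML_LANG = f"{{{XML_NS}}}lang"

# Invented sample: xml:lang="en" is declared on the root element.
SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#"
         xml:lang="en">
  <rdf:Description rdf:about="http://example.org/doc">
    <ex:title>Hello</ex:title>
    <ex:body rdf:parseType="Literal"><b>Hello</b></ex:body>
  </rdf:Description>
</rdf:RDF>
"""


def literal_languages(elem, inherited=""):
    """Yield (tag, language) for each literal-valued element.

    Hypothetical helper: language is inherited through xml:lang,
    except that rdf:parseType="Literal" acts like a fixed xml:lang=''.
    """
    lang = elem.get(XML_LANG, inherited)
    if elem.get(PARSE_TYPE) == "Literal":
        # Encapsulated XML: the outer language does not apply,
        # and the content is not interpreted as RDF/XML.
        yield elem.tag, ""
        return
    if len(elem) == 0 and elem.text and elem.text.strip():
        # Leaf element with text: a plain literal in the inherited language.
        yield elem.tag, lang
    for child in elem:
        yield from literal_languages(child, lang)


result = list(literal_languages(ET.fromstring(SAMPLE)))
for tag, lang in result:
    print(tag, repr(lang))
```

Here the plain literal under ex:title picks up "en" from the root,
while the XML literal under ex:body is reported with an empty
language, as if the fixed attribute had been declared.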

If I have an XML DTD for a data serialization model that 
includes the definition

<!ELEMENT literal ANY>
<!ATTLIST literal xml:lang CDATA #FIXED ''>

then that would be a sound engineering decision and one that,
quite frankly, is entirely outside the scope of I18N since it
concerns a data markup model and not a textual markup
model (even if it might happen to encapsulate textual, natural
language content).

RDF/XML is a data serialization model. Period. End of story.

It is not a textual content markup model. 

XML fragments embedded in RDF/XML are encapsulated in
that data serialization model, and as such, the semantics of
XML that apply to XML fragments in textual markup models
do not apply to XML fragments encapsulated in a data markup
model.

You claim you understand the G-view, but you appear to
believe that RDF/XML is *wrong* for taking that view. If you
do in fact understand the G-view, then you would accept that
contextual properties such as xml:lang should not apply to
literals and that not only are the decisions made to date by
the WG valid per the G-view but that there exist bugs in
the legacy-inherited treatment of plain literals that conflict
with that view.

It seems your real criticism of the RDF Core WG is that it
has overstepped its charter by taking the G-view over the
X-view. 

Well, fair enough, though I (and I'm sure most if not all of the
rest of the WG) would not consider that particular decision
open for debate.

And in fact, we need not even go there, since either of the
two latest proposals essentially facilitates the use of RDF
according to either the X or G view, and it is clear in the graph
which view has been taken, since plain literals and typed literals
are distinct. Reconciliation between the semantics (including
tests of equivalence) of plain literals and that of typed literals
is left as a later exercise for those who are concerned with such
things.
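
To make "distinct in the graph" concrete, here is a minimal sketch
under my own assumptions: PlainLiteral and TypedLiteral are
hypothetical classes, not taken from any RDF toolkit or from either
proposal. A plain literal carries an optional language, a typed
literal carries a datatype, and the two kinds never compare equal,
so a consumer can always tell which was used.

```python
from dataclasses import dataclass
from typing import Optional

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


@dataclass(frozen=True)
class PlainLiteral:
    """Hypothetical model of a plain literal: a string plus optional language."""
    lexical: str
    lang: Optional[str] = None


@dataclass(frozen=True)
class TypedLiteral:
    """Hypothetical model of a typed literal: a lexical form plus datatype."""
    lexical: str
    datatype: str


plain = PlainLiteral("Hello", lang="en")
typed = TypedLiteral("Hello", XML_LITERAL)

# Dataclass equality only holds between instances of the same class,
# so the two kinds stay distinct even when the lexical forms match.
print(plain == typed)
print(isinstance(typed, TypedLiteral))
```

Any reconciliation between the two kinds would have to be layered on
top of such a model explicitly; the sketch does not attempt it.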

Either proposal allows folks who have no clue whatsoever about 
data encapsulation issues relating to XML serialization models to 
go along merrily doing the wrong thing, just as they did with M&S, 
but those of us who do know what we're doing will do the right
thing with typed literals, and try our best to deal with any scruffy 
graphs containing old-style M&S literals.

Maybe that's the best way forward. 

Patrick

Received on Friday, 11 July 2003 06:03:26 UTC