Re: first pass parseType="Literal" text for primer from Martin Duerst on 2003-07-27 (w3c-rdfcore-wg@w3.org from July 2003)

From: Martin Duerst <duerst@w3.org>
Date: Sun, 27 Jul 2003 12:46:46 -0400
To: Graham Klyne <gk@ninebynine.org>, rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
Message-Id: <4.2.0.58.J.20030725175324.057f30c0@localhost>

Hello Graham,

At 11:35 03/07/24 +0100, Graham Klyne wrote:

>At 14:39 23/07/03 -0400, Martin Duerst wrote:

>>Many people designing 'RDF Applications' will start out with e.g.
>><Title> being a plain literal. Later, they may discover that there
>>are cases where they would need markup. But with the current design,
>>they would have to go back and change all the <Title>s from plain
>>literals to XML Literals. The way RDF is supposed to work, this
>>will just not work out. So the needs for micro-markup, in particular
>>for internationalization, will very sadly just be ignored if we
>>don't change the design.
>
>While I can appreciate that having a seamless path from simple text to 
>marked-up text would be nice, I feel that this particular horse has 
>already bolted.

What do you mean?


>I think there's a lot of RDF "out there" that is based on simple plain
>literals, which would be damaged if some plain text were to be
>reinterpreted as markup.

Sorry, there is a very serious difference between plain text being
reinterpreted as markup (which is a bad thing), and literals with
markup being added alongside literals without markup in the same
application.

What we don't want to happen is the value of

<title>Why the &lt;FONT&gt; Tag is Bad</title>

to suddenly be interpreted as XML (and therefore, in this case,
become non-well-formed). What we want is to be able to add
another title, e.g.

<title rdf:parseType='Literal'>The <strong>Best</strong>
Cruise Vacations for Dummies</title>

without having to go back and change all the previous titles to e.g.

<title rdf:parseType='Literal'>Why the &lt;FONT&gt; Tag is Bad</title>

thereby creating all kinds of confusion because RDF applications
have been told that
    <title>Why the &lt;FONT&gt; Tag is Bad</title>
and
    <title rdf:parseType='Literal'>Why the &lt;FONT&gt; Tag is Bad</title>
are two different things.
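
To make the first hazard concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not part of any RDF specification or toolkit: the helper names (plain_literal_value, is_well_formed_fragment) and the throwaway <wrapper> element are assumptions made up for this example. It shows that the plain-literal title, once its entities are resolved, is not well-formed XML content, whereas the title with <strong> is.

```python
import xml.etree.ElementTree as ET

# Property element as it appears in RDF/XML with no parseType attribute.
PLAIN = "<title>Why the &lt;FONT&gt; Tag is Bad</title>"

def plain_literal_value(property_xml: str) -> str:
    """Return the character data of a property element, entities resolved."""
    return ET.fromstring(property_xml).text or ""

def is_well_formed_fragment(text: str) -> bool:
    """Check whether text parses as XML content inside a throwaway wrapper."""
    try:
        ET.fromstring("<wrapper>" + text + "</wrapper>")
        return True
    except ET.ParseError:
        return False

value = plain_literal_value(PLAIN)
print(value)                           # Why the <FONT> Tag is Bad
print(is_well_formed_fragment(value))  # False: <FONT> is never closed
print(is_well_formed_fragment(
    "The <strong>Best</strong> Cruise Vacations for Dummies"))  # True
```

Reinterpreting the first value as markup would therefore fail outright, which is why the two kinds of title have to be able to coexist rather than be converted into each other.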


>I'll also note that one RDF-based design, FOAF, has been used with 
>Japanese names based simply on the current form of plain literals:
>   http://kanzaki.com/docs/sw/foaf.html

I have looked at that previously. The association between names
and natural language is quite complex. For example, my last name
is clearly German, but my first name is very international.

Also, pronunciations (readings) are very important for Japanese names,
but are usually given in separate fields (e.g. separate properties)
rather than e.g. with Ruby Annotation markup. Such separate
properties are currently still missing, but I had some discussions
about them recently with Dan Brickley.

So this is not really a good example to show the need for
inline markup.


>My browser can't handle the extra characters,

What browser do you have? I'm rather sure that it can handle
these characters (otherwise you have a really old and crappy
browser and should upgrade asap). The only thing you may need
to do is to install some fonts. Please contact me privately
for details.


>so I cannot comment how well it works.  There is some discussion of this at:
>   http://rdfweb.org/pipermail/rdfweb-dev/2003-June/011202.html
>
>But the point I really wanted to make is that I think, in RDF, the 
>migration need not be so painful.  Even if plain literals cannot handle 
>all the markup, standard RDF inference tools should be able to recognize 
>the simple form and infer a more flexible form as and when such is needed.

So can you tell me how a tool would infer that
the XML Literal "<dummy xml:lang='en'>Moby Dick</dummy>"
and the plain literal "Moby Dick"@en are the same, if
'dummy' can be anything? It would be much easier to do
this if the XML Literal was "Moby Dick"@en^^XML
(or whatever the actual notation would be).
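
The difficulty can be sketched in Python. The unwrap function below is a hypothetical heuristic, not something defined by RDF or offered by any RDF tool: it treats an XML Literal as equivalent to a plain literal only when it consists of a single element containing nothing but text, and it has to throw away the element name to do so.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def unwrap(xml_literal: str) -> Optional[Tuple[str, Optional[str]]]:
    """Hypothetical heuristic: if the literal is one element holding only
    text, return (text, lang); otherwise return None."""
    try:
        root = ET.fromstring(xml_literal)
    except ET.ParseError:
        return None
    if len(root) > 0:  # child elements present: real inline markup
        return None
    return (root.text or "", root.get(XML_LANG))

plain = ("Moby Dick", "en")  # stands in for "Moby Dick"@en
print(unwrap("<dummy xml:lang='en'>Moby Dick</dummy>") == plain)   # True
print(unwrap("<span xml:lang='en'>Moby Dick</span>") == plain)     # True
print(unwrap("<title xml:lang='en'>Moby <em>Dick</em></title>"))   # None
```

The match succeeds only because the code ignores whether the wrapper is called dummy, span, or anything else. A generic inference tool has no basis for knowing that the wrapper name carries no meaning, which is the problem with leaving 'dummy' open.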


>I grant that's not a good basis for designing all new applications, and 
>recommendations of the kind Brian has suggested [1] should encourage the 
>use of an appropriate form other than simple plain-text literals, such as 
>using parseType=Literal, for any value that may reasonably need to be 
>multilingual text.

I think recommendations may help. But we need something better than that.
If people have to start out with XML Literals just to keep open the option
of using markup at some point in the future, then that's wrong.
But the text currently being worked on seems to suggest exactly that.


>I think a comprehensive handling of multilingual text may need features 
>more comprehensive than just XML literals, and I think that datatyping 
>provides the way forward.  I think we are here discussing the details of 
>features which are not ultimately going to solve these problems, whatever 
>choices we make.  I don't know what the final solution may look like, but 
>I could imagine something like a "multilingual text" datatype whose 
>lexical forms are a well thought-out framework for handling all manner of 
>textual values, isolated (by RDF) from the kinds of problems that Pat 
>raised in his "tiger by the tail" message [2].

I find the discussion of further solutions interesting in its own right.
I'm sure we would have been open to such discussions e.g. at the Tech
Plenary in Cannes, or in future work.
But I do not think it should be an excuse for arbitrary inconsistencies in
the current design. It will be much easier to add new things for handling
multilingual texts if the current design is clear and flexible.

I'll get back to Pat's message to answer some of his points, too.


>I found some discussion from the original WG [3] where one of the 
>alternative options seems to do with different namespaces,

As I proposed this solution, I can just say that it isn't what you
may think it is. It is clearly different from what Ralph mentions
about namespaces in his mail [5]. It just proposed that instead of
labeling each 'XML Literal' with parseType="Literal", one would simply
look at the document and note that some of the namespaces were used for
RDF (properties, e.g. Dublin Core,...), whereas others would be
used in XML Literals (starting with XHTML,...). There would simply
be a global declaration saying which namespaces would be used which
way (and different prefixes could be used with the same namespace
to make a distinction). Given the current climate against attributes
with global consequences in RDF/XML, it may have been a good thing
that we didn't go that way.


>but that was problematic (apparently) due to interactions with other XML 
>applications.  I observe that the RDF datatype mechanism provides a 
>similar effect while avoiding those unwanted interactions, in that it 
>provides a very specific way to say how the literal text is to be 
>interpreted.  I think [4] is also worthy of review, in that it describes 
>the problems being addressed, even if the solution proposed was not really 
>workable.

Yes indeed. Please note that although when we comment, we may propose
solutions, we do not want to constrain the WG in charge on the actual
solution taken.
[In the current context, this means that we do not care whether a 'wrapper'
solution is adopted, whether XML Literals are a 'third category', whether
the inconsistencies pointed out in the semantics document are simply
carefully fixed, or whether there is another solution.]


>In summary, I think the issues of multilingual text representation should 
>be distanced from the RDF core, not bound up with it, for the long-term 
>advantage of both.

Before discussing distancing or binding up, we would greatly appreciate it
if it were not messed up!


>(While I'm digging, [5] appears to be Ralph's original "parseType" 
>proposal.  I note that this is a purely syntactic approach, as issues of 
>what it actually represents are explicitly ducked.)

And now the RDF Core WG tries to solve that issue by claiming that
XML Literals are sequences of octets! Sorry, but I don't want to
call such a backwards layer violation progress.

Regards,    Martin.


>[3] http://lists.w3.org/Archives/Member/w3c-rdf-syntax-wg/1998Oct/0085.html
>(member-only archive)
>
>[4] http://www.w3.org/International/Group/1998/10/NOTE-i18n-rev-rdfms-19981023
>
>[5] http://lists.w3.org/Archives/Member/w3c-rdf-syntax-wg/1998Oct/0064.html
Received on Sunday, 27 July 2003 21:59:48 UTC