Re: first pass parseType="Literal" text for primer from Graham Klyne on 2003-07-24 (w3c-rdfcore-wg@w3.org from July 2003)

From: Graham Klyne <gk@ninebynine.org>
Date: Thu, 24 Jul 2003 11:35:06 +0100
To: Martin Duerst <duerst@w3.org>, rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
Message-Id: <5.1.0.14.2.20030724102838.025193a8@127.0.0.1>

At 14:39 23/07/03 -0400, Martin Duerst wrote:
>Hello Graham,
>
>We still seem to be caught up in terminology.
>I'll try again and go back to the text from Brian:
>
>>This example illustrates that designers should take care when designing
>>RDF data.  In cases where the value of a property may sometimes contain
>>rich text and sometimes not, the designer should either use
>>rdf:parseType="Literal" throughout, or design the application to handle
>>both plain literals and rdf:XMLLiteral's.
>
>What I was trying to say is that on the Web, asking people to use
>a special way (rdf:parseType="Literal") throughout when they originally
>have no motivation to do so and cannot anticipate what they or others
>may need in the future is a bad idea, because it does not scale.

OK, now I think we have a basis for discussion of the design...

>Many people designing 'RDF Applications' will start out with e.g.
><Title> being a plain literal. Later, they may discover that there
>are cases where they would need markup. But with the current design,
>they would have to go back and change all the <Title>s from plain
>literals to XML Literals. The way RDF is supposed to work, this
>will just not work out. So the needs for micro-markup, in particular
>for internationalization, will very sadly just be ignored if we
>don't change the design.

While I can appreciate that having a seamless path from simple text to 
marked-up text would be nice, I feel that this particular horse has already 
bolted.  I think there's a lot of RDF "out there" that is based on simple 
plain literals, which would be damaged if some plain text were to be 
reinterpreted as markup.  I'll also note that one RDF-based design, FOAF, 
has been used with Japanese names based simply on the current form of plain 
literals:
   http://kanzaki.com/docs/sw/foaf.html
My browser can't handle the extra characters, so I cannot comment on how 
well it works.  There is some discussion of this at:
   http://rdfweb.org/pipermail/rdfweb-dev/2003-June/011202.html

But the point I really wanted to make is that I think, in RDF, the 
migration need not be so painful.  Even if plain literals cannot handle all 
the markup, standard RDF inference tools should be able to recognize the 
simple form and infer a more flexible form as and when such is needed.
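
As a minimal sketch of that kind of inference (not code from any RDF 
toolkit: the (lexical form, datatype) pair and the function name 
promote_to_xml_literal are assumptions made for illustration, and language 
tags are ignored), a plain literal can be rewritten as an equivalent 
rdf:XMLLiteral by escaping its text:

```python
from xml.sax.saxutils import escape

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def promote_to_xml_literal(lexical, datatype=None):
    """Return (lexical_form, datatype), rewriting a plain literal as an XML literal.

    Hypothetical helper: a plain literal is modelled as datatype=None.
    """
    if datatype == RDF_XMLLITERAL:
        # Already in the more flexible form; leave it untouched.
        return lexical, datatype
    if datatype is not None:
        # Other typed literals are out of scope for this sketch.
        raise ValueError("only plain literals are promoted in this sketch")
    # Escape &, < and > so the text is a well-formed XML fragment
    # with the same character content as the plain literal.
    return escape(lexical), RDF_XMLLITERAL


print(promote_to_xml_literal("Fish & Chips"))
```

Existing plain-literal data stays as it is; the marked-up form is derived 
only where an application asks for it.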

I grant that's not a good basis for designing all new applications, and 
recommendations of the kind Brian has suggested [1] should encourage the 
use of an appropriate form other than simple plain-text literals, such as 
using parseType=Literal, for any value that may reasonably need to be 
multilingual text.

I think a comprehensive handling of multilingual text may need features 
more comprehensive than just XML literals, and I think that datatyping 
provides the way forward.  I think we are here discussing the details of 
features which are not ultimately going to solve these problems, whatever 
choices we make.  I don't know what the final solution may look like, but I 
could imagine something like a "multilingual text" datatype whose lexical 
forms are a well thought-out framework for handling all manner of textual 
values, isolated (by RDF) from the kinds of problems that Pat raised in his 
"tiger by the tail" message [2].

I found some discussion from the original WG [3] where one of the 
alternative options seems to have to do with different namespaces, but that 
was problematic (apparently) due to interactions with other XML 
applications.  I observe that the RDF datatype mechanism provides a similar 
effect while avoiding those unwanted interactions, in that it provides a 
very specific way to say how the literal text is to be interpreted.  I 
think [4] is also worthy of review, in that it describes the problems being 
addressed, even if the solution proposed was not really workable.
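
To illustrate how a datatype says how the literal text is to be 
interpreted, here is a rough sketch.  The function name interpret and the 
choice to reduce an XML literal to its text content are assumptions made 
purely for illustration; only the two datatype URIs are real.

```python
import xml.etree.ElementTree as ET

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def interpret(lexical, datatype=None):
    """Interpret the same kind of literal text according to its datatype."""
    if datatype is None:
        # Plain literal: the text stands for itself.
        return lexical
    if datatype == XSD_INTEGER:
        # The datatype maps the lexical form to a number.
        return int(lexical)
    if datatype == RDF_XMLLITERAL:
        # Parse the fragment inside a throwaway wrapper element and
        # return its character content (a simplification for this sketch).
        wrapper = ET.fromstring("<wrapper>" + lexical + "</wrapper>")
        return "".join(wrapper.itertext())
    raise ValueError("datatype not handled in this sketch: " + datatype)


print(interpret("10"))
print(interpret("10", XSD_INTEGER))
print(interpret("Tim <em>Berners</em>-Lee", RDF_XMLLITERAL))
```

The interpretation is selected by the datatype URI attached to the 
literal, not by the XML context in which the text happens to appear.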

In summary, I think the issues of multilingual text representation should 
be distanced from the RDF core, not bound up with it, for the long-term 
advantage of both.

#g
--

(While I'm digging, [5] appears to be Ralph's original "parseType" 
proposal.  I note that this is a purely syntactic approach, as issues of 
what it actually represents are explicitly ducked.)

[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0244.html

[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0067.html

[3] http://lists.w3.org/Archives/Member/w3c-rdf-syntax-wg/1998Oct/0085.html
(member-only archive)

[4] http://www.w3.org/International/Group/1998/10/NOTE-i18n-rev-rdfms-19981023

[5] http://lists.w3.org/Archives/Member/w3c-rdf-syntax-wg/1998Oct/0064.html


>At 22:49 03/07/22 +0100, Graham Klyne wrote:
>
>>At 15:05 22/07/03 -0400, Martin Duerst wrote:
>>>Hello Graham,
>>>
>>>Sorry that I used the wrong word, maybe. Let me explain some
>>>of the background for the language I have used.
>>>
>>>The key document that proposed serious internationalization for
>>>the Web, written by Gavin Nicol, and still available at
>>>http://www.mind-to-mind.com/library/papers/multilingual/multilingual-www.html,
>>>used the concept of "The WWW As A Multilingual Application"
>>>to explain why it was important to have an overall I18N model:
>>>On the Web (many people these days say *in* the Web), there
>>>is no guarantee that your data will stay with your application
>>>and not go somewhere else.
>>
>>I took a look at that, and immediately came up against a problem:
>
>I should have been more explicit, but you might have noticed:
>The work on Web Internationalization is close to 10 years old.
>I'm sure that most other documents written about the Web in
>1994 look quite a bit outdated nowadays, even if they were
>quite revolutionary when they were written. What stays, even
>after that much time, is the very basic idea.
>
>
>Regards,    Martin.
>
>
>
>
>>[[
>> From an end-user perspective, no matter where a link leads, the browser 
>> will be able to cope intelligently with the data received. From a system 
>> viewpoint, all clients and servers should be able to at least communicate.
>>]]
>>which implies (to me) that the only thing the web is supposed to do is 
>>browsing.  To me, the web (and especially the semantic web) is about 
>>browsing and much much more.
>>
>>The document then raises the need for multiple data formats for different 
>>purposes, and goes on, as far as I can tell, to talk about no data format 
>>other than HTML.
>>
>>As a discussion of web *browsing*, I'm not criticising this document, but 
>>I do think there's more to the web.  (I also think that RDF is a 
>>technology that has, or should have, uses *beyond* the web, but that's 
>>probably not an argument to swing in this forum ;-)
>>
>>>As you have shown very well below, the word 'application'
>>>is still used for smaller, identifiable pieces of software rather
>>>than for the whole Web. However, the idea that any Web page should
>>>be renderable on any browser, that pieces of XML data can move around
>>>freely, and that any RDF data can move to other places (called applications
>>>in general usage of the term) nevertheless is the central idea of
>>>the Web (including of course the Semantic Web).
>>
>>In citing those quotes from the architecture document, I saw a clear 
>>distinction between "agents" (which the architecture document also 
>>mentions) which appear to be the "identifiable pieces of software", and 
>>applications which I see as multiple cooperating software components 
>>communicating across the Internet using Web architectural principles.
>>
>>>So while I may have used the wrong words, I think my point was a
>>>very valid one, namely that any kind of attempt at trying to look
>>>at RDF data too much in terms of single, independent 'applications',
>>>and trying to use this to justify design, is against the very basic
>>>idea of the Web.
>>
>>I think there's a false dichotomy here:  we're not talking about a 
>>"single application", nor is it multiple "independent applications", but 
>>a web of networked applications that share concepts and ideas to the 
>>extent that it's useful for them to do so.  In particular, RDF is not 
>>separate from the rest of the web, nor is it just another part of "the 
>>Web application".
>>
>>#g
>>--
>
>-------------------
>Graham Klyne
><GK@NineByNine.org>
>PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Thursday, 24 July 2003 07:33:23 UTC