Re: pfps-04 (why the thread is germane to pfps-04)

From: pat hayes <phayes@ihmc.us>
Date: Sun, 27 Jul 2003 17:04:39 -0500
Message-Id: <p06001a26bb49ed836564@[10.0.100.23]>
To: Martin Duerst <duerst@w3.org>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, bwm@hplb.hpl.hp.com, www-rdf-comments@w3.org

>Hello Peter,
>
>At 09:27 03/07/25 -0400, Peter F. Patel-Schneider wrote:
>>I believe that a complete theory of equality for XML literals resolves this
>>comment.  I suggest that several test cases be added to the RDF test suite.
>>
>>The related issue of whether the value spaces of xsd:string and plain
>>literals are disjoint also appears to be well on the way to resolution.
>
>Apart from the issue of language information (plain literals can take
>language information, xsd:string can't), what is the reason for making
>these two disjoint? We seem to get into a serious proliferation of
>string-related datatypes that provide no useful distinction.

True, but I guess my reaction to this is that apparently, this 
proliferation exists, and RDF's job is not to try to put the world to 
rights, but to allow anyone to make any assertions they wish to about 
any topic they wish to, as far as possible. If therefore there are 
people out there who wish to distinguish "Hello World" as character 
string from "Hello World" as octet sequence from "Hello World" as
XML, or even "Hello World" as red from "Hello World" as green, who
are we to say that they should not do so?

>In RDF, the simple text "Hello World" (without language information)
>can be a plain literal, an xsd:string, and an XML literal.
>What is the point of them all being different if there is no
>observable difference?

I am not sure what you mean by 'observable' in this context, or why 
that is relevant. Identity does not rely on indistinguishability. In 
another message you insist that
" it is very important to make sure that the plain
string "<br/>" (in XML written as "&lt;br/&gt;") is not the
same as the XML markup "<br/>" (in XML written as "<br/>")."
which seems to me an unobservable difference of exactly the same
kind. How something is written in XML is beside the point: the
sequence of 5 characters (less-than, lowercase-b, lowercase-r,
forward-slash, greater-than) is what it is. What you seem to be
insisting on is that markup is not text; that indeed makes sense as a
parsing restriction when discussing XML. But (with a passing bow to
charmod) characters are characters. '<br/>' was a sequence of 5
characters before XML was invented, and it's still the same sequence
of 5 characters. When I'm editing XHTML, I will treat this sequence
differently when I see it in the code window than when I see it in
the design window, but it's the same 5 characters I am looking at in
each case.
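A minimal Python sketch, using only the standard library's xml.sax.saxutils helpers, illustrates the character-level point: escaping '<br/>' for use as XML character data changes only how it is written, and unescaping gives back the identical 5-character sequence. It does not model how RDF defines XML literal values; it only shows that the escaped and unescaped forms denote the same characters.

```python
from xml.sax.saxutils import escape, unescape

# The string under discussion: five characters, independent of any markup role.
s = "<br/>"
print(len(s))   # 5
print(list(s))  # ['<', 'b', 'r', '/', '>']

# Writing it as character data inside an XML document escapes the delimiters...
as_text = escape(s)
print(as_text)  # &lt;br/&gt;

# ...but reading it back yields the very same five-character sequence.
print(unescape(as_text) == s)  # True
```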

>>PS: Although the current situation may be technically satisfactory in this
>>area, the pain in getting there suggests that a slightly different
>>description of XML literals might be more useful, perhaps something along
>>the line of making the value space of XML literals in RDF be some abstract
>>set with equality defined as per exclusive XML canonicalization and
>>explicitly determined to be disjoint from the value space of plain RDF
>>literals and also from the XSD value spaces.  This would also probably make
>>the XML guys much more happy.
>
>I have proposed something like this just a day or two ago. It would
>definitely make I18N quite a bit happier, because it would not be
>a straightforward violation of the Character Model, and would indeed
>be much more in line with the XML spec.

I guess we have been working under the tacit assumption that as far 
as possible we *should* specify what our RDF-described domains 
actually are. This abstract set trick does make the semantics easier 
to state, but all it does operationally is to guarantee that 
identities *cannot* be inferred. If something is in an abstract set,
then it is definitely not an XML character sequence, octet sequence,
or piece of XML markup, for example. Is this really what i18n wants?

Pat Hayes

-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 27 July 2003 18:04:40 GMT