Re: ISSUE-12 On languages and datatypes from William Waites on 2011-06-09 (public-rdf-wg@w3.org from June 2011)

From: William Waites <ww@styx.org>
Date: Thu, 9 Jun 2011 15:30:11 +0200
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-ID: <20110609133011.GM42832@styx.org>
* [2011-06-08 21:04:44 -0500] Pat Hayes <phayes@ihmc.us> wrote:

] I once thought so, but now I disagree. To borrow a term from 
] philosophy, we have to look at the identity conditions. "chat"
] in French is a **different word** than "chat" in English. Same
] string, different word. Ergo, the words are not the same as 
] the strings. 

Yes.

] And indeed, once you look at it carefully, they aren't strings,
] exactly because they are *in a language*. They aren't just strings 
] of characters, they are language texts. Formally, a pair of a
] string and a language is not the same kind of thing as a simple 
] string. "Le chat est sur la table" and "fhk frus fns noeptr k" are
] just two strings, nothing to particularly choose one over the other,
] but "Le chat est sur la table"@fr and "fhk frus fns noeptr k"@fr are 
] very different. Something that understands the tag might well treat
] the second one as an error.

Kind of. They are still strings, just strings drawn from some lexicon.
All writing systems for all languages (I can't think of a
counter-example) use sequences of characters. The valid ones are
different for different languages. In fact languages are fluid enough
that it is perfectly valid, if a bit non-standard, to arbitrarily
import words from other languages: "venez donc, je veux avoir un
p'tit chat avec vous" ("come on, I'd like to have a little chat with
you"). The meaning of the sub-string "chat" is clear from context: it
is the English word for a conversation, pretty clearly not the animal.
Language-independent
(statistical) techniques in computational linguistics very often
consider only words qua strings.
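To make the disagreement concrete, here is a minimal sketch assuming a toy literal model (the `Literal` class is hypothetical, not any RDF library's API). The lexical form stays an ordinary string, while the language tag is a separate component that takes part in equality:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Toy RDF literal (hypothetical): a lexical form plus an optional language tag."""
    lexical: str
    lang: Optional[str] = None

    def __str__(self) -> str:
        # N-Triples-style rendering; character escaping is omitted for brevity.
        if self.lang is None:
            return f'"{self.lexical}"'
        return f'"{self.lexical}"@{self.lang}'


fr = Literal("chat", "fr")
en = Literal("chat", "en")

print(fr == en)                  # False: the (string, language) pairs differ
print(fr.lexical == en.lexical)  # True: the underlying strings are identical
print(fr)                        # "chat"@fr
```

Both positions are visible here: the pairs are distinct values, yet each is built from a plain string.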

Again, I agree we need a way to distinguish texts in different
languages. But I don't agree that texts in different languages are so
fundamentally different in character from other datatypes that we need
special machinery for handling them. "42"^^xsd:string and
"42"^^xsd:int are different too, and for some purposes I'm interested
in their lexical representation and for some purposes I'm interested
in their type. Why don't we add another special case for the base in
which the numbers are written, then I can have "2A"@hex^^xsd:int,
which is perfectly reasonable and, I think, completely analogous.
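The datatype analogy can be sketched as a lexical-to-value mapping. This is a simplified illustration, not a conforming XSD implementation (Python's `int()` accepts some forms XSD does not, such as underscores). The `base` parameter is a hypothetical stand-in for the "@hex" idea above and is not part of RDF or XSD:

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def value_of(lexical: str, datatype: str, base: int = 10):
    """Map a typed literal's lexical form to its value (simplified sketch).

    `base` models the hypothetical "@hex"-style modifier; it is not part
    of RDF or XML Schema.
    """
    if datatype == XSD + "string":
        return lexical  # xsd:string: the value is the string itself
    if datatype == XSD + "int":
        value = int(lexical, base)
        # xsd:int is restricted to the signed 32-bit range.
        if not -2**31 <= value <= 2**31 - 1:
            raise ValueError(f"out of range for xsd:int: {lexical!r}")
        return value
    raise ValueError(f"unsupported datatype: {datatype}")


s = value_of("42", XSD + "string")
i = value_of("42", XSD + "int")
h = value_of("2A", XSD + "int", base=16)

print(repr(s), repr(i), repr(h))  # '42' 42 42
print(s == i)  # False: same lexical form, different values
print(i == h)  # True: different lexical forms, same value
```

In this reading, a base is one more piece of information needed to interpret a lexical form, which is how the email frames language tags.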

Anyways, I gather from the other mails that I'm going against the
grain, so I'll just say that the business with rdfs:subClassOf
was not intended to be a definitive or complete or even correct 
model; it was just intended to show that the RDF machinery could be
brought to bear on the language question if we do it like this (and
not to suggest that this WG should get involved in actually doing this
modelling). The "backwards compatibility" just comes from the fact
that adding a serialisation rule (if a simple rdflang:foo is present,
serialise as "asdas"@foo) is enough: existing systems consuming data
made with these considerations in mind would still work, because
there would be no change to the serialised representations.
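The serialisation rule might look roughly like this sketch. The email gives no namespace URI for `rdflang:`, so the `http://example.org/rdflang#` namespace below is a hypothetical placeholder, and escaping is omitted:

```python
# Hypothetical namespace for language classes; the email does not specify one.
RDFLANG = "http://example.org/rdflang#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def serialise(lexical: str, datatype: str) -> str:
    """Render a literal in N-Triples style (character escaping omitted).

    If the datatype is a language class such as rdflang:fr, emit the
    familiar language-tagged form, so existing consumers see no change.
    """
    if datatype.startswith(RDFLANG):
        tag = datatype[len(RDFLANG):]
        return f'"{lexical}"@{tag}'
    return f'"{lexical}"^^<{datatype}>'


print(serialise("chat", RDFLANG + "fr"))  # "chat"@fr
print(serialise("42", XSD + "int"))       # "42"^^<http://www.w3.org/2001/XMLSchema#int>
```

The backwards compatibility lives entirely in the first branch: whatever the internal model is, the output matches today's language-tagged syntax.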

Cheers,
-w

-- 
William Waites                <mailto:ww@styx.org>
http://river.styx.org/ww/        <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45
Received on Thursday, 9 June 2011 13:31:04 UTC