- From: William Waites <ww@styx.org>
- Date: Thu, 9 Jun 2011 15:30:11 +0200
- To: Pat Hayes <phayes@ihmc.us>
- Cc: RDF WG <public-rdf-wg@w3.org>
* [2011-06-08 21:04:44 -0500] Pat Hayes <phayes@ihmc.us> wrote:

] I once thought so, but now I disagree. To borrow a term from
] philosophy, we have to look at the identity conditions. "chat"
] in French is a **different word** than "chat" in English. Same
] string, different word. Ergo, the words are not the same as
] the strings.

Yes.

] And indeed, once you look at it carefully, they aren't strings,
] exactly because they are *in a language*. They aren't just strings
] of characters, they are language texts. Formally, a pair of a
] string and a language is not the same kind of thing as a simple
] string. "Le chat est sur le table" and "fhk frus fns noeptr k" are
] just two strings, nothing to particularly choose one over the other,
] but "Le chat est sur le table"@fr and "fhk frus fns noeptr k"@fr are
] very different. Something that understands the tag might well treat
] the second one as an error.

Kind of. They are still strings, just strings drawn from some
lexicon. All writing systems for all languages (I can't think of
a counter-example) use sequences of characters. The valid ones
are different for different languages.

In fact languages are fluid enough that it is perfectly valid,
if a bit non-standard, to arbitrarily import words from other
languages - "venez donc, je veux avoir un p'tit chat avec vous".
The meaning of the sub-string "chat" is clear from context and
is pretty clearly not the animal. Language-independent (statistical)
techniques in computational linguistics very often consider only
words qua strings.

Again, I agree we need a way to distinguish texts in different
languages. But I don't agree that texts in different languages
are so fundamentally different in character from other datatypes
that we need special machinery for handling them. "42"^^xsd:string
and "42"^^xsd:int are different too, and for some purposes I'm
interested in their lexical representation and for some purposes
I'm interested in their type.
Why don't we add another special case for the base in which the
numbers are written, then I can have "2A"@hex^^xsd:int, which is
perfectly reasonable and, I think, completely analogous.

Anyways, I gather from the other mails that I'm going against
the grain, so I'll just say that the business with rdfs:subClassOf
was not intended to be a definitive or complete or even correct
model, it was just intended to show that the RDF machinery could
be brought to bear on the language question if we do it like this
(and also not that this WG should get involved in actually doing
this modelling).

The "backwards compatibility" just comes from the fact that adding
an "if a simple rdflang:foo is present, serialise as "asdas"@foo"
rule is enough that existing systems consuming data made with these
considerations in mind would still work, because there would be no
change to the serialised representations.

Cheers,
-w
-- 
William Waites <mailto:ww@styx.org>
http://river.styx.org/ww/ <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB 3DF0 BE40 A6DF B06F FD45
Received on Thursday, 9 June 2011 13:31:04 UTC