- From: <Misha.Wolf@reuters.com>
- Date: Sat, 23 Feb 2002 19:20:28 +0000
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Cc: bwm@hplb.hpl.hp.com, w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
In preparation for our discussion next week, I'd like to note some of the sub-issues.

- In XML, xml:lang is inherited. This means that, in XML, there is a large number of ways to achieve the same set of associations between strings and languages. This may not matter in RDF as long as there is no more than one language per string. However ...

- A string may contain sub-strings in different languages. RDF doesn't, IIRC, support a semantically neutral carrier equivalent to <xhtml:span>. Thus there is currently no way to represent multilingual strings in "first class" RDF. One reason for not associating arcs with string language is that one could very easily break any possibility of matching multilingual strings.

- RFC 3066 defines exact and approximate matching (I don't recall the actual terminology) between language tags. For instance, "en-us" precisely matches "en-us" and approximately matches "en".

- I don't think the proposal:

  > suggesting that such pairs are equal
  > if and only if
  > the unicode strings are equal
  > and
  > the lang tags are either both absent, or both present and equal (as lang
  > tags, i.e. case insensitive).

  is right, as a string without a language tag would not match one with. A consequence would be that people would be discouraged from language tagging their strings, in case other people haven't tagged *their* strings.

- The above seems to suggest that degrees of fuzziness are required, at user option, as with regular search engines.

- All of the above is closely related to other "control" constructs needed for correctly writing text in different languages, e.g. BiDi controls for BiDirectional languages. Though Math(s) is a language in quite a different sense, the same problem arises. Let's say the title of a paper contains something that can't be expressed in plain text, e.g. an integral from value A to value B. How do I do this in RDF and how will others match on it?
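To make the matching points above concrete, here is a rough Python sketch of the three degrees of fuzziness. It is only an illustration under assumptions: the names Literal, Fuzziness, tag_in_range and literals_match are hypothetical and come from neither the RDF drafts nor RFC 3066. The EXACT mode follows the quoted proposal, the PREFIX mode uses the RFC 3066 language-range rule (a range matches a tag if it equals the tag, or equals a prefix of it that is followed by "-"), and the IGNORE mode compares the strings only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fuzziness(Enum):
    EXACT = 1   # tags both absent, or both present and equal ignoring case
    PREFIX = 2  # additionally accept "en" against "en-us" (either direction)
    IGNORE = 3  # compare the Unicode strings only


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal: a string plus optional tag."""
    text: str
    lang: Optional[str] = None


def tags_equal(a: Optional[str], b: Optional[str]) -> bool:
    # The rule from the quoted proposal: both absent, or both present and
    # equal as language tags (case-insensitive).
    if a is None or b is None:
        return a is None and b is None
    return a.lower() == b.lower()


def tag_in_range(lang_range: str, tag: str) -> bool:
    # RFC 3066 language-range rule: the range equals the tag, or equals a
    # prefix of the tag such that the next character is "-".
    r, t = lang_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def literals_match(a: Literal, b: Literal,
                   fuzziness: Fuzziness = Fuzziness.EXACT) -> bool:
    if a.text != b.text:
        return False
    if fuzziness is Fuzziness.IGNORE:
        return True
    if tags_equal(a.lang, b.lang):
        return True
    if (fuzziness is Fuzziness.PREFIX
            and a.lang is not None and b.lang is not None):
        return tag_in_range(a.lang, b.lang) or tag_in_range(b.lang, a.lang)
    return False
```

Under EXACT, an untagged "colour" does not match "colour" tagged "en", which is the discouragement effect described above; only IGNORE lets the two match in this sketch.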
Misha

On 20/02/2002 11:11:07 Jeremy Carroll wrote:
>
> rdfms-xmllang: Why isn't xml:lang information represented within the RDF
> data model?
>
> > This was put on hold whilst we looked at datatypes.
> > Model and Syntax says that lang is part of the literal; that no triples are
> > generated for an xml:lang. We can choose to stick with that or change it.
> > Does anyone have a compelling reason to change it?
>
> My proposal before we put it on hold was in the overly long:
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0378.html
>
> [[[
> [1]
> An RDF Literal is a Unicode string, optionally paired with a
> language tag (as defined in RFC3066).
> ]]]
>
> in that thread we identified equality rules as follows:
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0375.html
>
> suggesting that such pairs are equal
> if and only if
> the unicode strings are equal
> and
> the lang tags are either both absent, or both present and equal (as lang
> tags, i.e. case insensitive).
>
> This then works orthogonally with:
> - the graph syntax
> - model theory
> - datatyping
> - any treatment of Unicode string normalization
>
> Jeremy

----------------------------------------------------------------
Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Reuters Ltd.
Received on Saturday, 23 February 2002 14:21:44 UTC