- From: <Misha.Wolf@reuters.com>
- Date: Sat, 23 Feb 2002 19:20:28 +0000
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Cc: bwm@hplb.hpl.hp.com, w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
In preparation for our discussion next week, I'd like to note some of
the sub-issues.
- In XML, xml:lang is inherited. This means that, in XML, there is a
large number of ways to achieve the same set of associations between
strings and languages. This may not matter in RDF as long as there
is no more than one language per string. However ...
- A string may contain sub-strings in different languages. RDF
doesn't, IIRC, support a semantically neutral carrier equivalent to
<xhtml:span>. Thus there is currently no way to represent
multilingual strings in "first class" RDF. One reason for not
associating arcs with string language is that one could very easily
break any possibility of matching multilingual strings.
- RFC 3066 defines exact and approximate matching (I don't recall the
actual terminology) between language tags. For instance, "en-us"
precisely matches "en-us" and approximately matches "en".
- I don't think the proposal:
> suggesting that such pairs are equal
> if and only if
> the unicode strings are equal
> and
> the lang tags are either both absent, or both present and equal (as lang
> tags, i.e. case insensitive).
is right, as a string without a language tag would not match one
with a tag. A consequence would be that people would be discouraged
from language-tagging their strings, in case other people haven't
tagged *their* strings.
- The above seems to suggest that degrees of fuzziness are required, at
user option, as with regular search engines.
- All of the above is closely related to other "control" constructs
needed for correctly writing text in different languages, eg BiDi
controls for BiDirectional languages. Though Math(s) is a language
in quite a different sense, the same problem arises. Let's say the
title of a paper contains something that can't be expressed in plain
text, eg an integral from value A to value B. How do I do this in
RDF and how will others match on it?
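
To make the matching points above concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the names
Literal, strict_equal, range_matches and fuzzy_match are hypothetical and
come from no RDF specification or library. strict_equal follows the quoted
proposal; range_matches follows the prefix rule RFC 3066 gives for language
ranges (the range equals the tag, or is a prefix of it that ends just
before a "-"); fuzzy_match is one possible "user option" that treats the
query's tag as a range and can ignore missing tags.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal: a string plus an optional tag."""
    text: str
    lang: Optional[str] = None


def strict_equal(a: Literal, b: Literal) -> bool:
    """Quoted proposal: equal strings, and tags both absent or equal ignoring case."""
    if a.text != b.text:
        return False
    if a.lang is None or b.lang is None:
        return a.lang is None and b.lang is None
    return a.lang.lower() == b.lang.lower()


def range_matches(lang_range: str, tag: str) -> bool:
    """RFC 3066 style range match: exact, or a prefix followed by '-'."""
    r, t = lang_range.lower(), tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")


def fuzzy_match(query: Literal, candidate: Literal,
                ignore_untagged: bool = True) -> bool:
    """Assumed looser match: the query tag acts as a language range.

    With ignore_untagged=True, a missing tag on either side is not
    treated as a mismatch.
    """
    if query.text != candidate.text:
        return False
    if query.lang is None or candidate.lang is None:
        return ignore_untagged or (query.lang is None and candidate.lang is None)
    return range_matches(query.lang, candidate.lang)


if __name__ == "__main__":
    tagged = Literal("colour", "en-GB")
    untagged = Literal("colour")
    print(strict_equal(tagged, Literal("colour", "en-gb")))  # case-insensitive tags
    print(strict_equal(tagged, untagged))                    # tagged vs untagged
    print(fuzzy_match(Literal("colour", "en"), tagged))      # "en" as a range
    print(fuzzy_match(Literal("colour", "en"), untagged))    # missing tag ignored
```

Under the strict rule the tagged and untagged "colour" literals differ,
which is the discouraging effect described above; the fuzzy variant
accepts both. Neither addresses multilingual strings, which would need
per-substring tags.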
Misha
On 20/02/2002 11:11:07 Jeremy Carroll wrote:
> > rdfms-xmllang: Why isn't xml:lang information represented within the RDF
> > data model?
>
> > This was put on hold whilst we looked at datatypes.
> > Model and Syntax says that lang is part of the literal; that no triples are
> > generated for an xml:lang. We can choose to stick with that or change it.
> > Does anyone have a compelling reason to change it?
>
> My proposal before we put it on hold was in the overly long:
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0378.html
>
> [[[
> [1]
> An RDF Literal is a Unicode string, optionally paired with a
> language tag (as defined in RFC3066).
> ]]]
>
> In that thread we identified equality rules as follows:
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0375.html
>
> suggesting that such pairs are equal
> if and only if
> the unicode strings are equal
> and
> the lang tags are either both absent, or both present and equal (as lang
> tags, i.e. case insensitive).
>
> This then works orthogonally with:
> - the graph syntax
> - model theory
> - datatyping
> - any treatment of Unicode string normalization
>
>
> Jeremy
Received on Saturday, 23 February 2002 14:21:44 UTC