RE: Outstanding Issues - rdfms-xmllang from Misha.Wolf@reuters.com on 2002-02-23 (w3c-rdfcore-wg@w3.org from February 2002)

From: <Misha.Wolf@reuters.com>
Date: Sat, 23 Feb 2002 19:20:28 +0000
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: bwm@hplb.hpl.hp.com, w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
Message-ID: <T5940d9b289c407b70724c@reuters.com>

In preparation for our discussion next week, I'd like to note some of
the sub-issues.

-  In XML, xml:lang is inherited.  This means that, in XML, there is a
   large number of ways to achieve the same set of associations between
   strings and languages.  This may not matter in RDF as long as there
   is no more than one language per string.  However ...

-  A string may contain sub-strings in different languages.  RDF
   doesn't, IIRC, support a semantically neutral carrier equivalent to
   <xhtml:span>.  Thus there is currently no way to represent
   multilingual strings in "first class" RDF.  One reason for not
   associating arcs with string language is that one could very easily
   break any possibility of matching multilingual strings.

-  RFC 3066 defines exact and approximate matching (I don't recall the
   actual terminology) between language tags.  For instance, "en-us"
   precisely matches "en-us" and approximately matches "en".

-  I don't think the proposal:

   > suggesting that such pairs are equal
   >   if and only if
   >   the unicode strings are equal
   > and
   >    the lang tags are either both absent, or both present and equal (as lang
   >    tags, i.e. case insensitive).

   is right, as a string without a language tag would not match one
   with.  A consequence would be that people would be discouraged from
   language tagging their strings, in case other people haven't tagged
   *their* strings.

-  The above seems to suggest that degrees of fuzziness are required, at
   user option, as with regular search engines.

-  All of the above is closely related to other "control" constructs
   needed for correctly writing text in different languages, eg BiDi
   controls for BiDirectional languages.  Though Math(s) is a language
   in quite a different sense, the same problem arises.  Let's say the
   title of a paper contains something that can't be expressed in plain
   text, eg an integral from value A to value B.  How do I do this in
   RDF and how will others match on it?
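
A rough sketch in Python of the matching rules discussed above may make
the trade-offs easier to see.  All names here (Literal, strict_equal,
range_matches, fuzzy_equal) are invented for illustration and do not
come from any RDF specification or library.  range_matches follows the
language-range prefix rule of RFC 3066 section 2.5 as I understand it,
and fuzzy_equal is only one assumed way of offering "fuzziness at user
option", not something proposed in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: a Unicode string plus an optional language tag."""
    text: str
    lang: Optional[str] = None


def strict_equal(a: Literal, b: Literal) -> bool:
    """The quoted proposal: strings equal, and tags both absent or
    both present and equal ignoring case."""
    if a.text != b.text:
        return False
    if a.lang is None or b.lang is None:
        return a.lang is None and b.lang is None
    return a.lang.lower() == b.lang.lower()


def range_matches(lang_range: str, tag: str) -> bool:
    """Prefix match: the range equals the tag, or is a prefix of the
    tag that is immediately followed by '-'."""
    r, t = lang_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def fuzzy_equal(a: Literal, b: Literal,
                untagged_matches_any: bool = True) -> bool:
    """An assumed looser rule: an untagged string may match a tagged
    one, and tags match if either is a prefix-range of the other."""
    if a.text != b.text:
        return False
    if a.lang is None or b.lang is None:
        return untagged_matches_any or (a.lang is None and b.lang is None)
    return range_matches(a.lang, b.lang) or range_matches(b.lang, a.lang)
```

Under strict_equal, Literal("chat") and Literal("chat", "fr") are
different, which is the objection raised above.  Under fuzzy_equal with
the default option they match, and setting untagged_matches_any=False
restores the strict behaviour for untagged strings.  Neither function
addresses strings that mix several languages.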

Misha


On 20/02/2002 11:11:07 Jeremy Carroll wrote:
> > rdfms-xmllang: Why isn't xml:lang information represented within the RDF
> > data model?
>
> > This was put on hold whilst we looked at datatypes.
> > Model and Syntax says that lang is part of the literal; that no triples are
> > generated for an xml:lang.  We can choose to stick with that or change it.
> > Does anyone have a compelling reason to change it?
>
>
>
> My proposal before we put it on hold was in the overly long:
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0378.html
>
> [[[
> [1]
> An RDF Literal is a Unicode string, optionally paired with a
> language tag (as defined in RFC3066).
> ]]]
>
> In that thread we identified equality rules as follows:
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0375.html
>
> suggesting that such pairs are equal
>   if and only if
>   the unicode strings are equal
> and
>    the lang tags are either both absent, or both present and equal (as lang
>    tags, i.e. case insensitive).
>
>
>
>
> This then works orthogonally with:
> - the graph syntax
> - model theory
> - datatyping
> - any treatment of Unicode string normalization
>
>
> Jeremy






Received on Saturday, 23 February 2002 14:21:44 UTC