W3C home > Mailing lists > Public > w3c-rdfcore-wg@w3.org > February 2002

RE: Outstanding Issues - rdfms-xmllang

From: <Misha.Wolf@reuters.com>
Date: Mon, 25 Feb 2002 01:36:23 +0000
Message-ID: <T5947580677c407b70724c@reuters.com>
To: timbl@w3.org
Cc: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org


Tonight (it has since turned into last night) Graham Klyne drew my
attention to your "Interpretation properties" [1].  You wrote:

> There has to date (2000/02) been a consistent muddle in the RDF
> community about how to represent the natural language of a string. In
> XML it is simple, because you never have to exactly explain what you
> mean. You can mark up span of text and declare it to be French.
>   His name was <html:span xml:lang="fr">Jean-Fran&ccedilla;ois</html:span>
>   but we called him Dan.
> Under pressure from the XML community to be standard, the RDF spec
> included this attribute as the official RDF way to record that a string
> was in a given language. This was a mistake, as the attribute was thrown
> into the syntax but not into the model which the spec was defining.

IIRC, the RDF M&S WG didn't include string language in the model in the
regular RDF manner because of the argument, put forward by us I18N
folks, that in real life many strings are mixed-language (your made up
sentence above is an example).  The most obvious node-and-arc treatment
cannot, I think, handle that.

We need to devise an approach which allows the string to exist as a
string and, simultaneously, to be recognised as containing substrings
associated with various languages.  I imagine this can be achieved using
regular RDF constructs, but it needs some careful thought.  Graham and
I tonight discussed the possibility of using XPath-based arcs to address
the monolingual substrings and to associate each with a language.

More difficult problems are posed by:

-  Text markup with control semantics, eg for BiDi.

-  Strings which encode neither human language text, nor a datatype such
   as a date or a floating point number.  An example is a mathematical
   equation (using MathML).  Then we have the mixed case of a human
   language string containing some embedded MathML, eg the title of a
   scientific paper.

[1] http://www.w3.org/DesignIssues/InterpretationProperties.html


------------------------------------------------------------- ---
        Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of  the  individual
sender,  except  where  the sender specifically states them to be
the views of Reuters Ltd.
Received on Sunday, 24 February 2002 20:37:22 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 7 January 2015 14:53:56 UTC