- From: Christoph LANGE <c.lange@cs.bham.ac.uk>
- Date: Thu, 30 Aug 2012 10:41:19 +0200
- To: public-esw-thes@w3.org
Dear all,
I am one of the developers of the Mathematics Subject Classification
(MSC) SKOS dataset (see http://msc2010.org/resources/MSC/2010/info/ and
http://thedatahub.org/dataset/msc).
Some of the skos:prefLabels in this dataset contain MathML formulas, and
we have labels in different languages. Thus, if the RDF data model
allowed it, we would prefer using (with MathML abbreviated as LaTeX for
easier reading):
msc2010:11B57
skos:prefLabel
"Farey sequences; the sequences <math>{1^k, 2^k,
\cdots}</math>"@en^^rdf:XMLLiteral .
However, an RDF literal cannot carry both a language tag and a datatype.
So it seems we are caught between a rock and a hard place, and I'd like
to ask for your advice on which option to choose:
Choice 1: Don't use the rdf:XMLLiteral datatype, i.e. use "text
<math/>"@language.
Con: We can no longer convey to applications the information that the
label consists of well-balanced XML content.
Con: Applications that process the labels but don't expect XML content
here will display XML source code.
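To illustrate the first Con of Choice 1: once the datatype is gone, a consuming application can only guess whether a label holds markup. The following is a minimal sketch of such a guess, assuming Python's standard library only; the function name and the throwaway "wrapper" root element are hypothetical and not part of the MSC dataset or any SKOS tooling.

```python
import xml.etree.ElementTree as ET


def contains_balanced_markup(label: str) -> bool:
    """Guess whether a plain-literal label carries well-balanced XML markup.

    The label is wrapped in a throwaway root element (hypothetical name
    "wrapper") so that mixed text-and-element content can be parsed.
    """
    try:
        root = ET.fromstring("<wrapper>" + label + "</wrapper>")
    except ET.ParseError:
        # Not well-balanced XML (e.g. a bare "<" or "&"): treat as plain text.
        return False
    # Child elements mean the label holds markup, not just text.
    return len(root) > 0


# The label from above, with MathML still abbreviated as LaTeX.
farey = r"Farey sequences; the sequences <math>{1^k, 2^k, \cdots}</math>"

print(contains_balanced_markup(farey))
print(contains_balanced_markup("Farey sequences"))
print(contains_balanced_markup("fields with a < b"))
```

Only the first call reports markup. The check is a heuristic: a label that merely happens to parse as XML is indistinguishable from one that was meant as XML, which is exactly the information the datatype would have conveyed.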
Choice 2: Encode the language information into the XML, i.e. "text <math
xml:lang='en'/>"^^rdf:XMLLiteral.
Pro: Applications that don't know XML will fail (as they should).
Con: In the multilingual case, a skos:Concept would have multiple
datatyped skos:prefLabels with "no language". This violates the
convention that skos:prefLabel is only used with plain literals
(http://www.w3.org/TR/skos-reference/#L2655). It _may_ also violate the
integrity condition S14 that "a resource has no more than one value of
skos:prefLabel per language tag"
(http://www.w3.org/TR/skos-reference/#L1567; however "no language tag"
is not "a language tag").
Con: Slows down SPARQL queries: Filtering by language would have to be
done by treating the label as text and matching it against a regular
expression such as "xml:lang='en'".
Con: As the majority of labels don't contain formulas, we would most
reasonably represent them as plain literals, thus ending up with a
mixture of language-tagged plain literals and XML literals.
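The language-filtering Con of Choice 2 can be sketched outside SPARQL as well. The example below assumes Python's standard library only and compares a text-level regular expression with actually parsing the literal; the function names, the "wrapper" root element, and the two sample labels (modelled on the "text <math xml:lang='en'/>" shape) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Namespace bound to the reserved "xml" prefix; ElementTree exposes
# the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def label_language(xml_literal: str):
    """Return the first xml:lang value found inside an XML-literal label, or None."""
    root = ET.fromstring("<wrapper>" + xml_literal + "</wrapper>")
    for element in root.iter():
        lang = element.get(XML_LANG)
        if lang is not None:
            return lang
    return None


def regex_says_english(xml_literal: str) -> bool:
    """Text-level check of the kind a regular-expression filter would perform."""
    return re.search(r"xml:lang='en'", xml_literal) is not None


# Hypothetical labels that differ only in the attribute quoting style.
single_quoted = r"Binomial coefficient <math xml:lang='en'>\binom{n}{k}</math>"
double_quoted = r'Binomial coefficient <math xml:lang="en">\binom{n}{k}</math>'

for label in (single_quoted, double_quoted):
    print(label_language(label), regex_says_english(label))
```

Parsing finds "en" in both labels, while the regular expression matches only the single-quoted one. A text-level filter is therefore not just slow but also sensitive to how the XML happens to be serialized.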
Note that we absolutely need the formulas in the labels; there is no way
of separating them out of the literals into some auxiliary structures,
for the following reasons:
* Some labels contain more than one mathematical formula, scattered over
multiple places in the text.
* While this is not yet the case in the labels of the MSC dataset and
their translations, note that mathematical notation varies with
language; consider "Binomial coefficient <math>\binom{n}{k}</math>"@en
vs. "Coefficient binomial <math>C^k_n</math>"@fr.
And there is no way of doing without MathML. We have 23 labels that
can't be expressed by just using Unicode, and some more that could
theoretically be expressed in Unicode but where practically available
fonts don't support it.
What would you recommend?
Cheers, and thanks in advance,
Christoph
--
Christoph Lange, School of Computer Science, University of Birmingham
http://cs.bham.ac.uk/~langec, Skype duke4701
→ Building & Exploring Web Based Environments. Seville, Spain, 27 Jan–
1 Feb 2013. Deadline 22 Sep.
http://iaria.org/conferences2013/WEB13.html
Received on Thursday, 30 August 2012 08:41:53 UTC