Re: Need XML in multilingual prefLabels: choose XMLLiteral datatype or language tags?

Dear Christoph,

That's a hard problem indeed, and not a very SKOS-specific one.
One could argue for generalizing skos:prefLabel to accept values other than plain literals, but that would not fit the idea of a label very well (nor the fact that these properties are sub-properties of rdfs:label).

And I'm afraid skos:notation [1] isn't a good fit for your requirement either, but I'll let you confirm that!

Maybe one way to alleviate the issue is to use the SKOS-XL extension [2]. With instances of xl:Label (for example, :aLabel), you can attach several statements to a label.
In particular, you could use xl:literalForm (the "main" literal attached to the label resource) for the basic but not very beautiful plain literal ("text <math/>"@language).
And you could use other properties to express:
- the MathML markup, for example:
:aLabel my:mathMarkup "text <math/>"^^rdf:XMLLiteral .
- the language, for example:
:aLabel dc:language "en" .
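To make this concrete, here is a minimal Turtle sketch of the pattern, reusing the Farey sequences label from your message. Please treat it as an illustration only: the my: namespace, the my:mathMarkup property and the label URI msc2010:11B57-label-en are hypothetical; the msc2010: namespace URI is my assumption; and, as in your message, the LaTeX inside <math> stands in for real MathML.

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix xl:      <http://www.w3.org/2008/05/skos-xl#> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
# Assumed namespace for the MSC 2010 concepts.
@prefix msc2010: <http://msc2010.org/resources/MSC/2010/> .
# Hypothetical namespace for the custom property.
@prefix my:      <http://example.org/my#> .

# The concept points to a label resource instead of a literal.
msc2010:11B57 xl:prefLabel msc2010:11B57-label-en .

# Optional: the same literal asserted directly, for SKOS applications
# that do not read SKOS-XL or do no inference.
msc2010:11B57 skos:prefLabel
    "Farey sequences; the sequences <math>{1^k, 2^k, \\cdots}</math>"@en .

msc2010:11B57-label-en a xl:Label ;
    # Plain literal with a language tag, for basic applications.
    xl:literalForm
        "Farey sequences; the sequences <math>{1^k, 2^k, \\cdots}</math>"@en ;
    # Same text as an XML literal, for MathML-aware applications.
    my:mathMarkup
        "Farey sequences; the sequences <math>{1^k, 2^k, \\cdots}</math>"^^rdf:XMLLiteral ;
    # Language stated separately, since the XML literal carries no language tag.
    dc:language "en" .
```

If I remember correctly, the SKOS-XL specification defines the chain xl:prefLabel/xl:literalForm as a sub-property of skos:prefLabel, so the direct skos:prefLabel triple is only needed for applications that do no inference. Note that the backslash in \cdots has to be escaped as \\ inside a Turtle string.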

This way, even the simplest SKOS application would get something usable, and MathML-aware applications could be customized to look for the "real" MathML XML to interpret.

By the way, you could perhaps subclass xl:Label with a dedicated class signalling to MathML-aware applications that more data can be found on its instances, something like my:LabelwithMathSymbol.
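In Turtle, such a declaration could be as small as the sketch below. The class name is the one suggested above; the my: and msc2010: namespace URIs and the label URI are hypothetical placeholders.

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xl:      <http://www.w3.org/2008/05/skos-xl#> .
# Hypothetical namespaces.
@prefix msc2010: <http://msc2010.org/resources/MSC/2010/> .
@prefix my:      <http://example.org/my#> .

# A dedicated label class that MathML-aware applications can look for.
my:LabelwithMathSymbol rdfs:subClassOf xl:Label .

# Typing a label with the subclass flags that extra MathML data is attached.
msc2010:11B57-label-en a my:LabelwithMathSymbol .
```

With RDFS inference, such a label is still an xl:Label, so generic SKOS-XL tools can keep treating it as an ordinary label.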

Sorry, but I'm afraid that's the only solution I can think of right now.

Cheers,

Antoine

[1] http://www.w3.org/TR/skos-reference/#notations (see http://www.w3.org/TR/skos-primer/#secnotations for more examples)
[2] http://www.w3.org/TR/skos-reference/#xl


> Dear all,
>
> I am one of the developers of the Mathematics Subject Classification (MSC) SKOS dataset (see http://msc2010.org/resources/MSC/2010/info/ and http://thedatahub.org/dataset/msc).
>
> Some of the skos:prefLabels in this dataset contain MathML formulas, and we have labels in different languages. Thus, if the RDF data model allowed it, we would prefer using (with MathML abbreviated as LaTeX for easier reading):
>
> msc2010:11B57
> skos:prefLabel
> "Farey sequences; the sequences <math>{1^k, 2^k, \cdots}</math>"@en^^rdf:XMLLiteral .
>
> But RDF does not allow a literal to carry both a language tag and a datatype, so it seems we are caught between a rock and a hard place, and I'd like to ask you for advice on which to choose:
>
> Choice 1: Don't use the rdf:XMLLiteral datatype, i.e. use "text <math/>"@language.
>
> Con: We can no longer convey to applications the information that the label consists of well-balanced XML content.
>
> Con: Applications that process the labels but don't expect XML content here will display XML source code.
>
> Choice 2: Encode the language information into the XML, i.e. "text <math xml:lang='en'/>"^^rdf:XMLLiteral.
>
> Pro: Applications that don't know XML will fail (as they should).
>
> Con: In the multilingual case, a skos:Concept would have multiple datatyped skos:prefLabels with "no language". This violates the convention that skos:prefLabel is only used with plain literals (http://www.w3.org/TR/skos-reference/#L2655). It _may_ also violate the integrity condition S14 that "a resource has no more than one value of skos:prefLabel per language tag" (http://www.w3.org/TR/skos-reference/#L1567; however "no language tag" is not "a language tag").
>
> Con: Slows down SPARQL queries: filtering by language would have to be done by treating the label as text and filtering against a regular expression such as "xml:lang='en'".
>
> Con: As the majority of labels don't contain formulas, we would most reasonably represent them as plain literals, thus ending up with a mixture of language-tagged plain literals and XML literals.
>
> Note that we absolutely need the formulas in the labels; there is no way of separating them out of the literals into some auxiliary structures, for the following reasons:
>
> * Some labels contain more than one mathematical formula, scattered over multiple places in the text.
> * While this is not yet the case in the labels of the MSC dataset and their translations, note that mathematical notation varies with language; consider "Binomial coefficient <math>\binom{n}{k}</math>"@en vs. "Coefficient binomial <math>C^k_n</math>"@fr.
>
> And there is no way of doing without MathML. We have 23 labels that can't be expressed by just using Unicode, and some more that could theoretically be expressed in Unicode but where practically available fonts don't support it.
>
> What would you recommend?
>
> Cheers, and thanks in advance,
>
> Christoph
>

Received on Thursday, 30 August 2012 11:02:32 UTC