Re: A modest note on rdf:text from the an OWLer

From: Pat Hayes <phayes@ihmc.us>
Subject: Re: A modest note on rdf:text from the an OWLer
Date: Thu, 21 May 2009 12:22:53 -0500

> 
> On May 21, 2009, at 10:15 AM, Peter F.Patel-Schneider wrote:
> 
>> From: Pat Hayes <phayes@ihmc.us>
>> Subject: Re: A modest note on rdf:text from the an OWLer
>> Date: Thu, 21 May 2009 09:53:39 -0500
>>
>>> On May 20, 2009, at 8:01 PM, Peter F.Patel-Schneider wrote:
>>>
>>>> I've been following along on the conversation and not contributing,
>>>> but I'm now going to stick my toe in.
>>>>
>>>>
>>>> Here is a fictitious dialog between several points of view.  You may
>>>> decide, if you wish, to assign human actors to these points of view.
>>>> That is completely up to you; I'm not saying that anyone holds these
>>>> views as I've stated them.
>>>>
>>>> (A well-known Zakim meeting and IRC chat room.)
>>>>
>>>> IH:  We need better datatype support in OWL 2!
>>>> IH:  But how can it be done?
>>>> BM:  Let's use XML Schema datatypes and facets!
>>>> IH:  Sounds good, go for it.
>>>> JC:  But what about plain literals?  We need to support all RDF
>>>>    graphs!
>>>> BM:  Hmm, we need a datatype for them.
>>>
>>> PH: Hey, BM, run that past me again. Why exactly does OWL need to have
>>> a **datatype** for plain literals? That is, why can't it simply allow
>>> RDF-style plain literals? Of course, it would be *ugly*, but would
>>> anything actually break? (What?)  And you know the old aphorism: if it
>>> ain't broken, don't fix it. OWL can always invent a class of all plain
>>> literal values and even declare it to be a datatype class if that
>>> makes the document-writers' aesthetic sense happier and saves them some
>>> boring repetition when stating rules and conditions, without needing
>>> to actually change how RDF writes its literals.
>>
>> Well, OWL needs *something* to say that the range of some property is
>> any string (with or without a language tag), i.e., a datatype.  The
>> datatype extensibility solution in OWL (borrowed from XML Schema
>> datatypes) requires a datatype to hang facets on, for example to have a
>> range that consists exactly of strings with a US English language tag.
> 
> Ah, so this is why just having a class name for the range will not be
> enough, right? (Everyone so far has cited the 'range' issue, which
> seems to be trivially solvable.)

OWL (2) DL needs a datatype name even just for the range.  

>> That said, there is nothing in OWL forbidding RDF-style plain literals
>> in its syntaxes.  The functional syntax, for example, allows for
>> literals like "Padre de familia"@es.  There is also nothing in OWL (or
>> in the rdf:text document) that forbids the use of plain literals in
>> RDF graphs or even in any RDF exchange format.
> 
> Right, but it kind of devalues it, since OWL users will of course want
> to use the new form, and this will then leak into non-OWL RDF, but
> having lost its meaning; and, more to the point, non-OWL RDF users
> will be writing RDF which OWL will not be able to utilize properly.

This is not the case.  OWL 2 (and OWL 1) can handle plain literals.  In
OWL 1 plain literals without language tags belong to xsd:string.  In OWL
2 plain literals belong to ---:text.
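A minimal Python sketch of that mapping, assuming the lexical convention of the rdf:text draft, in which a plain literal "s"@tag corresponds to the typed literal "s@tag"^^rdf:text and an untagged "s" to "s@"^^rdf:text. The class and function names are hypothetical, and language tags are not validated; this is an illustration, not a normative algorithm.

```python
from typing import NamedTuple, Optional


class PlainLiteral(NamedTuple):
    """An RDF plain literal: a string plus an optional language tag."""
    text: str
    lang: Optional[str] = None


class TypedLiteral(NamedTuple):
    """An RDF typed literal: a lexical form plus a datatype name."""
    lexical: str
    datatype: str


def to_rdf_text(lit: PlainLiteral) -> TypedLiteral:
    # Assumed draft convention: "s"@tag -> "s@tag"^^rdf:text, "s" -> "s@"^^rdf:text.
    return TypedLiteral(f"{lit.text}@{lit.lang or ''}", "rdf:text")


def from_rdf_text(lit: TypedLiteral) -> PlainLiteral:
    if lit.datatype != "rdf:text":
        raise ValueError(f"not an rdf:text literal: {lit!r}")
    # Split at the LAST '@': the string part may itself contain '@' characters.
    text, sep, tag = lit.lexical.rpartition("@")
    if not sep:
        raise ValueError("an rdf:text lexical form must contain '@'")
    return PlainLiteral(text, tag or None)


def in_xsd_string_owl1(lit: PlainLiteral) -> bool:
    # OWL 1: plain literals WITHOUT a language tag belong to xsd:string.
    return lit.lang is None


lit = PlainLiteral("Padre de familia", "es")
print(to_rdf_text(lit))  # TypedLiteral(lexical='Padre de familia@es', datatype='rdf:text')
```

Splitting at the last '@' is what lets a string such as "a@b.com" survive the trip through the typed form.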

> When they mix, nobody will be happy. 

OWL 2 (and OWL 1) are perfectly happy with both the current (and
envisioned) state of affairs.

> I can almost directly predict the
> forlorn emails to the lists from users asking why their engine is only
> finding half the information in the billion-sized triple stores, and who
> do not want to be told that all they have to do is rewrite all the
> literals properly. But in any case, it seems to me that any 'transcribe
> in/out' solution isn't going to work because there is information loss
> in either direction, so nobody will be able to round-trip. So there
> will be two RDF worlds: the OWL-2-savvy one and the others, and they
> in effect won't be able to communicate, in practice, even if they are
> both technically legal according to the specs.

Maybe so.   But this problem *already* exists.  It exists with
xsd:string, which is an essential part of OWL 1.  It exists with
numbers. 
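A toy Python illustration of that point, assuming simplified, hand-written lexical-to-value maps for just xsd:integer and xsd:string (not a conforming XSD implementation; all names are hypothetical). Under datatype semantics "1"^^xsd:integer and "01"^^xsd:integer denote the same number, just as "foo" and "foo"^^xsd:string denote the same string, yet a tool that only compares lexical forms treats each pair as distinct.

```python
import re
from typing import Optional

# Toy lexical-to-value maps for two XSD datatypes (no whitespace
# normalization, no other datatypes); illustrative only.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def xsd_integer_value(lexical: str) -> int:
    if not _INTEGER_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not a valid xsd:integer lexical form: {lexical!r}")
    return int(lexical)


def xsd_string_value(lexical: str) -> str:
    return lexical


VALUE_MAPS = {"xsd:integer": xsd_integer_value, "xsd:string": xsd_string_value}


def literal_value(lexical: str, datatype: Optional[str] = None):
    # An untagged plain literal (datatype None) gets the xsd:string value, as in OWL 1.
    return VALUE_MAPS[datatype or "xsd:string"](lexical)


def same_value(a: tuple, b: tuple) -> bool:
    """Compare two literals, given as (lexical,) or (lexical, datatype), by value."""
    return literal_value(*a) == literal_value(*b)


# Syntactically different literals, same value:
print(same_value(("1", "xsd:integer"), ("01", "xsd:integer")))  # True
print(same_value(("foo",), ("foo", "xsd:string")))              # True
print(same_value(("1", "xsd:integer"), ("1", "xsd:string")))    # False
```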

>>>> I know, we'll just use
>>>>    xsd:string - its extension includes all reasonable plain
>>>>    literals.
>>>> JC:  Not so - to satisfy internationalization concerns we need to
>>>>    also handle plain literals with a language tag.
>>>> BM:  Then let's have a new datatype, owl:text, that includes both
>>>>    strings without a language tag and strings with a language tag.
>>>
>>> PH: It might have been better, in retrospect, to have restricted it to
>>> the case with a language tag, which is the only case you actually need,
>>> and stuck to xsd:string for the other case. Then at least we would only
>>> have two ways to write plain literals (datatyped or not) instead of
>>> three. And, as y'all are constantly pointing out, the world has
>>> already gotten used to the fact that "foo" and "foo"^^xsd:string are
>>> the same, and your new type would just be doing exactly the same thing
>>> for the tagged case, which is a smaller pill to swallow.
>>
>> Well, if owl:text is restricted to require a language tag, then there is
>> a (minor) need for the union of owl:text and xsd:string.  I don't see
>> any difference in pain between the two solutions, and the one that OWL
>> chose requires one less new datatype.
> 
> ? xsd:string isn't new, surely. But the pain difference between the
> two-choice case (plain vs. xsd:string typed, plain+tag vs. rdf:text
> typed) and the three-choice case (plain vs. xsd:string vs. rdf:text)
> is huge.  In the 2-choice case there is one way to get it wrong (so
> you can then try the other); in the three-choice case there are five
> ways to get it wrong. It's hard enough to get two-way agreements, but
> getting three-way agreements is just about impossible.

I don't see any new pain.  I don't see any new ways to get things
"wrong". 

>>>> It is just like xsd:string but with complete coverage of all plain
>>>>    literals. It is a perfectly good RDF datatype, conforms perfectly
>>>>    to XML Schema Datatypes, and there are no downsides.
>>>> JC:  Sounds like a plan.  I'm happy.
>>>>
>>>> (Lots of on-stage document hacking.)
>>>>
>>>> (AP enters, announced by Zakim.)
>>>>
>>>> AP:  What's this owl:text?  This other WG is doing the same thing,
>>>>    and so both WGs should use the same name for it!
>>>> IH:  Hmm.  OK, let's form a task force and come up with a joint
>>>>    document.
>>>> BM:  Let's call the datatype rdf:text, because that is a good
>>>>    description of the purpose of the joint datatype.
>>>> JC:  OK by me, but not really necessary; any name will do.
>>>> AP:  That's fine.  This other WG just needs to add functions, which
>>>>    you in OWL don't seem to have.
>>>> BM:  I don't see any reason not to include a section on functions
>>>>    for rdf:text.
>>>> JC:  OK by me, but it doesn't make me any happier.  I don't need the
>>>>    functions.
>>>> IH:  Let's include a bit of wording to encourage tools to use plain
>>>>    literals wherever possible, just so that tools that are not
>>>>    aware of the rdf:text datatype work as if they did.
>>>> BM:  Why just for rdf:text?  Other datatypes have the same issue, or
>>>>    even worse!  We are not going to require normalization of all
>>>>    literals!  Making bad design choices just to support existing
>>>>    tools is not a good idea in general, and is certainly not a good
>>>>    idea here.
>>>
>>> PH: Actually, following previous design choices, whatever their
>>> perceived merits, is a VERY good idea when writing standards for
>>> interoperability. Such a good idea, in fact, that one should depart
>>> from it only when there is a pressing, urgent, user-driven need to do
>>> so. There is a very good chance that a future WG will not agree with
>>> your judgements about 'good' or 'bad' design choices anyway. After
>>> all, you apparently disagree with at least one previous WG.
>>
>> As far as I am concerned, OWL and rdf:text are precisely following
>> previous design choices, *except* for the special rules that try to
>> force a certain way of writing strings.  I would be very happy if this
>> particular violation of previous design choices was removed from OWL,
>> and from rdf:text.
>>
>>>> IH:  I know, I know, but making a special case for rdf:text might
>>>>    make it more acceptable.
>>>> BM:  OK, but you are going to be sorry you ever tried to be nice.
>>>>
>>>> (A moderate amount of on-stage document hacking.)
>>>>
>>>> IH:  Hello world!  The OWL WG and this other WG have this great new
>>>>    thing for you!  A new datatype, called rdf:text, for any sort of
>>>>    internationalized text.
>>>> (Tomatoes being thrown from everywhere in the audience.)
>>>> IH:  Did I say "any sort of internationalized text"?  I meant to say
>>>>    "strings with language tags".
>>>> (More tomatoes being thrown, but only from one place in the
>>>> audience.)
>>>> IH:  Oh, you don't like rdf:text at all?  Well, the OWL WG will just
>>>>    go back to the previous happy situation and have a new datatype in
>>>>    OWL, called owl:text, to go along with our use of lots of new XML
>>>>    Schema datatypes and datatype facets.  Other WGs can use this new
>>>>    datatype if they want, or not.  Other WGs can even define
>>>>    functions on this new datatype; the OWL WG has nothing to say about
>>>>    this.
>>>
>>>> IH:  [Aside to BM]  I'm really, really sorry.
>>>> BM:  [Aside to IH]  I told you so.
>>>>
>>>> (A bit of on-stage document hacking.)
>>>>
>>>> IH:  Hello world!  The OWL WG is proud to present its CR documents!
>>>>    Sorry about the previous brouhaha.
>>>> AP:  [Inaudible] Grumble, grumble.
>>>>
>>>> (Everyone realizes that if they throw tomatoes at owl:text they have
>>>> to also shoot down the entire idea of RDF datatypes and D-entailment.)
>>>
>>> ? Not at all. Only at the idea of insisting that ALL literals must be
>>> datatyped, even the ones that aren't.
>>
>> And neither OWL nor rdf:text does this insisting.
> 
> Yes, I overstated the case, you are correct. But I perceive an overall
> tendency to put pressure on the RDF world to move towards this new
> view of all-typed literals, and I don't think it is the right kind of
> pressure to be applying at this point in time. The last thing we want,
> right now, is to break the deployment of 'loose' RDF in a wide-world
> (and therefore messy and unstructured) setting, especially over the
> issue of representations of plain text fragments. See my recent post
> to the list for an alternative suggestion.

I do not see any such pressure.  Current OWL 1 tools already handle both
plain literal strings and xsd:string datatyped strings.  If there was
going to be significant pressure and significant problems, then they
should have already been seen.

> Pat
> 
> PS. terrific summary, BTW. I never did like tomatoes.
> 
>>  In fact, the only
>> violation of previous design choices concerns the suggestion/requirement
>> for one particular way of writing string literals (namely the
>> non-datatyped one).
>>
>>> Pat
>>>
>>>>
>>>> (World peace reigns!)
>>>>
>>>>
>>>> This appears to be the result that is being argued for.
>>>>
>>>>
>>>> Peter F. Patel-Schneider
>>>> Bell Labs Research
>>
>> peter

peter

Received on Thursday, 21 May 2009 18:14:51 UTC