Re: A modest note on rdf:text from the an OWLer

On May 21, 2009, at 10:15 AM, Peter F. Patel-Schneider wrote:

> From: Pat Hayes <phayes@ihmc.us>
> Subject: Re: A modest note on rdf:text from the an OWLer
> Date: Thu, 21 May 2009 09:53:39 -0500
>
>> On May 20, 2009, at 8:01 PM, Peter F. Patel-Schneider wrote:
>>
>>> I've been following along on the conversation and not contributing,
>>> but I'm now going to stick my toe in.
>>>
>>>
>>> Here is a fictitious dialog between several points of view.  You may
>>> decide, if you wish, to assign human actors to these points of view.
>>> That is completely up to you; I'm not saying that anyone holds these
>>> views as I've stated them.
>>>
>>> (A well-known Zakim meeting and IRC chat room.)
>>>
>>> IH:  We need better datatype support in OWL 2!
>>> IH:  But how can it be done?
>>> BM:  Let's use XML Schema datatypes and facets!
>>> IH:  Sounds good, go for it.
>>> JC:  But what about plain literals?  We need to support all RDF
>>>    graphs!
>>> BM:  Hmm, we need a datatype for them.
>>
>> PH: Hey, BM, run that past me again. Why exactly does OWL need to
>> have a **datatype** for plain literals? That is, why can't it simply
>> allow RDF-style plain literals? Of course, it would be *ugly*, but
>> would anything actually break? (What?)  And you know the old
>> aphorism: if it ain't broken, don't fix it. OWL can always invent a
>> class of all plain literal values and even declare it to be a
>> datatype class if that makes the document-writers' aesthetic sense
>> happier and saves them some boring repetition when stating rules and
>> conditions, without needing to actually change how RDF writes its
>> literals.
>
> Well, OWL needs *something* to say that the range of some property is
> any string (with or without a language tag), i.e., a datatype.  The
> datatype extensibility solution in OWL (borrowed from XML Schema
> datatypes) requires a datatype to hang facets on, for example to have
> a range consisting exactly of strings with a US English language tag.

Ah, so this is why just having a class name for the range will not be  
enough, right? (Everyone so far has cited the 'range' issue, which  
seems to be trivially solvable.)

>
> That said, there is nothing in OWL forbidding RDF-style plain literals
> in its syntaxes.  The functional syntax, for example, allows for
> literals like "Padre de familia"@es.  There is also nothing in OWL (or
> in the rdf:text document) that forbids the use of plain literals in
> RDF graphs or even in any RDF exchange format.

Right, but it kind of devalues it, since OWL users will of course want
to use the new form, and this will then leak into non-OWL RDF, where it
will have lost its meaning; and, more to the point, non-OWL RDF users
will be writing RDF which OWL will not be able to utilize properly.
When they mix, nobody will be happy. I can almost directly predict the
forlorn emails to the lists from people asking why their engine is
finding only half the information in their billion-triple stores, and
who do not want to be told that all they have to do is rewrite all the
literals properly. But in any case, it seems to me that any 'transcribe
in/out' solution isn't going to work because there is information loss
in either direction, so nobody will be able to round-trip. So there
will be two RDF worlds: the OWL-2-savvy one and the others, and they
in effect won't be able to communicate, even if they are
both technically legal according to the specs.
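
One way to picture the round-trip worry is a minimal Python sketch. It
assumes a toy model of a literal as a (lexical form, language tag,
datatype) triple, and it assumes that the typed form packs the tag into
the lexical form after a final '@' (so "foo@" means "no tag"). The names
Literal, to_plain and to_rdf_text are hypothetical, made up for this
illustration, and are not taken from any spec or library. The sketch
only shows that three spellings of the same string collapse to one on
the way out, so the way back cannot tell which one was written.

```python
from typing import NamedTuple, Optional

XSD_STRING = "xsd:string"
RDF_TEXT = "rdf:text"


class Literal(NamedTuple):
    """Toy literal: lexical form, optional language tag, optional datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def to_plain(lit: Literal) -> Literal:
    """Transcribe 'out': rewrite typed string literals as plain literals."""
    if lit.datatype == XSD_STRING:
        return Literal(lit.lexical)
    if lit.datatype == RDF_TEXT:
        # Assumed lexical form: text, then '@', then a possibly empty tag.
        text, sep, lang = lit.lexical.rpartition("@")
        if not sep:
            return lit  # malformed under this assumption; leave untouched
        return Literal(text, lang or None)
    return lit


def to_rdf_text(lit: Literal) -> Literal:
    """Transcribe 'in': rewrite plain literals into the typed form."""
    if lit.datatype is None:
        return Literal(f"{lit.lexical}@{lit.lang or ''}", None, RDF_TEXT)
    return lit


# Three spellings of the same untagged string.
originals = [
    Literal("foo"),
    Literal("foo", datatype=XSD_STRING),
    Literal("foo@", datatype=RDF_TEXT),
]

# All three collapse to a single plain literal on the way out ...
plain_forms = {to_plain(lit) for lit in originals}

# ... so the way back in cannot restore the xsd:string spelling.
round_tripped = [to_rdf_text(to_plain(lit)) for lit in originals]
```

In this toy model only the literal that started out in the typed
rdf:text form survives the trip unchanged; the plain and xsd:string
spellings both come back as something else.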

>
>>> I know, we'll just use
>>>    xsd:string - its extension includes all reasonable plain
>>>    literals.
>>> JC:  Not so - to satisfy internationalization concerns we need to
>>>    also handle plain literals with a language tag.
>>> BM:  Then let's have a new datatype, owl:text, that includes both
>>>    strings without a language tag and strings with a language tag.
>>
>> PH: Might have been better, in retrospect, to have restricted it to
>> the case with a language tag, which is the only case you actually
>> need - and stuck to xsd:string for the other case. Then at least we
>> only have two ways to write plain literals (datatyped or not) instead
>> of three. And, as y'all are constantly pointing out, the world has
>> already gotten used to the fact that "foo" and "foo"^^xsd:string are
>> the same, and your new type would just be doing exactly the same
>> thing for the tagged case, which is a smaller pill to swallow.
>
> Well, if owl:text is restricted to require a language tag, then there
> is a (minor) need for the union of owl:text and xsd:string.  I don't
> see any pain difference between the two solutions, and the one that
> OWL chose requires one less new datatype.

? xsd:string isn't new, surely. But the pain difference between the
two-choice case (plain vs. xsd:string typed, plain+tag vs. rdf:text
typed) and the three-choice case (plain vs. xsd:string vs. rdf:text)
is huge.  In the two-choice case there is one way to get it wrong (so
you can then try the other); in the three-choice case there are five
ways to get it wrong. It's hard enough to get two-way agreements, but
getting three-way agreements is just about impossible.

>
>>> It
>>>    is just like xsd:string but with complete coverage of all plain
>>>    literals. It is a perfectly good RDF datatype, conforms perfectly
>>>    to XML Schema Datatypes, there are no downsides.
>>> JC:  Sounds like a plan.  I'm happy.
>>>
>>> (Lots of on-stage document hacking.)
>>>
>>> (AP enters, announced by Zakim.)
>>>
>>> AP:  What's this owl:text?  This other WG is doing the same thing
>>>    and so both WGs should use the same name for it!
>>> IH:  Hmm.  OK, let's form a task force and come up with a joint
>>>    document.
>>> BM:  Let's call the datatype rdf:text, because that is a good
>>>    description of the purpose of the joint datatype.
>>> JC:  OK by me, but not really necessary, any name will do.
>>> AP:  That's fine.  This other WG just needs to add functions, which
>>>    you in OWL don't seem to have.
>>> BM:  I don't see any reason not to include a section on functions
>>>    for rdf:text.
>>> JC:  OK by me, but it doesn't make me any happier.  I don't need the
>>>    functions.
>>> IH:  Let's include a bit of wording to encourage tools to use plain
>>>    literals wherever possible, just so that tools that are not
>>>    aware of the rdf:text datatype work as if they were.
>>> BM:  Why just for rdf:text?  Other datatypes have the same issue, or
>>>    even worse!  We are not going to require normalization of all
>>>    literals!  Making bad design choices just to support existing
>>>    tools is not a good idea in general, and is certainly not a good
>>>    idea here.
>>
>> PH: Actually, following previous design choices, whatever their
>> perceived merits, is a VERY good idea when writing standards for
>> interoperability. Such a good idea, in fact, that one should depart
>> from it only when there is a pressing, urgent, user-driven need to
>> do so. There is a very good chance that a future WG will not agree
>> with your judgements about 'good' or 'bad' design choices anyway.
>> After all, you apparently disagree with at least one previous WG.
>
> As far as I am concerned, OWL and rdf:text are precisely following
> previous design choices, *except* for the special rules that try to
> force a certain way of writing strings.  I would be very happy if this
> particular violation of previous design choices was removed from OWL,
> and from rdf:text.
>
>>> IH:  I know, I know, but making a special case for rdf:text might
>>>    make it more acceptable.
>>> BM:  OK, but you are going to be sorry you ever tried to be nice.
>>>
>>> (A moderate amount of on-stage document hacking.)
>>>
>>> IH:  Hello world!  The OWL WG and this other WG have this great new
>>>    thing for you!  A new datatype, called rdf:text, for any sort of
>>>    internationalized text.
>>> (Tomatoes being thrown from everywhere in the audience.)
>>> IH:  Did I say "any sort of internationalized text"?  I meant to say
>>>    "strings with language tags".
>>> (More tomatoes being thrown, but only from one place in the
>>> audience.)
>>> IH:  Oh, you don't like rdf:text at all?  Well, the OWL WG will just
>>>    go back to the previous happy situation and have a new datatype
>>>    in OWL, called owl:text, to go along with our use of lots of new
>>>    XML Schema datatypes and datatype facets.  Other WGs can use this
>>>    new datatype if they want, or not.  Other WGs can even define
>>>    functions on this new datatype; the OWL WG has nothing to say
>>>    about this.
>>
>>> IH:  [Aside to BM]  I'm really, really sorry.
>>> BM:  [Aside to IH]  I told you so.
>>>
>>> (A bit of on-stage document hacking.)
>>>
>>> IH:  Hello world!  The OWL WG is proud to present its CR documents!
>>>    Sorry about the previous brouhaha.
>>> AP:  [Inaudible] Grumble, grumble.
>>>
>>> (Everyone realizes that if they throw tomatoes at owl:text they have
>>> to also shoot down the entire idea of RDF datatypes and D-entailment.)
>>
>> ? Not at all. Only at the idea of insisting that ALL literals must be
>> datatyped, even the ones that aren't.
>
> And neither OWL nor rdf:text does this insisting.

Yes, I overstated the case, you are correct. But I perceive an overall  
tendency to put pressure on the RDF world to move towards this new  
view of all-typed literals, and I don't think it is the right kind of  
pressure to be applying at this point in time. The last thing we want,  
right now, is to break the deployment of 'loose' RDF in a wide-world  
(and therefore messy and unstructured) setting, especially over the  
issue of representations of plain text fragments. See my recent post  
to the list for an alternative suggestion.

Pat

PS. terrific summary, BTW. I never did like tomatoes.

>  In fact, the only
> violation of previous design choices concerns the
> suggestion/requirement for one particular way of writing string
> literals (namely the non-datatyped one).
>
>> Pat
>>
>>>
>>> (World peace reigns!)
>>>
>>>
>>> This appears to be the result that is being argued for.
>>>
>>>
>>> Peter F. Patel-Schneider
>>> Bell Labs Research
>
> peter
>
>

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Thursday, 21 May 2009 17:24:07 UTC