Re: A summary of the proposal for resolving the issues with rdf:text --> Could you please check it one more time? from Harry Halpin on 2009-05-21 (semantic-web@w3.org from May 2009)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Thu, 21 May 2009 20:38:31 +0200
To: Pat Hayes <phayes@ihmc.us>
Cc: Boris Motik <boris.motik@comlab.ox.ac.uk>, "Eric Prud'hommeaux" <eric@w3.org>, Andy Seaborne <andy.seaborne@hp.com>, Alan Ruttenberg <alanruttenberg@gmail.com>, public-rdf-text@w3.org, Semantic Web <semantic-web@w3.org>, Sandro Hawke <sandro@w3.org>, Axel Polleres <axel.polleres@deri.org>
Message-ID: <b3be92a00905211138y6f7ac9b9oea35545e6d1e564@mail.gmail.com>
Quick note -

    I have not been tracking this issue, but in general it does seem a very
bad idea (well worthy of a formal objection) to create an incompatibility
with something as basic as a text string, a la plain literal. As this seems
to be an issue almost entirely motivated by formal semantics, I see *no*
reason why formal semantic motivations should cause pain for users and
already existing data.

 Why not just have plain literals in RIF and OWL2 default to being
treated as having a datatype of "rdf:text" (or whatever is needed in
the formal semantics), and never require any explicit editing of
existing work by users?

In particular, "Family Guy" would then default to "Family Guy@". Why
is this option not tenable? It seems rather sensible to me, but I assume
there *must* be some reason for not doing it that way.
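
A rough sketch of the defaulting I have in mind, assuming the rdf:text
lexical form is simply the string, then "@", then an optional language
tag (as in the "bob@en" examples quoted below). The function names are
made up for illustration and are not taken from any spec or toolkit.

```python
from typing import Optional, Tuple


def plain_to_rdf_text(value: str, lang: Optional[str] = None) -> str:
    """Return the assumed rdf:text lexical form for a plain literal.

    A literal without a language tag gets an empty tag, so
    "Family Guy" becomes "Family Guy@".
    """
    return f"{value}@{lang or ''}"


def rdf_text_to_plain(lexical: str) -> Tuple[str, Optional[str]]:
    """Recover (string, language tag) from an rdf:text lexical form.

    Splits on the LAST "@", since the string part may itself contain
    "@" while a language tag cannot.
    """
    value, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError("not an rdf:text lexical form: missing '@'")
    return value, (lang or None)


if __name__ == "__main__":
    print(plain_to_rdf_text("Family Guy"))
    print(plain_to_rdf_text("bob", "en"))
    print(rdf_text_to_plain("bob@en"))
```

The point is that the mapping is mechanical in both directions, so a
tool could apply it internally without users ever having to touch
their data.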


On Wed, May 20, 2009 at 7:20 PM, Pat Hayes <phayes@ihmc.us> wrote:
>
> On May 20, 2009, at 9:57 AM, Boris Motik wrote:
>
>> Hello,
>>
>> I have to agree that the text in the rdf:text specification might not
>> reflect correctly the intentions I expressed. Quite frankly, we (i.e., the
>> authors of the rdf:text specification) haven't been really aware of all
>> the repercussions and possible interpretations of our spec. The text you
>> refer to at the end of this e-mail has been introduced as a reaction to
>> one of the earlier comments by the SPARQL WG.
>>
>> Nevertheless, here is what the goals of rdf:text are:
>
> Thanks for this summary.
>
>>
>> 1. Both RIF and OWL 2 find the distinction between plain and typed
>> literals painful. This is because, whenever one refers to literals, one
>> needs two subcases: for a plain and for a typed literal.
>
> So?
>
>> Both RIF and OWL 2 have independently come up with exactly the same idea:
>> they opted to represent the "semantic content" of plain literals through
>> typed literals whose value is the same as the corresponding plain literals.
>
> Thanks for making this clear in a public forum. OWL 2 and RIF are
> deliberately, by design, creating a central incompatibility with a basic
> feature of RDF. This  seems to me to be a quite extraordinary and amazing
> observation, one that deserves to be publicized as widely as possible (which
> is why I am CCing this to semantic-web@w3.org). Why would two W3C WGs set
> out to deliberately *create* interoperability problems with other W3C
> standards, just when those standards are beginning to achieve widespread
> acceptance?
>
>> This makes the definitions and the
>> semantic treatment of literals in both RIF and OWL 2 much simpler.
>
> It makes it more elegant, yes, but is there really a PROBLEM here that needs
> to be solved? That is, what actual issues for users or implementations are
> posed by the presence of two literal forms? Or is this discomfort simply a
> matter of theoretician's feelings of inelegance or clumsiness? Because if
> the latter (as I strongly suspect), this is not a sufficient reason to
> attempt to retroactively undermine the existing RDF standard, and to
> deliberately create what I believe will be troublesome and awkward problems
> for an entire generation of implementations, and certainly for a majority of
> existing ones. Creating problems like this is exactly what W3C WGs should
> NOT be doing, especially at a critical point in the deployment of SWeb
> technology. Google just quietly announced their cautious support for RDFa.
> It is not exactly a great idea for two W3C WGs to be at that very moment
> deliberately attempting to undermine one of the basic aspects of the RDF
> design. Elegance is, frankly, not of central importance right now.
>
>>
>>
>> 2. Both RIF and OWL 2 need a mechanism to refer to the set of all plain
>> literals. For example, in OWL 2 you might want to say "the range is a
>> piece of text".
>
> That problem can be trivially solved by introducing a class of such values,
> and giving it a reserved name. RDF plain literals denote themselves, so that
> the class of plain literal values is also the class of plain literals which
> is also the class of pieces of text.
>
>> In OWL 2 this is very important because of facets. Using a datatype for
>> this purpose is natural.
>
> Natural, maybe, but not REQUIRED. And given the problems that it causes,
> maybe it isn't so natural after all. Think of OWL 2 as part of an existing
> world-wide deployment of SWeb systems, and then ask if it is 'natural'.
>
>> Both RIF and OWL 2 have chosen to follow the definitions of datatypes from
>> XML Schema. Thus, each datatype consists of a set of lexical values, a
>> value space, and an L2V mapping. Plain literals do not follow these
>> principles
>
> Of course they do. Abstractly, the plain literal 'datatype' is as follows:
> the lexical space is all character strings; the value space is all character
> strings; and the L2V mapping is the identity map. Obvious extension to the
> case of tagged literals. Where is the conceptual problem here?
>
>> ; therefore, rdf:text defines lexical values that encode
>> the content of plain literals.
>
> Giving rise immediately, and predictably, to the interoperability nightmare
> of there being two ways to represent one ubiquitous kind of thing - a piece
> of unmarked text, with an optional language tag -  which require exotic
> means to establish their equivalence, and different specs requiring
> different ways to be used. This is an elementary systems-engineering
> mistake, a decision which could have been designed to create global systemic
> problems. (Even the "managerial" situation of narrowly focussed WGs working
> on parts of the problem in isolation is classic. Future systems engineering
> 101 course textbooks will be able to cite this as an example.)
>
>> Now as I have already said, we have not had the complete story as clear in
>> our minds right from the beginning. Given all the LC comments (which, by
>> the way, have been quite useful and have significantly improved the spec),
>> however, both RIF and OWL 2 have agreed that the view I proposed in my
>> e-mail is the appropriate one (at least from the RIF and OWL 2 points of
>> view).
>
> All SWeb WG's points of view should be primarily to further the deployment
> of the SWeb.
>
>> As I've stated in my summary e-mail, to achieve this we simply need to
>> remove from the specification any special treatment of rdf:text: this
>> should be a datatype like any other. This is precisely the part of the
>> document that you are referring to.
>>
>> Thus, the final version of the document would not mention any
>> interoperability problems.
>
> How wonderful. We will not mention them, so they will have gone away. Or,
> more precisely, they have not gone away, but they aren't OUR problem.  We
> are just doing our job, and making the Semantic Web work isn't in our WG
> charter: we are just concerned with RIF/OWL.
>
> Sorry about the sarcastic tone, but this really does deserve it.
>
>> Furthermore, we may also rework the introduction to make the intention
>> behind rdf:text clearer.
>
> Certainly, the rdf:text document is very misleading as written. It purports
> to be about representing internationalized text, which is clearly not even
> close to the truth, and it does not even mention the apparently real
> motivation, which is (see above) to create incompatibilities with the RDF
> plain literal design.
>
> Pat
>
>>
>> Regards,
>>
>>        Boris
>>
>>> -----Original Message-----
>>> From: Eric Prud'hommeaux [mailto:eric@w3.org]
>>> Sent: 20 May 2009 16:36
>>> To: Boris Motik
>>> Cc: 'Seaborne, Andy'; 'Alan Ruttenberg'; public-rdf-text@w3.org; 'Sandro Hawke'; 'Axel Polleres'
>>> Subject: Re: A summary of the proposal for resolving the issues with rdf:text --> Could you please check it one more time?
>>>
>>> On Wed, May 20, 2009 at 01:38:29PM +0200, Boris Motik wrote:
>>>>
>>>> Hello,
>>>>
>>>> I fully appreciate the use case and I agree with your observation: this
>>>> is something that has to be addressed. I don't think, however, that
>>>> solving this problem is in the domain of rdf:text. The rdf:text
>>>> specification merely defines yet another datatype by specifying it in
>>>> exactly the same way as this is done in XML Schema. This datatype is just
>>>> like any other XML Schema datatype; hence, the job from rdf:text's point
>>>> of view is done.
>>>
>>> Ahh, perhaps we have different goals for rdf:text. rdf:text was, if I
>>> understand, created to address the issue that one could not infer the
>>> presence of or the consequences of plain literals. One could fill
>>> that hole by creating a datatype that consumes and infers plain
>>> literals, or one could create a datatype which bijects to plain
>>> literals. Special machinery associated with that datatype is required
>>> in either case.
>>>
>>> (I, who was not involved in rdf:text except as an afterthought,
>>> argue that it is intended to take the former approach. You, an
>>> author, argue something closer to the latter.)
>>>
>>>> Furthermore, the addition of rdf:text to the mix of the supported
>>>> datatypes adds no new conceptual problems to SPARQL: the situation with
>>>> rdf:text is no different than with, say, xsd:integer (there are other
>>>> examples as well). For example, assume that you have an RDF graph
>>>>
>>>> G = { <a, b, "1"^^xsd:integer> }
>>>>
>>>> but you ask the query
>>>>
>>>> Q = { <a, b, "1.0"^^xsd:decimal> }.
>>>>
>>>> Clearly, G D-entails Q, so Q should be answered as TRUE in G. It is not
>>>> the business of XML Schema to specify how this is to be achieved: XML
>>>> Schema merely specifies what the correct answer to the above question is.
>>>> It is a SPARQL implementation such as OWLIM that should think of how to
>>>> support such a definition.
>>>
>>> SPARQL is defined in terms of the graph, so Q will fail to match G. As
>>> entailments supplement the graph, a D-entailing system confronted with
>>>  <a, b, "1"^^xsd:integer>
>>>
>>> will have a (notional) graph
>>>  G = { <a, b, "1"^^xsd:integer> .
>>>       <a, b, "1.0"^^xsd:decimal> . }.
>>>
>>> I'd say that we're arguing whether <a, b, "bob@en"^^rdf:text> shows up
>>> in the graph. You propose something like:
>>>  <a, b, "bob@en"^^rdf:text> D-entails to
>>>  G = { <a, b, "bob@en"^^rdf:text> .
>>>       <a, b, "bob"@en> . }.
>>> while I propose that you never utter <a, b, "bob@en"^^rdf:text> and
>>> have the tools that implement the specification produce only
>>> <a, b, "bob"@en>.
>>>
>>>
>>>> I don't know whether a solution to the above problem (with xsd:integer
>>>> and xsd:decimal) exists. If not, I agree that one should be developed;
>>>> however, we would not go to the XML Schema WG and ask them to specify how
>>>> SPARQL should handle this case, would we?
>>>>
>>>> The problem with rdf:text is *precisely* the same as the one that I
>>>> outlined above. At an abstract level, it can be stated as "Several
>>>> syntactic forms of literals get mapped to the semantically identical data
>>>> values". As demonstrated above, this problem exists without rdf:text, so
>>>> I don't see how rdf:text brings anything new into the whole picture. Thus,
>>>> you can apply to the rdf:text case exactly the same solution that you
>>>> would apply to xsd:integer and xsd:decimal.
>>>
>>> Your proposal is analogous to the D-entailment of numeric types, while
>>> I interpret the rdf:text last call wording as attempting to reduce the
>>> interop challenges that would stem from spotty coverage with respect
>>> to that D-entailment.
>>>
>>>
>>>> If such a solution doesn't exist yet, then the SPARQL WG should address
>>>> these issues, and it should do so in general for all datatypes
>>>> (xsd:integer, xsd:decimal, and so on), not just for rdf:text.
>>>
>>> I'd argue that it's more of an RDF Core issue (admitting that they
>>> don't exist). To solve an entailment especially for SPARQL sidesteps
>>> the other folks who want to know what's in the graph, for instance, an
>>> RDF graph API (such as exist in Jena, Sesame, ...), other entailment
>>> regimes that may or may not stack on top of OWL (imagine an
>>> app-directed regime like FOAF smushing), as well as secondary
>>> consumers of RDF graphs, for instance, an XSLT which runs on the XML
>>> results returned from a SPARQL query service.
>>>
>>>
>>>> To summarize, I think that the work from the point of view of the
>>>> rdf:text WG is *done* and that we should not do anything else in this
>>>> forum.
>>>
>>> Andy has argued that approach 1 is the only one of the 3 that is
>>> compatible with this text from the last call document:
>>> [[
>>> Despite the semantic equivalence between typed rdf:text literals and
>>> plain literals, the presence of typed rdf:text literals in an RDF
>>> graph might cause interoperability problems between RDF tools, as not
>>> all RDF tools will support rdf:text. Therefore, before exchanging an
>>> RDF graph with other RDF tools, an RDF tool that supports rdf:text MUST
>>> replace in the graph each typed rdf:text literal with the
>>> corresponding plain literal. The notion of graph exchange includes,
>>> but is not limited to, the process of serializing an RDF graph using
>>> any (normative or nonnormative) RDF syntax.
>>> ]]. \1 is clarifying the boundaries of the above graph exchange.
>>>
>>>> Regards,
>>>>
>>>>        Boris
>>>>
>>>>> -----Original Message-----
>>>>> From: Eric Prud'hommeaux [mailto:eric@w3.org]
>>>>> Sent: 20 May 2009 13:18
>>>>> To: Boris Motik
>>>>> Cc: 'Seaborne, Andy'; 'Alan Ruttenberg'; public-rdf-text@w3.org; 'Sandro Hawke'; 'Axel Polleres'
>>>>> Subject: Re: A summary of the proposal for resolving the issues with rdf:text --> Could you please check it one more time?
>>>>>
>>>>> On Wed, May 20, 2009 at 09:29:00AM +0200, Boris Motik wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I don't see the benefit of option 1, as it makes things unnecessarily
>>>>>> complex. The fewer exceptions we have, the easier it will be to actually
>>>>>> implement a conformant system. The dichotomy between plain and typed
>>>>>> literals is just an example of an exception that just makes
>>>>>> implementation difficult. Instead of introducing more special cases, I
>>>>>> think we should unify these whenever possible.
>>>>>>
>>>>>> Furthermore, I'm not sure whether sorting out things such as the ones
>>>>>> pointed out below is necessary to finalize the rdf:text specification.
>>>>>> Please note that rdf:text already has a well-defined lexical and value
>>>>>> space, and this is *the only* thing that we need to be able to plug
>>>>>> rdf:text into the model theory of RDF. That is, given RDF graphs G1 and
>>>>>> G2 possibly containing rdf:text literals and/or plain literals, using the
>>>>>> definitions from the present rdf:text specification one can unambiguously
>>>>>> answer the question whether G1 D-entails G2.
>>>>>>
>>>>>> For example, if G1 is
>>>>>>
>>>>>> <a, b, "abc@en"^^rdf:text>
>>>>>>
>>>>>> and G2 is
>>>>>>
>>>>>> <a, b, "abc"@en>
>>>>>>
>>>>>> then, according to the existing RDF model theory document, G1 D-entails
>>>>>> G2 and vice versa. I don't see what else is there for the rdf:text
>>>>>> specification to do: I really think that the specification is complete.
>>>>>> If SPARQL or other specifications want to apply rdf:text in a different
>>>>>> way and create special cases, they are free to do so; however, I don't
>>>>>> think it is in scope of the rdf:text specification to solve all such
>>>>>> problems.
>>>>>
>>>>> (Hesitantly re-stating use case), consider the use case of the OWLIM
>>>>> plugin for Sesame. If OWLIM forward chains some triples into the
>>>>> Sesame repository with objects like "bob"@en, existing SPARQL queries
>>>>> on the existing Sesame engine will match them as expected. RIF rules
>>>>> can consume those triples and know that any rules applying to a domain
>>>>> of rdf:text apply.
>>>>>
>>>>> Contrast that with an OWLIM which emits triples with objects like
>>>>> "bob@en"^^rdf:text . These triples will not match conventional queries
>>>>> intended to discover e.g. all the folks named "Bob". The Sesame SPARQL
>>>>> implementation can be extended, but then we are in Pat's scenario of
>>>>> fixing RDF by visiting all the deployed code.
>>>>>
>>>>> I expect that any design of rdf:text would have it reacting to plain
>>>>> literals as if they had a datatype of rdf:text and the appropriate
>>>>> lexical transformation. I propose that the simplest complete design is
>>>>> one where the inference of rdf:text objects results in their
>>>>> expression as plain literals, avoiding a dualism between
>>>>> "bob@en"^^rdf:text and "bob"@en which would lose interoperability
>>>>> with existing queries, graph APIs, XPaths operating on SPARQL Results,
>>>>> non-OWL inferencing systems, ...
>>>>>
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>        Boris
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
>>>>>>> On Behalf Of Eric Prud'hommeaux
>>>>>>> Sent: 20 May 2009 03:18
>>>>>>> To: Seaborne, Andy
>>>>>>> Cc: Alan Ruttenberg; public-rdf-text@w3.org; Boris Motik; Sandro Hawke; Axel Polleres
>>>>>>> Subject: Re: A summary of the proposal for resolving the issues with rdf:text --> Could you please check it one more time?
>>>>>>>
>>>>>>> On Tue, May 19, 2009 at 03:57:11PM +0000, Seaborne, Andy wrote:
>>>>>>>>
>>>>>>>> Apologies:
>>>>>>>>
>>>>>>>>> On Fri, May 15, 2009 at 11:50 AM, Seaborne, Andy <andy.seaborne@hp.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Monday PM end before 18:00 (GMT+1)
>>>>>>>>>> Thursday PM.
>>>>>>>>>> Tuesday @17:00 (GMT+1) for a short call; end before 17:30.
>>>>>>>>
>>>>>>>> I can't make the slot.
>>>>>>>>
>>>>>>>> Input: please consider interoperability of data between OWL and RDF.
>>>>>>>> Option 1 is better for that than option 2 as Eric points out.
>>>>>>>>
>>>>>>>> This is also the least change to LC and IMHO is not a substantive
>>>>>>>> change (it follows on from the current graph exchange intent) to add
>>>>>>>> the text needed for SPARQL.  Roughly: the scoping graph of an rdf-text
>>>>>>>> aware D-entailment for BGP matching includes the RDF forms and does not
>>>>>>>> include ^^rdf:text. (Non-aware entailment regimes would merely treat it
>>>>>>>> as a datatype form.)
>>>>>>>
>>>>>>> does anyone oppose option 1 (plain literals are considered to satisfy
>>>>>>> entailments constrained to type rdf:text and entailments of type
>>>>>>> rdf:text are expressed as plain literals in the RDF graph)? (i'm
>>>>>>> wondering if we can work this out before we work out scheduling this
>>>>>>> phone call.)
>>>>>>>
>>>>>>>
>>>>>>>>        Andy
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Alan Ruttenberg [mailto:alanruttenberg@gmail.com]
>>>>>>>>> Sent: 19 May 2009 16:01
>>>>>>>>> To: Axel Polleres
>>>>>>>>> Cc: Seaborne, Andy; public-rdf-text@w3.org; Boris Motik; Sandro Hawke; eric@w3.orf
>>>>>>>>> Subject: Re: A summary of the proposal for resolving the issues with rdf:text --> Could you please check it one more time?
>>>>>>>>>
>>>>>>>>> On Mon, May 18, 2009 at 10:03 AM, Axel Polleres <axel.polleres@deri.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Alan, since you were calling for the TC, is that fixed now?
>>>>>>>>>> Otherwise, I am afraid it is not possible before Friday.
>>>>>>>>>
>>>>>>>>> Yes, let's have whoever can make it meet at 5:30 BST = 12:30 Boston
>>>>>>>>> time.
>>>>>>>>> Zakim, meet on irc #rdftext for the code. I will send a code earlier
>>>>>>>>> if I can.
>>>>>>>>>
>>>>>>>>> -Alan
>>>>>>>
>>>>>
>>>
>>> --
>>> -eric
>>>
>>> office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
>>> mobile: +1.617.599.3509
>>>
>>> (eric@w3.org)
>>> Feel free to forward this message to any list for any purpose other than
>>> email address distribution.
>>
>>
>>
>>
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>
Received on Thursday, 21 May 2009 18:39:14 UTC