Re: A summary of the proposal for resolving the issues with rdf:text --> Could you please check it one more time? from Axel Polleres on 2009-05-20 (public-rdf-text@w3.org from April to June 2009)

From: Axel Polleres <axel.polleres@deri.org>
Date: Wed, 20 May 2009 22:17:43 +0100
To: Pat Hayes <phayes@ihmc.us>
CC: Boris Motik <boris.motik@comlab.ox.ac.uk>, Eric Prud'hommeaux <eric@w3.org>, Andy Seaborne <andy.seaborne@hp.com>, Alan Ruttenberg <alanruttenberg@gmail.com>, public-rdf-text@w3.org, Sandro Hawke <sandro@w3.org>
Message-ID: <4A147377.4060805@deri.org>
Pat,

> This  seems to me to be a quite extraordinary and
> amazing observation, one that deserves to be publicized as widely as
> possible (which is why I am CCing this to semantic-web@w3.org).

Allow me to state that I don't find this overly productive (which is why
I have taken that CC out of this reply).

I'd appreciate it if we could simply join forces instead of mutually
accusing each other of trying to break the Semantic Web. That does no
one any good.

Let me remind you that this thread started in order to set up a joint TC
where the stakeholders get together to resolve the issues. Different
alternatives have been put on the table, and their pros and cons
discussed extensively. I'd kindly ask Alan to coordinate the call as
suggested, and would suggest inviting Pat to participate.

Both Pat and Boris (and all the others) have made some very valid points,
in my opinion (I am not speaking for RIF, for SPARQL, or "for
rdf:text" here).

> We are just doing our job, and making the Semantic Web work
> isn't in our WG charter: we are just concerned with RIF/OWL.

Undoubtedly, neither Boris nor I, nor any other single member, can speak
for RIF, for OWL, or for both of these working groups. Such a statement is
therefore surprising.

Undoubtedly, we are facing some fundamental problems here, which seem
not to be restricted to rdf:text alone, but to concern the extensibility
of datatypes and D-entailment in general.
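
To make the kind of problem concrete, here is a minimal Python sketch. It
is an illustration only and is not taken from any of the specs: the names
Literal, value_of, same_value and same_term are hypothetical, the rdf:text
IRI is an assumption, and the value mapping is deliberately simplified.
It shows how two literals can be different terms and still denote the
same value, both for the numeric datatypes and for rdf:text versus plain
literals.

```python
from decimal import Decimal
from typing import NamedTuple, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
# Assumed IRI for rdf:text, used here only for illustration.
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str] = None  # None means a plain literal
    lang: Optional[str] = None      # only meaningful for plain literals


def value_of(lit: Literal):
    """Toy lexical-to-value (L2V) mapping for a handful of datatypes."""
    if lit.datatype is None:
        # Plain literal: the text itself plus an optional language tag.
        return ("text", lit.lexical, (lit.lang or "").lower())
    if lit.datatype == RDF_TEXT:
        # rdf:text lexical forms look like "abc@en"; the tag follows the last "@".
        text, sep, tag = lit.lexical.rpartition("@")
        if not sep:
            raise ValueError(f"not an rdf:text lexical form: {lit.lexical!r}")
        return ("text", text, tag.lower())
    if lit.datatype in (XSD + "integer", XSD + "decimal"):
        # Both numeric types are mapped into one shared number space.
        return ("number", Decimal(lit.lexical))
    raise ValueError(f"unsupported datatype: {lit.datatype}")


def same_term(a: Literal, b: Literal) -> bool:
    """Syntactic comparison, as used by plain graph matching."""
    return a == b


def same_value(a: Literal, b: Literal) -> bool:
    """Value-based comparison, as a D-entailment-aware system would need."""
    return value_of(a) == value_of(b)
```

With this sketch, Literal("1", XSD + "integer") and
Literal("1.0", XSD + "decimal") are different terms with the same value,
and the same holds for Literal("bob@en", RDF_TEXT) and
Literal("bob", lang="en").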

The current status in RIF is that we have put rdf:text in its current
form "at risk" in the RIF documents, hoping the outcome of this
discussion will resolve this for all three specs (OWL, RIF, SPARQL) in a
satisfactory way.

I hope we can set up, in one (or several, if necessary) call(s), a
dedicated task force including all stakeholders to work out a
satisfactory solution.

As a side effect, I am optimistic that this whole discussion can serve
as a basis for D-entailment in SPARQL as well, which would be very much
appreciated.
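
For SPARQL, "option 1" as discussed further down in this thread amounts
to never exposing rdf:text typed literals in the graph that queries are
matched against. The following Python sketch shows one way that could
look. It rests on assumptions: literals are modelled as (lexical,
datatype, lang) tuples, the rdf:text IRI is assumed, and to_plain,
normalize_graph and ask are hypothetical helper names, not part of any
spec or existing tool.

```python
# Assumed IRI for rdf:text, used here only for illustration.
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


def to_plain(obj):
    """Rewrite an rdf:text typed literal as the corresponding plain literal.

    Literals are (lexical, datatype, lang) tuples; anything that is not
    typed with rdf:text is returned unchanged.
    """
    lexical, datatype, lang = obj
    if datatype != RDF_TEXT:
        return obj
    # The language tag follows the last "@" of the lexical form.
    text, sep, tag = lexical.rpartition("@")
    if not sep:
        raise ValueError(f"not a valid rdf:text lexical form: {lexical!r}")
    return (text, None, tag or None)


def normalize_graph(triples):
    """Replace every rdf:text object in a set of triples by a plain literal."""
    return {(s, p, to_plain(o)) for s, p, o in triples}


def ask(graph, pattern):
    """Purely syntactic match of one ground triple against a graph."""
    return pattern in graph
```

A query for ("a", "b", ("bob", None, "en")) does not match a graph that
holds ("a", "b", ("bob@en", RDF_TEXT, None)) as such, but it does match
the graph once normalize_graph has been applied to it.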

with best regards,
Axel

Pat Hayes wrote:
> 
> On May 20, 2009, at 9:57 AM, Boris Motik wrote:
> 
>> Hello,
>>
>> I have to agree that the text in the rdf:text specification might not
>> reflect correctly the intentions I expressed. Quite frankly, we (i.e.,
>> the authors of the rdf:text specification) haven't been really aware of
>> all the repercussions and possible interpretations of our spec. The text
>> you refer to at the end of this e-mail has been introduced as a reaction
>> to one of the earlier comments by the SPARQL WG.
>>
>> Nevertheless, here is what the goals of rdf:text are:
> 
> Thanks for this summary.
> 
>>
>> 1. Both RIF and OWL 2 find the distinction between plain and typed
>> literals painful. This is because, whenever one refers to literals, one
>> needs two subcases: for a plain and for a typed literal.
> 
> So?
> 
>> Both RIF and OWL 2 have independently come up with exactly the same
>> idea: they opted to represent the "semantic content" of plain literals
>> through typed literals whose value is the same as the corresponding
>> plain literals.
> 
> Thanks for making this clear in a public forum. OWL 2 and RIF are 
> deliberately, by design, creating a central incompatibility with a basic 
> feature of RDF. This  seems to me to be a quite extraordinary and 
> amazing observation, one that deserves to be publicized as widely as 
> possible (which is why I am CCing this to semantic-web@w3.org). Why 
> would two W3C WGs set out to deliberately *create* interoperability 
> problems with other W3C standards, just when those standards are 
> beginning to achieve widespread acceptance?
> 
>> This makes the definitions and the
>> semantic treatment of literals in both RIF and OWL 2 much simpler.
> 
> It makes it more elegant, yes, but is there really a PROBLEM here that 
> needs to be solved? That is, what actual issues for users or 
> implementations are posed by the presence of two literal forms? Or is 
> this discomfort simply a matter of theoretician's feelings of inelegance 
> or clumsiness? Because if the latter (as I strongly suspect), this is 
> not a sufficient reason to attempt to retroactively undermine the 
> existing RDF standard, and to deliberately create what I believe will be 
> troublesome and awkward problems for an entire generation of 
> implementations, and certainly for a majority of existing ones. Creating 
> problems like this is exactly what W3C WGs should NOT be doing, 
> especially at a critical point in the deployment of SWeb technology. 
> Google just quietly announced their cautious support for RDFa. It is not 
> exactly a great idea for two W3C WGs to be at that very moment 
> deliberately attempting to undermine one of the basic aspects of the RDF 
> design. Elegance is, frankly, not of central importance right now.
> 
>>
>>
>> 2. Both RIF and OWL 2 need a mechanism to refer to the set of all plain
>> literals. For example, in OWL 2 you might want to say "the range is a
>> piece of text".
> 
> That problem can be trivially solved by introducing a class of such 
> values, and giving it a reserved name. RDF plain literals denote 
> themselves, so that the class of plain literal values is also the class 
> of plain literals which is also the class of pieces of text.
> 
>> In OWL 2 this is very important because of facets. Using a datatype for
>> this purpose is natural.
> 
> Natural, maybe, but not REQUIRED. And given the problems that it causes, 
> maybe it isn't so natural after all. Think of OWL 2 as part of an 
> existing world-wide deployment of SWeb systems, and then ask if it is 
> 'natural'.
> 
>> Both RIF and OWL 2 have chosen to follow the
>> definitions of datatypes from XML Schema. Thus, each datatype consists
>> of a set of lexical values, a value space, and an L2V mapping. Plain
>> literals do not follow these principles
> 
> Of course they do. Abstractly, the plain literal 'datatype' is as 
> follows: the lexical space is all character strings; the value space is 
> all character strings; and the L2V mapping is the identity map. Obvious 
> extension to the case of tagged literals. Where is the conceptual 
> problem here?
> 
>> ; therefore, rdf:text defines lexical values that encode
>> the content of plain literals.
> 
> Giving rise immediately, and predictably, to the interoperability 
> nightmare of there being two ways to represent one ubiquitous kind of 
> thing - a piece of unmarked text, with an optional language tag -  which 
> require exotic means to establish their equivalence, and different specs 
> requiring different ways to be used. This is an elementary 
> systems-engineering mistake, a decision which could have been designed 
> to create global systemic problems. (Even the "managerial" situation of 
> narrowly focussed WGs working on parts of the problem in isolation is 
> classic. Future systems engineering 101 course textbooks will be able to 
> cite this as an example.)
> 
>> Now as I have already said, we have not had the complete story as clear
>> in our minds right from the beginning. Given all the LC comments (which,
>> by the way, have been quite useful and have significantly improved the
>> spec), however, both RIF and OWL 2 have agreed that the view I proposed
>> in my e-mail is the appropriate one (at least from the RIF and OWL 2
>> points of view).
> 
> All SWeb WG's points of view should be primarily to further the 
> deployment of the SWeb.
> 
>> As I've stated
>> in my summary e-mail, to achieve this we simply need to remove from the
>> specification any special treatment of rdf:text: this should be a
>> datatype like any other. This is precisely the part of the document that
>> you are referring to.
>>
>> Thus, the final version of the document would not mention any
>> interoperability problems.
> 
> How wonderful. We will not mention them, so they will have gone away. 
> Or, more precisely, they have not gone away, but they aren't OUR 
> problem.  We are just doing our job, and making the Semantic Web work 
> isn't in our WG charter: we are just concerned with RIF/OWL.
> 
> Sorry about the sarcastic tone, but this really does deserve it.
> 
>> Furthermore, we may also rework the introduction to make the intention
>> behind rdf:text clearer.
> 
> Certainly, the rdf:text document is very misleading as written. It 
> purports to be about representing internationalized text, which is 
> clearly not even close to the truth, and it does not even mention the 
> apparently real motivation, which is (see above) to create 
> incompatibilities with the RDF plain literal design.
> 
> Pat
> 
>>
>> Regards,
>>
>>     Boris
>>
>>> -----Original Message-----
>>> From: Eric Prud'hommeaux [mailto:eric@w3.org]
>>> Sent: 20 May 2009 16:36
>>> To: Boris Motik
>>> Cc: 'Seaborne, Andy'; 'Alan Ruttenberg'; public-rdf-text@w3.org; 'Sandro
>>> Hawke'; 'Axel Polleres'
>>> Subject: Re: A summary of the proposal for resolving the issues with 
>>> rdf:text
>>> --> Could you please check it one more time?
>>>
>>> On Wed, May 20, 2009 at 01:38:29PM +0200, Boris Motik wrote:
>>>> Hello,
>>>>
>>>> I fully appreciate the use case and I agree with your observation: this
>>>> is something that has to be addressed. I don't think, however, that
>>>> solving this problem is in the domain of rdf:text. The rdf:text
>>>> specification merely defines yet another datatype by specifying it in
>>>> exactly the same way as this is done in XML Schema. This datatype is
>>>> just like any other XML Schema datatype; hence, the job from
>>>> rdf:text's point of view is done.
>>>
>>> Ahh, perhaps we have different goals for rdf:text. rdf:text was, if I
>>> understand, created to address the issue that one could not infer the
>>> presence of or the consequences of plain literals. One could fill
>>> that hole by creating a datatype that consumes and infers plain
>>> literals, or one could create a datatype which bijects to plain
>>> literals. Special machinery associated with that datatype is required
>>> in either case.
>>>
>>> (I, who was not involved in rdf:text except as an afterthought,
>>> argue that it is intended to take the former approach. You, an
>>> author, argue something closer to the latter.)
>>>
>>>> Furthermore, the addition of rdf:text to the mix of the supported
>>>> datatypes adds no new conceptual problems to SPARQL: the situation with
>>>> rdf:text is no different than with, say, xsd:integer (there are other
>>>> examples as well). For example, assume that you have an RDF graph
>>>>
>>>> G = { <a, b, "1"^^xsd:integer> }
>>>>
>>>> but you ask the query
>>>>
>>>> Q = { <a, b, "1.0"^^xsd:decimal> }.
>>>>
>>>> Clearly, G D-entails Q, so Q should be answered as TRUE in G. It is
>>>> not the business of XML Schema to specify how this is to be achieved:
>>>> XML Schema merely specifies what the correct answer to the above
>>>> question is. It is a SPARQL implementation such as OWLIM that should
>>>> think of how to support such a definition.
>>>
>>> SPARQL is defined in terms of the graph, so Q will fail to match G. As
>>> entailments supplement the graph, a D-entailing system confronted with
>>>  <a, b, "1"^^xsd:integer>
>>>
>>> will have a (notional) graph
>>>  G = { <a, b, "1"^^xsd:integer> .
>>>        <a, b, "1.0"^^xsd:decimal> . }.
>>>
>>> I'd say that we're arguing whether <a, b, "bob@en"^^rdf:text> shows up
>>> in the graph. You propose something like:
>>>  <a, b, "bob@en"^^rdf:text> D-entails to
>>>  G = { <a, b, "bob@en"^^rdf:text> .
>>>        <a, b, "bob"@en> . }.
>>> while I propose that you never utter <a, b, "bob@en"^^rdf:text> and
>>> have the tools that implement the specification produce only
>>> <a, b, "bob"@en>.
>>>
>>>
>>>> I don't know whether a solution to the above problem (with
>>>> xsd:integer and xsd:decimal) exists. If not, I agree that one should be
>>>> developed; however, we would not go to the XML Schema WG and ask them to
>>>> specify how SPARQL should handle this case, would we?
>>>>
>>>> The problem with rdf:text is *precisely* the same as the one that I
>>>> outlined above. At an abstract level, it can be stated as "Several
>>>> syntactic forms of literals get mapped to the semantically identical
>>>> data values". As demonstrated above, this problem exists without
>>>> rdf:text, so I don't see how rdf:text brings anything new into the
>>>> whole picture. Thus, you can apply to the rdf:text case exactly the
>>>> same solution that you would apply to xsd:integer and xsd:decimal.
>>>
>>> Your proposal is analogous to the D-entailment of numeric types, while
>>> I interpret the rdf:text last call wording as attempting to reduce the
>>> interop challenges that would stem from spotty coverage with respect
>>> to that D-entailment.
>>>
>>>
>>>> If such a solution doesn't exist yet, then the SPARQL WG should address
>>>> these issues, and it should do so in general for all datatypes
>>>> (xsd:integer, xsd:decimal, and so on), not just for rdf:text.
>>>
>>> I'd argue that it's more of an RDF Core issue (admitting that they
>>> don't exist). To solve an entailment especially for SPARQL sidesteps
>>> the other folks who want to know what's in the graph, for instance, an
>>> RDF graph API (such as exist in Jena, Sesame, ...), other entailment
>>> regimes that may or may not stack on top of OWL (imagine an
>>> app-directed regime like FOAF smushing), as well as secondary
>>> consumers of RDF graphs, for instance, an XSLT which runs on the XML
>>> results returned from a SPARQL query service.
>>>
>>>
>>>> To summarize, I think that the work from the point of view of the
>>>> rdf:text WG is *done* and that we should not do anything else in this
>>>> forum.
>>>
>>> Andy has argued that approach 1 is the only one of the 3 that is
>>> compatible with this text from the last call document:
>>> [[
>>> Despite the semantic equivalence between typed rdf:text literals and
>>> plain literals, the presence of typed rdf:text literals in an RDF
>>> graph might cause interoperability problems between RDF tools, as not
>>> all RDF tools will support rdf:text. Therefore, before exchanging an
>>> RDF graph with other RDF tools, an RDF tool that supports rdf:text MUST
>>> replace in the graph each typed rdf:text literal with the
>>> corresponding plain literal. The notion of graph exchange includes,
>>> but is not limited to, the process of serializing an RDF graph using
>>> any (normative or nonnormative) RDF syntax.
>>> ]]. Approach 1 is clarifying the boundaries of the above graph exchange.
>>>
>>>> Regards,
>>>>
>>>>     Boris
>>>>
>>>>> -----Original Message-----
>>>>> From: Eric Prud'hommeaux [mailto:eric@w3.org]
>>>>> Sent: 20 May 2009 13:18
>>>>> To: Boris Motik
>>>>> Cc: 'Seaborne, Andy'; 'Alan Ruttenberg'; public-rdf-text@w3.org; 
>>>>> 'Sandro
>>>>> Hawke'; 'Axel Polleres'
>>>>> Subject: Re: A summary of the proposal for resolving the issues with
>>> rdf:text
>>>>> --> Could you please check it one more time?
>>>>>
>>>>> On Wed, May 20, 2009 at 09:29:00AM +0200, Boris Motik wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I don't see the benefit of option 1, as it makes things unnecessarily
>>>>>> complex. The fewer exceptions we have, the easier it will be to
>>>>>> actually implement a conformant system. The dichotomy between plain
>>>>>> and typed literals is just an example of an exception that just makes
>>>>>> implementation difficult. Instead of introducing more special cases,
>>>>>> I think we should unify these whenever possible.
>>>>>>
>>>>>> Furthermore, I'm not sure whether sorting out things such as the ones
>>>>>> pointed out below is necessary to finalize the rdf:text specification.
>>>>>> Please note that rdf:text already has a well-defined lexical and value
>>>>>> space, and this is *the only* thing that we need to be able to plug
>>>>>> rdf:text into the model theory of RDF. That is, given RDF graphs G1
>>>>>> and G2 possibly containing rdf:text literals and/or plain literals,
>>>>>> using the definitions from the present rdf:text specification one can
>>>>>> unambiguously answer the question whether G1 D-entails G2.
>>>>>> For example, if G1 is
>>>>>>
>>>>>> <a, b, "abc@en"^^rdf:text>
>>>>>>
>>>>>> and G2 is
>>>>>>
>>>>>> <a, b, "abc"@en>
>>>>>>
>>>>>> then, according to the existing RDF model theory document, G1
>>>>>> D-entails G2 and vice versa. I don't see what else is there for the
>>>>>> rdf:text specification to do: I really think that the specification is
>>>>>> complete. If SPARQL or other specifications want to apply rdf:text in
>>>>>> a different way and create special cases, they are free to do so;
>>>>>> however, I don't think it is in scope of the rdf:text specification to
>>>>>> solve all such problems.
>>>>>
>>>>> (Hesitantly re-stating use case), consider the use case of the OWLIM
>>>>> plugin for Sesame. If OWLIM forward chains some triples into the
>>>>> Sesame repository with objects like "bob"@en, existing SPARQL queries
>>>>> on the existing Sesame engine will match them as expected. RIF rules
>>>>> can consume those triples and know that any rules applying to a domain
>>>>> of rdf:text apply.
>>>>>
>>>>> Contrast that with an OWLIM which emits triples with objects like
>>>>> "bob@en"^^rdf:text . These triples will not match conventional queries
>>>>> intended to discover e.g. all the folks named "Bob". The Sesame SPARQL
>>>>> implementation can be extended, but then we are in Pat's scenario of
>>>>> fixing RDF by visiting all the deployed code.
>>>>>
>>>>> I expect that any design of rdf:text would have it reacting to plain
>>>>> literals as if they had a datatype of rdf:text and the appropriate
>>>>> lexical transformation. I propose that the simplest complete design is
>>>>> one where the inference of rdf:text objects results in their
>>>>> expression as plain literals, avoiding a dualism between
>>>>> "bob@en"^^rdf:text and "bob"@en which would lose interoperability
>>>>> with existing queries, graph APIs, XPaths operating on SPARQL Results,
>>>>> non-OWL inferencing systems, ...
>>>>>
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>     Boris
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-
>>>>> request@w3.org]
>>>>>>> On Behalf Of Eric Prud'hommeaux
>>>>>>> Sent: 20 May 2009 03:18
>>>>>>> To: Seaborne, Andy
>>>>>>> Cc: Alan Ruttenberg; public-rdf-text@w3.org; Boris Motik; Sandro
>>> Hawke;
>>>>> Axel
>>>>>>> Polleres
>>>>>>> Subject: Re: A summary of the proposal for resolving the issues with
>>>>> rdf:text
>>>>>>> --> Could you please check it one more time?
>>>>>>>
>>>>>>> On Tue, May 19, 2009 at 03:57:11PM +0000, Seaborne, Andy wrote:
>>>>>>>> Apologies:
>>>>>>>>
>>>>>>>>> On Fri, May 15, 2009 at 11:50 AM, Seaborne, Andy <andy.seaborne@hp.com> wrote:
>>>>>>>>>> Monday PM end before 18:00 (GMT+1)
>>>>>>>>>> Thursday PM.
>>>>>>>>>> Tuesday @17:00 (GMT+1) for a short call; end before 17:30.
>>>>>>>>
>>>>>>>> I can't make the slot.
>>>>>>>>
>>>>>>>> Input: please consider interoperability of data between OWL and
>>>>>>>> RDF. Option 1 is better for that than option 2 as Eric points out.
>>>>>>>>
>>>>>>>> This is also the least change to LC and IMHO is not a substantive
>>>>>>>> change (it follows on from the current graph exchange intent) to
>>>>>>>> add the text needed for SPARQL.  Roughly: the scoping graph of an
>>>>>>>> rdf-text aware D-entailment for BGP matching includes the RDF forms
>>>>>>>> and does not include ^^rdf:text. (Non-aware entailment regimes
>>>>>>>> would merely treat as a datatype form.)
>>>>>>>
>>>>>>> does anyone oppose option 1 (plain literals are considered to
>>>>>>> satisfy entailments constrained to type rdf:text and entailments of
>>>>>>> type rdf:text are expressed as plain literals in the RDF graph)?
>>>>>>> (i'm wondering if we can work this out before we work out scheduling
>>>>>>> this phone call.)
>>>>>>>
>>>>>>>
>>>>>>>>     Andy
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Alan Ruttenberg [mailto:alanruttenberg@gmail.com]
>>>>>>>>> Sent: 19 May 2009 16:01
>>>>>>>>> To: Axel Polleres
>>>>>>>>> Cc: Seaborne, Andy; public-rdf-text@w3.org; Boris Motik; Sandro
>>> Hawke;
>>>>>>>>> eric@w3.orf
>>>>>>>>> Subject: Re: A summary of the proposal for resolving the issues
>>> with
>>>>>>>>> rdf:text --> Could you please check it one more time?
>>>>>>>>>
>>>>>>>>> On Mon, May 18, 2009 at 10:03 AM, Axel Polleres <axel.polleres@deri.org> wrote:
>>>>>>>>>> Alan, since you were calling for the TC, is that fixed now?
>>>>>>>>>> Otherwise, I am afraid it is not possible before Friday.
>>>>>>>>>
>>>>>>>>> Yes, let's have whoever can make it meet at 5:30 BST = 12:30
>>>>>>>>> Boston time.
>>>>>>>>> Zakim, meet on irc #rdftext for the code. I will send a code
>>>>>>>>> earlier if I can.
>>>>>>>>>
>>>>>>>>> -Alan
>>>>>>>
>>>>>
>>>
>>> -- 
>>> -eric
>>>
>>> office: +1.617.258.5741 32-G528, MIT, Cambridge, MA 02144 USA
>>> mobile: +1.617.599.3509
>>>
>>> (eric@w3.org)
>>> Feel free to forward this message to any list for any purpose other than
>>> email address distribution.
>>
>>
>>
>>
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> 
> 
> 
> 
> 


-- 
Dr. Axel Polleres
Digital Enterprise Research Institute, National University of Ireland,
Galway
email: axel.polleres@deri.org  url: http://www.polleres.net/
Received on Wednesday, 20 May 2009 21:18:45 UTC