Re: RDF *already* supports literal subjects - a thought experiment from Paola Di Maio on 2010-07-13 (semantic-web@w3.org from July 2010)

From: Paola Di Maio <paola.dimaio@gmail.com>
Date: Tue, 13 Jul 2010 08:08:51 +0000
To: Semantic Web <semantic-web@w3.org>
Message-ID: <AANLkTikfTfAkGGpXV5nwCFF497HR7C8e1CczjH9xd4_g@mail.gmail.com>
In Systems Engineering (system sciences these days), differing theoretical
assumptions are settled by setting up rigorous benchmarking/testing
and measuring the corresponding performance levels, costs, benefits, etc. in
painstaking detail.

Then speak, with facts in hand.

Only thorough, real measurements can demonstrate whether the respective
propositions hold true.

For this to work (and not be biased), it is necessary to ensure the
soundness and neutrality of the tests.

Otherwise it is just a matter of who is capable of arguing the
strongest/longest (and some are more inclined than others to do so).

The world has been waiting to use the semantic web for years; something,
somewhere, must be going wrong.


P


On Mon, Jul 12, 2010 at 11:11 PM, Pat Hayes <phayes@ihmc.us> wrote:

>
> On Jul 12, 2010, at 4:11 PM, Graham Klyne wrote:
>
>  Sandro Hawke wrote:
>>
>>> Hi Graham
>>>>
>>>>> So far, all this should lead to intended-literals in subject position
>>>>> that can be read by any existing RDF/XML consuming application.
>>>>>
>>>>> What I'm less sure about is fixing the semantics: as it stands, the RDF
>>>>> semantics is expressed in terms of allowing arbitrary interpretations --
>>>>> mappings to things in the domain of discourse -- for all URI nodes in a
>>>>> graph. Would it be unreasonable or problematic to say that, for this
>>>>> particular form of URI, the denotation is fixed by the same general rules
>>>>> that govern the denotation of literals?
>>>>>
>>>>
>>>> No, but it would be a semantic extension to RDF, so the folk who have
>>>> invested so much into implementing RDF as of 2004 will not support it. So
>>>> if this is standardized, their engines will not work properly without
>>>> changing some code. So they will not be happy, for the same reasons they
>>>> are not happy with the current suggestion. Like most such suggestions
>>>> along these lines, it will produce problems of its own, the most obvious
>>>> being that we would then have two syntactically distinct but semantically
>>>> equivalent ways to write every literal in the places where literals are
>>>> permitted, requiring engines to check for all these different forms all the
>>>> time (in fact, to check *every* URI in any RDF just in case it is a hidden
>>>> literal.) In the case of plain literals, we would actually have four such
>>>> ways to write them instead of the two we have now.
>>>>
>>>> Although it's ingenious, I think this is laying land-mines for future
>>>> developers.
>>>>
>>> Still, it might be a good way to grandfather old systems and old
>>> syntaxes, at some point.   The duplication could be avoided just by
>>> saying don't do that.  (That is: never serialize as a data-uri-literal
>>> when you can syntactically use a real literal instead.)
>>>
>>
>> Hi Pat, Sandro,
>>
>> I think Sandro's response crystalizes what I was trying to suggest.
>>
>> To rewind a little, one of the biggest problems of standards deployment,
>> once one has an installed base, is to plot a suitable migration path.  That
>> is, deployment of a new feature should not break old systems.
>>
>> Maybe my view is limited, but my perception is that most deployed software
>> toolkits don't actually implement the formal semantics.  (I don't mean to
>> imply the formal semantics are not important - I think they are but, at the
>> current state of development, more of a guide to developers and data model
>> designers than enforced in software.)  With such a view, a change in the
>> formal semantics to fix (as in constrain, not repair) a family of URIs would
>> have little if any practical effect on deployed software.
>>
>> Taking a slightly different approach:  introducing the data: URIs as
>> suggested and not changing the RDF semantics would be entirely consistent
>> with today's RDF semantics; some of the intended inferences would not be
>> required by current semantics, though would not be disallowed or
>> inconsistent.  Thus, completeness of RDF semantics based inferences with
>> respect to the intended semantics would be sacrificed, but soundness would
>> not.
>>
>> ...
>>
>> So, if one truly does feel a need to introduce literals-as-subjects into
>> RDF' (RDF-prime), how is one to deal with existing RDF processing systems?
>> Providing a URI-compatible form for literals seems a reasonable bridging
>> option.  But how does one minimize the cost of alternate forms for literals?
>>
>> I think the answer may lie in avoiding alternative forms in the abstract
>> syntax (with respect to which the formal semantics is defined).  Thus, in
>> the abstract syntax, the suggested data: URIs would be singled out for
>> prohibition, to be replaced by the corresponding literals (a stronger
>> version of Sandro's "Don't do that").  Software elements that need to apply
>> the formal semantics would be required to deal with only the literal node
>> forms.  And each serialization syntax would have its own mapping to the
>> abstract syntax, permitting data: URIs or literals or both, as befits the
>> circumstances.
>>
>> Jeremy noted that many of the potential costs are associated with user
>> interfaces that have been built on an assumption of subjects-as-URIs (or
>> bNodes).  I can't see the full range of problems here, but from my
>> experience, many of these interfaces are set up to use rdfs:label values to
>> represent such nodes - an approach that could apply just as well to data:
>> URIs, with the added possibility of "inferring" a suitable rdfs:label
>> property (which IIRC is semantically void) for any data: URI.  A harder
>> problem here, maybe, is that data: URIs don't in general lend themselves to
>> presentation as qnames, which are commonly used for presenting URIs
>> compactly (which also restricts their possible use as predicates in
>> RDF/XML).
>>
>> ...
>>
>> In summary, what Sandro said:  the suggested data: URIs should be used as
>> a transitional measure, whose use is restricted to particular RDF
>> serialization forms, and mapped to a common abstract syntax so their use
>> doesn't pollute future generations of RDF representation and processing
>> software.
>>
>> #g
>>
>
> Let me try to state as crisply as possible what I see as wrong with this
> idea. In sum, it is about as bad an idea as anyone could propose, IMO: it
> does not solve the problem, it creates more confusion and complexity to work
> around a bug that should never have been allowed to happen in the first
> place, and it won't actually work, in practice, for utterly predictable
> social reasons. (Sorry, Graham, and nothing personal.)
>
> First, as others have noted, we do already have a workable, if ugly, way to
> state what anyone might need to state with a literal subject in RDF already:
> instead of writing the obvious
>
> <literal> :p :o .
>
> one can write
>
> _:x :same <literal> .
> _:x :p :o .
>
> using whatever form of :same one prefers, such as owl:sameAs. So we don't
> need another complicated work-around. The point of allowing literals as
> subjects was to avoid having to use a work-around, not to invent a new one;
> and also, in fact, to simplify RDF and make it more elegant, also not a
> purpose which is served by yet another work-around. So this idea doesn't
> really help.
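
A minimal sketch of the blank-node workaround described above, assuming
triples are held as plain Python tuples. The `Literal` class, the function
name `expand_literal_subjects`, and the `_:xN` labelling scheme are all
hypothetical, introduced only for illustration; they are not taken from any
RDF toolkit.

```python
from itertools import count
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF typed literal."""
    lexical: str
    datatype: str = "xsd:string"


def expand_literal_subjects(triples, same="owl:sameAs"):
    """Rewrite `<literal> :p :o .` as `_:x :same <literal> . _:x :p :o .`

    Triples whose subject is not a Literal pass through unchanged.
    Each literal subject gets its own fresh blank node label.
    """
    fresh = count()  # local counter, so labels restart at 0 on every call
    out = []
    for s, p, o in triples:
        if isinstance(s, Literal):
            bnode = f"_:x{next(fresh)}"
            out.append((bnode, same, s))  # literal moves to object position
            out.append((bnode, p, o))     # original statement, about the bnode
        else:
            out.append((s, p, o))
    return out
```

Each literal-subject triple costs one extra triple and one extra blank node,
which is the ugliness conceded above; the result is legal RDF as it stands.
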
>
> But worse, it creates a whole new set of awkwardnesses. While having
> something which is syntactically a URI but semantically a literal does sneak
> the literal past the parsers, it does not get it past any inference engines
> that might be waiting at the other side. And those engines now have a truly
> awful task. Some of the URIs they are looking at are actually literals in
> disguise, and those have to be treated specially, differently from other
> URIs. In fact they have to be treated like literals, because they are
> literals in disguise (LIDs). But which of the many URIs are LIDs? The only
> way to find out is to micro-parse the URIs themselves, and so you have to do
> that to all of them (in subject or object position). And when you do find a
> LID, what do you do? It's impossible to completely exhaust all the inferences
> that might be relevant to these LID things at one time, as new information
> might crop up later; and in any case, the same value might also occur as
> actual literals in object positions, and the engine needs to be smart enough
> to do to LIDs anything that it can do to a typed literal. So you have to
> somehow mark them as being LIDs with a literal value, and record that value
> in a form that allows interoperation with literals. In fact, the smartest
> thing to do would probably be to just replace them with the corresponding
> literal. Which gets us back to a familiar issue, one might recall.
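
A rough sketch of the micro-parsing step described above, under a loudly
stated assumption: this thread excerpt does not spell out the exact data: URI
form, so the `data:application/x-rdf-literal;datatype=...` layout below is
invented purely for illustration, as are the function names. Literals are
modelled as `(lexical, datatype)` tuples and URIs as strings.

```python
from urllib.parse import quote, unquote

# Hypothetical encoding, NOT taken from the proposal under discussion:
#   data:application/x-rdf-literal;datatype=<pct-encoded IRI>,<pct-encoded lexical form>
LID_PREFIX = "data:application/x-rdf-literal;datatype="


def encode_lid(lexical, datatype):
    """Disguise a typed literal as a URI (a 'LID')."""
    # safe="" percent-encodes commas too, so the one literal comma is the separator
    return LID_PREFIX + quote(datatype, safe="") + "," + quote(lexical, safe="")


def decode_lid(uri):
    """Return (lexical, datatype) if `uri` is a LID, otherwise None."""
    if not uri.startswith(LID_PREFIX):
        return None
    datatype_enc, sep, lexical_enc = uri[len(LID_PREFIX):].partition(",")
    if not sep:
        return None  # malformed: no data part
    return (unquote(lexical_enc), unquote(datatype_enc))


def normalize_term(term):
    """Replace a LID with the corresponding literal; leave other terms alone."""
    if isinstance(term, str):
        decoded = decode_lid(term)
        if decoded is not None:
            return decoded
    return term


def normalize(triples):
    """Check every URI in subject or object position, as the text says one must."""
    return [(normalize_term(s), p, normalize_term(o)) for s, p, o in triples]
```

Note that `normalize` has to inspect every subject and object URI in the
graph, whether or not any LIDs are present, and that what comes out is a
literal in subject position.
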
>
> Worse still, this proposal drives a truck through the RDF model and
> semantics. The basic model of RDF is that URI references (IRIs, now) are
> basically names. Each of them identifies something, and that is all that
> they do. Then all the RDF meaning is defined by what they identify, and that
> is how the interpretation-based semantics works. This is entirely
> conventional and based on nice, standard, classical theory all out of the
> old textbooks. But if we allow the meanings of some (but not all) of these
> names to be determined by their micro-lexical syntax, this completely
> changes the game. Those LIDs aren't just names any more. I'm not saying it
> cannot be done - it can - but it would require re-writing (and re-thinking)
> the entire RDF syntax and semantic model from the ground up. This is WAY
> worse than just allowing literals in subject position, which is really
> almost no change at all to RDF itself (even if it does break some existing
> software.)
>
> Finally, just on social engineering grounds, Sandro's "Don't do that" idea
> is guaranteed to fail. People will do that (and who can blame them?) and
> then you will get RDF infected with these LIDs all over the place, and other
> RDF with actual literals, and then some mixed RDF with LIDs and literals.
> And (as folk will no doubt tell you, with some asperity) the semantics
> treats these as interchangeable and equivalent, so what is the problem? So
> actual deployed engines will, in fact, find it necessary to handle both
> kinds of literal forms in all kinds of positions, just to be able to survive
> in a world of real-life RDF.
>
> We have already been here, in fact, in a small way. RDF cognoscenti should
> be reminded at this point of the issue that we already had when we allowed
> plain RDF literals to be exactly equivalent to typed literals with the
> xsd:string datatype. This seemed harmless when we did it, and semantically
> it is trivial; but it gives all kinds of problems to inference engines, and
> so has turned out to be a nightmare of such horror that it has been
> seriously proposed to back-engineer RDF so that plain literals are
> considered to be retrospectively typed with a special RDF plain: datatype.
> But that raises problems of its own, as it violates the RDF specs in subtle
> but important ways. And there has to be a syntactic marker for the enclosed
> strings, and.... Blech. That was ONE datatype. Now amplify this mess by at
> least a dozen, going on a hundred, datatypes, all with what will be, in
> effect, literals in two forms, syntactically incompatible but semantically
> identical. If I were an RDF developer, I would go look for a different job
> at this point.
>
> Pat
>
>
>
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>
>
>
>
>
>
>
Received on Tuesday, 13 July 2010 08:09:21 UTC