Re: RDF *already* supports literal subjects - a thought experiment

Sandro Hawke wrote:
>> Hi Graham
>>> So far, all this should lead to intended-literals in subject position
>>> that can be read by any existing RDF/XML consuming application.
>>> What I'm less sure about is fixing the semantics:  as it stands, the RDF
>>> semantics is expressed in terms of allowing arbitrary interpretations --
>>> mappings to things in the domain of discourse -- for all URI nodes in a
>>> graph.
>>> Would it be unreasonable or problematic to say that, for this particular
>>> form of URI, the denotation is fixed by the same general rules that
>>> govern the denotation of literals?
>> No, but it would be a semantic extension to RDF, so the folk who have  
>> invested so much into implementing RDF as of 2004 will not support it.  
>> So if this is standardized, their engines will not work properly  
>> without changing some code. So they will not be happy, for the same  
>> reasons they are not happy with the current suggestion.  Like most  
>> such suggestions along these lines, it will produce problems of its  
>> own, the most obvious being that we would then have two syntactically  
>> distinct but semantically equivalent ways to write every literal in  
>> the places where literals are permitted, requiring engines to check  
>> for all these different forms all the time (in fact, to check *every*  
>> URI in any RDF just in case it is a hidden literal.)  In the case of  
>> plain literals, we would actually have four such ways to write them  
>> instead of the two we have now.
>> Although it's ingenious, I think this is laying land-mines for future  
>> developers.
> Still, it might be a good way to grandfather old systems and old
> syntaxes, at some point.   The duplication could be avoided just by
> saying don't do that.  (That is: never serialize as a data-uri-literal
> when you can syntactically use a real literal instead.)  

Hi Pat, Sandro,

I think Sandro's response crystallizes what I was trying to suggest.

To rewind a little, one of the biggest problems of standards deployment, once 
one has an installed base, is to plot a suitable migration path.  That is, 
deployment of a new feature should not break old systems.

Maybe my view is limited, but my perception is that most deployed software 
toolkits don't actually implement the formal semantics.  (I don't mean to imply 
the formal semantics are not important.  I think they are, but at the current 
state of development they serve more as a guide to developers and data model 
designers than as something enforced in software.)  With such a view, a change 
in the formal semantics to fix (as in constrain, not repair) the denotation of 
a family of URIs would have little if any practical effect on deployed software.

Taking a slightly different approach:  introducing the data: URIs as suggested 
and not changing the RDF semantics would be entirely consistent with today's RDF 
semantics; some of the intended inferences would not be required by the current 
semantics, though they would not be disallowed or inconsistent.  Thus, 
completeness of inference under the current RDF semantics, measured against the 
intended semantics, would be sacrificed, but soundness would not.


So, if one truly does feel a need to introduce literals-as-subjects into RDF' 
(RDF-prime), how is one to deal with existing RDF processing systems?  Providing 
a URI-compatible form for literals seems a reasonable bridging option.  But how 
does one minimize the cost of alternate forms for literals?

I think the answer may lie in avoiding alternative forms in the abstract syntax 
(with respect to which the formal semantics is defined).  Thus, in the abstract 
syntax, the suggested data: URIs would be singled out for prohibition, to be 
replaced by the corresponding literals (a stronger version of Sandro's "Don't do 
that").  Software elements that need to apply the formal semantics would be 
required to deal with only the literal node forms.  And each serialization 
syntax would have its own mapping to the abstract syntax, permitting data: URIs 
or literals or both, as befits the circumstances.
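
As a rough illustration of that mapping step, here is a minimal Python sketch. 
The URIRef and Literal classes, the to_abstract and normalize functions, and 
the particular data: URI shape (text/plain with a percent-encoded payload) are 
all assumptions made up for this example; they are not the encoding proposed 
earlier in the thread, nor any toolkit's API.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str


# Hypothetical encoding, assumed only for this sketch: a plain literal is
# carried as a text/plain data: URI with a percent-encoded payload.
PREFIX = "data:text/plain;charset=utf-8,"


def to_abstract(term):
    """Map a literal-carrying data: URI to a Literal; pass other terms through."""
    if isinstance(term, URIRef) and term.value.startswith(PREFIX):
        return Literal(unquote(term.value[len(PREFIX):]))
    return term


def normalize(triples):
    """Apply the mapping to subject and object positions of parsed triples.

    Predicates are left untouched in this sketch.
    """
    return [(to_abstract(s), p, to_abstract(o)) for (s, p, o) in triples]
```

A parser for a legacy serialization would call normalize() on its output, so 
that anything applying the formal semantics only ever sees literal nodes.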

Jeremy noted that many of the potential costs are associated with user 
interfaces that have been built on an assumption of subjects-as-URIs (or 
bNodes).  I can't see the full range of problems here, but from my experience, 
many of these interfaces are set up to use rdfs:label values to represent such 
nodes - an approach that could apply just as well to data: URIs, with the added 
possibility of "inferring" a suitable rdfs:label property (which IIRC is 
semantically void) for any data: URI.  A harder problem here, maybe, is that 
data: URIs don't in general lend themselves to presentation as qnames, which are 
commonly used for presenting URIs compactly (which also restricts their possible 
use as predicates in RDF/XML).
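
To make the label idea concrete, here is a small Python sketch of deriving a 
display label from a data: URI.  It assumes only the generic data: URI layout 
(an optional media type header, a comma, then the payload); the function names 
inferred_label and label_triples are hypothetical, and base64 payloads are 
simply skipped to keep the sketch short.

```python
from urllib.parse import unquote

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def inferred_label(uri):
    """Return a display label for a data: URI, or None if none can be derived."""
    if not uri.startswith("data:"):
        return None
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        # No comma, so this is not a well-formed data: URI.
        return None
    if header.endswith(";base64"):
        # Base64 payloads are out of scope for this sketch.
        return None
    return unquote(payload)


def label_triples(uris):
    """Build (uri, rdfs:label, label) triples for URIs that yield a label."""
    result = []
    for uri in uris:
        label = inferred_label(uri)
        if label is not None:
            result.append((uri, RDFS_LABEL, label))
    return result
```

A user interface that already falls back on rdfs:label could run something 
like label_triples() over the data: URIs it encounters before rendering.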


In summary, what Sandro said:  the suggested data: URIs would be used as a 
transitional measure, restricted to particular RDF serialization forms and 
mapped to a common abstract syntax, so that their use doesn't pollute future 
generations of RDF representation and processing software.


Received on Monday, 12 July 2010 21:12:20 UTC