- From: Pat Hayes <phayes@ihmc.us>
- Date: Mon, 12 Jul 2010 18:11:35 -0500
- To: Graham Klyne <GK@ninebynine.org>
- Cc: Sandro Hawke <sandro@w3.org>, Semantic Web <semantic-web@w3.org>
On Jul 12, 2010, at 4:11 PM, Graham Klyne wrote:

> Sandro Hawke wrote:
>>> Hi Graham
>>>
>>>> So far, all this should lead to intended-literals in subject position that can be read by any existing RDF/XML consuming application.
>>>>
>>>> What I'm less sure about is fixing the semantics: as it stands, the RDF semantics is expressed in terms of allowing arbitrary interpretations -- mappings to things in the domain of discourse -- for all URI nodes in a graph. Would it be unreasonable or problematic to say that, for this particular form of URI, the denotation is fixed by the same general rules that govern the denotation of literals?
>>>
>>> No, but it would be a semantic extension to RDF, so the folk who have invested so much into implementing RDF as of 2004 will not support it. So if this is standardized, their engines will not work properly without changing some code. So they will not be happy, for the same reasons they are not happy with the current suggestion. Like most such suggestions along these lines, it will produce problems of its own, the most obvious being that we would then have two syntactically distinct but semantically equivalent ways to write every literal in the places where literals are permitted, requiring engines to check for all these different forms all the time (in fact, to check *every* URI in any RDF just in case it is a hidden literal.) In the case of plain literals, we would actually have four such ways to write them instead of the two we have now.
>>>
>>> Although it's ingenious, I think this is laying land-mines for future developers.
>> Still, it might be a good way to grandfather old systems and old syntaxes, at some point. The duplication could be avoided just by saying don't do that. (That is: never serialize as a data-uri-literal when you can syntactically use a real literal instead.)
>
> Hi Pat, Sandro,
>
> I think Sandro's response crystallizes what I was trying to suggest.
>
> To rewind a little, one of the biggest problems of standards deployment, once one has an installed base, is to plot a suitable migration path. That is, deployment of a new feature should not break old systems.
>
> Maybe my view is limited, but my perception is that most deployed software toolkits don't actually implement the formal semantics. (I don't mean to imply the formal semantics are not important - I think they are but, at the current state of development, more of a guide to developers and data model designers than enforced in software.) With such a view, a change in the formal semantics to fix (as in constrain, not repair) a family of URIs would have little if any practical effect on deployed software.
>
> Taking a slightly different approach: introducing the data: URIs as suggested and not changing the RDF semantics would be entirely consistent with today's RDF semantics; some of the intended inferences would not be required by current semantics, though would not be disallowed or inconsistent. Thus, completeness of RDF semantics based inferences with respect to the intended semantics would be sacrificed, but soundness would not.
>
> ...
>
> So, if one truly does feel a need to introduce literals-as-subjects into RDF' (RDF-prime), how is one to deal with existing RDF processing systems? Providing a URI-compatible form for literals seems a reasonable bridging option. But how does one minimize the cost of alternate forms for literals?
>
> I think the answer may lie in avoiding alternative forms in the abstract syntax (with respect to which the formal semantics is defined). Thus, in the abstract syntax, the suggested data: URIs would be singled out for prohibition, to be replaced by the corresponding literals (a stronger version of Sandro's "Don't do that").
> Software elements that need to apply the formal semantics would be required to deal with only the literal node forms. And each serialization syntax would have its own mapping to the abstract syntax, permitting data: URIs or literals or both, as befits the circumstances.
>
> Jeremy noted that many of the potential costs are associated with user interfaces that have been built on an assumption of subjects-as-URIs (or bNodes). I can't see the full range of problems here, but from my experience, many of these interfaces are set up to use rdfs:label values to represent such nodes - an approach that could apply just as well to data: URIs, with the added possibility of "inferring" a suitable rdfs:label property (which IIRC is semantically void) for any data: URI. A harder problem here, maybe, is that data: URIs don't in general lend themselves to presentation as qnames, which are commonly used for presenting URIs compactly (which also restricts their possible use as predicates in RDF/XML).
>
> ...
>
> In summary, what Sandro said: the suggested data: URIs be used as a transitional measure, whose use is restricted to particular RDF serialization forms, and mapped to a common abstract syntax so their use doesn't pollute future generations of RDF representation and processing software.
>
> #g

Let me try to state as crisply as possible what I see as wrong with this idea. In sum, it is about as bad an idea as anyone could propose, IMO: it does not solve the problem, it creates more confusion and complexity to work around a bug that should never have been allowed to happen in the first place, and it won't actually work, in practice, for utterly predictable social reasons. (Sorry, Graham, and nothing personal.)

First, as others have noted, we do already have a workable, if ugly, way to state what anyone might need to state with a literal subject in RDF: instead of writing the obvious

<literal> :p :o .

one can write

_:x :same <literal> .
_:x :p :o .

using whatever form of :same one prefers, such as owl:sameAs. So we don't need another complicated work-around. The point of allowing literals as subjects was to avoid having to use a work-around, not to invent a new one; and also, in fact, to simplify RDF and make it more elegant, which is also not a purpose served by yet another work-around. So this idea doesn't really help.

But worse, it creates a whole new set of awkwardnesses. While having something which is syntactically a URI but semantically a literal does sneak the literal past the parsers, it does not get it past any inference engines that might be waiting at the other side. And those engines now have a truly awful task. Some of the URIs they are looking at are actually literals in disguise, and those have to be treated specially, differently from other URIs. In fact they have to be treated like literals, because they are literals in disguise (LIDs). But which of the many URIs are LIDs? The only way to find out is to micro-parse the URIs themselves, and so you have to do that to all of them (in subject or object position). And when you do find a LID, what do you do? It's impossible to completely exhaust all the inferences that might be relevant to these LID things at one time, as new information might crop up later; and in any case, the same value might also occur as actual literals in object positions, and the engine needs to be smart enough to do to LIDs anything that it can do to a typed literal. So you have to somehow mark them as being LIDs with a literal value, and record that value in a form that allows interoperation with literals. In fact, the smartest thing to do would probably be to just replace them with the corresponding literal. Which gets us back to a familiar issue, one might recall.

Worse still, this proposal drives a truck through the RDF model and semantics. The basic model of RDF is that URI references (IRIs, now) are basically names.
Each of them identifies something, and that is all that they do. Then all the RDF meaning is defined by what they identify, and that is how the interpretation-based semantics works. This is entirely conventional and based on nice, standard, classical theory all out of the old textbooks. But if we allow the meanings of some (but not all) of these names to be determined by their micro-lexical syntax, this completely changes the game. Those LIDs aren't just names any more. I'm not saying it cannot be done - it can - but it would require re-writing (and re-thinking) the entire RDF syntax and semantic model from the ground up. This is WAY worse than just allowing literals in subject position, which is really almost no change at all to RDF itself (even if it does break some existing software.)

Finally, just on social engineering grounds, Sandro's "Don't do that" idea is guaranteed to fail. People will do that (and who can blame them?) and then you will get RDF infected with these LIDs all over the place, and other RDF with actual literals, and then some mixed RDF with LIDs and literals. And (as folk will no doubt tell you, with some asperity) the semantics treats these as interchangeable and equivalent, so what is the problem? So actual deployed engines will, in fact, find it necessary to handle both kinds of literal forms in all kinds of positions, just to be able to survive in a world of real-life RDF.

We have already been here, in fact, in a small way. RDF cognoscenti should be reminded at this point of the issue that we already had when we allowed plain RDF literals to be exactly equivalent to typed literals with the xsd:string datatype.
This seemed harmless when we did it, and semantically it is trivial; but it gives all kinds of problems to inference engines, and so has turned out to be a nightmare of such horror that it has been seriously proposed to back-engineer RDF so that plain literals are considered to be retrospectively typed with a special RDF plain: datatype. But that raises problems of its own, as it violates the RDF specs in subtle but important ways. And there has to be a syntactic marker for the enclosed strings, and.... Blech.

That was ONE datatype. Now amplify this mess by at least a dozen, going on a hundred, datatypes, all with what will be, in effect, literals in two forms, syntactically incompatible but semantically identical. If I were an RDF developer, I would go look for a different job at this point.

Pat

------------------------------------------------------------
IHMC                         (850)434 8903 or (650)494 3973
40 South Alcaniz St.         (850)202 4416   office
Pensacola                    (850)202 4440   fax
FL 32502                     (850)291 0667   mobile
phayesAT-SIGNihmc.us         http://www.ihmc.us/users/phayes
Received on Monday, 12 July 2010 23:12:38 UTC
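
Illustrative note: the message argues that an engine would have to micro-parse every URI in subject or object position to find literals in disguise (LIDs), and that the sensible thing to do on finding one is to replace it with the corresponding literal. The sketch below shows roughly what that check might look like. It is only a sketch under stated assumptions: the exact data: URI convention proposed in the thread is not shown in this message, so the `data:,` prefix, the tuple representation of terms, and the names `as_literal` and `normalize` are all hypothetical.

```python
from urllib.parse import unquote

# Hypothetical convention (not taken from the thread): a plain literal is
# disguised as "data:," followed by its percent-encoded lexical form.
LID_PREFIX = "data:,"


def as_literal(uri):
    """Return the hidden literal's lexical form, or None for an ordinary URI."""
    if uri.startswith(LID_PREFIX):
        return unquote(uri[len(LID_PREFIX):])
    return None


def normalize(triples):
    """Replace disguised literals found in subject or object position.

    Terms are modelled as ("uri", value) or ("literal", value) tuples.
    Every URI term in those positions is inspected, which is the cost
    the message describes.
    """
    def fix(term):
        kind, value = term
        if kind == "uri":
            hidden = as_literal(value)
            if hidden is not None:
                return ("literal", hidden)
        return term

    return [(fix(s), p, fix(o)) for s, p, o in triples]


triples = [
    # A LID in subject position.
    (("uri", "data:,hello%20world"),
     ("uri", "http://example.org/p"),
     ("uri", "http://example.org/o")),
    # The same value as an ordinary literal in object position.
    (("uri", "http://example.org/s"),
     ("uri", "http://example.org/p"),
     ("literal", "hello world")),
]
normalized = normalize(triples)
```

After normalization the first triple carries a literal subject, so the engine is back to handling literals in subject position, which may be the "familiar issue" the message alludes to.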