- From: <Patrick.Stickler@nokia.com>
- Date: Thu, 23 Aug 2001 15:21:17 +0300
- To: sean@mysterylights.com
- Cc: www-rdf-interest@w3.org, www-rdf-comments@w3.org
> > Uhhh... so even if the NS spec says X and Y are different,
> > RDF can do whatever it likes with them, including saying
> > X = Y [...]
>
> Of course it doesn't say that two names are equivalent; it simply uses the
> QNames to form URI references.

But if the two QNames are mapped to the same URI, then RDF *is* saying that they are equivalent -- or rather, that even if they are lexically distinct, they are not allowed or able to bear any semantic distinction.

> [...]
> > If potentially four lexically distinct QNames (i.e. two QNames
> > which collide on direct concatenation used both for elements
> > and global attributes) are merged by RDF into one single URI
> > derived by the present RDF function, then how can you possibly
> > make statements about them to differentiate them [...]
>
> Er, perhaps using the URI scheme that you just invented?

No. If we use the qn URI scheme, then we don't *need* to make such statements, since that information is explicit in the qn URI. If we use the current RDF mapping function, then all four lexically distinct QNames are mapped to the same URI, and thus we are unable to make any statements to differentiate them once we're in RDF-land, since they no longer have distinct identity in the RDF space to serve as subjects in those differentiating statements. The mapping from QName to qn URI has to be the "official" mapping.

> It doesn't keep the QName information because it is irrelevant;

If lexically distinct QNames are capable of bearing distinct semantics, then their distinction cannot be considered irrelevant.

> we are not using QNames on the Semantic Web, we're using URI references.
> And if you want to identify QNames, partitioned and all, well we have your
> URI scheme for that.

I think you're missing the point entirely here. It's about preservation of identity as defined by QNames within the RDF URI space.
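
To make the collision concrete, here is a minimal sketch of four lexically distinct names collapsing into one URI under direct concatenation. The namespace URIs and local names are invented for illustration, and `concat_uri` is only a stand-in for the concatenation mapping as it is described in this thread, not code from any RDF implementation.

```python
from collections import defaultdict

# Four lexically distinct names: two (namespace, local) pairs, each used
# once as an element and once as a global attribute. All values are made up.
names = [
    ("element",   "http://example.org/ab", "c"),
    ("attribute", "http://example.org/ab", "c"),
    ("element",   "http://example.org/a",  "bc"),
    ("attribute", "http://example.org/a",  "bc"),
]

def concat_uri(kind, namespace, local):
    """Direct concatenation: the kind and the namespace/name partition are discarded."""
    return namespace + local

# Group the original names by the URI they end up with.
by_uri = defaultdict(list)
for name in names:
    by_uri[concat_uri(*name)].append(name)

distinct_names = len(set(names))
distinct_uris = len(by_uri)
print(distinct_names, "distinct names map to", distinct_uris, "URI(s)")
```

Once the names share a URI, nothing in the graph can serve as a subject for a statement that tells them apart; a mapping that kept the kind and the partition in the URI would keep them separate.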
The current system potentially introduces ambiguity which does not exist in the serialization, and cannot exist because the NS spec clearly defines their distinction.

> The weird thing is that for years no one had had a problem with this.

Perhaps because for years, folks have been building small, closed systems using HTTP URI fragments with HTML fragment syntax and the clever hack of adding the '#' at the end of the namespace URI -- or just because they have been lucky.

> That doesn't necessarily mean that people have understood it, I agree,

Exactly. I don't think a majority of folks have understood the problem, both because it has been hidden by the use of HTTP URLs and HTML fragment syntax and because the SW hasn't yet scaled up to a truly global context. We're just getting to a stage now where certain cracks are showing, but you only see them if you're standing on the right side of the building.

> but no one has had a problem implementing the QName concatenation thing
> that RDF does,

Just because no one has fallen into the hole yet doesn't mean it isn't there or that it shouldn't be filled. The hole is at one end of the playground, and most folks have been playing at the other end, and those of us who have been scoping out the whole playground have seen the hole. Some of us have fallen into it ;-)

> and no one has moaned that RDF violates any Web axioms, and there are
> a lot of people at the W3C who would do just that if they thought the
> slightest little rule was being broken.

As much respect as I have for most of the folks I have met who work for or with the W3C, no person or group is omniscient and all standards and technologies are imperfect. The scope and definition of the SW is evolving and being refined, as is all of the Web, and we are collectively learning new things every day.
Each new attempt to push the envelope a little further reveals shortcomings, holes, and implicit but invalid assumptions in the standards that must be addressed to move onwards. It may simply be that the folks at the W3C haven't looked at the problem from the perspective needed to see it. But to ignore warnings from folks who have seen the problem is like a wagon train riding in the dark of night ignoring warnings from their scouts that they're headed for a cliff. The drivers of the wagon train may not see the cliff, but that doesn't mean they won't fall off it when they get to it. Eh?

> It's not as if the people who came up with the concatenation mechanism
> weren't aware of exactly what was going on.

Don't be so sure. The folks that were there will have to comment on that themselves, though. Maybe they were. Maybe they weren't. Standards are born of perspective, goals, needs, time, and energy. That's why we have to update and fix them from time to time, as needs change, perspectives broaden, and the passage of time brings deeper understanding.

> > But then we get that nasty problem of element and global
> > attribute QNames having identical semantics according
> > to RDF's condensed serialization syntax [...]
>
> Once again, it does not declare the QNames to be identical. It simply uses
> the QNames to form URI references.

Once again, if it maps lexically distinct QNames to the same URI, then RDF declares them identical.

> > [...] If I can't rely on *every* RDF engine used by every
> > SW agent to interpret my data exactly as I have defined it,
> > then the SW has no data integrity. Again, it's about global
> > consistency of data.
>
> But all data on the Semantic Web are resources, which may be identified by
> URI references.

Right, data that is created as, stored as, and exchanged as serialized XML instances. The XML instance precedes the knowledge base of triples. That's the way it works in the real world.
If there is a lexical (and hence potentially semantic) distinction between two QNames in the serialization and my local RDF engine knows how to preserve that distinction but some remote RDF engine does not, then we have failed to maintain the integrity of resource identity (and hence the integrity of knowledge) on the SW. There has to be consistency of identity and knowledge representation across the entire SW as dictated by the standards. Identity cannot vary from agent to agent according to localized interpretations of QName to URI mapping! It's not about tools. It's not about systems. It's not about applications. It's about the standardized representation and interchange of knowledge on a global basis. If identity is not consistent, the SW won't work. No?

> Perhaps you are getting confused that XML has no inherent semantics, and
> that therefore is not the primary candidate for the Semantic Web? On the
> Semantic Web, all knowledge is grounded in URI space, not XML space.

No, in fact, having a background in computational linguistics, I have a very solid understanding of the relationship between syntax and semantics. I might suggest that you don't, if you think that a URI identifying a resource bears any more semantics than a QName (please don't throw any heavy objects at me ;-)

> The fact that it gets *serialized* into XML for transfer is incidental,

It should be incidental, but it's not, because it's not fully regular, bi-directional, consistent, etc. RDF has no reliable means of re-serialization that guarantees the same QNames it got on input. It can serialize from a URI to "some" QName which, when de-serialized back into triples, gets the same URI (i.e. URI->QName->URI is reliable), but there is no guarantee that in a QName->URI->QName round trip we will get the same QName out that we put in!
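
The asymmetry between the two round trips can be sketched as follows. This is only an illustration: the namespace and local name are invented, `to_uri` models plain concatenation as described in this thread, and the NCName check is a deliberately simplified ASCII-only approximation of the real XML production.

```python
import re

# Simplified NCName check (ASCII only); the real XML production is broader.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def to_uri(namespace, local):
    """QName -> URI by direct concatenation (the partition is not recorded)."""
    return namespace + local

def candidate_splits(uri):
    """Every (namespace, local) pair that concatenates back to `uri`."""
    return [(uri[:i], uri[i:])
            for i in range(1, len(uri))
            if NCNAME.match(uri[i:])]

original = ("http://example.org/ns/", "title")  # hypothetical QName parts
uri = to_uri(*original)
candidates = candidate_splits(uri)

# Any candidate maps back to the same URI (URI -> QName -> URI is safe) ...
assert all(to_uri(ns, local) == uri for ns, local in candidates)
# ... but the URI alone does not say which candidate was the input.
for ns, local in candidates:
    print(repr(ns), repr(local))
```

The original pair is among the candidates, but so are several others, which is why a serializer working from the URI alone can only pick "some" QName.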
This is because the RDF QName to URI mapping function loses the explicit partition between namespace and name, which is a fundamental, defining characteristic of the QName itself.

> and you should not get hung up on the fact that the serialization involves
> using QNames to form the URIs... and yet you seem to get hung up time
> after time.

QNames and URIs are just lexical forms. They may have structure at some lower level that might be important for other operations, but insofar as either an XML application or an RDF application is concerned, they are just unique identifiers of structural components. Such identifiers serve as constructs for representing semantics. The semantics is clearly not inherent in the identifiers themselves, no more so than I am contained within my social security number.

In the case of XML, QNames identify nodes in a tree. In the case of RDF, URIs identify nodes in a graph. There's no fundamental difference between them whatsoever, insofar as their role as identifiers is concerned.

I am "hung up" on the issue that RDF employs both forms of identifiers, and defines that there is an equivalence relation between them (hence the existence of the QName to URI mapping function) -- but that the equivalence relation is many-to-one rather than one-to-one; and that some of that many-to-one mapping is accidental and potentially hidden from content producers whose knowledge participates in multisource syndication in some remote SW agent, and who therefore cannot even prepare for and avoid the potential collision.

Therefore it is possible to assign non-ambiguous semantics in an XML serialization which becomes ambiguous in an RDF graph, and therefore there is potential loss of information and unintended introduction of ambiguity into the SW.

I see that as something worth getting "hung up" over (though I'm just as likely to simply get "hung" over it ;-)

> > [...]
> > However, since people are already thinking in terms
> > of QNames when defining their Web based ontologies,
>
> What??? That's blatantly false!

I was mostly thinking of the XML community at large, but my response to your comment is the same...

The existence of countless scripts and hacks that scan for prefix:name patterns rather than (namespace)name pairs -- and the arguments one sees from time to time about "is this prefix reserved", etc. -- clearly shows that (normal) folks working with XML instances think in terms of qualified names. Whether they do so correctly, and whether the Gods of the Web do so or not, is beside the point. I would argue that a lot of folks even find the need to declare the namespace prefixes an inconvenience.

Real users are *not* thinking (http://purl.org/dc/elements/1.1/)title but rather 'dc:title'! Just *try* to pass around DC RDF instances that use e.g. 'foo:title' instead of 'dc:title' and listen to the people *scream*!

If folks didn't think in terms of not only QNames but minimized prefixed QNames when designing SW ontologies, then why would e.g. both the DC and PRISM specs (and I'm sure many others) recommend the use of specific namespace prefixes in the interest of consistency of readability!?

And what do you think folks would say about the following perfectly legal use of ns prefixes:

    <dc:RDF xmlns:dc="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:rdf="http://www.w3.org/2000/01/rdf-schema#"
            xmlns:rdfs="http://purl.org/dc/elements/1.1/">
      <dc:Description dc:about="urn:foo:bar">
        <rdfs:title> Ain't this a hoot! </rdfs:title>
      </dc:Description>
      <dc:Description dc:about="http://purl.org/dc/elements/1.1/title">
        <rdf:subPropertyOf dc:resource="urn:foo:bar:bas"/>
      </dc:Description>
    </dc:RDF>

The reason why the above is so distasteful is that we *do* think in terms of minimized, prefix based QNames in our ontologies, regardless of their form.
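
That such prefix swapping really is legal, and invisible to a namespace-aware parser, can be checked with a short sketch. It uses Python's standard `xml.etree.ElementTree`, which reports names in expanded `{namespace}local` form; the two documents are cut-down variants written for this illustration, not the example above verbatim.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Same content twice: once with the customary prefixes, once with
# deliberately misleading ones.
conventional = f'''<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="urn:foo:bar">
    <dc:title>Ain't this a hoot!</dc:title>
  </rdf:Description>
</rdf:RDF>'''

swapped = f'''<dc:RDF xmlns:dc="{RDF_NS}" xmlns:rdfs="{DC_NS}">
  <dc:Description dc:about="urn:foo:bar">
    <rdfs:title>Ain't this a hoot!</rdfs:title>
  </dc:Description>
</dc:RDF>'''

def expanded_names(xml_text):
    """Element and attribute names in document order, as {namespace}local."""
    names = []
    for elem in ET.fromstring(xml_text).iter():
        names.append(elem.tag)
        names.extend(sorted(elem.attrib))
    return names

names_conventional = expanded_names(conventional)
names_swapped = expanded_names(swapped)
print(names_conventional == names_swapped)
```

The parser sees identical names in both documents; the discomfort is entirely on the human side, which is the point being made about how people actually think about these names.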
True, use of a specific prefix is *not* required, but it is *pervasive* common practice to recommend that a consistent prefix be used, because folks do not want to be thinking (namespace)name if they can think prefix:name. In either case, though, they're still thinking in terms of QNames, and not URIs.

When folks encode RDF statements in an XML instance and specify a URI, it's usually the URL of some web page, not a URN of some abstract concept. Subjects are HTTP URLs, predicates are QNames, and values are either HTTP URLs or literals. That's probably true of most of the RDF presently defined on the planet.

Whether you like it or not, QNames are *the* primary naming construct of the XML world, not URIs, and because RDF adopts XML serialization with namespaces, that means that QNames are also the primary naming construct of RDF properties or any other resource which may have QName identity in an RDF serialization.

I see no reason why QNames (represented as URIs) should not become a primary naming construct for abstract resources such as properties in the RDF world. It would achieve perfect consistency with all serializations and follow common usage in the XML community at large. QNames are great universal identifiers. We need good URN schemes. Let's use QNames as URIs.

> > Folks using XML *think* in terms of QNames insofar
> > as their data models, vocabularies, and ontologies are
> > concerned.
>
> I'll bet that not many XML developers really understand much about
> namespace partitions and so on.

Probably not, since they are non-normative, and many folks consider non-normative to equal not-required. But then, it's really XML parser developers who have to worry about such nuts-n-bolts issues, not XML developers in general, unless it impacts their own work. Apparently, the XML parser developers *do* understand QName partitions, and so things relying on the parsers don't blow up.

> > [...]
> > The fact that there is *not* an official, standard
> > URI representation for QNames is what is surprising...
>
> Well, no one has really had a use for it, so not really.

You may be right on that point. RDF may very well be the first standard to need an explicit QName to URI mapping -- in which case, it's even more important that RDF "get it right", as it sets a precedent for future standards and methodologies.

> [...]
> > > Using the concatenation mechanism is an excellent and
> > > quick way to form those URIs out of QNames.
> >
> > Quick? Maybe. Excellent? No!
> >
> > It was a very clever hack that works with HTTP URLs
> > using HTML fragment syntax, [...]
>
> I think you'll find that a) FragID syntax is independent of URI scheme and
> b) the "hack" works with a wide range, indeed a gross majority of URI
> schemes. ...
> it's not much of a problem!

I disagree. In various ways for various reasons. This issue has been discussed elsewhere on this list at great length. I won't re-address it here.

> > It also does not maintain lexical distinctions defined by the
> > NS spec, and for that reason alone, its validity is suspect.
>
> The NS specification says nothing about what processors should do with
> QNames, it simply defines what QNames are.

But if processors can disregard the distinctness of QNames, then there is no *purpose* for the NS spec, because *all* it does is define the distinctness of QNames!

It's like saying the XML Spec doesn't say you can't deliberately randomize every instance tree when parsing, so it's OK to do that and no application using your freaked out parser can complain, because the spec doesn't explicitly rule such behavior out (that I know of, but if so, then hats off to the XML spec authors ;-)

There is a certain degree of common sense that must be applied when interpreting standards, and that includes not violating the fundamental goals of the standard. The XML spec defines a way to achieve a consistent representation of structured data.
To randomize it violates that fundamental goal. The NS spec defines the mechanisms for QName distinction -- that distinction being the very goal of the NS spec -- and to discard such distinctions is a violation of the NS spec.

If the creators of the RDF spec made a boo boo because they failed to take in all perspectives and considerations that now face the SW, fair enough, we're all human. But if it *is* an error, then it needs to be addressed and (hopefully) fixed.

> > If it's not standardized and mandated for all RDF applications, it's
> > not a solution to the present problem(s).
>
> It's just a new URI scheme, and URIs are opaque. All RDF applications
> already handle it.

It's not enough to just handle it. That's been the whole point of this entire mapping issue. If the requirement that all RDF parsers map QNames to qn URIs in triples is *not* part of the standard, then the consistency of QName lexical identity that the qn URI offers cannot be achieved globally throughout the SW.

> > Parentheses and escaping should do just fine [...]
>
> Cool.

At least that part is clear ;-)

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Thursday, 23 August 2001 08:31:17 UTC