- From: <Patrick.Stickler@nokia.com>
- Date: Fri, 24 Aug 2001 11:02:24 +0300
- To: sean@mysterylights.com
- Cc: www-rdf-interest@w3.org, www-rdf-comments@w3.org
> -----Original Message-----
> From: ext Sean B. Palmer [mailto:sean@mysterylights.com]
> Sent: 23 August, 2001 17:31
> To: Stickler Patrick (NRC/Tampere)
> Cc: www-rdf-interest@w3.org; www-rdf-comments@w3.org
> Subject: Re: QName URI Scheme Re-Visited, Revised, and Revealing
>
> > > Of course it doesn't say that two names are equivalent; it simply uses the QNames to form URI references.
> >
> > But if the two QNames are mapped to the same URI, then RDF *is* saying that they are equivalent -- or rather than even if they are lexically distinct, they are not allowed or able to bear any semantic distinction.
>
> It doesn't use QNames in the model! The model doesn't say anything about the QNames that are used in the syntax... the two are entirely separate.

Of course they are. And this isn't about the model. It's about the *standard* representation of that model in XML serialization.

And I'm not saying that RDF should use QNames in the model. It should use URIs. But since QNames are an unavoidable mechanism of the XML serialization, the employment of a fully regular mapping between QNames in serialization and QName URIs in the model/graph would alleviate certain problems (with no change to either the serialization syntax or the formal model).

> If anyone infers that two lexically distinct QNames are the same just because using the RDF concatenation mechanism they come out to be the same URI, then they are totally wrong. [Of course, it may well be that the QNames are semantically equivalent in some context, but that cannot and should not be inferred from RDF's use of QNames to form URI references.]

Here you're arguing my point precisely.

If one source says

   <rdf:RDF ... xmlns:foo="urn:x:abc">
      <rdf:Description about="http://xyz.com/my_resource">
         <foo:def>123</foo:def>
      </rdf:Description>
   </rdf:RDF>

and a completely different source on the other side of the planet says

   <rdf:RDF ... xmlns:bar="urn:x:abcd">
      <rdf:Description about="http://xyz.com/my_resource" bar:ef="booga"/>
   </rdf:RDF>

and those two sources are syndicated at run-time, where both creators of the knowledge were unaware of the other's use of QNames and therefore could not foresee any potential problem of collision, we end up with the following ambiguous RDF triples:

   [http://xyz.com/my_resource, urn:x:abcdef, "123"]
   [http://xyz.com/my_resource, urn:x:abcdef, "booga"]

And then a SW application that is expecting integers for the distinct and well defined (insofar as the creator believes) property (urn:x:abc)def asks for those values via the ambiguous URI 'urn:x:abcdef' and in addition to an integer gets a string! Oops.

This *is* a serious problem, even if it does not appear so at the moment because folks are not yet encountering it in their present systems. RDF is supposed to provide the foundation for encoding and interchanging knowledge on the global SW -- so one would expect that issues such as data integrity would be of chief importance. And even though the chances of collision might be considered small, the fact that such a possibility is *known* to exist should cause great concern for the RDF and SW communities.

If it were known, e.g., that Oracle or MySQL sometimes dropped a bit on very large integers -- but "that's no big deal, since data seldom contains such large integers" -- do you think that *anyone* would use them for serious applications? And if it were an international standard specifying that "it's OK to sometimes drop bits on very large integers", do you think anyone would take that standard seriously for global deployment as the primary backbone of a global database? I don't think so...

I will confess that I have been playing "devil's advocate" about this issue in order to view it from all sides -- and I will agree that in practice, it is unlikely that such collisions would occur in most contexts.
However, in the interest of achieving as solid a foundation for the SW as possible, the problem should not IMO simply be ignored or dismissed just because it isn't troubling anyone at the moment. That would be a short-sighted and dangerous attitude with respect to an international standard -- and fortunately one that clearly is not maintained by the membership of the RDF Core working group.

The one point of "saving grace" regarding such potential collisions is that they would occur at the boundary of namespace and name for namespaces which (at least according to every URI scheme I've seen) fall within the same scope of authority -- and to that end, that authority has the opportunity (however burdensome or impractical) to ensure the integrity of all QNames created within the scope of that authority. I.e. "urn:x:abc" and "urn:x:abcd" both belong to the authority which "owns" the urn URI prefix "urn:x:". Likewise, in similar examples, "http://xyz.com/abc" and "http://xyz.com/abce" both belong to the authority having the domain "xyz.com", etc.

Still, this is a far weaker assurance of data integrity than that which would be provided by a completely collision-safe mapping function -- and at the very least, any need for, or obligation of, such an authority to ensure against QName collisions should be highlighted in the official RDF documentation. (And of course, this presumes that it is considered improper or even illegal to define and use namespaces belonging to third party authorities without their permission and approval.)

> I can imagine Bart writing on the blackboard, "I will not confuse RDF Syntax with the RDF Model".

And I can hear Bart saying "Hey man, eat my shorts" ;-)

But seriously...

> RDF says nothing about the QNames that it uses in the syntax, full stop.

The model says nothing about it, but "RDF" (in its entirety) does. It says that QNames become URIs by direct concatenation of namespace and name.
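To make the concatenation rule concrete, here is a minimal Python sketch applied to the two descriptions given earlier in this message. The helper name `concat_uri` and the tuple layout used for triples are assumptions made for illustration; they are not defined by the RDF spec or by any particular parser.

```python
def concat_uri(namespace: str, local_name: str) -> str:
    """Hypothetical helper: form a property URI by direct concatenation."""
    return namespace + local_name


subject = "http://xyz.com/my_resource"

# Triples as each source's parser would produce them, independently.
source_a = (subject, concat_uri("urn:x:abc", "def"), "123")
source_b = (subject, concat_uri("urn:x:abcd", "ef"), "booga")

# Syndicating the two sources merges their triples into one graph.
merged = [source_a, source_b]

# A consumer asking for the property it knows as (urn:x:abc)def
# cannot tell the two origins apart any more.
values = [obj for (_, pred, obj) in merged if pred == "urn:x:abcdef"]
print(values)
```

Both predicates come out as the same string, so the query returns the value from each source even though the two QNames were lexically distinct.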
> In the syntax, we can do anything we want with the QNames, as long as the model is consistent.

Agreed, but you are presuming a level of control and omniscience over the selection of namespaces and names that cannot exist on a truly global and semi-chaotic SW.

My example above should *not* be able to happen. Period. It should not matter *what* URI is used for a namespace, so long as the namespace URI itself maintains global uniqueness. Collisions should not happen. Ever. Just because the current RDF spec allows them does not mean they are acceptable.

> The syntax forms the data, but the model *is* the data,

But syntax is needed to get data into the model, and therefore it is inseparable from the model on the practical level. This will remain a problem so long as the syntactic representation of resources differs from the model representation. The problem goes away as soon as you e.g. disallow QNames for identifying resources in XML serialization. But doing so is a greater change to the standard, as it invalidates all of the existing instances already defined. By simply adopting a better mapping function, the data remains valid, the model remains valid and only the parsers have to be tweaked.

> and semantic constructs such as the notion of equivalence can only be derived from there, not from the method of going from the syntax to the model.

But if two disparate content creators unknowingly define what they think is disjunct knowledge, and that collides to create ambiguity, then the path from syntax to model is broken and has to be fixed. Syntax is the doorway into the model. It is not "irrelevant". Just because a problem doesn't exist in the model doesn't mean that it can't impact the knowledge represented by that model.

> ... I know that you're asking for an automated conversion of QNames to URIs in the RDF syntax, ...
> If you do that, then you introduce horrendous backwards incompatibilities, because we're already using the QNames to form URIs.

I stated that myself from the start.

> But nothing stops you from creating the QN URIs from the QNames.

True, but that does not achieve global portability and consistency of resource identity across the SW.

> Indeed, you can do that automatically using XSLT: converting an XML document into a list of QN URIs for all QNames in the document.

I can do pretty much anything in my own kitchen, but that doesn't mean that from the same ingredients I will get the same dish in any arbitrary restaurant on the planet. A clear and precise recipe is needed. The key issue is what is standardized, not what potentially could be done in any given localized context.

> > The mapping from QName to qn URI has to be the "official" mapping.
>
> RDF has already chosen to use QNames to create URI references,

Which doesn't preserve data integrity.

> at the cost that it is slightly more expensive for people to refer to QNames, since they now have to convert them into QN URIs first.

?? Folks aren't using qn URIs yet. And the idea was for the RDF parser to do the conversion -- and to allow folks to just use prefixed QNames in their queries, instances and schemas, and leave the qn URI representation to the triples space (graph) itself, allowing systems to benefit from the explicit and regular mapping between QName and qn URI.

People like to interact with RDF knowledge using QNames rather than URIs. Look at the number of RDF tools that allow you to define namespace prefixes for use in queries and interfaces. It's a pain in the rear end to type long complex URIs rather than nsprefixed names -- especially since a lot of (most?) folks think in terms of prefixed names rather than URIs (i.e. they're thinking dc:title and not "http://purl.org/dc/elements/1.1/title", even if the latter is the official identity of the resource).
I was simply saying, let's make this use of QNames as universal identifiers more "native" to RDF and more consistent and explicit by having a "proper" URI for QNames rather than just straight concatenation. That's all.

But as Dan has clarified, this may be difficult to actually achieve in practice because there are already numerous "interpretations" of QName identity which are not fully compatible, and thus any qn URI scheme may either only be workable for RDF or may become grossly complex to meet all interpretations and needs of all the current standards using namespaces.

> But data on the Semantic Web is represented by URIs,

Agreed. And a qn URI is a URI.

> and we can afford to make it more difficult to represent QNames. And blimey, it's not that much more difficult; you've only got to learn the new URI scheme that you came up with.

But that presumes that folks will either (a) never use actual QNames in their RDF instances, using only explicit URIs, which is not reasonable to presume, or (b) provide localized custom conversion of RDF instances into triples where every QName is mapped to a qn URI, i.e. write their own RDF parser. If QNames are to be mapped to qn URIs, then it has to be done by every RDF parser in a fashion mandated by the standard (though it is highly unlikely that that would ever happen).

> The wonderful upshot of all this is that it's basically tough: RDF is not going to change, and that's it.

It certainly looks that way. Which (to play devil's advocate some more) may preclude it from serious consideration as the ideal or primary vehicle for the global backbone of the SW...

> You may as well just learn how to cope with representing QNames in the RDF model by identifying them with URIs. RDF is not going to use QNames in the syntax to map to QNames in the model, doesn't have to, and shouldn't have to.

Firstly, I'm not asking that RDF have QNames explicitly in the model.
If you got that impression, then I must not have expressed myself sufficiently well.

The adoption of an explicit, standardized mapping (during parsing only) from XML QNames to qn URIs requires *no* change whatsoever to either the syntax or the model. The syntax continues to use QNames as-is. The model continues to use URIs as-is. The proposal was simply to have a more explicit, bidirectional, and collision-safe URI derived from the QName than is now derived via direct concatenation. No more. No less.

I never honestly expected that such an alternate mapping function would be adopted by RDF. As I mentioned in my original post regarding this qn URI scheme proposal, it was intended as food for discussion, providing an alternate perspective on what remains a problem, based on an alternate mapping function that does not share the shortcomings of the current function. I.e. I attempted to show what RDF might be like had such a qn URI based mapping approach been adopted from the start.

> > > It doesn't keep the QName information because it is irrelevant;
> >
> > If lexically distinct QNames are capable of bearing distinct semantics, then their distinction cannot be considered irrelevant.
>
> It's irrelevant to the RDF processor.

Per my above example of lexically distinct QNames from totally disparate sources, I hardly think that such a distinction could be considered irrelevant.

> The RDF processor merely sees QNames as a method of creating URI references. That's all it can grok: URI references. If you want to use QNames, you have to identify them as URI references.

That's precisely what I was doing. I'm confused why that is not crystal clear. The qn URIs are totally opaque in the RDF model. They only have structure relevant to a parser mapping serializations to graphs or a serializer mapping a graph to a serialization. Just as an HTTP URL is opaque in the model, but has structure relevant to an HTTP server when dereferencing it.

> [...]
> > I think you're missing the point entirely here. It's about preservation of identity as defined by QNames within the RDF URI space.
>
> Once again, if you use your URI scheme, the identity of the QNames is preserved.

But once again, not in a standardized, global fashion, unless it is done the same way by every RDF parser for every serialized instance.

> > [...] We're just getting to a stage now where certain cracks are showing, but only if you're standing on the right side of the building do you see them.
>
> I think that the fact that you have struggled to come up with a few contrived examples of these cracks is telling enough that these cracks you see are no more than lines that you've painted on the wall.

They are real. They may not mean an impending total collapse of the structure, but they are real.

> If you can provide a decent example of where some real-world application breaks because of the way RDF currently is, then I shall accept that something needs to be changed.

Fair enough, but I think that might be fairly construed to be a narrow and somewhat short sighted view of the matter.

> Where are you getting these analogies from? :-) They're cool, but just a product of some sloppy arguing on my behalf. Let's just stick to the technical details.

I keep trying to do that. Sorry if I waxed philosophical a few times...

> [...]
> > > Once again, it does not declare the QNames to be identical. It simply uses the QNames to form URI references.
> >
> > Once again, if it maps lexically distinct QNames to the same URI, then RDF declares them identical.
>
> But the QNames aren't in the model.

I never (intentionally) said they were.

> If you mean "identical to an RDF parser that is handling the syntax", then yes, it sees them as identical

Exactly.

> , but that's obvious.

Obvious perhaps to folks building parsers but not necessarily to people (now or in the future) creating content based on the syntax.
> The model is the important bit of RDF; the syntax is just a means to an end, to get triples.

I agree with that wholeheartedly, but as syntax is the doorway to the model, it cannot be tossed aside as irrelevant.

> [...]
> > > But all data on the Semantic Web are resources, which may be identified by URI references.
> >
> > Right, data that is created as, stored as, and exchanged as serialized XML instances.
>
> XML RDF instances, right.

Yes, but XML instances nonetheless, with QNames. Not NTriples instances. Not N3 encoded "instances". Not any other alternate non-standard serialization, but XML instances as defined by the RDF spec.

> [...]
> > If there is a lexical (and hence potentially semantic) distinction between two QNames in the serialization and my local RDF engine knows how to preserve that distinction
>
> If your "local RDF engine" preserves QNames that are in the syntax on forming the model, then it is a non-conformant RDF parser, i.e. hopelessly broken.

It's not preserving QNames. It's preserving distinctions in identity inherent in different QNames, by arriving at different URIs in the transition from serialization to graph.

But, I agree, any parser (or rather parsing process from instance to triples) that behaves in a fashion contrary to that specified and allowed by the spec is broken. Of course.

Yet in the same way, applying e.g. XSLT scripts to valid RDF XML instances to change resource identity prior to production of triples (as suggested by you and Dan), such that those changes are not mandated and defined by the spec, is also "broken" in that it results in a set of triples from valid serialized RDF instances which will be different from any other RDF system that strictly follows the spec. No?

If the creator of knowledge encoded as an RDF XML instance says that an X is an X, then changing that to a Y during de-serialization to triples is contrary to the standard.
If you have a specialized application that needs to make such changes to identity, fine, but it must be accepted that such changes impinge on the sanctity of the original knowledge and result in a representation of that original knowledge that is not globally consistent. My proposal was to achieve global consistency of representation across the syntax:model boundary.

> > [...] then we have failed to maintain the integrity of resource identity (and hence the integrity of knowledge) on the SW.
>
> Yeah, if your parser is broken and then you get people to use it, of course you're going to mess up the Semantic Web.

That's been my whole point. Whatever method is used, it has to be the *standardized* way of doing things. Custom, localized solutions are unacceptable if we wish to have global consistency in our knowledge representation.

> > [...] Identity cannot vary from agent to agent according to localized interpretations of QName to URI mapping!
>
> The RDF specification is *very* clear about that mapping; and there is absolutely no room for "localized interpretations".

I fully agree. But that's just what XSLT preprocessing results in, namely localized interpretations of QNames.

> If you preserve the identity of syntax QNames in the model, then your parser is non-conformant.

Non-conformant to the present spec, agreed. My proposal was to change the spec to preserve that distinct identity.

> You either comply or you don't, there is no "two ways about it".

We're not at all in disagreement about this.

> > [...] If identity is not consistent, the SW won't work.
> >
> > No?
>
> Of course not. If you want to sit there and write a parser which ignores the RDF specification,

I think you've misunderstood what I was suggesting.

> But the Semantic Web uses URI references, not QNames:

Again, I wasn't proposing using "QNames", but URIs that are strongly equivalent to QName identity, providing reliable mapping between such URIs and QNames.
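As a rough illustration of what a reliable, reversible mapping could look like, the sketch below percent-encodes the namespace and the local name separately and joins them with a delimiter that cannot appear in either encoded part. The `x-qn:` prefix and both function names are hypothetical; this is not the syntax of the qn URI scheme proposal itself, only one way to get the collision-safe, bidirectional property.

```python
from urllib.parse import quote, unquote

SCHEME = "x-qn:"  # made-up prefix, used only for this illustration


def to_pair_uri(namespace: str, local_name: str) -> str:
    """Encode (namespace, local name) so the boundary between them survives."""
    # safe="" also escapes "/", so the only literal "/" is the delimiter.
    return SCHEME + quote(namespace, safe="") + "/" + quote(local_name, safe="")


def from_pair_uri(uri: str) -> tuple[str, str]:
    """Recover the original (namespace, local name) pair."""
    if not uri.startswith(SCHEME):
        raise ValueError("not an x-qn URI: " + uri)
    namespace, local_name = uri[len(SCHEME):].split("/", 1)
    return unquote(namespace), unquote(local_name)


a = to_pair_uri("urn:x:abc", "def")
b = to_pair_uri("urn:x:abcd", "ef")
print(a)
print(b)
print(from_pair_uri(a))
```

The two QNames that collide under plain concatenation now yield different identifiers, and each identifier can be turned back into the pair it came from. Within the graph these strings would still be treated as opaque.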
And the SW uses opaque universal identifiers, not URI refs. It *adopts* URI refs as the realization of its identifiers to take advantage of the facts (a) that URI refs are intended to be globally unique, and hence meet the uniqueness requirement for universal identifiers, and (b) that to some applications, such URI refs may provide additional utility which is secondary to their primary role as universal identifiers (namely, you might be able to dereference them). But the SW per se does not use URI references. All resource identity within the SW is opaque. The SW and RDF could have used some completely different mechanism for defining their resource identifiers (though I agree that using URI refs adds a lot of utility and is a good choice).

> the semantics come from interpreting the mass of data made by triples of those URI references, as handled by parsers and inference engines etc. That's the beauty of it all.

No disagreement there.

> > [...] RDF has no reliable means of re-serialization that guarantees the same QNames it got on input.
>
> Since it doesn't use QNames in the model, what does it matter?

As you've misunderstood me to be proposing QNames as first class objects in the RDF model (which I didn't) I'll just skip this part of the discussion...

[snip]

> But the Semantic Web uses DLGs... so why are you talking about XML QNames?

Because RDF also employs XML serialization, and aspects of that serialization can impact the body of knowledge derived from the serialization into the graph.

> [...]
> > Therefore it is possible to assign non-ambiguous semantics in an XML serialization which becomes ambiguous in an RDF graph, and therefore there is potential loss of information and unintended introduction of ambiguity into the SW.
>
> Well, the trick is to convert the identifiers in the XML serialization into a form that is fit for the Semantic Web.
> In other words, you have to convert the QNames into URIs that represent those QNames, and then use the URIs instead. RDF doesn't allow it any other way.

Argh. But if you have to do that, then you have to do that with every parser in every case! The point of the RDF syntax was to define a form that is "fit for the Semantic Web". Are you now then agreeing that the current RDF syntax does not provide a form for identities that is fit for the semantic web?

You just above argued that localized "tricks" will create a mess on the SW (with which I fully agree), so how can localized tricks now be the solution to this loss of distinction?!

> I agree that to a small extent this is a conceptual problem, but there are no particular use cases that spring to mind that would prove that this is a sufficiently difficult a mechanism to use in order that we have to make backwards incompatible changes to RDF. The fact is that it's not impossible to identify QNames in RDF, just slightly twisted, in that you have an extra processing step to do.

But the extra processing step is non-standard and therefore unacceptable if it is necessary to preserve such distinctions of identity globally in any SW application utilizing that knowledge.

> > This issue has been discussed elsewhere on this list at great length. I won't re-address it here.
>
> Please, at least provide references.

Cf. http://lists.w3.org/Archives/Public/www-rdf-comments/2001JulSep/0124.html and the threads referenced therein.

> > [...] The XML spec defines a way to achieve a consistent representation of structured data. To randomize it violates that fundamental goal.
>
> The RDF specification is very, very, clear about the QName to URI reference mapping, and as such cannot be considered "random" at all.

I didn't say RDF was random. Please re-read what I said. I gave a (bogus) example of how an *XML* parser (not RDF parser) might misbehave.

> > [...]
> > But if it *is* an error, then it needs to be addressed and (hopefully) fixed.
>
> It's not an error.

That's certainly one view. But whether it is or not, it's a separate question whether it is something that will change. I don't (and never did) expect that it would -- but hopefully it will be sufficiently addressed in revisions of the spec.

Cheers,

Patrick

--
Patrick Stickler                      Phone:  +358 3 356 0209
Senior Research Scientist             Mobile: +358 50 483 9453
Software Technology Laboratory        Fax:    +358 7180 35409
Nokia Research Center                 Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland   Email:  patrick.stickler@nokia.com
Received on Friday, 24 August 2001 04:02:40 UTC