- From: <Patrick.Stickler@nokia.com>
- Date: Tue, 14 Aug 2001 10:54:35 +0300
- To: sean@mysterylights.com, scranefield@infoscience.otago.ac.nz, www-rdf-interest@w3.org, www-rdf-logic@w3.org
(Apologies in advance if any of the following seems to be worded or expressed too strongly... insert smileys liberally ;-)

> -----Original Message-----
> From: ext Sean B. Palmer [mailto:sean@mysterylights.com]
> Sent: 14 August, 2001 02:00
> To: Stickler Patrick (NRC/Tampere); scranefield@infoscience.otago.ac.nz; www-rdf-interest@w3.org
> Subject: Re: Using urn:publicid: for namespaces
>
> > > I did send a letter about this to www-rdf-interest a
> > > short while ago; perhaps you missed it :-)
> >
> > I must have. Can you send me a copy?
>
> I can do better than that: I can give you a URL! [1] from RDF
> Interest, last month.

Thanks. Read through it. Comments on your proposal integrated below.

> [...]
> > > [ :ns <http://www.w3.org/1999/xhtml>; :expEType "title" ] .
> [...]
> > Firstly, I don't see how the above is a valid URI.
>
> I'm assuming that you know about anonymous nodes in RDF, but aren't
> familiar with the Notation3 serialization. A "[]" is just an anonymous
> node, q.v. [2]. You can give it a URI if you want. In fact, this would
> have been useful for the XML Schema people to have defined the URIs
> that they use to represent their QNames.

This seems a rather "obese" solution. For every resource identified by a QName in a serialization, create an anonymous node with "some" URI and two child nodes, one for the namespace and one for the name. And that's supposed to be better than a single, transparent URI that all RDF parsers would derive from the QName? Sorry, I don't buy it. Nope.

> [...]
> > The problem I am focusing on in my proposal is getting from
> > RDF/XML instances to triples such that no matter what RDF parser
> > you are using, so long as it conforms to the standard, you will
> > get exactly the same set of triples with the exact same URIs
> > for resources, [...]
>
> Why? Why not just define them as anonymous nodes?
> You can say that a combination of "ns" and "ExpEType/ExpAName" makes
> for an unambiguous subject using the following rule:-
>
> { { :x :ns :y; :expEType :z . :a :ns :y; :expEType :z }
>   log:implies
>   { :x = :a } } a log:Truth; log:forAll :x , :y , :z , :a .

Well, I'm probably going to get grilled for this comment, but personally I don't like anonymous nodes. After all, just what *is* an anonymous node? Every application that I've seen that uses them has had to give them some form of identity, and yet that identity is system dependent.

IMO, anonymous nodes were a hack to allow collection structures as Objects, and yet collections (or rather ordered collections) in RDF do not work in a context of multi-source syndication (nor do DAML collections either). The proper way IMO to model collections is using an ontology of collection relations and plain old triples with no anonymous nodes; but that's a separate discussion that I don't want to start here. Issues of completeness required by the closed-world folks can be addressed by assigning source or authority to statements, so that one can selectively filter those collection members defined in a particular source or by a particular authority, and "outsiders" cannot add to that "view" of the collection.

IMO, the RDF conceptual model should have no anonymous nodes. Collections based on serialized, syntactic structures should have no realization in the underlying conceptual model; but again, that's yet another discussion ;-)

I will concede that there *might* be valid and necessary uses for anonymous nodes which I am not yet aware of, but regardless I get the impression (and I may very well be wrong, apologies in advance) that anonymous nodes are the new, "hot", interesting thing in RDF/DAML, and so folks are predisposed to using them to solve every problem even when more constrained, simpler, and better alternatives may be available.
For those who are convinced that anonymous nodes are a good thing, please think about the implementational burden and portability/interoperability issues they may introduce. There are lots of standards and models out there that have really interesting and even elegant concepts, but are just too darn hard to implement efficiently, so no tools exist and the standard dies (HyTime comes quickly to mind ;-). I hope that doesn't happen to RDF because overly complex algorithms and data structures are needed to make sense of graphs with a plethora of anonymous nodes requiring constant recursive resolution by every SW agent to get to any "real" data that is useful for a given application.

As someone who has to make stuff work in the "real world", I'd *much* rather get a single URI for a resource than some anonymous node with a namespace and name dangling off it. Even if you give that anonymous node a URI (in which case it is no longer anonymous ;-) my axioms cannot and will not reference that URI, because they are defined in terms of resources, not complex QName data structures. And if different systems name their "anonymous" QName root nodes differently, my axioms are not portable. Sorry, I see an anonymous-node-based treatment of QNames creating far, far more problems than it solves (see further below).

Please let's get back to the core of the problem, which is the *mapping* (not representation) of QNames to single resource URIs in a consistent, standardized manner.

> People simply aren't going to adopt "standard mappings". They want
> flexible models.

Flexible models are good, but standard mappings are critical, no? If we can't ensure that every SW agent is going to arrive at the same set of triples from the same serialized instance, then we might as well pack it up and quit. Integrity and consistency in global, distributed knowledge representation for an environment such as SW is absolutely essential. Without it, it cannot work.
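To make the portability point concrete, here's a quick Python sketch. The blank node identifiers ("_:genid1", "_:b0") are made up, standing in for whatever two different parsers might generate for the same anonymous QName node:

```python
# Two hypothetical parsers emit the *same* anonymous-node structure for
# a QName, but each names the anonymous root with its own system-
# dependent identifier.
parser_one = {
    ("_:genid1", ":ns", "http://www.w3.org/1999/xhtml"),
    ("_:genid1", ":expEType", "title"),
}
parser_two = {
    ("_:b0", ":ns", "http://www.w3.org/1999/xhtml"),
    ("_:b0", ":expEType", "title"),
}

# Naive set comparison fails: an axiom written against one parser's
# node name cannot match the other's without extra graph-matching
# machinery that every SW agent would then have to carry.
assert parser_one != parser_two
```

A single standardized QName-to-URI mapping sidesteps this entirely: the two parsers would emit identical triple sets, and plain set equality (or any axiom referencing the URI) just works.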
There always has to be a balance between what is mandated by standards, for the sake of interoperability and consistency, and what is left open to accommodate new ideas, evolution of methodologies, competition, etc. Mappings between serializations and triples cannot be flexible. No way. (Even if we never expect SW agents to talk in terms of triples but always to reserialize to RDF/XML, it still raises problems with standardized axioms and the internal logic of reused software components.)

> > If one RDF parser gives you ns:name -> nsname and another
> > gives you ns:name -> ns#name and yet another gives you
> > ns:name -> 'ns'name (a la SWI Prolog + RDF) and yet another
> > gives you ns:name -> urn:qname:ns/name, etc. etc. [...]
>
> Oh please! If the material being processed is indeed RDF, then the RDF
> parser should only be expected to use the first form of resolution
> from QName pair to URI.

According to the current RDF spec, yes. BUT that form of resolution/concatenation has been shown to be unreliable and capable of producing ambiguous URIs! It's broken and *must* be replaced by something else. The current "popular" proposal, a la XML Schema, of inserting a '#' character is unacceptable because it can produce invalid URIs; and furthermore, any combinatoric scheme based on simple concatenation cannot achieve all of the possible mappings from QNames to URI schemes, such as e.g. URN schemes which may employ nested bracketing. The current concatenation scheme used by RDF is based (IMO solely) on the use of HTTP URLs and HTML/XML fragment syntax -- and is grossly inadequate for addressing the possible cases of QName to URI mapping that are allowed and legal on the Web. It got RDF started, but cannot carry RDF through to a mature and functional SW.

> That's not the issue. The issue is how to
> represent XML QNames in RDF, not how to process the XML QNames that
> are used to form the RDF.

Hello? What? QNames *in* RDF?! I don't think so! QNames are a creature of the SYNTAX ONLY!
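The ambiguity of plain concatenation is easy to demonstrate. A quick sketch, with namespace URIs invented purely to show the collision:

```python
# The current RDF mapping: resource URI = namespace URI + local name.
def concat_mapping(ns, local):
    return ns + local

# Two *distinct* QNames (hypothetical namespaces, for illustration only):
qname_a = ("http://example.org/voc/ab", "cd")   # namespace ...voc/ab, name cd
qname_b = ("http://example.org/voc/a",  "bcd")  # namespace ...voc/a,  name bcd

# Both collapse to "http://example.org/voc/abcd"; the original
# namespace/name partition is unrecoverable from the resulting URI.
assert concat_mapping(*qname_a) == concat_mapping(*qname_b)
```

Two different vocabularies can thus collide on the same URI, and nothing in the triple set tells you which QName was meant.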
They have no, and should have no, realization in the set of triples derived from a serialized instance! *PLEASE* don't tell me that folks are working on how to model QNames in RDF! What's next? Processing instructions? Character entities? Start and end tags? Resources are identified by *URI*s, not by sub-graphs with an anonymous root!

> But yes, I agree with you very much that
> this needs to be done somehow. It's useful to say that a certain
> element in one language is the same as one in another language. But
> you can do that using the anonymous node proposal above: no extra
> syntax rubbish required.

My proposal adds a *single* declaratory element to the mix, and is 100% backwards compatible with the existing spec and *all* existing RDF systems. I see it as being a far more constrained and efficient solution than adding anonymous nodes and modelling QNames in RDF -- both of which increase the complexity load on SW software and needlessly complicate the data model; whereas dealing with the QName to URI issue at the front end as I propose, before getting to triples, adds no additional burden on the software whatsoever and allows *any* URI scheme to be used for *any* resource while making their syntactic representation explicit, consistent, and standardized.

> > IMO it is the underlying conceptual model of triples that is the
> > real value of RDF, and the serialization issues are entirely
> > secondary. [...]
>
> Once again, very much agreed.

Great, but this also means that QNames, being a creature of serialization, do not belong in the realm of triples and should disappear as distinct data structures during parsing of the RDF/XML instance to triples. No? Just because you *can* model QName structures in RDF, for various reasons, does not mean such a representation should be core to all knowledge defined in RDF.
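A minimal sketch of the front-end idea. The actual declaration syntax of the proposal is not reproduced here; the mapping templates below are purely hypothetical examples, showing only that each namespace can declare explicitly how its QNames become URIs, so every conforming parser derives exactly the same URI:

```python
# Hypothetical per-namespace mapping declarations: each namespace URI
# is bound to an explicit template with a {name} slot. These particular
# templates are invented for illustration.
QNAME_MAPS = {
    "http://www.w3.org/1999/xhtml": "http://www.w3.org/1999/xhtml#{name}",
    "urn:example:vocab":            "urn:qname:example:vocab:{name}",
}

def resolve_qname(ns, local):
    """Map a QName to a single resource URI at parse time, before any
    triples exist. Fallback mimics the current concatenation rule."""
    template = QNAME_MAPS.get(ns, ns + "{name}")
    return template.format(name=local)

print(resolve_qname("urn:example:vocab", "title"))
# -> urn:qname:example:vocab:title
```

Note that the mapping lives entirely in the parser's front end: the triple store sees only opaque URIs, any URI scheme (including bracketing URN schemes) can be declared, and no QName structures survive into the data model.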
If you want an ontology and methodology to talk about components of XML serialization, fine, but that's very different from carrying those components over into the underlying RDF data model.

> > Furthermore, since humans need a means of easy data entry,
> > and would prefer to enter 'en' for language rather than something
> > like "http://some.authority.com/languages/English",
>
> You have to declare a datatype in that case, but sure, why not?
>
>     this xml:lang "en" .
>
> I think there's an enumeration in XML Schema for those values
> somewhere... it'd be cool if the W3C could post them in RDF using
> DAML.

Adopting XML Schema data types in RDF doesn't provide any actual validation, nor does it provide any of the data type hierarchy functionality provided by an XML Schema parser. There is no true integration of XML Schema with DAML or RDF. The XML Schema data types have simply been used as a standard vocabulary which DAML schemas can point to, but where one still has to code considerably to achieve any benefit.

Now, if (1) there were XML Schemas for RDF, RDF Schema, DAML, etc., and (2) one defined XML Schemas for each ontology in addition to defining them in RDF Schema, and (3) there were production-quality XML Schema-capable parsers with full support of the XML Infoset, etc. etc., then one could use such an XML Schema parser to validate the serialized instance prior to importation via the RDF parser; but that's still not the same thing as achieving actual validation of data types within an RDF engine simply by relating a property to some XML Schema data type class.
Trust me, as someone who is involved in designing systems needing to manage millions of pages of complex technical documentation, and wanting to do so in a way that exploits metadata to the fullest potential, I have looked longingly towards XML Schema and the presumed "adoption" of XML Schema data types by DAML as a way to provide robust metadata validation in a flexible, modular, and extensible manner using minimal custom software code -- and unfortunately, in practice it's an illusion. At present, it's like the good old days of early SGML -- you have to roll your own, no matter what.

> > and since we really want our SW Agents to deal with resources
> > rather than literals as much as possible, we need to map the
> > literal 'en' to the more informative and useful resource URI [...]
>
> Huh? Just use datatypes; no need to complicate things. Go through the
> DAML walkthrough (linked to from [3]).

Just because something sounds good on paper doesn't mean it holds up in application. I've been through the DAML walkthrough, and am not convinced that XML Schema data types are worth the effort, insofar as they would have significance within the RDF space (as opposed to the XML serialization space). I.e.:

1. XML Schema is optimized/designed for serializations, not knowledge bases.

2. Most parity and collection related constraints cannot be defined with XML Schema in a way that works with syndication of knowledge from multiple sources (i.e. multiple serializations).

3. Saying that a given literal value is legal in a serialization says nothing about how that literal value might represent an actual resource in the knowledge base, nor anything about the relationship of that resource to other resources or constraints placed on occurrences of that resource within the knowledge base.

4. Literals in serializations tend to be of three types: (a) shorthand aliases for resources (e.g.
'en'), which typically belong to bounded enumerations; (b) values which are members of infinite, unbounded enumerations (i.e. data types such as integers, characters, floats, dates, etc.); or (c) strings which are to be treated as opaque, insofar as the RDF engine is concerned (and which technically also are members of an infinite set, bound only by system limitations).

Values of type (a) need to be mapped to resource URIs, and constraints for them should be defined using RDF (i.e. RDF Schema, DAML, etc.), and thus XML Schema provides no validation benefit. Leaving such values as literals in triples loses a considerable amount of knowledge, or the ability to define constraints in terms of RDF. Values of type (b) can be validated using regular expressions, and for these XML Schema is IMO overkill. Values of type (c) require no validation (with regard to content) but must be accepted as-is.

Thus, pointing to XML Schema data types in RDF Schemas provides no actual validation, perpetuates the use of literal aliases for actual resource URIs, and -- even if XML Schema were integrated into an RDF parser -- is overkill for the validation needed for "true" literal values, per type (b) above.

Don't get me wrong. I like XML Schema for serializations. It's great, and I am impatient for the tools to mature so I can toss DTDs once and for all out the window -- but XML Schema and RDF Schema are on two separate functional and conceptual planes, and trying to merge them is IMO far more trouble than it's worth.

-------

TO REITERATE:

Adding a single mapping element to RDF as proposed *completely* solves the whole QName vs. URI problem once and for all, *without* breaking a single existing RDF application, and works for *any* URI scheme, present or future. *And* it addresses the literal-to-URI mapping problem as well, and gives you reasonable data type validation for RDF literals within the RDF environment without the extra baggage of XML Schema. What more could you ask for?
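A rough sketch of how an RDF front end might treat the three literal types above. The alias table and the regular-expression patterns are hypothetical examples (only the 'en' URI comes from the discussion above):

```python
import re

# Type (a): bounded enumeration -- shorthand aliases mapped to resource
# URIs at import time ('en' URI from the discussion; 'fi' invented).
LANGUAGE_ALIASES = {
    "en": "http://some.authority.com/languages/English",
    "fi": "http://some.authority.com/languages/Finnish",
}

# Type (b): unbounded data types -- plain regular expressions suffice,
# no XML Schema machinery needed (patterns are illustrative).
DATATYPE_PATTERNS = {
    "integer": re.compile(r"[+-]?[0-9]+$"),
    "date":    re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}$"),
}

def import_literal(value, kind):
    """Resolve a serialized literal before it reaches the triple store."""
    if kind == "alias":                      # type (a): becomes a URI
        return LANGUAGE_ALIASES[value]
    if kind in DATATYPE_PATTERNS:            # type (b): regex-validated
        if not DATATYPE_PATTERNS[kind].match(value):
            raise ValueError(f"invalid {kind}: {value!r}")
        return value
    return value                             # type (c): opaque, as-is

print(import_literal("en", "alias"))
# -> http://some.authority.com/languages/English
```

Type (a) values become ordinary resources, so RDF Schema/DAML constraints apply to them directly; type (b) values are checked with nothing heavier than a regex; type (c) values pass through untouched.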
Regards,

Patrick

--
Patrick Stickler                       Phone:  +358 3 356 0209
Senior Research Scientist              Mobile: +358 50 483 9453
Software Technology Laboratory         Fax:    +358 7180 35409
Nokia Research Center                  Video:  +358 3 356 0209 / 4227
Visiokatu 1, 33720 Tampere, Finland    Email:  patrick.stickler@nokia.com
Received on Tuesday, 14 August 2001 03:55:09 UTC