- From: Dan Brickley <danbri@w3.org>
- Date: Wed, 2 Apr 2003 03:59:50 -0500
- To: public-esw@w3.org
(recovered belatedly from my laptop; from a meeting at Stilo on WP5)

Questions
---------

Is there a line between the 'multi-namespace chaos' that RDF is good for,
vs static, tightly controlled, homogeneous namespaces, where schema
annotation stops making sense?

If you have a data model, how do you get to your concrete xml schema
encoding, plus whatever annotations are needed to get round trip? (wizard)

sb "writing schemas is quite difficult. People tend to think about
    instances and then work backwards. People get started by creating
    instances and then reverse-engineering the schema."
db "do they do it well?"
s  "not bad..."

purchase orders
  buyers, sellers, items
  a po has one or more items
  ...
this tells us nothing about what a po document tells us

looking at po
...issue of metadata about the doc, header info etc., which tends to come
at the beginning of the document considered as a tree, and mixes in with
the 'data proper'.

from po.xml

  <purchaseOrder orderDate="1999-10-20">
    <shipTo country="US">
      <name>Alice Smith</name>
      <street>123 Maple Street</street>
      <city>Mill Valley</city>
      <state>CA</state>
      <zip>90952</zip>
    </shipTo>

if po.xml is an xml document, ie a member of class eg:PODoc, is shipTo an
attribute/property of that thing, or of some other entity which the
eg:PODoc describes?

ie.

  <rdf:Description rdf:about="po.xml">
    <eg:shipTo>
      <rdf:Description>
        <eg:country>US</eg:country>
        ....

or

  <rdf:Description rdf:about="po.xml">
    <x:descriptionOf>
      <eg:PurchaseOrder>
        <eg:shipTo>
          <rdf:Description>
            <eg:country>US</eg:country>
            ....
  </>

[brian arrives]

Can we write
  po.xsd
  po.dtd
  po.rng
  po.xtr
...so that they have the same 'extensions', ie pick out the same xml docs
as valid instances?

given a dtd, can produce an xsd with the same class extension
for some xsds, can do reverse?
q: namespace prefixes, for example.

brian: the class of classes you can describe within dtds is entirely
contained within xsd. taking anything in the dtd class you can do a purely
syntactic change to get the xsd equivalent.
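The two readings of po.xml raised earlier in these notes (shipTo as a property of the document itself, versus a property of the purchase order that the document describes) can be made concrete as triples. The following is only a rough sketch: the blank-node labels `_:addr` and `_:po`, the `eg:` prefix strings, and the helper names are assumptions made for illustration, and the po.xml fragment is trimmed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the po.xml fragment from the notes.
PO_XML = """<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <city>Mill Valley</city>
  </shipTo>
</purchaseOrder>"""

EG = "eg:"  # hypothetical prefix for the example vocabulary


def ship_to_triples(subject, ship_to):
    """Describe the shipTo element as an anonymous node hanging off `subject`."""
    node = "_:addr"  # illustrative blank-node label
    triples = [(subject, EG + "shipTo", node)]
    for name, value in ship_to.attrib.items():
        triples.append((node, EG + name, value))
    for child in ship_to:
        triples.append((node, EG + child.tag, child.text))
    return triples


def reading_document(doc_uri, root):
    """Reading 1: shipTo is a property of the document itself."""
    return ship_to_triples(doc_uri, root.find("shipTo"))


def reading_described(doc_uri, root):
    """Reading 2: the document describes a purchase order, which has the shipTo."""
    po = "_:po"  # illustrative blank-node label
    triples = [
        (doc_uri, "x:descriptionOf", po),
        (po, "rdf:type", EG + "PurchaseOrder"),
    ]
    return triples + ship_to_triples(po, root.find("shipTo"))


root = ET.fromstring(PO_XML)
doc_reading = reading_document("po.xml", root)
po_reading = reading_described("po.xml", root)
```

The same XML input yields different subjects for `eg:shipTo` under the two readings, which is the ambiguity the question is about.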
A mapping, D: dtd -> xsd such that L(d) = L(D(d))
(L being legal extension / Language)

brian/s agrees

S: xsd -> dtd such that L(s) subsetof L(S(s))
& forall d' in mapping such that set of xsd describable is ... of dtd

counter example: character entities (see mathml, html for eg)
can we think of this as a preprocessing stage?
xsd's view: you can have a dtd as well as a schema, for entity stuff

  <!ELEMENT eg:thing>

we can write an xml schema that accepts xmlns:eg but it will also accept
xmlns:eg2, so long as the namespace URIs are the same.

any document that's acceptable via a dtd, we can have a schema that
generates exactly the same extension. (?except ns)

S: if there are no namespaces in the dtd, we can generate a schema that
has exactly the same extension.

  <n:a xmlns+:n="myurl">
    <n:b ... </n:b>
  </n:a>

  <m:a xmlns+:n="myurl">
    <n:b ... </n:b>
  </m:a>

(or omitting the ns decl)

Are these equal, equiv etc in any sense?
(generate same psvi, for eg? or same infoset?)
(xml c14n same?)

eg. ' vs "
how far in this direction does xml canonicalisation go? (@@todo)

'there is some strange sense in which these are the same document.
unfortunately you can write a dtd that accepts one and rejects the other'.

Do these have the same PSVI?
problem: PSVI is tech specific to XML Schema. We want something common
across all xml document typing tech.?

Is there some (XML 1.0+ns based) characterisation of the commonality
between these two/three/etc documents? What is common across the whole
space? PSVI is stated only as 'what happens for xml schemas'; perhaps
there should be a generalised statement of this, ie. that the above 2 egs
give the same canonicalised-in-some-sense representation.

From an RDF perspective, we need to get from infosets to a set of RDF
statements about the world, and then ask whether the two sets have the
same truth conditions / make the same claims about the world. Could one be
false while the other is true...
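One way to probe the "are these the same document?" question is to compare documents by expanded names (namespace URI plus local name) rather than by prefix. The sketch below assumes Python 3.8+ for `xml.etree.ElementTree.canonicalize`; the two documents are simplified, hypothetical stand-ins for the n:/m: examples above, and `urn:example:myurl` is a made-up namespace URI. It also varies the attribute quoting (' vs ").

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: same namespace URI, different prefixes and quote styles.
DOC_N = '<n:a xmlns:n="urn:example:myurl"><n:b k="v">text</n:b></n:a>'
DOC_M = "<m:a xmlns:m='urn:example:myurl'><m:b k='v'>text</m:b></m:a>"


def expanded(elem):
    """Nested tuple of expanded name, attributes, text and children.

    ElementTree already stores tags as '{namespace-uri}local-name', so the
    prefix chosen by the author does not appear here. Tail text is ignored.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(expanded(child) for child in elem),
    )


def same_modulo_prefixes(doc1, doc2):
    """True if the two documents agree once prefixes are resolved to URIs."""
    return expanded(ET.fromstring(doc1)) == expanded(ET.fromstring(doc2))


same = same_modulo_prefixes(DOC_N, DOC_M)

# Canonical XML normalises quoting but keeps the author's prefixes,
# so the canonical forms of these two documents still differ...
c14n_equal = ET.canonicalize(DOC_N) == ET.canonicalize(DOC_M)

# ...unless the optional C14N 2.0 prefix rewriting is switched on.
c14n_rewritten_equal = (
    ET.canonicalize(DOC_N, rewrite_prefixes=True)
    == ET.canonicalize(DOC_M, rewrite_prefixes=True)
)
```

This only addresses the prefix and quoting differences; it says nothing about PSVI or about whether the two documents make the same claims about the world.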
etc

==========

sticking with po.xml and po.xsd

we want to be able to _generate_ a sensible po.xsd, but starting from our
uml/rdf/etc model

triples:
 (i)   payload of the instance data as rdf statements
 (ii)  rdf schema statements (implied by the instance data: property,
       class skeletal definitions; that the domain includes ...)
 (iii) more statements, not implied by instance data, that give
       domain/range for these properties
 (iv)  statements about classes of xml document, eg. a PODocType?

If we....
 - have an ontology/schema for purchase order world
 - have picked a schema language (XSD)
 - have picked a serialization strategy / xml writing convention
   (eg. no attributes, edges-encode-properties)
 - (anything else?)

...what do we need before we can (auto)generate an XSD?
 - need to choose a root class (or is this arbitrary?)
 - need one root class from each disconnected segment...
   (because serializer could be serializing a disjoint graph)

The classes and properties may be disconnected at schema level
...also
The individuals and relations may or may not be connected.

"although this is same as in rdf, someone looking at the instance data may
be puzzled if it 'starts in wrong place'". eg. if shipsTo has the PO
xml-inside it.

The property/edge/element names encode assumptions about directedness, and
about the use of the document.

Example from Professional XML Schema book, re RDBMS mappings: ch12,
creating XML Schema from existing databases.
...generate several different xml schemas from same data, for different
purposes.

RDF selling pt: it's an account of what all the instance data from these
various instance formats have in common.

  <e:Document dc:title="...">
    <e:author>
      <e:Person foaf:name="Tim">

...this couples our serialisation strategy to choice of namespace / vocab.

  <e:Document dc:title="...">
    <e2:wrote x:map="inverse">
      <e:Person foaf:name="Tim">

...we're free (in principle) to do this. But it's ugly and not typical
colloquial XML.
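The "one root class from each disconnected segment" requirement noted above could be checked mechanically before generating an XSD. A minimal sketch follows, under stated assumptions: the schema is given as (domain class, property, range class) edges, the edge list itself is invented for illustration, and "never appears as a range" is just one possible heuristic for picking a root.

```python
from collections import defaultdict

# Hypothetical schema-level edges: (domain class, property, range class).
SCHEMA_EDGES = [
    ("PurchaseOrder", "shipTo", "Address"),
    ("PurchaseOrder", "item", "Item"),
    ("Document", "author", "Person"),
]


def segments(edges):
    """Group classes into connected segments, ignoring edge direction."""
    neighbours = defaultdict(set)
    for domain, _prop, range_ in edges:
        neighbours[domain].add(range_)
        neighbours[range_].add(domain)
    seen, result = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


def candidate_roots(edges):
    """Per segment, list classes that never appear as a range.

    If every class in a segment is some property's range (a cycle),
    fall back to listing all of them, since the choice is then arbitrary.
    """
    ranges = {range_ for _d, _p, range_ in edges}
    return [sorted(c - ranges) or sorted(c) for c in segments(edges)]


roots = candidate_roots(SCHEMA_EDGES)
```

With these edges the purchase order classes and the document/person classes fall into two segments, so a generator would need two root element declarations, or one schema per segment.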
We can say in OWL

  <rdf:Property rdf:about="http://example.com/e#author">
    <owl:inverseOf rdf:resource="http://example.com/e2#wrote"/>
  </rdf:Property>

Hypothesis: people create vocabulary (xml elements and hence implied RDF
properties, if we take a naive mapping approach)
...where they start with classes they're more concerned about, and put
inside their xml-encoded descriptions mentions of instances of less
interesting-to-them classes.

So, a library might have Document at the top of the xml tree, which leads
them to use an 'author' relation. A white pages directory might start with
people, and have a 'wrote' relation pointing to docs.

This relates to expected search strategies
 - do i look for papers written by ?
or
 - documents about ?

Serialization strategy depends on expected usage.

We're generating, from the RDF world, an annotated XSD which includes
hints, mapping rules, xslt etc that let us get our RDF out again.

We could generate:

  <e:Document dc:title="...">
    <e2:wrote x:map="inverse">
      <e:Person foaf:name="Tim">

or even (though evil)

  <e:Document dc:title="...">
    <e2:wrote>
      <e:Person foaf:name="Tim">

or

  <e2:wrote>
    <e:Book foaf:name="Timetable">
    <e:Person foaf:name="Tim">
  </e2:wrote>

or

  <e2:wrote>
    <e:Person foaf:name="Tim">
    <e:Book foaf:name="Timetable">
  </e2:wrote>

  <s:claim>
    <e2:wrote/>
    <e:Person foaf:name="Tim">
    <e:Book foaf:name="Timetable">
  </s:claim>
  <!-- polish form -->

  <s:claim>
    <s:rel reluri="e2:wrote"/>
    <e:Person foaf:name="Tim">
    <e:Book foaf:name="Timetable">
  </s:claim>

  <rdf:Statement>
    <rdf:predicate rdf:resource="http://example.com/e#wrote"/>
    <!-- ... -->
  </rdf:Statement>

OpenMath adopts a similar very generalised style.

  <s:claim>
    <s:rel reluri="e2:wrote"/>
    <s:obj objuri="e:Person" foaf:name="Tim"/>
  </s:claim>

...things become very regular, and data is pushed into content rather than
markup. Similar strategy seen in RDF SQL triplestores, where the
anticipated schema becomes general, and the content does all the work.
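The library-vs-white-pages point above (author from the Document side, wrote from the Person side) suggests a normalisation step: once an inverse relationship is declared, both serialisations should reduce to the same set of claims. A minimal sketch, assuming triples are plain tuples and the inverse table has been extracted from the schema by some other means; the identifiers `doc1` and `tim` are invented for illustration.

```python
# Hypothetical inverse declarations, as an inverse-property statement might give us.
INVERSE_OF = {"e2:wrote": "e:author"}


def normalise(triples, inverse_of=INVERSE_OF):
    """Rewrite (s, p, o) to (o, inverse(p), s) whenever p has a declared inverse."""
    out = set()
    for s, p, o in triples:
        if p in inverse_of:
            out.add((o, inverse_of[p], s))
        else:
            out.add((s, p, o))
    return out


# Document at the top of the xml tree (library view).
library_view = {("doc1", "e:author", "tim")}
# Person at the top of the xml tree (white pages view).
directory_view = {("tim", "e2:wrote", "doc1")}

same_claims = normalise(library_view) == normalise(directory_view)
```

The direction chosen as canonical (`e:author` here) is arbitrary; what matters for round-tripping is that both xml shapes map to one graph.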
"Deep embedding" Looking at Henry's work: Q: how tied to XML Schema is this? eg. need for PSVI... maps on types as well as elements and attributes. Q for Henry: in po-mapped.xml why - <ns_2:shipTo xmlns:ns_2="" country="US" map:item-to="property" map:item-name="" map:minOccurs="" map:maxOccurs="" map:type-to="" map:type-name="{}type.Address.1096"> ...is country still an attribute, not normlaised Re the generated Java, what's the purpose? Why aren't property names apparent? Why not use Java classes more explicitly? What's the value of creating mapping to java objects, versus using Java interfaces to the original data, XML (SAX, DOM), RDF etc? Comparison: SOAP serializers that dump Java OO stuff into XML (-> WP5) notes: Schema Adjunct can map to SQL... NExt steps: make the report page into a table of contents. Separate docs for dan, brian, stephen aim to release draft for review in 2 weeks time. Next meeting: feb 13th, review and publish meeting. Stilo 10.15am 2003-02-13. Possible Stilo staff: Steve Healey Examples / test data: - PO and other Edinburgh stuff (quicken?) - Doc/Person/wrote example + illustration (also bibliography/RAL) - Wine ontology simple egs. (8 line DTD) - projects/people/docs more real world examples: - danbri: wsdl, rss, calendar (ongoing not per feb deadline) - ral: cerif (common euro research info format sql/xml and rdf reps)
Received on Wednesday, 2 April 2003 03:59:50 UTC