- From: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>
- Date: Sun, 30 Mar 2008 20:18:38 +0300
- To: public-powderwg@w3.org
On Wed Mar 26 11:10:17 2008 Phil Archer said:

> True, but you've not negated the query strings... would you keep
> excluderegex??

Well, you know, copy-pasting...

> Could I ask you please to create a POWDER-S OWL class that captured this?

No prob.

On Wed Mar 26 12:08:03 2008 Phil Archer said:

> More on this...
>
> I've been playing with the regular expressions that one would need to
> write to capture the meaning of the string elements. To do this I've set
> up a little tool at [1] that allows you to put in a Reg ex and a string
> and see if the two match.
>
> Let's start with includehosts. The Reg Ex needs to be pretty specific so
>
> [...]
>
> I ended up with
>
> ^\w+://[^\:\/\?\#\@ ]+\/foo
>
> And so on.
>
> The question is... is mapping each IRI constraint to a regular
> expression like this actually better than just using the element names?
> What's the benefit Stasinos?

Trying to answer these and also react to other stuff happening around the POWDER mailing list, I caught myself having a hard time remembering all the various stuff I have proposed or said. So I sat down and put everything together in a single text.

It comprises the multi-iriset idea, the it's-all-regexps-anyway idea, a new suggestion about flat-string tags, and a revisit of the original resource sets.

There is some boring semantics stuff around the middle, involving two alternative ways of substantiating the leap from resource to IRI string. Alternative 1 is more directly based on jjc's suggestion, but extends it to handle regexps, port ranges, and IP ranges, as opposed to the original hasValue restriction over string literals. Alternative 2 is an attempt to restrict the added expressivity to exactly what is needed, without opening Pandora's box of expressivity.

There's some stuff about XML types that I had no idea about and had to read up on today. Kevin, please have a look and let me know if it's all sound.

After the boring semantics stuff there are the IRI Sets and Extensions sections.
s

Intro
=====

POWDER/XML documents receive formal semantics through a GRDDL transform, associated with the POWDER namespace, that allows the XML data to be rendered and processed as OWL/RDF. Or, rather, as POWDER-S, a fragment of OWL/RDF extended in a way that allows referring to and operating upon the string representation of a resource.

The POWDER/XML format specifies a number of elements denoting attribution, validity time, and other issues relating to the level of trust assigned to a POWDER document. These fall through the transform and are not meant to be interpreted in OWL/RDF; they are only meaningful to POWDER tools that use them as input to an extra-logical procedure which MAY use this data to decide whether the POWDER document _as a whole_ should be taken into account or discarded. We shall not deal with these elements any further, and proceed under the assumption that our document has passed all relevant tests.

Unqualified names should be assumed to be in the wdr: namespace.

DR Semantics
============

POWDER documents are used to describe sets of resources using description vocabularies defined in RDF or plain string literals (tags).

POWDER/XML documents have <dr/> elements, each assigning each and every member of a set of descriptors to a set of resources. As an example, consider:

  <dr>
    <iriset>...</iriset>
    <descriptorset>
      <voc:colour ref="http://rgb.org/colours.rdf#red"/>
      <voc:shape>square</voc:shape>
      <tag>red</tag>
      <tag>light red</tag>
      <taglist>light red</taglist>
    </descriptorset>
  </dr>

where <iriset/> specifies a set of resources in a way that will be dealt with later, and voc: is an arbitrary RDF vocabulary.

The <voc:colour/> element specifies that the <voc:colour/> relation holds between all resources specified by <iriset/> and the http://rgb.org/colours.rdf#red resource.

The content of <voc:shape/> is interpreted as a string literal.
The <voc:shape/> element specifies that all resources in <iriset/> have the value "square" for the <voc:shape/> data property.

<tag/> is a string property defined by POWDER. Its content is a single string literal, possibly including spaces.

<taglist/> is a string property defined by POWDER. Its content is a space-separated list of string literals.

The overall description of the resources in <iriset/> is the union of the descriptions in the <descriptorset/>. In our example:

  a voc:colour relation to http://rgb.org/colours.rdf#red AND
  a voc:shape "square" AND
  the tags "red", "light", and "light red"

We formally interpret the above as follows: there is an OWL class containing all resources that share all of these properties, and there is an OWL class of all resources denoted by <iriset/>, and the latter is a subset of the former. In OWL/RDF we say:

  <RDF>
    <owl:Class rdf:ID="resourceset_1">
      all resources specified by <iriset>...</iriset>
    </owl:Class>
    <owl:Class rdf:ID="description_1">
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="voc:colour"/>
          <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="voc:shape"/>
          <owl:hasValue>square</owl:hasValue>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="wdr:tag"/>
          <owl:hasValue>red</owl:hasValue>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="wdr:tag"/>
          <owl:hasValue>light</owl:hasValue>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="wdr:tag"/>
          <owl:hasValue>light red</owl:hasValue>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
    <owl:Class rdf:about="#resourceset_1">
      <rdfs:subClassOf rdf:resource="#description_1"/>
    </owl:Class>
  </RDF>

It is possible to have more than one <iriset/> element, in which case a resource receives all of the descriptions by belonging to any one of the corresponding resource sets.
For example:

  <dr>
    <iriset>.1.</iriset>
    <iriset>.2.</iriset>
    <descriptorset>
      <voc:colour ref="http://rgb.org/colours.rdf#red"/>
      <taglist>light red</taglist>
    </descriptorset>
  </dr>

receives the following semantics:

  <RDF>
    <owl:Class rdf:ID="resourceset_1">
      all resources specified by <iriset>.1.</iriset>
    </owl:Class>
    <owl:Class rdf:ID="resourceset_2">
      all resources specified by <iriset>.2.</iriset>
    </owl:Class>
    <owl:Class rdf:ID="description_1">
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="voc:colour"/>
          <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="wdr:tag"/>
          <owl:hasValue>red</owl:hasValue>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="wdr:tag"/>
          <owl:hasValue>light</owl:hasValue>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#resourceset_1"/>
        <owl:Class rdf:about="#resourceset_2"/>
      </owl:unionOf>
      <rdfs:subClassOf rdf:resource="#description_1"/>
    </owl:Class>
  </RDF>

The ordering of irisets is not important. A POWDER/XML implementation is free to choose any traversal policy for treating multiple <iriset/> elements in a DR (first match wins, last match wins, in the order listed, shortest irisets first, and so on), as long as all irisets are tried before deciding that a resource is outside the scope of the DR.

DR authors may use the order of the irisets to suggest an efficient scope evaluation strategy, by putting the irisets with the widest coverage first, so that an implementation that chooses to follow the suggested evaluation order is more likely to terminate the evaluation after fewer checks.
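
As a rough illustration of this disjunctive reading, here is a minimal Python sketch. It treats each iriset as an opaque predicate over the IRI string; the names dr_applies, on_example_org and has_football_query are hypothetical and not part of POWDER, and the two predicates are made up for the example.

```python
import re
from typing import Callable, Iterable

# An iriset is modelled as an opaque predicate over the IRI string
# (an assumption made for this sketch only).
IriSet = Callable[[str], bool]


def dr_applies(iri: str, irisets: Iterable[IriSet]) -> bool:
    """A DR applies when at least one of its irisets matches the IRI."""
    return any(matches(iri) for matches in irisets)


def on_example_org(iri: str) -> bool:
    """Hypothetical iriset: example.org or any of its subdomains."""
    pattern = r"https?://([\w.]+\.)?example\.org(:\d+)?/"
    return re.match(pattern, iri) is not None


def has_football_query(iri: str) -> bool:
    """Hypothetical iriset: the IRI mentions s=football."""
    return "s=football" in iri


irisets = [on_example_org, has_football_query]
iri = "http://news.example.net/?s=football"

# The outcome does not depend on the order in which irisets are tried.
assert dr_applies(iri, irisets) == dr_applies(iri, list(reversed(irisets)))
```

Because any() stops at the first match, listing the widest irisets first tends to reduce the number of checks, which is the optimisation hint described above.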

POWDER Semantics
================

A POWDER document may have any number of <dr> elements, all of which are simultaneously asserted; their ordering is not important. So, for example:

  <powder>
    <dr>
      <iriset>.1.</iriset>
      <descriptorset>
        <voc:shape>square</voc:shape>
      </descriptorset>
    </dr>
    <dr>
      <iriset>.2.</iriset>
      <descriptorset>
        <voc:colour ref="http://rgb.org/colours.rdf#red"/>
      </descriptorset>
    </dr>
  </powder>

receives the following semantics:

  <RDF>
    <owl:Class rdf:ID="resourceset_1">
      all resources specified by <iriset>.1.</iriset>
    </owl:Class>
    <owl:Class rdf:ID="resourceset_2">
      all resources specified by <iriset>.2.</iriset>
    </owl:Class>
    <owl:Class rdf:ID="description_1">
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="voc:shape"/>
          <owl:hasValue>square</owl:hasValue>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
    <owl:Class rdf:ID="description_2">
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="voc:colour"/>
          <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
    <owl:Class rdf:about="#resourceset_1">
      <rdfs:subClassOf rdf:resource="#description_1"/>
    </owl:Class>
    <owl:Class rdf:about="#resourceset_2">
      <rdfs:subClassOf rdf:resource="#description_2"/>
    </owl:Class>
  </RDF>

The <owl:intersectionOf/> of a singleton collection is just that collection's single element, so the <owl:intersectionOf/> element is redundant here. It is nevertheless better to keep it, in order to keep the transform simple and not require the extra check.

Note that resourceset_1 and resourceset_2 are not necessarily disjoint, so that some resources may be both red AND square.

A POWDER document may have an <ol/> element, which is an ordered list of <dr> elements and receives first-match semantics. <ol/> elements are meant to be used to express exceptions to more general rules.
So, for example:

  <powder>
    <ol>
      <dr>
        <iriset>.1.</iriset>
        <descriptorset>
          <voc:shape>square</voc:shape>
        </descriptorset>
      </dr>
      <dr>
        <iriset>.2.</iriset>
        <descriptorset>
          <voc:shape>round</voc:shape>
        </descriptorset>
      </dr>
      <dr>
        <iriset>.3.</iriset>
        <descriptorset>
          <voc:shape>triangle</voc:shape>
        </descriptorset>
      </dr>
    </ol>
  </powder>

receives the following formal semantics, where belonging to description_1 automatically precludes belonging to description_2 and description_3, and belonging to description_2 automatically precludes belonging to description_3:

  <RDF>
    <owl:Class rdf:ID="resourceset_1">
      all resources specified by <iriset>.1.</iriset>
    </owl:Class>
    <owl:Class rdf:ID="resourceset_2">
      all resources specified by <iriset>.2.</iriset>
    </owl:Class>
    <owl:Class rdf:ID="resourceset_3">
      all resources specified by <iriset>.3.</iriset>
    </owl:Class>
    <owl:Class rdf:ID="description_1">
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="voc:shape"/>
          <owl:hasValue>square</owl:hasValue>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
    <owl:Class rdf:ID="description_2">
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="voc:shape"/>
          <owl:hasValue>round</owl:hasValue>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
    <owl:Class rdf:ID="description_3">
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="voc:shape"/>
          <owl:hasValue>triangle</owl:hasValue>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
    <owl:Class rdf:about="#resourceset_1">
      <rdfs:subClassOf rdf:resource="#description_1"/>
    </owl:Class>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#resourceset_2"/>
        <owl:complementOf>
          <owl:Class rdf:about="#resourceset_1"/>
        </owl:complementOf>
      </owl:intersectionOf>
      <rdfs:subClassOf rdf:resource="#description_2"/>
    </owl:Class>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#resourceset_3"/>
        <owl:complementOf>
          <owl:Class rdf:about="#resourceset_2"/>
        </owl:complementOf>
        <owl:complementOf>
          <owl:Class rdf:about="#resourceset_1"/>
        </owl:complementOf>
      </owl:intersectionOf>
      <rdfs:subClassOf rdf:resource="#description_3"/>
    </owl:Class>
  </RDF>

IRISet Semantics
================

The last missing bit of the transformation is now the one that builds the <owl:Class rdf:ID="resourceset_X"/> descriptions from <iriset/> elements.

<iriset/> elements contain one or more elements, each representing a range of values for IRIs. An IRI is in the <iriset/> if it is covered by ALL of the elements in <iriset/>. The following six range specifications are supported:

  <includepattern/>, <excludepattern/>,
  <includeports/>, <excludeports/>,
  <includeCIDRranges/>, <excludeCIDRranges/>

A pattern is a single <xsd:pattern/> element, as defined in the XML Schema [1]. <includepattern/> can be applied to any IRI, regardless of whether it is resolvable or not.

Ports are a space-separated list of ports or port ranges. CIDR ranges are specified as a space-separated list of CIDR IP range specifications. Port and CIDR range elements can be applied to URLs (is there an IRL acronym?) only, and are meaningless for other kinds of IRIs.

For example:

  <iriset>
    <includepattern>
      <xsd:pattern value="^http://[\w\.]+.example\.org(:(\d)+)?/" />
    </includepattern>
    <includeports>80 8080-8100</includeports>
    <excludeports>8085 8090-8095</excludeports>
  </iriset>

specifies all resources on http://example.org and any subdomain thereof, fetched from ports 80, 8080-8084, 8086-8089, or 8096-8100.

It might sometimes be easier to concentrate on parts of an IRI and specify constraints as a series of regexps, all of which must match.
For instance, the IRISet:

  <iriset>
    <includepattern>
      <xsd:pattern value="^http://[\w\.]+.example\.org(:(\d)+)?/" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^?]+\?(.*&)?s=football[&$]" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^?]+\?(.*&)?c=gr[&$]" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^?]+\?(.*&)?l=first[&$]" />
    </includepattern>
  </iriset>

is a way of requesting three query conjuncts in any order, and is much shorter and clearer than having to list all possible permutations.

The <iriset/> mechanism allows a DR to express any grouping of resources whatsoever, no matter how complex:

(A) Each include* and exclude* element expresses an atomic proposition. For all X, if includeX exists, excludeX also exists and vice versa; furthermore includeX and excludeX are mutually exclusive. Hence, one can negate all atomic propositions, although not complex propositions.

(B) An <iriset/> may contain multiple include* and exclude* tags, and all must hold for the iriset to hold. Hence one can express the conjunction of any set of atomic propositions and negations of atomic propositions.

(C) A DR may contain multiple <iriset/> elements, and if any of them holds, then the DR holds. Hence one can express the disjunction of conjunctions of sets of atomic propositions and negations of atomic propositions.

Together, (A), (B), and (C) allow the expression of any proposition in Disjunctive Normal Form. Since arbitrarily complex propositions can be brought into DNF, they allow the expression of any proposition.

ALTERNATIVE 1

Providing OWL/RDF semantics for <iriset/> elements is not directly possible, since RDF does not provide any means for accessing or manipulating the string representation of an IRI.

We extend OWL/RDF with a built-in hasIRI data property as follows:

  hasIRI rdf:type owl:DatatypeProperty .
  hasIRI rdf:type owl:Property .
  hasIRI rdfs:domain owl:Thing .
  hasIRI rdfs:range xsd:string .
and the further stipulation that

  R owl:hasIRI s .

iff the string representation of resource R is s.

It is now possible to provide semantics to <iriset/> by deriving the XML datatype that only includes the strings specified by pattern p [1]. So now:

  <includepattern>
    <xsd:pattern value="p1"/>
  </includepattern>
  <excludepattern>
    <xsd:pattern value="p2"/>
  </excludepattern>

specify these classes of resources:

  <xsd:simpleType name="iritype_1">
    <xsd:restriction base="string">
      <xsd:pattern value="p1" />
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="iritype_2">
    <xsd:restriction base="string">
      <xsd:pattern value="p2" />
    </xsd:restriction>
  </xsd:simpleType>
  <owl:Class>
    <owl:Restriction>
      <owl:onProperty rdf:resource="owl:hasIRI"/>
      <owl:hasValue rdf:datatype="&xsd;iritype_1" />
    </owl:Restriction>
  </owl:Class>
  <owl:Class>
    <owl:complementOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="owl:hasIRI"/>
        <owl:hasValue rdf:datatype="&xsd;iritype_2" />
      </owl:Restriction>
    </owl:complementOf>
  </owl:Class>

which means: "Here are the definitions of xsd:iritype_1 and xsd:iritype_2, sub-types of xsd:string. I don't know the exact value to put in hasValue, but it must be of type xsd:iritype_1 or xsd:iritype_2, respectively."

Port ranges are treated similarly, by defining the relevant hasPort property, ranging over an appropriate XML type. The xsd:pattern restriction is not useful here, but xsd:integer supports the xsd:maxInclusive and xsd:minInclusive numerical restrictions.
So:

  <includeports>80 8080-8100</includeports>

means:

  <xsd:simpleType name="iritype_3">
    <xsd:restriction base="integer">
      <xsd:minInclusive value="80" />
      <xsd:maxInclusive value="80" />
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="iritype_4">
    <xsd:restriction base="integer">
      <xsd:minInclusive value="8080" />
      <xsd:maxInclusive value="8100" />
    </xsd:restriction>
  </xsd:simpleType>
  <owl:Class>
    <owl:unionOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="owl:hasPort"/>
        <owl:hasValue rdf:datatype="&xsd;iritype_3" />
      </owl:Restriction>
      <owl:Restriction>
        <owl:onProperty rdf:resource="owl:hasPort"/>
        <owl:hasValue rdf:datatype="&xsd;iritype_4" />
      </owl:Restriction>
    </owl:unionOf>
  </owl:Class>

CIDR ranges are trickier, as they require bit-wise calculations. Assume a hasIP property, as before, ranging over a complex XML type [2] of 4 bytes. Then:

  <includeCIDRranges>x.y.z.w/r</includeCIDRranges>

means:

  <xsd:complexType name="iritype_5">
    <xsd:sequence>
      <xsd:element>
        <xsd:enumeration base="byte">x</xsd:enumeration>
      </xsd:element>
      <xsd:element>
        <xsd:enumeration base="byte">y</xsd:enumeration>
      </xsd:element>
      <xsd:element>
        <xsd:enumeration base="byte">z</xsd:enumeration>
      </xsd:element>
      <xsd:element>
        <xsd:restriction base="byte">
          HARD, TO BE WORKED OUT. OTHERWISE JUST ENUMERATE (OUCH!).
        </xsd:restriction>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <owl:Class>
    <owl:Restriction>
      <owl:onProperty rdf:resource="owl:hasIP"/>
      <owl:hasValue rdf:datatype="&xsd;iritype_5" />
    </owl:Restriction>
  </owl:Class>

If no /r is given, the class D segment of the IP is simply given as a singleton enumeration, just like for classes A, B, and C.

OWL needs to be extended to allow user-defined types, which it currently does not [3].

ALTERNATIVE 2

Providing OWL/RDF semantics for <iriset/> elements is not directly possible, since RDF does not provide any means for accessing or manipulating the string representation of an IRI.
We extend OWL/RDF with a hasIRIFrom restriction as follows:

We assert the existence of the class of the various IRI classes:

  rdf:IRIClass rdf:type rdfs:Datatype .

We assert the existence of a new class of restriction nodes:

  owl:hasIRIFrom rdf:type rdfs:Class .

The members of this class are OWL restrictions, with the following abstract OWL syntax:

  restriction(ID, hasIRIFrom(xs:iritype))

where ID is a node ID and xs:iritype is the ID of a user-defined type, as above.

If T() is the mapping from node IDs to nodes, the semantics of such a restriction is that the datatype is also an rdfs:Class, with the constraint that resources in this class have an IRI the string representation of which is in the scope of xs:iritype.

It is then straightforward to provide the semantics of the restriction:

  T(xs:iritype) rdf:type rdfs:Datatype .
  T(xs:iritype) rdfs:subClassOf rdf:IRIClass .
  T(xs:iritype) rdf:type rdfs:Class .
  _:x rdf:type owl:Restriction .
  _:x rdf:type owl:Class .
  _:x rdf:type T(xs:iritype) .

We can now say:

  <owl:Class>
    <owl:Restriction>
      <owl:hasIRIFrom>
        <xsd:simpleType>
          <xsd:restriction base="string">
            <xsd:pattern value="p" />
          </xsd:restriction>
        </xsd:simpleType>
      </owl:hasIRIFrom>
    </owl:Restriction>
  </owl:Class>

to mean "the class of all things that have an IRI that has a string representation that matches p".

In Description Logic terms, we have allowed defining concepts based on restrictions on the form of the string representations of abstract instances, but restricted the usage of such concepts in universal quantification constructs.

COMPARISON

I will have to look into this more closely, but my first impression is that ALT 2 provides the necessary expressivity to enable resource grouping, but restricts the extension so that it does not allow any other kind of reference to IRI strings. The logic remains agnostic as to the internal representation of resources, except for their appearing as members of various IRI Classes for no (logically) apparent reason.
ALT 1, on the other hand, creates a hasIRI property which it then exposes to the concrete domain of the logic, permitting the full expressivity of the logic to operate on it.

IRISet Extensions
=================

In Sect "IRISet Semantics" above, a vocabulary of 6 tags was specified for defining sets of resources through their IRIs. Except for the numerical port and IP restrictions over URLs, the only operation supported over generic IRIs is regular expression matching.

Creators of POWDER documents may extend the vocabulary used in specifying IRI Sets, by defining new <iriset/> elements. All such extensions to the POWDER vocabulary MUST be defined by means of GRDDL transformations [GRDDL] to terms of the basic POWDER vocabulary in the wdr: namespace. Extensions do not need to, but are well advised to, define pairs of complementary vocabulary items (includeX and excludeX) for the reasons explained above.

Developers of POWDER tools MAY directly implement extensions they know about, but MUST include a mechanism for retrieving and applying the GRDDL transformations to extensions they do not know about.

The URLSet Extension
====================

POWDER's basic use cases involve information resources available on the Web, identified by URLs containing host names, directory paths, IP addresses, port numbers, and so on.
POWDER-WG provides the URLSet extension to IRISet, by defining the following vocabulary items under the wdrurl namespace:

  <wdrurl:includeschemes/>        <wdrurl:excludeschemes/>
  <wdrurl:includehosts/>          <wdrurl:excludehosts/>
  <wdrurl:includeexactpaths/>     <wdrurl:excludeexactpaths/>
  <wdrurl:includepathcontains/>   <wdrurl:excludepathcontains/>
  <wdrurl:includepathstartswith/> <wdrurl:excludepathstartswith/>
  <wdrurl:includepathendswith/>   <wdrurl:excludepathendswith/>
  <wdrurl:includequerycontains/>  <wdrurl:excludequerycontains/>
  <wdrurl:includeexactqueries/>   <wdrurl:excludeexactqueries/>

pathcontains and querycontains may appear any number of times within an IRI set definition, but the rest may appear at most once.

These receive semantics in terms of the POWDER IRISet vocabulary as follows:

  <wdrurl:includeschemes>sch1 sch2</wdrurl:includeschemes>

means:

  <includepattern>
    <xsd:pattern value="^(sch1)|(sch2)://" />
  </includepattern>

And

  <wdrurl:includehosts>host1 host2</wdrurl:includehosts>

means:

  <includepattern>
    <xsd:pattern value="^[^:]://([\w\.]+\.)?(host1)|(host2)[:\?/]" />
  </includepattern>

And so on.
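
To make the idea of such a translation concrete, here is a minimal Python sketch of how a tool might build the regexps for includeschemes and includehosts. The function names schemes_pattern and hosts_pattern are hypothetical. The sketch also departs from the literal patterns quoted above in three ways, all of which are assumptions about the intent: the alternatives are wrapped in a single group so that the anchor and the "://" apply to every alternative, the scheme may be longer than one character, and the host may be followed by the end of the IRI. It uses Python's re syntax, not XML Schema pattern syntax.

```python
import re


def schemes_pattern(schemes: str) -> str:
    """Regex for a space-separated list of schemes, e.g. 'http https'."""
    alternatives = "|".join(re.escape(s) for s in schemes.split())
    # One group around all alternatives, so '^' and '://' apply to each.
    return rf"^({alternatives})://"


def hosts_pattern(hosts: str) -> str:
    """Regex for a space-separated list of hosts, subdomains included."""
    alternatives = "|".join(re.escape(h) for h in hosts.split())
    # Scheme, optional subdomain labels, one of the hosts, and then
    # either a delimiter (port, query, path) or the end of the IRI.
    return rf"^[^:]+://([\w.]+\.)?({alternatives})([:?/]|$)"


iri = "http://www.example.net/foo"
print(bool(re.search(schemes_pattern("http"), iri)))
print(bool(re.search(hosts_pattern("example.org example.net"), iri)))
```

Escaping each listed value with re.escape keeps the dots in host names from matching arbitrary characters.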
Under this scheme, the URL Set:

  <iriset>
    <wdrurl:includeschemes>http</wdrurl:includeschemes>
    <wdrurl:includehosts>example.org example.net</wdrurl:includehosts>
    <wdrurl:includequerycontains>s=football</wdrurl:includequerycontains>
    <wdrurl:includequerycontains>c=gr</wdrurl:includequerycontains>
    <wdrurl:includequerycontains>l=first</wdrurl:includequerycontains>
  </iriset>

translates to this much more verbose, vanilla POWDER/XML IRI Set:

  <iriset>
    <includepattern>
      <xsd:pattern value="^http://" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^:]://([\w\.]+\.)?(example\.org)|(example\.net)[:\?/]" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^?]+\?(.*&)?s=football[&$]" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^?]+\?(.*&)?c=gr[&$]" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^?]+\?(.*&)?l=first[&$]" />
    </includepattern>
  </iriset>

The WAF Extension
=================

Q to group: does POWDER also need to provide this transformation? Or have the WAF people already written it?

The Enabling Read Access for Web Resources WG has defined a Unix shell-like wildcard mechanism. For example:

  <waf:includeiripattern>*.example.org</waf:includeiripattern>

would translate to:

  <wdr:includepattern>
    <xsd:pattern value="http://.*\.example.org(/.*)?" />
  </wdr:includepattern>

Multiple Layers of Extensions
=============================

It might sometimes be useful to also build upon already defined extensions.

For example, some content providers serve dynamic content stored in a database, so that IRIs express queries to the database. IRIs of this kind have a certain structure, but this structure is neither obvious nor easily interpreted by humans. Furthermore, conventional grouping mechanisms cannot be used to group resources, as the site structure does not match any directory hierarchy.

As an example, consider sport.example.com, a sports news site, where IRIs look like the one shown below.
The adopted scheme is systematic, so that sport=2&countryID=16 provides a front page with news about Greek basketball and links to various Greek basketball leagues, sport=3&countryID=16 a front page about Greek volleyball, etc. E.g.:

  http://sport.example.com/matches.asp?sport=1&countryID=16&champID=2

A POWDER document providing metadata about this web site would have to use regular expression matching with explicit reference to the numerical values in the country and sport fields of the query. This process is error-prone, and requires extensive changes if the underlying database schema is modified or extended.

As an alternative, the site developer may provide a POWDER vocabulary extension that abstracts away from the database schema to allow reference to sports and countries. POWDER document authors can then use the properties in this extension to create POWDER documents that remain valid even if the site schema is modified, as long as the site developer updates the relevant transformations.

So a POWDER/XML document might look like this:

  <wdr:iriset>
    <wdrurl:includeschemes>http</wdrurl:includeschemes>
    <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
    <sport:countries>Greece</sport:countries>
    <sport:sports>Football Basketball</sport:sports>
  </wdr:iriset>

A POWDER/XML tool specifically built for sport.example.com, or for another site following the same query patterns, will immediately know how to handle this information.
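
The kind of mapping that the sport: extension would encapsulate can be sketched in a few lines of Python. This is only an illustration: the lookup tables and the function names countries_to_query and sports_to_query are hypothetical, and the numeric IDs are assumptions read off the example IRIs (in particular, Football as sport=1 is inferred, not stated by the site).

```python
# Hypothetical lookup tables that the site developer would maintain.
# If the database schema changes, only these tables need updating.
COUNTRY_IDS = {"Greece": 16}
SPORT_IDS = {"Football": 1, "Basketball": 2, "Volleyball": 3}


def countries_to_query(countries: str) -> str:
    """Map a space-separated list of country names to query fragments."""
    return " ".join(f"countryID={COUNTRY_IDS[c]}" for c in countries.split())


def sports_to_query(sports: str) -> str:
    """Map a space-separated list of sport names to query fragments."""
    return " ".join(f"sport={SPORT_IDS[s]}" for s in sports.split())


# Values that would go into <wdrurl:includequerycontains/> elements.
print(countries_to_query("Greece"))
print(sports_to_query("Football Basketball"))
```

POWDER documents that say "Greece" or "Football" stay unchanged when the IDs change, which is the point of layering the extension on top of wdrurl:.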
Other POWDER tools will apply the GRDDL transform associated with the sport: namespace to get the following translation:

  <wdr:iriset>
    <wdrurl:includeschemes>http</wdrurl:includeschemes>
    <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
    <wdrurl:includequerycontains>countryID=16</wdrurl:includequerycontains>
    <wdrurl:includequerycontains>sport=1 sport=2</wdrurl:includequerycontains>
  </wdr:iriset>

A web-oriented POWDER/XML tool will immediately know what to do with wdrurl: vocabulary items. Other POWDER tools will apply the GRDDL transform associated with the wdrurl: namespace to get the following translation:

  <iriset>
    <includepattern>
      <xsd:pattern value="^http://" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^:]://([\w\.]+\.)?(sport\.example\.com)[:\?/]" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^?]+\?(.*&)?countryID=16[&$]" />
    </includepattern>
    <includepattern>
      <xsd:pattern value="^[^?]+\?(.*&)?(sport=1)|(sport=2)[&$]" />
    </includepattern>
  </iriset>

Finally, an even more generic RDF/OWL tool will apply the transform associated with the wdr: namespace to get the even more verbose RDF/OWL translation, as described above.

Non-URL Identifiers
===================

Although POWDER is mostly concerned with resources that are identified by URLs, there are a number of other use cases; for example, one might use POWDER to provide metadata about physical, off-line resources like books or DVDs.

The International Standard Audiovisual Number [ISAN1] is a voluntary numbering system for the identification of audiovisual works. Following ISO 15706, the numbers are written as 24 hexadecimal digits in the following format [ISAN2].

       -----root-----   episode     -version-
  ISAN 1881-66C7-3420 -  0000  -7-  9F3A-0245  -U

The root of an ISAN number is assigned to a core work, with the other numbers being used for things like episodes, different language versions, promotional trailers and so on.

Since ISAN numbers are URNs [URN], and hence IRIs of the urn: scheme [URIS], a vocabulary can readily be defined to allow IRI Sets to be defined based on ISAN numbers. The terms might be along the lines of:

  includeroots: a white space separated list of strings of hexadecimal digits and hyphens that would be matched against the first three blocks in the ISAN number.

  includeepisodes: a white space separated list of strings of hexadecimal digits and hyphens that would be matched against the 4th block of 4 digits in the ISAN number.

  includeversions: a white space separated list of strings of hexadecimal digits and hyphens that would be matched against the 5th and 6th blocks of 4 digits in the ISAN number.

The set of all audiovisual resources that relate to two particular works might then be defined as follows.

Custom ISAN pattern:

  <wdr:iriset>
    <isan:includeroots>1881-66C7-3420 1881-66C7-3421</isan:includeroots>
  </wdr:iriset>

Corresponding vanilla POWDER/XML:

  <iriset>
    <includepattern>
      <xsd:pattern value="^urn:isan:(1881-66C7-3420)|(1881-66C7-3421)" />
    </includepattern>
  </iriset>

This example demonstrates one major extensibility glitch in the approach described here: numerical constraints (like, here, defining numerical ranges for, say, the 3rd block) cannot be defined using wdr: primitives. As the reader might also have noticed, port and IP ranges (although specific to URLs) were hard-coded at the IRI level and not defined as wdrurl: extensions. This is because XML types do not provide a mechanism for using regexps to extract character groups from strings and then applying further numerical or other tests on the extracted groups; a string either matches a regexp or does not, and that is all.
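
For comparison, this is what "extract a group, then test it numerically" looks like in a general-purpose language. The Python sketch below checks whether the 3rd block of an ISAN root falls inside a numerical range; the function name third_block_in_range and the urn:isan: prefix handling are assumptions made for the illustration, not part of any POWDER or ISAN specification.

```python
import re

# Captures the three 4-digit hexadecimal blocks of the ISAN root.
ISAN_ROOT = re.compile(
    r"^urn:isan:([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})",
    re.IGNORECASE,
)


def third_block_in_range(iri: str, low: int, high: int) -> bool:
    """True if the 3rd root block, read as hexadecimal, is in [low, high]."""
    match = ISAN_ROOT.match(iri)
    if match is None:
        return False  # not an ISAN URN in the assumed form
    # Step 1: extract the character group. Step 2: apply a numerical test.
    return low <= int(match.group(3), 16) <= high


isan = "urn:isan:1881-66C7-3420-0000-7-9F3A-0245-U"
print(third_block_in_range(isan, 0x3400, 0x34FF))
```

The two-step structure (capture, then compare) is exactly what a plain pattern facet cannot express.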
One interesting approach would be to license the use of XSLT 2 [XSLT2] in the extension definitions, which provides for using regexps to extract character groups. To be investigated.

Resource Sets
=============

One of the original desiderata of the group, later abandoned, was the ability to group resources by property as well as by name. This is a considerable expressivity leap for the POWDER/XML language.

This idea was abandoned at the Athens F2F, when it became obvious that the POWDER grouping mechanism should not refer to the resources themselves, but to the string representations of their IRIs. Since it is the resources that have properties like being blue and not the IRIs, the whole idea of grouping by property collapsed.

If it is important enough for POWDER, some limited expressivity might be re-introduced in the form of a parallel grouping mechanism, by intersecting the results of the two mechanisms before finally applying the descriptors. In other words:

  <dr>
    <iriset>
      <wdrurl:includehosts>example.com</wdrurl:includehosts>
    </iriset>
    <resourceset>
      <voc:colour ref="http://rgb.org/colours.rdf#blue"/>
    </resourceset>
    <descriptorset>
      <voc:shape>square</voc:shape>
    </descriptorset>
  </dr>

might be used to express that "on example.com, all blue resources are also square". A resource has to both be on example.com AND be blue in order to also be described as square.

This can be very naturally expressed in OWL, and OWL tools will be able to figure out which resources are blue, but it might be a considerable strain on POWDER/XML tools, which will care more about efficiency than reasoning completeness. Furthermore, this opens a hole through which circular definitions can creep in, and loop detection will also be a considerable strain on POWDER/XML implementations.
My suggestion is to drop it for the sake of efficiency or, at most, leave an extension door open for logical statements that fall through to the underlying POWDER-S, just in case one really needs to express such a thing in POWDER/XML instead of OWL.

REFERENCES
==========

[1] http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#rf-pattern
[2] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#Complex_Type_Definitions
[3] http://www.w3.org/TR/owl-semantics/syntax.html#2.1
[4] http://www.w3.org/TR/owl-semantics/mapping.html
[GRDDL] http://www.w3.org/TR/grddl/
[URN] http://www.iana.org/assignments/urn-namespaces
[ISAN1] http://www.isan.org/
[ISAN2] http://www.isan.org/portal/page?_pageid=166,41960&_dad=portal&_schema=PORTAL
[XSLT2] http://www.w3.org/TR/xslt20/
Received on Sunday, 30 March 2008 17:19:39 UTC