- From: Phil Archer <parcher@icra.org>
- Date: Fri, 25 Apr 2008 15:22:16 +0100
- To: Public POWDER <public-powderwg@w3.org>
In two recent e-mails [1,2], Stasinos has presented a formalism of POWDER semantics. Since [2] revises a chunk of [1] I have copied and pasted the full revised text below for easy reference.

The semantics of the descriptor set is clear and is in line with what we've been discussing recently, and for which I have tried to derive generalised rules for the GRDDL (XSLT) transform [3]. However, what Stas is proposing for the IRI set semantics differs substantially from what is in the latest (member only) version of the Grouping Doc [4] (and the currently published one, [5]). And of course, it's the semantics of our IRI sets that sets POWDER apart from other parts of the Semantic Web.

Jeremy's solution, proposed on this list and discussed further in Athens, is to define a Semantic Extension for a property wdr:hasIRI that effectively maps "http://example.com/" to <http://example.com> and then says of other properties - "see Semantic Extension above for details". It uses the same approach and mathematical terminology as used to define the formal semantics of RDF itself [6].

Stas is suggesting something much more detailed - and arguably more precise as a result. For example, under the Jeremy model we'd retain terms like 'includehosts' in POWDER-S. Stas says we can do away with that and (programmatically) reduce such elements to a regular expression - which I've been working on and, I think, proved [7] - at least for the string-based ones.

As I understand it, Stas has gone a little further and established a framework for the expression of POWDER-S in which the processing steps to be taken are made explicit through the use of elements from the XSLT 2 and XSD namespaces. For each element it says "extract /this value/ from the candidate IRI and match it against /this/ value" - with regular expressions etc. provided. The same approach works for the string parts of a URI and the numerical ones (port numbers and CIDR blocks) - although the latter are, of necessity, complex.
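As a rough illustration of the "extract a value, then match it" step for a numerical component, here is a minimal Python sketch. It uses the component-splitting regular expression quoted later in this message and the default port of 80 mentioned there; the function names and the list-of-ranges representation are assumptions made for this example only and are not part of any POWDER document.

```python
import re

# Component-splitting regular expression, as quoted later in this message.
RRE = re.compile(
    r"(([^:/?#]+):)?(//([^:/?#@]*)(:([0-9]+))?)?([^?#]*)(\?([^#]*))?"
)


def extract_port(iri, default=80):
    """Extract the port (group 6) from a candidate IRI.

    Falls back to `default` when the IRI carries no explicit port.
    """
    match = RRE.match(iri)
    port = match.group(6) if match else None
    return int(port) if port else default


def port_in_ranges(iri, ranges):
    """Match the extracted port against inclusive (low, high) ranges."""
    port = extract_port(iri)
    return any(low <= port <= high for low, high in ranges)


# Hypothetical equivalent of <includeports>80 8080-8100</includeports>
EXAMPLE_RANGES = [(80, 80), (8080, 8100)]
```

The comparison is done on integers rather than strings, which is the point of routing the extracted value through a numerical type instead of a pattern.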
Note, even this approach requires the semantic extension that maps strings to IRIs.

So, one reading of the Stasinos approach is that a POWDER Processor must support XSLT 2. Kevin clarified that this is not the case [8] (thankfully!). So it seems to me that the XSLT 2 and XSD elements formalise the semantics and processing model for POWDER, but do not, of themselves, create a constraint on implementations.

OK - our task now is to decide what to do with all this, bearing in mind that at our face to face in Athens in January we said we'd be ready for Last Call 'by Easter'. We hoped that we meant Occidental Easter but given the location of that meeting we really meant Orthodox Easter - and that means today, 25th April, which is Good Friday by the Orthodox calendar. Whatever we do - we're already late and we are seriously running out of time. Our already extended charter expires at the end of the year (31st December by Orthodox and Occidental calendars!) - and we have the small matter of CR and PR to get through yet.

I see three possibilities - if you have a fourth, please say so.

1. We carry on as we are. The Semantic Extension in the grouping doc is cited as the formal basis for an IRI set and we quietly leave Stasinos' work to one side, perhaps using sections where appropriate for defining what 'A POWDER Processor' must do.

2. We incorporate Stasinos' work into the two primary tech documents, probably replacing the text of the semantic extension section of the grouping document at [5] (I'm not sure how to do this).

3. We discuss and tidy up Stas's work a little but essentially we already have 90% of a new document called 'POWDER: Formal Semantics'. We re-phrase the relevant sections of the DR and Grouping docs, passing the formalism off to this new doc.

Options 2 and 3 have two possible variants:

Variant a) POWDER-S includes all the XSLT 2 and XSD elements Stas has used.
Variant b) POWDER-S looks like it does in my 'try Again' e-mail [3] but the semantics of the terms regex (and, I think, portranges and ipranges) are defined in the Stasinos style.

In other words we have a new layer to our semantics:

1. POWDER - Nice and friendly, mostly XML
2. POWDER-S - RDF/OWL* - i.e. it's OWL if you know what you've got
3. POWDER-Formal - What POWDER-S means

Of these we would only fully implement POWDER 1 and 2 as part of our CR work (as we plan now) but 3 would provide further underpinning for POWDER-S and for the implementation of it. We would surely have to do at least some testing using an XSLT 2 tool to give the formalism some validity - if only to check the angle brackets.

Stasinos' paper is below the references.

WDYT?

Phil.

[1] http://lists.w3.org/Archives/Public/public-powderwg/2008Mar/0017.html
[2] http://lists.w3.org/Archives/Member/member-powderwg/2008Apr/0044.html
[3] http://lists.w3.org/Archives/Public/public-powderwg/2008Apr/0054.html
[4] http://lists.w3.org/Archives/Member/member-powderwg/2008Apr/0041.html
[5] http://www.w3.org/TR/2008/WD-powder-grouping-20080324/#formalSemantics
[6] http://www.w3.org/TR/rdf-mt/
[7] http://lists.w3.org/Archives/Public/public-powderwg/2008Apr/0013.html
[8] http://lists.w3.org/Archives/Member/member-powderwg/2008Apr/0048.html

Intro
=====

POWDER/XML documents receive formal semantics through a GRDDL transform, associated with the POWDER namespace, that allows the XML data to be rendered and processed as OWL/RDF. Or, rather, POWDER-S, a fragment of OWL/RDF extended in a way that allows referring to and operating upon the string representation of a resource.

The POWDER/XML format specifies a number of elements denoting attribution, validity time, and other issues relating to the level of trust assigned to a POWDER document.
These fall through the transform and are not meant to be interpreted in OWL/RDF; they are only meaningful when used by POWDER tools that use them as input to an extra-logical procedure which MAY use this data to decide whether the POWDER document _as a whole_ should be taken into account or discarded. We shall not deal with these elements any further, and proceed under the assumption that our document has passed all relevant tests.

Unqualified names should be assumed to be in the wdr: namespace.

DR Semantics
============

POWDER documents are used to describe sets of resources using description vocabularies defined in RDF or plain string literals (tags). POWDER/XML documents have <dr/> elements, each assigning every member of a set of descriptors to a set of resources. As an example, consider:

<dr>
  <iriset>...</iriset>
  <descriptorset>
    <voc:colour ref="http://rgb.org/colours.rdf#red"/>
    <voc:shape>square</voc:shape>
    <tag>red</tag>
    <tag>light red</tag>
    <taglist>light red</taglist>
  </descriptorset>
</dr>

where <iriset/> specifies a set of resources in a way that will be dealt with later, and voc: is an arbitrary RDF vocabulary.

The <voc:colour/> element specifies that the <voc:colour/> relation holds between all resources specified by <iriset/> and the http://rgb.org/colours.rdf#red resource.

The content of <voc:shape/> is interpreted as a string literal. The <voc:shape/> element specifies that all resources in <iriset/> have the value "square" for the <voc:shape/> dataproperty.

<tag/> is a string property defined by POWDER. Its content is a single string literal, possibly including spaces.

<taglist/> is a string property defined by POWDER. Its content is a space-separated list of string literals.

The overall description of the resources in <iriset/> is the union of the descriptions in the <descriptorset/>.
In our example:

  a voc:colour relation to http://rgb.org/colours.rdf#red
  AND a voc:shape "square"
  AND the tags "red", "light", and "light red"

We formally interpret the above as follows: there is an OWL class containing all resources that share all of these properties, and there is an OWL class of all resources denoted by <iriset/>, and the latter is a subset of the former. In OWL/RDF we say:

<RDF>
  <owl:Class rdf:ID="resourceset_1">
    all resources specified by <iriset>...</iriset>
  </owl:Class>
  <owl:Class rdf:ID="description_1">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Restriction>
        <owl:onProperty rdf:resource="voc:colour"/>
        <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
      </owl:Restriction>
      <owl:Restriction>
        <owl:onProperty rdf:resource="voc:shape"/>
        <owl:hasValue>square</owl:hasValue>
      </owl:Restriction>
      <owl:Restriction>
        <owl:onProperty rdf:resource="wdr:tag"/>
        <owl:hasValue>red</owl:hasValue>
      </owl:Restriction>
      <owl:Restriction>
        <owl:onProperty rdf:resource="wdr:tag"/>
        <owl:hasValue>light</owl:hasValue>
      </owl:Restriction>
      <owl:Restriction>
        <owl:onProperty rdf:resource="wdr:tag"/>
        <owl:hasValue>light red</owl:hasValue>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
  <owl:Class rdf:about="#resourceset_1">
    <rdfs:subClassOf rdf:resource="#description_1"/>
  </owl:Class>
</RDF>

It is possible to have more than one <iriset/> element, in which case a resource receives all of the descriptions by belonging to any one of the corresponding resource sets.
For example:

<dr>
  <iriset>.1.</iriset>
  <iriset>.2.</iriset>
  <descriptorset>
    <voc:colour ref="http://rgb.org/colours.rdf#red"/>
    <taglist>light red</taglist>
  </descriptorset>
</dr>

receives the following semantics:

<RDF>
  <owl:Class rdf:ID="resourceset_1">
    all resources specified by <iriset>.1.</iriset>
  </owl:Class>
  <owl:Class rdf:ID="resourceset_2">
    all resources specified by <iriset>.2.</iriset>
  </owl:Class>
  <owl:Class rdf:ID="description_1">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Restriction>
        <owl:onProperty rdf:resource="voc:colour"/>
        <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
      </owl:Restriction>
      <owl:Restriction>
        <owl:onProperty rdf:resource="wdr:tag"/>
        <owl:hasValue>red</owl:hasValue>
      </owl:Restriction>
      <owl:Restriction>
        <owl:onProperty rdf:resource="wdr:tag"/>
        <owl:hasValue>light</owl:hasValue>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
  <owl:Class>
    <owl:unionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#resourceset_1"/>
      <owl:Class rdf:about="#resourceset_2"/>
    </owl:unionOf>
    <rdfs:subClassOf rdf:resource="#description_1"/>
  </owl:Class>
</RDF>

The ordering of <iriset/> elements in a DR is not significant. A POWDER/XML implementation is free to choose any traversal policy for treating multiple <iriset/> elements (in the order listed, first match wins, last match wins, shortest irisets first, and so on), as long as all irisets are tried before deciding that a resource is outside the scope of the DR.

DR authors may use the order of the irisets to suggest an efficient scope evaluation strategy, by putting the irisets with the widest coverage first, so that an implementation that chooses to follow the suggested evaluation order is more likely to terminate the evaluation after fewer checks.
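The scope rule described above (a DR applies if any one of its irisets matches, and the order in which they are tried does not change the answer) can be sketched in Python as follows. This is only an illustration under stated assumptions: each iriset is modelled as a list of predicates over the IRI string, all of which must hold, and the helper names `include_pattern`, `exclude_pattern`, `iriset_matches` and `dr_applies` are hypothetical, not POWDER vocabulary.

```python
import re


def include_pattern(regex):
    """Hypothetical constraint: the IRI string must match `regex`."""
    compiled = re.compile(regex)
    return lambda iri: compiled.search(iri) is not None


def exclude_pattern(regex):
    """Hypothetical constraint: the IRI string must NOT match `regex`."""
    included = include_pattern(regex)
    return lambda iri: not included(iri)


def iriset_matches(iriset, iri):
    """An iriset covers an IRI only if every one of its constraints holds."""
    return all(constraint(iri) for constraint in iriset)


def dr_applies(irisets, iri):
    """A DR applies if ANY iriset covers the IRI.

    any() stops at the first matching iriset, so listing the widest
    irisets first tends to end the evaluation after fewer checks, but
    the result is the same for every ordering.
    """
    return any(iriset_matches(iriset, iri) for iriset in irisets)


# Two irisets, as in a DR with two <iriset/> elements.
EXAMPLE_IRISETS = [
    [include_pattern(r"^http://example\.org/"), exclude_pattern(r"\.png$")],
    [include_pattern(r"^http://example\.net/")],
]
```

Reversing `EXAMPLE_IRISETS` changes only how many checks are made for a given IRI, never whether the DR applies.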
POWDER Semantics
================

A POWDER document may have any number of <dr> elements, all of which are simultaneously asserted; their ordering is not important. So, for example:

<powder>
  <dr>
    <iriset>.1.</iriset>
    <descriptorset>
      <voc:shape>square</voc:shape>
    </descriptorset>
  </dr>
  <dr>
    <iriset>.2.</iriset>
    <descriptorset>
      <voc:colour ref="http://rgb.org/colours.rdf#red"/>
    </descriptorset>
  </dr>
</powder>

receives the following semantics:

<RDF>
  <owl:Class rdf:ID="resourceset_1">
    all resources specified by <iriset>.1.</iriset>
  </owl:Class>
  <owl:Class rdf:ID="resourceset_2">
    all resources specified by <iriset>.2.</iriset>
  </owl:Class>
  <owl:Class rdf:ID="description_1">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Restriction>
        <owl:onProperty rdf:resource="voc:shape"/>
        <owl:hasValue>square</owl:hasValue>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
  <owl:Class rdf:ID="description_2">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Restriction>
        <owl:onProperty rdf:resource="voc:colour"/>
        <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
  <owl:Class rdf:about="#resourceset_1">
    <rdfs:subClassOf rdf:resource="#description_1"/>
  </owl:Class>
  <owl:Class rdf:about="#resourceset_2">
    <rdfs:subClassOf rdf:resource="#description_2"/>
  </owl:Class>
</RDF>

The <owl:intersectionOf/> of a singleton collection is just that collection's single element, so the <owl:intersectionOf/> element is redundant here; it is nevertheless better to keep it, in order to keep the transform simple and not require the extra check.

Note that resourceset_1 and resourceset_2 are not necessarily disjoint, so that some resources may be both red AND square.

A POWDER document may have an <ol/> element, which is an ordered list of <dr> elements and which receives first-match semantics. <ol/> elements are meant to be used to express exceptions to more general rules.
So, for example: <powder> <ol> <dr> <iriset>.1.</iriset> <descriptorset> <voc:shape>square</voc:shape> </descriptorset> </dr> <dr> <iriset>.2.</iriset> <descriptorset> <voc:shape>round</voc:shape> </descriptorset> </dr> <dr> <iriset>.3.</iriset> <descriptorset> <voc:shape>triangle</voc:shape> </descriptorset> </dr> </ol> </powder> receives the following formal semantics, where belonging to description_1 automatically precludes belonging to description_2 and description_3; and belonging to description_2 automatically precludes belonging to description_3: <RDF> <owl:Class rdf:ID="resourceset_1"> all resources specified by <iriset>.1.</iriset> </owl:Class> <owl:Class rdf:ID="resourceset_2"> all resources specified by <iriset>.2.</iriset> </owl:Class> <owl:Class rdf:ID="resourceset_3"> all resources specified by <iriset>.3.</iriset> </owl:Class> <owl:Class rdf:ID="description_1"> <owl:intersectionOf rdf:parseType="Collection"> <owl:Restriction> <owl:onProperty rdf:resource="voc:shape"/> <owl:hasValue>square</owl:hasValue> </owl:Restriction> </owl:intersectionOf> </owl:Class> <owl:Class rdf:ID="description_2"> <owl:intersectionOf rdf:parseType="Collection"> <owl:Restriction> <owl:onProperty rdf:resource="voc:shape"/> <owl:hasValue>round</owl:hasValue> </owl:Restriction> </owl:intersectionOf> </owl:Class> <owl:Class rdf:ID="description_3"> <owl:intersectionOf rdf:parseType="Collection"> <owl:Restriction> <owl:onProperty rdf:resource="voc:shape"/> <owl:hasValue>triangle</owl:hasValue> </owl:Restriction> </owl:intersectionOf> </owl:Class> <owl:Class rdf:about="#resourceset_1"> <rdfs:subClassOf rdf:resource="#description_1"/> </owl:Class> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Class rdf:about="#resourceset_2"/> <owl:complementOf> <owl:Class rdf:about="#resourceset_1"/> </owl:complementOf> </owl:intersectionOf> <rdfs:subClassOf rdf:ID="description_2"/> </owl:Class> <owl:Class> <owl:intersectionOf rdf:parseType="Collection"> <owl:Class 
rdf:about="#resourceset_3"/>
      <owl:complementOf>
        <owl:Class rdf:about="#resourceset_2"/>
      </owl:complementOf>
      <owl:complementOf>
        <owl:Class rdf:about="#resourceset_1"/>
      </owl:complementOf>
    </owl:intersectionOf>
    <rdfs:subClassOf rdf:resource="#description_3"/>
  </owl:Class>
</RDF>

IRI Sets
========

The last missing bit of the transformation now is the one that builds the <owl:Class rdf:ID="resourceset_X"/> descriptions from <iriset/> elements.

<iriset/> elements contain one or more elements, each representing a range of values for IRIs. An IRI is in the <iriset/> if it is covered by ALL of the elements in <iriset/>. The following range specifications MUST be supported:

<includeIRItype/>, <excludeIRItype/>

<includeIRItype/> and <excludeIRItype/> elements have two child nodes: an <xsl:analyze-string/> element, as defined in the XSLT2 specification [XSLT2], and an <xsd:simpleType/> element, as defined in the XML Schema specification [1].

An IRI is in the range of <includeIRItype/> if, after being transformed by <xsl:analyze-string/>, the result of the transformation is within the lexical space of the XSD type.

An IRI is in the range of <excludeIRItype/> if, after being transformed by <xsl:analyze-string/>, the result of the transformation is outside the lexical space of the XSD type.

The intended use of this mechanism is that <xsl:analyze-string/> is used to tokenize the IRI into meaningful sub-strings, which can then be checked against XSD facet restrictions. This allows POWDER to handle situations where numerical comparisons are required, like port ranges. For example:

<iriset>
  <includeIRItype>
    <xsl:analyze-string select="."
                        regex="{'^http://([^:/?#@]*)\.example\.org:([0-9]+)'}">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        0
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsd:simpleType>
      <xsd:restriction base="integer">
        <xsd:minInclusive value="80" />
        <xsd:maxInclusive value="100" />
      </xsd:restriction>
    </xsd:simpleType>
  </includeIRItype>
</iriset>

specifies all resources on http://example.org and any subdomain thereof, fetched from ports 80-100.

It might sometimes be easier to concentrate on parts of an IRI and specify constraints as a series of restrictions, all of which must match. We shall revisit this point when discussing the wdrurl extension.

The <iriset/> mechanism allows a DR to express any grouping of resources whatsoever, no matter how complex:

(A) Each include* and exclude* element expresses an atomic proposition. For all X, if includeX exists, excludeX also exists and vice versa; furthermore includeX and excludeX are mutually exclusive. Hence, one can negate all atomic propositions, although not complex propositions.

(B) An <iriset/> may contain multiple include* and exclude* tags, and all must hold for the iriset to hold. Hence one can express the conjunction of any set of atomic propositions and negations of atomic propositions.

(C) A DR may contain multiple <iriset/> elements, and if any of them holds, then the DR holds. Hence one can express the disjunction of conjunctions of sets of atomic propositions and negations of atomic propositions.

The three expressions above allow the expression of propositions in Disjunctive Normal Form. Since arbitrarily complex propositions can be brought into DNF, the three expressions above allow the expression of any proposition.

IRI Set Semantics
=================

Providing OWL/RDF semantics for <iriset/> elements is not directly possible, since RDF does not provide any means for accessing or manipulating the string representation of an IRI.
We extend OWL/RDF with a built-in hasIRI datatype property as follows:

  hasIRI rdf:type owl:DatatypeProperty .
  hasIRI rdf:type owl:Property .
  hasIRI rdfs:domain owl:Thing .
  hasIRI rdfs:range xsd:string .

and the further stipulation that

  R owl:hasIRI s .

iff the string representation of resource R is s.

Furthermore, we extend the RDF datatype map with a new datatype for each <includeIRItype/> element in the POWDER/XML document. All these datatypes d are subsumed by the wdr:iriType datatype, which is subsumed by xsd:string:

  wdr:iriType rdf:type rdfs:Datatype .
  wdr:iriType rdfs:subClassOf rdfs:Literal .
  wdr:iriType rdfs:subClassOf xsd:string .
  d rdf:type rdfs:Datatype .
  d rdfs:subClassOf wdr:iriType .

These iriType nodes have:
(a) a wdr:transform property with an xsl:analyze-string value,
(b) a wdr:hasType property with an xsd:simpleType value.

  wdr:transform rdfs:domain wdr:iriType .
  wdr:hasType rdfs:domain wdr:iriType .

The semantics of wdr:iriType nodes is:
(a) their lexical space is the subset of xsd:string that, after going through the transformation pointed at by wdr:transform, will be in the lexical space of the XSD type pointed at by wdr:hasType
(b) their lexical-to-value mapping is the same as for xsd:string
(c) their value space is the same as for xsd:string

It is now possible to provide semantics to <iriset/> by constructing an RDF datatype from the <iriset/> and restricting the values of hasIRI to the new datatype. So the example above becomes:

<owl:Class>
  <owl:Restriction>
    <owl:onProperty rdf:resource="&owl;hasIRI"/>
    <owl:allValuesFrom>
      <rdfs:Datatype>
        <rdfs:subClassOf rdf:resource="&wdr;iriType"/>
        <wdr:transform>
          <xsl:analyze-string select="."
                              regex="{'^http://([^:/?#@]*)\.example\.org:([0-9]+)'}">
            <xsl:matching-substring>
              <xsl:value-of select="regex-group(2)"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>0</xsl:non-matching-substring>
          </xsl:analyze-string>
        </wdr:transform>
        <wdr:hasType>
          <xsd:simpleType>
            <xsd:restriction base="integer">
              <xsd:minInclusive value="80" />
              <xsd:maxInclusive value="100" />
            </xsd:restriction>
          </xsd:simpleType>
        </wdr:hasType>
      </rdfs:Datatype>
    </owl:allValuesFrom>
  </owl:Restriction>
</owl:Class>

which describes the set of all abstract resources, the concrete IRI string of which is such that when transformed as described by wdr:transform will yield a literal which is in the lexical space of the value of wdr:hasType.

An <excludeIRItype/> element would translate to:

<owl:Class>
  <owl:complementOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="&owl;hasIRI"/>
      <owl:allValuesFrom>
        <rdfs:Datatype> ... </rdfs:Datatype>
      </owl:allValuesFrom>
    </owl:Restriction>
  </owl:complementOf>
</owl:Class>

to describe the set of all abstract resources, the concrete IRI string of which is such that when transformed as described by wdr:transform will yield a literal which is not in the lexical space of the value of wdr:hasType.

IRISet Extensions
=================

In Sect "IRISet Semantics" above, a vocabulary of 6 tags was specified for defining sets of resources through their IRIs.

[[ PA: these got lost in the revision. The 6 referred to are
<includepattern/>, <excludepattern/>,
<includeports/>, <excludeports/>,
<includeCIDRranges/>, <excludeCIDRranges/> ]]

Except for the numerical port and IP restrictions over URLs, the only operation supported over generic IRIs is regular expression matching.

Creators of POWDER documents may extend the vocabulary used in specifying IRI Sets, by defining new <iriset/> elements. All such extensions to the POWDER vocabulary MUST be defined by means of GRDDL transformations [GRDDL] to terms of the basic POWDER vocabulary in the wdr: namespace.
Extensions do not need to, but are well advised to, define pairs of complementary vocabulary items (includeX and excludeX) for the reasons explained above.

Developers of POWDER tools MAY directly implement extensions they know about, but MUST include a mechanism for retrieving and applying the GRDDL transformations to extensions they do not know about.

The URLSet Extension
====================

POWDER's basic use cases involve information resources available on the Web, identified by URLs containing host names, directory paths, IP addresses, port numbers, and so on. POWDER-WG provides the URLSet extension to IRISet, by defining the following vocabulary items under the wdrurl namespace:

<wdrurl:includeschemes/> <wdrurl:excludeschemes/>
<wdrurl:includehosts/> <wdrurl:excludehosts/>
<wdrurl:includeexactpaths/> <wdrurl:excludeexactpaths/>
<wdrurl:includepathcontains/> <wdrurl:excludepathcontains/>
<wdrurl:includepathstartswith/> <wdrurl:excludepathstartswith/>
<wdrurl:includepathendswith/> <wdrurl:excludepathendswith/>
<wdrurl:includequerycontains/> <wdrurl:excludequerycontains/>
<wdrurl:includeexactqueries/> <wdrurl:excludeexactqueries/>
<wdrurl:includepattern/> <wdrurl:excludepattern/>
<wdrurl:includeports/> <wdrurl:excludeports/>
<wdrurl:includeCIDRranges/> <wdrurl:excludeCIDRranges/>

pathcontains and querycontains may appear any number of times within an IRI set definition, but the rest may appear at most once.

These receive semantics in terms of the POWDER IRISet vocabulary through the Rabin regular expression [Rabin], which splits URIs into their component parts:

(([^:/?#]+):)?(//([^:/?#@]*)(:([0-9]+))?)?([^?#]*)(\?([^#]*))?

We shall write rre to mean the string representation of the Rabin regular expression. In this manner,

<wdrurl:includeschemes>http ftp</wdrurl:includeschemes>

means:

<iriset>
  <includeIRItype>
    <xsl:analyze-string select="."
                        regex="{'rre'}">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        0
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsd:simpleType>
      <xsd:restriction base="string">
        <xsd:enumeration value="http"/>
        <xsd:enumeration value="ftp"/>
      </xsd:restriction>
    </xsd:simpleType>
  </includeIRItype>
</iriset>

wdrurl:includehosts is more complicated, as it specifies the suffix of the host group of the IRI, and not the whole group.

<wdrurl:includehosts>example.org example.net</wdrurl:includehosts>

means:

<iriset>
  <includeIRItype>
    <xsl:analyze-string select="." regex="{'rre'}">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(4)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        0
      </xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsd:simpleType>
      <xsd:restriction base="string">
        <xsd:pattern value="^|\.(example\.org)|(example\.net)$" />
      </xsd:restriction>
    </xsd:simpleType>
  </includeIRItype>
</iriset>

And so on for the various string parts.

<wdrurl:includepattern>some_reg_exp</wdrurl:includepattern>

can be used as a less verbose way of saying:

<includeIRItype>
  <xsl:analyze-string select="." regex="{'some_reg_exp'}">
    <xsl:matching-substring>yes</xsl:matching-substring>
    <xsl:non-matching-substring>no</xsl:non-matching-substring>
  </xsl:analyze-string>
  <xsd:simpleType>
    <xsd:restriction base="string">
      <xsd:enumeration value="yes"/>
    </xsd:restriction>
  </xsd:simpleType>
</includeIRItype>

It might sometimes be easier to concentrate on parts of an IRI and specify constraints as a series of restrictions, all of which must match.
For instance, the IRISet:

<iriset>
  <includehosts>example.org</includehosts>
  <includepattern>^[^?]+\?(.*&)?s=football[&$]</includepattern>
  <includepattern>^[^?]+\?(.*&)?c=gr[&$]</includepattern>
  <includepattern>^[^?]+\?(.*&)?l=first[&$]</includepattern>
</iriset>

is a way of requesting three query conjuncts in any order, and is much shorter and clearer than having to list all possible permutations.

Port ranges are handled slightly differently, as they impose numerical restrictions, so that:

<includeports>80 8080-8100</includeports>

translates to (noting that absence of a port in the IRI defaults to port 80):

<includeIRItype>
  <xsl:analyze-string select="." regex="{'rre'}">
    <xsl:matching-substring>
      <xsl:value-of select="regex-group(6)"/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      80
    </xsl:non-matching-substring>
  </xsl:analyze-string>
  <xsd:simpleType>
    <xsd:union>
      <xsd:simpleType>
        <xsd:restriction base="integer">
          <xsd:enumeration value="80"/>
        </xsd:restriction>
      </xsd:simpleType>
      <xsd:simpleType>
        <xsd:restriction base="integer">
          <xsd:minInclusive value="8080" />
          <xsd:maxInclusive value="8100" />
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>
</includeIRItype>

CIDR ranges are even trickier, as they require some more sophisticated calculations.

<includeCIDRranges>aaa.bbb.ccc.ddd/rr</includeCIDRranges>

means:

<includeIRItype>
  <xsl:analyze-string select="."
regex = "{'([0-9]{1-3})\.([0-9]{1-3})\.([0-9]{1-3})\.([0-9]{1-3})'}"> <xsl:matching-substring> <xsl:value-of select="regex-group(1) * 255 * 255 * 255 + regex-group(2) * 255 * 255 + regex-group(3) * 255 + regex-group(4)"/> </xsl:matching-substring> <xsl:non-matching-substring> -1 </xsl:non-matching-substring> </xsl:analyze-string> <xsd:simpleType> <xsd:restriction base="integer"> <xsd:minInclusive value="minV" /> <xsd:maxInclusive value="maxV" /> </xsd:restriction> </xsd:simpleType> </includeIRItype> where minV and maxV are replaced by appropriate numerical values at the time of the wdrurl -> wdr transform as follows: (UNTESTED, but you get the general gist: convert the 4-tuple of bytes to a single integer, so one can do comparisons.) <xsl:template match="includeCIDRranges"> <includeIRItype> <axsl:analyze-string select="." regex = "{'([0-9]{1-3})\.([0-9]{1-3})\.([0-9]{1-3})\.([0-9]{1-3})'}"> <axsl:matching-substring> <axsl:value-of select="regex-group(1) * 255 * 255 * 255 + regex-group(2) * 255 * 255 + regex-group(3) * 255 + regex-group(4)"/> </axsl:matching-substring> <axsl:non-matching-substring> -1 </axsl:non-matching-substring> </axsl:analyze-string> <xsd:simpleType> <xsd:restriction base="integer"> <xsl:analyze-string select="." 
regex = "{'([0-9]{1-3})\.([0-9]{1-3})\.([0-9]{1-3})\.([0-9]{1-3})(/([0-9]{1-2}))?'}"> <xsl:matching-substring> <xsl:call-template name="minIP"> <xsl:with-param name="ip" <xsl:value-of select="regex-group(1) * 255 * 255 * 255 + regex-group(2) * 255 * 255 + regex-group(3) * 255 + regex-group(4)"/> <xsl:with-param name="rr" select="regex-group(6)"/> <xsl:with-param name="acc" "0"/> </xsl:call-template> <xsl:call-template name="maxIP"> <xsl:with-param name="ip" <xsl:value-of select="regex-group(1) * 255 * 255 * 255 + regex-group(2) * 255 * 255 + regex-group(3) * 255 + regex-group(4)"/> <xsl:with-param name="rr" select="regex-group(6)"/> <xsl:with-param name="acc" "0"/> </xsl:call-template> </xsl:matching-substring> <xsl:non-matching-substring> <xsl:call-template name="minIP"> <xsl:with-param name="ip" <xsl:value-of select="regex-group(1) * 255 * 255 * 255 + regex-group(2) * 255 * 255 + regex-group(3) * 255 + regex-group(4)"/> <xsl:with-param name="rr" "32"/> <xsl:with-param name="acc" "0"/> </xsl:call-template> <xsl:call-template name="maxIP"> <xsl:with-param name="ip" <xsl:value-of select="regex-group(1) * 255 * 255 * 255 + regex-group(2) * 255 * 255 + regex-group(3) * 255 + regex-group(4)"/> <xsl:with-param name="rr" "32"/> <xsl:with-param name="acc" "0"/> </xsl:call-template> </xsl:non-matching-substring> </xsd:restriction> </xsd:simpleType> </includeIRItype> </xsl:template> <xsl:template name="minIP"> <xsl:param name="ip"/> <xsl:param name="rr"/> <xsl:variable name="acc" as="xs:integer" select="{$ip}"> <xsl:for-each select="1 to {$rr}"> <xsl:value-of select=". idiv 2"/> </xsl:for-each> </xsl:variable> <xsl:variable name="min" as="xs:integer" select="{$acc}"> <xsl:for-each select="1 to {$rr}"> <xsl:value-of select=". 
* 2"/>
    </xsl:for-each>
  </xsl:variable>
  <xsd:minInclusive value="{$min}" />
</xsl:template>

<xsl:template name="maxIP">
  <xsl:param name="ip"/>
  <xsl:param name="rr"/>
  <xsl:variable name="acc" as="xs:integer" select="{$ip}">
    <xsl:for-each select="1 to {$rr}">
      <xsl:value-of select="(. idiv 2) + 1"/>
    </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="max" as="xs:integer" select="{$acc}">
    <xsl:for-each select="1 to {$rr}">
      <xsl:value-of select=". * 2"/>
    </xsl:for-each>
  </xsl:variable>
  <xsd:maxInclusive value="{$max}" />
</xsl:template>

Multiple Layers of Extensions
=============================

It might sometimes be useful to also build upon already defined extensions. For example, some content providers serve dynamic content stored in a database, so that IRIs express queries to the database. IRIs of this kind have a certain structure, but this structure is neither obvious nor easily interpreted by humans. Furthermore, conventional grouping mechanisms cannot be used to group resources, as the site structure does not match any directory hierarchy.

As an example, consider sport.example.com, a sports news site, where IRIs look like the one shown below. The adopted scheme is systematic so that sport=2&countryID=16 provides a front page with news about Greek basketball and links to various Greek basketball leagues, sport=3&countryID=16 a front page about Greek volleyball, etc.

E.g.: http://sport.example.com/matches.asp?sport=1&countryID=16&champID=2

A POWDER document providing metadata about this web site would have to use regular expression matching with explicit reference to the numerical values in the country and sport fields of the query. This process is error-prone, and requires extensive changes if the underlying database schema is modified or extended.

As an alternative, the site developer may provide a POWDER vocabulary extension that abstracts away from the database schema to allow reference to sports and countries.
POWDER document authors can then use the properties in this extension to create POWDER documents that remain valid even if the site schema is modified, as long as the site developer updates the relevant transformations. So a POWDER/XML document might look like this:

<wdrsport:SportWDR
    xmlns:wdrsport="http://www.sports.example.com/resolvable#"
    xmlns:wdrurl="http://www.w3.org/2007/05/powder/resolvable#"
    xmlns:wdr="http://www.w3.org/2007/05/powder#"
    xmlns:voc="http://www.example.org/vocabulary.rdf#">
  <wdr:dr>
    <wdr:iriset>
      <wdrurl:includeschemes>http</wdrurl:includeschemes>
      <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
      <wdrsport:countries>Greece</wdrsport:countries>
      <wdrsport:sports>Football Basketball</wdrsport:sports>
    </wdr:iriset>
    <wdr:descriptorset>
      <voc:shape>round</voc:shape>
    </wdr:descriptorset>
  </wdr:dr>
</wdrsport:SportWDR>

A POWDER/XML tool specifically built for sport.example.com, or for other sites following the same query patterns, will immediately know how to handle this information. Other POWDER tools will apply the GRDDL transform associated with the wdrsport: namespace to get the following translation:

<wdrurl:POWDER
    xmlns:wdrurl="http://www.w3.org/2007/05/powder/resolvable#"
    xmlns:wdr="http://www.w3.org/2007/05/powder#"
    xmlns:voc="http://www.example.org/vocabulary.rdf#">
  <wdr:dr>
    <wdr:iriset>
      <wdrurl:includeschemes>http</wdrurl:includeschemes>
      <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
      <wdrurl:includequerycontains>countryID=16</wdrurl:includequerycontains>
      <wdrurl:includequerycontains>sport=1 sport=2</wdrurl:includequerycontains>
    </wdr:iriset>
    <wdr:descriptorset>
      <voc:shape>round</voc:shape>
    </wdr:descriptorset>
  </wdr:dr>
</wdrurl:POWDER>

A web-oriented POWDER/XML tool will immediately know what to do with wdrurl: vocabulary items. Other POWDER tools will apply the GRDDL transform associated with the wdrurl: namespace to get the vanilla POWDER translation.
Finally, an even more generic RDF/OWL tool will apply the transform associated with the wdr: namespace to get the even more verbose RDF/OWL translation, as described above.

Non-URL Identifiers
===================

Although POWDER is mostly concerned with resources that are identified by URLs, there are a number of other use cases; for example, one might use POWDER to provide metadata about physical, off-line resources like books or DVDs.

The International Standard Audiovisual Number [ISAN1] is a voluntary numbering system for the identification of audiovisual works. Following ISO 15706, the numbers are written as 24 hexadecimal digits, plus two check characters, in the following format [ISAN2].

     -----root----- episode    -version-
ISAN 1881-66C7-3420 - 0000 -7- 9F3A-0245 -U

The root of an ISAN number is assigned to a core work, with the other numbers being used for things like episodes, different language versions, promotional trailers and so on.

Since ISAN numbers are URNs [URN], and hence IRIs of the urn: scheme [URIS], a vocabulary can readily be defined to allow IRI Sets to be defined based on ISAN numbers. The terms might be along the lines of:

  includeroots: a white space separated list of hexadecimal digits and hyphens that would be matched against the first three blocks in the ISAN number.

  includeepisodes: a white space separated list of hexadecimal digits and hyphens that would be matched against the 4th block of 4 digits in the ISAN number.

  includeversions: a white space separated list of hexadecimal digits and hyphens that would be matched against the 5th and 6th blocks of 4 digits in the ISAN number.

The set of all audiovisual resources that relate to two particular works might then be defined as follows.

Custom ISAN pattern:

<wdr:iriset>
  <isan:includeroots>1881-66C7-3420 1881-66C7-3421</isan:includeroots>
</wdr:iriset>

Corresponding vanilla POWDER/XML:

<iriset>
  <includeIRItype>
    <!-- The two single-character fields are check characters, which are
         not limited to hexadecimal digits (the example above ends in U). -->
    <xsl:analyze-string select="."
        regex="{'^urn:isan:([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})-[0-9A-Z]-([0-9A-F]{4})-([0-9A-F]{4})-[0-9A-Z]'}">
      <xsl:matching-substring>
        <!-- Rejoin the three root blocks with hyphens so that the result
             has the same form as the enumeration values below. -->
        <xsl:value-of select="string-join((regex-group(1), regex-group(2), regex-group(3)), '-')"/>
      </xsl:matching-substring>
      <!-- G is not a hexadecimal digit, so this value can never match. -->
      <xsl:non-matching-substring>GGGG-GGGG-GGGG</xsl:non-matching-substring>
    </xsl:analyze-string>
    <xsd:simpleType>
      <xsd:union>
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="1881-66C7-3420"/>
          </xsd:restriction>
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:enumeration value="1881-66C7-3421"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>
  </includeIRItype>
</iriset>

This example demonstrates the extensibility offered by using XSLT2 transformations: numerical constraints (such as numerical ranges for, say, the 3rd block) can easily be defined using wdr: primitives.

REFERENCES
==========

[1] http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#rf-pattern
[2] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#Complex_Type_Definitions
[3] http://www.w3.org/TR/owl-semantics/syntax.html#2.1
[4] http://www.w3.org/TR/owl-semantics/mapping.html
[GRDDL] http://www.w3.org/TR/grddl/
[XSLT2] http://www.w3.org/TR/xslt20/
[Rabin] J. Rabin, URI Pattern Matching for Groups of Resources. Draft 0.1, 17 June 2006. http://www.w3.org/2005/Incubator/wcl/matching.html
[URN] http://www.iana.org/assignments/urn-namespaces
[ISAN1] http://www.isan.org/
[ISAN2] http://www.isan.org/portal/page?_pageid=166,41960&_dad=portal&_schema=PORTAL
Received on Friday, 25 April 2008 14:23:27 UTC