Re: XSLT for port splitting

Moving this to the public list...

This is important and very useful work, thanks Stasinos. I've had a look
through and it looks as if we can add some detail that has been missing so
far. Good. Kevin will follow up on the XSLT side of things but my
immediate concern is working out how and where to integrate this with the
current documents.

We have a vocabulary doc and a datatype doc that haven't been touched for
ages (they go back to pre-TPAC meeting times when we were young and
care-free) and some of what you've written looks as if it belongs in one
or other of those? I'm also a little confused about where a POWDER doc
ends and the XSLT begins - the examples here are all for the XSLT and
datatype/schema doc? Maybe all of what you've written becomes a new
section in the DR doc? OR the grouping doc?

Andrea - I know you're overworked and Alessandro will be doing his best
to ensure that a full night's sleep dissolves into historical legend for
you but I'd really welcome your input if possible.

Of course I'm also worried about the schedule... CR by the summer and all
that.

But - off list I've been talking to Alan about developing a Technosite
implementation and that side of things should be starting up... progress
is being made all round, I'm just trying to make sure we tick the boxes on
the Recommendation Track as we go - our charter runs out in 7 months.

Stasinos is about to disappear for 2 weeks (Orthodox Easter followed by a
conference), Smith minimus due any minute and so on. And we don't even
have a shell for the Test Suite yet...

Sorry - I'm just venting anxiety here.

Phil

Smith, Kevin, VF-Group wrote:
> Thanks Stasinos. I'll read this in detail when I'm in the office, but
> just one question now: the candidate URI is not available to the XSLT,
> because the XSLT is not a POWDER processor but a generic POWDER to
> POWDER-S transformer. So the XSLT cannot itself determine resourceset
> membership for a candidate URI. So when in the example below you have:
> <iriset>
>   <includeIRItype>
>      <xsl:analyze-string select="."
>
> ...then what is the node that "." refers to?
>
> Cheers
> Kevin
>
> ----- Original Message -----
> From: member-powderwg-request@w3.org <member-powderwg-request@w3.org>
> To: Phil Archer <parcher@icra.org>
> Cc: Member POWDER <member-powderwg@w3.org>
> Sent: Thu Apr 24 03:46:33 2008
> Subject: Re: XSLT for port splitting
>
>
> On Wed Apr 23 12:35:44 2008 Phil Archer said:
>
>> Smith, Kevin, VF-Group wrote:
>>> Hi Stasinos,
>>>
>>> Here is a sample XSLT showing how to split a whitespace separated list
>>> of ports into a number of XSD Datatypes, either constrained to a single
>>> value or range. I hope it is still relevant given this morning's emails!
>>>
>>> Phil, I must confess I'm not sure what I need to do with the 'Rabin
>>> Regex', since the candidate URI is not available to the XSLT...?
>> I'm not sure either, it's one of those Stasinos mysteries, but I think
>> it's about saying (in the XSLT) match the candidate IRI against
>> http://www.w3.org/2007/powder/Group/powder-grouping/20080416.html#rabinsRegEx
>>
>> and then doing something clever with the result...
>
> Pretty much, yes. It's ugly, but I guess almost nobody will ever use
> vanilla POWDER, as all the right domain-specific (URL, ISAN, etc)
> extensions will be available.
>
> So, here you go, a further elaboration of the iriset & extension chunk of
> the original long email:
>
>
> IRI Sets
> ========
>
>
> The last missing bit of the transformation now is the one that builds
> the <owl:Class rdf:ID="resourceset_X"/> descriptions from <iriset/>
> elements.
>
> <iriset/> elements subsume one or more elements, each
> representing a range of values for IRIs. An IRI is in the <iriset/> if
> it is covered by ALL of the elements in <iriset/>. The following
> range specifications MUST be supported:
>
>  <includeIRItype/>,<excludeIRItype/>
>
> <includeIRItype/> and <excludeIRItype/> elements have two child
> nodes: an <xsl:analyze-string/> element, as defined in the XSLT2
> specification [XSLT2] and an <xsd:simpleType/> element, as defined in
> the XML Schema specification [1]. An IRI is in the range of
> <includeIRItype/> if, after being transformed by
> <xsl:analyze-string/>, the result of the transformation is within the
> lexical space of the XSD type. An IRI is in the range of
> <excludeIRItype/> if, after being transformed by
> <xsl:analyze-string/>, the result of the transformation is outside the
> lexical space of the XSD type.
>
> The intended use of this mechanism is that <xsl:analyze-string/> is
> used to tokenize the IRI into meaningful sub-strings, which can then
> be checked against XSD facet restrictions. This allows POWDER to
> handle situations where numerical comparisons are required, like port
> ranges. For example:
>
> <iriset>
>   <includeIRItype>
>     <xsl:analyze-string select="."
>         regex="{'^http://([^:/?#@]*)\.example\.org:([0-9]+)'}">
>       <xsl:matching-substring>
>         <xsl:value-of select="regex-group(2)"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>0</xsl:non-matching-substring>
>     </xsl:analyze-string>
>     <xsd:simpleType>
>       <xsd:restriction base="integer">
>         <xsd:minInclusive value="80" />
>         <xsd:maxInclusive value="100" />
>       </xsd:restriction>
>     </xsd:simpleType>
>   </includeIRItype>
> </iriset>
>
> specifies all resources on any subdomain of example.org (the regex
> requires a dot before "example.org", so the bare host does not match)
> whose IRI names a port in the range 80-100.
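
For anyone who wants to check the intended behaviour outside an XSLT processor, here is a minimal Python sketch of the example above. It is an illustration under stated assumptions, not part of POWDER: the names transform and in_include_iri_type are made up for this note, only the first regex match is considered, and plain integer bounds stand in for the xsd:minInclusive and xsd:maxInclusive facets.

```python
import re

# Regex copied from the <xsl:analyze-string/> example in the text.
PORT_RE = re.compile(r"^http://([^:/?#@]*)\.example\.org:([0-9]+)")


def transform(iri: str) -> str:
    """Mimic the transform: emit the port on a match, "0" otherwise.

    Assumption: only the first match is used; this is a simplification
    of what xsl:analyze-string does.
    """
    m = PORT_RE.search(iri)
    return m.group(2) if m else "0"


def in_include_iri_type(iri: str, lo: int = 80, hi: int = 100) -> bool:
    """True if the transformed value lies within [lo, hi] (hypothetical helper)."""
    return lo <= int(transform(iri)) <= hi
```

With these definitions, an IRI with no explicit port transforms to "0" and so falls outside the 80-100 range, just as the non-matching branch in the XML suggests.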
>
> It might sometimes be easier to concentrate on parts of an IRI and
> specify constraints as a series of restrictions, all of which must
> match. We shall revisit this point when discussing the wdrurl
> extension.
>
> The <iriset/> mechanism allows a DR to express any grouping of
> resources whatsoever, no matter how complex:
>
> (A) each include* and exclude* element expresses an atomic
>     proposition. For all X, if includeX exists, excludeX also exists
>     and vice versa; furthermore includeX and excludeX are mutually
>     exclusive. Hence, one can negate all atomic propositions, although
>     not complex propositions.
>
> (B) An <iriset/> may contain multiple include* and exclude* tags, and
>     all must hold for the iriset to hold. Hence one can express the
>     conjunction of any set of atomic propositions and negations of
>     atomic propositions.
>
> (C) A DR may contain multiple <iriset/> elements, and if any of them
>     holds, then the DR holds. Hence one can express the disjunction of
>     conjunctions of sets of atomic propositions and negations of
>     atomic propositions.
>
> The three constructs above allow the expression of any proposition in
> Disjunctive Normal Form (DNF). Since arbitrarily complex propositions
> can be brought into DNF, they allow the expression of any proposition.
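
Points (A) to (C) can be sketched roughly as follows. This is a Python illustration only: Atom, include, exclude, dr_applies and example_dr are hypothetical names, and plain regular expressions stand in for the include*/exclude* elements.

```python
import re
from typing import Callable, List

Atom = Callable[[str], bool]   # one include*/exclude* element (A)
IriSet = List[Atom]            # all atoms must hold: conjunction (B)
DR = List[IriSet]              # any iriset may hold: disjunction (C)


def include(pattern: str) -> Atom:
    """Atomic proposition: the IRI matches the pattern."""
    rx = re.compile(pattern)
    return lambda iri: rx.search(iri) is not None


def exclude(pattern: str) -> Atom:
    """Negation of the corresponding include atom."""
    inner = include(pattern)
    return lambda iri: not inner(iri)


def dr_applies(dr: DR, iri: str) -> bool:
    """Disjunction over irisets of the conjunction of their atoms (DNF)."""
    return any(all(atom(iri) for atom in iriset) for iriset in dr)


# Hypothetical DR: (http AND NOT example.net) OR ftp
example_dr: DR = [
    [include(r"^http://"), exclude(r"example\.net")],
    [include(r"^ftp://")],
]
```

Here dr_applies(example_dr, "ftp://www.example.net/") holds through the second iriset even though the first one rejects the IRI.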
>
>
> IRI Set Semantics
> =================
>
> Providing OWL/RDF semantics for <iriset/> elements is not directly
> possible, since RDF does not provide any means for accessing or
> manipulating the string representation of an IRI. We extend OWL/RDF
> with a built-in hasIRI datatype property as follows:
>
> owl:hasIRI rdf:type owl:DatatypeProperty .
> owl:hasIRI rdf:type rdf:Property .
> owl:hasIRI rdfs:domain owl:Thing .
> owl:hasIRI rdfs:range xsd:string .
>
> and the further stipulation that
>  R owl:hasIRI s .
> iff the string representation of the IRI of resource R is s.
>
> Furthermore, we extend the RDF datatype map with a new datatype for
> each <includeIRItype/> element in the POWDER/XML document. All these
> datatypes d are subsumed by the wdr:iriType datatype, which is
> subsumed by xsd:string:
>
> wdr:iriType rdf:type rdfs:Datatype .
> wdr:iriType rdfs:subClassOf rdfs:Literal .
> wdr:iriType rdfs:subClassOf xsd:string .
> d rdf:type rdfs:Datatype .
> d rdfs:subClassOf wdr:iriType .
>
> These iriType nodes have:
> (a) a wdr:transform property with an xsl:analyze-string value,
> (b) a wdr:hasType property with an xsd:simpleType value.
>
> wdr:transform rdfs:domain wdr:iriType .
> wdr:hasType rdfs:domain wdr:iriType .
>
> The semantics of wdr:iriType nodes is:
>
> (a) their lexical space is the subset of xsd:string that, after going
>     through the transformation pointed at by wdr:transform, will be in
>     the lexical space of the XSD type pointed at by wdr:hasType
> (b) their lexical-to-value mapping is the same as for xsd:string
> (c) their value space is the same as for xsd:string
>
> It is now possible to provide semantics to <iriset/> by constructing
> an RDF datatype from the <iriset/> and restricting the values of
> hasIRI to the new datatype. So the example above becomes:
>
> <owl:Class>
>   <owl:Restriction>
>     <owl:onProperty rdf:resource="&owl;hasIRI"/>
>     <owl:allValuesFrom>
>       <rdfs:Datatype>
>         <rdfs:subClassOf rdf:resource="&wdr;iriType"/>
>         <wdr:transform>
>           <xsl:analyze-string
>                   select="."
>                   regex = "{'^http://([^:/?#@]*)\.example\.org:([0-9]+)'}">
>             <xsl:matching-substring>
>               <xsl:value-of select="regex-group(2)"/>
>             </xsl:matching-substring>
>             <xsl:non-matching-substring>0</xsl:non-matching-substring>
>           </xsl:analyze-string>
>         </wdr:transform>
>         <wdr:hasType>
>           <xsd:simpleType>
>             <xsd:restriction base="integer">
>               <xsd:minInclusive value="80" />
>               <xsd:maxInclusive value="100" />
>             </xsd:restriction>
>           </xsd:simpleType>
>         </wdr:hasType>
>       </rdfs:Datatype>
>     </owl:allValuesFrom>
>   </owl:Restriction>
> </owl:Class>
>
> which describes the set of all abstract resources, the concrete IRI
> string of which is such that when transformed as described by
> wdr:transform will yield a literal which is in the lexical space of
> the value of wdr:hasType.
>
> An <excludeIRItype/> element would translate to:
>
> <owl:Class>
>   <owl:complementOf>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="&owl;hasIRI"/>
>       <owl:allValuesFrom>
>         <rdfs:Datatype> ... </rdfs:Datatype>
>       </owl:allValuesFrom>
>     </owl:Restriction>
>   </owl:complementOf>
> </owl:Class>
>
> to describe the set of all abstract resources, the concrete IRI string
> of which is such that when transformed as described by wdr:transform
> will yield a literal which is not in the lexical space of the value of
> wdr:hasType.
>
>
> IRISet Extensions
> =================
>
> In Sect "IRISet Semantics" above, a vocabulary of 6 tags was specified for
> defining sets of resources through their IRIs. Except for the
> numerical port and IP restrictions over URLs, the only operation
> supported over generic IRIs is regular expression matching.
>
> Creators of POWDER documents may extend the vocabulary used in
> specifying IRI Sets, by defining new <iriset/> elements. All such
> extensions to the POWDER vocabulary MUST be defined by means of GRDDL
> transformations [GRDDL] to terms of the basic POWDER vocabulary in the
> wdr: namespace.
>
> Extensions do not need to, but are well advised to, define pairs
> of complementary vocabulary items (includeX and excludeX) for the
> reasons explained above.
>
> Developers of POWDER tools MAY directly implement extensions they know
> about, but MUST include a mechanism for retrieving and applying the
> GRDDL transformations to extensions they do not know about.
>
>
> The URLSet Extension
> ====================
>
> POWDER's basic use cases involve information resources available on
> the Web, identified by URLs containing host names, directory paths, IP
> addresses, port numbers, and so on. POWDER-WG provides the URLSet
> extension to IRISet, by defining the following vocabulary items under
> the wdrurl namespace:
>
> <wdrurl:includeschemes/>        <wdrurl:excludeschemes/>
> <wdrurl:includehosts/>          <wdrurl:excludehosts/>
> <wdrurl:includeexactpaths/>     <wdrurl:excludeexactpaths/>
> <wdrurl:includepathcontains/>   <wdrurl:excludepathcontains/>
> <wdrurl:includepathstartswith/> <wdrurl:excludepathstartswith/>
> <wdrurl:includepathendswith/>   <wdrurl:excludepathendswith/>
> <wdrurl:includequerycontains/>  <wdrurl:excludequerycontains/>
> <wdrurl:includeexactqueries/>   <wdrurl:excludeexactqueries/>
> <wdrurl:includepattern/>        <wdrurl:excludepattern/>
> <wdrurl:includeports/>          <wdrurl:excludeports/>
> <wdrurl:includeCIDRranges/>     <wdrurl:excludeCIDRranges/>
>
> pathcontains and querycontains may appear any number of times within
> an IRI set definition, but the rest may appear up to once.
>
> These receive semantics in terms of the POWDER IRISet vocabulary
> through the Rabin regular expression [Rabin], which splits URIs into
> their component parts:
>   (([^:/?#]+):)?(//([^:/?#@]*)(:([0-9]+))?)?([^?#]*)(\?([^#]*))?
> We shall write rre to mean the string representation of the
> Rabin regular expression.
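
To make the group numbering concrete, here is a small Python sketch that applies the expression above and names the groups the following examples rely on. The function name split_iri and the dictionary keys are labels invented for this note, not POWDER vocabulary.

```python
import re

# The Rabin regular expression, copied verbatim from the text ("rre").
RRE = re.compile(
    r"(([^:/?#]+):)?(//([^:/?#@]*)(:([0-9]+))?)?([^?#]*)(\?([^#]*))?"
)


def split_iri(iri: str) -> dict:
    """Return the component groups used in this mail (hypothetical helper).

    Every part of the expression is optional, so match() always succeeds;
    components that are absent come back as None.
    """
    m = RRE.match(iri)
    return {
        "scheme": m.group(2),
        "host": m.group(4),
        "port": m.group(6),
        "path": m.group(7),
        "query": m.group(9),
    }
```

So regex-group(2) is the scheme, regex-group(4) the host, regex-group(6) the port, regex-group(7) the path and regex-group(9) the query.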
>
> In this manner,
>
>   <wdrurl:includeschemes>http ftp</wdrurl:includeschemes>
>
> means:
>
> <iriset>
>   <includeIRItype>
>     <xsl:analyze-string select="." regex = "{'rre'}">
>       <xsl:matching-substring>
>         <xsl:value-of select="regex-group(2)"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         0
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>     <xsd:simpleType>
>       <xsd:restriction base="string">
>         <xsd:enumeration value="http"/>
>         <xsd:enumeration value="ftp"/>
>       </xsd:restriction>
>     </xsd:simpleType>
>   </includeIRItype>
> </iriset>
>
> wdrurl:includehosts is more complicated, as it specifies the suffix
> of the host group of the IRI, and not the whole group.
>
>   <wdrurl:includehosts>example.org example.net</wdrurl:includehosts>
>
> means:
>
> <iriset>
>   <includeIRItype>
>     <xsl:analyze-string select="." regex = "{'rre'}">
>       <xsl:matching-substring>
>         <xsl:value-of select="regex-group(4)"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         0
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>     <xsd:simpleType>
>       <xsd:restriction base="string">
>         <xsd:pattern value="(.*\.)?(example\.org|example\.net)" />
>       </xsd:restriction>
>     </xsd:simpleType>
>   </includeIRItype>
> </iriset>
>
> And so on for the various string parts.
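
The host-suffix rule can be tried out with a short Python sketch. It assumes that the XSD pattern facet must match the whole value (which is how xsd:pattern behaves), so re.fullmatch is used; host_pattern and host_included are hypothetical helper names.

```python
import re


def host_pattern(hosts: str) -> str:
    """Build a whole-string pattern from a whitespace separated host list."""
    alternatives = "|".join(re.escape(h) for h in hosts.split())
    # An optional "anything ending in a dot" prefix allows subdomains only.
    return r"(.*\.)?(" + alternatives + ")"


def host_included(host: str, hosts: str) -> bool:
    """True if host equals, or is a subdomain of, one of the listed hosts."""
    return re.fullmatch(host_pattern(hosts), host) is not None
```

Requiring the dot before the suffix is what keeps a host such as badexample.org out of the set.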
>
> <wdrurl:includepattern>some_reg_exp</wdrurl:includepattern> can be
> used as a less verbose way of saying:
>
>   <includeIRItype>
>     <xsl:analyze-string select="." regex = "{'some_reg_exp'}">
>       <xsl:matching-substring>yes</xsl:matching-substring>
>       <xsl:non-matching-substring>no</xsl:non-matching-substring>
>     </xsl:analyze-string>
>     <xsd:simpleType>
>       <xsd:restriction base="string">
>         <xsd:enumeration value="yes"/>
>       </xsd:restriction>
>     </xsd:simpleType>
>   </includeIRItype>
>
> It might sometimes be easier to concentrate on parts of an IRI and
> specify constraints as a series of restrictions, all of which must match.
> For instance, the IRISet:
>
> <iriset>
>   <includehosts>example.org</includehosts>
>   <includepattern>^[^?]+\?(.*&amp;)?s=football(&amp;|$)</includepattern>
>   <includepattern>^[^?]+\?(.*&amp;)?c=gr(&amp;|$)</includepattern>
>   <includepattern>^[^?]+\?(.*&amp;)?l=first(&amp;|$)</includepattern>
> </iriset>
>
> is a way of requesting three query conjuncts in any order, and is much
> shorter and clearer than having to list all possible permutations.
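
A quick way to convince oneself of the "any order" claim is the Python sketch below. It is only an illustration: query_has and matches_all are made-up names, and each pattern ends in (&|$) rather than a character class, because inside [...] the $ is a literal character and not an end-of-string anchor.

```python
import re


def query_has(param: str, value: str):
    """Build one conjunct: the query contains param=value as a whole pair."""
    rx = re.compile(
        r"^[^?]+\?(.*&)?" + re.escape(param) + "=" + re.escape(value) + r"(&|$)"
    )
    return lambda iri: rx.search(iri) is not None


# The three conjuncts from the iriset in the text.
conjuncts = [
    query_has("s", "football"),
    query_has("c", "gr"),
    query_has("l", "first"),
]


def matches_all(iri: str) -> bool:
    """All conjuncts must hold, independently of their order in the query."""
    return all(conjunct(iri) for conjunct in conjuncts)
```

Three independent patterns cover all six orderings of the parameters, which a single regular expression would have to spell out one by one.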
>
>
> Port ranges are handled slightly differently, as they impose
> numerical restrictions, so that:
>
> <includeports>80 8080-8100</includeports>
>
> translates to (noting that absence of a port in the IRI defaults
> to port 80):
>
>   <includeIRItype>
>     <xsl:analyze-string select="." regex = "{'rre'}">
>       <xsl:matching-substring>
>         <xsl:value-of select="regex-group(6)"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         80
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>     <xsd:simpleType>
>       <xsd:union>
>         <xsd:simpleType>
>           <xsd:restriction base="integer">
>             <xsd:enumeration value="80"/>
>           </xsd:restriction>
>         </xsd:simpleType>
>         <xsd:simpleType>
>           <xsd:restriction base="integer">
>             <xsd:minInclusive value="8080" />
>             <xsd:maxInclusive value="8100" />
>           </xsd:restriction>
>         </xsd:simpleType>
>       </xsd:union>
>     </xsd:simpleType>
>   </includeIRItype>
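
The step from the whitespace separated port list to the union of enumerations and ranges can be sketched in Python as follows. parse_ports and port_included are hypothetical names, and the default of 80 for a missing port follows the note in the text.

```python
from typing import List, Optional, Tuple


def parse_ports(spec: str) -> List[Tuple[int, int]]:
    """Turn "80 8080-8100" into inclusive (low, high) pairs."""
    ranges = []
    for token in spec.split():
        lo, _, hi = token.partition("-")
        # A single value such as "80" becomes the one-element range (80, 80).
        ranges.append((int(lo), int(hi or lo)))
    return ranges


def port_included(port: Optional[str], spec: str, default: int = 80) -> bool:
    """True if the port (or the default when absent) falls in any range."""
    value = int(port) if port else default
    return any(lo <= value <= hi for lo, hi in parse_ports(spec))
```

Each pair corresponds to one member of the xsd:union: a single value maps to an enumeration, a true range to a minInclusive/maxInclusive restriction.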
>
>
> CIDR ranges are even trickier, as they require some more
> sophisticated calculations.
>
> <includeCIDRranges>aaa.bbb.ccc.ddd/rr</includeCIDRranges>
>
> means:
>
>   <includeIRItype>
>     <xsl:analyze-string select="."
>         regex="{'([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})'}">
>       <xsl:matching-substring>
>         <xsl:value-of select="xs:integer(regex-group(1)) * 256 * 256 * 256 +
>                               xs:integer(regex-group(2)) * 256 * 256 +
>                               xs:integer(regex-group(3)) * 256 +
>                               xs:integer(regex-group(4))"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>
>         -1
>       </xsl:non-matching-substring>
>     </xsl:analyze-string>
>     <xsd:simpleType>
>       <xsd:restriction base="integer">
>         <xsd:minInclusive value="minV" />
>         <xsd:maxInclusive value="maxV" />
>       </xsd:restriction>
>     </xsd:simpleType>
>   </includeIRItype>
>
> where minV and maxV are replaced by appropriate numerical values at
> the time of the wdrurl -> wdr transform as follows:
> (UNTESTED, but you get the general gist: convert the 4-tuple of bytes
> to a single integer, so one can do comparisons.)
>
> <xsl:template match="includeCIDRranges">
>   <includeIRItype>
>     <!-- Assumes axsl: is aliased to xsl: via xsl:namespace-alias, so
>          these elements are emitted as instructions; braces that must
>          survive into the output are doubled. -->
>     <axsl:analyze-string select="."
>         regex="{{'([0-9]{{1,3}})\.([0-9]{{1,3}})\.([0-9]{{1,3}})\.([0-9]{{1,3}})'}}">
>       <axsl:matching-substring>
>         <axsl:value-of select="xs:integer(regex-group(1)) * 256 * 256 * 256 +
>                                xs:integer(regex-group(2)) * 256 * 256 +
>                                xs:integer(regex-group(3)) * 256 +
>                                xs:integer(regex-group(4))"/>
>       </axsl:matching-substring>
>       <axsl:non-matching-substring>-1</axsl:non-matching-substring>
>     </axsl:analyze-string>
>     <xsd:simpleType>
>       <xsd:restriction base="integer">
>         <!-- Parse aaa.bbb.ccc.ddd/rr; a missing /rr is treated as /32. -->
>         <xsl:analyze-string select="normalize-space(.)"
>             regex="{'^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})(/([0-9]{1,2}))?$'}">
>           <xsl:matching-substring>
>             <xsl:variable name="ip" as="xs:integer"
>                 select="xs:integer(regex-group(1)) * 256 * 256 * 256 +
>                         xs:integer(regex-group(2)) * 256 * 256 +
>                         xs:integer(regex-group(3)) * 256 +
>                         xs:integer(regex-group(4))"/>
>             <xsl:variable name="rr" as="xs:integer"
>                 select="if (regex-group(6) = '') then 32
>                         else xs:integer(regex-group(6))"/>
>             <xsl:call-template name="minIP">
>               <xsl:with-param name="ip" select="$ip"/>
>               <xsl:with-param name="rr" select="$rr"/>
>             </xsl:call-template>
>             <xsl:call-template name="maxIP">
>               <xsl:with-param name="ip" select="$ip"/>
>               <xsl:with-param name="rr" select="$rr"/>
>             </xsl:call-template>
>           </xsl:matching-substring>
>         </xsl:analyze-string>
>       </xsd:restriction>
>     </xsd:simpleType>
>   </includeIRItype>
> </xsl:template>
>
> <!-- 2 to the power $bits, by repeated doubling (XPath 2.0 has no
>      integer power function). -->
> <xsl:template name="blockSize">
>   <xsl:param name="bits" as="xs:integer"/>
>   <xsl:param name="acc" as="xs:integer" select="1"/>
>   <xsl:choose>
>     <xsl:when test="$bits le 0">
>       <xsl:sequence select="$acc"/>
>     </xsl:when>
>     <xsl:otherwise>
>       <xsl:call-template name="blockSize">
>         <xsl:with-param name="bits" select="$bits - 1"/>
>         <xsl:with-param name="acc" select="$acc * 2"/>
>       </xsl:call-template>
>     </xsl:otherwise>
>   </xsl:choose>
> </xsl:template>
>
> <!-- Lowest address of the /rr block that contains $ip. -->
> <xsl:template name="minIP">
>   <xsl:param name="ip" as="xs:integer"/>
>   <xsl:param name="rr" as="xs:integer"/>
>   <xsl:variable name="size" as="xs:integer">
>     <xsl:call-template name="blockSize">
>       <xsl:with-param name="bits" select="32 - $rr"/>
>     </xsl:call-template>
>   </xsl:variable>
>   <xsd:minInclusive value="{$ip - ($ip mod $size)}"/>
> </xsl:template>
>
> <!-- Highest address of the /rr block that contains $ip. -->
> <xsl:template name="maxIP">
>   <xsl:param name="ip" as="xs:integer"/>
>   <xsl:param name="rr" as="xs:integer"/>
>   <xsl:variable name="size" as="xs:integer">
>     <xsl:call-template name="blockSize">
>       <xsl:with-param name="bits" select="32 - $rr"/>
>     </xsl:call-template>
>   </xsl:variable>
>   <xsd:maxInclusive value="{$ip - ($ip mod $size) + $size - 1}"/>
> </xsl:template>
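
The arithmetic behind minV and maxV is easier to check in a general-purpose language, so here is a hedged Python sketch of it. ip_to_int and cidr_bounds are names invented for this note; the sketch weights each byte by a power of 256 and treats a missing /rr as /32 (a single address), which is an assumption.

```python
from typing import Tuple


def ip_to_int(dotted: str) -> int:
    """Convert aaa.bbb.ccc.ddd into one integer so ranges can be compared."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return ((a * 256 + b) * 256 + c) * 256 + d


def cidr_bounds(cidr: str) -> Tuple[int, int]:
    """Return (minV, maxV) for a CIDR range such as 192.168.1.77/24."""
    address, _, prefix = cidr.partition("/")
    rr = int(prefix) if prefix else 32   # assumption: no prefix means /32
    size = 2 ** (32 - rr)                # number of addresses in the block
    ip = ip_to_int(address)
    low = ip - ip % size                 # clear the host bits
    return low, low + size - 1
```

For example, cidr_bounds("192.168.1.77/24") spans the integers for 192.168.1.0 through 192.168.1.255.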
>
>
>
>
> Multiple Layers of Extensions
> =============================
>
> It might sometimes be useful to also build upon already defined
> extensions. For example, some content providers serve dynamic content
> stored in a database, so that IRIs express queries to the database.
> IRIs of this kind have a certain structure, but this structure is
> neither obvious nor easily human-interpreted. Furthermore, conventional
> grouping mechanisms cannot be used to group resources, as the site
> structure does not match any directory hierarchy.
>
> As an example, consider sport.example.com, a sports news site,
> where IRIs look like the one shown in Example 3-2-1. The adopted
> scheme is systematic so that sport=2&countryID=16 provides a front
> page with news about Greek basketball and links to various Greek
> basketball leagues, sport=3&countryID=16 a front page about Greek
> volleyball, etc. Eg:
>   http://sport.example.com/matches.asp?sport=1&countryID=16&champID=2
>
> A POWDER document providing metadata about this web site would have to
> use regular expression matching with explicit reference to the
> numerical values in the country and sport fields of the query. This
> process is error-prone, and requires extensive changes if the
> underlying database schema is modified or extended.
>
> As an alternative, the site developer may provide a POWDER vocabulary
> extension that abstracts away from the database schema to allow
> reference to sports and countries. POWDER document authors can then
> use the properties in this extension to create POWDER documents that
> remain valid even if the site schema is modified, as long as the site
> developer updates the relevant transformations.
>
> So a POWDER/XML document might look like this:
>
> <wdrsport:SportWDR
>    xmlns:wdrsport="http://www.sports.example.com/resolvable#"
>    xmlns:wdrurl="http://www.w3.org/2007/05/powder/resolvable#"
>    xmlns:wdr="http://www.w3.org/2007/05/powder#"
>    xmlns:voc="http://www.example.org/vocabulary.rdf#">
>
>   <wdr:dr>
>     <wdr:iriset>
>       <wdrurl:includeschemes>http</wdrurl:includeschemes>
>       <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
>       <wdrsport:countries>Greece</wdrsport:countries>
>       <wdrsport:sports>Football Basketball</wdrsport:sports>
>     </wdr:iriset>
>     <wdr:descriptorset>
>       <voc:shape>round</voc:shape>
>     </wdr:descriptorset>
>   </wdr:dr>
> </wdrsport:SportWDR>
>
> A POWDER/XML tool specifically built for sport.example.com or other sites
> following the same query patterns will immediately know how to handle
> this information. Other POWDER tools will apply the GRDDL transform
> associated with the wdrsport: namespace to get the following translation:
>
> <wdrurl:POWDER
>    xmlns:wdrurl="http://www.w3.org/2007/05/powder/resolvable#"
>    xmlns:wdr="http://www.w3.org/2007/05/powder#"
>    xmlns:voc="http://www.example.org/vocabulary.rdf#">
>
>   <wdr:dr>
>     <wdr:iriset>
>       <wdrurl:includeschemes>http</wdrurl:includeschemes>
>       <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
>       <wdrurl:includequerycontains>countryID=16</wdrurl:includequerycontains>
>       <wdrurl:includequerycontains>sport=1 sport=2</wdrurl:includequerycontains>
>     </wdr:iriset>
>     <wdr:descriptorset>
>       <voc:shape>round</voc:shape>
>     </wdr:descriptorset>
>   </wdr:dr>
>
> </wdrurl:POWDER>
>
> A web-oriented POWDER/XML tool will immediately know what to do with
> wdrurl: vocabulary items. Other POWDER tools will apply the GRDDL
> transform associated with the wdrurl: namespace to get the vanilla
> POWDER translation. Finally, an even more generic RDF/OWL tool will
> apply the transform associated with the wdr: namespace to get the even
> more verbose RDF/OWL translation, as described above.
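
The first layer of that chain (wdrsport: to wdrurl:) amounts to a table lookup, which the following Python sketch illustrates. The lookup tables and the names to_query_terms and translate are hypothetical; the numeric IDs are the ones implied by the example IRIs in this section, and a real site would generate them from its own database.

```python
# Hypothetical lookup tables maintained by the site developer.
COUNTRY_IDS = {"Greece": 16}
SPORT_IDS = {"Football": 1, "Basketball": 2, "Volleyball": 3}


def to_query_terms(field: str, names: str, table: dict) -> str:
    """Map a whitespace separated list of names to "field=id" terms."""
    return " ".join(f"{field}={table[name]}" for name in names.split())


def translate(countries: str, sports: str) -> list:
    """Return the values of the two includequerycontains elements."""
    return [
        to_query_terms("countryID", countries, COUNTRY_IDS),
        to_query_terms("sport", sports, SPORT_IDS),
    ]
```

If the site renumbers its sports, only the tables change; documents that say "Football Basketball" stay as they are.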
>
>
> Non-URL Identifiers
> ===================
>
> Although POWDER is mostly concerned with resources that are identified
> by URLs, there are a number of other use cases; for example one might
> use POWDER to provide meta-data about physical, off-line resources
> like books or DVDs.
>
> The International Standard Audiovisual Number [ISAN1] is a voluntary
> numbering system for the identification of audiovisual works.
> Following ISO 15706, the numbers are written as 24 hexadecimal
> digits in the following format [ISAN2].
>
>      -----root-----   episode       -version-
> ISAN 1881-66C7-3420 -  0000   - 7 - 9F3A-0245 - U
>
> The root of an ISAN number is assigned to a core work with the other
> numbers being used for things like episodes, different language
> versions, promotional trailers and so on.
>
> Since ISAN numbers are URNs [URN], and hence IRIs of the urn: scheme
> [URIS], a vocabulary can readily be defined to allow IRI Sets to be
> defined based on ISAN numbers. The terms might be along the lines of:
>
> includeroots: the value of which would be a white space separated list
> of hexadecimal digits and hyphens that would be matched against the
> first three blocks in the ISAN number.
>
> includeepisodes: a white space separated list of hexadecimal digits
> and hyphens that would be matched against the 4th block of 4 digits in
> the ISAN number.
>
> includeversions: a white space separated list of hexadecimal digits
> and hyphens that would be matched against the 5th and 6th blocks of 4
> digits in the ISAN number.
>
> The set of all audio visual resources that relate to two particular
> works might then be so defined:
>
> Custom ISAN pattern:
>
> <wdr:iriset>
>   <isan:includeroots>1881-66C7-3420 1881-66C7-3421</isan:includeroots>
> </wdr:iriset>
>
> Corresponding vanilla POWDER/XML:
>
>
> <iriset>
>   <includeIRItype>
>     <xsl:analyze-string select="."
>         regex="{'^urn:isan:([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})-[0-9A-Z]-([0-9A-F]{4})-([0-9A-F]{4})-[0-9A-Z]'}">
>       <xsl:matching-substring>
>         <xsl:value-of select="string-join((regex-group(1),
>             regex-group(2), regex-group(3)), '-')"/>
>       </xsl:matching-substring>
>       <xsl:non-matching-substring>GGGG-GGGG-GGGG</xsl:non-matching-substring>
>     </xsl:analyze-string>
>     <xsd:simpleType>
>       <xsd:union>
>         <xsd:simpleType>
>           <xsd:restriction base="string">
>             <xsd:enumeration value="1881-66C7-3420"/>
>           </xsd:restriction>
>         </xsd:simpleType>
>         <xsd:simpleType>
>           <xsd:restriction base="string">
>             <xsd:enumeration value="1881-66C7-3421"/>
>           </xsd:restriction>
>         </xsd:simpleType>
>       </xsd:union>
>     </xsd:simpleType>
>   </includeIRItype>
> </iriset>
>
> This example demonstrates the extensibility offered by using
> XSLT2 transformations: numerical constraints (for instance, a
> numerical range for the 3rd block) can easily be defined
> using wdr: primitives.
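
As a sanity check on the ISAN example, here is a minimal Python sketch of the root extraction. isan_root and root_included are hypothetical names. The sketch assumes the two check characters may be any of 0-9 or A-Z, as the trailing "U" in the sample number suggests, and it joins the three root blocks with hyphens so that they compare equal to the enumerated values.

```python
import re

# Assumption: check characters are matched with [0-9A-Z], blocks with [0-9A-F].
ISAN_RE = re.compile(
    r"^urn:isan:([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})"
    r"-[0-9A-Z]-([0-9A-F]{4})-([0-9A-F]{4})-[0-9A-Z]"
)


def isan_root(iri: str) -> str:
    """Return the root (first three blocks), or a value no root can equal."""
    m = ISAN_RE.search(iri)
    return "-".join(m.group(1, 2, 3)) if m else "GGGG-GGGG-GGGG"


def root_included(iri: str, roots: str) -> bool:
    """True if the IRI's root is in the whitespace separated list of roots."""
    return isan_root(iri) in roots.split()
```

The sentinel GGGG-GGGG-GGGG works because G is not a hexadecimal digit, so it can never collide with a real root.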
>
>
>
>
> REFERENCES
> ==========
>
> [1] http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#rf-pattern
> [2] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#Complex_Type_Definitions
> [3] http://www.w3.org/TR/owl-semantics/syntax.html#2.1
> [4] http://www.w3.org/TR/owl-semantics/mapping.html
> [GRDDL] http://www.w3.org/TR/grddl/
> [XSLT2] http://www.w3.org/TR/xslt20/
> [Rabin] J. Rabin, URI Pattern Matching for Groups of Resources.
>         Draft 0.1 17 June 2006.
>         http://www.w3.org/2005/Incubator/wcl/matching.html
> [URN] http://www.iana.org/assignments/urn-namespaces
> [ISAN1] http://www.isan.org/
> [ISAN2] http://www.isan.org/portal/page?_pageid=166,41960&_dad=portal&_schema=PORTAL
>

Received on Thursday, 24 April 2008 09:52:29 UTC