Formalisation of OWL semantics (was Re: String Matching -> Reg Ex is not always easy)

Hi Stasinos,

Once again, thank you for taking the time and trouble to prepare such a 
detailed contribution.

I'll make a few comments here rather than inline.

In your first example you've put tags and controlled vocabulary terms in 
the same element. I'd much rather see these kept separate as currently 
set out [1]. Also, I think having <tag /> and <taglist /> elements is 
going to be unnecessarily confusing, no? The DR doc currently has

<tagset>
   <tag>...</tag>
   <tag>...</tag>
   ...
</tagset>

and this is meant to be consistent with <descriptorset /> and <iriset 
/>. Keeping tagset separate allows the easy inclusion of an external 
reference such as

<tagset ref="http://encyclopaedia.example.com/gherkin.html
              http://photo.example.com/gherkin.jpg">
   <tag>London</tag>
   <tag>Swiss Re</tag>
   <tag>gherkin</tag>
</tagset>

which transforms into an OWL Class that includes a couple of 
rdfs:seeAlso annotations with a value of rdf:resource that links to the 
two URIs. Such an association through rdfs:seeAlso may not apply to 
controlled vocabulary properties, hence my desire to keep tag sets and 
descriptor sets separate.
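
To make the shape of the input concrete, here is a minimal Python sketch of reading such a tagset. It is only an illustration, not the GRDDL transform itself: the function name read_tagset is made up, and I am assuming that ref is a whitespace-separated list of URIs and that each <tag> holds one literal.

```python
import xml.etree.ElementTree as ET

TAGSET = """<tagset ref="http://encyclopaedia.example.com/gherkin.html
             http://photo.example.com/gherkin.jpg">
   <tag>London</tag>
   <tag>Swiss Re</tag>
   <tag>gherkin</tag>
</tagset>"""


def read_tagset(xml_text):
    """Return (see_also_uris, tags) for a single <tagset> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Assumption: ref is a whitespace-separated URI list; split() also
    # copes with the line break inside the attribute value.
    see_also = root.get("ref", "").split()
    # Each <tag> is one literal and may contain spaces ("Swiss Re").
    tags = [t.text.strip() for t in root.findall("tag") if t.text]
    return see_also, tags


uris, tags = read_tagset(TAGSET)
print(uris)
print(tags)
```

Each URI in the first list would become one rdfs:seeAlso annotation on the resulting class, and each entry in the second list one tag value.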

Make sense to you??

Your formulation concerning multiple <iriset /> elements makes a lot of 
sense. I'd like to propose that as a solution to the WG.

Where it gets beyond my ken is your two alternatives for the IRI set 
semantics. I'll see if Jeremy is able to cast an eye over this and 
compare it with the semantic extensions he has so far proposed. One 
thing though, I don't think we can propose new terms within other 
people's vocabularies/ontologies, specifically, I'd be unhappy to see us 
being dependent on new terms being added to OWL. Is there anything in 
OWL 1.1 that might cover what you suggest? They had a f2f recently and 
have a bunch of documents moving to FPWD [2-4]. If not, I see that as a 
red light to that approach (we're no more than 4 weeks from LC remember!).

As you found, and as Andrea found when he looked at it, defining a 
datatype for CIDR blocks ain't easy. I'm a little worried about the "if 
there's no /x at the end then it's a single URI, i.e. assume x=32" rule. 
But Andrea seems happy with this and if everyone else is, I won't mind. 
Can we not just say wdrd:cidr is a space separated list of CIDR blocks 
and have done with it? Probably not, I know, but I'm trying to make sure 
we don't take on stuff we can leave alone.
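
To show how little a processor would need if we did go the "space separated list" route, here is a rough Python sketch using the standard ipaddress module. The function names are hypothetical and it only covers IPv4-style input as in the examples so far; the one assumption baked in is the rule discussed above, that a missing /x means a single address (/32).

```python
import ipaddress


def parse_cidr_list(value):
    """Parse a space-separated list of CIDR blocks (hypothetical helper).

    A bare address with no /x is treated as a single address, i.e. /32,
    which is what ip_network() does by default for IPv4.
    """
    # strict=False tolerates blocks whose host bits are set, e.g. 192.0.2.5/24.
    return [ipaddress.ip_network(item, strict=False) for item in value.split()]


def in_any_block(address, blocks):
    """True if the address falls inside at least one of the blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in blocks)


blocks = parse_cidr_list("192.0.2.0/24 198.51.100.7")
print(in_any_block("192.0.2.55", blocks))    # True: inside the /24
print(in_any_block("198.51.100.7", blocks))  # True: the single /32 address
print(in_any_block("198.51.100.8", blocks))  # False: outside both blocks
```

None of this says anything about the formal datatype, of course; it only shows that the matching itself is cheap once the list is parsed.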

No, the WAF group hasn't defined a Reg Exp for their pattern. We can do 
that, as long as it's 100% consistent with their EBNF.

The extension mechanism you propose looks very neat but I'm not sure it 
would work exactly as you have it. If you have the root element in the 
POWDER namespace then the GRDDL transform is automatically associated 
with the document. I think you'd need to have the root element in a 
different namespace and then refer to an XSLT that generates POWDER, 
which could then be turned into POWDER-S if required, but I don't think 
you can add another XSLT on top of the GRDDL transform associated with 
the POWDER namespace ad hoc.

So I'd be inclined to write something like

<Sport xmlns="&sports;" xmlns:wdr="&wdr;">
   <wdr:attribution>
   ...
   </wdr:attribution>

   <wdr:iriset>
     <wdrurl:includeschemes>http</wdrurl:includeschemes>
     <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
     <sport:countries>Greece</sport:countries>
     <sport:sports>Football Basketball</sport:sports>
   </wdr:iriset>

   ...
</Sport>

As for the ISAN example and the lack of support for value extraction in 
XML regular expressions, we said way back that we'd use XML patterns _as 
modified by_ XQuery 1.0 and XPath 2.0 Functions and Operators [5] but I 
don't think even this allows for what we'd need here. But I don't think 
it matters... it's up to the extension author to define the regular 
expression, and in the case of ISAN, it would just be a case of matching 
a portion of the number - I wouldn't worry too much about this.

I agree that we have to keep resource set definitions out of the picture 
but we _do_ have a specific use case where the kind of thing you suggest 
does occur. When we use a DR to certify another DR, it's useful to be 
able to include a hash of the DR we're certifying. See [6]

<descriptorset>
   <sha1sum>j6lwx3rvEPO0vKtMup4NbeVu8nk=</sha1sum>
   <certified>true</certified>
   <displaytext>authority.example.org certifies that claims made
      by example.com are true. Valid throughout 2008.</displaytext>
   <displayicon>http://authority.example.org/icon.png</displayicon>
</descriptorset>

I just wrote this into the doc, we haven't discussed it but I'd very 
much like to. It says that the description of the IRI set (which in the 
full example is a single URI) has a SHA-1 hash value. There is no formal 
semantics at work here, just a "we say that if you take a SHA-1 hash of 
the resource, it's this value."
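
As an informal illustration of that check, here is a short Python sketch. It assumes the sha1sum value is the base64 encoding of the raw SHA-1 digest of the resource's bytes (the example value above has the right length for that, but the doc does not yet say so explicitly), and the function names and sample bytes are made up.

```python
import base64
import hashlib


def sha1sum(resource_bytes):
    """Base64-encoded SHA-1 digest of the raw bytes of a resource."""
    digest = hashlib.sha1(resource_bytes).digest()   # 20 raw bytes
    return base64.b64encode(digest).decode("ascii")  # 28 characters


def matches_certificate(resource_bytes, claimed):
    # Purely a string comparison; no formal semantics involved.
    return sha1sum(resource_bytes) == claimed.strip()


dr_bytes = b"<powder>...</powder>"  # hypothetical stand-in for the retrieved DR
claimed = sha1sum(dr_bytes)
print(matches_certificate(dr_bytes, claimed))         # True
print(matches_certificate(dr_bytes + b" ", claimed))  # False: any change breaks the match
```

Note that the comparison is over bytes, so we would need to say which serialisation of the DR is hashed.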

If there's a way to formalise this it might be useful. Also, we noted 
_way_ back that it's possible to create circular arguments with resource 
set by properties - all square resources are round and all that. We 
resolved that we'd just warn people to look for this.

I've written this e-mail throughout the latter half of today in between 
numerous family-related interruptions so I hope it still makes sense.

Thanks again Stasinos.

Phil.

[1] http://www.w3.org/TR/2008/WD-powder-dr-20080317/#tags
[2] http://www.w3.org/2007/OWL/draft/ED-owl2-xml-serialization-20080408/
[3] http://www.w3.org/2007/OWL/draft/ED-owl2-profiles-20080408/
[4] http://www.w3.org/2007/OWL/draft/ED-owl2-primer-20080408/
[5] http://www.w3.org/TR/xpath-functions/#regex-syntax
[6] http://www.w3.org/TR/2008/WD-powder-dr-20080317/#certification

Stasinos Konstantopoulos wrote:
<snip>

> Intro
> =====
> 
> POWDER/XML documents receive formal semantics through a GRDDL
> transform, associated with the POWDER namespace, that allows the XML
> data to be rendered and processed as OWL/RDF. Or, rather, POWDER-S, a
> fragment of OWL/RDF extended in a way that allows referring to and
> operating upon the string representation of a resource.
> 
> The POWDER/XML format specifies a number of elements denoting
> attribution, validity time, and other issues relating to the level of
> trust assigned to a POWDER document. These fall through the transform
> and are not meant to be interpreted in OWL/RDF; they are only
> meaningful when used by POWDER tools that use them as input to an
> extra-logical procedure which MAY use this data to decide whether the
> POWDER document _as a whole_ should be taken into account or
> discarded. We shall not deal with these elements any further, and
> proceed under the assumption that our document has passed all relevant
> tests.
> 
> Unqualified names should be assumed to be in the wdr: namespace.
> 
> 
> DR Semantics
> ============
> 
> POWDER documents are used to describe sets of resources using
> description vocabularies defined in RDF or plain string literals (tags).
> POWDER/XML documents have <dr/> elements, each assigning all and every
> member of a set of descriptors to a set of resources.
> 
> As an example, consider:
> 
> <dr>
>  <iriset>...</iriset>
>  <descriptorset>
>    <voc:colour ref="http://rgb.org/colours.rdf#red"/>
>    <voc:shape>square</voc:shape>
>    <tag>red</tag>
>    <tag>light red</tag>
>    <taglist>light red</taglist>
>  </descriptorset>
> </dr>
> 
> where <iriset/> specifies a set of resources in a way that will be
> dealt with later, and voc: is an arbitrary RDF vocabulary.
> 
> The <voc:colour/> element specifies that the <voc:colour/> relation
> holds between all resources specified by <iriset/> and the
> http://rgb.org/colours.rdf#red resource.
> 
> The content of <voc:shape/> is interpreted as a string literal. The 
> <voc:shape/> element specifies that all resources in <iriset/>
> have the value "square" for the <voc:shape/> data property.
> 
> <tag/> is a string property defined by POWDER. Its content is a
> single string literal, possibly including spaces.
> <taglist/> is a string property defined by POWDER. Its content is a
> space-separated list of string literals.
> 
> The overall description of the resources in <iriset/> is the union of
> the descriptions in the <descriptorset/>. In our example:
>  a voc:colour relation to http://rgb.org/colours.rdf#red
> AND
>  a voc:shape "square"
> AND
>  the tags "red", "light", and "light red"
> 
> We formally interpret the above as follows: there is an OWL class
> containing all resources that share all of these properties, and there
> is an OWL class of all resources denoted by <iriset/>, and the latter
> is a subset of the former. In OWL/RDF we say:
> 
> <RDF>
> 
>   <owl:Class rdf:ID="resourceset_1">
>     all resources specified by <iriset>...</iriset>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="description_1">
>      <owl:intersectionOf rdf:parseType="Collection">
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="voc:colour"/>
>          <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
>        </owl:Restriction>
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="voc:shape"/>
>          <owl:hasValue>square</owl:hasValue>
>        </owl:Restriction>
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="wdr:tag"/>
>          <owl:hasValue>red</owl:hasValue>
>        </owl:Restriction>
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="wdr:tag"/>
>          <owl:hasValue>light</owl:hasValue>
>        </owl:Restriction>
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="wdr:tag"/>
>          <owl:hasValue>light red</owl:hasValue>
>        </owl:Restriction>
>      </owl:intersectionOf>
>   </owl:Class>
>   
>   <owl:Class rdf:about="#resourceset_1">
>     <rdfs:subClassOf rdf:resource="#description_1"/>
>   </owl:Class>
> 
> </RDF>
> 
> It is possible to have more than one <iriset/> element, in which case
> a resource receives all of the descriptions by belonging to any
> one of the corresponding resource sets. For example:
> 
> <dr>
>  <iriset>.1.</iriset>
>  <iriset>.2.</iriset>
>  <descriptorset>
>    <voc:colour ref="http://rgb.org/colours.rdf#red"/>
>    <taglist>light red</taglist>
>  </descriptorset>
> </dr>
> 
> receives the following semantics:
>  
> <RDF>
> 
>   <owl:Class rdf:ID="resourceset_1">
>     all resources specified by <iriset>.1.</iriset>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="resourceset_2">
>     all resources specified by <iriset>.2.</iriset>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="description_1">
>      <owl:intersectionOf rdf:parseType="Collection">
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="voc:colour"/>
>          <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
>        </owl:Restriction>
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="wdr:tag"/>
>          <owl:hasValue>red</owl:hasValue>
>        </owl:Restriction>
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="wdr:tag"/>
>          <owl:hasValue>light</owl:hasValue>
>        </owl:Restriction>
>      </owl:intersectionOf>
>   </owl:Class>
>   
>   <owl:Class>
>     <owl:unionOf rdf:parseType="Collection">
>       <owl:Class rdf:about="#resourceset_1"/>
>       <owl:Class rdf:about="#resourceset_2"/>
>     </owl:unionOf>
>     <rdfs:subClassOf rdf:resource="#description_1"/>
>   </owl:Class>
> 
> </RDF>
> 
> A POWDER/XML implementation is free to choose any traversal policy for
> treating multiple <iriset/> elements in a DR: first match wins, last
> match wins, shortest irisets first, and so on, as long as all irisets
> are tried before deciding that the DR does not apply to a resource.
> 
> The ordering of irisets is not important and a POWDER/XML
> implementation is free to try them in any order whatsoever (in order
> listed, shorter first, etc), as long as all irisets are tried before
> deciding that a resource is outside the scope of the DR.
> 
> DR authors may use the order of the irisets to suggest an efficient
> scope evaluation strategy, by putting the irisets with the widest
> coverage first, so that an implementation that chooses to follow the
> suggested evaluation order is more likely to terminate the evaluation
> after fewer checks.
> 
> 
> POWDER Semantics
> ================ 
> 
> A POWDER document may have any number of <dr> elements, all of which
> are simultaneously asserted and ordering is not important. So, for
> example:
> 
> <powder>
>   <dr>
>    <iriset>.1.</iriset>
>    <descriptorset>
>      <voc:shape>square</voc:shape>
>    </descriptorset>
>   </dr>
>   <dr>
>    <iriset>.2.</iriset>
>    <descriptorset>
>      <voc:colour ref="http://rgb.org/colours.rdf#red"/>
>    </descriptorset>
>   </dr>
> </powder>
> 
> receives the following semantics:
> 
> <RDF>
>   <owl:Class rdf:ID="resourceset_1">
>     all resources specified by <iriset>.1.</iriset>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="resourceset_2">
>     all resources specified by <iriset>.2.</iriset>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="description_1">
>      <owl:intersectionOf rdf:parseType="Collection">
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="voc:shape"/>
>          <owl:hasValue>square</owl:hasValue>
>        </owl:Restriction>
>      </owl:intersectionOf>
>   </owl:Class>
>   
>   <owl:Class rdf:ID="description_2">
>      <owl:intersectionOf rdf:parseType="Collection">
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="voc:colour"/>
>          <owl:hasValue rdf:resource="http://rgb.org/colours.rdf#red"/>
>        </owl:Restriction>
>      </owl:intersectionOf>
>   </owl:Class>
> 
>   <owl:Class rdf:about="#resourceset_1">
>     <rdfs:subClassOf rdf:resource="#description_1"/>
>   </owl:Class>
> 
>   <owl:Class rdf:about="#resourceset_2">
>     <rdfs:subClassOf rdf:resource="#description_2"/>
>   </owl:Class>
> </RDF>
> 
> The <owl:intersectionOf/> of a singleton collection is the latter's
> single element anyway, so it is better to keep the
> <owl:intersectionOf/> element even though it is redundant, in order to
> keep the transform simple and not require the extra check.
> 
> Note that resourceset_1 and resourceset_2 are not necessarily
> disjoint, so that some resources may be both red AND square.
> 
> A POWDER document may have an <ol/> element which is an ordered list of
> <dr> elements, which receives a first-match semantics. <ol/> elements
> are meant to be used to express exceptions to more general rules. So,
> for example:
> 
> <powder>
>   <ol>
>     <dr>
>      <iriset>.1.</iriset>
>      <descriptorset>
>        <voc:shape>square</voc:shape>
>      </descriptorset>
>     </dr>
>     <dr>
>      <iriset>.2.</iriset>
>      <descriptorset>
>        <voc:shape>round</voc:shape>
>      </descriptorset>
>     </dr>
>     <dr>
>      <iriset>.3.</iriset>
>      <descriptorset>
>        <voc:shape>triangle</voc:shape>
>      </descriptorset>
>     </dr>
>   </ol>
> </powder>
> 
> receives the following formal semantics, where belonging to
> description_1 automatically precludes belonging to description_2 and
> description_3; and belonging to description_2 automatically precludes
> belonging to description_3:
> 
> <RDF>
>   <owl:Class rdf:ID="resourceset_1">
>     all resources specified by <iriset>.1.</iriset>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="resourceset_2">
>     all resources specified by <iriset>.2.</iriset>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="resourceset_3">
>     all resources specified by <iriset>.3.</iriset>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="description_1">
>      <owl:intersectionOf rdf:parseType="Collection">
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="voc:shape"/>
>          <owl:hasValue>square</owl:hasValue>
>        </owl:Restriction>
>      </owl:intersectionOf>
>   </owl:Class>
>   
>   <owl:Class rdf:ID="description_2">
>      <owl:intersectionOf rdf:parseType="Collection">
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="voc:shape"/>
>          <owl:hasValue>round</owl:hasValue>
>        </owl:Restriction>
>      </owl:intersectionOf>
>   </owl:Class>
> 
>   <owl:Class rdf:ID="description_3">
>      <owl:intersectionOf rdf:parseType="Collection">
>        <owl:Restriction>
>          <owl:onProperty rdf:resource="voc:shape"/>
>          <owl:hasValue>triangle</owl:hasValue>
>        </owl:Restriction>
>      </owl:intersectionOf>
>   </owl:Class>
> 
>   <owl:Class rdf:about="#resourceset_1">
>     <rdfs:subClassOf rdf:resource="#description_1"/>
>   </owl:Class>
> 
>   <owl:Class>
>     <owl:intersectionOf rdf:parseType="Collection">
>       <owl:Class rdf:about="#resourceset_2"/>
>       <owl:complementOf>
>         <owl:Class rdf:about="#resourceset_1"/>
>       </owl:complementOf>
>     </owl:intersectionOf>
>     <rdfs:subClassOf rdf:resource="#description_2"/>
>   </owl:Class>
> 
>   <owl:Class>
>     <owl:intersectionOf rdf:parseType="Collection">
>       <owl:Class rdf:about="#resourceset_3"/>
>       <owl:complementOf>
>         <owl:Class rdf:about="#resourceset_2"/>
>       </owl:complementOf>
>       <owl:complementOf>
>         <owl:Class rdf:about="#resourceset_1"/>
>       </owl:complementOf>
>     </owl:intersectionOf>
>     <rdfs:subClassOf rdf:resource="#description_3"/>
>   </owl:Class>
> </RDF>
> 
> 
> IRISet Semantics
> ================
> 
> 
> The last missing bit of the transformation now is the one that builds
> the <owl:Class rdf:ID="resourceset_X"/> descriptions from <iriset/>
> elements.
> 
> <iriset/> elements subsume one or more elements, each
> representing a range of values for IRIs. An IRI is in the <iriset/> if
> it is covered by ALL of the elements in <iriset/>. The following six
> range specifications are supported:
> 
>  <includepattern/>,<excludepattern/>,
>  <includeports/>,<excludeports/>,
>  <includeCIDRranges/>,<excludeCIDRranges/>
> 
> Patterns are a single <xsd:pattern/> element, as defined in the XML
> Schema [1]. <includepattern/> can be applied to any IRI, regardless of
> whether it is resolvable or not. Ports are a space-separated list of
> ports or port ranges. CIDR ranges are specified as a space-separated
> list of CIDR IP range specifications. Port and CIDR range elements can
> be applied to URLs (is there an IRL acronym?) only, and are
> meaningless for other kinds of IRIs.
> 
> For example:
> 
> <iriset>
>   <includepattern>
>     <xsd:pattern value="^http://([\w\.]+\.)?example\.org(:(\d)+)?/" />
>   </includepattern>
>   <includeports>80 8080-8100</includeports>
>   <excludeports>8085 8090-8095</excludeports>
> </iriset>
> 
> specifies all resources on http://example.org and any subdomain
> thereof, fetched from ports 80, 8080-8084, 8086-8089, or 8096-8100.
> 
> It might sometimes be easier to concentrate on parts of an IRI and
> specify constraints as a series of regexps, all of which must match.
> For instance, the IRISet:
> 
> <iriset>
>   <includepattern>
>     <xsd:pattern value="^http://([\w\.]+\.)?example\.org(:(\d)+)?/" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^?]+\?(.*&)?s=football[&$]" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^?]+\?(.*&)?c=gr[&$]" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^?]+\?(.*&)?l=first[&$]" />
>   </includepattern>
> </iriset>
> 
> is a way of requesting three query conjuncts in any order, and is much
> shorter and clearer than having to list all possible permutations.
> 
> The <iriset/> mechanism allows a DR to express any grouping of
> resources whatsoever, no matter how complex:
> 
> (A) each include* and exclude* element expresses an atomic
>     proposition. For all X, if includeX exists, excludeX also exists
>     and vice versa; furthermore includeX and excludeX are mutually
>     exclusive. Hence, one can negate all atomic propositions, although
>     not complex propositions.
> 
> (B) An <iriset/> may contain multiple include* and exclude* tags, and
>     all must hold for the iriset to hold. Hence one can express the
>     conjunction of any set of atomic propositions and negations of
>     atomic propositions.
> 
> (C) A DR may contain multiple <iriset/> elements, and if any of them
>     holds, then the DR holds. Hence one can express the disjunction of
>     conjunctions of sets of atomic propositions and negations of
>     atomic propositions.
> 
> The three expressions above allow the expression of propositions in
> Disjunctive Normal Form (DNF). Since arbitrarily complex propositions
> can be brought into DNF, the three expressions above allow the
> expression of any proposition.
> 
> 
> ALTERNATIVE 1
> 
> Providing OWL/RDF semantics for <iriset/> elements is not directly
> possible, since RDF does not provide any means for accessing or
> manipulating the string representation of an IRI. We extend OWL/RDF
> with a built-in hasIRI data property as follows:
> 
> hasIRI rdf:type owl:DatatypeProperty .
> hasIRI rdf:type rdf:Property .
> hasIRI rdfs:domain owl:Thing .
> hasIRI rdfs:range xsd:string .
> 
> and the further stipulation that
>  R owl:hasIRI s .
> iff the string representation of resource R is s.
> 
> It is now possible to provide semantics to <iriset/> by deriving the
> XML datatype that only includes the strings specified by
> pattern p [1]. So now:
> 
> <includepattern>
>   <xsd:pattern value="p1"/>
> </includepattern>
> 
> <excludepattern>
>   <xsd:pattern value="p2"/>
> </excludepattern>
> 
> specify these classes of resources:
> 
> <xsd:simpleType name="iritype_1">
>   <xsd:restriction base="string">
>     <xsd:pattern value="p1" />
>   </xsd:restriction>
> </xsd:simpleType>
> 
> <xsd:simpleType name="iritype_2">
>   <xsd:restriction base="string">
>     <xsd:pattern value="p2" />
>   </xsd:restriction>
> </xsd:simpleType>
> 
> <owl:Class>
>   <owl:Restriction>
>     <owl:onProperty rdf:resource="owl:hasIRI"/>
>     <owl:hasValue rdf:datatype="&xsd;iritype_1" />
>   </owl:Restriction>
> </owl:Class>
> 
> <owl:Class>
>   <owl:complementOf>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="owl:hasIRI"/>
>       <owl:hasValue rdf:datatype="&xsd;iritype_2" />
>     </owl:Restriction>
>   </owl:complementOf>
> </owl:Class>
> 
> which means: "Here are the definitions of xsd:iritype_1, xsd:iritype_2,
> sub-types of xsd:string. I don't know the exact value to put in
> hasValue, but it must be of type xsd:iritype_1, xsd:iritype_2."
> 
> Port ranges are treated similarly, by defining the relevant hasPort
> property, ranging over appropriate XML type. The xsd:pattern
> restriction is not useful here, but xsd:integer supports
> xsd:maxInclusive, xsd:minInclusive numerical restrictions. So:
> 
> <includeports>80 8080-8100</includeports>
> 
> means:
> 
> <xs:simpleType name="iritype_3">
>   <xsd:restriction base="integer">
>     <xsd:minInclusive value="80" />
>     <xsd:maxInclusive value="80" />
>   </xsd:restriction>
> </xs:simpleType>
> 
> <xs:simpleType name="iritype_4">
>   <xsd:restriction base="integer">
>     <xsd:minInclusive value="8080" />
>     <xsd:maxInclusive value="8100" />
>   </xsd:restriction>
> </xs:simpleType>
> 
> <owl:Class>
>   <owl:unionOf>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="owl:hasPort"/>
>       <owl:hasValue rdf:datatype="&xsd;iritype_3" />
>     </owl:Restriction>
>     <owl:Restriction>
>       <owl:onProperty rdf:resource="owl:hasPort"/>
>       <owl:hasValue rdf:datatype="&xsd;iritype_4" />
>     </owl:Restriction>
>   </owl:unionOf>
> </owl:Class>
> 
> CIDR ranges are trickier, as they require bit-wise calculations.
> Assume a hasIP property, as before, ranging over a complex
> XML type [2] of 4 bytes.
> 
> <includeCIDRranges>x.y.z.w/r</includeCIDRranges>
> 
> <xs:complexType name="iritype_5">
>   <xs:sequence>
>     <xs:element>
>       <xsd:enumeration base="byte">x</xsd:enumeration>
>     </xs:element>
>     <xs:element>
>       <xsd:enumeration base="byte">y</xsd:enumeration>
>     </xs:element>
>     <xs:element>
>       <xsd:enumeration base="byte">z</xsd:enumeration>
>     </xs:element>
>     <xs:element>
>       <xsd:restriction base="byte">
>         HARD, TO BE WORKED OUT.
>         OTHERWISE JUST ENUMERATE (OUCH!).
>       </xsd:restriction>
>     </xs:element>
>   </xs:sequence>
> </xs:complexType>
> 
> <owl:Class>
>   <owl:Restriction>
>     <owl:onProperty rdf:resource="owl:hasIP"/>
>     <owl:hasValue rdf:datatype="&xsd;iritype_5" />
>   </owl:Restriction>
> </owl:Class>
> 
> If no /r is given, the class D segment of the IP is simply given as
> a singleton enumeration, just like for classes A, B, and C.
> 
> OWL needs to be extended to allow user-defined types, which it
> currently does not [3].
> 
> 
> ALTERNATIVE 2
> 
> Providing OWL/RDF semantics for <iriset/> elements is not directly
> possible, since RDF does not provide any means for accessing or
> manipulating the string representation of an IRI. We extend OWL/RDF
> with a hasIRIFrom restriction as follows:
> 
> We assert the existence of the class of the various IRI classes:
>   rdf:IRIClass rdf:type rdfs:Datatype .
> 
> We assert the existence of a new class of restriction nodes:
>   owl:hasIRIFrom rdf:type rdfs:Class .
> 
> The members of this class are OWL restrictions, with
> the following abstract OWL syntax:
>   restriction(ID, hasIRIFrom(xs:iritype))
> where ID is a node ID and xs:iritype is the ID of a user-defined
> type, as above.
> 
> If T() is the mapping from node IDs to nodes,
> the semantics of such a restriction is that the datatype is also an
> rdfs:Class, with the constraint that resources in this class have an
> IRI the string representation of which is in the scope of xs:iritype.
> It is then straightforward to provide the semantics of the restriction:
> 
>   T(xs:iritype) rdf:type rdfs:Datatype .
>   T(xs:iritype) rdfs:subClassOf rdf:IRIClass .
>   T(xs:iritype) rdf:type rdfs:Class .
>   _:x rdf:type owl:Restriction .
>   _:x rdf:type owl:Class .
>   _:x rdf:type T(xs:iritype) .
> 
> We can now say:
> 
> <owl:Class>
>   <owl:Restriction>
>     <owl:hasIRIFrom>
>       <xsd:simpleType>
>         <xsd:restriction base="string">
>           <xsd:pattern value="p" />
>         </xsd:restriction>
>       </xsd:simpleType>
>     </owl:hasIRIFrom>
>   </owl:Restriction>
> </owl:Class>
> 
> to mean "the class of all things that have an IRI that has a string
> representation that matches 'p'".
> 
> In Description Logic terms, we have allowed defining concepts based on
> restrictions on the form of the string representations of abstract
> instances, but restricted the usage of such concepts in universal
> quantification constructs.
> 
> 
> COMPARISON
> 
> I will have to look into this more closely, but my first impression is
> that ALT 2 provides the necessary expressivity to enable resource
> grouping, but restricts the extension so that it does not allow any
> other kind of reference to IRI strings. The logic remains agnostic as
> to the internal representation of resources, except for their appearing
> as members of various IRI Classes for no (logically) apparent reason.
> 
> ALT 1, on the other hand, creates a hasIRI property which it then
> exposes to the concrete domain of the logic, permitting the full
> expressivity of the logic to operate on it.
> 
> 
> IRISet Extensions
> =================
> 
> In Sect "IRISet Semantics" above, a vocabulary of 6 tags was specified for
> defining sets of resources through their IRIs. Except for the
> numerical port and IP restrictions over URLs, the only operation
> supported over generic IRIs is regular expression matching.
> 
> Creators of POWDER documents may extend the vocabulary used in
> specifying IRI Sets, by defining new <iriset/> elements. All such
> extensions to the POWDER vocabulary MUST be defined by means of GRDDL
> transformations [GRDDL] to terms of the basic POWDER vocabulary in the
> wdr: namespace.
> 
> Extensions do not need to, but are well advised to, define pairs
> of complementary vocabulary items (includeX and excludeX) for the
> reasons explained above.
> 
> Developers of POWDER tools MAY directly implement extensions they know
> about, but MUST include a mechanism for retrieving and applying the
> GRDDL transformations to extensions they do not know about.
> 
> 
> The URLSet Extension
> ====================
> 
> POWDER's basic use cases involve information resources available on
> the Web, identified by URLs containing host names, directory paths, IP
> addresses, port numbers, and so on. POWDER-WG provides the URLSet
> extension to IRISet, by defining the following vocabulary items under
> the wdrurl namespace:
> 
> <wdrurl:includeschemes/>        <wdrurl:excludeschemes/>
> <wdrurl:includehosts/>          <wdrurl:excludehosts/>
> <wdrurl:includeexactpaths/>     <wdrurl:excludeexactpaths/>
> <wdrurl:includepathcontains/>   <wdrurl:excludepathcontains/>
> <wdrurl:includepathstartswith/> <wdrurl:excludepathstartswith/>
> <wdrurl:includepathendswith/>   <wdrurl:excludepathendswith/>
> <wdrurl:includequerycontains/>  <wdrurl:excludequerycontains/>
> <wdrurl:includeexactqueries/>   <wdrurl:excludeexactqueries/>
> 
> pathcontains and querycontains may appear any number of times within
> an IRI set definition, but the rest may appear up to once.
> 
> These receive semantics in terms of the POWDER IRISet vocabulary as
> follows:
> 
> <wdrurl:includeschemes>sch1 sch2</wdrurl:includeschemes>
> 
> means:
> 
> <includepattern>
>     <xsd:pattern value="^((sch1)|(sch2))://" />
> </includepattern>
> 
> And
> 
> <wdrurl:includehosts>host1 host2</wdrurl:includehosts>
> 
> means:
> 
> <includepattern>
>   <xsd:pattern value="^[^:]+://([\w\.]+\.)?((host1)|(host2))[:\?/]" />
> </includepattern>
> 
> And so on. So that the URL Set:
> 
> <iriset>
>   <wdrurl:includeschemes>http</wdrurl:includeschemes>
>   <wdrurl:includehosts>example.org example.net</wdrurl:includehosts>
>   <wdrurl:includequerycontains>s=football</wdrurl:includequerycontains>
>   <wdrurl:includequerycontains>c=gr</wdrurl:includequerycontains>
>   <wdrurl:includequerycontains>l=first</wdrurl:includequerycontains>
> </iriset>
> 
> translates into this much more verbose, vanilla POWDER/XML IRI Set:
> 
> <iriset>
>   <includepattern>
>     <xsd:pattern value="^http://" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^:]+://([\w\.]+\.)?((example\.org)|(example\.net))[:\?/]" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^?]+\?(.*&)?s=football[&$]" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^?]+\?(.*&)?c=gr[&$]" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^?]+\?(.*&)?l=first[&$]" />
>   </includepattern>
> </iriset>
> 
> 
> The WAF Extension
> =================
> 
> Q to group: does POWDER also need to provide this transformation?
> Or have the WAF people already written it?
> 
> The Enabling Read Access for Web Resources WG has defined a Unix
> shell-like wildcard mechanism.
> 
> <waf:includeiripattern>*.example.org</waf:includeiripattern>
> 
> <wdr:includepattern>
>     <xsd:pattern value="http://.*\.example\.org(/.*)?" />
> </wdr:includepattern>
> 
> 
> Multiple Layers of Extensions
> =============================
> 
> It might sometimes be useful to also build upon already defined
> extensions. For example, some content providers serve dynamic content
> stored in a database, so that IRIs express queries to the database.
> IRIs of this kind have a certain structure, but this structure is
> neither obvious nor easily human-interpreted. Furthermore, conventional
> grouping mechanisms cannot be used to group resources, as the site
> structure does not match any directory hierarchy.
> 
> As an example, consider sport.example.com, a sports news site,
> where IRIs look like the one shown in Example 3-2-1. The adopted
> scheme is systematic so that sport=2&countryID=16 provides a front
> page with news about Greek basketball and links to various Greek
> basketball leagues, sport=3&countryID=16 a front page about Greek
> volleyball, etc. For example:
>   http://sport.example.com/matches.asp?sport=1&countryID=16&champID=2
> 
> A POWDER document providing metadata about this web site would have to
> use regular expression matching with explicit reference to the
> numerical values in the country and sport fields of the query. This
> process is error-prone, and requires extensive changes if the
> underlying database schema is modified or extended.
> 
> As an alternative, the site developer may provide a POWDER vocabulary
> extension that abstracts away from the database schema to allow
> reference to sports and countries. POWDER document authors can then
> use the properties in this extension to create POWDER documents that
> remain valid even if the site schema is modified, as long as the site
> developer updates the relevant transformations.
> 
> So a POWDER/XML document might look like this:
> 
> <wdr:iriset>
>   <wdrurl:includeschemes>http</wdrurl:includeschemes>
>   <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
>   <sport:countries>Greece</sport:countries>
>   <sport:sports>Football Basketball</sport:sports>
> </wdr:iriset>
> 
> A POWDER/XML tool specifically built for sport.example.com, or for
> another site following the same query patterns, will immediately know
> how to handle this information. Other POWDER tools will apply the GRDDL
> transform associated with the sport: namespace to get the following
> translation:
> 
> <wdr:iriset>
>   <wdrurl:includeschemes>http</wdrurl:includeschemes>
>   <wdrurl:includehosts>sport.example.com</wdrurl:includehosts>
>   <wdrurl:includequerycontains>countryID=16</wdrurl:includequerycontains>
>   <wdrurl:includequerycontains>sport=1 sport=2</wdrurl:includequerycontains>
> </wdr:iriset>
> 
> A web-oriented POWDER/XML tool will immediately know what to do with 
> wdrurl: vocabulary items. Other POWDER tools will apply the GRDDL transform
> associated with the wdrurl: namespace to get the following translation:
> 
> <iriset>
>   <includepattern>
>     <xsd:pattern value="^http://" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^:]+://([\w\.]+\.)?(sport\.example\.com)[:\?/]" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^?]+\?(.*&)?countryID=16(&|$)" />
>   </includepattern>
>   <includepattern>
>     <xsd:pattern value="^[^?]+\?(.*&)?((sport=1)|(sport=2))(&|$)" />
>   </includepattern>
> </iriset>
> 
> Finally, an even more generic RDF/OWL tool will apply the transform
> associated with the wdr: namespace to get the even more verbose
> RDF/OWL translation, as described above.
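
The two layers can be sketched as a pair of small functions, one per vocabulary. This is only an illustration in Python: the lookup tables are hypothetical stand-ins for the site's database schema (the numeric values are the ones used in the example above), and the function names are invented.

```python
import re

# Hypothetical lookup tables maintained by the site developer.
COUNTRY_IDS = {"Greece": 16}
SPORT_IDS = {"Football": 1, "Basketball": 2, "Volleyball": 3}

def sport_to_wdrurl(countries, sports):
    """First layer: sport: terms -> wdrurl:includequerycontains values."""
    return [
        " ".join(f"countryID={COUNTRY_IDS[c]}" for c in countries.split()),
        " ".join(f"sport={SPORT_IDS[s]}" for s in sports.split()),
    ]

def wdrurl_to_pattern(value):
    """Second layer: one includequerycontains value -> one regular expression."""
    alternatives = "|".join(re.escape(pair) for pair in value.split())
    # Group the alternatives so the prefix and suffix apply to each of them.
    return r"^[^?]+\?(.*&)?(" + alternatives + r")(&|$)"

def in_iri_set(iri, countries, sports):
    values = sport_to_wdrurl(countries, sports)
    return all(re.search(wdrurl_to_pattern(v), iri) for v in values)
```

If the site renumbers its sports, only the lookup tables change. A document that says "Football Basketball" stays as it is.
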
> 
> 
> Non-URL Identifiers
> ===================
> 
> Although POWDER is mostly concerned with resources that are identified
> by URLs, there are a number of other use cases; for example, one might
> use POWDER to provide meta-data about physical, off-line resources
> like books or DVDs.
> 
> The International Standard Audiovisual Number [ISAN1] is a voluntary
> numbering system for the identification of audiovisual works.
> Following ISO 15706, the numbers are written as 24 hexadecimal
> digits in the following format [ISAN2]:
> 
>       -----root-----    episode     -version-
> ISAN  1881-66C7-3420  -  0000  -7-  9F3A-0245  -U
> 
> The root of an ISAN number is assigned to a core work with the other
> numbers being used for things like episodes, different language
> versions, promotional trailers and so on.
> 
> Since ISAN numbers are URNs [URN], and hence IRIs of the urn: scheme
> [URIS], a vocabulary can readily be defined to allow IRI Sets to be
> defined based on ISAN numbers. The terms might be along the lines of:
> 
> includeroots: the value of which would be a white space separated list
> of hexadecimal digits and hyphens that would be matched against the
> first three blocks in the ISAN number.
> 
> includeepisodes: a white space separated list of hexadecimal digits
> and hyphens that would be matched against the 4th block of 4 digits in
> the ISAN number.
> 
> includeversions: a white space separated list of hexadecimal digits
> and hyphens that would be matched against the 5th and 6th blocks of 4
> digits in the ISAN number.
> 
> The set of all audio visual resources that relate to two particular
> works might then be so defined:
> 
> Custom ISAN pattern:
> 
> <wdr:iriset>
>   <isan:includeroots>1881-66C7-3420 1881-66C7-3421</isan:includeroots>
> </wdr:iriset>
> 
> Corresponding vanilla POWDER/XML:
> 
> <iriset>
>   <includepattern>
>     <xsd:pattern value="^urn:isan:((1881-66C7-3420)|(1881-66C7-3421))" />
>   </includepattern>
> </iriset>
> 
> This example demonstrates one major extensibility limitation of the
> approach described here: numerical constraints (like, here, defining
> numerical ranges for, say, the 3rd block) cannot be defined using wdr:
> primitives. As the reader might also have noticed, port and IP ranges
> (although specific to URLs) were hard-coded at the IRI level and not
> defined as wdrurl: extensions. This is because XML types do not
> provide a mechanism for using regexps to extract character groups from
> strings, and then apply further numerical or other tests on the
> extracted groups; a string either matches a regexp or does not, and
> that is all.
> 
> One interesting approach would be to license use of XSLT 2 [XSLT2] in
> the extension definitions, which provides for using regexps to extract
> character groups. To be investigated.
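
To make the missing capability concrete, here is a small Python sketch of "extract a group, then test it numerically", which a plain pattern match cannot express. It is not XSLT 2 and not a proposal for the syntax. It assumes the urn:isan: form used in the example above, and the function name is hypothetical.

```python
import re

# Capture the three 4-digit blocks that make up the ISAN root.
ISAN_ROOT = re.compile(r"^urn:isan:([0-9A-F]{4})-([0-9A-F]{4})-([0-9A-F]{4})")

def third_block_in_range(iri, low, high):
    """True if the 3rd block of the root, read as a hex number, is in [low, high]."""
    m = ISAN_ROOT.match(iri)
    if m is None:
        return False  # not an ISAN URN at all
    return low <= int(m.group(3), 16) <= high
```

For instance, `third_block_in_range("urn:isan:1881-66C7-3420-0000-7-9F3A-0245-U", 0x3400, 0x34FF)` holds. Covering the same range with a pure regexp would mean spelling out the admissible digit combinations by hand.
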
> 
> 
> Resource Sets
> =============
> 
> One of the original desiderata of the group, later abandoned, was the
> ability to group resources by property as well as by name. This is a
> considerable expressivity leap for the POWDER/XML language.
> 
> This idea was abandoned at the Athens F2F, when it became obvious
> that the POWDER grouping mechanism should not refer to the resources
> themselves, but to the string representations of their IRIs. Since it
> is the resources that have properties like being blue and not the
> IRIs, the whole idea of grouping by property collapsed.
> 
> If it is important enough for POWDER, some limited expressivity
> might be re-introduced in the form of a parallel grouping mechanism,
> by intersecting the results of the two mechanisms before finally
> applying the descriptors. In other words:
> 
> <dr>
>  <iriset>
>    <wdrurl:includehosts>example.com</wdrurl:includehosts>
>  </iriset>
>  <resourceset>
>    <voc:colour ref="http://rgb.org/colours.rdf#blue"/>
>  </resourceset>
>  <descriptorset>
>    <voc:shape>square</voc:shape>
>  </descriptorset>
> </dr>
> 
> might be used to express that "on example.com, all blue resources are
> also square". A resource has to both be on example.com AND be blue in
> order to also be described as square.
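
The intersection can be pictured with a toy Python sketch. It assumes the properties of each resource are already known and held in a dictionary, which is the step an OWL reasoner would otherwise have to perform. All names and data are invented for illustration.

```python
import re

def describe(candidates, iri_pattern, required, descriptors):
    """Attach descriptors to resources that are in the IRI set AND in the resource set."""
    result = {}
    for iri, properties in candidates.items():
        in_iri_set = re.search(iri_pattern, iri) is not None
        in_resource_set = all(properties.get(k) == v for k, v in required.items())
        if in_iri_set and in_resource_set:
            result[iri] = dict(descriptors)
    return result

# Hypothetical, already-known properties of three resources.
known = {
    "http://example.com/a": {"colour": "blue"},
    "http://example.com/b": {"colour": "red"},
    "http://example.org/c": {"colour": "blue"},
}

squares = describe(
    known,
    r"^http://([\w.]+\.)?example\.com[:?/]",
    {"colour": "blue"},
    {"shape": "square"},
)
```

Only http://example.com/a ends up described as square: /b is on the right host but red, and /c is blue but on another host.
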
> 
> This can be very naturally expressed in OWL, and OWL tools will be
> able to figure out which resources are blue, but it might be a
> considerable strain on POWDER/XML tools which will care more about
> efficiency than reasoning completeness. Furthermore, this opens a hole
> through which circular definitions can creep, and loop detection will
> also be a considerable strain on POWDER/XML implementations. My
> suggestion is to drop it for the sake of efficiency or, at most, leave
> an extension door open for logical statements that fall through to the
> underlying POWDER-S; just in case one really needs to express such a
> thing in POWDER/XML instead of OWL.
> 
> 
> 
> REFERENCES
> ==========
> 
> [1] http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#rf-pattern
> [2] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#Complex_Type_Definitions
> [3] http://www.w3.org/TR/owl-semantics/syntax.html#2.1
> [4] http://www.w3.org/TR/owl-semantics/mapping.html
> [GRDDL] http://www.w3.org/TR/grddl/
> [URN] http://www.iana.org/assignments/urn-namespaces
> [ISAN1] http://www.isan.org/
> [ISAN2] http://www.isan.org/portal/page?_pageid=166,41960&_dad=portal&_schema=PORTAL
> [XSLT2] http://www.w3.org/TR/xslt20/
> 
> 

Received on Monday, 7 April 2008 19:34:35 UTC