extracting statements from XML (again)

I've been thinking some more about adorning schemas to enable the
extraction of statements[1]. To illustrate what I mean, imagine an
XML instance (or a fragment thereof):

  <widget name="thing">
    <stock-properties store="down-the-road">
      <price unit="dollars">1.95</price>
      <stock>5</stock>
    </stock-properties>
  </widget>

This fragment has a corresponding XML Schema fragment, which might
look something like this:

  <element name="widget">
    <complexType>
      <sequence>
        <element name="stock-properties">
          <complexType>
            <sequence>
              <element name="price">
                <complexType>
                  <simpleContent>
                    <extension base="decimal">
                      <attribute name="unit" type="string"/>
                    </extension>
                  </simpleContent>
                </complexType>
              </element>
              <element name="stock" type="nonNegativeInteger"/>
            </sequence>
            <attribute name="store" type="string"/>
          </complexType>
        </element>
      </sequence>
      <attribute name="name" type="string"/>
    </complexType>
  </element>

When an implementation runs across this, we'd like to extract
statements along these lines:

  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix :  <http://example.com/inventory#> .

  <> a :Inventory ; 
     :location [ a :Store; :place "down-the-road" ] ; 
     :items [ a rdf:Bag ;    
              rdf:_1 [ a :Widget ;    
                      :name "thing" ;
                      :price [ :amount "1.95"; :unit "dollars" ] ;
                      :stock "5"
                     ]
            ] .
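For concreteness, here's a minimal hand-coded Python sketch (standard library only) that pulls exactly these statements out of the instance fragment. The mapping is entirely hard-wired, which is precisely what adorning the schema is meant to avoid. The `extract_inventory` name, the blank-node labels and the plain-tuple representation of triples are my own assumptions, not any existing API.

```python
import xml.etree.ElementTree as ET

INV = "http://example.com/inventory#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The instance fragment from above.
INSTANCE = """
<widget name="thing">
  <stock-properties store="down-the-road">
    <price unit="dollars">1.95</price>
    <stock>5</stock>
  </stock-properties>
</widget>
"""


def extract_inventory(xml_text):
    """Hand-coded extraction of the statements above (hypothetical helper).

    Triples are (subject, predicate, object) tuples of plain strings:
    "<>" is the current document, "_:" labels are blank nodes, and no
    distinction is drawn between literals and resources.
    """
    widget = ET.fromstring(xml_text.strip())
    props = widget.find("stock-properties")
    price = props.find("price")
    return [
        ("<>", RDF + "type", INV + "Inventory"),
        ("<>", INV + "location", "_:store"),
        ("_:store", RDF + "type", INV + "Store"),
        ("_:store", INV + "place", props.get("store")),
        ("<>", INV + "items", "_:bag"),
        ("_:bag", RDF + "type", RDF + "Bag"),
        ("_:bag", RDF + "_1", "_:w1"),
        ("_:w1", RDF + "type", INV + "Widget"),
        ("_:w1", INV + "name", widget.get("name")),
        ("_:w1", INV + "price", "_:p1"),
        ("_:p1", INV + "amount", price.text),
        ("_:p1", INV + "unit", price.get("unit")),
        ("_:w1", INV + "stock", props.findtext("stock")),
    ]


for triple in extract_inventory(INSTANCE):
    print(triple)
```

It only copes with this one widget (the `rdf:_1` membership is fixed), but it makes plain which values come from where in the instance.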

There are a few ways to realise this. Arguably the most efficient
would be to adorn the Schema directly; e.g.,

  <element name="stock" type="nonNegativeInteger"
    extract:subject="ancestor::widget"
    extract:object="text()"
  />

and so forth.

Here, the predicate is implicitly 'stock' (the name of the element
being declared), and the object is the result of evaluating the
expression in the context of each matching instance element. We'd
need to come up with some syntax to refer to different constructs in
the RDF (so that, as in this example, we could point the subject at
the proper place). This is just a shot in the dark, and there are
other ways it could be done; the idea is simply to exploit the
structure of the schema to nominate the statement components.
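To show how a processor might honour such adornments, here's a rough Python sketch. It assumes the directives have already been read out of the schema into a dictionary (`ADORNMENTS`, hypothetical), and it stands in for real XPath with two simplifications: the subject is "the nearest enclosing element with this name", and the object is either the element's text or one of its attributes. As described above, the predicate is implicitly the element's name. Note that `price` is flattened to its decimal value here rather than given the structured value from the N3 above.

```python
import xml.etree.ElementTree as ET

INV = "http://example.com/inventory#"

INSTANCE = """
<widget name="thing">
  <stock-properties store="down-the-road">
    <price unit="dollars">1.95</price>
    <stock>5</stock>
  </stock-properties>
</widget>
"""

# Hypothetical adornments, as if already read from extract:* attributes on the
# schema's <element> declarations, keyed by the declared element's name.
#   "subject": tag of the nearest enclosing element that is the statement's
#              subject (a stand-in for an XPath such as ancestor::widget).
#   "object":  "text" for the element's content, or "@name" for an attribute.
ADORNMENTS = {
    "price": {"subject": "widget", "object": "text"},
    "stock": {"subject": "widget", "object": "text"},
}


def extract(root, adornments, base=INV):
    # ElementTree nodes don't record their parent, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    labels = {}  # subject element -> blank-node label
    triples = []
    for elem in root.iter():
        rule = adornments.get(elem.tag)
        if rule is None:
            continue
        # Walk up the ancestors until the nominated subject element is found.
        subj = parents.get(elem)
        while subj is not None and subj.tag != rule["subject"]:
            subj = parents.get(subj)
        if subj is None:
            continue  # no such ancestor, so this directive doesn't apply here
        label = labels.setdefault(subj, "_:n%d" % len(labels))
        expr = rule["object"]
        if expr.startswith("@"):
            obj = elem.get(expr[1:])
        else:
            obj = (elem.text or "").strip()
        # The predicate is implicitly the declared element's own name.
        triples.append((label, base + elem.tag, obj))
    return triples


print(extract(ET.fromstring(INSTANCE.strip()), ADORNMENTS))
```

Run against the instance, this yields two statements about the same blank node (the widget): one for `price` with "1.95" and one for `stock` with "5".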

Unfortunately, I'm led to believe that such adornment is illegal in
Schema. Instead, something like SAF[2] could be used to attach the
directives for extracting statements; if the directives are placed in
the schema's appinfo, some advantage can still be taken of its
structure. I can see two obvious ways to express them.

First, one could use XML-escaped N3 with some XML tags interspersed;
for example,

  :location [ a :Store;
              :place "<xsl:value-of select="stock-properties/@store"/>" ] ;
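A processor for this template form could be quite small. The following Python sketch (hypothetical `fill_template` and `select` helpers) finds each `<xsl:value-of select="..."/>` and substitutes the string it selects from the instance, escaping it for an N3 literal. It only understands child steps and a trailing `@attribute`; a real implementation would more likely hand the template to an XSLT processor.

```python
import re
import xml.etree.ElementTree as ET

INSTANCE = """
<widget name="thing">
  <stock-properties store="down-the-road">
    <price unit="dollars">1.95</price>
    <stock>5</stock>
  </stock-properties>
</widget>
"""

# An N3 template with an embedded xsl:value-of, as in the example above.
TEMPLATE = """:location [ a :Store;
            :place "<xsl:value-of select="stock-properties/@store"/>" ] ;"""

VALUE_OF = re.compile(r'<xsl:value-of\s+select="([^"]*)"\s*/>')


def select(context, path):
    """Evaluate a tiny XPath subset relative to context: 'a/b', 'a/@attr', '@attr'."""
    steps, _, attr = path.partition("@")
    steps = steps.rstrip("/")
    node = context.find(steps) if steps else context
    if node is None:
        return ""
    return node.get(attr, "") if attr else (node.text or "").strip()


def fill_template(template, context):
    """Replace each xsl:value-of with the selected string, escaped for an N3 literal."""
    def substitute(match):
        value = select(context, match.group(1))
        return value.replace("\\", "\\\\").replace('"', '\\"')
    return VALUE_OF.sub(substitute, template)


widget = ET.fromstring(INSTANCE.strip())
print(fill_template(TEMPLATE, widget))
```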

Alternatively, one could spell the triples out explicitly, something like:

  <extract:statement>
    <extract:subject>self</extract:subject>
    <extract:predicate>
      http://example.com/inventory#place
    </extract:predicate>
    <extract:object type="literal">stock-properties/@store</extract:object>
  </extract:statement>

This would, of course, be more verbose in most cases.
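The spelled-out form is easy to interpret as well. This Python sketch assumes a namespace URI for the `extract:` prefix and a wrapping `extract:rules` element (both hypothetical), takes `self` to mean the context element, and treats `type="literal"` as a property of the object, since predicates in RDF are always URIs. It uses `http://example.com/inventory#place` as the predicate, matching the `:place` property in the N3 above.

```python
import xml.etree.ElementTree as ET

EX = "http://example.com/extract"  # hypothetical namespace URI for the extract: prefix
NS = {"extract": EX}

INSTANCE = """
<widget name="thing">
  <stock-properties store="down-the-road">
    <price unit="dollars">1.95</price>
    <stock>5</stock>
  </stock-properties>
</widget>
"""

# The spelled-out form, wrapped in a hypothetical extract:rules root element.
RULES = f"""
<extract:rules xmlns:extract="{EX}">
  <extract:statement>
    <extract:subject>self</extract:subject>
    <extract:predicate>
      http://example.com/inventory#place
    </extract:predicate>
    <extract:object type="literal">stock-properties/@store</extract:object>
  </extract:statement>
</extract:rules>
"""


def select(context, path):
    """Tiny XPath subset relative to context: 'a/b', 'a/@attr', '@attr'."""
    steps, _, attr = path.partition("@")
    steps = steps.rstrip("/")
    node = context.find(steps) if steps else context
    if node is None:
        return ""
    return node.get(attr, "") if attr else (node.text or "").strip()


def run_statements(rules, context, context_label="_:self"):
    """Evaluate each extract:statement against context; returns (s, p, o) tuples."""
    triples = []
    for st in rules.iterfind("extract:statement", NS):
        obj_el = st.find("extract:object", NS)
        if obj_el is None:
            continue  # malformed statement: no object to evaluate
        subj_expr = st.findtext("extract:subject", "", NS).strip()
        pred = st.findtext("extract:predicate", "", NS).strip()
        obj = select(context, (obj_el.text or "").strip())
        if obj_el.get("type") == "literal":
            obj = '"%s"' % obj  # mark literals N3-style (no escaping in this sketch)
        # "self" names the context element; any other subject expression is
        # treated as a path whose selected value becomes the subject.
        subj = context_label if subj_expr == "self" else select(context, subj_expr)
        triples.append((subj, pred, obj))
    return triples


rules = ET.fromstring(RULES.strip())
widget = ET.fromstring(INSTANCE.strip())
print(run_statements(rules, widget))
```

Compared with the template form, this keeps every directive machine-readable as XML, at the cost of the verbosity mentioned above.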

This is all very similar to blindfold grammars[3]; it's just a matter
of syntax, I suppose. If it's accepted that XML Schema is an
interesting and useful place to put this facility, something other
than BNF (for all its charms ;) might be good.

Thoughts? I'm not sure what this looks like from a purist Schema
perspective (or a purist Semantic Web perspective, for that matter),
but I find the possibility of gathering statements from any XML in
this way intriguing and potentially very useful.


1. http://lists.w3.org/Archives/Public/www-rdf-interest/2001Sep/0060.html
2. http://www.extensibility.com/resources/saf.htm
3. http://www.w3.org/2001/06/blindfold/grammar

-- 
Mark Nottingham
http://www.mnot.net/
 

Received on Wednesday, 3 October 2001 02:54:24 UTC