Re: extracting statements from XML (again)

I've gotten some encouragement on this from a few people (thanks!).

I think it's possible to get most of the benefit of extracting RDF
statements from XML with a fairly simple syntax, and have written up
a straw-man proposal:

'ex:statement' is an element valid as a child of any appinfo
element. It allows the indirect association of one or more RDF
statements with the occurrence of an XML construct in an instance
document.

The following attributes are valid on ex:statement, xsd:element and
xsd:attribute:
  'ex:subject'   (value is an IDREF or QName)
  'ex:predicate' (value is a QName)
  'ex:object'    (value is an IDREF or QName)
When the value is an IDREF (such as "#loc" in the example below), it
points to the ex:statement carrying the matching ex:id attribute in
the schema document, and stands for that statement's object. Together,
these three attributes allow an RDF statement to be associated with
XML Schema constructs, either directly on the <element>s or
<attribute>s, or in their associated <appinfo> sections.
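As a rough illustration (a sketch, not part of the proposal itself), the attribute triple and the IDREF lookup could be read from a schema with Python's xml.etree.ElementTree. The helper names statement_template and resolve_idref are hypothetical, the namespace URI is the one used in the example schema below, and QNames are deliberately left unexpanded:

```python
import xml.etree.ElementTree as ET

# Namespace of the ex: attributes, as used in the example schema.
EX = "{http://mnot.net/schema-rdf-extension/0.1}"


def statement_template(node):
    """Return the raw (subject, predicate, object) values carried by an
    xsd:element, xsd:attribute or ex:statement node, or None if the node
    does not carry all three. QNames are returned unexpanded."""
    triple = tuple(node.get(EX + name)
                   for name in ("subject", "predicate", "object"))
    return None if None in triple else triple


def resolve_idref(value, schema_root):
    """Return the schema node whose ex:id matches an IDREF such as "#loc",
    or None if the value is a QName rather than an IDREF."""
    if not value.startswith("#"):
        return None
    for node in schema_root.iter():
        if node.get(EX + "id") == value[1:]:
            return node
    raise KeyError("no ex:id matches " + value)
```

A real processor would also need the in-scope namespace declarations to expand QNames like clothes:shoe into full URIs; that step is omitted here.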

When a Schema processor validates an instance document, it uses these
attributes to extract a statement each time the containing element or
attribute is encountered.
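The "one statement per occurrence" behaviour can be sketched as a walk over the instance document. This is only an approximation under stated assumptions: declarations are matched by local name ("shoe" for an element, "@name" for an attribute) instead of through real schema-validity assessment, and templates are emitted as-is without resolving special values or IDREFs. The function name extract is hypothetical:

```python
import xml.etree.ElementTree as ET


def extract(instance_xml, templates):
    """Collect a statement template each time a declared element or
    attribute occurs in the instance document.

    Assumptions of this sketch: element declarations are keyed by local
    name ("shoe") and attribute declarations by "@" + name ("@size"),
    rather than being found by schema validation."""
    triples = []
    for elem in ET.fromstring(instance_xml).iter():
        if elem.tag in templates:
            triples.append(templates[elem.tag])
        for attr in elem.attrib:
            key = "@" + attr
            if key in templates:
                triples.append(templates[key])
    return triples
```

With the shoe declaration from the example, an instance containing two <shoe> elements would yield the same statement twice, once per occurrence.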

For example, an adorned XML schema might contain:
  <element name="shoe"
           ex:subject="clothes:shoe"
           ex:predicate="clothes:colour"
           ex:object="colors:blue"
  />

Alternatively, a complex element's schema might contain an appinfo
section with:
  <ex:statement ex:subject="clothes:shoe"
                ex:predicate="clothes:colour"
                ex:object="colors:blue"
  />
  
The following special values are available to these attributes:
  'ex:document' refers to the URI of the instance document being
     processed.
  'ex:value' refers to the value of the element or attribute. It is
     only available when these attributes are placed directly on
     'element' or 'attribute' schema elements.
  'ex:anonymous' instantiates a new anonymous object. It cannot be
     used as a value for ex:predicate.
  'ex:newitem', used as a value for ex:predicate, adds the object as
     the next member of the subject bag (rdf:_1, rdf:_2, and so on).
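One possible way to resolve these special values is sketched below. The class name SpecialValues, the blank-node labels (_:b0, _:b1, ...) and the per-bag member counter are all assumptions made for illustration; the proposal only says what each value means, not how a processor should represent it:

```python
import itertools


class SpecialValues:
    """Hypothetical resolver for ex:document, ex:value, ex:anonymous and
    ex:newitem. Blank-node labels and the per-bag member counter are
    choices made for this sketch only."""

    def __init__(self, document_uri):
        self.document_uri = document_uri
        self._blank_ids = itertools.count()
        self._bag_sizes = {}

    def node(self, value, node_value=None):
        """Resolve a subject or object value; node_value is the text of
        the element or attribute being processed, if any."""
        if value == "ex:document":
            return "<%s>" % self.document_uri
        if value == "ex:value":
            if node_value is None:
                raise ValueError("ex:value needs an element or attribute value")
            return '"%s"' % node_value  # no literal escaping in this sketch
        if value == "ex:anonymous":
            return "_:b%d" % next(self._blank_ids)
        return value  # an ordinary QName, left unexpanded

    def predicate(self, value, subject):
        """Resolve a predicate value for an already resolved subject."""
        if value == "ex:anonymous":
            raise ValueError("ex:anonymous cannot be used as a predicate")
        if value == "ex:newitem":
            # Each new member of the bag gets the next rdf:_n property.
            size = self._bag_sizes.get(subject, 0) + 1
            self._bag_sizes[subject] = size
            return "rdf:_%d" % size
        return value
```

Keeping the counter per bag means two different bags in the same document each start at rdf:_1.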

This allows more useful statements to be extracted. For example,
  <element name="shoe"
           ex:subject="clothes:shoe"
           ex:predicate="clothes:manufacturer"
           ex:object="ex:value"
  />

When a processor using this declaration evaluates the instance
fragment:
  <shoe>Nike</shoe>
it produces the statement:
  clothes:shoe clothes:manufacturer "Nike" .
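The substitution in this example is small enough to show directly. The function statement_for below is a hypothetical helper that takes the declaration's three attribute values and the instance element, and replaces ex:value with the element's text; it does not escape literals:

```python
import xml.etree.ElementTree as ET


def statement_for(elem, subject, predicate, obj):
    """Build one N3 statement for an instance element, substituting the
    element's text content when the object is the special value ex:value.
    Literal escaping is left out of this sketch."""
    if obj == "ex:value":
        obj = '"%s"' % (elem.text or "")
    return "%s %s %s ." % (subject, predicate, obj)


shoe = ET.fromstring("<shoe>Nike</shoe>")
print(statement_for(shoe, "clothes:shoe", "clothes:manufacturer", "ex:value"))
# prints: clothes:shoe clothes:manufacturer "Nike" .
```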

Going back to our example from before (slightly modified), an
instance fragment:

  <inventory store="down-the-road">
    <widget name="thing">
      <price unit="dollars">1.95</price>
      <stock>5</stock>
    </widget>
  </inventory>

with a corresponding schema:

 <schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:inv="http://example.com/inventory#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:ex="http://mnot.net/schema-rdf-extension/0.1"
 >
  <element name="inventory">
    <annotation>
      <appinfo>
        <ex:statement ex:subject="ex:document"
         ex:predicate="rdf:type" ex:object="inv:Inventory" />
        <ex:statement ex:subject="ex:document" ex:predicate="inv:location" 
         ex:object="ex:anonymous" ex:id="loc"/>
        <ex:statement ex:subject="ex:document" ex:predicate="inv:items" 
         ex:object="ex:anonymous" ex:id="items" />
        <ex:statement ex:subject="#loc" ex:predicate="rdf:type"
         ex:object="inv:Store"/>
        <ex:statement ex:subject="#items" ex:predicate="rdf:type"
         ex:object="rdf:Bag" />
      </appinfo>
    </annotation>
    <complexType>
      <sequence>
        <element name="widget" maxOccurs="unbounded">
          <annotation>
            <appinfo>
              <ex:statement ex:subject="#items" ex:predicate="ex:newitem"
               ex:object="ex:anonymous" ex:id="item" />
              <ex:statement ex:subject="#item" ex:predicate="rdf:type"
               ex:object="inv:Widget" />
              <ex:statement ex:subject="#item" ex:predicate="inv:price"
               ex:object="ex:anonymous" ex:id="price" />
            </appinfo>
          </annotation>
          <complexType>
            <sequence> 
              <element name="price" ex:subject="#price" 
               ex:predicate="inv:amount" ex:object="ex:value">
                <complexType>
                  <simpleContent>
                    <extension base="decimal">
                      <attribute name="unit" type="string"
                       ex:subject="#price" ex:predicate="inv:unit" 
                       ex:object="ex:value"/>
                    </extension>
                  </simpleContent>
                </complexType>
              </element>
              <element name="stock" type="nonNegativeInteger"
               ex:subject="#item" ex:predicate="inv:stock"
               ex:object="ex:value" />
            </sequence> 
            <attribute name="name" type="string"
             ex:subject="#item" ex:predicate="inv:name"
             ex:object="ex:value" /> 
          </complexType> 
        </element>
      </sequence> 
      <attribute name="store" type="string"
       ex:subject="#loc" ex:predicate="inv:place" ex:object="ex:value"/>
    </complexType>
  </element> 
 </schema>

would generate these statements:

  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix :  <http://example.com/inventory#> .

  <> a :Inventory ; 
     :location [ a :Store; :place "down-the-road" ] ; 
     :items [ a rdf:Bag ;    
              rdf:_1 [ a :Widget ;    
                      :name "thing" ;
                      :price [ :amount "1.95"; :unit "dollars" ] ;
                      :stock "5"
                     ]
            ] .
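To check that the annotations really do add up to these statements, here is a sketch that hand-translates each ex:statement in the schema into one Python tuple and applies them to the instance above. It is not a schema-driven processor: the function name inventory_triples is hypothetical, blank nodes are labelled _:b0, _:b1, ... in creation order, "a" is written out as rdf:type, and literals are not escaped:

```python
import itertools
import xml.etree.ElementTree as ET

INSTANCE = """<inventory store="down-the-road">
  <widget name="thing">
    <price unit="dollars">1.95</price>
    <stock>5</stock>
  </widget>
</inventory>"""


def inventory_triples(xml_text, doc="<>"):
    """Reproduce the statements listed above for the inventory example.

    Each ex:statement annotation is hand-translated into one tuple; a real
    processor would read the annotations from the schema during
    validation instead."""
    blanks = ("_:b%d" % i for i in itertools.count())
    root = ET.fromstring(xml_text)
    loc, items = next(blanks), next(blanks)       # ex:id="loc", ex:id="items"
    out = [(doc, "rdf:type", "inv:Inventory"),
           (doc, "inv:location", loc),
           (doc, "inv:items", items),
           (loc, "rdf:type", "inv:Store"),
           (items, "rdf:type", "rdf:Bag"),
           (loc, "inv:place", '"%s"' % root.get("store"))]
    for n, widget in enumerate(root.findall("widget"), start=1):
        item, price = next(blanks), next(blanks)  # ex:id="item", ex:id="price"
        out += [(items, "rdf:_%d" % n, item),     # ex:newitem
                (item, "rdf:type", "inv:Widget"),
                (item, "inv:price", price),
                (item, "inv:name", '"%s"' % widget.get("name")),
                (price, "inv:amount", '"%s"' % widget.find("price").text),
                (price, "inv:unit", '"%s"' % widget.find("price").get("unit")),
                (item, "inv:stock", '"%s"' % widget.find("stock").text)]
    return out


for s, p, o in inventory_triples(INSTANCE):
    print(s, p, o, ".")
```

Running it prints one statement per line, with QNames unexpanded; it describes the same graph as the abbreviated N3 above.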

Although using XPath or similar would be more powerful, it appears
that this mechanism gives enough power and flexibility in the common
cases. What concerns me more is whether it would be necessary to
support taking
  <shoe>Nike</shoe>
and extracting statements like
  clothes:shoe clothes:manufacturer companies:Nike .
where the element's content identifies a resource, rather than being
carried as a literal string.
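To make the difference concrete, here is a purely hypothetical sketch: nothing in the proposal defines such a mapping. It assumes the schema would somehow supply a namespace prefix (companies here) and that the element's text is used as the local part of a QName. The NCName check is a simplified ASCII-only approximation of the XML Namespaces rule:

```python
import re

# Simplified NCName test (ASCII only); the real rule in the XML
# Namespaces specification admits many more characters.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def value_as_literal(text):
    """What the proposal does today: the content becomes a literal."""
    return '"%s"' % text


def value_as_resource(text, prefix):
    """Hypothetical alternative: the content becomes the local part of a
    QName in `prefix`, so the object is a resource, not a literal."""
    local = text.strip()
    if not _NCNAME.match(local):
        raise ValueError("%r cannot be used as a local name" % text)
    return "%s:%s" % (prefix, local)
```

The check hints at one difficulty: content such as "Nike Inc." is not a valid local name, so some escaping or lookup rule would be needed.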

I think the next step is to start working on a test implementation;
xsv seems to be the obvious choice. Any comments or suggestions would
be appreciated.

-- 
Mark Nottingham
http://www.mnot.net/
 

Received on Sunday, 7 October 2001 18:31:10 UTC