Re: Suggestions for implementing a "document matches" program?

Dick,

Note: This is a roundabout method that I'm proposing, but I hope it
generates some interesting discussion.

> Hi Dan.  Let me see if I can clarify my message with a couple of
> examples.  If I hand this XML instance document to PROGRAM it should
> output "Yes, it MATCHES"
> 
> <snip/>
> 

Let's assume that every RDF document (in formats RDF/XML, N-Triples,
CSV...) describes a valid graph.  Furthermore, a single graph can be
represented by two different documents in the same format.
As an example...

<jfc:Thing rdf:about="#foo" jfc:prop="bar" />

...is the same as...

<rdf:Description rdf:about="#foo">
     <rdf:type rdf:resource="&jfc;#Thing" />
     <jfc:prop>bar</jfc:prop>
</rdf:Description>

...even though each document looks nothing like the other. (Note:
namespace and entity definitions not shown for brevity.)  There are even
other ambiguities in the RDF working drafts that haven't been resolved
yet!  So the problem is figuring out a method to "normalize" a document
into a data structure that is unique for a graph [1].

First I would craft an object model (possibly similar to the W3C's DOM
[2]) to represent the RDF graph in memory.  Anyone have any ideas what
this could look like?
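
As one possible starting point, here is a minimal sketch in Python that
treats the graph as nothing more than a set of (subject, predicate,
object) triples.  The class and method names (Triple, Graph, add,
objects) are my own placeholders, not taken from any RDF library, and
the short strings like "jfc:Thing" stand in for full URIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One arc of the graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


class Graph:
    """A hypothetical in-memory RDF graph: an unordered set of triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        # Adding the same triple twice has no effect (set semantics).
        self._triples.add(Triple(subject, predicate, obj))

    def remove(self, subject, predicate, obj):
        self._triples.discard(Triple(subject, predicate, obj))

    def objects(self, subject, predicate):
        """Return every object reachable from subject via predicate."""
        return {t.obj for t in self._triples
                if t.subject == subject and t.predicate == predicate}

    def __len__(self):
        return len(self._triples)

    def __eq__(self, other):
        # Two graphs are equal when they hold exactly the same triples,
        # regardless of the order in which the triples were added.
        if not isinstance(other, Graph):
            return NotImplemented
        return self._triples == other._triples


# Build the same graph twice, adding the arcs in a different order.
g1 = Graph()
g1.add("#foo", "rdf:type", "jfc:Thing")
g1.add("#foo", "jfc:prop", "bar")

g2 = Graph()
g2.add("#foo", "jfc:prop", "bar")
g2.add("#foo", "rdf:type", "jfc:Thing")
```

Because the triples live in a set, the order in which a document
happens to state them stops mattering, which is most of what
"normalizing" needs when no anonymous nodes are involved.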

Secondly, I'd design a Moore Machine (a type of finite state machine or
other sequential device) to interface between SAX [3] events (input) and
the object model utility calls (i.e., to construct, query and modify the
graph).

Then I'd choose a SAX-compliant library to fire SAX events from a loaded
RDF/XML (or N-Triples or another format - SAX is nice because the input
doesn't have to be XML!) document.
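
To make the last two steps concrete, here is a rough sketch using
Python's standard xml.sax module.  It is a hand-rolled three-state
machine (TOP, NODE, PROPERTY); since its actions fire on transitions it
is strictly closer to a Mealy machine than a Moore machine.  It only
understands the two shapes from my example above, it does not resolve
relative URIs such as "#foo", and the jfc namespace URI
(http://example.org/jfc#) is made up for illustration.  Literals are
stored with surrounding quotes so they can't be confused with URIs.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class TinyRdfHandler(ContentHandler):
    """Turns SAX events into triples. States: TOP -> NODE -> PROPERTY."""

    def __init__(self):
        super().__init__()
        self.triples = set()
        self.state = "TOP"
        self.subject = None
        self.predicate = None
        self.text = None  # list of character chunks while reading a literal

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if name == (RDF, "RDF"):
            return  # the wrapper element carries no triples
        if self.state == "TOP":
            # Node element: a typed node or a plain rdf:Description.
            self.subject = attrs.get((RDF, "about"))
            if name != (RDF, "Description"):
                self.triples.add(
                    (self.subject, RDF + "type", (uri or "") + local))
            # Non-RDF namespaced attributes are literal-valued properties.
            for (attr_uri, attr_local), value in attrs.items():
                if attr_uri not in (None, RDF):
                    self.triples.add(
                        (self.subject, attr_uri + attr_local, f'"{value}"'))
            self.state = "NODE"
        elif self.state == "NODE":
            # Property element: either rdf:resource or literal text.
            self.predicate = (uri or "") + local
            resource = attrs.get((RDF, "resource"))
            if resource is not None:
                self.triples.add((self.subject, self.predicate, resource))
                self.text = None
            else:
                self.text = []
            self.state = "PROPERTY"
        else:
            raise ValueError("nested node elements are not handled here")

    def characters(self, content):
        if self.state == "PROPERTY" and self.text is not None:
            self.text.append(content)

    def endElementNS(self, name, qname):
        if name == (RDF, "RDF"):
            return
        if self.state == "PROPERTY":
            if self.text is not None:
                literal = '"' + "".join(self.text) + '"'
                self.triples.add((self.subject, self.predicate, literal))
            self.state = "NODE"
        elif self.state == "NODE":
            self.state = "TOP"


def parse_rdf(data):
    """Parse RDF/XML bytes and return the set of triples found."""
    handler = TinyRdfHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.triples


HEADER = (b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
          b' xmlns:jfc="http://example.org/jfc#">')
DOC_A = HEADER + b'<jfc:Thing rdf:about="#foo" jfc:prop="bar" /></rdf:RDF>'
DOC_B = HEADER + b'''
<rdf:Description rdf:about="#foo">
     <rdf:type rdf:resource="http://example.org/jfc#Thing" />
     <jfc:prop>bar</jfc:prop>
</rdf:Description>
</rdf:RDF>'''

same_graph = parse_rdf(DOC_A) == parse_rdf(DOC_B)
```

A real RDF/XML parser has to deal with far more of the syntax (nested
nodes, anonymous nodes, containers, xml:base and so on), so treat this
as an illustration of the event-to-graph plumbing only.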

Finally, I'd choose a standard and unique representation of the RDF
document object so it can be recorded to disk.  This way common file
checking utilities can be used to compare RDF graphs (such as the one
you want).  As an alternative, I'd write a subroutine to compare
'invariant nodes and arcs' in the reference and input graphs.  If they
match, then your problem is solved.
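
A minimal sketch of the "unique representation" idea, assuming the
graph is already a set of (subject, predicate, object) string triples
and contains no anonymous nodes (with anonymous nodes, comparison turns
into a graph isomorphism problem and plain sorting is not enough).  The
function names are placeholders of mine, and the one-triple-per-line
output only loosely imitates N-Triples.

```python
import hashlib


def canonical_text(triples):
    """Serialize triples one per line, sorted, so the output does not
    depend on the order in which the triples were discovered."""
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines) + "\n"


def fingerprint(triples):
    """Hash of the canonical text; equal graphs give equal fingerprints."""
    return hashlib.sha256(canonical_text(triples).encode("utf-8")).hexdigest()


# The same two arcs, listed in a different order.
reference = [
    ("#foo", "rdf:type", "jfc:Thing"),
    ("#foo", "jfc:prop", '"bar"'),
]
candidate = [
    ("#foo", "jfc:prop", '"bar"'),
    ("#foo", "rdf:type", "jfc:Thing"),
]

graphs_match = canonical_text(reference) == canonical_text(candidate)
```

Writing canonical_text(...) to a file for each graph would let an
ordinary diff or checksum tool do the comparison.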

Of course, this is a shoestring idea after about 30 minutes of thought.
There's probably a better way of doing it.  Any suggestions?

> The PROGRAM should be able to realize (by consulting the Camera
> Ontology) that
> 
>    SLR is a type of Camera
>    f-stop is synonymous with aperture
>    focal-length is synonymous with size
> 
> Further, the desired values for aperture and size are met by this XML
> instance document.  Thus, this XML instance document is a match!
> 
> On the other hand, the PROGRAM should be able to recognize that this
> XML instance document ...
> 
> <snip/>
>
> ... is not a match, since it is not talking about Cameras.

That's a hard question to answer.  Is the program analyzing the actual
concept of "talking about"?  Or is it simply comparing the structure of
two normalized sets of data (graphs)?  Furthermore, is there really a
difference between the two interpretations?  I don't know; there are
probably smarter people who do know, however.
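
For what it's worth, the purely structural reading is easy to sketch.
Below, the ontology is reduced to two lookup tables built from the
three facts you listed; everything else (the function names, the
property values "2.8" and "50mm", the "Car" document, and matching by
simple equality) is invented by me for illustration, since the actual
instance documents were snipped.

```python
# Hypothetical stand-in for the Camera Ontology.
SUBCLASS_OF = {"SLR": "Camera"}
SYNONYMS = {"f-stop": "aperture", "focal-length": "size"}


def ancestors(type_name):
    """Return type_name plus every type it is (transitively) a kind of."""
    found = {type_name}
    while type_name in SUBCLASS_OF and SUBCLASS_OF[type_name] not in found:
        type_name = SUBCLASS_OF[type_name]
        found.add(type_name)
    return found


def matches(doc_type, doc_props, wanted_type, wanted_props):
    """True if the document is about wanted_type and carries the wanted
    property values, after renaming synonymous properties."""
    if wanted_type not in ancestors(doc_type):
        return False
    normalized = {SYNONYMS.get(k, k): v for k, v in doc_props.items()}
    return all(normalized.get(k) == v for k, v in wanted_props.items())


# Made-up query and documents.
wanted = {"aperture": "2.8", "size": "50mm"}
slr_doc = {"f-stop": "2.8", "focal-length": "50mm"}
car_doc = {"f-stop": "2.8", "focal-length": "50mm"}

slr_is_match = matches("SLR", slr_doc, "Camera", wanted)
car_is_match = matches("Car", car_doc, "Camera", wanted)
```

Whether a lookup like this amounts to the program understanding what a
document is "talking about" is exactly the question I can't answer.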

L8r.

--
James F. Cerra 

[1] Here's an interesting question: could an RDF graph be uniquely
represented literally as a graph in SVG (ignoring layout and strictly
from a topological or "connectionist" view)?

[2] http://www.w3.org/DOM/

[3] http://www.saxproject.org/

Received on Tuesday, 22 April 2003 00:20:02 UTC