Re: a concern on SW technologies: document content

On 08/12/06, Lee Feigenbaum <feigenbl@us.ibm.com> wrote:
>
> Hi SWEOids,

SWEOysters?

> Wing and I had an interesting and somewhat enlightening conversation with
> another IBMer today. Our colleague was somewhat familiar with the SW world
> and is very familiar with the XML world, and he expressed concerns that SW
> technologies (and RDF / SPARQL in particular) may fall short in one
> prominent area in which XML / XQuery shines: dealing with content-oriented
> (often mixed content) documents. He was concerned about this given some of
> our claims about the value of RDF/SW technologies as a unifying
> environment for data and metadata.

I'm glad you raised this, I do think we need to cover this somehow.
The web is currently somewhat content-oriented...

> He gave various examples ranging from insurance policies to resumes to
> rental agreements, with the basic idea being that XQuery can easily
> answer questions that involve searching within a document (or, more so,
> searching for text in a particular paragraph of a document, perhaps with
> emphasis added) which uses XML markup.

It would be nice to have some concrete examples...
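As a stand-in for one, here is a minimal sketch of the kind of question described above: "which paragraphs of this policy mention a term with emphasis?" It is written in Python with the standard-library ElementTree rather than in XQuery, and the policy markup, the element names and the function name are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical insurance-policy markup, invented for this example.
POLICY = """<policy>
  <section id="s1"><p>The insurer covers <em>fire</em> damage.</p></section>
  <section id="s2"><p>Flood damage is <em>not</em> covered; fire is only mentioned here.</p></section>
</policy>"""

def paragraphs_emphasising(doc_text, term):
    """Return (section id, paragraph text) for paragraphs that emphasise `term`."""
    root = ET.fromstring(doc_text)
    hits = []
    for section in root.iter("section"):
        for p in section.iter("p"):
            # Mixed content: the match has to be inside <em>, not just anywhere in <p>.
            emphasised = [(em.text or "").strip().lower() for em in p.iter("em")]
            if term.lower() in emphasised:
                hits.append((section.get("id"), "".join(p.itertext())))
    return hits

print(paragraphs_emphasising(POLICY, "fire"))
```

The query depends on knowing that `<em>` inside `<p>` inside `<section>` is significant, which is exactly the in-document, tree-shaped knowledge XML tools are good at.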

> He wondered aloud and we discussed
> what the SW approach to this would be, and we agreed that it's lacking
> right now.

I suspect the best approach is to treat the technologies as
complementary: RDF tools are good at large-scale, arbitrarily
connected data; XML tools are good at in-document, hierarchical data.
In other words, XQuery, XSLT etc. are part of the Semantic Web toolkit
(XML is in the layer cake, after all).

Another key point is that XML tools generally depend on kind-of local,
syntax-implied models - useful transformation/query of a doc depends
on domain-specific knowledge, knowing the significance of the shape of
the doc. But something like GRDDL (heh, XSLT) can globalise the data.

> He expressed worry that whereas XML can wrap data that might be
> best expressed as relational or RDF data (and then join that data in
> XQuery queries with document data), the RDF world may not have as nice a
> story.

I'm not so sure about that - an RDF literal can take any XML (as long
as it's namespaced? I forget), and Codd-style relations can be exploded
into lots of little binary relations. XML starts looking weak the
moment you need to step outside the tree (and Web != tree).
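To make the "exploding" concrete, here is a rough sketch in Python (standard library only). The table name, column names and the URI-ish strings are invented for illustration; a real mapping would need proper URIs and datatypes.

```python
def explode_row(table, key, row):
    """Turn one n-ary relational row into binary (subject, predicate, object) triples.

    `table`, `key` and the naming scheme below are assumptions for this sketch.
    """
    subject = f"{table}/{row[key]}"  # the primary key identifies the subject
    return [
        (subject, f"{table}#{column}", value)
        for column, value in row.items()
        if column != key
    ]

# A hypothetical row from a 'policy' table.
row = {"id": 42, "holder": "Alice", "kind": "rental"}
for triple in explode_row("policy", "id", row):
    print(triple)
```

Each non-key column becomes its own little binary relation hanging off the row's identifier, so nothing ties the data to the original table shape any more.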

Representing docs at an addressable level in RDF is difficult; XML
gets a lot for free - document order, simpler structure. If you don't
mind a data/metadata split then the approach taken by e.g. Annotea can
give the best of both worlds - it uses XPath/XPointer to point inside
XHTML docs. (Can't ever remember seeing XPointer used anywhere else,
shame really, it's quite nifty).
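Roughly what that split looks like, as a sketch only: the annotation lives outside the document and carries a path into it. This uses Python's ElementTree, which supports only a small XPath subset (no XPointer), and the dictionary keys are loosely modelled on Annotea's annotates/context/body idea rather than copied from its schema. The URL and document are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, kept namespace-free to keep the path short.
DOC = """<html><body>
<p>First paragraph.</p>
<p>Cover is <em>void</em> if the premises are sublet.</p>
</body></html>"""

# The metadata side: stored separately, pointing into the document.
annotation = {
    "annotates": "http://example.org/policy.html",  # assumed URL
    "context": "./body/p[2]",                       # path into the doc
    "body": "Check this clause with legal.",
}

def resolve(doc_text, annotation):
    """Return the text of the element the annotation points at, or None."""
    root = ET.fromstring(doc_text)
    target = root.find(annotation["context"])
    return None if target is None else "".join(target.itertext())

print(resolve(DOC, annotation))
```

The document keeps its order and structure for free, and the annotation can be joined with any other RDF-ish data about the same URL.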

(Funnily enough just the other week Reto and I were dabbling with
manipulating docs in RDF, but we cut quite a few corners by having the
doc model in the RDF, not trying to cover everything HTML can).

> I (personally) need to think the issues here through a bit more, but to me
> it was not an objection that I've heard commonly, but it was an
> interesting one to which I had no immediate response, so I wanted to throw
> it out here and solicit thoughts and/or feedback. (I don't think it's
> imperative that we have an immediate or bulletproof response to every
> potential SW objection, but thinking about where the technologies fall
> short in addition to where they excel should help us craft our messaging.)

It may be a question to pass over to the semantic-web list. Meanwhile
I'll ping a couple of XML+RDF people.

Cheers,
Danny.

-- 

http://dannyayers.com

Received on Saturday, 9 December 2006 10:29:29 UTC