
Re: Semantic Document Framework(s)

From: Kurt Cagle <cagle@olywa.net>
Date: Fri, 10 Nov 2000 17:04:44 -0800
Message-ID: <003501c04b7b$628751f0$60c8fdce@siren>
To: "Sean B. Palmer" <sean@mysterylights.com>, "Aaron Swartz" <aswartz@swartzfam.com>
Cc: <www-rdf-interest@w3.org>, <www-talk@w3.org>
I'd concur; a rules-based mechanism would seem to make sense as one part of
a dynamic semantic framework. The example I gave before was a Perl-style
regular expression (regex); regexes will be an integral part of XML Schema
and likely XSLT 1.1, and Schema recognizes them already. They are powerful
tools for locating content matches because they can represent VERY complex
patterns. The XML snippet that I produced demonstrated one way of
potentially encoding the associational map, though of course there are many
others.

One of the things I like about regexes is that you can create aggregate
matches, so that multiple matches can be associated with the same tag
structure.
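
For illustration only -- the post is talking about regexes in the XML
Schema/XSLT context, but the same idea of aggregate matching can be
sketched in a few lines of Python (the pattern and the weather-report
vocabulary here are my own invention, not from the earlier snippet):

```python
import re

# One pattern aggregates several surface forms ("degrees", "deg.", the
# degree sign), so that different spellings all map onto the same tag
# structure when matched.
TEMPERATURE = re.compile(r"\b(-?\d+)\s*(?:degrees|deg\.?|°)\s*([CF])\b")

text = "Highs near 72 degrees F today, dropping to 55 deg. F overnight."

# Every match, whichever alternative it used, is wrapped in the same tag.
tagged = TEMPERATURE.sub(r'<temp scale="\2">\1</temp>', text)
print(tagged)
```

Both "72 degrees F" and "55 deg. F" end up inside the same `<temp>`
element, which is the aggregate-match behavior described above.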

I think that the editor discussed earlier provides some insight into the
potential role of SDF. RDF is a static descriptive entity, but most
semantics are largely driven by dynamic content -- if things didn't change,
then all of this would be simple, but it would also be dull ... there really
wouldn't need to be a rich framework for describing semantics in the first
place.

Natural Language Processing, unfortunately, isn't necessarily all that
reducible. English is a particularly nasty language in that regard because
it has SO MANY exceptions to the rules. French has about 200 irregular
verbs, for instance; English has thousands. The irony is that while the
encapsulation of objects (nouns) is a relatively simple proposition, verbs
introduce a whole new layer of complexity to the process. Thus any editor
would be able to filter only a subset of a document automatically, even
with perfect matches. Still, that is better than simple keyword matching.

Thus an SDF framework would require (as I see it) the following components:
1) A schema for encoding abstract information about a document. This may be
RDF or some logical superset of RDF, encodings on an XSD schema (perhaps as
a namespace for use within the annotation node), or some new schema format.
2) A rules-based mechanism for mapping the contents of a document of a
given schema to its corresponding SDF schema. This would probably be best
served by some combination of XSLT and regular expressions, working against
an encoding contained within the SDF schema.
3) A linking mechanism that would register a document or document space and
create XLink associations dynamically. This may introduce the notion of
weighting or LODing the space to create a more holographic architecture.
4) A garbage-collection mechanism that would delete broken links, simplify
the database periodically, reduce the complexity of the SDF instance, and
notify other subscribers to the SDF framework when a document changed.
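
The rules-based mapping in (2) can be sketched roughly as follows -- a toy
Python stand-in (the rule table and the annotate() function are my own
hypothetical names, not part of any SDF proposal) for the XSLT-plus-regex
machinery, where each rule pairs a regex with the semantic tag it emits:

```python
import re

# Hypothetical rule table: in the framework described above, this mapping
# would live inside the SDF schema and be applied via XSLT; a plain loop
# stands in for that machinery here.
RULES = [
    (re.compile(r"\b\d{1,2} (?:January|February|November) \d{4}\b"), "date"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "email"),
]

def annotate(text):
    """Wrap every rule match in an element named after its tag."""
    for pattern, tag in RULES:
        text = pattern.sub(lambda m, t=tag: f"<{t}>{m.group(0)}</{t}>", text)
    return text

print(annotate("Sent 10 November 2000 by cagle@olywa.net"))
```

A real implementation would of course read the rules from the SDF schema
rather than hard-coding them, and would emit namespaced annotation nodes
instead of bare elements.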

It should be noted that a framework is not in and of itself a standard,
though the SDF encoding may be. Thus there is a question as to what degree
this should get pushed into the W3C as a standard, and how much of it is
simply an application, albeit one of great import. SDF makes a great deal
of sense in the move toward distributed programming, since we're rapidly
approaching the point we were at a decade ago, with a host of file
retrieval protocols and no real idea what the files they pointed to
actually contained. The difference is that in most cases there was NO
semantic information in 1992 beyond what little could be gleaned from the
filename, while with XML we have the capability of creating very extensive
semantic structures.

-- Kurt



----- Original Message -----
From: "Sean B. Palmer" <sean@mysterylights.com>
To: "Kurt Cagle" <cagle@olywa.net>; "Aaron Swartz" <aswartz@swartzfam.com>
Cc: <www-rdf-interest@w3.org>; <www-talk@w3.org>; "Ken Levy"
<klevy@xmlfund.com>
Sent: Thursday, November 09, 2000 3:45 PM
Subject: Re: Semantic Document Framework(s)


> Double reply: [1] To Kurt Cagle, [2] To Aaron Swartz
> > A weather report is easy to
> > encode, because there is a fairly limited vocabulary, but such a
> > semantic editor could also work in the opposite direction -- a user can
> > add an item to the set of rules simply by selecting the text in
> > question, assigning it to a specific encoding tag (or defining a new tag
> > if the editor is in schema generation mode), and storing it (or editing
> > it as a regex). In this manner, you could add semantic content very
> > easily.
>
> [1] A rule, in its own namespace, is a good way of getting the mechanism
> moving: i.e. to accomplish rule-based processing. The whole point of a
> Semantic Web is that everything is findable in a very simple process: or
> so the theory goes. Therefore, in information terms, nothing should be
> beyond it. I hope everyone follows: the old Web constraints do not apply
> to the SW; we are working on a different system, no matter how much it
> resembles, and indeed evolves from, the old Web. Of course I am only
> talking about the *aim* there; real life will probably be very different.
> Theory vs. Practise etc.
>
> > I bet -- looks like an AI-complete problem to me, at least for the
> > general case. Specific cases could be done with really complex
> > heuristics, but that'd be rather hairy. I'd love for this to work, but
> > all I see you doing is offloading the English parsing to the writer,
> > rather than the reader. You still need the software to parse the
> > English. That's going to be really hard to do, except in perhaps really
> > specific cases. Certainly, it'll be a lot harder to write than your
> > average HTML editor!
>
> [2] English is just a complicated programming language. At the end of the
> day it is still processable, if not quite as easy as SGML/XML systems.
> With rules etc., it will be made easier. It's not really AI: just a very,
> very complex system that looks like AI. (There's a quote in there from
> someone, I think.)
> We are dealing with English as a language, and as an information resource.
> The SW will help to break down that invisible barrier.
>
> Kindest Regards,
> Sean B. Palmer
> ----------------------------------------------------
> The Semantic Web: A Resource - http://xhtml.waptechinfo.com/swr/
> WAP Tech Info - http://www.waptechinfo.com/
> Mysterylights.com - http://www.mysterylights.com/
> ----------------------------------------------------
> "The Internet; is that thing still around?" - Homer J. Simpson
>
>
Received on Thursday, 9 November 2000 20:04:53 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 7 December 2009 10:51:46 GMT