- From: Kurt Cagle <cagle@olywa.net>
- Date: Fri, 10 Nov 2000 17:04:44 -0800
- To: "Sean B. Palmer" <sean@mysterylights.com>, "Aaron Swartz" <aswartz@swartzfam.com>
- Cc: <www-rdf-interest@w3.org>, <www-talk@w3.org>
I'd concur; a rules-based mechanism would seem to make sense as one part of a dynamic semantic framework. The example that I gave before was a Perl-type regular expression (RegEx or regex). Regexes will be an integral part of XML Schema and likely XSLT 1.1; Schema recognizes regexes already. They are powerful tools for locating content matches because they can represent VERY complex patterns. The XML snippet that I produced demonstrated one way of potentially encoding the associational map, though of course there are many others. One of the things I like about regexes is that you can create aggregate matches, so that multiple matches can be associated with the same tag structure.

I think that the editor discussed earlier provides some insight into the potential role of SDF. RDF is a static descriptive entity, but most semantics are largely driven by dynamic content -- if things didn't change, then all of this would be simple, but it would also be dull ... there really wouldn't need to be a rich framework for describing semantics in the first place.

Natural Language Processing, unfortunately, isn't necessarily all that reducible. English is a particularly nasty language in that regard because it has SO MANY exceptions to the rules. French has about 200 irregular verbs, for instance, while English has thousands. The irony is that while the encapsulation of objects (nouns) is a relatively simple proposition, verbs introduce a whole layer of complexity to the process. Thus any editor would be able to filter only a subset of a document automatically, even with perfect matches. Still, it is better than simple keyword matching.

Thus an SDF framework would require (as I see it) the following components:

1) A schema for encoding abstract information about a document. This may be RDF or some logical superset of RDF, encodings on an XSD schema (perhaps as a namespace for use within the annotation node), or some new schema format.
2) A rules-based mechanism for mapping the contents of a document of a given schema to its corresponding SDF schema. This would probably be best served by some combination of XSLT and regular expressions, working against an encoding contained within the SDF schema.

3) A linking mechanism that would register a document or document space and create XLink associations dynamically. This may introduce the notion of weighting or LODing the space to create a more holographic architecture.

4) A garbage collection mechanism that would delete broken links, simplify the database periodically, reduce the complexity of the SDF instance, and notify other subscribers to the SDF framework when a document changes.

It should be noted that a framework is not in and of itself a standard, though SDF encoding may be. Thus there is a question as to what degree this should get pushed into the W3C as a standard, and how much of it is simply an application, albeit one of great import.

SDF makes a great deal of sense in the move toward distributed programming, since we're rapidly approaching the point where we were a decade ago, with a host of file retrieval protocols and no real idea about what the files they point to actually contain. The difference is that in most cases there was NO semantic information in 1992 beyond what little could be gleaned by looking at the filename, while with XML we have the capability of creating very extensive semantic structures.

-- Kurt

----- Original Message -----
From: "Sean B.
Palmer" <sean@mysterylights.com>
To: "Kurt Cagle" <cagle@olywa.net>; "Aaron Swartz" <aswartz@swartzfam.com>
Cc: <www-rdf-interest@w3.org>; <www-talk@w3.org>; "Ken Levy" <klevy@xmlfund.com>
Sent: Thursday, November 09, 2000 3:45 PM
Subject: Re: Semantic Document Framework(s)

> Double reply: [1] To Kurt Cagle, [2] To Aaron Swartz
>
> > A weather report is easy to encode, because there is a fairly limited vocabulary, but such a semantic editor could also work in the opposite direction -- a user can add an item to the set of rules simply by selecting the text in question, assigning it to a specific encoding tag (or defining a new tag if the editor is in schema generation mode), and storing it (or editing it as a regex). In this manner, you could add semantic content very easily.
>
> [1] A rule, in its own namespace, is a good way of getting the mechanism moving: i.e. to accomplish rule-based processing. The whole point of a Semantic Web is that everything is findable in a very simple process: or so the theory goes. Therefore, in information terms, nothing should be beyond it. I hope everyone follows: the old Web constraints do not apply to the SW; we are working on a different system, no matter how much it resembles, and indeed evolves from, the old one. Of course I am only talking about the *aim* there; real life will probably be very different. Theory vs. Practice etc.
>
> > I bet -- looks like an AI-complete problem to me, at least for the general case. Specific cases could be done with really complex heuristics, but that'd be rather hairy. I'd love for this to work, but all I see you doing is offloading the English parsing to the writer, rather than the reader. You still need the software to parse the English. That's going to be really hard to do, except in perhaps really specific cases. Certainly, it'll be a lot harder to write than your average HTML editor!
>
> [2] English is just a complicated programming language. At the end of the day it is still processable, if not quite as easily as SGML/XML systems. With rules etc., it will be made easier. It's not really AI: just a very, very complex system that looks like AI. (There's a quote in there from someone, I think.)
> We are dealing with English as a language, and as an information resource. The SW will help to break down that invisible barrier.
>
> Kindest Regards,
> Sean B. Palmer
> ----------------------------------------------------
> The Semantic Web: A Resource - http://xhtml.waptechinfo.com/swr/
> WAP Tech Info - http://www.waptechinfo.com/
> Mysterylights.com - http://www.mysterylights.com/
> ----------------------------------------------------
> "The Internet; is that thing still around?" - Homer J. Simpson
Received on Thursday, 9 November 2000 20:04:53 UTC
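
As a rough illustration of the rules-based mapping discussed in the message above (regexes associated with tags, with several patterns allowed to feed the same tag), here is a minimal Python sketch. The rule list, the tag names, and the annotate function are all hypothetical and do not come from the thread. The thread proposes XSLT combined with regular expressions; plain Python is swapped in here only to keep the example short.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule set for a weather report: (tag, regex) pairs.
# Two patterns map to "temperature", which is the "aggregate match" idea:
# multiple matches associated with the same tag structure.
# Patterns must use only non-capturing groups, because the matcher below
# identifies the winning rule by its top-level capture group index.
RULES = [
    ("temperature", r"-?\d+ degrees(?: Fahrenheit| Celsius)?"),
    ("temperature", r"-?\d+\s?(?:F|C)\b"),
    ("condition", r"\b(?:sunny|cloudy|rain|snow|fog)\b"),
]


def annotate(text, rules=RULES):
    """Wrap every span matched by a rule in an element named after its tag."""
    # One alternation, with one capture group per rule, in rule order.
    combined = re.compile("|".join(f"({p})" for _, p in rules), re.IGNORECASE)
    tags = [tag for tag, _ in rules]
    out = []
    pos = 0
    for m in combined.finditer(text):
        out.append(escape(text[pos:m.start()]))   # unmatched text, XML-escaped
        tag = tags[m.lastindex - 1]                # rule whose group matched
        out.append(f"<{tag}>{escape(m.group(0))}</{tag}>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(annotate("Seattle: cloudy, 52 degrees Fahrenheit."))
# Seattle: <condition>cloudy</condition>, <temperature>52 degrees Fahrenheit</temperature>.
```

Adding a rule is just appending another (tag, regex) pair, which is roughly the "select text, assign it to a tag, store it as a regex" workflow described for the editor. As the thread points out, this only covers the narrow cases the patterns anticipate; it does nothing for general English.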