markup for content analysis

1) Have you any comments on the thread partly reproduced below?  It seems
to have stumped the local Internet and Linux user groups.  (Two sketches of
possible workarounds follow the quoted thread, in case they help frame the
question.)
2) What are some effective ways to come up to speed on the prerequisites
for understanding the proceedings of the Semantic Web Working Group and
friends?


> Trent Shipley wrote:
> > No.  This is essentially a content analysis problem.  The idea is to
> > mark up *meaning* not structure or format.  In a sense the idea is to
> > press XML | SGML into service as a semantic content markup language.
> > The problem with your break idea is that it:
>
> You should check out the W3C's Semantic Web rantings then.
>
> Regarding your example and SGML: I think, except for the attributes in
> the close tags, that format is SGML compliant.
>
> > Trent Shipley wrote:
> > No.  This is essentially a content analysis problem.  The idea is to
> > mark up *meaning* not structure or format.  In a sense the idea is to
> > press XML | SGML into service as a semantic content markup language.
> > The problem with your break idea is that it:
> >
> > 1: Makes it look like there are two sets of text that need to be tagged
> > with content tag #1 when really there is only one element.
> >
> > 2: It makes it look like the second section of #1 content is contained
> > in a passage of type #2.  In reality passage #1 and passage #2 share
> > text and content (sort of like using the same DLL).  If the underlying
> > meaning-structure is misrepresented one can imagine that it could
> > reduce the utility for linguistic analysis.  Methinks Larry Wall would
> > *not* approve.
> >
> > As for the desirability of non-nested elements, an inelegant hack would
> > be to put them in comments.  The HRAF codes are used for statistics and
> > extracting data.  The markup parser doesn't really need to see them,
> > but if it did it would save having to develop a content markup language
> > and parser.
> >
> >
> > Still the idea of a Semantic Content Markup Language is intriguing.
> > It would even have some immediate commercial application in market
> > research and data retrieval services.  It could even be extended to
> > help automate esoteric tasks like compiling HRAF files and indexes.
> >
> >
> > > -----Original Message-----
> > > From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> > > [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us] On Behalf Of
> > > Kimbro Staken
> > > Sent: Thursday, May 31, 2001 1:25 PM
> > > To: plug-discuss@lists.PLUG.phoenix.az.us
> > > Subject: Re: SGML vs XML
> > >
> > >
> > > Trent Shipley wrote:
> > > >
> > > > I have decided that my dissertation has outgrown WordPerfect and
> > > > EndNote, but especially WordPerfect.  It looks like XML + XSL will
> > > > work pretty well.  However, for future compatibility it would be
> > > > nice to mark up the document with Human Relations Area File codes
> > > > (HRAF codes).  The problem is that text referenced by HRAF codes
> > > > might not nest.
> > > >
> > > > Note that a single code should never overlap itself.  It is as if
> > > > there were N codes and the document was scanned for the
> > > > applicability of each code, effectively resulting in N documents.
> > > > However, some of those N codes will not apply anywhere in the
> > > > document.  In effect some subset m of the N documents consists of
> > > > NULL documents.  Then all the interesting N-m documents are
> > > > projected into a single document.
> > > >
> > > > The markup could look something like this:
> > > >
> > > > <HRAF code="1">The Raboof blah blah blah blah yadda yadda.
> > > > <HRAF code="2">Their women blah yadda blah blah.</HRAF code="1">
> > > > Meanwhile the children blah.<HRAF code="2">
> >
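
To make the non-nesting problem concrete, here is one workaround I have run
across (I am not sure whether it is what the Semantic Web folks would
recommend): replace the overlapping container elements with empty
"milestone" elements that only mark where each coded span starts and ends.
The element names below (HRAF-start, HRAF-end) are placeholders I invented,
and I am reading the final <HRAF code="2"> in the quoted example as the end
of span 2.  The spans still overlap, but the tagging itself never does, so
an ordinary XML parser has no problem with it:

  <!-- HRAF-start / HRAF-end are invented placeholder names -->
  <HRAF-start code="1"/>The Raboof blah blah blah blah yadda yadda.
  <HRAF-start code="2"/>Their women blah yadda blah blah.<HRAF-end code="1"/>
  Meanwhile the children blah.<HRAF-end code="2"/>

A parser only ever sees empty elements, so nothing has to nest; recovering
the actual spans (code 1 = the first two sentences, code 2 = the second and
third) becomes the job of whatever tool extracts the statistics.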

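A second possibility, also just a sketch with invented names, is standoff
annotation: keep the text itself in ordinary nested markup and record the
HRAF codes separately, pointing at IDs.  Since the codes are only used for
statistics and data extraction anyway, the document parser never has to
deal with overlap at all:

  <!-- s, text, codes, from, to are invented placeholder names -->
  <text>
    <s id="s1">The Raboof blah blah blah blah yadda yadda.</s>
    <s id="s2">Their women blah yadda blah blah.</s>
    <s id="s3">Meanwhile the children blah.</s>
  </text>
  <codes>
    <HRAF code="1" from="s1" to="s2"/>
    <HRAF code="2" from="s2" to="s3"/>
  </codes>

Both fragments are ordinary nested XML, and a code can start or stop at any
boundary that carries an ID (sentences here, but words would work the same
way).
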
Trent Shipley

Personal:
mailto:tshipley@u.arizona.edu
http://www.u.arizona.edu/~tshipley/

Work:
(602) 522-7502
mailto:tshipley@symbio-tech.com
http://www.symbio-tech.com

Received on Thursday, 31 May 2001 18:30:19 UTC