Re: RDF in XHTML [was: Re: Authors describing what their URIs mean] from Murray Altheim on 2001-04-16 (www-rdf-interest@w3.org from April 2001)

From: Murray Altheim <altheim@eng.sun.com>
Date: Mon, 16 Apr 2001 01:34:06 -0700
To: Seth Russell <seth@robustai.net>
CC: Danny Ayers <danny@panlanka.net>, "Sean B. Palmer" <sean@mysterylights.com>, Joshua Allen <joshuaa@microsoft.com>, RDF Interest <www-rdf-interest@w3.org>
Message-ID: <3ADAAE7E.A7B67B7@eng.sun.com>
Seth Russell wrote:
> 
> From: "Sean B. Palmer" <sean@mysterylights.com>
> 
> > Yes, because RDF in (X)HTML does not conform to any grammar
> > specifically published by the W3C as a recommendation to this date
> > (unless you create a grammar yourself, a la XHTML Modularization [1]).
> 
> Thanks for the pointers. I think I understand the technical side of this
> now; it's the political side that is still giving me headaches.
> 
> Shouldn't there *already exist* a general purpose module written to the
> Modularization Spec [1] which defines a document type such that authors can
> embed the RDF description of their resources in those resources?  Why
> can't I find it?
> 
> [1]  http://www.w3.org/TR/xhtml-modularization

XHTML modularization is designed to enable creation of variants of XHTML
and combinations of XHTML and other markup languages, as described by
DTDs. It's there to enable validation as defined in XML 1.0. RDF by its
use of namespaces makes DTD validation very difficult, since it mixes
markup and content and forces authors (not document type designers) to
design markup. I think specific applications of RDF (such as DC) make
sense as XHTML modules because validation to some people is important.
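
Here is a small illustration of the namespace/DTD tension described above. It is a sketch, not something from the thread. XML 1.0 DTD validation matches element names literally as written (for example `dc:title`). A namespace-aware processor instead compares the namespace URI plus the local name. The Python below uses only the standard library, and both sample documents are made up. They mean the same thing under Namespaces in XML but present different qualified names. A DTD written for the `dc:` prefix would therefore accept one and report validity errors on the other.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical documents: the Dublin Core namespace is bound to a
# different prefix in each, which Namespaces in XML treats as equivalent.
DOC_A = f'<head xmlns:dc="{DC_NS}"><dc:title>Example</dc:title></head>'
DOC_B = f'<head xmlns:x="{DC_NS}"><x:title>Example</x:title></head>'

def raw_child_names(doc: str) -> list[str]:
    """Qualified names exactly as written; a DTD matches on these."""
    root = xml.dom.minidom.parseString(doc).documentElement
    return [n.tagName for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

def expanded_child_names(doc: str) -> list[str]:
    """Namespace-expanded names ({uri}local), as namespace-aware tools see them."""
    return [child.tag for child in ET.fromstring(doc)]

print(raw_child_names(DOC_A), raw_child_names(DOC_B))
print(expanded_child_names(DOC_A) == expanded_child_names(DOC_B))
```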

I long ago gave up proselytizing validation; if you don't feel it's 
important, go right ahead and create well-formed XML. I find validation
in most instances to be worth the trouble, like I find a Java debugger
helpful. If you've got an XHTML plus SVG document and it's acting up,
I hope you have a DTD.
 
> I mean the W3C has already endorsed this practice:
> 
>      "This document has been reviewed by W3C Members
>        and other interested parties and has been endorsed by
>        the Director as a W3C Recommendation. "
>
> What am I missing?

I'm not sure if you're being sarcastic or not. You're missing nothing.
There's nothing in the XHTML modularization scheme designed to allow
well-formed markup to coexist in a validation framework. We could add
a module that had content models of "ANY", but that still requires
that the element types be declared, which isn't much help for RDF
designed by authors. I'd like to point you to XML Schemas as
a solution, but the markup complexity that XML namespaces introduces
is going to make validation of six or seven author-designed namespaces
by *any* methodology extremely difficult. XHTML modularization doesn't
try to tackle such a problem; only well-defined markup languages that
each have their own DTD are mixed.
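
To make the "ANY" point concrete: XML 1.0's Element Valid constraint lets an element declared with content model ANY contain any mix of child elements, but only if the types of all those children have been declared. The Python sketch below is a hypothetical toy check of just that one rule, using only the standard library. `DECLARED` stands in for whatever a module's DTD might declare. The `my:` namespace is invented to play the role of author-designed RDF vocabulary.

```python
import xml.dom.minidom

# Hypothetical declarations: imagine a module DTD that declares these element
# types, with rdf:RDF given the content model ANY.
DECLARED = {"head", "title", "rdf:RDF", "rdf:Description", "dc:creator"}

# Sample head; the my: namespace stands in for author-designed vocabulary.
DOC = ('<head xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
       'xmlns:dc="http://purl.org/dc/elements/1.1/" '
       'xmlns:my="http://example.org/my-terms#">'
       '<title>Example</title>'
       '<rdf:RDF><rdf:Description rdf:about="http://example.org/page.html">'
       '<dc:creator>A. Author</dc:creator><my:mood>hopeful</my:mood>'
       '</rdf:Description></rdf:RDF></head>')

def undeclared_elements(doc: str, declared: set[str]) -> list[str]:
    """Return each qualified element name in doc that has no declaration.

    Checks only one rule: under ANY, children are allowed only if their
    element types are declared. A real validator checks much more.
    """
    root = xml.dom.minidom.parseString(doc).documentElement
    missing: list[str] = []
    stack = [root]
    while stack:  # depth-first walk over element nodes only
        node = stack.pop()
        if node.tagName not in declared and node.tagName not in missing:
            missing.append(node.tagName)
        stack.extend(c for c in node.childNodes
                     if c.nodeType == c.ELEMENT_NODE)
    return missing

print(undeclared_elements(DOC, DECLARED))
```

Every new author-chosen term needs its own declaration. This is why a single general-purpose RDF module cannot make arbitrary namespaces valid.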

> > > There must be some way for authors to correctly describe their
> > > web documents with RDF embedded in the documents.  The
> > > problem is not that this particular example fails validation, the
> > > problem is how to embed RDF in a www document
> > > correctly.

Well-formed, unless you're using a well-established RDF application
like Dublin Core. We could add an <rdf> element into <head>, but we'd
still have the "ANY" problem described above.
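
Here is what "well-formed" buys in practice. Any namespace-aware XML parser can find an `rdf:RDF` block in a well-formed XHTML page and read simple statements from it without a DTD. The following is a minimal sketch using Python's standard library. The sample page and URIs are hypothetical. The sketch handles only a small subset of RDF/XML (`rdf:Description` with `rdf:about` and literal-valued properties), so it is not a conforming RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def embedded_triples(xhtml: str) -> list[tuple[str, str, str]]:
    """Collect (subject, predicate, literal) triples from rdf:Description
    elements anywhere in a well-formed XHTML document.

    Deliberately minimal: no rdf:resource, nesting, containers or
    abbreviated syntax.
    """
    root = ET.fromstring(xhtml)  # raises ParseError if not well-formed
    triples = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about", "")
        for prop in desc:
            # Predicate URI = namespace URI + local name.
            if prop.tag.startswith("{"):
                ns, _, local = prop.tag[1:].partition("}")
            else:
                ns, local = "", prop.tag
            triples.append((subject, ns + local, (prop.text or "").strip()))
    return triples

# Hypothetical page with RDF embedded in <head>.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Example</title>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/page.html">
    <dc:creator>A. Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
</head>
<body><p>Hello</p></body>
</html>"""

print(embedded_triples(PAGE))
```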

> > Let's see... how many tools can extract RDF from HTML at the moment?
> 
> Well for one, Jason Diamond's RepatCOM.RdfReader [3] which is being used by
> our Sembrowser [4] will extract triples from the page ... you can point it
> at the example [5] and verify this for yourself.  I would hazard a guess
> that Redland works as well.
> 
> [3] http://www.injektilo.org/
> [4] http://groups.yahoo.com/group/sem-dev
> [5] http://robustai.net/rdf-commons/ex7.html
> 
> > How many can scrape it from RDDL links, or the HTML <link/> element? I
> > think the answer varies between "none" and "very few". The main thing
> > that is holding us all back is the lack of validation: it is simply
> > next to impossible, unless you use the recent XHTML m12n
> > recommendation [1].
> 
> Well I like Joshua Allen's more practical approach to this, see [6].  I
> don't think that validation is all that important in this rapidly growing
> field ... it appears, to me, to be just an excuse to hold back.  The need
> for authors to describe the semantics of their web pages eclipses the desire
> for a page to be validated by a couple of orders of magnitude.   When we get
> the authors descriptions on their pages, then the bookmarking process itself
> will be able to remember the bookmarked page in semantic memory permitting
> smart retrievals.  But when we get the pages to validate (which we will
> someday), very few people will even know or care.

You obviously don't work at a big company or a government agency. I guess
we'll just have to disagree on this one. Describing the semantics of their
web pages will likely be quite a bit more difficult if the markup isn't 
valid. In fact, a lot more difficult. I've got a small Java application 
based on Xerces that won't work at all if the markup is not well-formed
(given it uses an XML parser), and produces some very strange results on
test documents that had validation errors. If you feel you can trust the
operations of your business to such a situation, go right ahead. But I'll
not follow. Not all markup is unimportant content-wise. What if 23 pages
of an insurance policy were missing from a report because of a markup
error (something I've seen in the QA dept. before). 
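
The difference between these two failure modes can be sketched with Python's standard-library parser. The sample markup and the `check_well_formed` helper are hypothetical. Malformed markup is rejected outright, which is the "won't work at all" case. Well-formed but invalid markup is accepted silently by a non-validating parser, so misplaced content can simply go missing from a query.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def check_well_formed(markup: str) -> Optional[str]:
    """Return None if markup is well-formed XML, else the parser's error message.

    XML 1.0 forbids normal processing after a well-formedness (fatal) error,
    so ElementTree raises instead of returning any tree.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None

# Hypothetical report: assume a DTD requiring every <page> to sit inside
# <policy>. This markup is well-formed but would be invalid under that DTD;
# ElementTree does not validate, so it parses without complaint.
REPORT = "<report><policy><page>1</page></policy><page>2</page></report>"
pages_in_policy = ET.fromstring(REPORT).findall("./policy/page")

print(check_well_formed("<p><b>oops</p></b>"))  # prints the parser's error message
print(len(pages_in_policy))  # the misplaced second page is not counted
```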

> [6] http://lists.w3.org/Archives/Public/www-rdf-interest/2001Apr/0204.html
> 
> > FWIW, I published a very simple module for Dublin
> > Core in XHTML ages ago [2]. Note that this might not be a valid XHTML
> > Module. I'll update it if anyone really wants me to.
> 
> see [2] http://xhtml.waptechinfo.com/modules/rdf/rdf.mod
> 
> This is great ... proves that we can do it ... right?   But, methinks, we
> need a general purpose RDF module ... one where we can just include whatever
> namespace we want to .. it's not practical for each author who wants to use
> an RDF term from a new schema to make up another DTD.   Hasn't this
> Modularization technique provided for defining the RDF element in such a way
> that every namespace defined within the RDF in the document is automatically
> valid?

Perhaps another look at the definition of XML validation would answer this
one. It's a non sequitur. Even XML Schemas won't allow this.

> > Another idea is to simply link to the RDF file, either using a
> > HyperText link, or a document metadata profile giving an appropriate
> > link type.
> 
> I suppose that would work as well just as long as it became the standard
> method for an automated agent to find the author's description of the page
> and what its URI signifies.  But I doubt that it will be useful for us to
> invent something ad hoc; rather we need to get everybody to do it the same
> way, and that means the W3C needs to stand up and say .. do it this way.
> 
> Yet I don't even see this on their issue tracking page :(
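
Sean's link-based approach can be sketched as follows, with some assumptions. This thread names no agreed link type, so the `rel="meta"` value and the `application/rdf+xml` media type used here are hypothetical placeholders for whatever convention might be standardized. The sketch uses Python's standard `html.parser`, which tolerates HTML that is not well-formed XML. It resolves each link against the page's URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class RdfLinkFinder(HTMLParser):
    """Collect hrefs of <link> elements that point at RDF descriptions.

    Hypothetical convention: rel contains "meta" and type is
    "application/rdf+xml". Neither is mandated by anything cited here.
    """

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.found: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if ("meta" in rels
                and a.get("type", "").lower() == "application/rdf+xml"
                and a.get("href")):
            # Resolve relative hrefs against the page's own URL.
            self.found.append(urljoin(self.base_url, a["href"]))

# Hypothetical page; the XHTML-style "/>" also reaches handle_starttag.
PAGE = ('<html><head>'
        '<link rel="meta" type="application/rdf+xml" href="ex7.rdf" />'
        '<link rel="stylesheet" href="style.css">'
        '</head><body><p>Hi</body></html>')
finder = RdfLinkFinder("http://example.org/rdf-commons/ex7.html")
finder.feed(PAGE)
print(finder.found)
```

The code is the easy part. The hard part is agreeing on the `rel` and `type` values, which is Seth's point about needing one standard way.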

Well, the W3C has been developing HTML for about four or five years now
(kinda lost track). I don't think you're going to see a stronger set of
semantic elements inside of XHTML, nor do I see a trend toward improving
XHTML itself in this direction. The emphasis is on other XML markup 
languages, and in trying to embed them in XHTML. It's a bit sad, because
the entire web community, which must now number in the millions, have 
read up on HTML and can make web pages. It'd be an order of magnitude 
easier to get them to use <author> and <abstract> elements than to use 
some funky namespace markup, but the direction isn't that way. Sorta 
like one of those Disney movies of somebody who is instrumental to the
community being ignored once the community has "arrived."

Murray

...........................................................................
Murray Altheim                            <mailto:altheim@eng.sun.com>
XML Technology Center
Sun Microsystems, Inc., MS MPK17-102, 1601 Willow Rd., Menlo Park, CA 94025

      In the evening
      The rice leaves in the garden
      Rustle in the autumn wind
      That blows through my reed hut.  -- Minamoto no Tsunenobu
Received on Monday, 16 April 2001 04:11:16 UTC