[Fwd: Re: [whatwg] Trying to work out the problems solved by RDFa] from Dan Brickley on 2009-01-10 (www-archive@w3.org from January 2009)

From: Dan Brickley <danbri@danbri.org>
Date: Sat, 10 Jan 2009 16:01:34 +0100
To: Steven Pemberton <steven.pemberton@cwi.nl>
Cc: "www-archive@w3.org" <www-archive@w3.org>, Libby Miller <libby@nicecupoftea.org>
Message-ID: <4968B84E.3080603@danbri.org>

Hi Steven,

(cc www-archive, libby)

Re the alumni/people page scenario, I asked on the whatwg list whether
HTML5 is attempting any particular mechanism for saying which bits of a
page are 'comments' or otherwise untrusted. It seems from Toby's reply
(forwarded below) that RDFa is quite handy here.

I've been thinking about how one might use the hypertext path from 
http://www.w3.org/ to /People and ..etc/Alumni to indicate that they 
have the same creator/publisher.

1st idea - use a custom relation like 'alumniPage'
2nd idea - generalise that - 'staffInfoPage', 'aboutOrg page'
3rd idea - generalise further - use RDF to state that those pages have a 
dc:creator / foaf:maker which is the organization W3C
4th idea - use POWDER to claim that all pages matching some URI prefix 
have these properties

I think the 4th is probably the way to go, but I haven't dug into the
current state of POWDER. The others would cause needless proliferation
of properties and clutter each hyperlink with additional link-typing
annotations.

This would allow an organization (a company, a nonprofit, whatever) to
say in RDF on its homepage: "all HTML pages whose URI matches
http://eg.example.com/aboutus/*html are pages whose foaf:maker is the
organization whose homepage is http://eg.example.com/ and whose name is
'E.G. Org.'".
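
To make the 4th idea concrete, here is a rough Python sketch of what a
consumer might do with such a claim. It is not POWDER syntax and does not
use any POWDER library; the function name label_pages, the blank node
label "_:org" and the sample page URIs are all made up for illustration,
and the glob matching is only a stand-in for whatever URI-set mechanism
POWDER actually defines.

```python
from fnmatch import fnmatchcase

FOAF = "http://xmlns.com/foaf/0.1/"


def label_pages(page_uris, pattern, org_node, homepage, name):
    """Build (subject, predicate, object) tuples for pages matching a URI pattern.

    Hypothetical helper: the pattern is a shell-style glob, not POWDER syntax.
    """
    # Describe the organization once, identified by its homepage and name.
    triples = [
        (org_node, FOAF + "homepage", homepage),
        (org_node, FOAF + "name", name),
    ]
    # Every page whose URI matches the pattern gets the organization as maker.
    for uri in page_uris:
        if fnmatchcase(uri, pattern):
            triples.append((uri, FOAF + "maker", org_node))
    return triples


# Made-up page URIs: two publisher pages and one user-supplied forum page.
pages = [
    "http://eg.example.com/aboutus/team.html",
    "http://eg.example.com/forum/post1.html",
    "http://eg.example.com/aboutus/history.html",
]

triples = label_pages(
    pages,
    "http://eg.example.com/aboutus/*html",
    "_:org",  # assumed blank node label for the organization
    "http://eg.example.com/",
    "E.G. Org.",
)

for triple in triples:
    print(triple)
```

Note that the "*" in a shell-style glob also matches "/", so in this
sketch the pattern covers pages in subdirectories of /aboutus/ as well.
The forum page gets no foaf:maker claim, which is the distinction we are
after.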

The point of this being that we need a way of picking out those pages 
(and pieces of pages) whose provenance/source is the main publisher, 
versus other things on the site (or in the page) that might be user 
supplied. On w3.org, the msgid: proxy that includes all of lists.w3.org 
into www.w3.org is a good use case; but also various W3C-linked people, 
WG/IG members etc., have write access to bits of the site.

In parallel to this I'm still exploring the xmldsig route. Here is a
test signing of my FOAF file (linked by wot:assurance from foaf.rdf):
http://danbri.org/foaf.rdf.sigdata ... although it was done with a
randomly generated key, since I didn't write the Java code to manage
keys properly.

The use case for that is: how do we know whether to believe the
foaf:tipjar property claim in http://danbri.org/foaf.rdf and buy danbri
a book?

Hope this makes some sense! So I think the next step is to check out
POWDER:
http://www.w3.org/TR/2008/WD-powder-primer-20081114/

I think they're using GRDDL due to the need to include quoted fragments 
of full RDF within each site 'label', something that's ugly to do in 
pure RDF (we tried in the earlier WCL design)...

cheers,

Dan


-------- Original Message --------
Subject: Re: [whatwg] Trying to work out the problems solved by RDFa
Date: Sat, 10 Jan 2009 13:51:26 +0000
From: Toby A Inkster <mail@tobyinkster.co.uk>
To: whatwg@lists.whatwg.org

Dan Brickley wrote:

> While I'm unsure about the "commercial relationship" clause quite
> capturing what's needed, the basic idea seems sound. Is there any
> provision (or plans) for applying this notion to entire blocks of
> markup, rather than just to simple hyperlinks? This would be rather
> useful for distinguishing embedded metadata that comes from the page
> author from that included from blog comments or similar.

While that might be useful for natural language processing, for RDFa
it is actually completely unneeded. The syntax of RDFa allows for
blocks of markup to be made "invisible" by making an ancestor node
into an XMLLiteral.

For example, a comment might be marked up as:

<section typeof="atom:Entry" xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:atom="http://bblfish.net/work/atom-owl/2006-06-06/#">
   <address rel="atom:author">
     On <time property="atom:published" content="2009-01-10"
     >10 Jan 2009</time>,
     <a property="foaf:name" rel="foaf:page"
     href="http://joe.example.com">Joe Bloggs</a> wrote:
   </address>
   <div rel="atom:content">
     <blockquote property="atom:xhtml">
       <!-- The comment goes here. -->
     </blockquote>
   </div>
</section>

The RDFa processing instructions say that as the blockquote doesn't
have an explicit datatype set, it is to be treated entirely as a
string literal (if it doesn't have any child elements) or an XML
literal (if it does), and that parsers must not look inside it for
triples. Thus spammers can't use the comment form for stuffing
triples into the page.
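
The "must not look inside" rule can be sketched in a few lines of Python.
This is a simplified illustration, not an RDFa parser: it only models the
decision about whether to descend into an element, it assumes the
rdf:XMLLiteral datatype is written with the "rdf:" prefix, and the
function name and sample markup are made up for the example.

```python
import xml.etree.ElementTree as ET


def elements_with_triples(root):
    """Return the elements a simplified RDFa-style walker would inspect."""
    visited = []
    stack = [root]
    while stack:
        el = stack.pop()
        visited.append(el)
        # An element yields an XML literal when it has @property, no @content,
        # child elements, and either no @datatype or an explicit XMLLiteral one.
        is_xml_literal = (
            "property" in el.attrib
            and "content" not in el.attrib
            and len(el) > 0
            and el.attrib.get("datatype", "rdf:XMLLiteral") == "rdf:XMLLiteral"
        )
        if is_xml_literal:
            # The whole subtree is the literal's value; do not descend.
            continue
        # Otherwise keep walking the children in document order.
        stack.extend(reversed(list(el)))
    return visited


# Made-up sample: a page title plus a comment in which a spammer has
# tried to smuggle a property attribute.
SAMPLE = (
    '<div>'
    '<p property="ex:title">Post</p>'
    '<blockquote property="atom:xhtml">'
    '<span property="ex:spam">buy now</span>'
    '</blockquote>'
    '</div>'
)

found = [
    el.get("property")
    for el in elements_with_triples(ET.fromstring(SAMPLE))
    if el.get("property")
]
print(found)
```

The span inside the blockquote is never visited, so its property
attribute cannot contribute a triple.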

It should be noted in this case that RDFa also allows natural
language parsers to be made more useful. By looking at the RDFa which
marks up the author's name and website, they may be able to determine
that the comment has been written by someone other than the page's
main author, and thus not afford it the same level of trust granted
to the rest of the page. So the natural language processing can
benefit from RDFa.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
Received on Saturday, 10 January 2009 15:02:14 UTC