RDFa DOM API remote document loading

We have an issue related to RDFa data and CORS.

Assumptions:

1. It would be really nice if we could follow-your-nose using the RDFa
   DOM API. This would lead to some very cool applications that can
   perform auto-discovery of more facts by extracting RDFa in the
   local document and then loading remote URLs to discover more data.
2. This is impossible to do in JavaScript unless the remote document
   is CORS-enabled.
3. This is impossible to do securely in a native browser implementation
   because of XSS RDFa data-based attacks.
4. Most web developers aren't going to CORS-enable their documents.
5. It would be very bad for us to ignore the XSS RDFa data-based
   attacks. Even if we bypass CORS and only return triples back to the
   JavaScript environment, we still have the issue that some very
   important triples (real name, friends lists, bank account numbers,
   etc.) are extractable via RDFa.

Initially, we had been concerned about loading @profile documents. This
was somewhat mitigated by depending on CORS to fetch a @profile
document... but what's to stop someone from using the DOM to insert a
@profile into a bank page, and then asking the browser to parse the
triples from the @profile page? Granted, this is an example of a bad
implementation, and we should prevent people from implementing @profile
this way: triples should never leak into the current document from
other documents.

So, we can either disallow remote loading of RDFa documents, which
would make this a non-issue.

Or, we could think about how to allow remote loading of RDFa documents,
without CORS, in a way that still protects the data on the remote site.

RDFa is in a unique position - the rest of the web needs CORS because of
all of the data that is already available as HTML. However, RDFa is just
starting and all the RDFa markup is fairly new.

We could introduce a new attribute that would basically mean "if the
originating site isn't the same as this site, hide this information".
Let's call it @remoteaccess.
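
To make "isn't the same as this site" concrete, here is a rough sketch of
the check a User Agent might run before deciding whether to hide
anything. The function name isCrossOrigin is made up for illustration,
and comparing origins (scheme, host, port) is only my assumption about
what "same site" should mean here.

```javascript
// Hypothetical helper, not part of any RDFa API draft.
// Assumption: "same site" means "same origin" (scheme + host + port).
function isCrossOrigin(requestingPageUrl, remoteDocumentUrl) {
  // The standard URL class exposes the serialized origin of each URL.
  const requester = new URL(requestingPageUrl).origin;
  const remote = new URL(remoteDocumentUrl).origin;
  return requester !== remote;
}
```

A page on http://example.org/ asking for http://foobar.org/remote would
count as cross-origin under this check; a page on http://foobar.org/
asking for the same document would not.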

So, for the following markup at http://foobar.org/remote:

<div about="#me">
...
   <span property="foaf:nick">yakko</span>
...
   <div remoteaccess="disallow">
   ...
   </div>
...
</div>

This would effectively mean that any triples generated inside the
innermost div, the one with the @remoteaccess attribute set to
"disallow", won't be exposed by the User Agent. So, if one were to do
this from a page on a site like http://example.org/:

document.data.parse("http://foobar.org/remote");

The following triple would be placed into the local data store:

<http://foobar.org/remote#me> foaf:nick "yakko" .

but any triple in the disallowed section wouldn't be placed in the local
data store.
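
As a rough sketch of the filtering a User Agent might do, here is a toy
extractor over a simplified node tree (plain objects, not a real DOM or
a real RDFa processor). Everything in it is an assumption for
illustration: the extractTriples function, the node shape, and the
foaf:name value standing in for the elided content of the disallowed
div.

```javascript
// Toy model of the markup above. The foaf:name entry is a made-up
// stand-in for whatever sits inside the remoteaccess="disallow" div.
const remoteDoc = {
  attrs: { about: "#me" },
  children: [
    { attrs: { property: "foaf:nick" }, text: "yakko" },
    {
      attrs: { remoteaccess: "disallow" },
      children: [
        { attrs: { property: "foaf:name" }, text: "Hypothetical Name" },
      ],
    },
  ],
};

// Hypothetical, heavily simplified extractor: only @about and @property
// with literal text are handled. When the request is cross-origin, any
// subtree marked remoteaccess="disallow" is skipped entirely.
function extractTriples(node, baseUrl, crossOrigin, subject = baseUrl, out = []) {
  const attrs = node.attrs || {};
  if (crossOrigin && attrs.remoteaccess === "disallow") {
    return out; // hide this subtree from the requesting site
  }
  if (attrs.about !== undefined) {
    subject = new URL(attrs.about, baseUrl).href; // resolve "#me" against the base
  }
  if (attrs.property !== undefined) {
    out.push({ subject, predicate: attrs.property, object: node.text || "" });
  }
  for (const child of node.children || []) {
    extractTriples(child, baseUrl, crossOrigin, subject, out);
  }
  return out;
}

// Cross-origin request (e.g. from http://example.org/): only foaf:nick survives.
const remoteView = extractTriples(remoteDoc, "http://foobar.org/remote", true);
// Same-origin request: the disallowed section is visible as usual.
const localView = extractTriples(remoteDoc, "http://foobar.org/remote", false);
```

In this sketch the decision is made while walking the tree, so the
hidden triples never reach the script environment at all, rather than
being stripped out of the data store afterwards.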

Thoughts?

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.

Received on Tuesday, 18 May 2010 22:40:09 UTC