RDFa issues from cdr on 2007-09-16 (semantic-web@w3.org from September 2007)

From: cdr <_@whats-your.name>
Date: Sat, 15 Sep 2007 21:48:33 -0400
To: semantic-web@w3.org
Message-ID: <20070916014833.GA2536@replic.net>

I've been wondering about a few things regarding RDFa for a while that I never see mentioned in the spec docs, or in the repetitive, simplistic evangelical example posts linked from rdfa.info.

I already use it, and have for some time, but I have never found another tool that can read my RDFa, so I'm not sure it's even RDFa. It would be cool if things were interoperable. Otherwise I might as well stop using HTML entirely and just switch to some minimal s-expressions / cairo thing that only runs on Linux and doesn't have the Firefox albatross on its back...

1) Is RDFa really supposed to be valid XML? Why?

I spent an hour or more trying to get my HTML parsable by triplr/raptor, Elias Torres's, and Ivan Herman's RDFa tools: adding closing tags, trailing slashes in elements without closing tags, quotes around attribute values, etc. Eventually I got bizarre errors and parser exceptions that I couldn't figure out, and I gave up on all three tools.
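For anyone hitting the same wall, here is a minimal sketch of checking well-formedness locally before handing a page to an RDFa tool. It uses only Python's standard xml.etree parser, not any of the three tools above, and the two markup strings are made-up examples rather than my real pages. It only reports where a strict XML parser gives up; it says nothing about whether the RDFa itself is right.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return None if markup parses as XML, else a short error description."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple supplied by the parser.
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


# Made-up examples: unquoted attribute and unclosed tags vs. a cleaned-up version.
tag_soup = '<div about=#me><span property="foaf:name">cdr<br></div>'
fixed = '<div about="#me"><span property="foaf:name">cdr</span><br/></div>'

for label, markup in (("tag soup", tag_soup), ("fixed", fixed)):
    print(label, "->", well_formedness_error(markup) or "well-formed")
```

Note that a strict parser will also reject things like named HTML entities when no DTD is loaded, so a clean result here is necessary but possibly not sufficient for a given tool.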

Are there any benefits to this, if it's true? (I don't consider being able to use a few strict Java/Python XML tools a benefit, as tools built on parsers that handle real-world HTML are generally more agile, lightweight, and user/developer-friendly anyway.)

2) It's impossible to exactly round-trip content within either innerHTML (due to the parsing/reserializing, which is itself slightly different in each agent, and to (X)HTML reserved chars) or the content attribute of an element (at the very least, you need to escape [>"']).

I guess if it's perfect XML, you can exactly round-trip the innerHTML using some trick or something? My pages are all twice as big as they should be because of this issue and the duplicate data in the content attribute (which is encoded twice, in a certain order..)
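To make the content-attribute half of the problem concrete, here is a rough sketch of a single layer of escaping and unescaping, using Python's html.escape and html.parser. The helper names (to_span, read_content) and the dc:title property are just placeholders for illustration. It only shows that one parser recovers the literal after one round of escaping; it does not address the innerHTML reserialization differences between agents.

```python
from html import escape
from html.parser import HTMLParser


class ContentReader(HTMLParser):
    """Collect the value of every content attribute seen in the markup."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over attribute values with entities already decoded.
        for name, value in attrs:
            if name == "content":
                self.contents.append(value)


def to_span(literal):
    # quote=True also escapes " and ', on top of &, < and >.
    return f'<span property="dc:title" content="{escape(literal, quote=True)}"></span>'


def read_content(markup):
    reader = ContentReader()
    reader.feed(markup)
    reader.close()
    return reader.contents


original = 'if a < b && c > "d" then \'e\''
markup = to_span(original)
print(markup)
print(read_content(markup) == [original])
```

The escaped form is longer than the literal, which is part of the page-size complaint above, and whitespace inside attribute values may still be treated differently by other parsers.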

How do you denote whether the content attribute is plain text, URL-encoded text, URL-encoded JSON, Base64, etc.?

I've thought about just using a triple. I'd prefer to shy away from @profile and other header stuff, since the content is often returned via AJAX and is not an entire page, but merely an RDFa chunk representing some resource.

Maybe the attribute name should be different: contentURLEncoded? contentBase64?
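A sketch of what the second idea could look like, purely as a thought experiment: contentBase64 is a hypothetical attribute name that no RDFa draft defines, and ex:payload is a placeholder property. The encoding is carried by the attribute name itself, so a fragment returned via AJAX needs no header or @profile to be decoded.

```python
import base64
from html.parser import HTMLParser

ATTR = "contentBase64"  # hypothetical attribute name, not defined by RDFa


def encode_literal(prop, text):
    # The Base64 alphabet (A-Z, a-z, 0-9, +, /, =) needs no HTML escaping.
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f'<span property="{prop}" {ATTR}="{payload}"></span>'


class Base64ContentReader(HTMLParser):
    """Collect (property, decoded text) pairs from the hypothetical attribute."""

    def __init__(self):
        super().__init__()
        self.literals = []

    def handle_starttag(self, tag, attrs):
        found = dict(attrs)
        # HTMLParser lowercases attribute names, so look up the folded name.
        payload = found.get(ATTR.lower())
        if payload is not None:
            text = base64.b64decode(payload).decode("utf-8")
            self.literals.append((found.get("property"), text))


def decode_literals(markup):
    reader = Base64ContentReader()
    reader.feed(markup)
    reader.close()
    return reader.literals


fragment = encode_literal("ex:payload", '{"q": "a < b & c > \'d\'"}')
print(fragment)
print(decode_literals(fragment))
```

One wrinkle this exposes: HTML parsers fold attribute names to lowercase, so a camelCase name like contentBase64 only survives intact in XML-mode processing. Base64 also inflates the payload by about a third, so it does not help with page size.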

Received on Sunday, 16 September 2007 01:48:48 UTC