- From: Thomas Baker <tom@tombaker.org>
- Date: Fri, 18 May 2012 01:08:12 -0400
- To: Gregg Kellogg <gregg@greggkellogg.net>
- Cc: Danny Ayers <danny.ayers@gmail.com>, Dan Brickley <danbri@danbri.org>, public-rdfa <public-rdfa@w3.org>, "hugh@hubns.com" <hugh@hubns.com>, Richard Cyganiak <richard.cyganiak@deri.org>, Jon Phipps <jphipps@madcreek.com>, Stuart Sutton <sasutton@dublincore.net>
Gregg, all,

The script currently generates RDFa [4] that says, about itself:

    <http://purl.org/dc/terms/> <http://purl.org/dc/terms/creator> "DCMI Usage Board"@en .
    <http://purl.org/dc/terms/> <http://purl.org/dc/terms/description> "This document is...etc..."@en .
    <http://purl.org/dc/terms/> <http://purl.org/dc/terms/identifier> <http://dublincore.org/documents/2012/05/21/dcmi-terms/> .
    <http://purl.org/dc/terms/> <http://purl.org/dc/terms/isVersionOf> <http://dublincore.org/documents/dcmi-terms/> .
    <http://purl.org/dc/terms/> <http://purl.org/dc/terms/issued> "2012-05-21"^^<http://www.w3.org/2001/XMLSchema#date> .
    <http://purl.org/dc/terms/> <http://purl.org/dc/terms/replaces> <http://dublincore.org/documents/2010/10/11/dcmi-terms/> .
    <http://purl.org/dc/terms/> <http://purl.org/dc/terms/title> "DCMI Metadata Terms"@en .

Issues:

-- The subject should not be http://purl.org/dc/terms/, which is just one of the four namespace IRIs in DCMI Metadata Terms, but (I guess) http://purl.org/dc/. Where is the script picking up /dc/terms/ (and not, say, /elements/1.1/), and can it be tweaked to output just /dc/?

-- When translated using the distiller [7], the other serializations of RDF end up saying about themselves that they are versions of /documents/.../dcmi-terms/. If the other serializations (RDF/XML and Turtle) are derived from the RDFa, I guess it is correct for them all to point to the RDFa document. I'm not sure this is best practice, but it seems reasonable. Any opinions about that here?

-- The distiller-derived RDF serializations also say:

    <https://raw.github.com/dublincore/website/master/build/html/dcmi-terms/index.shtml> <http://purl.org/dc/terms/tableOfContents> <https://raw.github.com/dublincore/website/master/build/html/dcmi-terms/index.shtml#contents> .
I guess the distiller, if run on the index.shtml after publication, would show correct values for the subject and object, but I'm flagging it as something that would need to be hand-edited out of any serializations generated from the RDFa before they were published to the website. Could we perhaps simply suppress the generation of this triple in the script?

I'm working with Jon Phipps on figuring out the content negotiation piece of the puzzle, and we are guardedly optimistic that we may be able to implement this before publication, which I am postponing from Monday of next week to, say, the end of next week or even longer if we're close to a solution and just need more time. We have a few extra days to work out the bugs.

One basic policy decision DCMI has taken is that we will continue, at least for now, to serve RDF/XML (or Turtle) -- not _just_ RDFa. (I'd like to hear opinions about whether Turtle should already be the new default, instead of RDF/XML.) That means that if we do not manage to get content negotiation working, we may have to point the PURLs to an RDF/XML (or Turtle) representation. The RDFa would still be there, but it would not be reachable from the PURLs -- a situation we would need to keep trying to rectify.

However, if we do get content negotiation to work, we need to decide how the RDF/XML (or Turtle) will be served. Following the pattern of [9], my initial idea was to publish one consolidated RDF schema with terms sharing all four namespace IRIs [10] at http://dublincore.org/2012/05/30/dc.rdf. However, Jon thinks this break with our decade-old practice of publishing a separate schema for each of the four namespace prefixes might be confusing to data consumers. He is proposing an approach whereby PURLs using one of the four namespace IRIs would resolve to four schemas (as they do now); he may have more to say about this idea tomorrow.
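Serving one schema per namespace IRI, as Jon proposes, implies splitting a single set of triples by the namespace of each subject. The following is only a minimal Python sketch of that splitting step, using the standard library and line-based N-Triples input. The function name `split_by_namespace`, the sample triples, and the choice to match on the subject IRI are assumptions made for illustration; it is not part of the DCMI build scripts and does not handle blank nodes or document-level metadata specially.

```python
from collections import defaultdict


def split_by_namespace(ntriples_text, namespaces):
    """Group N-Triples lines by the namespace IRI their subject starts with.

    Hypothetical helper: lines whose subject matches none of the given
    namespaces (for example blank-node subjects) are dropped.
    """
    groups = defaultdict(list)
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and N-Triples comments
        subject = line.split(None, 1)[0]  # first whitespace-delimited token
        for ns in namespaces:
            if subject.startswith("<" + ns):
                groups[ns].append(line)
                break
    return dict(groups)


# Illustrative sample data only, not output of the real script.
SAMPLE = """\
<http://purl.org/dc/terms/title> <http://www.w3.org/2000/01/rdf-schema#label> "Title"@en .
<http://purl.org/dc/elements/1.1/title> <http://www.w3.org/2000/01/rdf-schema#label> "Title"@en .
<http://purl.org/dc/terms/> <http://purl.org/dc/terms/title> "DCMI Metadata Terms"@en .
"""

NAMESPACES = [
    "http://purl.org/dc/terms/",
    "http://purl.org/dc/elements/1.1/",
]

result = split_by_namespace(SAMPLE, NAMESPACES)
for ns, lines in result.items():
    print(ns, len(lines))
```

Note that the triple describing the document itself (subject `<http://purl.org/dc/terms/>`) lands in the /dc/terms/ group under this rule, so any per-namespace output would still need its title, description, and similar metadata adjusted separately.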
If we were to hear support (e.g., from this list) for the idea of publishing four (or five) schemas, I would face the very practical problem of how to generate four separate schemas from one RDFa document. I initially considered reviving the scripts used to generate the RDF/XML schemas from the common source -- we deleted these last week and I have now retrieved them from an old commit and archived them in [14] (with header files in [15]). However, the output of these scripts would need to be tested against the output of the RDFa-generating script, and the scripts would need to be edited to produce compatible output -- not just today, but potentially in the future. This does not seem like a good idea.

I was hoping I could extract Turtle representations of terms, by namespace IRI, with something quite simple like:

    rapper -o ntriples dc.rdf | gawk '$1 ~/dc\/terms/' | rapper -i ntriples -o turtle >dcterms.ttl

The script would need some sed transforms along the way to tweak the title, description, etc., but this approach would be quick and simple and we could rest assured that it would represent the RDF content of the RDFa document accurately. On the downside, the script would not output Turtle with @prefix declarations, but would use full IRIs everywhere, making it a bit less readable.

But that is all theoretical because the script above simply does not work. Maybe someone here can say why? Are there more powerful tools that could make these transformations?

The basic question is whether we need to have four (or five!) separate RDF/XML and four (or five!) separate Turtle representations at all, or can instead serve up just one dc.rdf and/or one dc.ttl. What does everyone think?

Tom

On Fri, May 11, 2012 at 10:57:56PM -0400, Gregg Kellogg wrote:
> You can try using the "raw" mode [6], and use it in the distiller URI
> field. Just make sure you specify the "rdfa" input format. If it was
> an actual HTML file, you probably could rely on content detection.
>
> You should be able to turn the result into turtle using [7].
>
> There is a way to be able to view the file as formatted HTML, but I
> think you need to put it in a "ghpages" branch [8].
>
> > I'm a bit new to Git but proceeding carefully. Please let me know
> > if there are any problems with the merge...
> >
> > Tom
> >
> > [4] https://github.com/dublincore/website/blob/master/build/html/dcmi-terms/index.shtml
> > [5] https://github.com/RDFLib/pyrdfa3
> [6] https://raw.github.com/dublincore/website/master/build/html/dcmi-terms/index.shtml
> [7] http://rdf.greggkellogg.net/distiller?format=turtle&in_fmt=rdfa&uri=https://raw.github.com/dublincore/website/master/build/html/dcmi-terms/index.shtml
> [8] http://help.github.com/pages/

[9] http://dublincore.org/2010/10/11/dcterms.rdf#
[10] https://github.com/dublincore/website/blob/master/build/html/dcmi-terms/dc.rdf
[11] http://purl.org/dc/elements/1.1/
[12] http://purl.org/dc/terms/
[13] http://purl.org/dc/dcmi-type-vocabulary/
[13] http://purl.org/dc/dcam/
[14] https://github.com/dublincore/website/tree/master/archive/xsl-old
[15] https://github.com/dublincore/website/tree/master/archive/headers-old

--
Tom Baker <tom@tombaker.org>
Received on Friday, 18 May 2012 05:09:03 UTC