Preliminary HTML Note

I created the HTML Note [1] with two sections: the first on embedding metadata in a script tag, and the second on extracting tabular data from HTML tables. This is a sketch of how such an implementation could work; it does not give a detailed procedure for processing HTML tables, but I think the intention is clear. (It also uses the experimental 2016 W3C style.)
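
To make the two sections concrete, here is a rough Python sketch (not the note's procedure, and not what rdf-tabular does). It collects JSON from script elements and reads the rows of one table selected by its id. The media type "application/csvm+json", the sample document, and the class name TableNoteParser are my assumptions for illustration; nested tables, rowspan/colspan, and thead/tbody handling are ignored.

```python
import json
from html.parser import HTMLParser

# Assumed media type for embedded metadata; adjust to whatever the note specifies.
METADATA_TYPE = "application/csvm+json"


class TableNoteParser(HTMLParser):
    """Collect embedded metadata scripts and the rows of one table chosen by id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.metadata = []       # parsed JSON from matching <script> elements
        self.rows = []           # list of rows, each a list of cell strings
        self._script = None      # text chunks of the current script, or None
        self._in_table = False
        self._cell = None        # text chunks of the current cell, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == METADATA_TYPE:
            self._script = []
        elif tag == "table" and attrs.get("id") == self.table_id:
            self._in_table = True
        elif self._in_table and tag == "tr":
            self.rows.append([])
        elif self._in_table and tag in ("th", "td"):
            if not self.rows:    # tolerate a cell with no enclosing <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "script" and self._script is not None:
            self.metadata.append(json.loads("".join(self._script)))
            self._script = None
        elif tag == "table":
            self._in_table = False
        elif tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._script is not None:
            self._script.append(data)
        elif self._cell is not None:
            self._cell.append(data)


# Hypothetical sample document, not taken from the note.
HTML = """
<html><body>
<script type="application/csvm+json">
{"@context": "http://www.w3.org/ns/csvw", "url": "#countries"}
</script>
<table id="countries">
  <tr><th>code</th><th>name</th></tr>
  <tr><td>AD</td><td>Andorra</td></tr>
  <tr><td>AE</td><td>United Arab Emirates</td></tr>
</table>
</body></html>
"""

parser = TableNoteParser("countries")
parser.feed(HTML)
parser.close()

# The first row is treated as column titles, the rest as data rows,
# mirroring where titles and row data come from in a CSV file.
titles, data_rows = parser.rows[0], parser.rows[1:]
```

Treating the first row as titles is a simplification; a real implementation would presumably distinguish th from td cells.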

Because fragment identifiers are used to identify the tables, generating URLs for JSON and RDF output creates a conflict between the table fragment and the row and/or property fragments. As a best practice, set aboutUrl, propertyUrl, and valueUrl so that they do not depend on fragment-based URL generation, and use minimal rather than standard processing.
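
The conflict can be illustrated with ordinary URL resolution. The Python sketch below assumes that a row or property URL is produced by resolving a fragment-only reference against the table URL; the example.org URLs, the aboutUrl template, and the expand helper are hypothetical, and expand is only a stand-in for real URI Template expansion.

```python
from urllib.parse import urljoin, urldefrag

# The table itself is identified by a fragment of the HTML document.
table_url = "http://example.org/doc.html#countries"

# Resolving a fragment-only reference replaces the base fragment,
# so the table identifier is lost from the generated URL.
property_url = urljoin(table_url, "#name")


def expand(template, **variables):
    """Minimal stand-in for URI Template expansion: plain {var} substitution."""
    for key, value in variables.items():
        template = template.replace("{" + key + "}", str(value))
    return template


# Hypothetical aboutUrl template that does not rely on fragments at all.
about_template = "http://example.org/countries/{code}"
about_url = expand(about_template, code="AD")
```

Here property_url no longer mentions the "countries" table, while about_url is unaffected because it never uses a fragment.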

The content of the note can itself be processed to extract the JSON and RDF shown in its final two examples. This is implemented in an unreleased version of my rdf-tabular Ruby gem [2]. Apart from some assumptions about when an HTML document is parsed for metadata (no fragment identifier) or for tabular data content (a fragment identifier naming a particular table), the extra work to get content from HTML tables is limited to the same places where we extract titles and row data from CSV.
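
The fragment-based assumption can be sketched as a small dispatch function. This is a Python illustration only, not code from the gem; the function name processing_target and the returned labels are hypothetical.

```python
from urllib.parse import urldefrag


def processing_target(url):
    """Decide what to look for in an HTML document based on its URL.

    No fragment: treat the document as a source of embedded metadata.
    Fragment: treat it as tabular data from the table with that id.
    """
    base, fragment = urldefrag(url)
    if fragment:
        return ("table", base, fragment)
    return ("metadata", base, None)


metadata_case = processing_target("http://example.org/doc.html")
table_case = processing_target("http://example.org/doc.html#countries")
```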

Suggestions on improving the narrative content and technical details are welcome.

Gregg Kellogg
gregg@greggkellogg.net

[1] http://w3c.github.io/csvw/html-note/
[2] https://github.com/ruby-rdf/rdf-tabular

Received on Saturday, 28 November 2015 22:30:29 UTC