- From: Dan Brickley <danbri@danbri.org>
- Date: Mon, 12 Nov 2012 12:24:23 +0000
- To: W3C RDFa WG <public-rdfa-wg@w3.org>
- Cc: Egor Antonov <elderos@yandex-team.ru>
Dear RDFa WG,

I'm looking for some advice on schema.org markup options. I hope to join the WG shortly, but wanted to start a conversation as early as possible.

Schema.org's markup for breadcrumbs is both popular and (currently) broken. The issue at http://www.w3.org/2011/webschema/track/issues/10 gives some backstory. Contributing factors include Microdata's rule for concatenating subelements and the difficulty of representing ordered lists of link/label pairs as simple triples without complex markup. For the purposes of this mail, I am only interested in the RDFa 1.1 possibilities.

Egor (cc:'d) has drafted a proposal for improving our design, http://www.w3.org/wiki/WebSchemas/Breadcrumbs . This draft explores an approach that makes the ordering, labels and URLs from a 'breadcrumbs' section of HTML explicit within the extracted graph. I would very much like to get the RDFa WG's perspective on this issue.

Looking at http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#markup-fragments-and-rdfa and http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#s-xml-literals , it seems an alternate design might be possible with RDFa. Instead of trying to make the entire breadcrumb structure explicit as a graph, we could put the whole breadcrumb into a single property value as a larger piece of markup. The current spec shows this example:

  <h2 property="dc:title" datatype="rdf:XMLLiteral">
    E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
  </h2>

...presumably this will be adjusted in the HTML+RDFa world.
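To make the contrast concrete, here is a minimal Python sketch (standard library only) using the breadcrumb links from schema.org's current Microdata example, which appears further below. It computes (a) the plain-text value Microdata gives an itemprop on a <div>, i.e. the concatenated text of its descendants, and (b) an N-Triples line in which the same markup is kept whole as a single rdf:HTML literal. The page IRI http://example.com/page.html, and the assumption that 'breadcrumb' resolves to http://schema.org/breadcrumb (e.g. via a vocab attribute), are illustrative only.

```python
from html.parser import HTMLParser

RDF_HTML = "http://www.w3.org/1999/02/22-rdf-syntax-ns#HTML"

# Inner markup of the breadcrumb <div> from the schema.org example (whitespace simplified).
BREADCRUMB = (
    '<a href="category/books.html">Books</a> > '
    '<a href="category/books-literature.html">Literature & Fiction</a> > '
    '<a href="category/books-classics">Classics</a>'
)


class TextAndLinks(HTMLParser):
    """Collects descendant text nodes and <a href> values from a fragment."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: text arrives already unescaped
        self.text = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v is not None)

    def handle_data(self, data):
        self.text.append(data)


def parse(fragment):
    """Return (concatenated text, list of hrefs) for an HTML fragment."""
    parser = TextAndLinks()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.text), parser.hrefs


def ntriples_literal(value, datatype):
    """Serialise a typed literal for N-Triples, escaping backslash, quote and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^<{datatype}>'


if __name__ == "__main__":
    text, hrefs = parse(BREADCRUMB)
    # Microdata: an itemprop on a <div> takes the concatenated text, so the URLs are lost.
    print(text)   # Books > Literature & Fiction > Classics
    print(hrefs)  # the links survive only in the markup itself
    # RDFa with datatype="rdf:HTML": the markup is kept whole as one literal.
    # Hypothetical subject and property IRIs, for illustration only.
    print("<http://example.com/page.html> <http://schema.org/breadcrumb> "
          + ntriples_literal(BREADCRUMB, RDF_HTML) + " .")
```

Note that the hrefs inside the literal are still the relative paths from the page, which is exactly the issue raised below.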
There was discussion in the RDF WG earlier this year about HTMLLiteral or HTML as a datatype:

  http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0612.html

and the latest drafts now have such a datatype:

  http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-html
  http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-html
  (the latest public and editor's drafts seem identical)

  "5.2 The rdf:HTML Datatype
  RDF provides for HTML content as a possible literal value. This allows markup in literal values. Such content is indicated in an RDF graph using a literal whose datatype is a special built-in datatype rdf:HTML. This datatype is defined as follows[...]"

Let's look at the older Microdata example we still publish at schema.org. Can we talk through how this might look as an HTML fragment? First, the current example:

  <body itemscope itemtype="http://schema.org/WebPage">
  ...
  <div itemprop="breadcrumb">
    <a href="category/books.html">Books</a> >
    <a href="category/books-literature.html">Literature & Fiction</a> >
    <a href="category/books-classics">Classics</a>
  </div>
  ...
  </body>

Now, let's put that in RDFa 1.1, with the whole markup block as the value of the 'breadcrumb' property:

  <body typeof="http://schema.org/WebPage">
  ...
  <div property="breadcrumb" datatype="rdf:HTML">
    <a href="category/books.html">Books</a> >
    <a href="category/books-literature.html">Literature & Fiction</a> >
    <a href="category/books-classics">Classics</a>
  </div>
  ...
  </body>

While this meets our goal of simple markup, I see a couple of potential problems. Firstly, the name of the datatype looks a little odd from an HTML markup perspective. Secondly, the RDF spec requires that all supporting context, declarations and base URIs be packed into the markup, so the relative URIs wouldn't work:

  "Any language annotation (lang="…") or XML namespaces (xmlns) desired in the HTML content must be included explicitly in the HTML literal.
  Relative URLs in attributes such as href do not have a well-defined base URL and are best avoided."

My conclusion so far is that our markup would have to either:

A) use absolute URLs throughout:

  <body typeof="http://schema.org/WebPage">
  ...
  <div property="breadcrumb" datatype="rdf:HTML">
    <a href="http://example.com/category/books.html">Books</a> >
    <a href="http://example.com/category/books-literature.html">Literature & Fiction</a> >
    <a href="http://example.com/category/books-classics">Classics</a>
  </div>
  ...
  </body>

or

B) add a <base href="http://example.com/"> element to the HTML <head>.

From http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#s_curieprocessing I understand that an RDFa 1.1 parser will help by resolving relative URI paths, but only in the values of the core RDFa attributes. Am I correct to understand that it will not rewrite rdf:HTML markup blocks to make URI references absolute?

Apologies for the long mail, but both crawl data and schema.org site logs show that breadcrumb markup is of great interest to Web developers, so I would like to do everything possible to explore the design space while we still have some possibility to fine-tune the designs at schema.org and in the RDFa/HTML spec.

Does the direction I sketch make sense from an RDFa WG perspective? Is there anything we can do to make the markup easier for publishers and developers? Would another named markup datatype that absolutized relative links be feasible at this stage? Did I miss any other design options? Would more formal requirements analysis be useful?

cheers,
Dan
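For the question about a markup datatype that absolutizes relative links, here is a rough Python sketch of the extra step such a processor would need: resolve href/src values in the literal's markup against the document's base URL, which turns the relative links into the markup of option A. The Absolutizer class, the choice of href and src as the only URL-bearing attributes, and http://example.com/ as the base are illustrative assumptions, not part of any spec.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

URL_ATTRS = {"href", "src"}  # assumption: only these attributes carry URLs here


class Absolutizer(HTMLParser):
    """Re-serialises a fragment, resolving URL attributes against a base URL."""

    def __init__(self, base):
        super().__init__(convert_charrefs=False)  # keep entity references as written
        self.base = base
        self.out = []

    def _tag(self, tag, attrs, self_closing):
        if not any(name in URL_ATTRS for name, _ in attrs):
            self.out.append(self.get_starttag_text())  # no URLs: copy the tag verbatim
            return
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute with no value
                parts.append(name)
                continue
            if name in URL_ATTRS:
                value = urljoin(self.base, value)  # absolute URLs pass through unchanged
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def absolutize(fragment, base):
    """Return the fragment with href/src values made absolute against base."""
    parser = Absolutizer(base)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    crumbs = ('<a href="category/books.html">Books</a> > '
              '<a href="category/books-classics">Classics</a>')
    print(absolutize(crumbs, "http://example.com/"))
```

The design choice here is to leave every tag without a URL attribute byte-for-byte as published, and to rebuild only the tags whose links change, so the literal stays as close as possible to what the publisher wrote.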
Received on Monday, 12 November 2012 12:24:50 UTC