Re: recommended pattern for markup-valued 'breadcrumb' properties in RDFa

Hi Dan, all,

I think these are all good examples of what's possible. They show that
with full RDFa, you can achieve fairly compact forms that capture a
lot of information.

In fact, even within the constraints of Lite, we can make use of an
old, deprecated friend if we want to:

    <ul property="breadcrumb" typeof="rdf:Seq">
      <li property="rdf:_1" typeof>... url and name ...</li>
      <li property="rdf:_2" typeof>... url and name ...</li>
    </ul>

But I don't think we want to.. :)

Anyway, this has been mostly about what might be palatable for an
author, depending on taste and motives (ranging from "throw something
out there" to "mark up as much information as possible").

But the important question is a different one: what is the breadcrumb property
of a page intended for, in terms of how consumers are supposed to use
the data? I wonder what needs there are here for rich semantics? For
instance, I can't really think of many practical cases for doing
SPARQL queries over huge datasets of (disjoint) interlinked breadcrumb
resources. :) And if I want to expose a rich and complex set of
interrelated, hierarchical pages, I would name the links between the
pages (in each distinct page) using e.g. DC or SIOC. I *may* do that
within a breadcrumb block, but if so I'd use e.g. dc:isPartOf, and not
the instrumental concept of the breadcrumb.

So for capturing the digested, solid form of a breadcrumb, I think Dan
is spot on: the rdf:HTML form is apt for the job. And I do think that
@datatype="rdf:HTML" can be palatable enough for the authors to throw
it in if prescribed by schema.org (apart from the fact that it's not RDFa Lite).

Then on to consuming that data (e.g. parsing the HTML fragment afterwards
for doing something special in a service). For the relative links,
just capture the web page URL as well, so they can be resolved against
that later on. I suggest:

    <body vocab="http://schema.org/" typeof="WebPage">
      ...
      <nav property="breadcrumb" datatype="rdf:HTML">
        ...

By putting the @typeof on the <body> and/or adding an explicit
@resource="", the subject of the breadcrumb statement will be the URL
of the current page – as it should be (the page is the WebPage)! Since
that is then stored, just resolve the @href values later on against it, and
you'll have the full URLs. Or if really necessary, add:

    <link property="url" href="">

(within the @typeof="WebPage" block) and resolve against the resulting base URL.
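
A minimal sketch of that later resolution step, assuming the consumer has
stored the page URL next to the rdf:HTML literal. The names LinkCollector
and resolve_breadcrumb, and the example.com URL, are made up for
illustration and are not part of any RDFa toolkit:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> elements in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def resolve_breadcrumb(page_url, html_literal):
    """Return ordered (absolute_url, name) pairs for a captured breadcrumb."""
    parser = LinkCollector()
    parser.feed(html_literal)
    parser.close()
    # Relative hrefs are resolved against the stored page URL (hypothetical).
    return [(urljoin(page_url, href), name) for href, name in parser.links]


page_url = "http://example.com/shop/item.html"  # assumed subject of the statement
literal = ('<a href="category/books.html">Books</a> &gt; '
           '<a href="category/books-classics">Classics</a>')
crumbs = resolve_breadcrumb(page_url, literal)
```

The order of the pairs follows the order of the links in the fragment, so
the breadcrumb sequence is kept without needing an RDF list.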

As for language, either recommend an explicit @lang within the literal, e.g.:

    <nav property="breadcrumb" datatype="rdf:HTML">
      <ul lang="en">...

Or the addition of:

    <meta property="inLanguage" content="en">

and an assumption that captured markup is also in that language.
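
A rough sketch of how a consumer might combine the two, assuming it simply
takes the first @lang found in the captured fragment and otherwise falls
back to the page's inLanguage value. FirstLangFinder and literal_language
are illustrative names, and this ignores that @lang strictly applies only
to the element it sits on and its descendants:

```python
from html.parser import HTMLParser


class FirstLangFinder(HTMLParser):
    """Record the first non-empty lang attribute in the fragment."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if self.lang is None:
            value = dict(attrs).get("lang")
            if value:
                self.lang = value


def literal_language(html_literal, in_language=None):
    """Prefer an explicit lang in the literal, else the page-level fallback."""
    finder = FirstLangFinder()
    finder.feed(html_literal)
    finder.close()
    return finder.lang or in_language
```

For example, literal_language('<ul lang="en">...</ul>') gives "en", while a
fragment without @lang gives whatever was passed as in_language.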

Regarding @datatype not being Lite, both that and @inlist come up
from time to time. That to me raises the question: could Lite eventually
be updated? I think that should be influenced by schema.org picking
prudently from the full set of RDFa features. Which should be very
much in the spirit of letting many flowers bloom. ;)

Cheers,
Niklas

On Wed, Feb 6, 2013 at 5:48 AM, Dan Brickley <danbri@danbri.org> wrote:
> On 5 February 2013 15:10, Stéphane Corlosquet <scorlosquet@gmail.com> wrote:
>> Hi Dan,
>>
>> Like Gregg, I'm not a big fan of shoving the whole breadcrumb items into a
>> single rdf:HTML value.
>>
>> First maybe we should define what info you want to capture in the
>> breadcrumbs. I'm going to assume that you want to have the URL and the name
>> of each item, and each item should be typed with
>> http://schema.org/Breadcrumb (correct me if I'm wrong with this assumption).
>> If you're only interested in URLs or names only, the markup becomes much
>> simpler.
>
> We want all that, plus the ordering of the items too. Painful in RDF, isn't it?
>
> Dan
>
>> ## 1. @rel and no schema:url
>>
>> The simpler and shorter option involves just adding a span element inside
>> each breadcrumb item and wrapping everything with @rel and @inlist
>> attributes:
>>
>> <div vocab="http://schema.org/" typeof="WebPage">
>>     <div rel="breadcrumb" inlist="">
>>       <a typeof="Breadcrumb" href="category/books.html"><span
>> property="name">Books</span></a> >
>>       <a typeof="Breadcrumb" href="category/books-literature.html"><span
>> property="name">Literature and Fiction</span></a> >
>>     </div>
>> </div>
>>
>> which yields:
>>
>>  [ a schema:WebPage;
>>     schema:breadcrumb (<category/books.html>
>> <category/books-literature.html>)] .
>>
>> <category/books-literature.html> a schema:Breadcrumb;
>>    schema:name "Literature and Fiction" .
>>
>> <category/books.html> a schema:Breadcrumb;
>>    schema:name "Books" .
>>
>> I like this option the best because the markup is very succinct and doesn't
>> repeat any data. In this case, you don't get an explicit schema:url, but
>> instead you get this value from the URI of each breadcrumb resource (no
>> blank nodes either!).
>>
>>
>> ## 2. no @rel and no schema:url
>>
>> Like Gregg said, you can also assert @property for each item if you want to
>> avoid a wrapping @rel. This example yields the same output as the previous
>> one:
>>
>> <div vocab="http://schema.org/" typeof="WebPage">
>>     <div>
>>       <a property="breadcrumb" typeof="Breadcrumb" inlist=""
>> href="category/books.html"><span property="name">Books</span></a> >
>>       <a property="breadcrumb" typeof="Breadcrumb" inlist=""
>> href="category/books-literature.html"><span property="name">Literature and
>> Fiction</span></a> >
>>     </div>
>> </div>
>>
>>
>> ## 3. @rel and schema:url
>>
>> If an explicit schema:url is required, it is still possible at the expense
>> of more markup:
>>
>> <div vocab="http://schema.org/" typeof="WebPage">
>>     <div rel="breadcrumb" inlist="">
>>       <span typeof="Breadcrumb"><a property="url"
>> href="category/books.html"><span property="name">Books</span></a></span> >
>>       <span typeof="Breadcrumb"><a property="url"
>> href="category/books-literature.html"><span property="name">Literature and
>> Fiction</span></a></span>
>>     </div>
>> </div>
>>
>> which yields
>>
>>  [ a schema:WebPage;
>>     schema:breadcrumb ([ a schema:Breadcrumb;
>>         schema:name "Books";
>>         schema:url <category/books.html>] [ a schema:Breadcrumb;
>>         schema:name "Literature and Fiction";
>>         schema:url <category/books-literature.html>])] .
>>
>>
>> ## 4. no @rel with schema:url
>>
>> Finally, the previous example with explicit schema:url also works with
>> inline @property attributes and gives the same output:
>>
>> <div vocab="http://schema.org/" typeof="WebPage">
>>     <div>
>>       <span property="breadcrumb" typeof="Breadcrumb" inlist=""><a
>> property="url" href="category/books.html"><span
>> property="name">Books</span></a></span> >
>>       <span property="breadcrumb" typeof="Breadcrumb" inlist=""><a
>> property="url" href="category/books-literature.html"><span
>> property="name">Literature and Fiction</span></a></span> >
>>     </div>
>> </div>
>>
>>
>> In conclusion, could you live without an explicit schema:url? It does reduce
>> the amount of markup quite a bit, which is quite crucial in the context of
>> breadcrumbs where there can be a lot of items.
>>
>> Re. Egor's proposal where each child item is wrapped into its parent, I'm
>> not sure the HTML for that is very intuitive, I'd prefer to just have a flat
>> bunch of elements rather than nesting them in HTML, it's less error prone
>> IMO. His first argument was "Current breadcrumbs cannot be stored in an
>> unordered storage like JSON", but afaik, JSON can preserve order. The second
>> argument is about multiple breadcrumb chains, and I admit it's a valid
>> argument to have hierarchies of breadcrumb items in this scenario, but is
>> this use case very popular in reality? Could we see some examples?
>>
>> HTH,
>> Steph.
>>
>>
>>
>> On Mon, Nov 12, 2012 at 1:40 PM, Gregg Kellogg <gregg@greggkellogg.net>
>> wrote:
>>>
>>> On Nov 12, 2012, at 4:24 AM, Dan Brickley <danbri@danbri.org> wrote:
>>>
>>> > Dear RDFa WG,
>>> >
>>> > I'm looking for some advice on schema.org markup options. I hope to
>>> > join the WG shortly but wanted to start a conversation as early as
>>> > possible.
>>> >
>>> > Schema.org's markup for breadcrumbs is both popular and (currently)
>>> > broken. The issue at http://www.w3.org/2011/webschema/track/issues/10
>>> > gives some backstory, but factors include Microdata's rule for
>>> > concatenating subelements, as well as the difficulty of representing
>>> > ordered lists of link/label pairs as simple triples without complex
>>> > markup. For the purposes of this mail, I am only interested in the
>>> > RDFa 1.1 possibilities.
>>> >
>>> > Egor (cc:'d) has made a draft of a proposal for improving our design,
>>> > http://www.w3.org/wiki/WebSchemas/Breadcrumbs . This draft explores an
>>> > approach that makes explicit within the extracted graph, the ordering,
>>> > labelling and URLs from a 'breadcrumbs' section of HTML.
>>> >
>>> > I would very much like to get the RDFa WG's perspective on this issue.
>>>
>>> Well, I can give you my perspective on this issue. From a Linked Data/RDF
>>> perspective, I would expect to see breadcrumbs to give me an ordered list of
>>> links to the relevant resources, not HTML markup that has meaning only to a
>>> human.
>>>
>>> From a Microdata+RDF perspective, schema:breadcrumbs is described as a
>>> property having an ordered list of values, so that parsing the following
>>> yields a list in Turtle:
>>>
>>> <div itemscope itemtype="http://schema.org/WebPage">
>>>     <div itemprop="breadcrumb">
>>>       <a href="category/books.html">Books</a> >
>>>       <a href="category/books-literature.html">Literature and Fiction</a> >
>>>       <a href="category/books-classics">Classics</a>
>>>     </div>
>>> </div>
>>>
>>> @prefix md: <http://www.w3.org/ns/md#> .
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
>>> @prefix schema: <http://schema.org/> .
>>>
>>> <> md:item ([ a schema:WebPage;
>>>        schema:breadcrumb ("""
>>>       Books >
>>>       Literature and Fiction >
>>>       Classics
>>>     """)]);
>>>    rdfa:usesVocabulary schema: .
>>>
>>> The intention was for each link to be a URI in this list, so you could do
>>> the following, instead:
>>>
>>> <div itemscope itemtype="http://schema.org/WebPage">
>>>     <div>
>>>       <a itemprop="breadcrumb" href="category/books.html">Books</a> >
>>>       <a itemprop="breadcrumb"
>>> href="category/books-literature.html">Literature and Fiction</a> >
>>>       <a itemprop="breadcrumb" href="category/books-classics">Classics</a>
>>>     </div>
>>> </div>
>>>
>>> Which would give you:
>>>
>>> @prefix md: <http://www.w3.org/ns/md#> .
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
>>> @prefix schema: <http://schema.org/> .
>>>
>>> <> md:item ([ a schema:WebPage;
>>>        schema:breadcrumb (<category/books.html>
>>> <category/books-literature.html> <category/books-classics>)]);
>>>    rdfa:usesVocabulary schema: .
>>>
>>> In RDFa 1.1 (not Lite), you can do this with @inlist and @rel:
>>>
>>> <div vocab="http://schema.org/" typeof="WebPage">
>>>     <div rel="breadcrumb" inlist>
>>>       <a href="category/books.html">Books</a> >
>>>       <a href="category/books-literature.html">Literature and Fiction</a> >
>>>       <a href="category/books-classics">Classics</a>
>>>     </div>
>>> </div>
>>>
>>> Giving:
>>>
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
>>> @prefix schema: <http://schema.org/> .
>>>
>>> <> rdfa:usesVocabulary schema: .
>>>
>>>  [ a schema:WebPage;
>>>     schema:breadcrumb (<category/books.html>
>>> <category/books-literature.html> <category/books-classics>)] .
>>>
>>> In RDFa 1.1 Lite, you'd need to use @property and repeat both @property and
>>> @inlist on each <a>, but I don't think @inlist is officially part of RDFa 1.1
>>> Lite.
>>>
>>> > Looking at
>>> > http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#markup-fragments-and-rdfa
>>> > and http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#s-xml-literals
>>> > it seems an alternate design might be possible with RDFa. Instead of
>>> > trying to make the entire 'breadcrumb' structure explicit as a graph,
>>> > we could put the whole breadcrumb into a single property value as a
>>> > larger piece of markup. The current spec shows this example:
>>> >
>>> > <h2 property="dc:title" datatype="rdf:XMLLiteral">
>>> >  E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
>>> > </h2>
>>> >
>>> > ...presumably this will be adjusted in the HTML+RDFa world. There was
>>> > discussion in the RDF WG earlier this year towards HTMLLiteral or HTML
>>> > as a datatype;
>>> > http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0612.html
>>> > and the latest drafts now have such a datatype:
>>> >
>>> >
>>> > http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-html
>>> > http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-html
>>> > (latest public and editor's drafts seem identical)
>>>
>>> Not a fan of this use case, but I believe that our intention is to support
>>> rdf:HTML in HTML+RDFa 1.1, certainly my processor does.
>>>
>>> > "5.2 The rdf:HTML Datatype
>>> >
>>> > RDF provides for HTML content as a possible literal value. This allows
>>> > markup in literal values. Such content is indicated in an RDF graph
>>> > using a literal whose datatype is a special built-in datatype
>>> > rdf:HTML. This datatype is defined as follows[...]"
>>> >
>>> > Let's look at the older Microdata example we still publish at
>>> > schema.org. Can we talk through how this might look as an HTML
>>> > fragment?
>>> >
>>> > First, the current example:
>>> >
>>> > <body itemscope itemtype="http://schema.org/WebPage">
>>> > ...
>>> > <div itemprop="breadcrumb">
>>> >  <a href="category/books.html">Books</a> >
>>> >  <a href="category/books-literature.html">Literature & Fiction</a> >
>>> >  <a href="category/books-classics">Classics</a>
>>> > </div> ...
>>> > </body>
>>> >
>>> > Now, let's put that in RDFa 1.1, with the whole markup block as the
>>> > value of the 'breadcrumb' property:
>>> >
>>> > <body typeof="http://schema.org/WebPage">
>>> > ...
>>> > <div property="breadcrumb" datatype="rdf:HTML">
>>> >  <a href="category/books.html">Books</a> >
>>> >  <a href="category/books-literature.html">Literature & Fiction</a> >
>>> >  <a href="category/books-classics">Classics</a>
>>> > </div> ...
>>> > </body>
>>> >
>>> >
>>> > While this meets our goal of simple markup, I see a couple of
>>> > potential problems. Firstly the name of the datatype looks a little
>>> > odd from an HTML markup perspective.  Secondly, the RDF spec requires
>>> > that all supporting context, declarations and base URIs be packed into
>>> > the markup. So the relative URIs wouldn't work.
>>> >
>>> > "Any language annotation (lang="…") or XML namespaces (xmlns) desired
>>> > in the HTML content must be included explicitly in the HTML literal.
>>> > Relative URLs in attributes such as href do not have a well-defined
>>> > base URL and are best avoided."
>>> >
>>> > My conclusion so far is that our markup would have to be either
>>> >
>>> > A)
>>> > <body typeof="http://schema.org/WebPage">
>>> > ...
>>> > <div property="breadcrumb" datatype="rdf:HTML">
>>> >  <a href="http://example.com/category/books.html">Books</a> >
>>> >  <a href="http://example.com/category/books-literature.html">Literature
>>> > & Fiction</a> >
>>> >  <a href="http://example.com/category/books-classics">Classics</a>
>>> > </div> ...
>>> > </body>
>>> >
>>> > B) put base="http://example.com/" in the HTML <head>.
>>> >
>>> > From
>>> > http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#s_curieprocessing
>>> > I understand that an RDFa 1.1 parser will help by resolving relative
>>> > URI paths, but only for the values of the core RDFa attributes. Am I
>>> > correct to understand that they will not rewrite rdf:HTML markup
>>> > blocks to make URI references absolute?
>>>
>>> URI expansion comes from HTML semantics, and works with any attributes
>>> that takes a URL (although it is somewhat broken for @href and @src in
>>> HTML5).
>>>
>>> > Apologies for the long mail, but both crawl data and schema.org site
>>> > logs show that breadcrumb markup is of great interest to Web
>>> > developers, so I would like to do everything possible to explore the
>>> > design space while we still have some possibility to fine-tune the
>>> > designs at schema.org and in the RDFa/HTML spec.
>>> >
>>> > Does the direction I sketch make sense, from an RDFa WG perspective?
>>> > Is there anything we can do to make the markup easier for publishers
>>> > and developers? Would another named markup datatype that absolute-ized
>>> > relative links be feasible at this stage? Did I miss any other design
>>> > options? Would more formal requirements analysis be useful?
>>>
>>> Another possibility would be to use BNodes for each element, with
>>> schema:name and schema:url, which would give you something like the
>>> following:
>>>
>>> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
>>> @prefix schema: <http://schema.org/> .
>>>
>>> <> rdfa:usesVocabulary schema: .
>>>
>>>  [ a schema:WebPage;
>>>     schema:breadcrumb (
>>>       [a schema:Breadcrumb; schema:url <category/books.html>; schema:name
>>> "Books" ]
>>>       [a schema:Breadcrumb; schema:url <category/books-literature.html>;
>>> schema:name "Literature" ]
>>>       [a schema:Breadcrumb; schema:url <category/books-classics>;
>>> schema:name "Classics" ]
>>>   )] .
>>>
>>> That could fall out with reasonable application of @inlist and @typeof. It
>>> could work in Microdata too, with greater use of @itemscope and @itemtype.
>>>
>>> Gregg
>>>
>>> > cheers,
>>> >
>>> > Dan
>>> >
>>>
>>>
>>
>>
>>
>> --
>> Steph.
>

Received on Wednesday, 6 February 2013 22:40:45 UTC