Re: ISSUE-41: Facebook open graph protocol

On 04/22/2010 08:38 PM, Maciej Stachowiak wrote:
> Here's my best attempts to answer based on my knowledge of HTML+RDFa:

Adding some things that may not be obvious from the current state of
affairs (re: Facebook, RDFa WG, HTML5 spec, HTML+RDFa spec):

>> 1) is the combination of HTML+RDFa plus Open Graph Protocol a
>> "vendor-neutral extensions" permitted by the current draft of the
>> HTML5 specification[1]?
> 
> I don't know, and I'm not sure if that question is relevant. HTML+RDFa
> is a "vendor-neutral extension" permitted by the current HTML5 draft.
> Open Graph Protocol, as I understand it, makes use of an extension point
> provided by HTML+RDFa (the ability to embed RDF vocabularies), therefore
> the applicable standard for Open Graph Protocol is whatever HTML+RDFa
> requires for vocabularies that use its extension points.

The Open Graph Protocol is fully backwards-compatible with XHTML+RDFa
1.0 and is fully compatible with XHTML+RDFa 1.1. It specifically uses
the following RDFa 1.0 features:

1. The 'xmlns:' attribute to declare prefix mappings.
2. The 'property' attribute to declare predicates.
3. The 'content' attribute to express data.
4. The automatic 'about' attribute on HEAD to express the subject.
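
As a rough illustration of how those four features combine, here is a
minimal Python sketch. It is not a conforming RDFa processor. The class
name OGPSketch, the sample markup, and the example.org URL are made up
for this message, and the namespace URI is the one I believe the Open
Graph Protocol documentation uses, so treat it as illustrative. It
handles only <meta> elements with a prefixed 'property' value and takes
the document URL as the implied subject:

```python
from html.parser import HTMLParser


class OGPSketch(HTMLParser):
    """Collects (subject, predicate, object) triples from <meta> elements."""

    def __init__(self, document_url):
        super().__init__()
        self.subject = document_url  # 4. the implied subject is the document
        self.prefixes = {}           # prefix -> URI, from xmlns:* attributes
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # 1. xmlns:* attributes declare prefix mappings.
        for name, value in attrs.items():
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value
        # 2. 'property' names the predicate; 3. 'content' carries the data.
        prop = attrs.get("property")
        content = attrs.get("content")
        if tag == "meta" and prop and content is not None:
            prefix, sep, reference = prop.partition(":")
            if sep and prefix in self.prefixes:
                predicate = self.prefixes[prefix] + reference
                self.triples.append((self.subject, predicate, content))


# Hypothetical sample page, for illustration only.
SAMPLE = """<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
</head>
</html>"""

parser = OGPSketch("http://example.org/the-rock")
parser.feed(SAMPLE)
for triple in parser.triples:
    print(triple)
```

A real processor also has to deal with scoping, @about, @rel and the
rest of the processing rules, all of which this sketch ignores.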

After speaking with a couple of people at Facebook who are close to the
issue, I can say that sticking to these RDFa 1.0 features was a
deliberate move on their part. There are a few things currently creating
confusion in the general web community when attempting to mark up RDFa
in HTML5:

1. The validator.nu site hasn't been updated to support RDFa, AFAICT.
All of the RDFa attributes (and xmlns:) cause validation errors at the
moment.
2. The HTML+RDFa draft hasn't been updated to match the RDFa 1.1 rule
set. An updated heartbeat draft is in the works and we hope to publish
it by mid-May 2010.

Neither of these items is keeping people from adding RDFa to HTML5
documents (as defined by the latest HTML5 specification).

That said, bug-free RDFa Processors will extract triples without any
issue from well-formed HTML5 documents. Bug-free RDFa Processors will
also extract triples from badly formed HTML5 documents, but the triples
aren't guaranteed to be what the author intended. Before anyone starts
to jump on this as an issue in the design of RDFa, the same holds true
for the display of the page - when you re-order elements, the display of
the page can change. When you re-order elements, the semantic meaning of
a page can change.

In most cases, it doesn't matter... and even in the cases where it does
matter, consequences like being mis-categorized by search engines or a
drop in search rank will provide an incentive to fix the RDFa markup in
the page.

>> 2) I don't believe that HTML+RDFa changes any parsing rules, but does
>> change conformance rules.
>>
>> 2a) Can anybody confirm the above?
> 
> Correct. This section summarizes UA conformance for HTML+RDFa:
> <http://www.w3.org/TR/rdfa-in-html/#user-agent-conformance>.

Yes, exactly. One of the design goals of RDFa was to not change the
parsing rules of the host language. So, for SVG, ODF, XHTML and HTML, it
is the host language that defines the parsing rules.

There is, however, a slight gotcha for HTML+RDFa. The rules for building
the DOM/Infoset are currently inconsistent and confusing in relation to
xmlns:. The HTML+RDFa spec attempts to correct the inconsistency between
XML-mode and non-XML mode HTML5.

>> 2b) If so, conforming RDFa will be parsed into a DOM differently based
>> on the MIME type.  Does the RDFa draft make this clear?
> 
> The HTML+RDFa draft makes clear that it operates on the DOM level. But
> as far as I can tell it does not highlight the fact that xmlns:*
> attributes will produce different DOMs in text/html and
> application/xhtml+xml. 

There is a good bit of text that covers this very issue in the latest
HTML+RDFa heartbeat.

The spec notes that xmlns: attributes MUST be preserved in the DOM (this
has been in there for over 7 months):

http://www.w3.org/TR/rdfa-in-html/#conformance-criteria-for-xmlns:-prefixed-attributes

The spec also notes that the proper namespace tuples must be created in
XML and non-XML mode in HTML5 for Infoset-based processors (this was
added in the latest heartbeat draft on March 4th 2010):

http://www.w3.org/TR/rdfa-in-html/#preserving-namespaces-via-coercion-to-infoset

It also notes that the same should occur for DOM Level2-based processors
(albeit, informatively):

http://www.w3.org/TR/rdfa-in-html/#dom-level-2-based-processors

The goal here is to ensure that the same DOM is available for XML and
non-XML-mode documents as far as xmlns: is concerned. This is to ensure
that there aren't any surprises for authors when an XML-mode HTML5
document is delivered as a non-XML mode document.

We're still waiting on feedback on this particular requirement from the
browser manufacturers.

Even if the browser manufacturers reject this attempt at namespace data
structure consistency across XML-mode and non-XML-mode HTML5, the RDFa
Community has devised a mechanism to still generate the same triples in
XML-mode and non-XML-mode HTML5.
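
To make the idea concrete, here is a hedged Python sketch of one way
such a mechanism can work: look for attributes whose names start with
"xmlns:" regardless of how the document was parsed. This is my own
simplification, not text from the spec. The function names and the
sample markup are hypothetical, xml.dom.minidom stands in for an
XML-mode parser and html.parser for a non-XML-mode one, and namespace
scoping is ignored (a later declaration simply overwrites an earlier
one):

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Hypothetical sample document, well-formed in both modes.
MARKUP = (
    '<html xmlns:og="http://opengraphprotocol.org/schema/">'
    "<head><title>t</title></head></html>"
)


def prefixes_from_xml(markup):
    """XML mode: minidom exposes namespace declarations as xmlns:* attributes."""
    prefixes = {}
    for element in minidom.parseString(markup).getElementsByTagName("*"):
        for name, value in element.attributes.items():
            if name.startswith("xmlns:") and value:
                prefixes[name[len("xmlns:"):]] = value
    return prefixes


class _PrefixScanner(HTMLParser):
    """Non-XML mode: xmlns:* shows up as an ordinary attribute name."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name.startswith("xmlns:") and value:
                self.prefixes[name[len("xmlns:"):]] = value


def prefixes_from_html(markup):
    scanner = _PrefixScanner()
    scanner.feed(markup)
    return scanner.prefixes


# Both code paths should yield the same prefix mappings.
print(prefixes_from_xml(MARKUP) == prefixes_from_html(MARKUP))
```

With identical prefix mappings in hand, the rest of the triple
generation does not need to care which mode the document was parsed in.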

> A while ago I filed this bug, which is a
> consequence of not explaining the DOM difference:
> <http://www.w3.org/Bugs/Public/show_bug.cgi?id=8987>. I don't believe
> there is any bug directly asking for the HTML+RDFa spec to highlight the
> difference. I encourage you to file one if you think it's important.
> 
> RDFa Core 1.1 introduces @prefix as a way to binding CURI prefixes, and
> deprecates xmlns:* attributes. HTML+RDFa is going to get updated to be
> based on RDFa Core 1.1. This issue may get addressed as part of the
> update. For RDFa Core 1.1 changes, see here and scroll down:
> <http://www.w3.org/TR/2010/WD-rdfa-core-20100422/#s_syntax>.

The Facebook announcement is interesting because the argument for a long
time was that xmlns: isn't used very much, that there is a movement away
from DTD-based validation, and that compound documents via namespacing
were not a preferred path forward.

With the adoption of RDFa, we're seeing a very sharp up-tick in the use
of xmlns: and Facebook has furthered this by requiring the "fb"
namespace in Facebook application markup. This is in addition to their
use of xmlns: in the Open Graph Protocol that was released last week.

RDFa Core 1.1 provides @prefix for namespace-unaware languages (such as
HTML5). However, Facebook pages are HTML5 documents and there are a ton
of them that are going to be created now with xmlns: in them.
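
For comparison, here is a small Python sketch of reading mappings out of
an @prefix value. It assumes the whitespace-separated "name: URI" form
used in the current RDFa Core 1.1 draft, which may still change. The
function name is hypothetical and the namespace URIs in the usage line
are illustrative:

```python
def parse_prefix_attribute(value):
    """Split 'og: http://... fb: http://...' into a prefix -> URI mapping."""
    tokens = value.split()
    mappings = {}
    # Tokens alternate between 'name:' and the URI that name maps to.
    for name, uri in zip(tokens[0::2], tokens[1::2]):
        if name.endswith(":") and len(name) > 1:
            mappings[name[:-1]] = uri
    return mappings


mappings = parse_prefix_attribute(
    "og: http://opengraphprotocol.org/schema/ "
    "fb: http://www.facebook.com/2008/fbml"
)
print(mappings)
```

The result has the same shape as the mappings gathered from xmlns:, so a
processor can feed either source into the same CURIE resolution step.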

I still don't know how we should feel about that, but one assumption
that all of this has undermined is the notion that only a small subset
of the web uses xmlns:. I think that assumption is being eroded at an
accelerated rate, although I don't know where the tipping point will be
for browser manufacturers and this Working Group.

>> 3) Does this affect the Polyglot spec?
> 
> That depends on the goals of the Polyglot spec. If the Polyglot spec's
> goal is to only allow documents that produce an identical DOM, then the
> consequence would be that polyglot documents can't use RDFa 1.0.
> However, they could use RDFa 1.1 with @profile instead of @xmlns:*.

... and they could also use @prefix. In fact, that's why we started
discussing an alternative to xmlns: in the first place: to have an
answer ready if the HTML WG decides that xmlns: is no longer supported
and/or is deprecated.

> However, drawing a hard line on DOM differences would lead to
> disallowing xml:lang, which might not be desirable. If the polyglot spec
> allows exceptions, then Polyglot could allow RDFa 1.0 constructs, if
> that seems like a sufficiently important use case.

Right... and bottom line: the HTML+RDFa specification is designed to be
agnostic about the preservation or removal of xmlns:.

There is already a solution in RDFa if the browser vendors decide to not
harmonize the namespace tuples between XML-mode and non-XML-mode.

There is already a solution in RDFa if the HTML WG decides to reject
xmlns: completely.

These solutions are independent of one another. It will be the design
goals of this working group that will drive how HTML5+RDFa looks going
forward, not the other way around.

>> 4) The current w3c validator declares this markup to be in error[2]. 
>> Is somebody planning on updating the validator to handle RDFa?
> 
> I don't know what the validator team's plans are. Note that HTML5
> doesn't require validators to support any particular set of extensions.
> It's up to the maintainers of any given validator to decide which they
> consider to be "applicable specifications". I would hope that the
> validator.nu team offers a profile including RDFa as at least one
> validation option, whether or not it is the default.

We've been very busy with publishing the RDFa Core 1.1 and XHTML+RDFa
FPWDs. The RDFa DOM API is next, followed by HTML5+RDFa 1.1 and then the
RDFa 1.1 Primer.

Once we are fairly certain that the new RDFa 1.1 attributes are
acceptable (@prefix, @vocab, and @profile), we'll contact the validator
team and see if they are willing to create an HTML+RDFa 1.1 validator.

If they are understaffed, or are not willing to do so, for whatever
reason, we'll send a request out to the RDFa community to create a
validator for HTML5+RDFa.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: PaySwarming Goes Open Source
http://blog.digitalbazaar.com/2010/02/01/bitmunk-payswarming/

Received on Monday, 26 April 2010 20:39:00 UTC