Fragment Identifiers and Agent Perspectives

Jonathan Rees and I have been having an ongoing discussion about
fragment identifiers, RDFa and semantics for the better part of a year
now. Every now and then the TAG requests that the RDFa WG do something
about it, we have a very long discussion about it, and resolve to not do
anything because it's not in the RDFa/RDFWA charter to make changes to
specs like RFC 3023-bis, RFC 3986, and HTML5.

Our latest discussion touched on httpRange-14, RFC 3023-bis, and how
pedants should be able to follow their nose from a Media Type
registration to how a fragment identifier should be interpreted. This is
complicated further by the way that the RDFa Core specification is
designed. Much like XML namespaces, xml:id, and ARIA, it's just a bunch
of attributes and processing rules. There is no Media Type registration
for it, and thus the only way to follow your nose back to the RDFa Core
spec is through the other specs that integrate it into a host language.

For SVG Tiny it is:

image/svg+xml -> SVG Tiny 1.2 spec -> XHTML+RDFa 1.1

For HTML5, it could be (with a few changes to the spec):

text/html Media Type -> HTML5 -> HTML+RDFa 1.1 -> RDFa Core 1.1

Our latest discussion resolved to push many of these issues back to the
TAG because we are in no position to actually make the necessary changes
- it's just not in our charter:

http://www.w3.org/2010/02/rdfa/meetings/2011-10-06#resolution_1
http://www.w3.org/2010/02/rdfa/meetings/2011-10-06#resolution_2
http://www.w3.org/2010/02/rdfa/meetings/2011-10-06#resolution_3

Part of the reason that we're going to this trouble is to finally
establish what a fragment identifier means when used in a document.
Jonathan stated that the TAG may update RFC 3986 to clarify what a
fragment identifier means - that's a good idea. You're also going to
have to make sure that all specs utilizing RDFa update the Media Type
registrations to achieve the spec-to-spec jumping that is required to
understand how a fragment identifier is interpreted.

In an ideal world, I would like fragment identifiers to be interpreted
in the same way that they're interpreted in HTML and XHTML - that is,
they identify a fragment of the document /or/ a concept in a document.

The key word above being "OR". Jonathan Rees has called this "dual use"
in the past. We've been grasping for terminology to use when describing
how to interpret fragment identifiers. This may help lay a foundation:

What a fragment identifier means is dependent on the Agent's
Perspective. The Agent could be a User Agent, or it could be a Semantic
Agent. How the fragment identifier is interpreted is based entirely on
who is asking the question.

That is, what the Agent sees is entirely dependent on who they are. For
example, when a User Agent sees a fragment identifier, they're looking
for a portion of a document to jump to. When a Semantic Agent sees a
fragment identifier, they're looking for a concept contained in the
document. Therefore, for this URL:

http://example.com/foo#bar

A User Agent processing an HTML5 document would be looking for id="bar".

A Semantic Agent processing an HTML5+RDFa document would be looking for
the concept described using about="#bar".

There are times where someone could do the following:

<div id="bar" about="#bar">...</div>

Even then, given the definition above for Agent Perspectives, the
meaning of the fragment identifier is clarified by who is asking the
question. The answer, in both cases, is crystal clear - but not the
same - and that's perfectly fine.
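
To make the two perspectives concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions, not how any real
browser or RDFa processor works: the function name resolve_fragment and
the "user" / "semantic" labels are made up for this example, and the
Semantic Agent is simplified to a literal match on about="#bar" (a real
RDFa processor would resolve the value against the base IRI and consider
other subject-setting attributes).

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class _AttributeFinder(HTMLParser):
    """Collects the names of start tags whose attribute equals a value."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr
        self.value = value
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get(self.attr) == self.value:
            self.matches.append(tag)


def resolve_fragment(url, document, perspective):
    """Return the tag names a given kind of agent would land on.

    Hypothetical helper: 'user' looks for id="<fragment>", while
    'semantic' looks for about="#<fragment>" (a deliberate
    simplification of RDFa subject resolution).
    """
    _, fragment = urldefrag(url)
    if perspective == "user":
        attr, value = "id", fragment
    elif perspective == "semantic":
        attr, value = "about", "#" + fragment
    else:
        raise ValueError("unknown perspective: %r" % perspective)
    finder = _AttributeFinder(attr, value)
    finder.feed(document)
    return finder.matches


# Same URL, same document, two different (and both valid) answers.
doc = '<p id="bar">A section</p><div about="#bar">A concept</div>'
url = "http://example.com/foo#bar"
user_view = resolve_fragment(url, doc, "user")
semantic_view = resolve_fragment(url, doc, "semantic")
```

With the sample document above, the User Agent perspective lands on the
element carrying id="bar" and the Semantic Agent perspective lands on the
element carrying about="#bar". When both attributes sit on the same
element, as in the div example earlier, both perspectives simply return
that one element.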

Now, people may argue that having an entirely different set of
identifiers for semantic purposes would have been better. For example,
if we could go back 20 years and do this:

http://example.com/foo@bar
http://example.com/foo#bar

We would know that @bar is a semantic concept identifier and #bar is a
document fragment identifier. Unfortunately, that's not the reality that
we have. We started overloading fragment identifiers for the Semantic
Web a long time ago and we can't change that practice now without
breaking the Semantic Web. However - because we didn't also overload @id
in RDFa, we have a clear path forward using the concept of an Agent
Perspective.

Is there anywhere that this conceptual framework falls down? For
example, with RDF/XML or XML+RDFa? If not, I believe that this is the
correct conceptual framework for fragment identifier interpretation for
the Web.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
Standardizing Payment Links
http://manu.sporny.org/2011/payment-links/

Received on Friday, 7 October 2011 22:32:31 UTC