Re: Fragment Identifiers and Agent Perspectives from Martin J. Dürst on 2011-10-11 (www-tag@w3.org from October 2011)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Tue, 11 Oct 2011 10:22:33 +0900
To: Manu Sporny <msporny@digitalbazaar.com>
CC: W3C TAG <www-tag@w3.org>
Message-ID: <4E939A59.8020905@it.aoyama.ac.jp>

This is in some way a follow-up on Roy's message, with some more 
details. But I started it yesterday, so it's not a reply to Roy's message.

On 2011/10/08 7:32, Manu Sporny wrote:
> Jonathan Rees and I have been having an ongoing discussion about
> fragment identifiers, RDFa and semantics for the better part of a year
> now. Every now and then the TAG requests that the RDFa WG do something
> about it, we have a very long discussion about it, and resolve to not do
> anything because it's not in the RDFa/RDFWA charter to make changes to
> specs like RFC 3023-bis, RFC 3986, and HTML5.
>
> Our latest discussion touched on HTTP Range-14, RFC 3023-bis, and how
> pedants should be able to follow their nose from a Media Type
> registration to how a fragment identifier should be interpreted. This is
> complicated further by the way that the RDFa Core specification is
> designed. Much like XML namespaces, xml:id, and ARIA - it's just a bunch
> of attributes and processing rules. There is no Media Type registration
> for it and thus the only way to follow your nose back to the RDFa Core
> spec is through the other specs that integrate it into the language.
>
> For SVG Tiny it is:
>
> image/svg+xml ->  SVG Tiny 1.2 spec ->  XHTML+RDFa 1.1
>
> For HTML5, it could be (with a few changes to the spec):
>
> text/html Media Type ->  HTML5 ->  HTML+RDFa 1.1 ->  RDFa Core 1.1
>
> Our latest discussion resolved to push many of these issues back to the
> TAG because we are in no position to actually make the necessary changes
> - it's just not in our charter:
>
> http://www.w3.org/2010/02/rdfa/meetings/2011-10-06#resolution_1
> http://www.w3.org/2010/02/rdfa/meetings/2011-10-06#resolution_2
> http://www.w3.org/2010/02/rdfa/meetings/2011-10-06#resolution_3
>
> Part of the reason that we're going to this trouble is to finally
> establish what a fragment identifier means when used in a document.
> Jonathan stated that the TAG may update RFC 3986 to clarify what a
> fragment identifier means - that's a good idea.

The TAG cannot update RFC 3986. The IETF can update it. Of course, 
members of the TAG can help with that update. But given it's an Internet 
Standard, and given the update seems to be just about a few words, and 
given the goal is that even "pedants should be able to follow their 
nose", I'd expect quite a bit of resistance (if only passive, but that 
might be enough).

> You're also going to
> have to make sure that all specs utilizing RDFa update the Media Type
> registrations to achieve the spec-to-spec jumping that is required to
> understand how a fragment identifier is interpreted.

Is it necessary to update the registration? Isn't it enough if it's in 
the relevant spec? [Please note that Media Type registration is now 
under revision (see 
http://tools.ietf.org/html/draft-freed-media-type-regs-01), so it would 
be good to bring up any issues with its authors or on the relevant 
mailing list.]

In the limit, if I have a registration for an XML-based media type, and 
that registration points to a spec, and that spec says that it's okay to 
have foreign elements/attributes (and says or implies that the semantics 
of these elements/attributes apply), and these foreign 
elements/attributes have a spec that's easy to find (e.g. via a 
namespace page) and that spec says how to treat some of the fragment 
identifiers, and isn't in conflict with the main spec, then the chain of 
reference should work, shouldn't it, even for pedants?
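
The "follow your nose" chain can be sketched as a reachability check over 
spec references. This is only an illustration: the REFERENCES table is a 
hypothetical data structure that encodes the two chains from the quoted 
message (the text/html one being the chain that "could be" in place after 
spec changes), and reachable() is a made-up helper, not part of any 
registry or tool.

```python
# Hypothetical reference graph: each entry lists the specs that a
# registration or spec points to. The edges mirror the chains quoted above.
REFERENCES = {
    "image/svg+xml": ["SVG Tiny 1.2"],
    "SVG Tiny 1.2": ["XHTML+RDFa 1.1"],
    "text/html": ["HTML5"],
    "HTML5": ["HTML+RDFa 1.1"],
    "HTML+RDFa 1.1": ["RDFa Core 1.1"],
}


def reachable(start, target, references):
    """Return True if target can be reached from start by following references."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Specs with no outgoing references simply end the chain.
        stack.extend(references.get(current, []))
    return False


print(reachable("text/html", "RDFa Core 1.1", REFERENCES))
print(reachable("image/svg+xml", "RDFa Core 1.1", REFERENCES))
```

With the edges as listed, the text/html chain reaches RDFa Core 1.1, while 
the image/svg+xml chain stops at XHTML+RDFa 1.1. Whether a reference made 
only in a spec (rather than in the registration) counts as an edge is 
exactly the question raised above.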

> In an ideal world, I would like fragment identifiers to be interpreted
> in the same way that they're interpreted in HTML and XHTML - that is,
> they identify a fragment of the document /or/ a concept in a document.
>
> The key word above being "OR". Jonathan Rees has called this "dual use"
> in the past. We've been grasping for terminology to use when describing
> how to interpret fragment identifiers. This may help lay a foundation:
>
> What a fragment identifier means is dependent on the Agent's
> Perspective. The Agent could be a User Agent, or it could be Semantic
> Agent. How the fragment identifier is interpreted is based entirely on
> who is asking the question.

As Noah has said, that seems to be a bad idea, because any kind of 
agent could do anything.

> That is, what the Agent sees is entirely dependent on who they are. For
> example, when a User Agent sees a fragment identifier, they're looking
> for a portion of a document to jump to. When a Semantic Agent sees a
> fragment identifier, they're looking for a concept contained in the
> document. Therefore, for this URL:
>
> http://example.com/foo#bar
>
> A User Agent processing an HTML5 document would be looking for id="bar".
>
> A Semantic Agent processing an HTML5+RDFa document would be looking for
> the concept described using about="#bar".

This is true to the extent that agents (I don't like this word 
capitalized, sorry) may be only interested in a subset of fragment 
identifiers, or may only be able to handle a subset of fragment 
identifiers. But the question of what each piece of software can deal 
with should be separate from the question of what the fragment 
identifier 'means'.
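
To make the two lookups concrete, here is a minimal sketch using only the 
Python standard library. It assumes a simplified model: the "user agent" 
lookup matches id= against the fragment, and the "semantic" lookup matches 
about= against "#" plus the fragment as a literal string (real RDFa 
processing resolves about= against the document base, which is skipped 
here). FragmentFinder and find_fragment are hypothetical names for this 
illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class FragmentFinder(HTMLParser):
    """Collects the tags that match a fragment via id= and via about=."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.by_id = []     # what a browser-like agent would look for
        self.by_about = []  # what an RDFa-like agent would look for

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id") == self.fragment:
            self.by_id.append(tag)
        # Simplification: compare the literal attribute value only.
        if attributes.get("about") == "#" + self.fragment:
            self.by_about.append(tag)


def find_fragment(url, html):
    """Split off the fragment of url and run both lookups over html."""
    _, fragment = urldefrag(url)
    finder = FragmentFinder(fragment)
    finder.feed(html)
    finder.close()
    return finder.by_id, finder.by_about


doc = '<p id="bar">section</p><div about="#bar">concept</div>'
by_id, by_about = find_fragment("http://example.com/foo#bar", doc)
print(by_id, by_about)
```

In this document the same fragment leads the two lookups to different 
elements. That shows what each piece of software can find; it does not by 
itself settle what the identifier means.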

> There are times where someone could do the following:
>
> <div id="bar" about="#bar">...</div>

If you subscribe to the theory that these identify two different things, 
then RFC 3986 clearly says this is a bad idea. I don't think there's 
much more we can say.

> However, given the definition above for Agent Perspectives - the meaning
> of the fragment identifier is clarified by who is asking the question.
> The answer, in both cases, is crystal clear - but not the same - and
> that's perfectly fine.
>
> Now, people may argue that having an entirely different set of
> identifiers for semantic purposes would have been better. For example,
> if we could go back 20 years and do this:
>
> http://example.com/foo@bar
> http://example.com/foo#bar
>
> We would know that @bar is a semantic concept identifier and #bar is a
> document fragment identifier. Unfortunately, that's not the reality that
> we have. We started overloading fragment identifiers for the Semantic
> Web a long time ago and we can't change that practice now without
> breaking the Semantic Web. However - because we didn't also overload @id
> in RDFa, we have a clear path forward using the concept of an Agent
> Perspective.

That's a possibility. But maybe if we had two different fragment 
identifiers, we'd have come up with problems where it would have been 
better to have three or four different fragment identifiers. We already 
had the JavaScript example.

One of the very basic ideas of URIs/IRIs is that there's a single space 
so that overlapping usages are possible when they make sense (even if 
they don't always). I think that also applies to fragment 
identifiers. It's easily possible to have some static fragment 
identifiers for the case that JavaScript isn't active, but use 
JavaScript if it's active.

Regards,    Martin.


> Is there anywhere that this conceptual framework falls down? For
> example, RDF/XML? or XML+RDFa? If not, I believe that this is the
> correct conceptual framework for fragment identifier interpretation for
> the Web.
>
> -- manu
>
Received on Tuesday, 11 October 2011 01:23:13 UTC