Re: Fragment Identifiers and Agent Perspectives from Martin J. Dürst on 2011-10-11 (www-tag@w3.org from October 2011)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Tue, 11 Oct 2011 14:15:17 +0900
To: Manu Sporny <msporny@digitalbazaar.com>
CC: W3C TAG <www-tag@w3.org>
Message-ID: <4E93D0E5.9040301@it.aoyama.ac.jp>

On 2011/10/11 12:21, Manu Sporny wrote:
> On 10/10/2011 09:22 PM, "Martin J. Dürst" wrote:
>>> Part of the reason that we're going to this trouble is to finally
>>> establish what a fragment identifier means when used in a document.
>>> Jonathan stated that the TAG may update RFC 3986 to clarify what a
>>> fragment identifier means - that's a good idea.
>>
>> The TAG cannot update RFC 3986. The IETF can update it. Of course,
>> members of the TAG can help with that update. But given it's an Internet
>> Standard, and given the update seems to be just about a few words, and
>> given the goal is that even "pedants should be able to follow their
>> nose", I'd expect quite a bit of resistance (if only passive, but that
>> might be enough).
>
> I tend to agree that this discussion can be pedantic.
>
> However, I also find myself having to explain this to Web developers
> often and when they ask me which spec states what a fragment identifier
> means on the semantic web and what a fragment identifier means on the
> document-based Web, I don't have a good place to point them for the
> semantic web case. I've pointed people to RFC 3986 before, but it
> clearly doesn't talk about how to use fragment identifiers on the
> semantic web.

And I guess that's the right thing to do. By the way, it doesn't talk
about how to use fragment identifiers on the "document web" either.
The relevant text is:

    The fragment identifier component of a URI allows indirect
    identification of a secondary resource by reference to a primary
    resource and additional identifying information.  The identified
    secondary resource may be some portion or subset of the primary
    resource, some view on representations of the primary resource, or
    some other resource defined or described by those representations.

The last line clearly works for the semantic Web, or doesn't it?


> The existence of the semantic web case makes them then question whether
> the fragment identifier section in RFC 3986 is up to date at all. I
> typically shrug and tell them that I don't know if RFC 3986 really ever
> considered the semantic web at all since it doesn't really elaborate on
> how fragment identifiers can be used to refer to sections of a document,
> application state or semantic concepts.

As Roy said, the semantic web *was* considered when RFC 3986 was
written. You can safely tell that to whoever asks you, and can check it
in the mailing list archive. The fact that RFC 3986 doesn't elaborate
much on details is a feature, not a problem. It makes RFC 3986
much more resilient to the invention of new technologies.

> So, while the discussion can be pedantic - I'm having a very hard time
> pointing people that are just learning about this stuff to a normative
> specification that talks about fragment identifiers and semantic web /
> Linked Data concepts with any authority.

Doesn't one or the other of the RDF specs say how fragment identifiers 
work for RDF? Of course, the same should be the case for HTML5 assuming 
that RDFa and/or microdata get integrated, but that's a spec-specific 
problem. And then there are best-practice-like issues like whether it's 
a good idea to use fragments for concepts, or whether it's better to use 
URIs without fragments, as has been discussed on this list. That's 
definitely something where the TAG might be involved.

>>> You're also going to
>>> have to make sure that all specs utilizing RDFa update the Media Type
>>> registrations to achieve the spec-to-spec jumping that is required to
>>> understand how a fragment identifier is interpreted.
>>
>> Is it necessary to update the registration? Isn't it enough if it's in
>> the relevant spec?
>
> The TAG has made it seem to me as if this is not enough. I can see their
> point and I don't see much harm in making the linkages between Media
> Type and their corresponding specification more clear.

RFC 3986 says:

    The semantics of a fragment identifier are defined by the set of
    representations that might result from a retrieval action on the
    primary resource.  The fragment's format and resolution is therefore
    dependent on the media type [RFC2046] of a potentially retrieved
    representation, even though such a retrieval is only performed if the
    URI is dereferenced.  If no such representation exists, then the
    semantics of the fragment are considered unknown and are effectively
    unconstrained.  Fragment identifier semantics are independent of the
    URI scheme and thus cannot be redefined by scheme specifications.

I don't see anywhere that it says that this information has to be in
the media type registration form. I also don't remember anybody from the
TAG saying that this is mandatory. If you do, could you give a
pointer? I definitely do remember that people said it would be better if
the media type registration form contained some information about
fragment identifiers. But that's different from a requirement. In many
cases, the exact workings of fragment identifiers may be too complicated
to write down in a registration form, so some text in the spec is
required anyway.


>> In the limit, if I have a registration for an XML-based media type, and
>> that registration points to a spec, and that spec says that it's okay to
>> have foreign elements/attributes (and says or implies that the semantics
>> of these elements/attributes apply), and these foreign
>> elements/attributes have a spec that's easy to find (e.g. via a
>> namespace page) and that spec says how to treat some of the fragment
>> identifiers, and isn't in conflict with the main spec, then the chain of
>> reference should work, shouldn't it, even for pedants?
>
> It works for some, doesn't for others.

Why not? If a Content-Type tells you it's FOO, and you find the spec for 
FOO with the help of the media type registry, and that spec says how to 
deal with fragment identifiers, what could possibly be wrong in actually 
implementing that?
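
As a rough illustration of that chain (a sketch, not anything taken from
a spec): the code below strips the fragment from the URI, reads the media
type out of Content-Type, and hands the fragment to whatever rule that
media type's spec defines. The table FRAGMENT_RULES, the function names,
and the two rules are hypothetical simplifications. When no rule is
known, the semantics are reported as unknown, in line with the RFC 3986
text quoted above.

```python
from urllib.parse import urldefrag


def html_rule(primary: str, fragment: str) -> str:
    # Simplified assumption: an HTML fragment names the element with that id.
    return f'element with id="{fragment}" in {primary}'


def rdf_rule(primary: str, fragment: str) -> str:
    # Simplified assumption: in RDF, the whole URI names a resource in the graph.
    return f"resource <{primary}#{fragment}> described in the graph"


# Hypothetical lookup table standing in for "find the spec via the registry".
FRAGMENT_RULES = {
    "text/html": html_rule,
    "application/rdf+xml": rdf_rule,
}


def interpret(uri: str, content_type: str) -> str:
    # Separate the primary resource from the fragment identifier.
    primary, fragment = urldefrag(uri)
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    rule = FRAGMENT_RULES.get(media_type)
    if rule is None:
        # No known representation format: semantics are unknown.
        return "unknown"
    return rule(primary, fragment)
```

Calling interpret("http://example.com/foo#bar", "text/html") and then the
same URI with "application/rdf+xml" gives two different answers for the
same fragment, which is the media-type dependence the RFC describes.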

> I think that's why we're
> discussing this. It wouldn't take much effort to do something that
> worked for everybody and provide documentation that waved its hands a
> bit less.

If this is about updating RFC 3986, you're heavily underestimating the
effort involved. If it's about HTML5, then the editing effort will be
much smaller; the main work is to convince the editor. If it's about a
TAG finding, then that may look easy, but on average, those also tend to
take quite some time. If it's about
http://tools.ietf.org/html/draft-freed-media-type-regs-01, then now is
the best chance, and you should try to take it before it's gone.

>>> What a fragment identifier means is dependent on the Agent's
>>> Perspective. The Agent could be a User Agent, or it could be a Semantic
>>> Agent. How the fragment identifier is interpreted is based entirely on
>>> who is asking the question.
>>
>> As Noah has said, that seems to be a bad idea, because any kind of
>> agent could do anything.
>
> ... but this is already happening on the Web, isn't it? That's kind of
> the point of what Roy was saying, right - anybody can do anything?

Well, yes, because it depends on the tool. Let's say somebody creates a 
tool that takes a list of URIs with fragments and puts all these 
fragments in a single document. We don't want to preclude this and 
similar stuff by too strict specifications.
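
A first step for such a tool might look like the sketch below. It only
groups the fragments by primary resource, so that each primary resource
would need to be retrieved once (the fragment is never sent to the
server). The function name group_fragments and the example URIs are made
up for illustration, and the actual extraction of each fragment is left
out because it depends on the media type.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def group_fragments(uris):
    """Map each primary resource to the fragments requested from it."""
    grouped = defaultdict(list)
    for uri in uris:
        primary, fragment = urldefrag(uri)
        if fragment:  # skip URIs that carry no fragment at all
            grouped[primary].append(fragment)
    return dict(grouped)


wanted = [
    "http://example.com/foo#bar",
    "http://example.com/foo#baz",
    "http://example.org/other#intro",
    "http://example.org/plain",
]
print(group_fragments(wanted))
```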

>>> http://example.com/foo#bar
>>>
>>> A User Agent processing an HTML5 document would be looking for id="bar".
>>>
>>> A Semantic Agent processing an HTML5+RDFa document would be looking for
>>> the concept described using about="#bar".
>>
>> This is true to the extent that agents (I don't like this word
>> capitalized, sorry) may be only interested in a subset of fragment
>> identifiers, or may only be able to handle a subset of fragment
>> identifiers. But the question of what each piece of software can deal
>> with should be separate from the question of what the fragment
>> identifier 'means'.
>
> I thought what a piece of software can do with a fragment identifier was
> clear.

Above, you say "anybody can do anything". Now you say it's clear. Do you 
mean that if anybody can do anything, that's clear enough? Or something 
else?

> I thought the bit that wasn't clear is what the fragment
> identifier 'means'? What am I missing?

'means' meaning what? If you look at the 
http://weather.example.com/oaxaca example in 
http://www.w3.org/TR/webarch/, then the fact that this 'means' the 
weather in Oaxaca is something human users can deduce, not something 
that's built into the system. That's the same for fragment identifiers.

>>> There are times where someone could do the following:
>>>
>>> <div id="bar" about="#bar">...</div>
>>
>> If you subscribe to the theory that these identify two different things,
>> then RFC 3986 clearly says this is a bad idea.
>
> I agree, my example was misleading and detracted from my point, which is...
>
>> I don't think there's much more we can say.
>
> Well, technically, they do identify two different things - one
> identifies a document fragment, the other identifies a semantic concept.
> You can wave your hands a bit and say "well, they're effectively
> talking about the same thing". We do this hand waving quite a bit in Web
> Vocabulary documents, for example:
>
> http://payswarm.com/vocabs/security#publicKey
>
> #publicKey identifies a document fragment that contains human-readable
> text that explains the public key vocabulary term. If one were to
> extract RDF from the document (via RDFa), they would also be able to
> access the machine-readable data associated with the vocabulary term.
>
> I agree that it would be bad to have the #publicKey document fragment
> talk about public keys and #publicKey semantic concept talk about
> private keys. However, I don't agree with this statement - "I don't
> think there's much more we can say".

So let's assume the HTML5 spec (by reference or however) includes RDFa. 
What more do you need?
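
To make the two readings of #bar from the example above concrete, here
is a small sketch. It scans one HTML document and reports which elements
a browser-style lookup (id="bar") and an RDFa-style lookup
(about="#bar") would land on. The class and function names are
hypothetical, and the RDFa side is heavily simplified: a real RDFa
processor resolves about against the document base and builds triples,
which this does not attempt.

```python
from html.parser import HTMLParser


class FragmentViews(HTMLParser):
    """Collect the elements that each kind of agent would associate with a fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.id_hits = []      # what a browser-style agent looks for
        self.about_hits = []   # what an RDFa-style agent looks for (simplified)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id") == self.fragment:
            self.id_hits.append(tag)
        # Simplification: only same-document references of the form "#name".
        if attributes.get("about") == "#" + self.fragment:
            self.about_hits.append(tag)


def views(html, fragment):
    parser = FragmentViews(fragment)
    parser.feed(html)
    parser.close()
    return parser.id_hits, parser.about_hits


print(views('<div id="bar" about="#bar">...</div>', "bar"))
```

When both lists point at the same element, as in the div above, the two
readings coincide for practical purposes. When they point at different
elements, the document is in the situation RFC 3986 advises against.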


Regards,   Martin.

>> One of the very basic ideas of URIs/IRIs is that there's a single space
>> so that overlapping usages are possible when they make sense (even if
>> they don't always do). I think that also applies to fragment
>> identifiers. It's easily possible to have some static fragment
>> identifiers for the case that JavaScript isn't active, but use
>> JavaScript if it's active.
>
> I agree.
>
> -- manu
>
Received on Tuesday, 11 October 2011 05:15:55 UTC