FragIds in semantic web (ACTION-543) from Jeni Tennison on 2011-04-06 (www-tag@w3.org from April 2011)

From: Jeni Tennison <jeni.tennison@googlemail.com>
Date: Wed, 6 Apr 2011 23:18:12 +0100
To: Larry Masinter <masinter@adobe.com>
Cc: "www-tag@w3.org List" <www-tag@w3.org>
Message-Id: <F304E3F4-4C1E-44F6-BF03-DB7A9F11F63E@jenitennison.com>

Hi Larry,

I have an action (ACTION-543: Propose addition to MIME/Web draft to discuss sem-web use of fragids not grounded in media type) to propose some wording to slot into your "MIME and the Web" draft, which I'm taking to be the version at:

  http://tools.ietf.org/id/draft-masinter-mime-web-info-02.html

You already have Section 4.6 (Fragment identifiers), which touches on the issue, so I suggest extending it to read something like:

---
  The Web added the notion of being able to address part of an entity,
  rather than the whole content, by adding a 'fragment identifier' to
  the URL that addressed the data. This made sense for the original
  Web, which had only HTML, but how would it apply to other content
  types? The URL spec glibly noted that "the definition of the
  fragment identifier meaning depends on the Internet Media Type", but
  unfortunately, few of the Internet Media Type definitions included
  this information, and practices diverged greatly.

  Content negotiation becomes extremely difficult when the interpretation
  of fragment identifiers depends on the MIME type, because there is no
  guarantee that a fragment identifier that is legal for one MIME type
  is also legal (or interpreted in an equivalent way) for another MIME
  type. For example, the common `#identifier` syntax for HTML is not
  consistent with the XPointer-based syntax defined for XML.

  This is exacerbated by common semantic web practice, which not only
  makes heavy use of content negotiation but also uses URLs with
  fragment identifiers to identify real-world Things. In these cases,
  the URI as a whole identifies the real-world Thing, and the
  fragment identifier does not address a part of any entity, so
  interpreting the fragment identifier based on the MIME type of
  whatever entity happens to be returned does not make sense.
---
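To make the semantic web case concrete, here is a minimal Python sketch. The example URI, the function names, and the per-type dispatch table are hypothetical illustrations, not taken from the draft. It shows that the fragment is split off before dereferencing (as RFC 3986 describes), so the server chooses a representation without ever seeing it, while the client then reads the same fragment differently depending on the type that came back.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_for_dereference(uri: str) -> Tuple[str, str]:
    """Split a URI into the URL that is actually requested and its fragment.

    The fragment is separated from the rest of the URI before
    dereferencing, so the server never sees it and cannot take it into
    account when choosing a representation.
    """
    result = urldefrag(uri)
    return result.url, result.fragment


def interpret_fragment(media_type: str, fragment: str) -> str:
    # Hypothetical dispatcher: a client decides how to read the fragment
    # from whatever media type content negotiation happened to return.
    if media_type == "text/html":
        return f"element with id {fragment!r}"
    if media_type == "application/xml":
        return f"XPointer {fragment!r} evaluated against the XML document"
    return f"no fragment handling known to this sketch for {media_type}"


# Hypothetical semantic web URI naming a person, not part of a document.
thing = "http://example.org/people#alice"
url, frag = split_for_dereference(thing)
print(url)   # http://example.org/people
print(frag)  # alice

# The same URI acquires a different "meaning" for each negotiated type.
for negotiated in ("text/html", "application/xml", "text/turtle"):
    print(negotiated, "->", interpret_fragment(negotiated, frag))
```

The sketch only illustrates the mismatch: the URI as a whole names Alice, yet any fragment semantics tied to the returned media type would have it name an HTML element or an XML node instead.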

Section 5.1.3 (Fragment identifiers) talks briefly about what might be done about fragment identifiers, stating that the problem is that MIME type definitions don't say anything about them. I think the problem goes deeper than that, because fragment identifiers are interpreted inconsistently across media types. We might want to do something at the level of the URL specification to guarantee support for simple fragment identifiers (i.e. #identifier) across media types.


Having read through the draft, I also have one suggestion and some small editorial fixes.

The one suggestion is to include a section that describes the 'application/atom+xml' or 'application/schema+json' pattern (introduced, I believe, in RFC 3023), in which there's a generic MIME type (application/xml or application/json) for a meta-language and a syntax pattern for the MIME types of languages based on that meta-language. Perhaps there should be fewer hoops to jump through when defining a MIME type for a language based on a meta-language. For example, compatibility between the language and the meta-language may have implications for sniffing and for the interpretation of fragment identifiers that mean the registration needn't be so detailed.
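The fallback that this pattern allows can be sketched in a few lines of Python. The function name, and the rule that the generic type always sits under application/, are illustrative assumptions rather than anything RFC 3023 is being quoted as specifying.

```python
from typing import Optional


def generic_media_type(media_type: str) -> Optional[str]:
    """Return the generic meta-language type for a '+suffix' media type.

    Assumes the pattern described above: 'application/atom+xml' names an
    XML-based language, so a processor that does not know Atom could
    still fall back to treating it as 'application/xml'.
    """
    # Ignore parameters such as '; charset=utf-8'; type names are case-insensitive.
    essence = media_type.split(";", 1)[0].strip().lower()
    _top_level, _, subtype = essence.partition("/")
    if "+" not in subtype:
        return None  # the name does not indicate a meta-language
    suffix = subtype.rsplit("+", 1)[1]
    # Assumption: the generic meta-language type lives under application/.
    return f"application/{suffix}"


print(generic_media_type("application/atom+xml"))     # application/xml
print(generic_media_type("application/schema+json"))  # application/json
print(generic_media_type("text/html"))                # None
```

A registration for such a type could then lean on the generic type's rules (for sniffing or fragment identifiers) wherever it does not say otherwise, which is the reduced burden suggested above.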

The editorial fixes are:

1. Introduction: s/are describes./are described./

2.1 Origins of MIME: s/Message sent from A to B./Message is sent from A to B./

2.2 Introducing MIME into the Web: s/HTTP have minor/HTTP are minor/

3.1 Lack of clarity:

  s/its uses, the meaning/its uses, and the meaning/
  s/W3C specifications TAG findings and Internet/W3C specifications, TAG findings, and Internet/

It would be good to have some examples of the incorrect assumptions that this paragraph talks about.


3.2 Differences between email and Web delivery

Could you clarify the first bullet point, where you say 'GET has no content'? Is that always the case? I can't find the part of HTTP (1.1 or HTTPbis) that says this, but I suspect that's because I'm missing something.


3.3 The Rules Weren't Quite Followed:

  s/that are registration/that are registered/
  s/sherperding/shepherding/
  s/Orgnaizations/Organizations/


4.4 Evolution, Versioning, Forking:

  s/litle/little/
  s/try to insure/try to ensure/

5. Recommendations:

  s/aggreement/agreement/
  s/to use of MIME/to the use of MIME/

5.1.4. Application info: s/section to be clearer/section be clearer/

Hope this is useful,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Thursday, 7 April 2011 01:28:18 UTC