
Fwd: draft minutes from IRI BoF at IETF 76

From: Thomas Roessler <tlr@w3.org>
Date: Wed, 25 Nov 2009 10:37:38 +0100
Cc: Thomas Roessler <tlr@w3.org>
Message-Id: <FEF62233-C7B2-49F7-ACFA-75CD10E3A42A@w3.org>
To: public-ietf-w3c <public-ietf-w3c@w3.org>
For those who don't follow the public-iri list.
--
Thomas Roessler, W3C  <tlr@w3.org>


Begin forwarded message:

> From: Peter Saint-Andre <stpeter@stpeter.im>
> Date: 25 November 2009 03:44:15 GMT+01:00
> To: public-iri@w3.org
> Subject: draft minutes from IRI BoF at IETF 76
> archived-at: <http://www.w3.org/mid/4B0C99FF.3030504@stpeter.im>
> 
> These are draft minutes from the "Birds of a Feather" session on IRIs
> held at IETF 76 in Hiroshima, Japan on November 10, 2009. If you have
> changes, please send them to the BoF chairs so that they can upload
> final minutes.
> 
> /psa
> 
> ========================================================================
> 
> IRI BoF Minutes
> IETF 76
> Tuesday, November 10, 2009, 1520-1700
> (Afternoon Session II -- Cattelya East)
> 
> Chairs: Ted Hardie and Pete Resnick
> 
> Jabber Scribes / Note Takers: Pete Resnick and Peter Saint-Andre
> 
> Minutes Editor: Peter Saint-Andre
> 
> Agenda:
> 
> <http://www.ietf.org/proceedings/09nov/agenda/iri.html>
> 
> Slides:
> 
> <https://datatracker.ietf.org/meeting/76/materials.html#wg-iri>
> 
> Audio:
> 
> <ftp://videolab.uoregon.edu/pub/videolab/media/ietf76/ietf76-ch4-tue-afnoon2.mp3>
> 
> Chat Log:
> 
> <http://www.ietf.org/jabber/logs/iri/2009-11-10.txt>
> 
> Dramatis Personae
> 
> AM = Alexey Melnikov
> AR = Adam Roach
> BL = Barry Leiba
> JH = Joe Hildebrand
> JK = John Klensin
> LD = Lisa Dusseault
> LM = Larry Masinter
> MD = Martin Duerst
> MS = Michael Smith
> PR = Pete Resnick
> SC = Stuart Cheshire
> TH = Ted Hardie
> 
> ========================================================================
> 
> IETF NOTE WELL statement reiterated.
> 
> Agenda bashing.
> 
> TH: We are trying to restore interoperability to a part of the Internet
> infrastructure where it has been lost. URI mechanism is one of the most
> important pieces of the application space. URIs were originally designed
> to be (1) under the hood and (2) ASCII. The assumption was that IRIs were
> a way of presenting data that was really represented in a URI. That
> assumption has changed to make IRIs more of a first-class citizen, XML
> being a prime example. Now there are no fewer than nine communities
> working on IRIs (W3C, etc.). We are not trying to add a tenth.
> 
> JK: Counter-theory is that IRIs have been a disaster for
> internationalization.
> 
> LM: There is a horrible mess, but there is the possibility of
> making things a little bit better.
> 
> [slide] History of prior IRI specifications
> 
> [slide] Review of current documents
> 
> - draft-duerst-iri-bis-07
> - draft-duerst-mailto-bis-07
> - RFC 4395
> 
> Other documents explicitly left out of scope.
> 
> Issues:
> 
> 1. IRI as protocol element vs. mapping of IRI to URI. Until now, IRI
> has been defined as a sequence of Unicode characters that's converted
> into a URI by translating to UTF-8 and then percent-encoding. The
> meaning of the IRI was to be exactly the URI to which it was mapped.
> Later, it became clear that most implementations parsed the UTF-8 and
> translated to hex-encoding only if necessary. We also found that some
> applications were using IRIs as strings for namespaces (e.g., XML
> namespaces). In other words, applications were using IRIs directly as
> protocol elements.
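A minimal Python sketch of the mapping described in issue 1, assuming the simplified rule stated there: each non-ASCII character is encoded as UTF-8 and each resulting byte is percent-encoded, while ASCII characters pass through unchanged. The function name `iri_to_uri` and the example IRI are hypothetical, and the sketch skips steps a real implementation would need (normalization, IDNA for the host, input validation).

```python
def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI by percent-encoding the UTF-8 bytes of non-ASCII characters.

    Simplified sketch: ASCII characters are copied as-is (no validation), and
    the host is NOT converted with IDNA/punycode.
    """
    parts = []
    for ch in iri:
        if ord(ch) < 0x80:
            parts.append(ch)  # ASCII: leave untouched
        else:
            # Non-ASCII: UTF-8 encode, then emit %XX (uppercase hex) per byte
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)


iri = "http://example.com/r\u00e9sum\u00e9"
uri = iri_to_uri(iri)
print(uri)  # http://example.com/r%C3%A9sum%C3%A9

# An application that uses the IRI itself as an opaque string (e.g. as an
# XML namespace name) sees a different string than one that uses the URI.
print(iri == uri)  # False
```

Because only non-ASCII characters are touched, an ASCII-only identifier comes back unchanged, which matches the "hex-encoding only if necessary" behavior the minutes attribute to most implementations.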
> 
> 2. Normative reference to IDNA. The IRI spec defined translation of
> domain name components using IDNA, but IDNA is under transition.
> 
> 3. Different levels of "liberal processing". IRIs that weren't
> actually valid were accepted in places like HTML documents. Two levels
> here: one defined by XML community, another by the HTML community (i.e.,
> what browsers currently implement).
> 
> [slide] Other documents and committees
> 
> - HTML5 work in WhatWG and W3C HTML WG
> - IETF IDNABIS WG
> - IETF EAI WG
> 
> TH: Let's start the discussion. In particular, let's have comments and
> questions on IRIs as protocol elements.
> 
> MD: We still need to make sure that the conversion from IRI to URI is
> well-defined.
> 
> PR: Is everybody OK with movement from presentation layer to protocol
> element?
> 
> AM: Is there any difference on the wire? If so, it would be nice to show
> some examples.
> 
> LM: There are protocols and protocol elements. That's why we came up
> with IRIs vs. URIs in the first place.
> 
> AM: That was not my question. Will conversion processes described in
> different versions of the spec produce the same data on the wire?
> 
> LM: No. Formerly, the mapping was Unicode to UTF-8, then hex-encoded. It was listed as
> an option to do Unicode to punycode. This would give you two processing
> paths, because you might end up with hex-encoded UTF-8 *or* ASCII
> punycode. If you then took a URI that had percent-encoding and passed it
> to a non-Unicode-aware resolver, it would give you different results
> than if you had passed it a punycode hostname in the URI.
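To make the "two processing paths" concrete, a short Python sketch showing the two ASCII forms the same non-ASCII host name can take. It uses the standard library's `urllib.parse.quote` and the built-in `idna` codec; that codec implements the older IDNA 2003 rules (RFC 3490), the kind of IDNA dependency issue 2 raises. The host name `bücher.example` is illustrative.

```python
from urllib.parse import quote

host = "b\u00fccher.example"  # illustrative internationalized host name

# Path 1: UTF-8, then percent-encoding (the generic IRI-to-URI mapping).
pct_form = quote(host, safe="")

# Path 2: IDNA ToASCII (Python's built-in IDNA 2003 codec), producing
# punycode ("xn--") labels that DNS accepts.
puny_form = host.encode("idna").decode("ascii")

print(pct_form)   # b%C3%BCcher.example
print(puny_form)  # xn--bcher-kva.example

# Same name, two different ASCII strings: a resolver that does not decode the
# percent-encoded form would look up a different (and invalid) DNS name.
print(pct_form == puny_form)  # False
```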
> 
> MD: RFC 3987 didn't explain that very well. But you don't know if
> underneath you are dealing with DNS or some other kind of system. This
> needs to be an open issue.
> 
> JK: The plenary on Thursday night will discuss these issues as well. The
> argument there will be that it's not a good idea to have two different
> encodings of the same information (UTF-8 and punycode). It's an even
> worse idea to have three (UTF-8, UTF-8 hex-encoded, and punycode).
> 
> TH: That would appear to be opposed to deployed reality. Certainly we
> have these IRIs/URIs in content (e.g., HTML files), not just on the wire.
> 
> JK: One of the problems here is that we're showing many signs of digging
> a deep hole.
> 
> TH: So stop digging?
> 
> JK: Start digging a hole over which we have more control.
> 
> JH: I want to make sure there's an encoding that doesn't require
> conversion to punycode or hex-encoding.
> 
> PR: You mean UTF-8?
> 
> JH: XMPP is all UTF-8, it would be nice if we could just use that.
> 
> LM: URIs are a sequence of characters, not bytes. That needs to be
> encoded as UTF-8 or UTF-16 or whatever if you want a sequence of bytes
> instead of characters.
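The point that a URI is a sequence of characters rather than bytes can be shown with a minimal Python sketch: the same characters serialize to different byte sequences in UTF-8 and UTF-16, and decoding with the matching encoding recovers the same string. The example IRI is illustrative; `utf-16-be` is used only to avoid a byte-order mark.

```python
iri = "http://example.com/r\u00e9sum\u00e9"  # 25 characters

as_utf8 = iri.encode("utf-8")       # 27 bytes: each "é" takes 2 bytes
as_utf16 = iri.encode("utf-16-be")  # 50 bytes: every character here takes 2 bytes

print(len(iri), len(as_utf8), len(as_utf16))  # 25 27 50

# Different bytes on the wire, but the same character sequence once decoded.
assert as_utf8 != as_utf16
assert as_utf8.decode("utf-8") == as_utf16.decode("utf-16-be") == iri
```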
> 
> SC: We'll discuss this on Thursday night at the plenary. Unicode
> identifies characters, but you need to encode them.
> 
> AR: I hear that we're trying to make changes based on implementation,
> but are we trying to get rid of IDNA?
> 
> LM: No, we're discussing these issues with implementors, not ignoring
> implementation reality.
> 
> MD: I agree that two representations are bad, and three are worse.
> The problem is that domain names don't appear in an IRI or URI only in
> the host component; they can appear elsewhere (e.g., in the query string).
> 
> TH: The reality is that things are very messy. Application developers
> try very hard to get people where they want to go, even if that input is
> not valid. Even if we came up with a totally new identifier, that would
> not fix what's out there. My take is that scrapping what we have now and
> starting over does us no good, because it might be completely better but
> completely undeployed. We need to find something in the middle, because
> leaving things as they are right now doesn't help.
> 
> MS: HTML5 spec has ended up including text about (1) error handling for
> URIs and (2) character encodings in query strings. We hope to have spec
> text that we can normatively reference in HTML5 so that implementors can
> do the right thing going forward.
> 
> TH: Was there discussion of other URI schemes at that point (in the W3C
> or WhatWG), or was it limited to HTTP?
> 
> MS: I believe it was limited to HTTP.
> 
> MS: The "goals" page linked to from the draft charter is very useful and
> I'd like to see those concerns addressed.
> 
> [
> 
> Meeting Editor's Note:
> 
>  http://trac.tools.ietf.org/area/app/trac/wiki/DraftIriCharter
>  http://trac.tools.ietf.org/area/app/trac/wiki/IriWorkGoals
> 
> ]
> 
> JK: Ted, I agree with you, but URI syntax is currently employed as a
> user interface, and I think we'll need to move away from that
> eventually. I'd be happy if we could deprecate ASCII-only URIs.
> 
> TH: Do we have consensus to do the work necessary to deprecate
> ASCII-only URIs?
> 
> JK: I would define it differently. Some URIs are not user-facing but
> instead are network-facing only, so we haven't felt the need to define
> internationalized versions of those.
> 
> TH: Yes, the browser location bar screwed us royally because HTTP URIs
> were originally supposed to be hidden behind hyperlinks in web pages.
> 
> JK: The choices are either looking at things on a scheme-by-scheme
> basis, or solving the problem globally for all schemes.
> 
> TH: How much work are people willing to put in? Because that's a lot of
> work.
> 
> MD: I've already put in a lot of time, but I'm willing to continue
> working. The deployed content that is not compliant is a small
> percentage, but the mountain of content is extremely large.
> 
> LM: A few points. draft-duerst-iri-bis-07 recommends (1) renaming the
> existing IANA registry of URI schemes to be a registry of URI/IRI
> schemes, (2) adding as a requirement for new schemes to define the
> non-ASCII characters that are appropriate for the scheme, and (3)
> reviewing all the existing schemes for their appropriateness as IRI
> identifiers, where the default is that it's an old-style URI. This would
> go a long way to making all of the identifiers into IRIs.
> 
> TH: I remember an effort to bring all the URIs up to date, and we burned
> out three people in the process. Now it would be just as much work, or
> more.
> 
> MD: Say there is a URI scheme for IP addresses, then it's just numbers
> and we don't need any internationalization. On the other hand, for
> something like mailto you can only use ASCII because it is so old.
> 
> JK: If we had a URI scheme for IP addresses, you can be sure that we
> would hear calls for encoding those digits in a localized version,
> instead of in "Arabic numerals".
> 
> LM: I think we have enough problems without imagining new ones.
> 
> BL: Why is this not a presentation issue instead of a protocol issue?
> 
> TH: Barry, that was the theory, but it failed the test of implementation.
> 
> BL: As we discover that more things need internationalization in the
> presentation layer, can't we just say that applications need to become
> better at presentation?
> 
> AR: If I understand correctly, we'd go back and convert old URIs to
> IRIs. I would suggest that the effort to do that for SIP alone would be
> enormous.
> 
> LM: Currently you could hex-encode UTF-8.
> 
> AR: Non-ASCII characters are not allowed in SIP URIs. But SIP has messed
> this up because there is no normative text about it. I would be stunned
> if we were the only protocol in that boat.
> 
> LM: So either it works or it doesn't. If it doesn't, it's ASCII only
> until someone fixes it. The registry would enable that to be defined.
> 
> AR: So you would be defining a framework, not fixing each scheme.
> 
> LM: Right. A framework that says it's either like IRIs now, like URIs
> now, or some other definition.
> 
> TH: To Barry's point, we've never been able to force people to layer
> things correctly in their applications. The danger is that we're going
> to go back into defining human-friendly names. The reality is that the
> protocol elements will bleed into the human side of things. But we can't
> be so liberal that things break if there are no humans involved, e.g. if
> the data is provided to a lower layer.
> 
> LM: Preventing bad things from happening to humans is a high priority --
> dealing with issues of ambiguity and reliability. Let me speak a bit
> about the bleeding of protocol elements into human interfaces, because a
> big part of the Internet economy came from being able to advertise your
> domain name -- which was bleeding of a protocol element into user space.
> 
> BL: I disagree, because if things worked right then a Chinese or
> Japanese or Cyrillic name could be converted correctly into a protocol
> element.
> 
> LM: I'm confused. The use of i18n identifiers works and it's deployed.
> There are just a few issues around the edges. The problem is that we
> need to address those issues in a coordinated fashion.
> 
> PR: The presentation layer leaks into protocol elements. Larry said IRIs
> are used as protocol elements, but they are not encoded. And percent
> encoded representations provide i18n, too. What we want to get away from
> is conflating i18n with a particular encoding. E.g., people use UTF-8,
> so what's being argued for is to standardize that usage by saying that
> internationalized identifiers are to be encoded in UTF-8.
> 
> LM: Sorry that I was not specific enough. Where they are deployed is in
> HTML documents.
> 
> PR: But those are not in UTF-8, they are in ISO 8859-foo. Do you want
> these identifiers to be represented in *any* encoding? We need to be
> careful about which path we're on.
> 
> LM: This isn't something that I *want*, it's what *is*. There is a lot
> of software and content that treats a sequence of characters in the
> encoding of that document, converts that into Unicode (usually UTF-16
> but perhaps also UTF-8), and uses the result as a URI/IRI.
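A minimal Python sketch of why the document's encoding matters here (it is also the query-string issue MS raises): the same character `é` in a link becomes `%E9` if it is percent-encoded using ISO 8859-1 (Latin-1), but `%C3%A9` if it is percent-encoded as UTF-8, as the IRI-to-URI mapping does. The page bytes and query string are hypothetical; `urllib.parse.quote` and its `encoding` parameter are from the Python standard library.

```python
from urllib.parse import quote

# Hypothetical bytes of a link as they appear in an ISO 8859-1 (Latin-1) page.
page_bytes = b"q=caf\xe9"

# Decode using the document's declared encoding to get Unicode characters.
text = page_bytes.decode("latin-1")  # 'q=café'

# Percent-encode the characters using the document encoding...
legacy = quote(text, safe="=", encoding="latin-1")
# ...or using UTF-8, as the IRI-to-URI mapping does.
utf8 = quote(text, safe="=")

print(legacy)  # q=caf%E9
print(utf8)    # q=caf%C3%A9
```

Both results are valid ASCII URIs, but a server expecting one convention will misread the other, which is why the choice cannot be left implicit.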
> 
> BL: The inclusion of this in an HTML document does not make this into a
> protocol element.
> 
> TH: Some people think they are protocol elements. Both views might be
> correct. We can get sidetracked into which encodings we prefer. One of
> the goals here is to minimize the number of translations that occur
> between applications. Make it as simple as possible, but not simpler.
> Don't simplify identifiers, reduce the number of iterations of
> translation in any given protocol handoff. Not easy, but different from
> what was just described.
> 
> PR: Let's reframe. In RFC 822, in the header an address is a protocol
> element. In the body, the address is a protocol element in text.
> 
> BL: This is going in the wrong direction.
> 
> TH: Does anyone think we don't have a problem here?
> 
> [laughter]
> 
> BL: I'm not saying we shouldn't do this, only that we need to scope it.
> 
> [slide] Charter review
> 
> LM: There's a draft charter, perhaps we can review that? Can we address
> the problem only by working on these three documents?
> 
> - draft-duerst-iri-bis-07
> - draft-duerst-mailto-bis-07
> - RFC 4395
> 
> Do we need to work on more? Can we get away with working on even fewer?
> 
> JK: First, I don't think we can narrow the scope to this. We need to
> look at the impact on all schemes, not just HTTP URIs. Second, I'm leery
> of having this WG try to fix mailto, especially in the context of EAI,
> because mailto needs to be fixed by people who understand mail.
> 
> MD: The mailto draft says very little about EAI. I'm not an expert about
> all the details about how you do escaping in email addresses and the
> like, so we need people who know about that to provide comments.
> 
> AM: I'm happy about reducing the scope because success is good, but the
> mailto/EAI interaction needs to be addressed. However, if this WG will
> focus only on HTTP then it might make choices that are not generic
> enough.
> 
> LM: The intent is not to focus on HTTP.
> 
> AM: But HTTP is the main application. Maybe the concern lacks a basis.
> 
> LM: The charter mentions explicit coordination with other groups. Not
> meant to be an exclusive list, and mainly to coordinate requirements.
> 
> JK: We have once again fallen into the habit of always mentioning the
> example of web browsers. I agree with Alexey that we need more
> examples, at least one example but three more would be even better. But
> mailto is probably not a good example.
> 
> TH: Maybe look at URNs. They are similar but different enough to be
> useful for this work.
> 
> MD: RFC 3987 references POP, IMAP, data URI scheme, URN, etc. The idea
> to look at other schemes is important. But do we put out a new spec for
> those?
> 
> LM: I don't think we need to update data: URIs.
> 
> MD: We need to look at other schemes, but we don't know which until we
> investigate them in detail.
> 
> JH: I do think XMPP is a good example because it is more recent. I think
> we also need recommendations for people defining new schemes. We can
> discuss in the XMPP WG whether more formal cooperation is needed.
> 
> LM: So maybe take mailto out and put others in.
> 
> AM: Completing the mailto update can happen elsewhere.
> 
> LM: There is discussion in the charter of perhaps splitting the
> documents, e.g., move everything about domain names into a separate
> document. Also perhaps a separate BCP for informational text about why
> some characters are problematic and others are not.
> 
> Chairs begin asking questions of the room...
> 
> TH: HUM Is there a problem here for the IETF to solve?
> 
> Hum "yes" -- many
> Hum "no" -- silence
> Hum "not enough information to decide" -- silence
> 
> TH: Raise your hand if you are willing to:
> - be on a mailing list to discuss these issues
> - review documents
> - replace me and Pete at the front of the room
> 
> TH: A fair number of volunteers on the first two.
> 
> TH: I'm going to ask for two directional questions about the charter....
> 
> TH: HUM Should it be within the charter to scrap the existing approach
> from RFC 3987 and start over with an entirely new approach?
> 
> Hum "yes" -- about one-third.
> Hum "no" -- about one-third.
> Hum "not enough information to decide" -- about one-third.
> 
> TH: HUM Should the charter include an explicit list of schemes to review?
> 
> Hum "yes" -- about one-third.
> Hum "no" -- about one-third.
> Hum "not enough information to decide" -- about one-third.
> 
> Clarifying question from the mic: would that include making it a
> minimum list, not a limiting list?
> 
> LD: Can we ask if this is a deal-breaker?
> 
> TH: I don't think we need to go there now, let's clarify the charter
> first.
> 
> TH: Re-hum?
> 
> JK: I don't care which specific schemes are selected for review, just as
> long as the WG reviews multiple schemes.
> 
> TH: HUM Must we decide specifically which schemes need to be investigated
> *before* the WG is chartered, or can the WG sort that out?
> 
> Hum "yes" -- none for "before"
> Hum "no" -- many for "WG can sort it out"
> 
> TH: Must the charter specify a minimum number of schemes to investigate,
> or can the WG sort that out?
> 
> Hum "yes" -- one hum for "minimum number"
> Hum "no" -- many for "WG can sort it out"
> 
> TH: So we seem to have consensus that there are volunteers to work on
> IRIs at the IETF. Thanks!
> 
> END
> 
> ========================================================================
> 
> 
Received on Wednesday, 25 November 2009 09:38:15 GMT
