- From: Thomas Roessler <tlr@w3.org>
- Date: Wed, 25 Nov 2009 10:37:38 +0100
- To: public-ietf-w3c <public-ietf-w3c@w3.org>
- Cc: Thomas Roessler <tlr@w3.org>
- Message-Id: <FEF62233-C7B2-49F7-ACFA-75CD10E3A42A@w3.org>
For those who don't follow the public-iri list.
-- 
Thomas Roessler, W3C <tlr@w3.org>

Begin forwarded message:

> From: Peter Saint-Andre <stpeter@stpeter.im>
> Date: 25 November 2009 03:44:15 GMT+01:00
> To: public-iri@w3.org
> Subject: draft minutes from IRI BoF at IETF 76
> archived-at: <http://www.w3.org/mid/4B0C99FF.3030504@stpeter.im>
>
> These are draft minutes from the "Birds of a Feather" session on IRIs
> held at IETF 76 in Hiroshima, Japan on November 10, 2009. If you have
> changes, please send them to the BoF chairs so that they can upload
> final minutes.
>
> /psa
>
> ========================================================================
>
> IRI BoF Minutes
> IETF 76
> Tuesday, November 10, 2009, 1520-1700
> (Afternoon Session II -- Cattelya East)
>
> Chairs: Ted Hardie and Pete Resnick
>
> Jabber Scribes / Note Takers: Pete Resnick and Peter Saint-Andre
>
> Minutes Editor: Peter Saint-Andre
>
> Agenda:
>
> <http://www.ietf.org/proceedings/09nov/agenda/iri.html>
>
> Slides:
>
> <https://datatracker.ietf.org/meeting/76/materials.html#wg-iri>
>
> Audio:
>
> <ftp://videolab.uoregon.edu/pub/videolab/media/ietf76/ietf76-ch4-tue-afnoon2.mp3>
>
> Chat Log:
>
> <http://www.ietf.org/jabber/logs/iri/2009-11-10.txt>
>
> Dramatis Personae
>
> AM = Alexey Melnikov
> AR = Adam Roach
> BL = Barry Leiba
> JH = Joe Hildebrand
> JK = John Klensin
> LD = Lisa Dusseault
> LM = Larry Masinter
> MD = Martin Duerst
> MS = Michael Smith
> PR = Pete Resnick
> SC = Stuart Cheshire
> TH = Ted Hardie
>
> ========================================================================
>
> IETF NOTE WELL statement reiterated.
>
> Agenda bashing.
>
> TH: We are trying to restore interoperability to a part of the Internet
> infrastructure where it has been lost. URI mechanism is one of the most
> important pieces of the application space. URIs were originally designed
> to be (1) under the hood and (2) ASCII.
> Assumption that IRIs were a way
> of presenting data that was really represented in a URI. That assumption
> has changed to make IRIs more of a first-class citizen. XML being a prime
> example. Now there are no less than 9 communities working on IRIs (W3C,
> etc.). Not trying to add a 10th.
>
> JK: Counter-theory is that IRIs have been a disaster for
> internationalization.
>
> LM: There is a horrible mess, but there is the possibility of
> making things a little bit better.
>
> [slide] History of prior IRI specifications
>
> [slide] Review of current documents
>
> - draft-duerst-iri-bis-07
> - draft-duerst-mailto-bis-07
> - RFC 4395
>
> Other documents explicitly left out of scope.
>
> Issues:
>
> 1. IRI as protocol element vs. mapping of IRI to URI. Until now, IRI
> has been defined as a sequence of Unicode characters that's converted
> into a URI by translating to UTF-8 and then percent-encoding. The
> meaning of the IRI was to be exactly the URI to which it was mapped.
> Later, it became clear that most implementations parsed the UTF-8 and
> translated to hex-encoding only if necessary. We also found that some
> applications were using IRIs as strings for namespaces (e.g., XML
> namespaces). In other words, applications were using IRIs directly as
> protocol elements.
>
> 2. Normative reference to IDNA. The IRI spec defined translation of
> domain name components using IDNA, but IDNA is under transition.
>
> 3. Different levels of "liberal processing". IRIs that weren't
> actually valid were accepted in places like HTML documents. Two levels
> here: one defined by XML community, another by the HTML community (i.e.,
> what browsers currently implement).
>
> [slide] Other documents and committees
>
> - HTML5 work in WhatWG and W3C HTML WG
> - IETF IDNABIS WG
> - IETF EAI WG
>
> TH: Start introducing discussion. In particular, let's have comments and
> questions on IRI as protocol element.
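
The mapping described under issue 1 (Unicode characters, translated to UTF-8, then percent-encoded) can be sketched in a few lines of Python. This is only an illustration, not text from the minutes or from any of the drafts: the function name `percent_encode_utf8` and the list of characters left untouched are assumptions made for the example, and the sketch deliberately ignores the host component, which issue 2 treats separately.

```python
from urllib.parse import quote

# Hypothetical helper for illustration only. ASCII URI punctuation and
# existing percent escapes are left alone (an assumed, non-normative list);
# every non-ASCII character is replaced by the percent-encoded form of
# its UTF-8 bytes.
ASSUMED_SAFE = "/:?#[]@!$&'()*+,;=%-._~"


def percent_encode_utf8(iri_text: str) -> str:
    """Map IRI-style text to ASCII by percent-encoding its UTF-8 bytes."""
    return quote(iri_text, safe=ASSUMED_SAFE, encoding="utf-8")


if __name__ == "__main__":
    # U+00E9 is encoded in UTF-8 as the bytes C3 A9.
    print(percent_encode_utf8("/r\u00e9sum\u00e9"))  # /r%C3%A9sum%C3%A9
```

Plain ASCII input passes through unchanged, which matches the observation that implementations translate to hex-encoding only where necessary.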
>
> MD: We still need to make sure that the conversion from IRI to URI is
> well-defined.
>
> PR: Is everybody OK with movement from presentation layer to protocol
> element?
>
> AM: Is there any difference on the wire? If so, it would be nice to show
> some examples.
>
> LM: There are protocols and protocol elements. That's why we came up
> with IRIs vs. URIs in the first place.
>
> AM: That was not my question. Will conversion processes described in
> different versions of the spec produce the same data on the wire?
>
> LM: No, because formerly Unicode to UTF-8, hex encoded. It was listed as
> an option to do Unicode to punycode. This would give you two processing
> paths, because you might end up with hex-encoded UTF-8 *or* ASCII
> punycode. If you then took a URI that had percent-encoding and passed it
> to a non-Unicode-aware resolver, it would give you different results
> than if you had passed it a punycode hostname in the URI.
>
> MD: RFC 3987 didn't explain that very well. But you don't know if
> underneath you are dealing with DNS or some other kind of system. This
> needs to be an open issue.
>
> JK: The plenary on Thursday night will discuss these issues as well. The
> argument there will be that it's not a good idea to have two different
> encodings of the same information (UTF-8 and punycode). It's an even
> worse idea to have three (UTF-8, UTF-8 hex-encoded, and punycode).
>
> TH: That would appear to be opposed to deployed reality. Certainly we
> have these IRIs/URIs in content (e.g., HTML files), not just on the wire.
>
> JK: One of the problems here is that we're showing many signs of digging
> a deep hole.
>
> TH: So stop digging?
>
> JK: Start digging a hole over which we have more control.
>
> JH: I want to make sure there's an encoding that doesn't require
> conversion to punycode or hex-encoding.
>
> PR: You mean UTF-8?
>
> JH: XMPP is all UTF-8, it would be nice if we could just use that.
>
> LM: URIs are a sequence of characters, not bytes.
> That needs to be
> encoded as UTF-8 or UTF-16 or whatever if you want a sequence of bytes
> instead of characters.
>
> SC: We'll discuss this on Thursday night at the plenary. Unicode
> identifies characters, but you need to encode them.
>
> AR: I hear that we're trying to make changes based on implementation,
> but are we trying to get rid of IDNA?
>
> LM: No, we're discussing these issues with implementors, not ignoring
> implementation reality.
>
> MD: I agree that two representations are bad, and three are worse.
> Problem is that domain names don't appear in an IRI or URI only in the
> path component, they can appear elsewhere (e.g., in query string).
>
> TH: The reality is that things are very messy. Application developers
> try very hard to get people where they want to go, even if that input is
> not valid. Even if we came up with a totally new identifier, that would
> not fix what's out there. My take is that scrapping what we have now and
> starting over does us no good, because it might be completely better but
> completely undeployed. We need to find something in the middle, because
> leaving things as they are right now doesn't help.
>
> MS: HTML5 spec has ended up including text about (1) error handling for
> URIs and (2) character encodings in query strings. We hope to have spec
> text that we can normatively reference in HTML5 so that implementors can
> do the right thing going forward.
>
> TH: Was there discussion of other URI schemes at that point (in the W3C
> or WhatWG), or was it limited to HTTP?
>
> MS: I believe it was limited to HTTP.
>
> MS: The "goals" page linked to from the draft charter is very useful and
> I'd like to see those concerns addressed.
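
To make the "two representations are bad, three are worse" point concrete, the following Python sketch shows one internationalized host label in the three forms named in the discussion: raw UTF-8 bytes, percent-encoded UTF-8, and punycode. It is an illustration under stated assumptions rather than anything presented at the BoF: the helper name `label_representations` and the sample label are made up for the example, and Python's built-in `idna` codec implements the older IDNA 2003 rules, so it may not reflect the outcome of the IDNA revision work mentioned in the minutes.

```python
from urllib.parse import quote


def label_representations(label: str) -> dict:
    """Return three encodings of one host label (hypothetical helper)."""
    return {
        # The label as a UTF-8 byte sequence.
        "utf8": label.encode("utf-8"),
        # The same UTF-8 bytes, percent-encoded into ASCII.
        "percent_encoded": quote(label, safe=""),
        # The ASCII-compatible form from Python's IDNA 2003 codec.
        "punycode": label.encode("idna").decode("ascii"),
    }


if __name__ == "__main__":
    for name, value in label_representations("b\u00fccher").items():
        print(f"{name}: {value!r}")
```

A resolver that is not Unicode-aware would treat the percent-encoded and punycode strings as two unrelated ASCII names, which is the divergence described in the exchange between AM and LM.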
>
> [
>
> Meeting Editor's Note:
>
> http://trac.tools.ietf.org/area/app/trac/wiki/DraftIriCharter
> http://trac.tools.ietf.org/area/app/trac/wiki/IriWorkGoals
>
> ]
>
> JK: Ted, I agree with you, but URI syntax is currently employed as a
> user interface, and I think we'll need to move away from that
> eventually. I'd be happy if we could deprecate ASCII-only URIs.
>
> TH: Do we have consensus to do the work necessary to deprecate
> ASCII-only URIs?
>
> JK: I would define it differently. Some URIs are not user-facing but
> instead are network-facing only, so we haven't felt the need to define
> internationalized versions of those.
>
> TH: Yes, the browser location bar screwed us royally because HTTP URIs
> were originally supposed to be hidden behind hyperlinks in web pages.
>
> JK: The choices are either looking at things on a scheme-by-scheme
> basis, or solving the problem globally for all schemes.
>
> TH: How much work are people willing to put in? Because that's a lot of
> work.
>
> MD: I've already put in a lot of time, but I'm willing to continue
> working. The deployed content that is not compliant is a small
> percentage, but the mountain of content is extremely large.
>
> LM: A few points. draft-duerst-iri-bis-07 recommends (1) renaming the
> existing IANA registry of URI schemes to be a registry of URI/IRI
> schemes, (2) adding as a requirement for new schemes to define the
> non-ASCII characters that are appropriate for the scheme, and (3)
> reviewing all the existing schemes for their appropriateness as IRI
> identifiers, where the default is that it's an old-style URI. This would
> go a long way to making all of the identifiers into IRIs.
>
> TH: I remember an effort to bring all the URIs up to date, and we burned
> out three people in the process. Now it would be just as much work, or
> more.
>
> MD: Say there is a URI scheme for IP addresses, then it's just numbers
> and we don't need any internationalization.
> On the other hand, for
> something like mailto you can only use ASCII because it is so old.
>
> JK: If we had a URI scheme for IP addresses, you can be sure that we
> would hear calls for encoding those digits in a localized version,
> instead of in "Arabic numerals".
>
> LM: I think we have enough problems without imagining new ones.
>
> BL: Why is this not a presentation issue instead of a protocol issue?
>
> TH: Barry, that was the theory, but it failed the test of implementation.
>
> BL: As we discover that more things need internationalization in the
> presentation layer, can't we just say that applications need to become
> better at presentation?
>
> AR: If I understand that we'd go back and convert old URIs to IRIs, I
> would suggest that the effort to do that for SIP alone would be
> enormous.
>
> LM: Currently you could hex-encode UTF-8.
>
> AR: Non-ASCII characters are not allowed in SIP URIs. But SIP has messed
> this up because there is no normative text about it. I would be stunned
> if we're the only protocol in that boat.
>
> LM: So either it works or it doesn't. If it doesn't, it's ASCII only
> until someone fixes it. The registry would enable that to be defined.
>
> AR: So you would be defining a framework, not fixing each scheme.
>
> LM: Right. A framework that says it's either like IRIs now, like URIs
> now, or some other definition.
>
> TH: To Barry's point, we've never been able to force people to layer
> things correctly in their applications. The danger is that we're going
> to go back into defining human-friendly names. The reality is that the
> protocol elements will bleed into the human side of things. But we can't
> be so liberal that things break if there are no humans involved, e.g. if
> the data is provided to a lower layer.
>
> LM: Preventing bad things from happening to humans is a high priority --
> dealing with issues of ambiguity and reliability.
> Let me speak a bit
> about the bleeding of protocol elements into human interfaces, because a
> big part of the Internet economy came from being able to advertise your
> domain name -- which was bleeding of a protocol element into user space.
>
> BL: I disagree, because if things worked right then a Chinese or
> Japanese or Cyrillic name could be converted correctly into a protocol
> element.
>
> LM: I'm confused. The use of i18n identifiers works and it's deployed.
> There are just a few issues around the edges. The problem is that we
> need to address those issues in a coordinated fashion.
>
> PR: The presentation layer leaks into protocol elements. Larry said IRIs
> are used as protocol elements, but they are not encoded. And percent
> encoded representations provide i18n, too. What we want to get away from
> is conflating i18n with a particular encoding. E.g., people use UTF-8,
> so what's being argued for is to standardize that usage by saying that
> internationalized identifiers are to be encoded in UTF-8.
>
> LM: Sorry that I was not specific enough. Where they are deployed is in
> HTML documents.
>
> PR: But those are not in UTF-8, they are in ISO 8859-foo. Do you want
> these identifiers to be represented in *any* encoding? We need to be
> careful about which path we're on.
>
> LM: This isn't something that I *want*, it's what *is*. There is a lot
> of software and content that treats a sequence of characters in the
> encoding of that document, converts that into Unicode (usually UTF-16
> but perhaps also UTF-8), and uses the result as a URI/IRI.
>
> BL: The inclusion of this in an HTML document does not make this into a
> protocol element.
>
> TH: Some people think they are protocol elements. Both views might be
> correct. We can get sidetracked into which encodings we prefer. One of
> the goals here is to minimize the number of translations that occur
> between applications. Make it as simple as possible, but not simpler.
> Don't simplify identifiers, reduce the number of iterations of
> translation in any given protocol handoff. Not easy, but different from
> what was just described.
>
> PR: Let's reframe. In RFC 822, in the header an address is a protocol
> element. In the body, the address is a protocol element in text.
>
> BL: This is going in the wrong direction.
>
> TH: Does anyone think we don't have a problem here?
>
> [laughter]
>
> BL: I'm not saying we shouldn't do this, only that we need to scope it.
>
> [slide] Charter review
>
> LM: There's a draft charter, perhaps we can review that? Can we address
> the problem only by working on these three documents?
>
> - draft-duerst-iri-bis-07
> - draft-duerst-mailto-bis-07
> - RFC 4395
>
> Do we need to work on more? Can we get away with working on even fewer?
>
> JK: First, I don't think we can narrow the scope to this. We need to
> look at the impact on all schemes, not just HTTP URIs. Second, I'm leery
> of having this WG try to fix mailto, especially in the context of EAI,
> because mailto needs to be fixed by people who understand mail.
>
> MD: The mailto draft says very little about EAI. I'm not an expert about
> all the details about how you do escaping in email addresses and the
> like, so we need people who know about that to provide comments.
>
> AM: I'm happy about reducing the scope because success is good, but the
> mailto/EAI interaction needs to be addressed. However, if this WG will
> focus only on HTTP then it might make choices that are not generic
> enough.
>
> LM: The intent is not to focus on HTTP.
>
> AM: But HTTP is the main application. Maybe the concern lacks a basis.
>
> LM: The charter mentions explicit coordination with other groups. Not
> meant to be an exclusive list, and mainly to coordinate requirements.
>
> JK: We have once again fallen into the habit of always mentioning the
> example of web browsers.
> I agree with Alexey that we need more
> examples, at least one example but three more would be even better. But
> mailto is probably not a good example.
>
> TH: Maybe look at URNs. They are similar but different enough to be
> useful for this work.
>
> MD: RFC 3987 references POP, IMAP, data URI scheme, URN, etc. The idea
> to look at other schemes is important. But do we put out a new spec for
> those?
>
> LM: I don't think we need to update data: URIs.
>
> MD: We need to look at other schemes, but we don't know which until we
> investigate them in detail.
>
> JH: I do think XMPP is a good example because it is more recent. I think
> we also need recommendations for people defining new schemes. We can
> discuss in the XMPP WG whether more formal cooperation is needed.
>
> LM: So maybe take mailto out and put others in.
>
> AM: Completing the mailto update can happen elsewhere.
>
> LM: There is discussion in the charter of perhaps splitting the
> documents, e.g., move everything about domain names into a separate
> document. Also perhaps a separate BCP for informational text about why
> some characters are problematic and others are not.
>
> Chairs begin asking questions of the room...
>
> TH: HUM Is there a problem here for the IETF to solve?
>
> Hum "yes" -- many
> Hum "no" -- silence
> Hum "not enough information to decide" -- silence
>
> TH: Raise your hand if you are willing to:
> - be on a mailing list to discuss these issues
> - review documents
> - replace me and Pete at the front of the room
>
> TH: A fair number of volunteers on the first two.
>
> TH: I'm going to ask for two directional questions about the charter....
>
> TH: HUM Should it be within the charter to scrap the existing approach
> from RFC 3987 and start over with an entirely new approach?
>
> Hum "yes" -- about one-third.
> Hum "no" -- about one-third.
> Hum "not enough information to decide" -- about one-third.
>
> TH: HUM Should the charter include an explicit list of schemes to review?
>
> Hum "yes" -- about one-third.
> Hum "no" -- about one-third.
> Hum "not enough information to decide" -- about one-third.
>
> Clarifying question from the mic: would that include making it a
> minimum list, not a limiting list?
>
> LD: Can we ask if this is a deal-breaker?
>
> TH: I don't think we need to go there now, let's clarify the charter
> first.
>
> TH: Re-hum?
>
> JK: I don't care which specific schemes are selected for review, just as
> long as the WG reviews multiple schemes.
>
> TH: HUM Must we decide specifically which schemes need to be investigated
> *before* the WG is chartered, or can the WG sort that out?
>
> Hum "yes" -- none for "before"
> Hum "no" -- many for "WG can sort it out"
>
> TH: Must the charter specify a minimum number of schemes to investigate,
> or can the WG sort that out?
>
> Hum "yes" -- one hum for "minimum number"
> Hum "no" -- many for "WG can sort it out"
>
> TH: So we seem to have consensus that there are volunteers to work on
> IRIs at the IETF. Thanks!
>
> END
>
> ========================================================================
>
Received on Wednesday, 25 November 2009 09:38:15 UTC