Re: [iri] #128: use of the term 'origin'

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Tue, 10 Jul 2012 20:27:18 +0900
Message-ID: <4FFC1196.7080803@it.aoyama.ac.jp>
To: Larry Masinter <masinter@adobe.com>
CC: "stpeter@stpeter.im" <stpeter@stpeter.im>, "public-iri@w3.org" <public-iri@w3.org>, Chris Lilley <chris@w3.org>
On 2012/06/17 0:28, Larry Masinter wrote:
> does this apply to any format other than HTML? I'm not sure that this applies to anything else... Within image/svg+xml, for example? The notion of document charset doesn't apply to some formats.

Hello Larry,

Very good idea to test this. I tested the browsers that I have
available, all on Windows 7, and looked at the actual requests in
Wireshark. The test used the attached SVG file, encoded in iso-8859-1,
which contains a link to an existing domain but a non-existent page,
with non-ASCII characters in the query part.
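
For anyone who wants to repeat the test without the attachment, here is a minimal sketch of how a comparable file could be generated. It is not the attached file: the host `example.org`, the link text, and the function name `build_test_svg` are assumptions made only for illustration.

```python
# Hypothetical stand-in for the test document; the real attachment may differ.
# The non-ASCII letters are written as \u00e9 escapes so that this script's
# own source encoding does not matter.
SVG_TEMPLATE = (
    '<?xml version="1.0" encoding="iso-8859-1"?>\n'
    '<svg xmlns="http://www.w3.org/2000/svg"\n'
    '     xmlns:xlink="http://www.w3.org/1999/xlink"\n'
    '     width="400" height="100">\n'
    # example.org is an assumed host; the path mirrors the requests in this mail.
    '  <a xlink:href="http://example.org/non-existent?r\u00e9sum\u00e9">\n'
    '    <text x="10" y="50">Follow this link and capture the request</text>\n'
    '  </a>\n'
    '</svg>\n'
)


def build_test_svg() -> bytes:
    """Return the SVG document as iso-8859-1 bytes, matching its XML declaration."""
    return SVG_TEMPLATE.encode("iso-8859-1")
```

Writing the returned bytes to a file in binary mode and opening that file in a user agent should trigger a comparable request. The key point is that the bytes on disk must agree with the `encoding` declaration, so that the document charset really is iso-8859-1.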

Here are the results:

Opera 12:
GET /non-existent?r%C3%A9sum%C3%A9 HTTP/1.1\r\n
This means the query part is sent as percent-encoded UTF-8.

Safari (5.1.7):
GET /non-existent?r%E9sum%E9 HTTP/1.1\r\n
This means that the query part is sent as percent-encoded iso-8859-1.

IE9:
GET /non-existent?r\351sum\351 HTTP/1.1\r\n
This means that the query part is sent as raw (not percent-encoded)
iso-8859-1; \351 is the octal escape for the single byte 0xE9.

Firefox 13.0.1:
GET /non-existent?r%E9sum%E9 HTTP/1.1\r\n
This means that the query part is sent as percent-encoded iso-8859-1.

Chrome 20:
GET /non-existent?r%E9sum%E9 HTTP/1.1\r\n
This means that the query part is sent as percent-encoded iso-8859-1.
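
The three forms seen in the captures above can be reproduced with Python's standard library. This is only a sketch to make the byte-level difference concrete; the helper names `percent_encoded` and `raw_bytes` are made up for this example and are not taken from any browser or specification.

```python
from urllib.parse import quote

QUERY = "r\u00e9sum\u00e9"  # the query string from the test, "résumé"


def percent_encoded(query: str, charset: str) -> str:
    """Encode the query in the given charset, then percent-encode the bytes."""
    return quote(query, safe="", encoding=charset)


def raw_bytes(query: str, charset: str) -> bytes:
    """Encode the query in the given charset without percent-encoding."""
    return query.encode(charset)


print(percent_encoded(QUERY, "utf-8"))       # r%C3%A9sum%C3%A9  (Opera)
print(percent_encoded(QUERY, "iso-8859-1"))  # r%E9sum%E9        (Safari, Firefox, Chrome)
print(raw_bytes(QUERY, "iso-8859-1"))        # b'r\xe9sum\xe9'   (IE9)
print(oct(0xE9))                             # 0o351, the \351 shown for IE9
```

In UTF-8 the letter é is two bytes (C3 A9), while in iso-8859-1 it is the single byte E9, which is why the request line alone is enough to tell which charset was used.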

With the exception of Opera, SVG seems to follow HTML in using the
document charset for the query part. But there are SVG user agents that
are not browsers. If somebody has one of these, please run this test
and tell us what you get.

Also, there are formats other than HTML and SVG.

Regards,   Martin.


> -----Original message-----
> From: iri issue tracker<trac+iri@grenache.tools.ietf.org>
> To: "draft-ietf-iri-3987bis@tools.ietf.org"<draft-ietf-iri-3987bis@tools.ietf.org>, "stpeter@stpeter.im"<stpeter@stpeter.im>
> Cc: "public-iri@w3.org"<public-iri@w3.org>
> Sent: Mon, Jun 11, 2012 19:38:45 GMT+00:00
> Subject: Re: [iri] #128: use of the term 'origin'
>
> #128: use of the term 'origin'
>
>   While reviewing 3987bis for i18n terminology, I came across this
>   paragraph (Section 3.5):
>
>      For compatibility with existing deployed HTTP infrastructure, the
>      following special case applies for schemes "http" and "https" and
>      IRIs whose origin has a document charset other than one which is UCS-
>      based (e.g., UTF-8 or UTF-16).  In such a case, the "query" component
>      of an IRI is mapped into a URI by using the document charset rather
>      than UTF-8 as the binary representation before pct-encoding.  This
>      mapping is not applied for any other scheme or component.
>
>   The term 'origin' could be ambiguous here. It doesn't seem to be
>   referencing the Web Origin Concept (RFC 6454) but instead seems to be
>   based on the "document" (broadly construed) in which the http or https
>   URL is found (e.g., as a hyperlink in an HTML document or perhaps as
>   running text in an email message). It would be good to make that clear.
>
> Comment(by stpeter@…):
>
>   One way to remove the ambiguity would be to change "origin" here to
>   something else, but even then I think we'd need additional text. I
>   tentatively propose the following:
>
>      For compatibility with existing deployed HTTP infrastructure, the
>      following special case applies for the schemes "http" and "https"
>      when an IRI is found in a document whose charset is not based on UCS
>      (e.g., not UTF-8 or UTF-16).  In such a case, the "query" component
>      of an IRI is mapped into a URI by using the document charset rather
>      than UTF-8 as the binary representation before pct-encoding.  This
>      mapping is not applied for any other scheme or component.
>
> --
> -----------------------+---------------------------------------
>   Reporter:  stpeter@…  |       Owner:  draft-ietf-iri-3987bis@…
>       Type:  defect     |      Status:  new
>   Priority:  minor      |   Milestone:
> Component:  3987bis    |     Version:
>   Severity:  -          |  Resolution:
>   Keywords:             |
> -----------------------+---------------------------------------
>
> Ticket URL:<http://trac.tools.ietf.org/wg/iri/trac/ticket/128#comment:1>
> iri<http://tools.ietf.org/wg/iri/>
>


svg_test_query.svg
(image/svg+xml attachment: svg_test_query.svg)

Received on Tuesday, 10 July 2012 11:28:00 GMT