Re: text/* types and charset defaults

Julian Reschke wrote:

 [Removing the default Latin-1] 
> Would that break anything in practice today?

The only "victims" could be text/html documents for HTML 2 (maybe 3.2)
that rely on this default.  That won't break HTTP; it did its
job, which is to get object X from A to B (in the case of a GET).  What
the HTTP client later does with X is its business.  Things
like <meta http-equiv="Content-Type" ...> are not really HTTP,
and we're not trying to add a "missing" meta to HTML 2 here.
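To make the proposed change concrete, here is a minimal Python sketch of how a recipient might pick the charset for a text/* body.  The function name `effective_charset` and the `legacy_latin1_default` switch are hypothetical.  With the switch on, the sketch models the current RFC 2616 (section 3.7.1) rule of ISO-8859-1 when no charset parameter is given.  With it off, it models removing that default, which leaves the decision (e.g. looking at a <meta> element) to the client.

```python
from email.message import Message


def effective_charset(content_type, legacy_latin1_default=False):
    """Return the charset to assume for a body, or None if unknown.

    Hypothetical helper for illustration only.  content_type is the raw
    Content-Type header value, e.g. 'text/html; charset=UTF-8'.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # Lowercased value of the charset parameter, or None if absent.
    charset = msg.get_content_charset()
    if charset is not None:
        return charset
    # Assumed model of the RFC 2616 default for text/* without a charset.
    if legacy_latin1_default and msg.get_content_maintype() == "text":
        return "iso-8859-1"
    # Without the default, HTTP says nothing; the client decides.
    return None
```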

Insofar as HTML 2 was "broken", it was fixed in 1997 by RFC 2070;
there is nothing to do in HTTP in 2008.  Please correct me if I'm
missing an obvious clue.

> "When in canonical form, media subtypes of the "text" type
>  use CRLF as the text line break.

Is "canonical" a shorthand for "MIME compatible"?
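For reference, MIME (RFC 2046, section 4.1.1) says the canonical form of any "text" subtype represents a line break as CRLF.  Below is a minimal sketch of converting a body to that form.  It assumes an ASCII-compatible charset, because it rewrites raw octets, and the function name `to_canonical_crlf` is hypothetical.

```python
import re

# CRLF is listed first so an existing CRLF is matched as one line break.
_LINE_BREAK_OCTETS = re.compile(rb"\r\n|\r|\n")


def to_canonical_crlf(body: bytes) -> bytes:
    """Rewrite CR, LF or CRLF line breaks as CRLF.

    Hypothetical helper; only meaningful for ASCII-compatible charsets,
    where CR and LF are the single octets 0x0D and 0x0A.
    """
    return _LINE_BREAK_OCTETS.sub(b"\r\n", body)
```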

>  HTTP relaxes this requirement and allows the transport of
>  text media with plain CR or LF alone representing a line 
>  break when it is done consistently for an entire entity-body.

I wonder why HTTP cares about line ends in text media types at all;
unlike SMTP, it has no 1000-character line length limit.  HTTP
only needs to find the beginning and the end of the body, or of
each chunk where the chunked transfer coding is used.
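A minimal sketch of that framing job for the chunked transfer coding (RFC 2616, section 3.6.1).  It is simplified by assumption: trailers and most error handling are omitted, and `decode_chunked` is a hypothetical name.  Only the chunk-size lines are parsed.  Chunk data is copied as opaque octets, so CR or LF bytes inside the body never matter.

```python
def decode_chunked(stream: bytes) -> bytes:
    """Decode a chunked body (simplified sketch: no trailers, little validation)."""
    body = bytearray()
    pos = 0
    while True:
        # The chunk-size line is the only place where CRLF is interpreted.
        eol = stream.index(b"\r\n", pos)
        size_field = stream[pos:eol].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)
        pos = eol + 2
        if size == 0:
            break  # last-chunk; any trailers are ignored in this sketch
        # Chunk data: exactly `size` opaque octets, line ends untouched.
        body += stream[pos:pos + size]
        pos += size
        if stream[pos:pos + 2] != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        pos += 2
    return bytes(body)
```

For example, `decode_chunked(b"5\r\nab\ncd\r\n3\r\n\r\nx\r\n0\r\n\r\n")` returns `b"ab\ncd\r\nx"`.  The LF in the first chunk and the CRLF that starts the second chunk's data pass through as-is.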

>  HTTP applications MUST accept CRLF, bare CR, and bare LF as
>  being representative of a line break in text media received
>  via HTTP.

Why should HTTP care about line ends in a body at all?  It's not
supposed to convert them to local line ends on the fly, or is it?
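Where the quoted requirement does apply, it is arguably a job for the application after the charset has been decoded, not for HTTP framing.  Here is a minimal sketch with hypothetical helper names:

* `split_lines` accepts CRLF, bare CR and bare LF.
* `line_break_styles` reports which of those conventions a text uses.
* `is_consistent` treats "done consistently" as meaning at most one style.

Python's built-in `str.splitlines()` is deliberately not used, because it also splits on further characters such as U+0085, U+2028 and U+2029.

```python
import re

# CRLF is listed first so that it is matched as a single line break.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")
_STYLE_NAMES = {"\r\n": "CRLF", "\r": "CR", "\n": "LF"}


def split_lines(text: str) -> list[str]:
    """Split decoded text on CRLF, bare CR or bare LF (and nothing else)."""
    return _LINE_BREAK.split(text)


def line_break_styles(text: str) -> set[str]:
    """Return which of 'CRLF', 'CR' and 'LF' occur in the text."""
    return {_STYLE_NAMES[m.group()] for m in _LINE_BREAK.finditer(text)}


def is_consistent(text: str) -> bool:
    """True if the text uses at most one line-break convention."""
    return len(line_break_styles(text)) <= 1
```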
  
>  HTTP allows the use of whatever octet sequences are defined 
>  by that character set to represent the equivalent of CR and
>  LF for line breaks.

In other words, it doesn't care.  UTF-16 and UTF-32 (each in both
byte orders) and the various EBCDIC code pages are charsets with
other ideas, and Unicode has additional code points for line ends
(e.g. U+0085 NEL, U+2028 LS, U+2029 PS).  All of this is irrelevant
to HTTP, whose task is to get content X from A to B.
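The following small Python sketch shows that line-end octets depend on the charset.  It shows how U+000A LINE FEED is encoded in a few charsets (cp037 is one EBCDIC code page).  It also shows how a naive octet-level CR/LF "fix", as an intermediary might attempt, corrupts a UTF-16 body.  The helper `line_feed_octets` is hypothetical.

```python
def line_feed_octets(charset: str) -> bytes:
    """Return the octets that encode U+000A LINE FEED in the given charset."""
    return "\n".encode(charset)


# The same logical line break, spelled differently per charset.
for cs in ("us-ascii", "utf-16-be", "utf-16-le", "utf-32-be", "cp037"):
    print(cs, line_feed_octets(cs).hex())

# A naive octet-level "fix" of line ends applied to a UTF-16LE body:
body = "a\nb".encode("utf-16-le")       # 6 octets: 61 00 0a 00 62 00
mangled = body.replace(b"\n", b"\r\n")  # 7 octets: odd length, no longer valid UTF-16
```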

> Does this need fixing?

Maybe it needs trimming; I'm not sure.

 Frank

Received on Sunday, 20 January 2008 15:45:15 UTC