Joint meeting at TPAC from HTML and i18n core WG minutes 2007-11-09 from Felix Sasaki on 2007-11-09 (public-html@w3.org from November 2007)

From: Felix Sasaki <fsasaki@w3.org>
Date: Sat, 10 Nov 2007 02:06:44 +0900
To: public-html@w3.org, public-i18n-core@w3.org
Message-ID: <473493A4.3090901@w3.org>

... are at http://www.w3.org/2007/11/09-i18n-minutes.html and below as text.

Felix

[1]W3C

[1] http://www.w3.org/

- DRAFT -

I18N / HTML5 break out session

9 Nov 2007

See also: [2]IRC log

[2] http://www.w3.org/2007/11/09-i18n-irc

Attendees

Present
Regrets
Chair

Scribe
fantasai

Contents

* [3]Topics
1. [4]Validator checking entity reqs
* [5]Summary of Action Items
_________________________________________________________

<aphillip_> [6]http://www.w3.org/html/wg/html5/#determining0

[6] http://www.w3.org/html/wg/html5/#determining0

<anne> [7]http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html

[7]
http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html

<aphillip_> [8]http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/0088.html

[8]
http://lists.w3.org/Archives/Public/public-i18n-core/2007OctDec/0088.html

<Hixie> [9]http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#parsing
<Hixie> [10]http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#the-input0

[9]
http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#parsing
[10]
http://www.whatwg.org/specs/web-apps/current-work/multipage/section-parsing.html#the-input0

<scribe> ScribeNick: fantasai

Addison: There was a badly-titled thread saying something about
making windows-1252 the default encoding.
... Our first reaction was, wouldn't it be nice if that were
something else, say utf-8
... At the same time we recognize that there's a legacy encoding
issue, since previous versions of HTML required iso-????

<hsivonen> [11]http://hsivonen.iki.fi/charmod-checking/

[11] http://hsivonen.iki.fi/charmod-checking/

<hsivonen> [12]http://hsivonen.iki.fi/charmod-norm-checking/

[12] http://hsivonen.iki.fi/charmod-norm-checking/

Addison: If you actually look at the sections, 8.2 and ....
... It does not in fact say that the default encoding of the
universe at large is windows-1252
... In the sequence there's looking at byte sequences, then using
heuristics, etc.
... at the end of that sequence there's a paragraph that says
... if all else fails, you have to supply some
implementation-defined default and we recommend you do these things.
... And windows-1252 just appears out of nowhere.
... One thought we had was for us to provide some information on why
windows-1252 is preferable and how it differs from the standard ISO
encodings.
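
[Editorial illustration, not part of the discussion.] The sequence
Addison describes (look at byte sequences, then heuristics, then an
implementation-defined default) can be sketched roughly as follows.
This is a simplified stand-in and not the draft's actual algorithm:
the function name pick_encoding, the heuristics parameter, and the
choice of which byte order marks to check are all assumptions made
for illustration.

```python
import codecs

# Byte order marks checked first. The list is an assumption made for
# this sketch; the draft's own steps are more detailed.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def pick_encoding(data, heuristics=(), default="windows-1252"):
    """Hypothetical helper: choose an encoding name for raw bytes."""
    # 1. Look at leading byte sequences.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. Try caller-supplied heuristics; each returns a name or None.
    for guess in heuristics:
        name = guess(data)
        if name:
            return name
    # 3. If all else fails, use the implementation-defined default.
    return default
```

The default only applies when nothing earlier in the sequence
produced an answer, which is the point Addison makes about the
windows-1252 paragraph.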

<Hixie> "When a user agent would otherwise use the ISO-8859-1
encoding, it must instead use the Windows-1252 encoding."

Henri: that part is a violation of charmod

Addison doesn't consider that a violation of charmod

Addison: There are superset encodings and they're often tagged with
the subset encodings.
... using the superset interpretation doesn't conflict with using
the subset interpretation
... We're not proposing a substantive change, just providing more
justification for what you're doing.
... We also looked at the structure of the paragraph, and had some
concerns.
... one was the phrasing of "western demographics" etc
... We had several reactions.
... One, it's not clear what a western demographic is and how you
tell when you're talking to one on the internet.
... We proposed 2 things, one of which was to turn two things
around.
... We have a love of utf-8, and we'd like you to mention that one
first and then the legacy thing
... We also think the wording could be changed somewhat on the
windows-1252 to say that "in a legacy context, if you have to guess,
you should guess this one"
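
[Editorial illustration, not part of the discussion.] The
subset/superset point can be checked with a small sketch using
Python's built-in codecs, where "cp1252" is Python's name for
windows-1252. It assumes Python's codec tables match the encodings
being discussed; the sample bytes are made up for the example.

```python
# Bytes 0x00-0x7F and 0xA0-0xFF decode identically in both encodings.
ascii_bytes = bytes(range(0x00, 0x80))
high_bytes = bytes(range(0xA0, 0x100))
same_ascii = ascii_bytes.decode("iso-8859-1") == ascii_bytes.decode("cp1252")
same_high = high_bytes.decode("iso-8859-1") == high_bytes.decode("cp1252")

# The two differ only in 0x80-0x9F: ISO-8859-1 maps these bytes to C1
# control characters, windows-1252 maps most of them to printable ones.
sample = b"\x93quoted\x94 \x96 \x80100"
as_latin1 = sample.decode("iso-8859-1")  # starts with control U+0093
as_cp1252 = sample.decode("cp1252")      # curly quotes, en dash, euro sign
```

Content labelled ISO-8859-1 that uses only the shared ranges reads
the same under either interpretation, so treating the label as
windows-1252 changes the result only for bytes 0x80-0x9F.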

Ian: I haven't gotten to that issue yet, haven't looked at it in
detail, sounds ok

Richard: Is it purely editorial?

Addison: It doesn't change the result, it just changes how you
explain the result.

Ian: Do you have any recommendation for dealing with say Japan and
other parts of East Asia?

Addison: There are a variety of things in step #7 that allow for
various heuristics and sniffing.

Ian: windows-1252 is fine for US and UK, but what about other
places?

Felix: Depends on what device.

Addison: Most implementations use information in the browser, e.g.
what the browser uses or if a narrower auto-detect is set (as for
Japanese)

Ian: So in the Japanese cases, you expect that the rest of the steps
would take care of it?

Addison: I think you'd trap those encodings before you get to step
7(?)
... Might want to mention that in some cases, when you get a subset
encoding label, you should use the superset encoding.
... I think we can provide that information.

Ian: I believe when I wrote that section that I checked a browser
and that was the only mapping they had.

Addison: Most browsers don't just do GBK, but do ????
... There are some cases, such as in Japan, where the byte patterns
are completely different.
... where the encoding schemes are different even though the charset
is the same
... that kind of autodetection is a separate thing
... I think this is still valid.
... The only question I have is, if you're thinking "what should
happen in step 7" is some language-dependent or context-dependent
thing ...

Hixie: In this final step, you don't have any information from the
content

Addison: You might want to think about splitting step 7 and doing a
utf-8 detection first
... UTF-8 has recognizable byte patterns, it would be great to put
that first before saying "use your favorite legacy encoding"
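
[Editorial illustration, not part of the discussion.] A minimal
sketch of "try UTF-8 first, then a legacy default", assuming strict
UTF-8 validation is an acceptable detection test. The function name
decode_with_utf8_first and the choice of cp1252 (windows-1252) as
the legacy fallback are assumptions for the example.

```python
def decode_with_utf8_first(data, legacy="cp1252"):
    """Hypothetical helper: return (text, encoding_used) for raw bytes."""
    try:
        # Strict decoding rejects any byte sequence that is not
        # well-formed UTF-8, which is what makes UTF-8 recognizable.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the legacy default.
        return data.decode(legacy, errors="replace"), legacy
```

Pure ASCII input is also valid UTF-8, and a short legacy-encoded
text can occasionally be well-formed UTF-8 by coincidence, so this
is a heuristic rather than a guarantee.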

Hixie: The concern is what happens if the user enters some bytes
into the form and then submits it?

Addison: We were just looking at that in the i18n working group

Hixie: We'd have to make sure that that's what the server was
expecting.

Felix: What information are you looking at to guess what encoding
the user applies?

Hixie: Typically different localizations of the browser have
different default encodings.
... well, the email's in my pile. I don't know when I'll get to it.

Addison: We'll look at superset encodings and try to write up a
document that you can reference.

Introductions

Richard Ishida: W3C Internationalization Lead

Anne van Kesteren: Opera Software

Elika: fantasai, CSSWG Invited Expert, works on international text
layout

Addison Phillips: Yahoo, i18n wg

Amit Parashar: something-or-other chair

Henri Sivonen: working on HTML5 conformance checker

Ian Hickson: HTML5 editor

Felix Sasaki: i18n Core, i18n ITS and Web Services Policy WG [W3C]

<plh> Philippe Le Hegaret: W3C, Architecture Domain (XML, Web
Services, i18n), and Video

Ishida: Can you explain the alt text issue?

<najib> Najib Tounsi, W3C Morocco Office Mgr.

Ishida: We believe that you should never put human-readable text in
an attribute value because you can't put markup in it
... which is important for various i18n reasons: bidi, language
annotation, ruby, etc.

Hixie: We still have the <img> element; we can't get rid of it. It
still has alt attr, because it's had that.
... We can't give it content because HTML parsers all close it right
after the start tag.
... We also have the <object> tag, which has full fallback
capabilities.

Ishida: Would the group advise the <object> tag then?

Hixie: I don't think we'll have a recommendation one way or another;
if your fallback content needs element content, then you'll have to
use <object>
... We've been doing some work, e.g. Acid2, on making sure the
<object> tag works properly in various browsers.

Ishida asks about some XHTML2 stuff

Hixie: The XHTML2 group did two things, one was switching some
attributes into elements, e.g. title attributes.
... Then they also went and started using RDF for everything: we are
certainly not going to do that.
... For the first one, I'm not convinced that the benefits of using
an element for these things outweigh the costs
... We can try not to do things like that in the future though
... This problem comes up in many places, e.g. in DOM APIs that take
a string.
... There are also places where we can't make such changes, such as
the <title> element
... whose content winds up in places like filenames where you can't
have structured markup anyway

Ishida: Can you use bidi in filenames?

Hixie: probably, but I'm not going to recommend it

Ishida: We might need to start thinking about how to convert text
from markup to strings with bidi control characters.
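
[Editorial illustration, not part of the discussion.] One possible
shape for such a conversion, sketched with Python's standard
html.parser. It assumes the usual correspondence for inline markup:
a dir attribute maps to an embedding (LRE or RLE ... PDF) and
<bdo dir> maps to an override (LRO or RLO ... PDF). The class and
function names are hypothetical, and block-level direction is not
modelled.

```python
from html.parser import HTMLParser

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
LRO, RLO = "\u202d", "\u202e"


class BidiFlattener(HTMLParser):
    """Hypothetical helper: turn dir markup into bidi control characters."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # (tag, emitted_control) for each open element

    def handle_starttag(self, tag, attrs):
        direction = (dict(attrs).get("dir") or "").lower()
        control = None
        if direction in ("ltr", "rtl"):
            if tag == "bdo":
                control = RLO if direction == "rtl" else LRO
            else:
                control = RLE if direction == "rtl" else LRE
            self.out.append(control)
        self.stack.append((tag, control is not None))

    def handle_endtag(self, tag):
        if tag not in [name for name, _ in self.stack]:
            return  # stray end tag: ignore it
        while True:
            name, emitted = self.stack.pop()
            if emitted:
                self.out.append(PDF)  # close the embedding or override
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(data)


def flatten(markup):
    parser = BidiFlattener()
    parser.feed(markup)
    parser.close()
    # Close anything the markup left open.
    tail = [PDF for _, emitted in parser.stack if emitted]
    return "".join(parser.out + tail)
```

For example, flatten('abc <span dir="rtl">XYZ</span> def') yields
the text with RLE before XYZ and PDF after it.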

<anne> (I think HTML 5 should get &rlo;, &lro;, and &pdf; (or
something in that direction) for BiDi. These are already in IE.)

Hixie: We did consider having a DOM attribute that would pull out
e.g. bidi control characters from the markup and alt text from
images
... not sure where that's going
... I would recommend finding solutions for plaintext, since that
will work for both

Discussion of that

language tags are in Unicode, but were deprecated as soon as they
were added: they were added as deprecated and should never be used

<anne> (even though the characters they map to are apparently
deprecated)

discussion of markup-plaintext thing

<apppp> reference RFC 3066 should point to BCP 47

Addison notes that the i18n group needs to review the date parsing
things

<najib> +1 to adding &rle;, ..., &pdf; in HTML

Henri notes that it's using ISO dates anyway

najib, if we're adding more entities I want &zwsp;

<najib> It depends on usage frequencies. :-)

Validator checking entity reqs

Henri: I don't check that character entities are only used for
characters that are unclear.
... because I can't tell mechanically whether the character is
unclear

<anne> fantasai, I think &zwsp; is also supported by IE

cool

let's add it :P

all the characters next to it have names,

zwnj, zwj etc

<najib> I don't have IE on MacOS :-( & :-)

Ishida explains that this part of charmod is about best practices

it's not "should" in the normative sense

Elika: Maybe you should go through the document and change the
wording of should sentences that don't match RFC2119 to something
else

Ishida: Well, we mean it that way for authors. Maybe we need to
create different classes and explain which recommendations apply to
which

<fsasaki> [13]http://hsivonen.iki.fi/charmod-norm-checking/

[13] http://hsivonen.iki.fi/charmod-norm-checking/

Henri: I documented which constructs in HTML5 result in a continuous
string
... I don't have any other comment there except that I wrote this
and it is available :)
... I have another comment, but it's targeted at the Unicode/ICU
specs

Ishida: Might want to post to the unicode list

<apppp> Title: I18N / HTML5 break out session

Summary of Action Items

[End of minutes]
_________________________________________________________

Minutes formatted by David Booth's [14]scribe.perl version 1.128
([15]CVS log)
$Date: 2007/11/09 17:05:50 $
_________________________________________________________

[14] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
[15] http://dev.w3.org/cvsweb/2002/scribe/

Received on Friday, 9 November 2007 17:07:11 UTC