Re: Suggestion to add "spacing between sentences" to CSS3 Line WD from Shelby Moore on 2002-12-17 (www-style@w3.org from December 2002)

From: Shelby Moore <shelby@coolpage.com>
Date: Mon, 16 Dec 2002 23:46:07 -0600
To: John Lewis <lewi0371@mrs.umn.edu>
Cc: www-style@w3.org
Message-Id: <4.1.20021216231135.018042a0(null)>
>> The "correct" solution would be a CSS style. Whether it can be
>> implemented is another matter[...]
>
>I agree. If you don't use a markup language that determines what
>sentences are, you need something like a :sentence selector (or
>similar solution). But none of that matters until someone manages to
>create an algorithm to decide what a sentence is--that actually
>works--for most languages. Until that happens, it can't go in CSS.
>(*If* it happens--I'm not sure if it's even possible in English,
>especially if you include nonstandard English.)


My intuitive bet is that 90+% accurate algorithms probably already exist,
even if we aren't aware of them.  Natural language processing has apparently
advanced a good deal since we were in an academic setting and keeping up
with research in many diverse disciplines.  My bet is that Microsoft has
proprietary research (and plans to use it commercially), given the billions
they spend on research and Gates's vision for tablets and for expanding the
universe of what computers can do.  For example, I know that when I was
doing research on image quality metrics, Microsoft people were involved in
some of the latest research.

I don't have time to go off on another tangent in natural language
processing, but I do question the assumption here that computers cannot
read grammar.  Maybe they still cannot.  More likely, it is an issue of
resources, patents, etc.

Also, even with simplistic algorithms, even a 20% error rate might be an
improvement over the current results, given that using a style is optional
and given that the worst case is a space that is slightly narrower or wider
following some of the rarer constructs in English grammar which are
mistaken by a simplistic parser for the end of a sentence.  I can't speak
for other languages, but I would assume Latin languages have similar
grammar complexity.
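
As a rough illustration of what such a simplistic parser could look like,
here is a minimal Python sketch.  It is an assumption made for the sake of
discussion, not any existing implementation: the abbreviation list, the
regular expression and the name split_sentences are all hypothetical, and
it only attempts English.

```python
import re

# Hypothetical, deliberately incomplete list of abbreviations that end in a
# period without ending a sentence.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "e.g.", "i.e.", "etc.", "vs."}

# Candidate boundary: '.', '!' or '?' (optionally followed by a closing
# quote or bracket), then whitespace, then an optional opening quote or
# bracket and an uppercase letter.
_BOUNDARY = re.compile(r"""[.!?]["')\]]*(\s+)(?=["'(\[]?[A-Z])""")


def split_sentences(text):
    """Split text into sentences using a naive punctuation heuristic."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        candidate = text[start:match.start(1)]
        words = candidate.split()
        last_word = words[-1].lower() if words else ""
        if last_word in ABBREVIATIONS:
            # Known abbreviation: treat the period as part of the word.
            continue
        sentences.append(candidate)
        start = match.end(1)
    tail = text[start:]
    if tail.strip():
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith arrived late. He was not amused! Was anyone?"))
# ['Dr. Smith arrived late.', 'He was not amused!', 'Was anyone?']
```

It fails in exactly the way described above: "I spoke to Prof. Jones." is
split after "Prof." because that abbreviation is not in the list, so the
only cost would be a slightly different space in the middle of that
sentence.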

On the one hand, I think a specification in CSS is needed to open the door
to implementations (algorithms that might be out there or just around the
corner).  On the other hand, I yield to the experts here on this list.  I
have only been here 2 days and I just wanted to make a simple suggestion.  I
am happy that so many are now aware of this issue, and felt strongly enough
to think about it and comment.  I think that in itself is a significant
accomplishment.


>Today, you have a few solutions, none of which are very good.


Thanks.  That is what I was trying to say.


> The only
>one that doesn't include content to force a presentation is by
>manually wrapping every sentence in span. And then you're adding tons
>of markup for a tiny stylistic issue.


I did not even consider that option, probably for fear of the impact it
would have on search engine rankings.
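
For a sense of how much markup the span approach adds, here is a small
Python sketch.  It assumes the sentences have already been separated by
some other means; the function name wrap_sentences and the class name
"sentence" are hypothetical choices for illustration only.

```python
from html import escape


def wrap_sentences(sentences, css_class="sentence"):
    """Wrap each already-separated sentence in its own <span> element."""
    return " ".join(
        '<span class="{}">{}</span>'.format(escape(css_class), escape(sentence))
        for sentence in sentences
    )


plain = ["A & B agree.", "C does not."]
marked_up = wrap_sentences(plain)
print(marked_up)
# Compare the size of the plain text with the size of the marked-up text.
print(len(" ".join(plain)), len(marked_up))
```

A style sheet could then target span.sentence, at the cost of roughly 30
extra characters of markup per sentence.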


>For these reasons, I think an end of sentence character is the best
>solution (and the only solution that could actually be put into
>practice without a huge amount of work).


I agree it is another option that might be useful.


> The big problem is that
>people won't use it--but I don't think it's wrong to deny people the
>ability to because the average person is wrong. Another problem is
>that it's extra work for the author--but as it shouldn't be required,
>that's not a big deal.


In the case of Cool Page, when a user enters a double space after a period,
then we know with reasonable accuracy that this is intended to be the end
of a sentence.  So these we could probably convert to a single space with
an EOS character.

However, without the parser, we could do no better for users that type
single spaces (most users).  This is why I said I did not think it was the
best idea.  But as another option, maybe it is worthwhile.
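
The double-space conversion described above could be sketched in Python as
follows.  This is an illustration under stated assumptions, not Cool Page's
code: no end-of-sentence character is standardized, so a Unicode
private-use code point (U+E000) stands in for it, and the name
mark_double_spaces is hypothetical.

```python
import re

# Hypothetical end-of-sentence marker.  No such character is standardized;
# a Unicode private-use code point stands in for it here.
EOS = "\uE000"

# Terminal punctuation, an optional closing quote or bracket, then two or
# more spaces typed by the author.
_DOUBLE_SPACE = re.compile(r"""([.!?]["')\]]*) {2,}""")


def mark_double_spaces(text):
    """Replace double spaces after terminal punctuation with EOS + one space."""
    return _DOUBLE_SPACE.sub(lambda m: m.group(1) + EOS + " ", text)


marked = mark_double_spaces("First one.  Second one. Third.")
print(marked.replace(EOS, "<EOS>"))
# First one.<EOS> Second one. Third.
```

Note that the single space after "Second one." is left unmarked, which is
the limitation for users who type single spaces.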


>PS: Every typography book I've read has insisted (and my own
>experience leads me to believe) that two or more spaces after a
>sentence make text harder to read (primarily because it creates large
>white gaps in running text, and in some cases causes diagonal or
>vertical lines of white space inside running text).


My understanding is that Jakob Nielsen argues that on the web, usability
takes precedence over other factors that are normal priorities in other
forms of presentation.  And one of his first claims (way back when the focus
was on eliminating bad design) was that users _SCAN_ web pages.  That is,
users could not be expected to read something from start to finish unless
they had first skimmed it to determine its relevance.  How can you scan
(speed read) if you can't quickly find the beginning of sentences?  Yet he
quotes typography experts on the issue of single or double space.  IMO,
that is an example of self-contradiction.  What do typographers know about
the web?  Typography has a long history in metal layout.  It has only
relatively recently converted to high resolution presentation (printing)
devices.


> I don't think this
>is any less true on a low resolution device (like a cellphone or PDA).
>If it is, and redesigning the typeface can't help, then neither can
>CSS (as it cannot increase the resolution of a display device).


Then IMO you don't understand all the typography and aliasing issues well.
Notice from my signature that I was a (minor) co-author and publisher of
FONTZ!, one of the first ever WYSIWYG font editors in the world (on GEM OS,
before MS Windows).

When designing a font for a low resolution device, there is a trade-off
between tight kerning that looks reasonable with word spaces, and spaces
between sentences that can be distinguished rapidly, especially when the
period becomes a single pixel.  It is mitigated by using fonts well
designed for the web (Verdana, etc.), but many users prefer to use the
plethora of poorly designed fonts available for free download on the web.
Font use sells a lot of Cool Page, even if it is a crock, because we cannot
(yet) guarantee the visitor has the font.

Also there are many other issues that come into play, which I have
mentioned already.  One that people have conveniently ignored on this list
so far is accessibility for people who are blind in one eye, like I am, or
who have another visual handicap.  Then there is just pure user preference.
Why do we give the user the ability to set a yellow font in 3 point size on
a white background?  Because it isn't our job to decide what a user wants.

BTW thanks for your well reasoned reply.  I am flattered by the interest
that was taken in this issue.  I never expected it.  I thought it would be
ignored :)

-Shelby Moore
Received on Tuesday, 17 December 2002 00:45:32 UTC