permissive line breaks; RLURIs

All,

Sorry, a little, to be sending this blindly into a list to
which I do not subscribe and whose archives I haven't found,
but, onward.

Everyone has seen these monstrosities:

  href="file:/home/xanthian/ckouts/current_ij/usr/prod/ia/res/external-source/lang/en_US.ISO_8859-1/properties/bestserv.ccat"

and that's a completely typical case where I work, and in fact abbreviated
slightly by use of a symlink current_ij to hop past a couple of usually
uninteresting directory levels.

Worse yet are the URIs with appended form data going back to a server.
Most of these are generated automatically, but if you want to capture
one hard-wired into an original HTML source script for reuse, without
walking the path that generated the form submission, it quickly becomes
a nightmare for the user.

These Ridiculously Long Uniform Resource Identifiers, or RLURIs,
cause problems in at least a few places:

1) They make producing a pretty printed HTML source file an exercise in
   futility:

                        <li>
                            <a
                                href="file:/home/xanthian/ckouts/current_ij/usr/prod/ia/res/external-source/lang/en_US.ISO_8859-1/properties/bestserv.ccat"
                            >
                                Example of how Whistle does locale
                                information
                            </a>
                        </li>

   There's nothing the least bit "pretty" about that.

2) They sink the HTML human author and human reader in ugly complexity,
   increasing the risk of error in scripting and misunderstanding in
   reviewing scripts containing such URIs.  Compare:

            href="http://maps.expedia.com/QMResults.asp?P=41.7218058571429,-88.4274673754479,1,10+S%2D+745+Clarendon+Hills+Rd%0D%0ABLD+41+APT+206%0D%0AHinsdale%2C+Illinois&A=1&C=41.803288%2C-87.927486&Title=10+S%2D+745+Clarendon+Hills+Rd%0D%0ABLD+41+APT+206%0D%0AHinsdale%2C+Illinois%2C+United+States&PR=1&O=-9.05357142857143%2C-41.4642857142857&L=USA0409" 

   with a C syntax style alternative for declaring very long strings
   by the convention that quoted strings separated only by whitespace
   are really the join of the separate parts:

    href="http://maps.expedia.com/QMResults.asp?"
         "P=41.7218058571429,-88.4274673754479,1,"
             "10+S%2D+745+Clarendon+Hills+Rd%0D%0A"
             "BLD+41+APT+206%0D%0A"
             "Hinsdale%2C+Illinois"
         "&A=1"
         "&C=41.803288%2C-87.927486"
         "&Title=10+S%2D+745+Clarendon+Hills+Rd%0D%0A"
             "BLD+41+APT+206%0D%0A"
             "Hinsdale%2C+Illinois%2C"
             "+United+States"
         "&PR=1"
         "&O=-9.05357142857143%2C-41.4642857142857"
         "&L=USA0409"

   which gives the author some chance of getting the script correct
   in a limited number of tries, and the reader some hope of wading
   through the gobbledegook without stumbling.
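   As a quick check that the split form really is the same URI, here
   is a minimal sketch in Python, which borrowed C's rule that string
   literals separated only by whitespace are joined into one.  The
   names SPLIT_URI and ONE_LINE_URI are only illustrative.

```python
# Adjacent string literals separated only by whitespace are joined at
# compile time (the C convention, which Python also follows).  Inside
# the parentheses the indentation is free, so it can mirror the URI's
# structure.
SPLIT_URI = (
    "http://maps.expedia.com/QMResults.asp?"
    "P=41.7218058571429,-88.4274673754479,1,"
        "10+S%2D+745+Clarendon+Hills+Rd%0D%0A"
        "BLD+41+APT+206%0D%0A"
        "Hinsdale%2C+Illinois"
    "&A=1"
    "&C=41.803288%2C-87.927486"
    "&Title=10+S%2D+745+Clarendon+Hills+Rd%0D%0A"
        "BLD+41+APT+206%0D%0A"
        "Hinsdale%2C+Illinois%2C"
        "+United+States"
    "&PR=1"
    "&O=-9.05357142857143%2C-41.4642857142857"
    "&L=USA0409"
)

# The same URI written the usual way, on one line.
ONE_LINE_URI = "http://maps.expedia.com/QMResults.asp?P=41.7218058571429,-88.4274673754479,1,10+S%2D+745+Clarendon+Hills+Rd%0D%0ABLD+41+APT+206%0D%0AHinsdale%2C+Illinois&A=1&C=41.803288%2C-87.927486&Title=10+S%2D+745+Clarendon+Hills+Rd%0D%0ABLD+41+APT+206%0D%0AHinsdale%2C+Illinois%2C+United+States&PR=1&O=-9.05357142857143%2C-41.4642857142857&L=USA0409"

print(SPLIT_URI == ONE_LINE_URI)  # True
```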

3) They stress-test common editing tools; the line above is approaching
   the limits of the original vi(1) editor, for example, and is nowhere
   near an extreme case.

4) When displayed in an HTML document, rather than followed, they
   are just as much of a mess, and they also defeat most automatic
   paragraph-filling and table-column-width layout schemes.

I'd like to propose two changes for the next generation of the HTML
standard:

A) Provide some mechanism for writing RLURIs in HTML source scripts
   split across lines.  The C syntax above, used within an HTML tag, is
   one possible mechanism; a "smart hyphen" that eats the leading
   whitespace on the following line might be another.
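   To make the C-syntax option concrete: a display agent reading such
   an attribute value would only need to check that it consists of
   quoted pieces separated by whitespace, then glue the pieces
   together.  A minimal sketch, assuming straight double quotes and no
   escaping inside the pieces; join_quoted_segments is a hypothetical
   name, and current HTML parsers do not read markup this way.

```python
import re

# Hypothetical helper for the proposed syntax: one or more double-quoted
# pieces separated only by whitespace (spaces, tabs, newlines).
_SEGMENTS = re.compile(r'\s*(?:"[^"]*"\s*)+')
_PIECE = re.compile(r'"([^"]*)"')


def join_quoted_segments(value):
    """Return the concatenation of the quoted pieces in `value`.

    Raises ValueError if anything other than whitespace appears
    between the quoted pieces.
    """
    if not _SEGMENTS.fullmatch(value):
        raise ValueError("expected quoted strings separated only by whitespace")
    return "".join(_PIECE.findall(value))


source = '''"&PR=1"
         "&L=USA0409"'''
print(join_quoted_segments(source))  # &PR=1&L=USA0409
```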

B) Provide a way to display RLURIs from within HTML scripts that does
   not defeat the text and table layout success of the display agents.

The first proposal is adequately explained by items 1 to 3 above.  To
expand on item 4 a bit:

HTML currently provides a mechanism to _force_ a line break, and a
mechanism to _prohibit_ a line break.  What is missing is a mechanism
to _permit_ a line break at a place where a display agent might not
normally "see" such an opportunity.

There are two options available that come to mind immediately for
helping the display agents:

A simple new HTML tag that indicates a permissive break point, much
like the soft hyphens of many word processors; choose, for example,
<PBR> for "permit break", and then:

        file:/home<PBR>/xanthian<PBR>/ckouts<PBR>/current_ij<PBR>/usr<PBR>/prod<PBR>/ia<PBR>/res<PBR>/external-source<PBR>/lang<PBR>/en_US.ISO_8859-1<PBR>/properties<PBR>/bestserv.ccat

would permit the rendered version of the RLURI to be split before
any slantbar but the first, so that rendering the URI in a narrow
table column might look like this:


        file:/home
        /xanthian/ckouts
        /current_ij/usr
        /prod/ia/res
        /external-source
        /lang
        /en_US.ISO_8859-1
        /properties
        /bestserv.ccat

giving the minimum possible table cell width, and then the minimum
height for that width, without breaking the longest directory level
into parts.
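One way a display agent might arrive at that narrow rendering: treat
each <PBR> as a permitted break, set the column width to the longest
unbreakable piece (here /en_US.ISO_8859-1, 17 characters), and fill
lines first-fit.  In this simple model first-fit filling also uses the
fewest lines possible at a given width.  A minimal sketch, assuming
the tag is always spelled exactly <PBR> and widths are counted in
fixed-pitch characters; the function names are hypothetical.

```python
# Hypothetical sketch: each <PBR> marks a permitted break, and lines are
# filled first-fit, the simple way running text is usually flowed.

def pbr_pieces(marked):
    """Split <PBR>-marked text into its unbreakable pieces."""
    return marked.split("<PBR>")


def min_width(pieces):
    """Narrowest column that never has to break inside a piece."""
    return max(len(p) for p in pieces)


def greedy_lines(pieces, width):
    """Append pieces to the current line until the next one would not fit."""
    lines, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > width:
            lines.append(current)  # line is full: start a new one
            current = piece
        else:
            current += piece
    if current:
        lines.append(current)
    return lines


MARKED = ("file:/home<PBR>/xanthian<PBR>/ckouts<PBR>/current_ij<PBR>/usr"
          "<PBR>/prod<PBR>/ia<PBR>/res<PBR>/external-source<PBR>/lang"
          "<PBR>/en_US.ISO_8859-1<PBR>/properties<PBR>/bestserv.ccat")

pieces = pbr_pieces(MARKED)
for line in greedy_lines(pieces, min_width(pieces)):
    print(line)
```

Calling greedy_lines(pieces, 26) instead gives the five-line stacking
described next.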

Where that table column need not be so narrow, or the browser window
is wider, the display agent could choose an alternative rendering
stacked thusly:

        file:/home/xanthian/ckouts
        /current_ij/usr/prod/ia
        /res/external-source/lang
        /en_US.ISO_8859-1
        /properties/bestserv.ccat

This would greatly simplify what is otherwise the time-consuming
task of picking out such breakpoints by hand, a task that does not
generalize nicely to browser windows of various widths.
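For the common slash-separated case, the marking itself can be
automated by the HTML script-writing tool.  A minimal sketch, assuming
breaks are wanted before every slantbar except the leading run right
after the scheme (the "/" of file:/ or the "//" of http://);
mark_slash_breaks is a hypothetical helper, not part of any tool.

```python
def mark_slash_breaks(uri, tag="<PBR>"):
    """Insert `tag` before every slash except the first run of slashes.

    The first run is the '/' after 'file:' or the '//' after 'http:',
    where a break would only confuse the reader.
    """
    first = uri.find("/")
    if first < 0:
        return uri  # no slashes, nothing to mark
    end = first
    while end < len(uri) and uri[end] == "/":
        end += 1  # skip the whole leading run of slashes
    head, tail = uri[:end], uri[end:]
    return head + tail.replace("/", tag + "/")


print(mark_slash_breaks("http://maps.expedia.com/QMResults.asp"))
# http://maps.expedia.com<PBR>/QMResults.asp
```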

A more complex alternative, but one that spares the human or
automatic HTML script-writing tool the effort of searching out and
marking every permitted break point, would use matched beginning
and ending permitted-break tags instead of an unmatched tag, and
might look like this:

    file:/home<PBR beforetoken="/">/xanthian/ckouts/current_ij/usr/prod/ia/res/external-source/lang/en_US.ISO_8859-1/properties/bestserv.ccat</PBR>

with the same rendering output possibilities as above.  The nastier
case shown above can also be handled moderately nicely "this way":

http://<PBR beforetoken="/" aftertoken="%2C" aftertoken="," aftertoken="?" aftertoken="%0D%0A" beforetoken="&amp;" beforetoken="+" aftertoken="%2D" aftertoken=".">maps.expedia.com/QMResults.asp?P=41.7218058571429,-88.4274673754479,1,10+S%2D+745+Clarendon+Hills+Rd%0D%0ABLD+41+APT+206%0D%0AHinsdale%2C+Illinois&A=1&C=41.803288%2C-87.927486&Title=10+S%2D+745+Clarendon+Hills+Rd%0D%0ABLD+41+APT+206%0D%0AHinsdale%2C+Illinois%2C+United+States&PR=1&O=-9.05357142857143%2C-41.4642857142857&L=USA0409</PBR>

(though we sure aren't conserving bandwidth), with the resulting
rendering being some abutted joining of the following lines, as the
space available in the rendered form requires:

    http://maps.
    expedia.
    com
    /QMResults.asp?
    P=41.7218058571429,
    -88.4274673754479,
    1,
    10
    +S%2D
    +745
    +Clarendon
    +Hills
    +Rd%0D%0A
    BLD
    +41
    +APT
    +206%0D%0A
    Hinsdale%2C
    +Illinois
    &A=1
    &C=41.803288%2C
    -87.927486
    &Title=10
    +S%2D
    +745
    +Clarendon
    +Hills
    +Rd%0D%0A
    BLD
    +41
    +APT
    +206%0D%0A
    Hinsdale%2C
    +Illinois%2C
    +United
    +States
    &PR=1
    &O=-9.05357142857143%2C
    -41.4642857142857
    &L=USA0409
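Under the matched-tag variant, a display agent would first turn the
beforetoken and aftertoken values into concrete break offsets, then
abut the resulting pieces into lines as space allows.  A minimal
sketch of that first step, assuming plain substring matching on the
already-decoded text (so "&amp;" has become "&").  The function names
are hypothetical, and the tokens are passed as Python lists because an
HTML element may not really repeat an attribute name.

```python
def permitted_breaks(text, before=(), after=()):
    """Sorted offsets where a break is permitted.

    A break is allowed just before each occurrence of a `before` token
    and just after each occurrence of an `after` token.  Offsets 0 and
    len(text) are dropped, since breaking there changes nothing.
    """
    breaks = set()
    for tok in before:
        if not tok:
            continue  # an empty token would permit breaks everywhere
        start = text.find(tok)
        while start != -1:
            breaks.add(start)
            start = text.find(tok, start + 1)
    for tok in after:
        if not tok:
            continue
        start = text.find(tok)
        while start != -1:
            breaks.add(start + len(tok))
            start = text.find(tok, start + 1)
    return sorted(b for b in breaks if 0 < b < len(text))


def split_at_breaks(text, before=(), after=()):
    """Cut `text` into the unbreakable pieces between permitted breaks."""
    pieces, last = [], 0
    for b in permitted_breaks(text, before, after):
        pieces.append(text[last:b])
        last = b
    pieces.append(text[last:])
    return pieces


print(split_at_breaks("S%2D+745+Clarendon+Hills+Rd%0D%0ABLD",
                      before=["+"], after=["%2D", "%0D%0A"]))
# ['S%2D', '+745', '+Clarendon', '+Hills', '+Rd%0D%0A', 'BLD']
```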

For what it's worth.  Ideas are cheap; implementing them is the
tough part, and standardizing them before a proof-of-concept
implementation exists is next to impossible.  [Been there, done
that; voting member ANSI X3H3, 1977 to 1981.]

Comments intended for my eyes back to me as well as the list,
please.  My quota of subscribed mailing lists is topped out somewhere
above my ability to cope with the influx, already.

               ===== random archival quality quote =====

We  handle  four  billion calls a year, for everyone from presidents and
kings to the scum of the earth.  So your call doesn't go through once in
a  while, or you get billed for a call or two you didn't make.  We don't
care.  We don't have to, we're the phone company.
                                                          -- Lily Tomlin

--
Kent Paul Dolan.
<xanthian@well.com> <xanthian@aztec.asu.edu> <xanthian@whistle.com>

Received on Monday, 29 November 1999 03:36:58 UTC