Re: draft-wilde-text-fragment-01 (was: Including 'fragment identifier semantics' ...) from Erik Wilde on 2002-09-18 (www-tag@w3.org from September 2002)

From: Erik Wilde <net.dret@dret.net>
Date: Wed, 18 Sep 2002 20:58:24 +0200
To: Chris Lilley <chris@w3.org>
CC: gerald@w3.org, David Hopwood <david.hopwood@zetnet.co.uk>, ietf-types@iana.org, uri@w3.org, www-tag@w3.org
Message-ID: <3D88CCD0.4080104@dret.net>

hi there.

Chris Lilley wrote:
> DH> Dan Kohn wrote:
> DH>  http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-01
> DH> Yuch. This is overcomplicated and not sufficiently useful to justify the
> DH> risk of security flaws in parsing regular expressions. There's a good
> DH> case for supporting a simple "#<line-number>" syntax for text/plain, but
> DH> nothing more IMHO.
> Depends what you want it for. For Ted Nelson-style standoff markup,
> you would need character addressing and ranges, too. Matching on text
> patterns is handy to make more robust links if the document is being
> edited (and would be even more handy if combinations, such as "the
> seven lines after 'Non-ASCII Characters in Regular Expressions'" were
> possible) (although, there are still problems if a revision takes that
> to eight lines, or a 'page break' intervenes as here).

sure, there is a trade-off between simplicity and complexity, and 
fragment identifiers can always break. so why bother? i am not sure that 
the regex stuff is necessary, but it would be nice to have in some 
cases, and this is why i included it in the draft. unfortunately, people 
are so busy doing all kinds of xml-stuff that the feedback so far has 
been minimal. most people who did respond seem to like the regex approach.

> There are some unresolved questions of detail - what does
> #char(6) point to in the text file containing "Hi World" encoded in
> UTF-16, for example - "o" or "W" (is the BOM two characters, or a
> thing before the first character).

thanks!
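
to make the ambiguity concrete, here is a small python sketch. it 
assumes that char(n) counts characters from 1 (an assumption for 
illustration, not something the draft text quoted here settles), and 
char_at and bom_width are hypothetical names. the only thing that 
varies is how many positions the BOM is taken to occupy.

```python
import codecs

# "Hi World" as a big-endian UTF-16 file with a leading byte order mark
data = codecs.BOM_UTF16_BE + "Hi World".encode("utf-16-be")

# python's "utf-16" codec consumes the BOM and returns only the text
text = data.decode("utf-16")

def char_at(text, n, bom_width=0):
    """return the n-th character, counting from 1 (assumption).

    bom_width is the number of positions the BOM is taken to occupy
    in front of the text: 0 (not a character), 1 (one character) or
    2 (two characters).
    """
    index = n - 1 - bom_width
    if index < 0 or index >= len(text):
        raise IndexError("position is outside the text")
    return text[index]

for width in (0, 1, 2):
    print(width, char_at(text, 6, bom_width=width))
```

under these assumptions the three readings give three different 
characters for the same fragment identifier, so the draft would have to 
pick one.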

> It might be worth mentioning that the encoding of the text in a match
> string might well be different to the encoding of the text document
> containing the string to be matched.

good point. applications must be able to transcode characters.
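
a minimal python sketch of what this could mean in practice: decode 
both the document and the match string into unicode before comparing, 
instead of comparing bytes. find_match and its parameters are 
hypothetical names, and the sketch uses a plain substring search, not 
the draft's regex matching.

```python
def find_match(doc_bytes, doc_encoding, match_bytes, match_encoding):
    """return the character offset of the match string, or -1.

    both inputs are decoded to unicode first, so the comparison does
    not depend on the byte encoding of either side.
    """
    text = doc_bytes.decode(doc_encoding)
    pattern = match_bytes.decode(match_encoding)
    return text.find(pattern)

# document stored as iso-8859-1, match string delivered as utf-8
doc = "Zürich café".encode("iso-8859-1")
needle = "café".encode("utf-8")

print(doc.find(needle))                                   # byte comparison fails
print(find_match(doc, "iso-8859-1", needle, "utf-8"))     # character comparison works
```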

> The pointer to the (recently expired but still there)
> http://www.ietf.org/internet-drafts/draft-borden-frag-00
> was valuable.

however, i have received feedback that the parentheses-syntax is too 
complicated and should be replaced with a more ascii-ish syntax such as 
#char=2-4;char=7-3335, and i kind of liked the idea. i have no idea 
where the borden draft is going, but i think that it will disappear. 
does anybody have news about this?
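
to give a feeling for how little parsing the equals syntax would need, 
here is a python sketch. the grammar is only guessed from the single 
example above (scheme=start-end, parts separated by ";"), it is not 
defined anywhere yet, and parse_fragment is a hypothetical name.

```python
import re

# assumed grammar: "char" or "line", "=", start, "-", end
PART = re.compile(r"(char|line)=(\d+)-(\d+)")

def parse_fragment(fragment):
    """parse '#char=2-4;char=7-3335' into a list of (scheme, start, end)."""
    ranges = []
    for part in fragment.lstrip("#").split(";"):
        m = PART.fullmatch(part)
        if m is None:
            raise ValueError(f"cannot parse fragment part: {part!r}")
        ranges.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return ranges

print(parse_fragment("#char=2-4;char=7-3335"))
```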

> text-scheme   =  ( char-scheme / line-scheme / regex-scheme )
> should presumably be
> text-scheme   =  ( char-scheme / line-scheme / match-scheme )

yup.

> For non-normative reference to BRE, the following might be useful:
>  An Introduction to Posix Regexps
>  http://www.regexps.com/src/docs.d/hackerlab/html/introduction-to-regexps.html
>  Regular Expressions
>  http://www.opengroup.org/onlinepubs/7908799/xbd/re.html
>  The latter includes a BRE grammar.

thanks.

so, apart from the minor fixes, does anybody have an opinion about the 
parentheses vs. equals syntax issue? i also thought about adding a 
checksum facility (md5 or something along these lines), so that fragment 
identifiers could recognize a particular version of a resource. any 
opinions about this feature (it would be optional, so that applications 
would be allowed to ignore the checksum)?
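
a rough python sketch of the check i have in mind. how the checksum 
would be written inside the fragment identifier is still open, so the 
function below just takes the expected md5 hex digest as a parameter. 
fragment_applies is a hypothetical name.

```python
import hashlib

def fragment_applies(resource_bytes, expected_md5=None):
    """return True if the fragment can be trusted for this resource.

    the checksum is optional: without one, the fragment is always
    applied. with one, the resource must hash to the same md5 digest.
    """
    if expected_md5 is None:
        return True
    actual = hashlib.md5(resource_bytes).hexdigest()
    return actual == expected_md5.lower()

original = b"line one\nline two\n"
digest = hashlib.md5(original).hexdigest()

print(fragment_applies(original, digest))                 # same version
print(fragment_applies(original + b"line three\n", digest))  # edited version
```

an application that ignores the checksum simply behaves as if none had 
been given.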

cheers,

erik wilde  -  tel:+41-1-6325132  -  fax:+41-1-6321035
           mailto:net.dret@dret.net -  http://dret.net/
           computer engineering and networks laboratory
           swiss federal institute of technology  (eth)
           * try not. do, or do not. there is no try. *

Received on Wednesday, 18 September 2002 14:59:45 UTC