- From: Erik Wilde <net.dret@dret.net>
- Date: Wed, 18 Sep 2002 20:58:24 +0200
- To: Chris Lilley <chris@w3.org>
- CC: gerald@w3.org, David Hopwood <david.hopwood@zetnet.co.uk>, ietf-types@iana.org, uri@w3.org, www-tag@w3.org
hi there.
Chris Lilley wrote:
> DH> Dan Kohn wrote:
> DH> http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-01
> DH> Yuch. This is overcomplicated and not sufficiently useful to justify the
> DH> risk of security flaws in parsing regular expressions. There's a good
> DH> case for supporting a simple "#<line-number>" syntax for text/plain, but
> DH> nothing more IMHO.
> Depends what you want it for. For Ted Nelson-style standoff markup,
> you would need character addressing and ranges, too. Matching on text
> patterns is handy to make more robust links if the document is being
> edited (and would be even more handy if combinations, such as "the
> seven lines after 'Non-ASCII Characters in Regular Expressions'" were
> possible) (although, there are still problems if a revision takes that
> to eight lines, or a 'page break' intervenes as here).
sure, there is a trade-off between simplicity and complexity. and
fragment identifiers can always break. so why bother? i am not sure that
the regex stuff is necessary, but it would be nice to have in some
cases, and this is why i included it in the draft. unfortunately, people
are so busy doing all kinds of xml-stuff, so the feedback so far has
been minimal. most people seem to like the regex approach.
> There are some unresolved questions of detail - what does
> #char(6) point to in the text file containing "Hi World" encoded in
> UTF-16, for example - "o" or "W" (is the BOM two characters, or a
> thing before the first character).
thanks!
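a minimal sketch of why the BOM question matters, in python. it assumes a little-endian utf-16 file with a leading BOM and a hypothetical 1-based reading of char(n) as "the n-th character"; the helper name nth_char is made up for illustration and is not taken from the draft. it only shows that the answer shifts by one depending on whether the BOM is counted as a character.

```python
# "Hi World" as UTF-16 little-endian bytes with a leading BOM (FF FE).
data = "\ufeffHi World".encode("utf-16-le")

# the generic "utf-16" codec consumes the BOM and uses it to pick the byte order.
bom_not_counted = data.decode("utf-16")

# the explicit "utf-16-le" codec keeps the BOM as the character U+FEFF.
bom_counted = data.decode("utf-16-le")


def nth_char(text, n):
    """return the n-th character, counting from 1 (an assumed reading of char(n))."""
    return text[n - 1]


print(len(bom_not_counted), repr(nth_char(bom_not_counted, 6)))
print(len(bom_counted), repr(nth_char(bom_counted, 6)))
```

under these assumptions the same bytes give two different answers for the sixth character, so the draft would have to say which one is meant.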
> It might be worth mentioning that the encoding of the text in a match
> string might well be different to the encoding of the text document
> containing the string to be matched.
good point. applications must be able to transcode characters.
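as a rough illustration of the transcoding step in python: the sketch below decodes both the document and the match string to unicode before comparing. the function name find_match, the sample text and the two encodings are assumptions chosen for the example, and it uses plain substring search rather than a regular expression.

```python
def find_match(document_bytes, document_encoding, pattern_bytes, pattern_encoding):
    """return the character offset of the first match, or -1 if there is none."""
    # decode both sides to unicode strings so the comparison is on characters.
    document = document_bytes.decode(document_encoding)
    pattern = pattern_bytes.decode(pattern_encoding)
    return document.find(pattern)


# the document is stored as utf-8, the match string arrives as iso-8859-1.
document_bytes = "eth Zürich text fragments".encode("utf-8")
pattern_bytes = "Zürich".encode("iso-8859-1")

# comparing after transcoding finds the match.
print(find_match(document_bytes, "utf-8", pattern_bytes, "iso-8859-1"))

# comparing the raw bytes does not, because "ü" is encoded differently.
print(document_bytes.find(pattern_bytes))
```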
> The pointer to the (recently expired but still there)
> http://www.ietf.org/internet-drafts/draft-borden-frag-00
> was valuable.
however, i have received feedback that the parentheses-syntax is too
complicated and should be replaced with a more ascii-ish syntax such as
#char=2-4;char=7-3335, and i kind of liked the idea. i have no idea
where the borden draft is going, but i think that it will disappear.
does anybody have news about this?
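to give a feeling for how little parsing the equals syntax would need, here is a small python sketch. the function name parse_fragment is hypothetical, and so is the reading of each value as a start-end pair of numbers; neither is defined by the draft.

```python
def parse_fragment(fragment):
    """parse e.g. '#char=2-4;char=7-3335' into (scheme, start, end) tuples."""
    parts = []
    # drop an optional leading '#', then split the parts on ';'.
    for item in fragment.lstrip("#").split(";"):
        scheme, _, value = item.partition("=")
        start, _, end = value.partition("-")
        # int() raises ValueError on a malformed or missing number.
        parts.append((scheme, int(start), int(end)))
    return parts


print(parse_fragment("#char=2-4;char=7-3335"))
```

a malformed fragment raises ValueError in this sketch; what a real application should do in that case is left open here.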
> text-scheme = ( char-scheme / line-scheme / regex-scheme )
> should presumably be
> text-scheme = ( char-scheme / line-scheme / match-scheme )
yup.
> For non-normative reference to BRE, the following might be useful:
> An Introduction to Posix Regexps
> http://www.regexps.com/src/docs.d/hackerlab/html/introduction-to-regexps.html
> Regular Expressions
> http://www.opengroup.org/onlinepubs/7908799/xbd/re.html
> The latter includes a BRE grammar.
thanks.
so, apart from the minor fixes, does anybody have an opinion about the
parentheses vs. equals syntax issue? i also thought about adding a
checksum facility (md5 or something along these lines), so that fragment
identifiers could recognize a particular version of a resource. any
opinions about this feature (it would be optional, so that applications
would be allowed to ignore the checksum)?
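a possible shape of such a check, sketched in python with md5 from the standard hashlib module. the function name checksum_matches and the idea of passing the checksum as a hex string are assumptions for illustration; how the checksum would be written in the fragment identifier is not decided.

```python
import hashlib


def checksum_matches(resource_bytes, expected_md5=None):
    """return True if the resource matches the checksum, or if none was given."""
    # the checksum is optional, so a fragment without one always passes.
    if expected_md5 is None:
        return True
    actual = hashlib.md5(resource_bytes).hexdigest()
    return actual == expected_md5.lower()


original = b"Hi World\n"
revised = b"Hi World!\n"
digest = hashlib.md5(original).hexdigest()

print(checksum_matches(original, digest))
print(checksum_matches(revised, digest))
print(checksum_matches(revised))
```

an application that ignores the checksum simply never calls the check, which is the same as the no-checksum case in this sketch.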
cheers,
erik wilde - tel:+41-1-6325132 - fax:+41-1-6321035
mailto:net.dret@dret.net - http://dret.net/
computer engineering and networks laboratory
swiss federal institute of technology (eth)
* try not. do, or do not. there is no try. *
Received on Wednesday, 18 September 2002 14:59:45 UTC