- From: Erik Wilde <net.dret@dret.net>
- Date: Wed, 18 Sep 2002 20:58:24 +0200
- To: Chris Lilley <chris@w3.org>
- CC: gerald@w3.org, David Hopwood <david.hopwood@zetnet.co.uk>, ietf-types@iana.org, uri@w3.org, www-tag@w3.org
hi there.

Chris Lilley wrote:

> DH> Dan Kohn wrote:
> DH> http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-01
> DH> Yuch. This is overcomplicated and not sufficiently useful to justify the
> DH> risk of security flaws in parsing regular expressions. There's a good
> DH> case for supporting a simple "#<line-number>" syntax for text/plain, but
> DH> nothing more IMHO.
> Depends what you want it for. For Ted Nelson-style standoff markup,
> you would need character addressing and ranges, too. Matching on text
> patterns is handy to make more robust links if the document is being
> edited (and would be even more handy if combinations, such as "the
> seven lines after 'Non-ASCII Characters in Regular Expressions'" were
> possible) (although, there are still problems if a revision takes that
> to eight lines, or a 'page break' intervenes as here).

sure there is a trade-off between simplicity and complexity. and fragment identifiers can always break. so why bother? i am not sure that the regex stuff is necessary, but it would be nice to have in some cases, and this is why i included it in the draft. unfortunately, people are so busy doing all kinds of xml-stuff that the feedback so far has been minimal. most people seem to like the regex approach.

> There are some unresolved questions of detail - what does
> #char(6) point to in the text file containing "Hi World" encoded in
> UTF-16, for example - "o" or "W" (is the BOM two characters, or a
> thing before the first character).

thanks!

> It might be worth mentioning that the encoding of the text in a match
> string might well be different to the encoding of the text document
> containing the string to be matched.

good point. applications must be able to transcode characters.

> The pointer to the (recently expired but still there)
> http://www.ietf.org/internet-drafts/draft-borden-frag-00
> was valuable.
however, i have received feedback that the parentheses-syntax is too complicated and should be replaced with a more ascii-ish syntax such as #char=2-4;char=7-3335, and i kind of liked the idea. i have no idea where the borden draft is going, but i think that it will disappear. does anybody have news about this?

> text-scheme = ( char-scheme / line-scheme / regex-scheme )
> should presumably be
> text-scheme = ( char-scheme / line-scheme / match-scheme )

yup.

> For non-normative reference to BRE, the following might be useful:
> An Introduction to Posix Regexps
> http://www.regexps.com/src/docs.d/hackerlab/html/introduction-to-regexps.html
> Regular Expressions
> http://www.opengroup.org/onlinepubs/7908799/xbd/re.html
> The latter includes a BRE grammar.

thanks.

so, apart from the minor fixes, does anybody have an opinion about the parentheses vs. equals syntax issue? i also thought about adding a checksum facility (md5 or something along these lines), so that fragment identifiers could recognize a particular version of a resource. any opinions about this feature (it would be optional, so that applications would be allowed to ignore the checksum)?

cheers,

erik wilde - tel:+41-1-6325132 - fax:+41-1-6321035
mailto:net.dret@dret.net - http://dret.net/
computer engineering and networks laboratory
swiss federal institute of technology (eth)
* try not. do, or do not. there is no try. *
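
editorial sketch (not part of the original message): to make the "equals" syntax from the message concrete, the following python fragment parses a string such as #char=2-4;char=7-3335 into a list of ranges. the grammar in the comment, the function name parse_fragment, and the choice to treat a single position as a range with equal start and end are all assumptions made for illustration; the draft itself does not define this syntax, and the meaning of the numbers (positions vs. characters, counting base) is deliberately left open here.

```python
import re

# Hypothetical grammar for the "equals" syntax (an assumption, not from the draft):
#   fragment = part *( ";" part )
#   part     = ( "char" / "line" ) "=" start [ "-" end ]
_PART = re.compile(r"(char|line)=(\d+)(?:-(\d+))?")


def parse_fragment(fragment):
    """Split a fragment like 'char=2-4;char=7-3335' into (scheme, start, end) tuples."""
    fragment = fragment.lstrip("#")  # accept the fragment with or without the leading '#'
    ranges = []
    for part in fragment.split(";"):
        m = _PART.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognized fragment part: {part!r}")
        scheme, start, end = m.group(1), int(m.group(2)), m.group(3)
        # Assumption: a lone position is treated as a range of length one.
        ranges.append((scheme, start, int(end) if end is not None else start))
    return ranges


print(parse_fragment("#char=2-4;char=7-3335"))
```

a parser of this shape needs only a split on ";" and one small pattern per part, which is roughly the simplicity argument made for the equals syntax over the parenthesized one.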
Received on Wednesday, 18 September 2002 14:59:45 UTC