- From: Chris Lilley <chris@w3.org>
- Date: Fri, 6 Sep 2002 22:33:15 +0200
- To: gerald@w3.org, David Hopwood <david.hopwood@zetnet.co.uk>
- CC: ietf-types@iana.org, uri@w3.org, www-tag@w3.org, net.dret@dret.net
On Thursday, September 5, 2002, 9:27:35 PM, David wrote: DH> -----BEGIN PGP SIGNED MESSAGE----- DH> Dan Kohn wrote: >> It would also be worth noting and/or commenting on this draft: >> >> http://www.ietf.org/internet-drafts/draft-wilde-text-fragment-01 DH> Yuch. This is overcomplicated and not sufficiently useful to justify the DH> risk of security flaws in parsing regular expressions. There's a good DH> case for supporting a simple "#<line-number>" syntax for text/plain, but DH> nothing more IMHO. Depends what you want it for. For Ted Nelson-style standoff markup, you would need character addressing and ranges, too. Matching on text patterns is handy to make more robust links if the document is being edited (and would be even more handy if combinations, such as "the seven lines after 'Non-ASCII Characters in Regular Expressions'" were possible) (although, there are still problems if a revision takes that to eight lines, or a 'page break' intervenes as here). The draft does usefully distinguish between characters and bytes, and counts the former, minus (normalized) line endings,and wisely stays away from attempting to count words, sentences, or paragraphs (though the latter would be tractable). There are some unresolved questions of detail - what does #char(6) point to in the text file containing "Hi World" encoded in UTF-16, for example - "o" or "W" (is the BOM two characters, or a thing before the first character). It also distinguishes between point and range references, which is vital (particularly since fragments in HTML and, less commonly, in XML are often incorrectly assumed to be point references. It might be worth mentioning that the encoding of the text in a match string might well be different to the encoding of the text document containing the string to be matched. The pointer to the (recently expired but still there) http://www.ietf.org/internet-drafts/draft-borden-frag-00 was valuable. text-scheme = ( char-scheme / line-scheme / regex-scheme ) should presumably be text-scheme = ( char-scheme / line-scheme / match-scheme ) I thought the draft a little cautious regarding IRI although given the content of http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-09.txt perhaps this is wise. I hope that http://www.w3.org/International/2001/draft-masinter-url-i18n-08.txt gets republished soon. For non-normative reference to BRE, the following might be useful: An Introduction to Posix Regexps http://www.regexps.com/src/docs.d/hackerlab/html/introduction-to-regexps.html Regular Expressions http://www.opengroup.org/onlinepubs/7908799/xbd/re.html The latter includes a BRE grammar. -- Chris mailto:chris@w3.org
Received on Friday, 6 September 2002 16:33:38 UTC