- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Mon, 25 Jun 2012 18:24:45 +0200
- To: Jirka Kosek <jirka@kosek.cz>
- CC: Felix Sasaki <fsasaki@w3.org>, MultilingualWeb-LT Working Group <public-multilingualweb-lt@w3.org>
Hi Jirka and Felix,

On 06/22/2012 10:23 AM, Jirka Kosek wrote:
> On 21.6.2012 22:22, Felix Sasaki wrote:
>
>> Just FYI, without arguing for anything, for the ITS 1.0 test suite
>> http://www.w3.org/International/its/tests/
>> we created something like this
>> /{}myMetaDoc/{}body[1]/{}insert[1]/{myChineseMakupLanguage}书籍[1]
>> to identify each element and attribute node - taken from a "path" attribute
>> at
>> http://www.w3.org/International/its/tests/test1/Translate1-result.xml
>>
>> The format expands namespace (if there is none, there is empty curly
>> brackets).
> I think that given we will be dealing mainly with HTML we can omit
> namespace in order to get more concise syntax. We can cover possible
> SVG/MathML island with xmlns() XPointer scheme:
>
> #xmlns(svg=http://www.w3.org/2000/svg)xpath(/html[1]/body[1]/div[3]/svg:svg[1]/svg:g[1]/svg:text[7])
>

The only thing left now is the syntax, I guess.

In http://tools.ietf.org/html/rfc2396#section-2.4.3 some of the characters
were considered "unwise":

unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

Of these, "[" and "]" have since been upgraded to reserved characters:
http://tools.ietf.org/html/rfc3986#section-2.2

fragment = *( pchar / "/" / "?" )
with
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
XXX reserved = gen-delims / sub-delims
XXX gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
           / "*" / "+" / "," / ";" / "="

The part after the crosshatch "#" isn't part of the URI, anyhow. Should
we forward the syntax question to the uri@w3.org list?

"[" and "]" do not seem to be ok. According to the RFC these are the
ones we can use:

ALPHA / DIGIT / "-" / "." / "_" / "~" / "/" / "?" / ":" / "@" / "!" / "$"
/ "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

I think the semantics are pretty straightforward. These questions remain:
- What would the syntax look like to select attributes?
- Should we only select elements, or attributes as well?
- Should we only select one element, or allow selecting all elements of
  a certain type, e.g. html[1]/body[1]/div ?

All the best,
Sebastian

-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
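
P.S. To make the character question concrete, here is a rough Python sketch (not part of any spec or existing tool) that checks a candidate fragment against the RFC 3986 "fragment" production quoted above and percent-encodes whatever falls outside it. The function names disallowed_fragment_chars and encode_fragment are made up for illustration. It only looks at URI-level syntax and ignores any escaping rules that XPointer itself defines.

```python
import re
import string
from urllib.parse import quote

# Character classes copied from the RFC 3986 grammar quoted in this mail.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
SUB_DELIMS = "!$&'()*+,;="
# fragment = *( pchar / "/" / "?" ), pchar adds ":" and "@"
FRAGMENT_ALLOWED = set(UNRESERVED + SUB_DELIMS + ":@/?")
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def disallowed_fragment_chars(fragment):
    """Return the sorted characters that may not appear raw in a fragment."""
    # Percent-encoded triplets are legal, so drop them before checking.
    stripped = PCT_ENCODED.sub("", fragment)
    return sorted(set(stripped) - FRAGMENT_ALLOWED)


def encode_fragment(fragment):
    """Percent-encode everything outside the fragment character set.

    Note: an existing "%" is encoded too, so pass in an unescaped string.
    """
    return quote(fragment, safe=SUB_DELIMS + ":@/?")


if __name__ == "__main__":
    candidate = "xpath(/html[1]/body[1]/div[3])"
    print(disallowed_fragment_chars(candidate))
    print(encode_fragment(candidate))
```

Run against Jirka's example, only "[" and "]" are reported, which matches the reading of the RFC above: everything else in the xmlns()/xpath() syntax is already allowed in a fragment.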
Received on Monday, 25 June 2012 16:25:34 UTC