- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Mon, 25 Jun 2012 18:24:45 +0200
- To: Jirka Kosek <jirka@kosek.cz>
- CC: Felix Sasaki <fsasaki@w3.org>, MultilingualWeb-LT Working Group <public-multilingualweb-lt@w3.org>
Hi Jirka and Felix,
On 06/22/2012 10:23 AM, Jirka Kosek wrote:
> On 21.6.2012 22:22, Felix Sasaki wrote:
>
>> Just FYI, without arguing for anything, for the ITS 1.0 test suite
>> http://www.w3.org/International/its/tests/
>> we created something like this
>> /{}myMetaDoc/{}body[1]/{}insert[1]/{myChineseMakupLanguage}书籍[1]
>> to identify each element and attribute node - taken from a "path" attribute
>> at
>> http://www.w3.org/International/its/tests/test1/Translate1-result.xml
>>
>> The format expands namespaces (if there is none, there are empty curly
>> brackets).
> I think that given we will be dealing mainly with HTML we can omit
> namespaces in order to get a more concise syntax. We can cover possible
> SVG/MathML islands with the xmlns() XPointer scheme:
>
> #xmlns(svg=http://www.w3.org/2000/svg)xpath(/html[1]/body[1]/div[3]/svg:svg[1]/svg:g[1]/svg:text[7])
>
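Below is a minimal Python sketch of how both notations could be generated for a given element. It uses only the standard library's xml.etree.ElementTree. The first notation is the ITS 1.0 test-suite style, which expands every namespace and writes "{}" when there is none. The second is the shorter form proposed above, which omits the HTML namespace and binds foreign ones with xmlns(). The function names, the caller-supplied prefix table, and treating un-namespaced elements as HTML are all assumptions for illustration. Positions are counted among siblings with the same expanded name, the way XPath's [n] predicate does.

```python
import xml.etree.ElementTree as ET

def split_clark(tag):
    """Split an ElementTree tag such as '{ns}local' into (ns, local); ns is '' if absent."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag

def element_steps(root, target):
    """Return [(ns, local, position), ...] from the root down to `target`.

    Positions are 1-based and counted among siblings with the same expanded name."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not None:
        ns, local = split_clark(node.tag)
        parent = parent_of.get(node)
        siblings = [node] if parent is None else [c for c in parent if c.tag == node.tag]
        steps.append((ns, local, siblings.index(node) + 1))
        node = parent
    return steps[::-1]

def its_path(steps):
    """ITS 1.0 test-suite style: every step carries its namespace, '{}' if there is none."""
    return "".join(f"/{{{ns}}}{local}[{pos}]" for ns, local, pos in steps)

def xpointer_fragment(steps, prefixes, default_ns=""):
    """Proposed short form: omit the default (HTML) namespace, bind the others via xmlns()."""
    bindings, parts = {}, []
    for ns, local, pos in steps:
        if ns == default_ns:
            parts.append(f"{local}[{pos}]")
        else:
            prefix = prefixes[ns]  # hypothetical prefix table supplied by the caller
            bindings[prefix] = ns
            parts.append(f"{prefix}:{local}[{pos}]")
    xmlns = "".join(f"xmlns({p}={ns})" for p, ns in bindings.items())
    return "#" + xmlns + "xpath(/" + "/".join(parts) + ")"

SVG = "http://www.w3.org/2000/svg"
doc = ET.fromstring(
    "<html><body><div/><div/><div>"
    f'<svg xmlns="{SVG}"><g><text>a</text><text>b</text></g></svg>'
    "</div></body></html>"
)
target = list(doc.find(f".//{{{SVG}}}g"))[1]  # the second svg:text element
steps = element_steps(doc, target)
print(its_path(steps))
print(xpointer_fragment(steps, {SVG: "svg"}))
```

For this sample document the second line printed is #xmlns(svg=http://www.w3.org/2000/svg)xpath(/html[1]/body[1]/div[3]/svg:svg[1]/svg:g[1]/svg:text[2]). For real XHTML input you would pass the XHTML namespace as default_ns.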
The only thing left now is the syntax, I guess.
In RFC 2396 (http://tools.ietf.org/html/rfc2396#section-2.4.3), some
characters were classified as "unwise":
   unwise = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
RFC 3986 turned "[" and "]" into reserved characters (gen-delims); the
others are still not allowed unencoded anywhere in a URI:
http://tools.ietf.org/html/rfc3986#section-2.2
   fragment    = *( pchar / "/" / "?" )
with
   pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
   unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved    = gen-delims / sub-delims
   gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="
(reserved and gen-delims are listed for reference only: of the
gen-delims, just ":", "/", "?" and "@" are permitted in a fragment.)
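A candidate fragment can be checked against this grammar by transcribing it directly into a regular expression. The following is a minimal Python sketch. The helper name is made up, and the input is assumed to be the text after "#".

```python
import re

# Character classes transcribed from the RFC 3986 ABNF above.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
# One fragment character: unreserved, sub-delims, ":", "@", "/", "?" or a %XX escape.
FRAGMENT_CHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@/?]|%[0-9A-Fa-f]{{2}})"
FRAGMENT_RE = re.compile(rf"{FRAGMENT_CHAR}*")

def is_valid_fragment(fragment: str) -> bool:
    """True if `fragment` (the text after '#') matches fragment = *( pchar / "/" / "?" )."""
    return FRAGMENT_RE.fullmatch(fragment) is not None

for candidate in [
    "xmlns(svg=http://www.w3.org/2000/svg)",
    "xpath(/html[1]/body[1])",
    "xpath(/html%5B1%5D/body%5B1%5D)",
    "/{}myMetaDoc/{}body[1]",
]:
    print(candidate, is_valid_fragment(candidate))
```

The xmlns() part and the percent-encoded xpath() pass. The literal "[", "]", "{" and "}" make the other two fail.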
The part after the crosshatch "#" was not considered part of the URI
proper in RFC 2396, although RFC 3986 does include it in the URI
grammar. Should we forward the syntax question to the uri@w3.org list?
"[" and "]" do not seem to be allowed unencoded in a fragment. According
to RFC 3986, these are the characters we can use without percent-encoding:
   ALPHA / DIGIT / "-" / "." / "_" / "~" / "/" / "?" / ":" / "@" / "!" / "$"
   / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
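If we kept the bracket syntax, it could still be carried in a fragment by percent-encoding everything outside the characters listed above. Here is a minimal sketch, assuming Python's urllib.parse.quote. That function never encodes letters, digits and "_.-~" (including "~" since Python 3.7), so only the remaining allowed characters need to go into `safe`. The helper name is hypothetical.

```python
from urllib.parse import quote, unquote

# Allowed-in-fragment characters beyond the letters, digits and "-._~"
# that urllib.parse.quote already leaves alone.
FRAGMENT_SAFE = "/?:@!$&'()*+,;="

def encode_fragment(text: str) -> str:
    """Percent-encode everything a fragment may not contain literally, e.g. '[' and ']'."""
    return quote(text, safe=FRAGMENT_SAFE)  # non-ASCII is encoded as UTF-8 bytes

frag = encode_fragment("xpath(/html[1]/body[1]/div[3])")
print("#" + frag)  # #xpath(/html%5B1%5D/body%5B1%5D/div%5B3%5D)
```

This keeps the familiar XPath-like notation at the cost of readability in the address bar. unquote() reverses it on the consuming side.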
I think the semantics are pretty straightforward. These questions remain:
- What would the syntax for selecting attributes look like?
- Should we select only elements, or attributes as well?
- Should we select only a single element, or also allow selecting all
elements of a certain type, e.g. html[1]/body[1]/div?
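To make the last two questions concrete, here is a sketch of one possible answer. It uses a hypothetical, namespace-free syntax that nobody has agreed on. A step with [n] selects exactly one element, a step without [n] selects all children of that name, and a final @name step selects an attribute value instead of elements. The resolve() helper and the @ convention are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET

STEP_RE = re.compile(r"([A-Za-z_][\w.-]*)(?:\[(\d+)\])?")

def resolve(root, path):
    """Resolve a path like 'html[1]/body[1]/div' or 'html[1]/body[1]/div[2]/@title'
    under the hypothetical syntax described above."""
    steps = path.strip("/").split("/")
    attr = steps.pop()[1:] if steps[-1].startswith("@") else None
    wrapper = ET.Element("document")  # virtual parent so the root is matched like any child
    wrapper.append(root)
    current = [wrapper]
    for step in steps:
        m = STEP_RE.fullmatch(step)
        if m is None:
            raise ValueError(f"unsupported step: {step!r}")
        name, index = m.group(1), m.group(2)
        selected = []
        for parent in current:
            matches = [child for child in parent if child.tag == name]
            if index is None:
                selected.extend(matches)  # no [n]: all elements of this type
            elif 1 <= int(index) <= len(matches):
                selected.append(matches[int(index) - 1])  # [n]: exactly one, 1-based
        current = selected
    if attr is not None:
        return [el.get(attr) for el in current if attr in el.attrib]
    return current

doc = ET.fromstring('<html><body><div id="a"/><div id="b" title="x"/><p/></body></html>')
print([d.get("id") for d in resolve(doc, "html[1]/body[1]/div")])  # ['a', 'b']
print(resolve(doc, "html[1]/body[1]/div[2]/@title"))  # ['x']
```

Returning lists for every path keeps single-element and all-of-a-type selection uniform. The trade-off is that a consumer cannot tell from the result alone whether the path was meant to be unique.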
All the best,
Sebastian
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Monday, 25 June 2012 16:25:34 UTC