Re: mlw-lt-track-ISSUE-29 (Maxime): Please comment the possible solutions for HTML+ITS2.0 to HTML (and | embedded by | embedding) RDF from Sebastian Hellmann on 2012-06-25 (public-multilingualweb-lt@w3.org from June 2012)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Mon, 25 Jun 2012 18:24:45 +0200
To: Jirka Kosek <jirka@kosek.cz>
CC: Felix Sasaki <fsasaki@w3.org>, MultilingualWeb-LT Working Group <public-multilingualweb-lt@w3.org>
Message-ID: <4FE890CD.8030509@informatik.uni-leipzig.de>

Hi Jirka and Felix,

On 06/22/2012 10:23 AM, Jirka Kosek wrote:
> On 21.6.2012 22:22, Felix Sasaki wrote:
>
>> Just FYI, without arguing for anything, for the ITS 1.0 test suite
>> http://www.w3.org/International/its/tests/
>> we created something like this
>> /{}myMetaDoc/{}body[1]/{}insert[1]/{myChineseMakupLanguage}书籍[1]
>> to identify each element and attribute node - taken from a "path" attribute
>> at
>> http://www.w3.org/International/its/tests/test1/Translate1-result.xml
>>
>> The format expands namespace (if there is none, there is empty curly
>> brackets).
> I think that given we will be dealing mainly with HTML we can omit
> namespaces in order to get a more concise syntax. We can cover possible
> SVG/MathML islands with the xmlns() XPointer scheme:
>
> #xmlns(svg=http://www.w3.org/2000/svg)xpath(/html[1]/body[1]/div[3]/svg:svg[1]/svg:g[1]/svg:text[7])
>
The only thing left now is the syntax, I guess.
In http://tools.ietf.org/html/rfc2396#section-2.4.3 some of the 
characters were considered "unwise":

    unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"


But of these, only "[" and "]" have been upgraded to reserved characters
(the others are no longer allowed unencoded at all):
http://tools.ietf.org/html/rfc3986#section-2.2

       fragment    = *( pchar / "/" / "?" )

with

    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
    XXX reserved      = gen-delims / sub-delims
    XXX gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="



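Under RFC 2396 the part after the crosshatch "#" was not formally part of
the URI anyhow, although RFC 3986 now includes the fragment in the URI
syntax. Should we forward the syntax question to the uri@w3.org list?
"[" and "]" are gen-delims (marked XXX above) that are not in pchar, so
they do not seem to be OK in a fragment. According to the RFC these are
the characters we can use unencoded:
ALPHA / DIGIT / "-" / "." / "_" / "~" / "/" / "?" / ":" / "@" / "!" / "$"
/ "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="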
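
To make the restriction concrete, here is a minimal Python sketch that
checks a string against the RFC 3986 fragment rule quoted above and
percent-encodes whatever falls outside it. The function names
(is_valid_fragment, encode_fragment) are made up for illustration and the
regular expression is my own transcription of the ABNF, so treat it as a
rough check rather than a reference implementation.

```python
import re
from urllib.parse import quote

# Character sets transcribed from RFC 3986 (assumed transcription, see lead-in).
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# fragment = *( pchar / "/" / "?" )
# pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
FRAGMENT_RE = re.compile(
    r"(?:[" + UNRESERVED + SUB_DELIMS + r":@/?]|" + PCT_ENCODED + r")*"
)


def is_valid_fragment(fragment: str) -> bool:
    """True if the string (without the leading '#') matches the fragment rule."""
    return FRAGMENT_RE.fullmatch(fragment) is not None


def encode_fragment(fragment: str) -> str:
    """Percent-encode every character that the fragment rule does not allow."""
    # quote() always keeps unreserved characters; 'safe' adds the rest of pchar
    # plus "/" and "?".
    return quote(fragment, safe="!$&'()*+,;=:@/?")


# Jirka's XPointer from the quoted message, without the leading '#'.
example = ("xmlns(svg=http://www.w3.org/2000/svg)"
           "xpath(/html[1]/body[1]/div[3]/svg:svg[1]/svg:g[1]/svg:text[7])")

print(is_valid_fragment(example))   # the square brackets make this fail
print(encode_fragment(example))     # brackets become %5B and %5D
```

So the XPointer can be carried in a fragment, but only with the brackets
percent-encoded, which costs the conciseness we were after.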

I think the semantics are pretty straightforward. These questions remain:
- What would the syntax look like for selecting attributes?
- Should we select only elements, or attributes as well?
- Should we select only one element, or allow selecting all elements of a
  certain type, e.g. html[1]/body[1]/div ?
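
To illustrate the difference between these options, here is a small
Python sketch using the standard xml.etree.ElementTree module. The sample
document and the select() helper are hypothetical. ElementTree supports
only a subset of XPath and cannot return attribute nodes, so the sketch
splits off a trailing "/@name" step by hand; that step is ordinary XPath
syntax, not a proposal for our fragment syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, just large enough to show the three cases.
DOC = """<html><body>
<div id="a">one</div>
<div id="b">two</div>
<div id="c" title="third">three</div>
</body></html>"""

root = ET.fromstring(DOC)  # root is the <html> element, so paths start below it


def select(root, path):
    """Return elements, or attribute values if the path ends in /@name."""
    elem_path, sep, attr = path.partition("/@")
    elems = root.findall(elem_path)
    if sep:
        # Attribute step: keep only elements that actually carry the attribute.
        return [e.get(attr) for e in elems if e.get(attr) is not None]
    return elems


# A path ending in a positional predicate selects exactly one element.
single = select(root, "./body[1]/div[3]")
# Dropping the last predicate selects every div child of body.
many = select(root, "./body[1]/div")
# A trailing attribute step selects attribute values instead of elements.
titles = select(root, "./body[1]/div[3]/@title")

print([e.get("id") for e in single])
print([e.get("id") for e in many])
print(titles)
```

If we allow the last form, one fragment may point to several nodes, which
we would have to take into account on the RDF side.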

All the best,
Sebastian




-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org

Received on Monday, 25 June 2012 16:25:34 UTC