RE: XML Enriched N-Triples (XENT)

From: Jimmy Cerra <jimbobbs@hotmail.com>
Date: Mon, 16 Jun 2003 00:58:49 -0400
To: "'Sean B. Palmer'" <sean@mysterylights.com>
Cc: <www-rdf-interest@w3.org>
Message-ID: <000101c333c3$f9ce9310$0100a8c0@picard>

>> I don't want to build a tokenizer for parsing 
>> apostrophes, white spaces, and other strings on top of 
>> another tokenizer, the XML processor.
> 
> Why not?

It's another level of abstraction that I have to deal with.  With only
XML, the character encoding, string parsing, entity normalization, and
even sometimes file IO are all handled by a third party processor.  I
don't have to worry about them, and I can concentrate on the model
extracted from the document.  However, text-node processing has to be
done on top of all that - and it must be written by me (since XML
processors are general-purpose machines) - in order for the XENT model
to be extracted.  With SAX that's sometimes trivial; however, it becomes
a real liability with DOM or XSLT processing.  Also, incompatibilities
between the XML processor and the text-node processor have to be tested,
worked around, et cetera.
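
To make the layering concrete, here is a rough Python/SAX sketch of the
second tokenizer I mean.  The token grammar (whitespace-separated terms
plus apostrophe-quoted literals), the element names, and the helper
names are my own placeholders for illustration, not anything taken from
the XENT proposal.

```python
import re
import xml.sax

# Hypothetical token grammar (an assumption, not XENT's real syntax):
# an apostrophe-quoted literal, or a run of non-space characters.
TOKEN = re.compile(r"'[^']*'|\S+")


class TextNodeHandler(xml.sax.ContentHandler):
    """Layer two: tokenize the text that the XML processor hands over."""

    def __init__(self):
        super().__init__()
        self.buffer = []
        self.statements = []

    def startElement(self, name, attrs):
        # Simplification: text seen before a child element is discarded.
        self.buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so buffer them.
        self.buffer.append(content)

    def endElement(self, name):
        text = "".join(self.buffer)
        self.buffer = []
        tokens = TOKEN.findall(text)
        if tokens:
            self.statements.append(tokens)


def tokenize_text_nodes(xml_bytes):
    """Run the XML processor, then the hand-written text-node tokenizer."""
    handler = TextNodeHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.statements


doc = b"<doc><s>ex:jimmy ex:says 'hello world'</s></doc>"
print(tokenize_text_nodes(doc))
```

Even in this small case the buffering, the grammar, and the error
handling (absent here) are all mine to write and test; with DOM or XSLT
the same work has to be bolted on after the tree is built.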


> > IMHO, using two or more different escaping methods
> > really mucks up the language.
> 
> I'm not actually sure what you mean by this--could you expand, please?
> Actually, escaping should've been listed as a TODO in my original
> announcement. But since XENT uses XML/RDF, it's basically going to
> have to use entity escaping *instead* of the Python-esque \uHHHH
> method. No big deal, and not an issue that would get in the way of any
> standards track work on the format, IMO. 

For instance, I suspect that &quot; and &apos; won't work:  The XML
processor will normalize them to " and ' before the text-node parser
sees them, and the parser will puke on the misplaced characters.  So you
must use a different escaping method - \' , \" , and \\ , (or \u0027
, \u0022 , and \u005C ) - and that's butt ugly in XML (two separate
methods... <<shiver>> ).
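
A small Python sketch of the failure I suspect.  The literal syntax
(text between apostrophes, with backslash escapes), the <lit> element,
and the read_literal helper are assumptions made up for this example;
the only part I'd vouch for is that the XML processor resolves &apos;
before my code ever sees the text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical literal syntax (an assumption, not XENT's real grammar):
# text between apostrophes, with \' and \\ as escapes.
LITERAL = re.compile(r"'((?:[^'\\]|\\.)*)'")


def read_literal(text):
    """Return the first apostrophe-delimited literal, backslash escapes undone."""
    match = LITERAL.match(text.strip())
    if match is None:
        raise ValueError("no literal found")
    return re.sub(r"\\(.)", r"\1", match.group(1))


# The XML processor replaces &apos; with a bare apostrophe...
entity_form = ET.fromstring("<lit>'it&apos;s'</lit>").text
# ...but it leaves a backslash escape alone for the second layer.
backslash_form = ET.fromstring(r"<lit>'it\'s'</lit>").text

print(repr(entity_form))             # the text-node parser sees 'it's'
print(read_literal(entity_form))     # literal is cut short at the bare apostrophe
print(read_literal(backslash_form))  # the backslash escape survives intact
```

So the entity escape is invisible to the text-node parser, and the
document ends up needing both entity escapes (for the XML layer) and
backslash escapes (for the layer on top).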


> Hmm. I think that BCC is better since this thread is liable to go off
> topic...

Too late.  :-)

--
Jimmy Cerra

] "I have learned these days, never to limit
]  anyone else due to my own limited
]  imagination." - Dr. Mae C. Jemison

> -----Original Message-----
> From: Sean B. Palmer [mailto:sean@mysterylights.com]
> Sent: Sunday, June 15, 2003 11:25 AM
> To: jimbobbs@hotmail.com
> Cc: www-rdf-interest@w3.org
> Subject: Re: XML Enriched N-Triples (XENT)
> 
> > [...] the more experimentation, then the more likely some
> > good variants will be created.
> 
> Quite. Whilst RDFCore are well past overdue going by their chartered
> timeline, and whilst the RDF Syntax specification is still not at CR,
> RDF/XML is only undergoing a bug fix from the 1999 original; it's a
> half-decade old technology coming to fruition. It may be too widely
> deployed now for any alternate serialization to seriously challenge
> it, and that, like it or not, is going to put off a lot of people from
> using RDF. It's a shame that the main barrier to alternate
> serializations, and hence RDF's adoption, is an historical/political
> accident.
> 
> With any format like RPV or XENT, or your own hypothetical YARS (Yet
> Another...), the suitability of the language--how well it fits the
> requirements of those who use RDF--is unfortunately a small factor.
> The XML and RDF communities are full of a lot of people who have very
> strong opinions about lots of things--by necessity, though it tends to
> lead to some quite obsessive and heated likes/dislikes of various
> constructs.
> 
> For example, as was noted to me on #rdfig [1], XENT itself is pretty
> much a mix of constructs from Notation3/N-Triples and RDF/XML mashed
> together into one proposal. I tried, of course, to take the best
> features from each approach, but the problem is that proponents of RDF
> serializations are usually very passionately one-sided about which
> method they prefer (talking from my possibly incorrect experience
> here). In other words, people tend to favor one serialization very
> much over the others. So whilst you'd think that a compromise between
> them would be a good idea, it'll probably just end up with almost
> every established member of the RDF community snubbing it :-)
> 
> I note that even though QNames are used heavily in communications
> about SW vocabularies, they're viewed as harmful in the motivation
> section of RPV. We need them; we may as well deploy them.
> 
> > On abbreviating element names for URIs, there has been
> > controversy [2].
> 
> A better reference is:-
> 
> http://www.w3.org/2001/tag/doc/qnameids-2002-07-15
> 
> There's a lot of FUD surrounding QNames, but at the end of the day
> they just map prefixes to namespaces. For RDF, it's too handy an
> abbreviation mechanism for URIs to pass up. RDF/XML and Notation3
> would be lost without them, and NTriples is basically too difficult to
> write because it doesn't have them.
> 
> The TAG finding says that parsing costs of QNames in PCDATA may be
> high. I've proved that, for Python and SAX at least, the opposite is
> true. I suspect that this will be the case in many other languages
> too. The TAG, in the finding above, say that since the approach is
> widely deployed to good effect, it's "reasonable to use QNames in this
> way".
> 
> > I don't want to build a tokenizer for parsing apostrophes,
> > white spaces, and other strings on top of another tokenizer,
> > the XML processor.
> 
> Why not?
> 
> > IMHO, using two or more different escaping methods
> > really mucks up the language.
> 
> I'm not actually sure what you mean by this--could you expand, please?
> Actually, escaping should've been listed as a TODO in my original
> announcement. But since XENT uses XML/RDF, it's basically going to
> have to use entity escaping *instead* of the Python-esque \uHHHH
> method. No big deal, and not an issue that would get in the way of any
> standards track work on the format, IMO.
> 
> > I like your ideas; [...]
> 
> Thanks. Your feedback is appreciated.
> 
> > CC: Tim Bray - as with the original message
> 
> Hmm. I think that BCC is better since this thread is liable to go off
> topic...
> 
> Cheers,
> 
> --
> Sean B. Palmer, <http://purl.org/net/sbp/>
> "phenomicity by the bucketful" - http://miscoranda.com/
Received on Monday, 16 June 2003 00:58:58 GMT
