- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Sun, 22 Jul 2001 11:09:31 +0100
- To: w3c-rdfcore-wg@w3.org
- CC: Aaron Swartz <me@aaronsw.com>
>>>Aaron Swartz said:
> To be quite honest I'm getting a bit tired of arguing over the
> N-Triples syntax, but I thought I'd throw this out there.

The N-Triples syntax seems to have reached a conclusion (I'll address
in other threads) after the usual kind of technical discussion that
happens. I and other coders of N-Triples parsers, readers and writers
need this precision.

> Today I came up with the idea that we could define anonymous
> nodes as having a special representation in the abstract syntax,
> but define the N-Triples conversion from RDF as having a
> well-defined set of names for them.

...

Addressing issues with the model by making it depend more on the XML
syntax doesn't seem like a good idea (more below).

> ... IOW, an algorithm such as
> the one used by SiRPAC or CARA to generate anonymous nodes in
> order could be specified somehow (using XSLT, for example), and
> all RDF/XML parsers which outputted N-Triples could follow this
> specification. That way, comparison of N-Triples documents could
> be reduced to a simple sort/diff operation, while still
> retaining the semantics of anonymous nodes.
>
> An example:
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>   <rdf:Description>
>     <rdf:value>foo</rdf:value>
>   </rdf:Description>
> </rdf:RDF>
>
> could be unambiguously defined as:
>
> _:g0 rdf:value "foo" .
>
> as parsers would use anonymous node names in the form of _:gn
> where n begins at zero and increments for each anonymous node
> reached in the XML.
>
> Obviously this wouldn't work for N-Triples output from a data in
> which ordering information had been lost (I have some ideas on
> that one too, but they are less formed) but it would work for
> any RDF/XML -> N-Triples conversion.
>
> Another option would be to use the number of the element from
> which the anonymous node occurs. In this case the number would
> be 2, as the <rdf:Description> is the second element in the
> document.
>
> This seems to give the best of both worlds.
>
> Thoughts?

RDF/XML parsers can legitimately attack the RDF statement generation
in different orders (e.g. suck it all into a DOM and take top level
elements in reverse order), and they can thus generate identifiers
for anonymous nodes in different orders.

This would only work if all the parsers followed pretty much the same
algorithm, that processing was written into the description of the
XML syntax and model, and the processing model was enforced. That
would be hard to do and would cause a lot of rewriting of existing
well-used parsers, e.g. W3C SiRPAC, Stanford RDF API, Jena's RDF
Filter. As a parser writer, I feel this would be a major problem to
retrofit.

More evidence: the author of Repat (C version of RDF Filter) tried to
make his generated IDs match the genids made by (some) SiRPAC/RDF Java
API parser, and he found it very hard since it basically meant
changing the code structure to match that of the Java.

If in future we invented a new XML syntax which had slightly
different ordering, we would also be in trouble generating the anon
nodes in the same way.

The current alternative is better: use standard graph compare
algorithms that have already been implemented at least twice
(Jeremy @HP, Jan, Jos?) in rather a short time.

So I think this idea is not worth it, for the sake of making node
merging easier in N-Triples.

Dave
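
To illustrate the ordering point, here is a minimal Python sketch. It is not taken from SiRPAC, Repat or any other parser mentioned above: the function names, the tuple representation of triples and the second value "bar" are all assumptions made for the example. It shows two traversal orders assigning different _:gn names to the same anonymous nodes, so a plain sort/diff reports a difference, while a naive comparison that allows renaming of anonymous nodes treats the two outputs as the same graph.

```python
from itertools import permutations

RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def generate_triples(values, reverse=False):
    """Hypothetical parser: one anonymous node per value, named
    _:g0, _:g1, ... in the order the nodes are visited."""
    order = list(reversed(values)) if reverse else list(values)
    return {("_:g%d" % n, RDF_VALUE, value) for n, value in enumerate(order)}


def is_blank(term):
    # Assumption for this sketch: any term starting with "_:" is an
    # anonymous node (so literals must not start with "_:").
    return term.startswith("_:")


def same_graph(a, b):
    """Brute-force check: does some renaming of the anonymous nodes
    in graph a turn it into graph b? Only suitable for tiny graphs."""
    blanks_a = sorted({t for triple in a for t in triple if is_blank(t)})
    blanks_b = sorted({t for triple in b for t in triple if is_blank(t)})
    if len(a) != len(b) or len(blanks_a) != len(blanks_b):
        return False
    for perm in permutations(blanks_b):
        mapping = dict(zip(blanks_a, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in a}
        if renamed == b:
            return True
    return False


# Same document, two legitimate traversal orders.
forward = generate_triples(["foo", "bar"])
backward = generate_triples(["foo", "bar"], reverse=True)

print(forward == backward)            # plain set comparison (like sort/diff)
print(same_graph(forward, backward))  # comparison up to node renaming
```

The brute-force search tries every renaming and is only meant to show the idea on a two-node graph; real graph compare implementations use far better strategies.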
Received on Sunday, 22 July 2001 06:09:32 UTC