- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Sun, 10 Jul 2011 16:21:35 -0400
- To: Philip Jägenstedt <philip@foolip.org>
- CC: public-rdfa-wg@w3.org
On 07/04/2011 09:41 AM, Philip Jägenstedt wrote:
>> We can chat more about this during our conversation tomorrow.
>
> I'd prefer to discuss this in public email threads, so that anyone
> that has something to add can do so.

Ah, that's not what I meant - I was merely going to tell you where the
latest resources are and ask how you plan to review and give feedback
on the documents. Yes, the discussion should happen on the mailing
lists. :) More below...

>> == HTML ==
>>
>> The spec only references the 2008 RDFa in XHTML REC, not
>> <http://dev.w3.org/html5/rdfa/>. Is this an oversight?
>>
>> Note that just deferring to the two RDFa specs implies different
>> processing requirements depending on XHTML and HTML. Different
>> behavior of APIs depending on XHTML/HTML is not going to be well
>> received by browser implementors, as it creates all kinds of problems.
>> (Consider, for example, what happens when a DOM is created entirely by
>> script or when a subtree of a text/html document is moved by script to
>> an application/xhtml+xml document. It just doesn't make sense to
>> switch between different modes here.)
>>
>> I'm going to assume in the following that the intention is for there
>> to be a single API spec covering both serializations.
>
> AFAICT, the issue remains.

Yes, the intent is to have a single API spec that covers as many
serializations as possible in a generic way. In general, the RDFa
processing rules are written to be as syntax-agnostic as possible.
That is, they are a set of instructions that operate on a document
tree; however, that document tree does not necessarily need to be a
DOM. You can use a SAX-based parser to implement an RDFa processor,
and you could theoretically use a SAX-based parser to implement the
RDFa API.

>> == getTriplesByType order ==
>>
>> Which order should triples be returned in? This must be well-defined
>> in order for the API to be interoperably implementable.
> The same issue now applies to Graph.toArray, although it now says
> "Note: the order of the Triples within the returned sequence is
> arbitrary, since a Graph is an unordered set."
>
> As an anecdote, the ECMAScript spec has always said that when
> enumerating properties of objects, the order is undefined. (Properties
> are conceptually an unordered set.) In practice, implementations do
> use a particular order (insertion order, more or less) and this has
> required reverse-engineering between browsers because scripts rely on
> that order. The same thing will happen with Graph.toArray if it
> becomes widely deployed.

That's good feedback. Graph ordering is not a simple problem. We
/could/ do something like insertion order, because that is a
deterministic part of the RDFa processing algorithm. We do have a
general mechanism for ordering triples in a graph, but it requires
graph normalization, and there is currently no known graph
normalization algorithm that runs in polynomial time for degenerate
cases. The difficult part is the blank node labeling problem. We have
it working for all real-world use cases... but there are some
theoretical inputs that cannot be solved in polynomial time - for
example, a ring of 1000 blank nodes that all look the same, each
connected to the next. This is related to the graph isomorphism
problem: an ordering can always be produced, but not always in a
reasonable time frame. We're dealing with this problem in the JSON-LD
work. In any case, normalization is not something that you want in a
frequently used code path.

Why can't you tell script developers that they can't depend on order?
Have you tried going the other way - shuffling the array on output to
ensure that they don't get the same order twice? I realize that this
isn't ideal, but neither is having to ensure that some order is kept.
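To make the "shuffle on output" idea concrete, here is a minimal
sketch. The graph object and its toArray() method are hypothetical
stand-ins for an implementation's internals, not the RDFa API itself;
the shuffle is a standard Fisher-Yates shuffle:

```javascript
// Hypothetical sketch: return the graph's triples in a deliberately
// randomized order so callers cannot come to rely on any fixed order.
function shuffledToArray(graph) {
  var triples = graph.toArray();   // implementation-defined order
  // Fisher-Yates shuffle: walk backwards, swapping each element with
  // a randomly chosen element at or before it.
  for (var i = triples.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1));
    var tmp = triples[i];
    triples[i] = triples[j];
    triples[j] = tmp;
  }
  return triples;                  // order may differ on every call
}
```

The same set of triples comes back every time; only the order varies,
which is exactly the property an "unordered set" contract promises.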
>> == Merging of triples ==
>>
>> If the same triple is expressed several times in the document, are
>> they merged, or will two instances of the same triple be returned by
>> getTriplesByType?
>
> Issue still applies to DataParser.parse

DataParser.parse() extracts the information from the document and
places it into a Graph. The Graph can only hold one instance of a
given triple (storing duplicates is a logical no-op). We say this in
the latest spec under section 2.2.2: Graphs: "Graphs must not contain
duplicate triples."

DataParser.process() would send duplicate triples through to the
callback, but we believe that is the proper behavior. That interface
is more raw and low-level. If somebody wanted to detect duplicate
triples - for linting purposes, for example - they could use
.process() to do so.

>> == Triple.language ==
>>
>> Is the language normalized, and how? If the language is given with
>> lang="sv_FI" in markup, what does .language return? If there is any
>> merging of triples, does that happen before or after language
>> normalization?
>
> Triple.language is no more, it seems.

Literals do have a language attribute:

http://www.w3.org/2010/02/rdfa/sources/rdf-interfaces/#literals

I don't quite understand what you mean by "language normalization".
That is, if somebody specifies lang="sv_FI" and a triple is generated
like this:

<http://blog.foolip.org/about/philip> foaf:name "Philip Jägenstedt"@sv_FI .

then the .language attribute would contain "sv_FI".

>> == Dynamic changes ==
>>
>> getTriplesByType returns a static array of Triples. In contrast, a lot
>> of HTML APIs return a live NodeList, so that changes to the document
>> are reflected in that NodeList. This is the case with e.g.
>> getElementsByTagName and the Microdata getItems API. If returning a
>> static array is intentional, can you make that more explicit by saying
>> that it is the triples that are in the document at the time of
>> invocation that are returned?
> It appears that DataParser.parse is used to generate a Graph from a
> Document, which would "solve" the problem of dynamic changes. I don't
> think that is an acceptable solution, because it is extremely
> expensive to have to re-parse the entire document after any change,
> but it does answer my original question.

Doesn't Microdata have the same issue? If the DOM changes in a way
that adds or deletes an item, the NodeList returned by getItems()
becomes outdated. You must re-process the entire DOM in order to
ensure that you're working with the most up-to-date information. What
am I missing?

>> RDFa Profiles
>>
>> 3. If a profile actually becomes widely used, aren't you worried about
>> the DDoS that will result? Compare to the problems of DTD outlined in
>> <http://lists.w3.org/Archives/Public/public-html/2008Jul/0269.html>.

There was quite a bit of discussion on this issue in the RDFa WG. Yes,
we were concerned about DDoS on RDFa Profiles. After discussing it for
a while, though, a few things became clear.

There is no "one authoritative source" for an RDFa Profile - profiles
are just like CSS, JavaScript files, images, and video. They are not
like a DTD. Web authors are free to copy a profile to their own
servers and serve it from there. Profile authors are free to publish
their profile on a CDN, much like jQuery is distributed by Google. We
even allow the default RDFa Profiles to be hard-coded by implementers
in their implementations. In the RDFa Core spec, Section 9: RDFa
Profiles states:

"RDFa Processor developers are permitted and encouraged to cache the
relevant triples retrieved via this mechanism, including embedding
definitions for well known vocabularies in the implementation if
appropriate."

Lastly, if Web vocabulary developers are concerned about a DDoS
attack, they probably shouldn't create a profile for their vocabulary
or application.

>> 4.
>> Should browsers have Turtle and RDF/XML parsers to handle the case
>> where the profile is using those syntaxes? MAY is a keyword for
>> interoperability disaster, at least in the context of web browsers...

Browsers don't need Turtle and RDF/XML parsers to handle profiles in
those syntaxes. The only requirement is an XHTML+RDFa processor to
process RDFa Profiles. The spec states:

"RDFa Profiles are optional external documents that define collections
of terms and/or prefix mappings. These documents must be defined in an
approved RDFa Host Language (currently XHTML+RDFa [XHTML-RDFA]). They
may also be defined in other RDF serializations as well (e.g., RDF/XML
[RDF-SYNTAX-GRAMMAR] or Turtle [TURTLE])."

The second sentence is the important one here - RDFa Profiles MUST be
provided in XHTML+RDFa at a minimum. Web developers may optionally
provide them in other serializations as well (such as Turtle, RDF/XML,
or even JSON-LD).

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: PaySwarm Developer Tools and Demo Released
http://digitalbazaar.com/2011/05/05/payswarm-sandbox/
Received on Sunday, 10 July 2011 20:22:14 UTC