- From: Manu Sporny <msporny@digitalbazaar.com>
- Date: Sun, 10 Jul 2011 18:14:30 -0400
- To: Philip Jägenstedt <philip@foolip.org>
- CC: public-rdfa-wg@w3.org
On 07/04/2011 06:02 PM, Philip Jägenstedt wrote:
> I was mistaken, some of this is still problematic with the
> DataDocument interface, which has getElementsByType,
> getElementsBySubject and getElementsByProperty methods. These now
> return NodeLists, but is it intentional that these collections are
> not live?

Yes, it was intentional. We /could/ return a live NodeList, but we were concerned that it would hurt browser performance. This is an area where we could really use some of your input.

My understanding is that the Microdata spec suffers from the same issue - if you add an element to the DOM that contains an itemscope attribute, the code managing the live NodeList that getItems() returns would have to detect the addition and re-parse at least part of the document in order to update the .properties collection, no? We attempted to prevent this sort of mandatory re-parsing of the document unless the Web developer specifically requested it.

> For all three methods, the order must also be defined.

Would it be acceptable if the results were sorted in triple generation order? That is, as triples are generated by the processor, they are appended to the default graph in order. That's deterministic and should be easy to do if people follow the processing rules. Any additional triples added to the graph - from, say, a Turtle parser or a JSON-LD parser - could be appended at the "end". We would still need to discuss the ramifications of this, of course.

> All of the mess I originally outlined also applies to
> DocumentData.getSubjects or getValues. Unless the information can be
> cached, implementation is not feasible.

Define "cached". Can there be a delay to the cache? To propose an overly simplistic strawman mechanism: the first call to getSubjects() forces a parse of the document, but each subsequent call within the next 1000ms uses the cached values. Would you be okay if the document were re-parsed completely only when a new RDFa or Microdata attribute is detected in the inserted DOM elements?
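[Editorial note: the 1000ms strawman above could be sketched roughly as follows. This is an illustration only, not spec text; the DataCache name, the injected parse function, and the TTL default are all assumptions.]

```javascript
// Strawman sketch of a time-limited cache for an expensive document
// parse. The first call parses; subsequent calls within ttlMs reuse
// the cached result instead of re-walking the DOM.
class DataCache {
  constructor(parse, ttlMs = 1000) {
    this.parse = parse;     // expensive re-parse of the document
    this.ttlMs = ttlMs;     // how long cached results stay valid
    this.value = null;
    this.stamp = -Infinity; // time of the last actual parse
  }
  get(now = Date.now()) {
    // Re-parse only if the cached value is older than the TTL.
    if (now - this.stamp > this.ttlMs) {
      this.value = this.parse();
      this.stamp = now;
    }
    return this.value;
  }
}
```

Under this sketch, a burst of getSubjects()-style calls costs one parse per second at most, at the price of results being up to one second stale.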
What about a .structuredDataDirty flag that notifies the Web developer that they should manually re-parse? I don't see how either Microdata or RDFa would be able to give anyone /live/ updates, as both seem to have algorithms that require either part or all of the document to be re-processed. That could kill performance if the DOM is being updated with Microdata/RDFa items 100+ times per second. We could introduce a delay/throttle to the cache and a callback when new RDFa data is detected. Which one of these strategies seems most likely to address your concerns? Is there another approach that would be better?

>>> == getTriplesByType type? ==
>>>
>>> Some underlying assumptions about the model appear to be unstated
>>> here. Specifically, is it type as in @datatype or as in @typeof?
>>> (I'd guess it's @typeof.)
>>>
>>> What if a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>> predicate is used explicitly, not using the @typeof shorthand?
>
> This question still applies to getElementsByType

Ah, good catch. Yes, it is referring to @typeof, not @datatype. If someone uses <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> instead of the @typeof shorthand to specify the type of the subject, getElementsByType() should still return the element. That is, it doesn't matter how rdf:type was set; getElementsByType() should return the element regardless. The underlying model is expected to query rdf:type however it was set.

>>> == RDFa Profiles ==
>>>
>>> Is it intended that the DOM API work with RDFa Profiles? Supporting
>>> it in browsers seems fairly problematic.
>>>
>>> 1. Consider this script:
>>>
>>> var e = document.querySelector("[profile]");
>>> e.profile = "http://example.com/previously-unseen-profile";
>>> document.getTriplesByType("http://examples.com/some-type");
>>>
>>> This clearly will not work, since the browser won't synchronously
>>> download and parse the profile while the script is running.
>>> Given this, how is a script to know when the API is safe to use?

We had been discussing this a few months ago and had thought that we could perform some of this work in something like a Web Worker and block the RDFa API until all profiles are loaded.

>>> 2. Should browsers preemptively fetch/parse all profiles in a
>>> document, even though 99% of documents won't use the
>>> getTriplesByType API?

Well, the RDFa profile isn't just for getTriplesByType(). The profile can define prefixes and terms, like so:

foaf -> http://xmlns.com/foaf/0.1/
name -> http://xmlns.com/foaf/0.1/name

so that people can mark up stuff like so:

<span property="foaf:name">Philip Jägenstedt</span>

or like so:

<span property="name">Philip Jägenstedt</span>

Ideally, we wanted to delay the fetching of profiles until the Web developer called one of the RDFa API methods. That way, not everyone has to pay the structured data tax.

>>> Should that delay the document load event? (related to the above
>>> question)

I think we should avoid delaying the document load event. Perhaps there should be a new event fired when the RDFa document is ready to be processed? Or perhaps we should delay the retrieval of the profile documents until a program makes a call to the RDFa API?

> Should I perhaps just file individual bugs? Discussing so many issues
> in a single email thread is probably going to be messy...

Unfortunately, we don't have a Bugzilla bug tracker. I'll open issues for each of these items and point you to them. That will help us ensure that we deal with all of them as a Working Group.

Thanks for the detailed feedback, Philip - it's very much appreciated. :)

-- manu

--
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: PaySwarm Developer Tools and Demo Released
http://digitalbazaar.com/2011/05/05/payswarm-sandbox/
Received on Sunday, 10 July 2011 22:15:18 UTC