- From: David Lee <David.Lee@marklogic.com>
- Date: Mon, 8 Jul 2013 14:10:01 +0000
- To: "Rushforth, Peter" <Peter.Rushforth@NRCan-RNCan.gc.ca>, mca <mca@amundsen.com>
- CC: "public-xmlhypermedia@w3.org" <public-xmlhypermedia@w3.org>
- Message-ID: <6AD72D76C2D6F04D8BE471B70D4B991E140521@EXCHG10-BE02.marklogic.com>
IMHO, just a thought. I believe spiders/bots keeping up with the evolving web "platform" is a similar problem to screen readers keeping up with applications, or spambots keeping up with spam filters. It's a never-winning, never-losing game of catch-up. The web platform is increasingly dynamic content, and content tailored to the user, device, time, location, browser, etc. Perhaps it is a losing battle to try to reverse-engineer applications and content for the purposes of web spiders.

Instead, much like the famed robots.txt, perhaps a "standard" for exposing machine-readable content through the web is more worthwhile (much like Atom, I suppose). If you hit cnn.com with an iPhone, an Android tablet, a full-screen Chrome browser, or IE7 (as a logged-in user or not), and even depending on where you are, you will get different "content". If the site knew it was a bot (and presumably people want the bots to get the data - free advertising, right?) then it could (and should, IMHO) provide a bot-readable variant of the web page, much as it now provides a tailored variant for humans. Perhaps instead of embedding that into the content, it could (should?) be provided as ancillary metadata.

Of course this puts the burden again on the site maintainers ... like the old days where you had to register URLs with search engines. But maybe if software treated bots as a special kind of user agent it might not be that bad ...

Ok, maybe I am fantasizing ... people won't actually go to the trouble of doing that, will they? But then, will they go to the trouble of inserting similar metadata into existing content? What's the difference?

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
dlee@marklogic.com
Phone: +1 812-482-5224
Cell:  +1 812-630-7622
www.marklogic.com

From: Rushforth, Peter [mailto:Peter.Rushforth@NRCan-RNCan.gc.ca]
Sent: Monday, July 08, 2013 9:45 AM
To: mca
Cc: public-xmlhypermedia@w3.org
Subject: RE: hypermedia affordance aspects

> hope this helps.

Yep. I'll try to capture some of this on the wiki. All good stuff.

Cheers,
Peter

________________________________
From: mca [mailto:mca@amundsen.com]
Sent: July 5, 2013 16:29
To: Rushforth, Peter
Cc: public-xmlhypermedia@w3.org
Subject: Re: hypermedia affordance aspects

<snip>
I have often wondered if hypermedia could be used by machines in the absence of a human directing the action in some way.
</snip>

hypermedia is "including instructions in the response" and it is very possible to include instructions that machines can deal with. it means optimizing all three layers of the message (format, protocol, domain-semantics) for machines instead of humans. most folks don't work through these three things far enough to "find" a machine-friendly hypermedia experience.

<snip>
For example, a crawler is behaving per coded instructions from the programmer. A browser is behaving with a little more leeway, but guided by a user.
</snip>

lots of unspoken assumptions in these two sentences. i'll fill in my guesses, but they may not hit the same notes as the ones in your mind...

to create a machine that crawls the web you need a machine that:

1 - understands the response format (e.g. HTML, XML, JSON, YAML, etc.)
2 - can find and activate any protocol-level affordances (HTML.A, HAL.RESOURCE, Cj.link, etc.)
3 - already (a priori) can recognize and deal with any domain-specific details required to solve the problem

luckily, most all Web crawlers can solve for these three pretty easily. for example, the simplest is a "dumb" crawler. that means you can skip level 3, assume level 1 only requires HTML, and that the only affordance (level 2) you need to deal with is HTML.A. that's pretty easy. most all of us can write these.

you can add a level-3 feature by only looking for links of a certain classification. for example, crawl the web for all HTML documents that contain an HTML.LINK or HTML.A with a @rel value of "stylesheet". there, now you have a stylesheet bot.

of course, you can expand/deepen sophistication by tweaking all three levels. how about a machine that knows HAL+JSON, HAL+XML, and Cj? or one that recognizes not just read links, but read templates (HTML.FORM@method="get")? or one that recognizes more than just one possible rel value? this is just adding more features to the machine's ability to deal with responses.

<snip>
Can you explain a bit more about situations where a "machine" is working independently while processing hypertext?
</snip>

consider a machine that understands the "hypermedia space" that includes all the possible links and data elements of a "twitter-like" service. One that can write as well as read payloads. now, you can create machines that do lots of possible things such as crawling the response space to find related messages, match people up based on their posted content, suggest (or even execute) links between users, collect related messages into a readable stream, etc. this is possible when you can create machines that deal with all three levels of semantics (format, protocol, domain) and when servers do the same.

hope this helps.

mca
+1.859.757.1449
skype: mca.amundsen
http://amundsen.com/blog/
http://twitter.com/mamund
https://github.com/mamund
http://www.linkedin.com/in/mikeamundsen

On Fri, Jul 5, 2013 at 3:59 PM, Rushforth, Peter <Peter.Rushforth@nrcan-rncan.gc.ca> wrote:

Hi Mike,

> mutability and transclusion ... are hints from the server on how the client MAY/SHOULD handle the results of the request. Whether this is explicitly stated in the response (e.g. my UBER example) or implied in the human-readable documentation and then realized via code is another matter.

I see that. I would guess it depends on the target protocol too. Although of course XML is used across many protocols, I am focused on the Web, since it is built with/on/around hypermedia.

> My recent experience in creating systems where machines make most of the determinations at runtime is leading me to include more explicit affordances in my designs. This is an optimization for _machines_ not humans, tho.

I have often wondered if hypermedia could be used by machines in the absence of a human directing the action in some way. For example, a crawler is behaving per coded instructions from the programmer. A browser is behaving with a little more leeway, but guided by a user. Can you explain a bit more about situations where a "machine" is working independently while processing hypertext?

> HTML.FORM@method="get" means "here is a template for constructing a URL". HTML.FORM@method="post" means "here is a template for constructing a message body."

So the former is templating the URL of the request; the latter is templating the body of the request. I also suppose one might want to template other parts of a request, e.g. Accept:.
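(A minimal Python sketch of that distinction, as an aside - the field names and host are illustrative, not taken from the thread: the same name/value pairs become a URL query under method="get" and a message body under method="post".)

    from urllib.parse import urlencode

    fields = {"q": "hypermedia"}   # hypothetical form fields
    encoded = urlencode(fields)    # "q=hypermedia"

    # method="get": the fields template the URL of the request
    get_request = (
        f"GET /search?{encoded} HTTP/1.1\r\n"
        "Host: example.com\r\n\r\n"
    )

    # method="post": the fields template the body of the request
    post_request = (
        "POST /search HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(encoded)}\r\n\r\n"
        f"{encoded}"
    )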
This is the notion behind @type, to my understanding. This is more than a "hint", in my view, in the same way that @method="post" is more than a hint. If you don't do what it says there, you run the risk of errors, e.g. 415 Unsupported Media Type. Just as you shouldn't do a POST when a GET is called for.

But coming back to the question...

> HTML designers could have created different affordances for each (e.g. <url action="..."><input ... /></url> AND <payload action="..."><input .../></payload>) but that did not happen. Instead the @method value tells the client whether to construct a URL or body.

I was thinking that @method was just a way of providing a value for an HTTP method for the future request, but I see it has combined that meaning with the association of the template (INPUT markup, etc.) for URL or body.

I was wondering if the method might actually be associated with whether you templated the URL or the body. So for example, if you had markup like <payload action="..."><input .../></payload>, perhaps that would imply, or be defined by documentation to mean, that the method would (could only) be POST, PATCH or PUT, but never GET, HEAD, DELETE or OPTIONS, so @method would not be necessary. Then again, maybe @method would be necessary because in real life (i.e. HTTP / the Web in general, not just the HTML version of the web) you would need to disambiguate whether POST or PATCH is required in a given situation.

> Again, what is explicit in the representation and what is implied in the human-readable documentation is a totally different discussion than the one I am proposing in this thread.

My bad. Thanks for your help. I've renamed the thread to reflect this subject.

As I've tried to highlight in my comments above, my perspective is that hypermedia is always templating some protocol element in HTTP, be it the URL, method or other header. Sometimes, as discussed, more than one header/protocol element is targeted/implied by a given affordance. Sometimes the value is a "complete" value, e.g. @href="http://example.com/foo" means to use "/foo" for the path and "example.com" for the Host: header, and other times the value needs more processing/selection by a user or user-agent, which might correspond to "mutable" in your aspects classification (?).

I suppose it's just a different way of looking at the problem.

Cheers,
Peter

________________________________
From: mca [mailto:mca@amundsen.com]
Sent: July 5, 2013 13:18
To: Rushforth, Peter
Cc: public-xmlhypermedia@w3.org
Subject: Re: affordance mutability / templated requests

<snip>
How much of these abstractions must be realized in markup in order that a media type spec can identify what request to make in a given situation? IOW how much request metadata is *necessary* to make the 'right' request?
</snip>

first, "right" is vague here. my assertion is that all four of them must be handled by both clients and servers. Note that two are network level (safety & idempotence) and are "promises" from the server. The other two are client-side related (mutability and transclusion) and are hints from the server on how the client MAY/SHOULD handle the results of the request. Whether this is explicitly stated in the response (e.g. my UBER example) or implied in the human-readable documentation and then realized via code is another matter.
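(A sketch of this four-aspect split - the Python names here are hypothetical, not Cj or UBER vocabulary: the two network-level aspects read as server promises, the two client-side aspects as handling hints.)

    from dataclasses import dataclass

    @dataclass
    class Affordance:
        # network-level "promises" from the server
        safe: bool        # activating this causes no state change
        idempotent: bool  # repeating it has the same effect as doing it once
        # client-side "hints" from the server
        mutable: bool     # client MAY let the user edit values before sending
        transclude: bool  # client SHOULD embed the result in the current view

    # e.g. an HTML.IMG-style control: safe, idempotent, immutable, transcluded
    img_like = Affordance(safe=True, idempotent=True,
                          mutable=False, transclude=True)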
My recent experience in creating systems where machines make most of the determinations at runtime is leading me to include more explicit affordances in my designs. This is an optimization for _machines_ not humans, tho.

<snip>
That reminds me of the discussion between Berners-Lee, Andreessen et al. regarding transclusion and navigation
</snip>

The IMG discussion is apropos for a couple of reasons. it was the first time someone attempted to indicate transclusion for HTML (good idea) and it was a very succinct affordance, as all the aspects are implied in the documentation of IMG, not in the representation. For example, idempotence and safety arise from using HTTP.GET. immutability and transclusion are documented behavior for this control.

It's also worth noting the use of SRC instead of HREF. This is another key to the affordance design. SRC means transclude=true, HREF means transclude=false.

FWIW, in my Cj design[0], I adopted an affordance model[1][2] very close to TimBL's model[3]. Again this was due to my desire to optimize for machines, not humans.

<snip>
I've wanted to discuss the @method in html here as an example. Is it necessary, especially given that only GET and POST are allowed, and mostly it seems just a way to tunnel content by browsers outside of URL parameters.
</snip>

not sure i follow you here. HTML.FORM@method="get" means "here is a template for constructing a URL". HTML.FORM@method="post" means "here is a template for constructing a message body." These are quite different actions that MUST be communicated to the client. HTML designers could have created different affordances for each (e.g. <url action="..."><input ... /></url> AND <payload action="..."><input .../></payload>) but that did not happen. Instead the @method value tells the client whether to construct a URL or body.

This is not a browser-specific issue, BTW. even when i build my own native client, i need to resolve this issue (URL or body construction). Again, in Cj, i created the "template"[5] and "queries"[4] elements in order to communicate constructing bodies (template) or URLs (queries).

<snip>
What are other examples of level 7 protocols which would need the additional/explicit markup and how would it be used? (I guess that is a big topic).
</snip>

Again, what is explicit in the representation and what is implied in the human-readable documentation is a totally different discussion than the one I am proposing in this thread.

I suspect you mean (here) which L7 protocols cannot have the full range of their uniform interface expressed using only my "aspects." To that point, off the top of my head, most all the HTTP-inspired L7 protocols have the "feature" of more than one of their uniform interface methods having the same "aspect" signature over the network:

FTP.PUT and FTP.DEL are both unsafe and idempotent.
HTTP.GET, HTTP.HEAD, and HTTP.OPTIONS are all safe and idempotent.
HTTP.PUT and HTTP.DELETE are both unsafe and idempotent.
HTTP.POST and HTTP.PATCH are both unsafe and non-idempotent.
CoAP.PUT and CoAP.DELETE are both unsafe and idempotent.

I'm sure there are others i can't bring to mind ATM. This is not a "bug" of course, but a feature. For these protocols, method is a significant part of the _network_ signature (as it affects caching, for example). I chose to leave this out of my aspects model since there are quite a few possible protocol-specific variations that weren't helpful to my design work at the time.
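(Written out as a lookup - each tuple is (safe, idempotent), transcribed from the list above - the collision is easy to see: the two network-level aspects alone cannot distinguish methods within a group.)

    # (safe, idempotent) network signatures for the methods named above
    SIGNATURES = {
        "HTTP.GET":     (True,  True),
        "HTTP.HEAD":    (True,  True),
        "HTTP.OPTIONS": (True,  True),
        "HTTP.PUT":     (False, True),
        "HTTP.DELETE":  (False, True),
        "HTTP.POST":    (False, False),
        "HTTP.PATCH":   (False, False),
        "FTP.PUT":      (False, True),
        "FTP.DEL":      (False, True),
        "CoAP.PUT":     (False, True),
        "CoAP.DELETE":  (False, True),
    }
    # the (False, True) signature is shared by six different methods, which
    # is why a protocol-specific [method] property is needed to disambiguate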
Note this all comes down to 1) what affordance you wish to model and 2) how much of your affordance design will be explicit (in the message) and how much will be implicit (in the human-readable docs). As "implied" relies on entailment[6], my machine-optimized designs usually leave very little that is not explicit. That is my _design_ decision, not anything technical, tho.

[0] http://amundsen.com/media-types/collection/
[1] http://amundsen.com/media-types/collection/format/#arrays-links
[2] http://amundsen.com/media-types/collection/format/#properties-render
[3] http://1997.webhistory.org/www.lists/www-talk.1993q1/0186.html
[4] http://amundsen.com/media-types/collection/format/#arrays-queries
[5] http://amundsen.com/media-types/collection/format/#objects-template
[6] http://www.thefreedictionary.com/implied

mca
+1.859.757.1449
skype: mca.amundsen
http://amundsen.com/blog/
http://twitter.com/mamund
https://github.com/mamund
http://www.linkedin.com/in/mikeamundsen

On Fri, Jul 5, 2013 at 12:33 PM, Rushforth, Peter <Peter.Rushforth@nrcan-rncan.gc.ca> wrote:

Hi Mike,

Nice to hear from you.

> the list I work from (safety, idempotence, mutability, transclusion) is an abstraction above any single level 7 protocol and can be applied to HTTP, XMPP, WebSockets, etc. with equal success

How much of these abstractions must be realized in markup in order that a media type spec can identify what request to make in a given situation? IOW, how much request metadata is *necessary* to make the "right" request?

That reminds me of the discussion between Berners-Lee, Andreessen et al. regarding transclusion and navigation

http://1997.webhistory.org/www.lists/www-talk.1993q1/0182.html

where the new <img...> tag with a @src won out over the more verbose forms derived by recombining other markup (href, rel).

I've wanted to discuss the @method in HTML here as an example. Is it necessary, especially given that only GET and POST are allowed, and mostly it seems just a way for browsers to tunnel content outside of URL parameters?

What are other examples of level 7 protocols which would need the additional/explicit markup, and how would it be used? (I guess that is a big topic.)

Cheers,
Peter

________________________________
From: mca [mailto:mca@amundsen.com]
Sent: July 5, 2013 11:27
To: liam@w3.org
Cc: stephengreenubl@gmail.com; Rushforth, Peter; public-xmlhypermedia@w3.org
Subject: Re: document node attributes

As Liam points out, there has been quite a bit of "re-invention" regarding hyperlinking over the years. In my case, while working on my Hypermedia Factors[1], I created a list of (what I view as) essential "aspects" of hypermedia when used over a network[2]. These look similar to the XLink[3] and HLink[4] properties. However, the list I work from (safety, idempotence, mutability, transclusion) is an abstraction above any single level 7 protocol and can be applied to HTTP, XMPP, WebSockets, etc. with equal success.

One way to look at this problem is to ask "How can I describe hypermedia affordances in a machine-readable way such that it does not tightly constrain the protocol to be used?" An additional question might be "Can I do this in a general enough way that it can be modeled easily and with high fidelity across various media type formats?" IOW, is there a way to describe hypermedia that is de-coupled from both protocol and format?
I addressed this in a QCon talk a while back[5]. One can start w/ a single "uber" hypermedia affordance that looks like this (the XML variant appears below):

<uber:link name="search" href="http://example.com/search/"
    safe="true" idempotent="true" transclusion="true" mutable="true">
  <uber:data name="q" value="hypermedia" />
</uber:link>

which describes this in HTTP:

GET /search?q=hypermedia
Host: example.com
....

By tweaking the properties of this single affordance, you can describe almost every possible network request (for HTTP, PUT/DELETE and PATCH/POST have similar-enough network signatures that an additional property is needed [method]. Some protocols like WebSocket don't have this network-signature collision problem).

Note that this approach would require a very small addition to XML (via NS or some other means), can be described easily via both XSD and DTD, and can be applied to existing documents w/o too much disruption of their doc models. Just two elements (link and data) and a handful of attributes.

When thinking about a generic way to describe linking and parameter-passing (e.g. forms) for XML, I think it can be valuable to start from this "uber" approach and consider another pass at the way XLink and HLink attempt to solve the same problem. I know it seems silly (possibly laughable) to propose _another_ attempt at distilling what we learned from HyTime, but there you go ;)

FWIW, I think that raising the bar to an additional abstraction level (e.g. the network signatures instead of the media-type or protocol descriptions) makes this a worthwhile and portable effort. And I think the XML space is a great place to start ;)

Cheers.

[1] http://amundsen.com/hypermedia/hfactor/
[2] http://www.amundsen.com/blog/archives/1109
[3] http://www.w3.org/TR/xlink/#N1238
[4] http://www.w3.org/TR/hlink/#s_hlink_module
[5] http://www.slideshare.net/rnewton/amundsen-costbenefitshypermedia/80

mca
+1.859.757.1449
skype: mca.amundsen
http://amundsen.com/blog/
http://twitter.com/mamund
https://github.com/mamund
http://www.linkedin.com/in/mikeamundsen

On Fri, Jul 5, 2013 at 10:33 AM, Liam R E Quin <liam@w3.org> wrote:

On Fri, 2013-07-05 at 13:53 +0100, Stephen D Green wrote:
> How about HLink then?
>
> http://www.w3.org/TR/hlink/

HLink is (unfortunately) an example of a group of people working in isolation. If we wanted a way to associate semantics with elements in XML documents based on namespace, we'd want to be able to associate more than just linking - e.g. see my "automatic namespaces" proposal.

Having said that, the idea of link discovery is an important one. HLink is doing what HyTime architectural forms did, in a way.

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml
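(Returning to the uber:link affordance earlier in the thread, a Python sketch of the XML-to-HTTP translation it implies. The namespace URI is a placeholder, since the example declares none, and the GET/POST mapping rule is inferred from the aspect descriptions above, not taken from any UBER spec.)

    import xml.etree.ElementTree as ET
    from urllib.parse import urlencode, urlsplit

    UBER = "urn:example:uber"  # placeholder namespace URI

    link_xml = f"""
    <uber:link xmlns:uber="{UBER}" name="search"
               href="http://example.com/search/"
               safe="true" idempotent="true"
               transclusion="true" mutable="true">
      <uber:data name="q" value="hypermedia"/>
    </uber:link>"""

    link = ET.fromstring(link_xml)
    data = {d.get("name"): d.get("value")
            for d in link.findall(f"{{{UBER}}}data")}
    url = urlsplit(link.get("href"))

    if link.get("safe") == "true" and link.get("idempotent") == "true":
        # safe + idempotent: template the URL (HTTP GET)
        request = (f"GET {url.path}?{urlencode(data)} HTTP/1.1\r\n"
                   f"Host: {url.netloc}\r\n\r\n")
    else:
        # unsafe: template the message body; an additional [method]
        # property would disambiguate PUT/DELETE from POST/PATCH
        body = urlencode(data)
        request = (f"POST {url.path} HTTP/1.1\r\nHost: {url.netloc}\r\n"
                   f"Content-Length: {len(body)}\r\n\r\n{body}")

    print(request)  # -> GET /search/?q=hypermedia HTTP/1.1 ...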
Received on Monday, 8 July 2013 14:10:27 UTC