- From: Niklas Lindström <lindstream@gmail.com>
- Date: Fri, 17 Feb 2012 17:32:48 +0100
- To: Manu Sporny <msporny@digitalbazaar.com>
- Cc: RDFa WG <public-rdfa-wg@w3.org>
Hi Manu,

as I said to you personally: great job! Thanks for doing this!

I'd like to mention a regexp trick I just learnt, that may open up for a way to do what I previously thought too cumbersome. It is about how to write legible, matching regexps for a combination of many attributes given in *arbitrary order*.

The trick is to use one look-ahead for each attribute, where the lookahead matches anything up to and including a specific attribute. This way, the regexp actually scans for each in turn, within the element, making the pattern a match if all of them are present, regardless of order.

Here is a working example in JavaScript (run with e.g. node):

    var hrefRelTypeofPattern =
        /<\S+(?=[^>]*?\shref="(.*?)")(?=[^>]*?\srel="(.*?)")(?=[^>]*?\stypeof(?:="(.*?)")?)/m;

    var m = ' <a href="path/0">0</a> <a rel="related" \
        typeof="Item" \
        href="path/1" \
        class="info" \
        >1</a> <a href="path/2">'.match(hrefRelTypeofPattern);

    console.log({href: m[1], rel: m[2], type: m[3]});

That should print out { href: 'path/1', rel: 'related', type: 'Item' }, i.e. it only matches the link where all of @rel, @href and @typeof are present.

Granted, this doesn't seem to allow for matching optional values (since it then stops when finding *any* of the attributes, not all). But our use cases are for matching a specific set of attributes combined, so I now think we may actually go down this path...

(And to think I believed I already had a solid knowledge of regexps. :) Great to know there are still secrets to unravel!)

Best regards,
Niklas

On Mon, Feb 6, 2012 at 6:02 AM, Manu Sporny <msporny@digitalbazaar.com> wrote:
> As a part of the research to see how RDFa is currently being used in the
> wild, we had a plan to use the Common Crawl data set to analyze RDFa,
> Microdata and Microformats usage. I took some time last week to start
> that work, here are the findings:
>
> http://manu.sporny.org/2012/structured-data-searching/
>
> -- manu
>
> --
> Manu Sporny (skype: msporny, twitter: manusporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: PaySwarm vs. OpenTransact Shootout
> http://manu.sporny.org/2011/web-payments-comparison/
>
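The one-look-ahead-per-attribute trick described in the message generalizes to any list of required attributes. The following is only a minimal sketch under stated assumptions: `buildAttrPattern` is a hypothetical helper name that does not appear in the original message, every attribute is assumed to carry a double-quoted value (unlike the original pattern, which also accepts a bare `typeof`), and `<\w+` is used instead of `<\S+` so that the match can only start at an opening tag.

```javascript
// Hypothetical helper (not from the original message): builds one
// look-ahead per required attribute name.
function buildAttrPattern(names) {
  var lookaheads = names.map(function (name) {
    // [^>]*? keeps the scan inside the current tag;
    // \s requires whitespace before the attribute name.
    return '(?=[^>]*?\\s' + name + '="(.*?)")';
  });
  // <\w+ matches an opening tag name only ("</a>" does not qualify).
  return new RegExp('<\\w+' + lookaheads.join(''));
}

var pattern = buildAttrPattern(['href', 'rel', 'typeof']);

var samples = [
  '<a href="path/1" rel="related" typeof="Item">1</a>', // all present
  '<a typeof="Item" rel="related" href="path/1">1</a>', // different order
  '<a href="path/2">2</a>'                              // rel and typeof missing
];

var results = samples.map(function (html) {
  var m = html.match(pattern);
  // Capture groups follow the order of the names passed to the helper,
  // not the order of the attributes in the markup.
  return m ? {href: m[1], rel: m[2], type: m[3]} : null;
});

console.log(results);
```

The first two samples yield the same captures because each look-ahead restarts its scan from the position right after the tag name; the third yields `null` because the `rel` look-ahead cannot get past the closing `>` of the tag.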
Received on Friday, 17 February 2012 16:33:50 UTC