- From: Niklas Lindström <lindstream@gmail.com>
- Date: Fri, 17 Feb 2012 17:32:48 +0100
- To: Manu Sporny <msporny@digitalbazaar.com>
- Cc: RDFa WG <public-rdfa-wg@w3.org>
Hi Manu,
as I said to you personally: great job! Thanks for doing this!
I'd like to mention a regexp trick I just learnt, which may open up a
way to do what I previously thought too cumbersome: writing legible
regexps that match a combination of many attributes given in
*arbitrary order*.
The trick is to use one lookahead per attribute, where each lookahead
matches anything up to and including one specific attribute. Since a
lookahead does not consume any input, each one scans the element from
the same starting point, so the pattern matches if all of the
attributes are present, regardless of their order.
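Here is a working example in JavaScript (run with e.g. node):
// One lookahead per attribute: each scans forward from the tag name to
// that attribute without crossing the tag's closing '>'. @typeof may
// appear without a value, so its value group is optional. '<\w+' (rather
// than '<\S+') makes sure a match can only start at an opening tag, since
// '\S+' would also accept a closing '</a>' and scan into the next element.
var hrefRelTypeofPattern =
/<\w+(?=[^>]*?\shref="(.*?)")(?=[^>]*?\srel="(.*?)")(?=[^>]*?\stypeof(?:="(.*?)")?)/;
var m = ' <a href="path/0">0</a> <a rel="related" \
typeof="Item" \
href="path/1" \
class="info" \
>1</a> <a href="path/2">'.match(hrefRelTypeofPattern);
console.log({href: m[1], rel: m[2], type: m[3]});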
That should print out { href: 'path/1', rel: 'related', type: 'Item' },
i.e. it only matches the link where all of @rel, @href and @typeof are
present.
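To see the order independence directly, here is a small sketch,
assuming Node.js 12 or later (for String.prototype.matchAll), that runs
the same kind of pattern with the g flag over a made-up snippet in which
the three attributes appear in different orders. Like the example
above, it uses <\w+ for the tag name so each match starts at an
opening tag.

```javascript
// Same three-lookahead idea, with the 'g' flag so matchAll can report
// every element that carries all of @href, @rel and @typeof.
const pattern =
  /<\w+(?=[^>]*?\shref="(.*?)")(?=[^>]*?\srel="(.*?)")(?=[^>]*?\stypeof(?:="(.*?)")?)/g;

// Made-up sample: three elements with the attributes in different
// orders, then one element missing @typeof and one missing @rel.
const html =
  '<a typeof="A" rel="r1" href="p/1">1</a>' +
  '<a href="p/2" typeof="B" rel="r2">2</a>' +
  '<a rel="r3" href="p/3" typeof="C">3</a>' +
  '<a href="p/4" rel="r4">4</a>' +
  '<a href="p/5" typeof="E">5</a>';

// Captures from the lookaheads are reported like any other groups:
// 1 = href, 2 = rel, 3 = typeof (in order of their opening parens).
const found = [...html.matchAll(pattern)].map((m) => ({
  href: m[1],
  rel: m[2],
  type: m[3],
}));
console.log(found);
```

Only the first three elements are reported. Because [^>] cannot cross
the end of a tag, a lookahead cannot borrow a missing attribute from a
neighbouring element (assuming no '>' inside attribute values).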
Granted, this doesn't seem to allow for matching optional values
(since it then stops when finding *any* of the attributes, not all).
But our use cases involve matching a specific combination of
attributes, so I now think we may actually go down this path...
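Since the use cases are about fixed combinations of attributes, such a
pattern could also be generated from a list of attribute names instead
of being written by hand. Below is a minimal sketch; allAttrsPattern and
escapeRegExp are hypothetical helpers (not from any library), and it
assumes double-quoted values with no whitespace around '='. Unlike the
hand-written example, every attribute's value is optional here (as
@typeof's is there), and a (?![\w-]) guard is added so that looking for
href does not accidentally match hreflang.

```javascript
// Hypothetical helper: escape regexp metacharacters in an attribute name.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: one lookahead per attribute name, so an element
// matches only if it has *all* of them, in any order. Capture group i+1
// holds the value of names[i], or undefined for a valueless attribute.
// Assumes double-quoted values and no whitespace around '='.
function allAttrsPattern(names, flags = '') {
  const lookaheads = names
    .map((name) =>
      // (?![\w-]) stops e.g. 'href' from matching the start of 'hreflang'.
      `(?=[^>]*?\\s${escapeRegExp(name)}(?![\\w-])(?:="([^"]*)")?)`)
    .join('');
  return new RegExp(`<\\w+${lookaheads}`, flags);
}

const re = allAttrsPattern(['href', 'rel', 'typeof']);
const m = '<a typeof="Item" href="path/1" rel="related">x</a>'.match(re);
console.log({ href: m[1], rel: m[2], type: m[3] });
```

Passing 'g' as the flags argument would let the generated pattern be
used with matchAll to collect every matching element in a page.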
(And to think I believed I already had a solid knowledge of regexps.
:) Great to know there are still secrets to unravel!)
Best regards,
Niklas
On Mon, Feb 6, 2012 at 6:02 AM, Manu Sporny <msporny@digitalbazaar.com> wrote:
> As a part of the research to see how RDFa is currently being used in the
> wild, we had a plan to use the Common Crawl data set to analyze RDFa,
> Microdata and Microformats usage. I took some time last week to start
> that work, here are the findings:
>
> http://manu.sporny.org/2012/structured-data-searching/
>
> -- manu
>
> --
> Manu Sporny (skype: msporny, twitter: manusporny)
> Founder/CEO - Digital Bazaar, Inc.
> blog: PaySwarm vs. OpenTransact Shootout
> http://manu.sporny.org/2011/web-payments-comparison/
>
Received on Friday, 17 February 2012 16:33:50 UTC