Re: How should I query ActivityStreams objects containing both JSON and HTML?

Individual situations vary, but in general, real-world HTML markup tends to
be much more brittle and to change more frequently than JSON data. I've spent
a lot of time on tools that both scrape HTML in the wild and handle
medium-to-large JSON objects, and I've had better luck keeping the two fairly
separate. I'd suggest a pre-processing pass that scrapes the HTML into a
consistent JSON schema; only then handle that output and your native JSON
together, however you like.
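
To make the pre-processing idea concrete, here is a rough sketch in Python
using only the standard library. It assumes the HTML lives in the object's
`content` property, and it writes the result to a `content_parsed` key. That
key name and its text/links shape are made up for illustration; they are not
part of ActivityStreams or any library.

```python
import json
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects visible text and link targets from an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record each link's href and its rel tokens (if any).
        if tag == "a":
            attr_map = dict(attrs)
            href = attr_map.get("href")
            if href:
                rel = (attr_map.get("rel") or "").split()
                self.links.append({"href": href, "rel": rel})

    def handle_data(self, data):
        self.text_parts.append(data)


def normalize(obj):
    """Return a copy of obj with its HTML `content` summarized as plain JSON.

    `content_parsed` is a hypothetical key chosen for this sketch.
    """
    parser = ContentExtractor()
    parser.feed(obj.get("content", ""))
    parser.close()
    out = dict(obj)
    out["content_parsed"] = {
        # Collapse runs of whitespace so the text is stable across markup changes.
        "text": " ".join("".join(parser.text_parts).split()),
        "links": parser.links,
    }
    return out


note = {
    "type": "Note",
    "content": '<p>Hello <a href="https://example.com/" rel="tag nofollow">world</a>!</p>',
}
normalized = normalize(note)
print(json.dumps(normalized["content_parsed"], indent=2))
```

After this pass, a single JSON query language can select on things like
`content_parsed.links[*].href` without needing to know anything about HTML.
Which fields to extract depends on what you actually want to query.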

On Fri, Mar 31, 2023 at 3:06 PM Bob Wyman <bob@wyman.us> wrote:

> In a Mastodon.social post
> <https://mastodon.social/@bobwyman/110120087223817037>, I asked:
>
> XPath and JSONPath are similar, but different. (See JSONPath spec
> <https://goessner.net/articles/JsonPath/>) This presents a problem for me
> since I'm building a system to query ActivityStreams objects that can
> include HTML wrapped in JSON.
>
> Should I:
>
>    - Use XPath syntax for both JSON and HTML?
>    - Use JSONPath syntax for both JSON and HTML? (If so, is there a
>    reasonable extension to JSONPath to support selecting on HTML attributes?)
>    - Switch between JSONPath and XPath depending on the underlying
>    datatype? (e.g. Embedding XPath in JSONPath.)
>
> If you were writing a query, would you accept needing to know both
> syntaxes?
>
> I would appreciate any advice you might be able to provide. Also, I would
> be interested to hear if anyone else has already been faced with and
> addressed this issue.
>
> bob wyman
>


-- 
https://snarfed.org/

Received on Friday, 31 March 2023 23:12:09 UTC