Re: How should I query ActivityStreams objects containing both JSON and HTML?

On Sat, 1 Apr 2023 at 00:12, Ryan B <w3c@ryanb.org> wrote:

> Individual situations vary, but in general, real-world HTML markup tends
> to be much more brittle and change more frequently than JSON data. I've
> spent a lot of time on tools that both scrape HTML in the wild and handle
> medium to large JSON objects, and I've had better luck keeping the two
> fairly separate. I'd suggest a pre-processing pass to scrape the HTML into
> a consistent JSON schema, and only then handle that and your native JSON
> together, however you like.
>

Yes, extracting the HTML and processing it separately is probably the
easiest option :)
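
For what it's worth, here is a rough sketch of what such a pre-processing
pass might look like. It is only an illustration under some assumptions:
it uses Python's standard-library html.parser, it assumes the HTML lives
in the object's "content" property, and the "contentTree" key and the
node shape ({"tag", "attrs", "children"}) are names I made up for the
example, not anything defined by ActivityStreams.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed
# onto the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HtmlToJson(HTMLParser):
    """Build a JSON-friendly tree; text nodes are plain strings."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # "#fragment" is a hypothetical name for the synthetic root node.
        self.root = {"tag": "#fragment", "attrs": {}, "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; stray closing
        # tags (common in scraped HTML) are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data:
            self.stack[-1]["children"].append(data)


def html_to_json(html):
    parser = HtmlToJson()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.root


def preprocess(activity):
    """Return a copy of the activity with the HTML content also
    available as a tree under the (hypothetical) "contentTree" key."""
    obj = activity.get("object")
    if not isinstance(obj, dict) or not isinstance(obj.get("content"), str):
        return activity  # e.g. "object" is just a URI, or has no HTML
    new_obj = dict(obj)
    new_obj["contentTree"] = html_to_json(obj["content"])
    return {**activity, "object": new_obj}


def find_all(node, tag):
    """Yield every element node with the given tag name, depth first."""
    if isinstance(node, str):
        return
    if node["tag"] == tag:
        yield node
    for child in node["children"]:
        yield from find_all(child, tag)
```

Once the HTML has been turned into plain dicts and lists like this, the
whole activity is ordinary JSON, so a single JSON query syntax can reach
element attributes (for example, every "href" under
object.contentTree) without having to embed XPath.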


>
> On Fri, Mar 31, 2023 at 3:06 PM Bob Wyman <bob@wyman.us> wrote:
>
>> In a Mastodon.social post
>> <https://mastodon.social/@bobwyman/110120087223817037>, I asked:
>>
>> XPath and JSONPath are similar, but different. (See JSONPath spec
>> <https://goessner.net/articles/JsonPath/>) This presents a problem for
>> me since I'm building a system to query ActivityStreams objects that can
>> include HTML wrapped in JSON.
>>
>> Should I:
>>
>>    - Use XPath syntax for both JSON and HTML?
>>    - Use JSONPath syntax for both JSON and HTML? (If so, is there a
>>    reasonable extension to JSONPath to support selecting on HTML attributes?)
>>    - Switch between JSONPath and XPath depending on the underlying
>>    datatype? (e.g. Embedding XPath in JSONPath.)
>>
>> If you were writing a query, would you accept needing to know both
>> syntaxes?
>>
>> I would appreciate any advice you might be able to provide. Also, I would
>> be interested to hear if anyone else has already faced and addressed
>> this issue.
>>
>> bob wyman
>>
>
>
> --
> https://snarfed.org/
>

Aaron
-- 
Aaron Gray - @AaronNGray@fosstodon.org

Independent Open Source Software Engineer, Computer Language Researcher,
Information Theorist, and Computer Scientist.

Received on Friday, 31 March 2023 23:24:21 UTC