Adding jq to Linux

theypsilon
Scripting Wizard
Posts: 105
Joined: Sun May 24, 2020 8:20 pm
Been thanked: 40 times

From its website:
jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.
So, it's a small CLI tool that would be very useful in MiSTer scripts for handling API responses, such as GitHub API responses (which are JSON).

The main advantages are that jq is much more robust and less error-prone for this task, and it is expected to perform better in general. With it you don't need to build your own throwaway partial JSON parser out of regexes, as jq already has a fully conformant parser built in. In theory jq should also do better in both speed and memory usage, as it is tuned specifically for JSON parsing and querying. Plus, jq is commonly used in industry and has proven to be stable.
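To make the comparison with regex parsing concrete, here is a minimal sketch of a jq filter over a JSON listing. The sample data is a trimmed, made-up response shaped like a GitHub "repository contents" listing; the repository name, file names and the `list_file_urls` helper are assumptions for illustration, not something the Updater uses.

```shell
#!/usr/bin/env bash
# Trimmed, illustrative sample shaped like a GitHub "repository contents"
# API response (an array of entries). Repository and file names are made up.
sample_response='[
  {"name": "Example_20200601.rbf", "type": "file",
   "download_url": "https://raw.githubusercontent.com/example/Example_MiSTer/master/releases/Example_20200601.rbf"},
  {"name": "old", "type": "dir", "download_url": null}
]'

# Hypothetical helper: reads the JSON listing on stdin and prints the
# download URL of every entry whose type is "file", one per line.
list_file_urls() {
    jq -r '.[] | select(.type == "file") | .download_url'
}
```

Running `printf '%s' "$sample_response" | list_file_urls` prints the single file URL and skips the directory entry, with no regex involved.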

The main use for jq on MiSTer would be within the Updater. It would be used there to deal with GitHub API calls in a more robust and performant way. This would also help in a move towards using the GitHub API instead of GitHub HTML pages whenever possible.

jq source can be downloaded from: https://stedolan.github.io/jq/download/

It has to be built from source for MiSTer Linux. According to that page, jq is written in C and has no runtime dependencies.
Locutus73
Core Developer
Posts: 51
Joined: Mon May 25, 2020 9:55 am
Has thanked: 1 time
Been thanked: 8 times

According to some tests (not performed by me) using the GitHub API, whenever you use it without being logged in with an account (the use case of the updater and other scripts), GitHub throttles you very soon, making this approach not very useful for a script like the updater, which must perform many, many interactions. The HTML approach, on the other hand, may trigger the anti-abuse system when heavily parallelized (PARALLEL_UPDATE="true"), but that is an extreme condition, and no throttling occurs without using an actual account.

Regards.

Locutus73
theypsilon

Yes, you are right. According to this ( https://developer.github.com/v3/#rate-limiting ), unauthenticated requests to the GitHub API are limited to 60 per hour, which would be a problem if we translated every fetch of an HTML page into a GitHub API request. But in my mind, there is a much better solution.

Ideally, for the mid-term, the goal would be to do a single request to fetch a small DB containing all the relevant links for the Updater. This DB would be built on GitHub servers by a task scheduled every 20 minutes with GitHub Actions. Additionally, all the links contained in that DB should point to "https://raw.githubusercontent.com/${DIRECTORY}" instead of "https://github.com/${DIRECTORY}?raw=true". That way, we go straight to a static server and won't be penalized for using PARALLEL_UPDATE="true".
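The link rewrite described above can be sketched as a small shell function. This is only an illustration: `to_raw_url` is a hypothetical helper, and it assumes the usual GitHub link shape where the "?raw=true" form carries a "/blob/" path segment that the raw.githubusercontent.com form omits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a "https://github.com/...?raw=true" link into
# its "https://raw.githubusercontent.com/..." equivalent.
to_raw_url() {
    printf '%s\n' "$1" | sed \
        -e 's#^https://github\.com/#https://raw.githubusercontent.com/#' \
        -e 's#/blob/#/#' \
        -e 's#?raw=true$##'
}
```

For example, `to_raw_url 'https://github.com/MiSTer-devel/Example_MiSTer/blob/master/releases/Example.rbf?raw=true'` (a made-up path) prints the same path under raw.githubusercontent.com, without "/blob/" and without the query string.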

This combined solution should make the Updater much faster. And there is a working example very close to us that is using it already. It's basically the approach taken by nilp0inter for feeding his MiSTer_WebMenu. Here is, for example, what his DB looks like: https://github.com/nilp0inter/MiSTer_We ... devel.json

With this approach, inspired by nilp0inter's solution, the user doesn't need to make even a single GitHub API call. And here is where jq comes into play. Once you have built a DB, you could download it easily from a raw.githubusercontent.com URL, store it in RAM, and parse it with jq on demand. No need to fetch anything else externally other than the Cores, MRAs and other MiSTer resources. And again, because we prepared the links to be https://raw.githubusercontent.com/ URLs, the speed and the parallelization opportunities are much better.
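A minimal sketch of that "download once, query on demand" flow follows. Everything named here is an assumption: the DB layout (`{"cores": [{"name": ..., "url": ...}]}`), the `DB_URL` variable, and the `fetch_db` and `db_core_url` helpers are invented for illustration and do not describe nilp0inter's actual DB. It also assumes /tmp is RAM-backed on the target system.

```shell
#!/usr/bin/env bash
# Hypothetical DB layout, invented for this sketch:
#   {"cores": [{"name": "...", "url": "..."}, ...]}

# Download the DB into a temporary file (assumed to live in RAM under /tmp)
# and print the file's path. DB_URL is a placeholder the caller must set.
fetch_db() {
    local db_file
    db_file="$(mktemp /tmp/updater_db.XXXXXX)" || return 1
    curl -sSLf "$DB_URL" -o "$db_file" || return 1
    printf '%s\n' "$db_file"
}

# Print the download URL stored in the DB file ($1) for one core name ($2).
db_core_url() {
    local db_file="$1" core_name="$2"
    jq -r --arg name "$core_name" \
        '.cores[] | select(.name == $name) | .url' "$db_file"
}
```

A script would call `fetch_db` once per run and then `db_core_url "$db_file" SomeCore` as many times as needed, without touching the network again for metadata.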

At the end of the day, that DB doesn't need to be JSON. It could be some custom format. But I think it makes more sense to stick with JSON, because that way we can evolve the Updater code in incremental steps with less abrupt changes, and JSON is also easy to navigate. For that, I feel that jq would fill the gaps pretty nicely. Plus, nilp0inter already told me that he would be happy if we used his code, so we can even have a head start by adopting parts of his current solution.
Locutus73

Well, the idea of an external DB has already been discussed in the past, but it goes against the basic, seminal goals of the updater. When I started to write it I had these two basic goals, which I'd like to maintain in the future:
1) The updater must be self-contained and self-sustained: it must not require any additional external resource apart from what was originally/natively available with the MiSTer project.
2) The updater must be totally transparent for Sorgelig (and other developers), not requiring a single change to his usual workflow.

Basically, it must work by itself with no additional hassle for anyone. I agree that an external DB would make the updater faster, but it would break rule 1, requiring additional stuff to be maintained. OK, the DB could be generated automatically by another script, but I don't want ties and knots to external resources (apart from what is natively available in the project as it is). Using a reductio ad absurdum, if we want to set up an external resource for updating, why reinvent the wheel? We could just set up an APT repository and use APT tools (even Aptitude) for installing, uninstalling and updating stuff (cores, Linux, whatever), but, again, this has been discussed too: it would bloat the minimal MiSTer Linux and nobody wants to be a distro maintainer.

On the other hand, any hint/tip on optimizing the updater without using additional external resources is welcome: in January I followed some hints in order to check only repositories that have been updated since the latest successful run, using a single API call, and this dramatically, I mean really dramatically, improved performance. This kind of improvement is compatible with rules 1 and 2.

Oh, I almost forgot: there is an intrinsic rule no. 0
0) The updater won't do anything illegal, anything harming the project, or anything that goes against Sorg's wishes (we are all guests in Sorg's home).
So the updater won't ever download copyrighted stuff or download cores from outside mister-devel, since this (in my personal opinion, which may not reflect other devs'/people's ideas) promotes fragmentation, which is not good. Downloading stuff only from mister-devel should incentivize devs to join mister-devel itself. Other devs can fork and make their own updaters, it's their right, since the code is GPL, but promoting turn-key solutions that help this IMHO harms the project because it nullifies any incentive to join mister-devel... it may be really welcome by users, but it's bad for MiSTer as a project. Again, these are my personal 2 cents and don't need to reflect other people's/devs' opinions.

Regards.

Locutus73
theypsilon

Locutus73 wrote: OK, the DB could be generated automatically by another script, but I don't want ties and knots to external resources (apart from what is natively available in the project as it is). Using a reductio ad absurdum, if we want to set up an external resource for updating, why reinvent the wheel? We could just set up an APT repository and use APT tools (even Aptitude) for installing, uninstalling and updating stuff (cores, Linux, whatever), but, again, this has been discussed too: it would bloat the minimal MiSTer Linux and nobody wants to be a distro maintainer.
Yes, the DB is autogenerated. That's important to point out. It doesn't need any maintenance by hand.

I respectfully disagree with that being a valid reductio ad absurdum. It isn't one, because you are talking about moving to a totally different infrastructure, which would be very disruptive to the current MiSTer ecosystem.

A Github Action only requires a very simple .github/workflows/action.yml file and to point to some script. All within the same repository of Updater_script_MiSTer. It is arguably self-contained indeed, since that auto generated DB would become an in-MiSTer-devel Github resource like any other you rely on (Wiki, for example). And it would be even more self-contained than most, because it would live in the exact same repository as the updater. It could even be distributed within the same mister_updater.sh file if you are aiming for that.

I'm a bit surprised by the frontal opposition to this kind of solution. It would not only be a more robust and more performant solution for users, it would also be easier to maintain. No need to deal with HTML pages that GitHub can change whenever they please, since they are not committed to HTML stability.

I would like to kindly ask you to reconsider, since it doesn't really violate your rule 1. (Nor rule 2, nor rule 0 obviously, which would be implicit.)
Locutus73

theypsilon wrote: Fri Jun 26, 2020 12:09 pm A Github Action only requires a very simple .github/workflows/action.yml file and to point to some script. All within the same repository of Updater_script_MiSTer. It is arguably self-contained indeed. Since that auto generated DB would become an in-MiSTer-devel Github resource like any other you rely on (Wiki, for example). And it would be even more self-contained than most, because it would live in the exact same repository as the updater. It could even be distributed within the same mister_updater.sh file if you are aiming for that.
I'll study that.
I still like the idea that mister_updater.sh can be locally executed without any https://github.com/MiSTer-devel/Updater_script_MiSTer being up and running, and still working with just the wiki and the core repos being up and running... and actually GitHub is not committed to HTML consistency, but realistically they made one change in two years that required 15 minutes of coding to fix... but I'll study/evaluate the workflows anyway.
Thnx.

Locutus73
theypsilon

I think I get what you mean by having mister_updater.sh with the ability of being locally executed without its repo. It would be helpful for development/debugging.

In practice though, for one, Updater_script_MiSTer needs to be up and running anyway, since the users will run update.sh, which fetches from there. And for devs, you can add a flag that tells the script not to fetch the DB from the internet, but from the current folder on your machine. I use that method extensively in my scripts (with 'export') and it works really well.
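The developer flag mentioned above could look roughly like this. It is a sketch only: `DB_LOCAL_PATH`, `DB_URL` and `load_db` are hypothetical names chosen for illustration, not variables that exist in mister_updater.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the DB contents on stdout.
# If DB_LOCAL_PATH is set (e.g. exported by a developer), read the DB from
# that local file; otherwise download it from DB_URL (a placeholder).
load_db() {
    if [ -n "${DB_LOCAL_PATH:-}" ]; then
        cat "$DB_LOCAL_PATH"
    else
        curl -sSLf "$DB_URL"
    fi
}
```

A developer would then run something like `export DB_LOCAL_PATH=./db.json` before launching the script, while regular users, who never set the variable, keep fetching the hosted copy.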

About HTML stability: nobody knows when they will change the HTML next, or in which part of the app. They could even do it without changing the UI at all, which would be very tricky to debug. Just as important, parsing HTML and being dependent on a web page's structure is always going to make your code look noticeably more complicated than when you use a model crafted specifically for your needs. So the DB approach is again an opportunity for improving maintainability.
Locutus73

Alternatively, the updater could use a dual strategy: using a hosted DB if it exists and has been recently updated, otherwise doing its usual HTML scraping (which I somehow like because, although fragile to HTML changes, it can be ported to other hostings without APIs)... I'll think about it. Actually it won't be a strict or high priority, since the current updater is quite quick (or not unbearably slow :D ), but I'll think about/explore the workflows approach, which seems interesting to study anyway.
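The dual strategy can be sketched as a small decision function. This is an assumption-laden illustration: `choose_strategy` is a hypothetical helper, and it supposes the script can obtain the DB's last-update time as epoch seconds (for instance from a field inside the DB), with an empty value meaning the DB could not be fetched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide which update path to take.
#   $1  DB timestamp in epoch seconds (empty if the DB is unavailable)
#   $2  maximum accepted DB age in seconds
#   $3  current time in epoch seconds (defaults to "now")
# Prints "db" when the hosted DB is usable, "html" to fall back to scraping.
choose_strategy() {
    local db_timestamp="$1" max_age="$2" now="${3:-$(date +%s)}"
    if [ -n "$db_timestamp" ] && [ $((now - db_timestamp)) -le "$max_age" ]; then
        echo "db"
    else
        echo "html"
    fi
}
```

With a one-hour limit, `choose_strategy 1000 3600 2000` prints `db`, while a stale timestamp (`choose_strategy 1000 3600 10000`) or a missing one (`choose_strategy "" 3600 2000`) prints `html`.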
Thank you for the hint (it's not the only useful one you provided)!

Regards.

Locutus73
theypsilon

Yes, I think that's a good way of evolving the script design.

You are welcome. Feel free to ping me about this whenever you want.
theypsilon

Coming back to the topic of the thread. For any further advance, I think jq would be a really great addition. I hope @Sorgelig or anybody involved in OS maintenance could consider it.
Locutus73

Did you check whether there's a Debian ARM deb package with a compatible precompiled binary, so it can be downloaded and unpacked just like I did in the updater for the unrar binary (and other binaries in other scripts)?

Regards.

Locutus73
theypsilon

I've just tried it.

There is this one: https://packages.debian.org/search?arch ... eywords=jq

I tested this version: https://packages.debian.org/jessie/armhf/jq/download

It is working fine on my MiSTer. I've just followed the tutorial ( https://stedolan.github.io/jq/tutorial/ ) without issues.
Sorgelig
Site Admin
Posts: 877
Joined: Thu May 21, 2020 9:49 pm
Has thanked: 2 times
Been thanked: 211 times

I can include jq
theypsilon

That's great, thanks a lot.