Itsnotlupus' MiddleMan

inspect/intercept/modify any network requests

This script should not be installed directly. It is a library for other scripts to include with the meta directive // @require https://update.greasyfork.org/scripts/472943/1320613/Itsnotlupus%27%20MiddleMan.js

Author
itsnotlupus
Version
1.5.2
Created
2023-08-12
Updated
2024-01-31
License
MIT

MiddleMan - insert yourself between a page and its network traffic

This places a layer of middleware between a page's code and the fetch() and XMLHttpRequest APIs that nearly all pages rely on.

It relies heavily on the Request and Response Web APIs, and you'll want to familiarize yourself with those first.

Beyond that, using it is simple. You get one object, middleMan, and you can call .addHook() and .removeHook() on it.

middleMan.addHook("https://example.com/your/route/*", {
    requestHandler(request) {
      console.log("snooped on request:", request);
    }
});

Your request handler can return nothing to keep the current Request unchanged.
It can return a different Request object to change what will happen over the network. It can also return a Response object, in which case the network request will be skipped altogether and the Response will be used instead.

Finally, it can return a Promise to either a Request or a Response.
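
As a rough sketch of the return values described above, the two handlers below return a new Request and a canned Response respectively. The "X-Debug" header name and the "/your/stub/*" route are made up for illustration; only the Request, Response and Headers Web APIs and middleMan.addHook() are taken as given.

```javascript
// Returns a new Request: the original plus one extra header.
// "X-Debug" is a hypothetical header name used only for illustration.
function tagRequest(request) {
  const headers = new Headers(request.headers);
  headers.set("X-Debug", "1");
  return new Request(request, { headers });
}

// Returns a Response: per the text above, the network request is then skipped.
function cannedResponse(request) {
  return new Response(JSON.stringify({ stubbed: true, url: request.url }), {
    status: 200,
    headers: { "Content-Type": "application/json" },
  });
}

// `middleMan` comes from the @require'd library. The typeof guard only
// keeps this snippet runnable where the library is not loaded.
if (typeof middleMan !== "undefined") {
  middleMan.addHook("https://example.com/your/route/*", { requestHandler: tagRequest });
  // Hypothetical second route, for illustration only.
  middleMan.addHook("https://example.com/your/stub/*", { requestHandler: cannedResponse });
}
```

Either function could also be declared async, since a handler may return a Promise to a Request or a Response.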

Hooks are called in the order they were registered. Each hook is given the Request object obtained from the previous hook.
When a request hook returns a Response, no further request hooks are called.
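
The ordering rules can be pictured with a small model. This is an illustration of the behavior described above, not MiddleMan's actual implementation; runRequestHooks and the three sample handlers are hypothetical names.

```javascript
// Illustrative model only: this is NOT MiddleMan's actual code.
// It mimics the documented rules for request hooks.
async function runRequestHooks(request, handlers) {
  for (const handler of handlers) {
    const result = await handler(request);            // handlers may return a Promise
    if (result instanceof Response) return result;    // short-circuit: later hooks are skipped
    if (result instanceof Request) request = result;  // hand the new Request to the next hook
    // returning nothing keeps the current Request unchanged
  }
  return request;
}

const seen = [];
const handlers = [
  (req) => { seen.push("first"); return new Request(req.url + "?tagged=1"); },
  (req) => { seen.push("second"); return new Response("cached", { status: 200 }); },
  (req) => { seen.push("third"); }, // never reached: the second hook returned a Response
];

const outcome = runRequestHooks(new Request("https://example.com/your/route/item"), handlers);
outcome.then((result) => console.log(result instanceof Response, seen));
```

In this model the third handler never runs, and the second handler sees the URL produced by the first.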

middleMan.addHook(/https:\/\/example\.org\/foo\/bar\/.*\.json/, {
    async responseHandler(request, response, error) {
        console.log("snooped on response:", response, error);
        const data = await response?.json() ?? { errorMessage: error.message };
        data.topLevel = "IMPORTANT";
        return Response.json(data);
    }
});

In the example above, we used a regular expression for the route rather than a string with wildcards. Use whatever makes sense.
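
A regular-expression route is easy to get subtly wrong (an unescaped slash or dot is enough), so it can help to check it against a few sample URLs before registering it. The sketch below uses only RegExp.prototype.test(); the sample URLs are made up.

```javascript
// A route of the same shape as the one above: any .json file under /foo/bar/.
const jsonRoute = /https:\/\/example\.org\/foo\/bar\/.*\.json/;

// Hypothetical sample URLs, used only to sanity-check the pattern.
const samples = [
  "https://example.org/foo/bar/items.json",   // expected to match
  "https://example.org/foo/bar/a/b.json?x=1", // expected to match (pattern is unanchored)
  "https://example.org/foo/baz/items.json",   // expected not to match
];

const matches = samples.map((url) => jsonRoute.test(url));
console.log(matches); // [ true, true, false ]
```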

Response handlers work similarly to request handlers. They can return nothing to keep the current response, or they can return a new response.
Unlike request handlers, they take 3 parameters: a Request, a Response, and an Error. If a response is passed, you probably won't get an error, and vice-versa. The presence of the associated Request object in the response handler means you only need one handler to capture everything about a given request.

Notably, a response handler may be given an Error, which indicates that the network request failed fundamentally (maybe the client is offline, maybe the server is down, or maybe there's a CORS issue). It can still choose to return a Response for it. In that case the remaining matching response handlers are called to massage that response, and it is passed to the fetch/XHR calling code as if it were the actual response.
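
A minimal sketch of that recovery path follows. The offlineFallback name, the 503 status and the JSON body shape are arbitrary choices for illustration, not something the library prescribes.

```javascript
// Response handler that only steps in when the network request failed.
// Status 503 and the body shape are illustrative choices.
function offlineFallback(request, response, error) {
  if (!error) return; // a real response arrived: keep it unchanged
  const body = JSON.stringify({ errorMessage: error.message, url: request.url });
  return new Response(body, {
    status: 503,
    headers: { "Content-Type": "application/json" },
  });
}

// `middleMan` comes from the @require'd library. The typeof guard only
// keeps this snippet runnable where the library is not loaded.
if (typeof middleMan !== "undefined") {
  middleMan.addHook("https://example.com/your/route/*", { responseHandler: offlineFallback });
}
```

The page's own fetch() or XHR code would then see an ordinary 503 response instead of a network failure.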

All matching response hooks are called in order.

If a request gets redirected, response handlers that match the original requested URL AND the final destination URL will be used.

That's the gist of it. Here's a little bit of middleware that logs all graphql requests and responses, and replaces "dog" with "cat" on Twitter:

middleMan.addHook("https://twitter.com/i/api/graphql/*", {
  async responseHandler(request, response, error) {
    console.log("RESPONSE HANDLER saw", { request, response, error });
    if (error) return;
    const data = await response.json();
    console.log("data=", data);

    function traverse(obj, find, update) {
      if (!obj || typeof obj != 'object') return;
      Object.keys(obj).forEach(k => {
        if (find(obj, k)) obj[k]=update(obj[k]);
        else traverse(obj[k], find, update);
      });
    }

    traverse(data, 
      (obj,key) => ['full_text','text','description'].includes(key) && typeof obj[key] == 'string', 
      s=> s.replace(/\bdog\b/ig, 'cat'));

    return Response.json(data);
  }
});


Finally, middleMan.removeHook() takes the same parameters as .addHook().
The pattern is similar to addEventListener() vs removeEventListener().
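
One way to keep the two calls in sync is to hold on to the route and the hook object and hand back an "undo" function. This sketch assumes, by analogy with removeEventListener(), that removeHook() needs the very same hook object that was registered; the text above only says it takes the same parameters. installSnooper is a hypothetical helper name.

```javascript
// Registers a logging hook and returns a function that removes it again.
// Assumption: removeHook() must receive the same route and hook object.
function installSnooper(middleMan, route) {
  const hook = {
    requestHandler(request) {
      console.log("snooped on request:", request.url);
    },
  };
  middleMan.addHook(route, hook);
  return () => middleMan.removeHook(route, hook);
}

// Usage in a userscript, where `middleMan` is provided by the library:
//   const stopSnooping = installSnooper(middleMan, "https://example.com/your/route/*");
//   ...later...
//   stopSnooping();
```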

If there are relevant network requests firing early in a page lifecycle, don't forget to use @run-at document-start in your script metadata block.
Even then, your userscript extension may not be able to start your script early enough to intercept the first few requests.
When in doubt, log what's happening. (print debugging best debugging.)
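
A metadata block along these lines combines the directives mentioned on this page. The @name and @match values are placeholders; the @require URL is the one given at the top of this page.

```javascript
// ==UserScript==
// @name         My MiddleMan experiment (placeholder name)
// @match        https://example.com/*
// @run-at       document-start
// @require      https://update.greasyfork.org/scripts/472943/1320613/Itsnotlupus%27%20MiddleMan.js
// ==/UserScript==

// `middleMan` is defined by the @require'd library. The typeof guard only
// keeps this snippet from throwing where the library is not loaded.
if (typeof middleMan !== "undefined") {
  middleMan.addHook("https://example.com/your/route/*", {
    requestHandler(request) {
      console.log("snooped on request:", request.url);
    },
  });
}
```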

That's it. Have fun!

(Disclaimer:
This was put together by eyeballing XHR's behaviors, occasionally squinting at a WhatWG spec, and coming up with great reasons why failing so many WPT tests is actually fine.
Generally, anything that's fetch()-based should "just work."
The most obvious functionality gap on the XHR side is that synchronous requests are not supported and will throw an exception if a site tries.
(It turns out there are remarkably few good reasons to use those.)

I'm still squashing behavioral discrepancies with the real XHR, and most (but not all) of my test sites are happy with it now.
Some discrepancies are unlikely to ever go away, particularly when it comes to the exact timing in which things synchronously happen.

Anyway, it's probably fine for what you need, even if it's still a wee bit buggy. Proceed with caution.
)