Is it possible to take an image on a webpage and convert it to base64 from the copy already loaded in the browser, without sending another request?

§
Posted: 2024-09-23
Edited: 2024-09-23

I was wondering whether this is possible, because some sites detect hotlinking. WordPress and Tumblr sites serve alternative content such as an HTML wrapper (when you refresh, you get an HTML page containing the image instead of just the image, with both the HTML and the image on the same URL), and others simply return a 403, as pixiv does. In Firefox, with the devtools open, you can right-click the img tag -> Copy -> "Image Data-URL" to copy the image itself, encoded as a data URI. A similar thing can be done in Chrome using the Network tab: Network tab -> refresh the page -> find the item that is an image -> Preview -> right-click it -> "Copy image as data URI".

I was thinking of a userscript that can get the image data (in the "data:image/png;base64" format) that is already loaded in the browser instead of sending an additional request (as if you had visited the image URL directly, triggering the hotlink protection in the process).
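A minimal sketch of one partial way to do this from a userscript without a new request: draw the already-decoded image onto a canvas and read it back with `toDataURL`. The function name `loadedImageToDataUrl` and the `doc` parameter are hypothetical, added for illustration. Two caveats apply. The result is a PNG re-encoding of the decoded pixels, not the original file bytes. And for a cross-origin image served without CORS headers the canvas becomes tainted and `toDataURL` throws a SecurityError, which is likely the case for many hotlink-protected images.

```javascript
// Hypothetical helper: convert an <img> that has already finished loading
// into a PNG data URL by drawing it onto an off-screen canvas.
// `doc` defaults to the page's document; it is a parameter only so the
// function can be exercised outside a browser.
function loadedImageToDataUrl(img, doc = document) {
  if (!img.complete || img.naturalWidth === 0) {
    throw new Error("image has not finished loading (or failed to load)");
  }
  const canvas = doc.createElement("canvas");
  canvas.width = img.naturalWidth;
  canvas.height = img.naturalHeight;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(img, 0, 0);
  // Throws a SecurityError if the canvas was tainted by a cross-origin image.
  return canvas.toDataURL("image/png");
}
```

In a page, this might be called as `loadedImageToDataUrl(document.querySelector("img"))`, wrapped in a try/catch to handle the tainted-canvas case.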

NotYou (moderator)
§
Posted: 2024-09-23

No. It's not possible to get an image that was already loaded by the browser. You have to send another request. What you can get is the URL of a loaded image, via PerformanceObserver.
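A rough sketch of the PerformanceObserver approach mentioned above, assuming only images requested by `<img>` elements are of interest. The names `isImgResource` and `watchImageUrls` are hypothetical. It reports URLs only; resource timing entries do not expose the response body.

```javascript
// Hypothetical helper: true for resource-timing entries requested by an <img> element.
function isImgResource(entry) {
  return entry.entryType === "resource" && entry.initiatorType === "img";
}

// Hypothetical helper: call onUrl(url) for every image resource the page loads.
function watchImageUrls(onUrl) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      // entry.name holds the URL of the requested resource.
      if (isImgResource(entry)) onUrl(entry.name);
    }
  });
  // buffered: true also delivers entries recorded before the observer existed.
  observer.observe({ type: "resource", buffered: true });
  return observer;
}
```

A userscript could call `watchImageUrls((url) => console.log(url))` and later stop with `observer.disconnect()`.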

§
Posted: 2024-09-23

OK, I was just wondering. This isn't really suitable for web-scraping storage anyway: according to Wikipedia, a base64-encoded image is 33 to 37% larger than the original, and images are already HUGE compared to plain text.
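The 33 to 37% figure can be checked with a little arithmetic: base64 emits 4 output characters for every 3 input bytes (about 33% overhead), and MIME-style output that wraps lines at 76 characters with a CRLF adds roughly 4% more. A data URI has no line breaks, so it sits at the 33% end plus a short prefix. The function names and the 300,000-byte example size below are illustrative assumptions.

```javascript
// Length in characters of the padded base64 encoding of `byteCount` bytes.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Same, but with MIME-style wrapping: a 2-byte CRLF after every line of up to
// `lineLength` characters (assuming the final line is also terminated).
function mimeBase64Length(byteCount, lineLength = 76) {
  const chars = base64Length(byteCount);
  return chars + 2 * Math.ceil(chars / lineLength);
}

function overheadPercent(encodedSize, originalSize) {
  return ((encodedSize - originalSize) / originalSize) * 100;
}

const originalBytes = 300000; // assumed example image size
console.log(overheadPercent(base64Length(originalBytes), originalBytes).toFixed(1));     // 33.3
console.log(overheadPercent(mimeBase64Length(originalBytes), originalBytes).toFixed(1)); // 36.8
```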

Nowadays more and more sites are becoming less tolerant of bots scraping page content. A bot simply loads the page and extracts data, which may consume as much bandwidth as a normal user loading the page, or more, and web scraping has many legitimate uses, such as search indexing. Old forums now put original-resolution image attachments behind a login wall (examples: nextgenupdate.com, gmtnation.com), and some sites outright 403 bots even when the page is just text (like atwiki.jp). I'd assume this is due to scraping for AI training. I've seen news about how CAPTCHAs are effectively dead (some sites cranked up the difficulty in hopes of properly "telling computers and humans apart", but that fails too). It is a "stuck between a rock and a hard place" situation when setting up robots.txt rules: you want a real audience, but rules that are too aggressive leave you with none.

§
Posted: 2024-09-23
Edited: 2024-09-23

I think that's possible, if you just want to grab the src of each <img>, convert it to base64, and store it.
https://greasyfork.org/en/scripts/6861-greasyfork-script-icon does almost that.

You would have to run at document start, and then it comes down to which is faster: the browser loading the site's images, or your script finding the links and replacing them with the converted base64 data. A few img network requests may end up being sent anyway.
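A minimal sketch of the "grab the src and convert it" idea, under some assumptions: `imageUrlToDataUrl` and its `fetchFn` parameter are hypothetical names, and this is not necessarily how the linked script works. The `cache: "force-cache"` option only asks the browser to prefer a cached copy, so a network request may still be sent if the image is not in the HTTP cache. From page context, a cross-origin image also needs CORS headers for `fetch` to succeed; otherwise a userscript manager's own request API would be needed.

```javascript
// Hypothetical helper: fetch an image URL and return it as a base64 data URL.
// `fetchFn` defaults to the global fetch; it is a parameter only so the
// function can be exercised with a stand-in.
async function imageUrlToDataUrl(url, fetchFn = fetch) {
  // Prefer the HTTP cache; falls back to the network when nothing is cached.
  const response = await fetchFn(url, { cache: "force-cache" });
  if (!response.ok) {
    throw new Error("request failed with status " + response.status);
  }
  const mime = response.headers.get("content-type") || "application/octet-stream";
  const bytes = new Uint8Array(await response.arrayBuffer());
  // btoa expects a "binary string" with one character per byte.
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return "data:" + mime + ";base64," + btoa(binary);
}
```

For example, `imageUrlToDataUrl(img.src).then((dataUrl) => { img.src = dataUrl; })` would swap an image's link for its embedded data, assuming the request is allowed.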
