Feature request: Extract links as the user scrolls down (general-purpose, any site)

§

Posted on: 01/11/2020

Edited on: 01/11/2020

I'm not good with advanced programming (I only know the basics of JS), but could someone make a general-purpose version of this: https://greasyfork.org/en/discussions/requests/57590-extract-links-to-tweet-and-media-on-twitter-as-you-scroll-down

It should extract not only links to external sites, but also links to images and any external files.

This would also be very useful for both infinite scrolling and pagination, because console logs can persist as you navigate to another page. It would certainly help with saving pages to the Wayback Machine (WBM). I really hate having to extract links one page at a time, and would like it to auto-extract as I go from page to page. It should run on page load and again when scrolling, to cover JS-loaded content (otherwise it only grabs links present in the initial HTML, not those generated by code).

I wanted to automate saving GitHub pages. Thanks to the URL format, I don't have to visit each repository page just to get the download link:

GitHub repository URL format:
	https://github.com/Username/RepositoryName
	
	To download the files:
		https://github.com/Username/RepositoryName/archive/master.zip
		
			^That redirects to:
				https://codeload.github.com/Username/RepositoryName/zip/master.zip
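
The URL pattern above can be applied mechanically. Here is a minimal sketch, assuming the branch is named master as in the example; the helper name toArchiveUrl is hypothetical and not part of any GitHub API:

```javascript
// Hypothetical helper: builds the archive download URL described above
// from a repository URL. The branch name is an assumption (default "master").
function toArchiveUrl(repoUrl, branch = "master") {
  const url = new URL(repoUrl);
  // Keep only the first two path segments: /Username/RepositoryName
  const [user, repo] = url.pathname.split("/").filter(Boolean);
  if (url.hostname !== "github.com" || !user || !repo) {
    return null; // not a repository URL in the format shown above
  }
  return `https://github.com/${user}/${repo}/archive/${branch}.zip`;
}

console.log(toArchiveUrl("https://github.com/Username/RepositoryName"));
```

Repositories whose default branch has a different name would need that name passed as the second argument.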

hacker09

§

Posted on: 02/11/2020

Edited on: 02/11/2020

I understand what you mean, but that would take a good amount of time to code, and I think there's a better way.

1 Make a button that runs document.querySelectorAll("a[href*='http']") and reads the .href of each matched element.
2 Add each URL to an array, or handle it inside a for loop, then display all the URLs from that page in the browser console or somewhere else on the page.
3 Even if the page auto-loads more content, just click the button again to pick up all the newly loaded links.

like this:
var links = document.querySelectorAll("a[href*='http']"); // query the DOM once
for (var i = links.length; i--;) {
	console.log(links[i].href);
}

or add the links to an array

var linkResults = []; // Creates a new empty array
var links = document.querySelectorAll("a[href*='http']");
for (var i = 0; i < links.length; i++) { // Loops over every matched link
	linkResults.push(links[i].href); // Adds the link's URL to the array
}

console.log(linkResults.join('\n')); // Displays the links, one per line

Scripter113

§

Posted on: 02/11/2020

I prefer a semi-automated method that doesn't require pressing a button: links should be extracted automatically as I surf.

hacker09

§

Posted on: 02/11/2020

Then put the code I wrote inside a function and call that function as you scroll the page:
window.addEventListener('scroll',FunctionName);
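
The scroll event can fire many times during a single scroll gesture, so the handler may run far more often than needed. One common way to limit that is to throttle the handler. This is a rough sketch only; the throttle helper and the 500 ms interval are assumptions, and FunctionName is the placeholder from the line above:

```javascript
// Wraps fn so it runs at most once per waitMs milliseconds.
// "now" is injectable only to make the sketch easy to test.
function throttle(fn, waitMs, now = Date.now) {
  let last = -Infinity; // time of the last call that was let through
  return function (...args) {
    const t = now();
    if (t - last >= waitMs) {
      last = t;
      fn.apply(this, args);
    }
  };
}

// Browser-only wiring; skipped when there is no window or no FunctionName.
if (typeof window !== "undefined" && typeof FunctionName === "function") {
  window.addEventListener("scroll", throttle(FunctionName, 500));
}
```

With this version, calls that arrive inside the interval are dropped, so content loaded right after the last accepted call is only picked up on the next scroll event.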

Scripter113

§

Posted on: 02/11/2020

Edited on: 02/11/2020

I chose the first one, this wasn't so bad:

window.addEventListener('scroll', ExtractLinks);

function ExtractLinks() {
	var links = document.querySelectorAll("a[href*='http']"); // query once per call
	for (var i = links.length; i--;) {
		console.log(links[i].href);
	}
}

Wait a minute, it is only getting some of the links. It does not work with relative paths (a href="../"). Let me edit that...

Ah, there we go:

window.addEventListener('scroll', ExtractLinks);

function ExtractLinks() {
	var links = document.querySelectorAll("a[href]"); // query once per call
	for (var i = links.length; i--;) {
		console.log(links[i].href);
	}
}

Ack, that is spitting out duplicates, but not all the links.
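
The duplicates are what this setup would be expected to produce: every scroll event runs the whole loop again, so each link already on the page is logged once per event. A minimal sketch of one way around that, assuming it is acceptable to remember logged URLs in a Set for the lifetime of the page; the name createLinkExtractor is hypothetical:

```javascript
// Hypothetical factory: returns a function that logs each URL only once,
// no matter how many times it is called.
function createLinkExtractor(log = console.log) {
  const seen = new Set(); // URLs that were already logged
  return function extract(elements) {
    for (const el of elements) {
      const url = el.href;
      if (url && !seen.has(url)) {
        seen.add(url);
        log(url);
      }
    }
  };
}

// Browser-only wiring.
if (typeof document !== "undefined") {
  const extract = createLinkExtractor();
  const run = () => extract(document.querySelectorAll("a[href]"));
  run(); // links present on page load
  window.addEventListener("scroll", run); // links added while scrolling
}
```

The Set is reset on every page load, so a link that appears on two different pages would still be logged once per page.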

Scripter113

§

Posted on: 02/11/2020

@hacker09 I don't know how selectors work. I'm testing this on GitHub's search page, and it extracts some links but ignores others:

window.addEventListener('scroll', ExtractLinks);

function ExtractLinks() {
	var httpLinks = document.querySelectorAll("a[href*='http']"); // query once per call
	for (var i = httpLinks.length; i--;) {
		console.log(httpLinks[i].href);
	}
	var slashLinks = document.querySelectorAll("a[href*='/']"); // query once per call
	for (var i = slashLinks.length; i--;) {
		console.log(slashLinks[i].href);
	}
}

I suck at programming these types of things
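
On how these selectors behave: a[href*='http'] and a[href*='/'] test the attribute text as it is written in the HTML, so a link is skipped by both if its href contains neither "http" nor "/", and an absolute link (which contains both) is logged by both loops. The selector a[href] matches every anchor that has an href attribute, and the .href property then gives the resolved absolute URL. The sketch below shows that kind of resolution using the standard URL constructor; the base URL and sample values are made up for illustration:

```javascript
// Resolves href attribute values against a page URL, similar to what the
// .href property of an <a> element reports. The base URL is illustrative.
function resolveHrefs(rawHrefs, baseUrl) {
  return rawHrefs.map((raw) => new URL(raw, baseUrl).href);
}

const base = "https://github.com/search/advanced";
const rawHrefs = ["../", "/Username/RepositoryName", "https://example.com/x"];
console.log(resolveHrefs(rawHrefs, base));
```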

Scripter113

§

Posted on: 02/11/2020

Edited on: 02/11/2020

Ok, I think this works:

window.addEventListener('scroll', ExtractLinks);

function ExtractLinks() {
	var links = document.querySelectorAll("a"); // query once per call
	for (var i = links.length; i--;) {
		console.log(links[i].href);
	}
	var images = document.querySelectorAll("img"); // query once per call
	for (var i = images.length; i--;) {
		console.log(images[i].src);
	}
}

hacker09

§

Posted on: 02/11/2020

Edited on: 02/11/2020

If you want this script to work only on GitHub, you can add .match to the values logged in these lines:
console.log(document.querySelectorAll("a")[i].href);
console.log(document.querySelectorAll("img")[i].src);

But you need to write a regex for links like:
https://github.com/Username/RepositoryName
https://github.com/Username1/RepositoryName2
https://github.com/hacker09/Scripts123

If you don't know regex, ask here https://webchat.freenode.net/#regex

Then it's pretty easy to change links in the GitHub repository URL format:
https://github.com/Username/RepositoryName

into download links: https://github.com/Username/RepositoryName/archive/master.zip

^That redirects to links like: https://codeload.github.com/Username/RepositoryName/zip/master.zip
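
A possible starting point for the regex step, assuming only URLs of exactly the form https://github.com/Username/RepositoryName should match. The pattern and the name parseRepoUrl are hypothetical, and the pattern would also accept other two-segment GitHub paths that are not repositories:

```javascript
// Assumed pattern: exactly two path segments after github.com,
// with an optional trailing slash and no query string or fragment.
const REPO_URL = /^https:\/\/github\.com\/([^\/?#]+)\/([^\/?#]+)\/?$/;

// Returns the user and repository names, or null if href is not in that form.
function parseRepoUrl(href) {
  const m = REPO_URL.exec(href);
  return m ? { user: m[1], repo: m[2] } : null;
}

const samples = [
  "https://github.com/hacker09/Scripts123",
  "https://github.com/hacker09/Scripts123/issues",
  "https://example.com/a/b",
];
// Keep only the URLs that look like repository links.
console.log(samples.filter((href) => parseRepoUrl(href) !== null));
```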

Scripter113

§

Posted on: 02/11/2020

Crud, when tested on tumblr, the images aren't logged. Test: https://kujoushino.tumblr.com/post/619891748515266560

hacker09

§

Posted on: 02/11/2020

Edited on: 02/11/2020

deleted

Scripter113

§

Posted on: 02/11/2020

I said tumblr, not twitter.

hacker09

§

Posted on: 02/11/2020

Edited on: 02/11/2020

deleted

hacker09

§

Posted on: 02/11/2020

Edited on: 02/11/2020

Add this to your function. Note that it returns the images in reverse order; I'm not sure why.
var photosetImages = document.querySelector("div[class*='photoset'] > iframe").contentDocument.querySelectorAll("img"); // query once
for (var i = photosetImages.length; i--;) {
	console.log(photosetImages[i].src);
}
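
The reverse order is consistent with the loop itself: a loop of the form for (var i = n; i--;) starts at the last index and counts down to 0. A sketch that walks the images in document order instead, and also guards against the iframe not being found (contentDocument is null for a cross-origin iframe); the helper name imageSrcsInOrder is hypothetical:

```javascript
// Hypothetical helper: returns image URLs in document order, skipping empty ones.
function imageSrcsInOrder(images) {
  return Array.from(images, (img) => img.src).filter(Boolean);
}

// Browser-only wiring, using the selector from the post above.
if (typeof document !== "undefined") {
  const frame = document.querySelector("div[class*='photoset'] > iframe");
  // contentDocument is null when the iframe is cross-origin; frame is null if missing.
  const doc = frame && frame.contentDocument;
  if (doc) {
    imageSrcsInOrder(doc.querySelectorAll("img")).forEach((src) => console.log(src));
  }
}
```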

Scripter113

§

Posted on: 08/02/2022

Edited on: 08/02/2022

Uhh, as of today, it is no longer logging to the console.

EDIT: Never mind, I was using the mobile Twitter URL.

Greasy Fork
