§
Posted on: 27/05/2014

Extract links on page

Can someone make a script that extracts all links (or ones that follow a filter) from a page?

Something like

https://addons.mozilla.org/en-us/firefox/addon/link-gopher/

Now you might say, "Why not just use that?" Well, because it doesn't always work.

What would you want the output to look like?

The shortest path to a collection of links is: https://developer.mozilla.org/en-US/docs/Web/API/document.links

There may also be URLs launched using scripts associated with various elements in the page. Those are difficult to extract.
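
A minimal sketch of the `document.links` approach follows. The function name `listLinks` and its optional `filter` parameter are made up for illustration; the code only assumes that each item has an `href` property, which holds for the elements in `document.links`.

```javascript
// Hypothetical helper: collect unique hrefs from a link collection,
// optionally keeping only those that match a filter (string or RegExp).
function listLinks(links, filter) {
  const seen = new Set();
  const out = [];
  for (const link of Array.from(links)) {
    const href = link.href;
    if (!href || seen.has(href)) continue;            // skip empty and duplicate hrefs
    if (typeof filter === "string" && !href.includes(filter)) continue;
    // search() ignores the RegExp's global flag and lastIndex, unlike test()
    if (filter instanceof RegExp && href.search(filter) === -1) continue;
    seen.add(href);
    out.push(href);
  }
  return out;
}

// In a browser console (not run here): listLinks(document.links, ".pdf").join("\n")
```

This only sees `<a>` and `<area>` elements that carry an `href`, so script-launched URLs like the ones mentioned above are still out of reach.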

§
Posted on: 28/05/2014
Edited on: 28/05/2014

Well, just a blank page with the links in plain text would be fine with me (like the extension itself does).

Basically, if I can see the links on the page, I want to be able to extract them.

Hmm, I found an old bookmarklet I created 8 years ago. Maybe this will tide you over?

To install a bookmarklet, copy the code, then right-click your Bookmarks Toolbar and choose New Bookmark, paste the code in Location, and give it a name like ListLinks.

javascript:function%20fwbr(str){return%20str.replace(/([&\?\*=])/g,"$1<wbr>");};loc=fwbr(location.href);x=document.getElementsByTagName("A");w=document.getElementsByTagName("AREA");y=window.open();y.document.write("<html><head><title>Links!</title><style>p{line-height:1.2em}%20a{text-decoration:none;border-bottom:1px%20dotted%20blue}%20img{border:none}</style></head>\n<body><h3>Anchor%20(&lt;A&gt;)%20Links%20from<br>"+loc+"</h3>\n");for(n=0;n<x.length;n++){if(x[n].href!=""){y.document.write("<p><a%20href=\""+x[n].href+"\">");if(x[n].textContent.replace(/\s+/,"").length<1){if(x[n].childNodes.length>0){for(j=0;j<x[n].childNodes.length;j++){if(x[n].childNodes[j].nodeName=="IMG"){y.document.write("<img%20src=\""+x[n].childNodes[j].src+"\"%20alt=\""+x[n].childNodes[j].alt+"\">");if(x[n].childNodes[j].alt!="")y.document.write("<br><em>Alt%20Text:</em>%20"+x[n].childNodes[j].alt);if(x[n].childNodes[j].title!="")y.document.write("<br><em>Title%20Tip:</em>%20"+x[n].childNodes[j].title);y.document.write("</a>");%20break;}else%20if(j+1==x[n].childNodes.length)%20y.document.write("[Background%20Image%20or%20Color%20Only]</a>");}}else%20y.document.write("[Background%20Image%20or%20Color%20Only]</a>");}else%20y.document.write(x[n].textContent.replace(/^\s+/,"")+"</a>");y.document.write("<br>\n"+fwbr(x[n].href));%20if(x[n].onclick){y.document.write("<br>\n(onclick%20event%20handler%20not%20shown)");};y.document.write("</p>\n");}}if(w.length>0){y.document.write("<h3>Image%20Map%20(&lt;AREA&gt;)%20Links%20from<br>"+loc+"</h3>\n");for(n=0;n<w.length;n++){if(w[n].href!="")%20y.document.write("<p><a%20href=\""+w[n].href+"\">"+fwbr(w[n].href)+"</a></p>");}}%20y.document.write("</body></html>");y.document.close();void%200;

As for doing a userscript, what kind of user interface are you looking for? It's not so convenient to use the "monkey menu" and I am not aware of a way to integrate with the right-click context menu. That leaves the option of adding a button into the page or defining a keyboard shortcut (with the potential for a conflict if it's not unique).
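
For the keyboard shortcut option, a rough sketch might look like the following. The Ctrl+Alt+L combination and the function name `isListLinksShortcut` are arbitrary picks for illustration; whether that combination clashes with the browser, the operating system, or the site would need checking.

```javascript
// Hypothetical shortcut test: true only for Ctrl+Alt+L with no Shift/Meta held.
function isListLinksShortcut(event) {
  return Boolean(
    event.ctrlKey && event.altKey && !event.shiftKey && !event.metaKey &&
    String(event.key).toLowerCase() === "l"
  );
}

// Wire it up only where a DOM exists (e.g. inside a userscript).
if (typeof document !== "undefined") {
  document.addEventListener("keydown", function (event) {
    if (!isListLinksShortcut(event)) return;
    event.preventDefault();
    const hrefs = Array.from(document.links, function (a) { return a.href; });
    const win = window.open();                        // may be stopped by a popup blocker
    if (win) {
      win.document.write("<pre></pre>");
      // textContent avoids having to HTML-escape the URLs
      win.document.querySelector("pre").textContent = hrefs.join("\n");
      win.document.close();
    }
  });
}
```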

§
Posted on: 29/05/2014

Well, in the extension there's a button on the add-on bar that you right-click, and it gives you the option to extract all links or to extract them by a filter.

Is it possible to do that with a Greasemonkey script?

§
Posted on: 29/05/2014
Is it possible to do that with a Greasemonkey script?

It is possible, but not necessary (IMO;)

I do not think user scripts can create their own toolbar buttons. On the Greasemonkey button drop-down, you will find "User Script Commands" if a script is enabled for the current page. So it's buried a bit. Still, if you do not need it very often, that might be convenient enough.
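
A bare-bones sketch of the "User Script Commands" route is below. `GM_registerMenuCommand` is the Greasemonkey call that adds entries to that menu; the script name, the `@include` pattern, the captions, and the helper `filterHrefs` are placeholders made up for this example.

```javascript
// ==UserScript==
// @name        List Links (sketch)
// @include     *
// @grant       GM_registerMenuCommand
// ==/UserScript==

// Hypothetical helper: keep hrefs containing `needle` (everything if needle is empty).
function filterHrefs(hrefs, needle) {
  if (!needle) return hrefs.slice();
  return hrefs.filter(function (href) { return href.indexOf(needle) !== -1; });
}

function showLinks(needle) {
  const hrefs = Array.from(document.links, function (a) { return a.href; });
  // alert() keeps the sketch short; a real script would likely open a new page.
  alert(filterHrefs(hrefs, needle).join("\n") || "No links found");
}

// Register the commands only when running under a userscript manager.
if (typeof GM_registerMenuCommand === "function") {
  GM_registerMenuCommand("Extract all links", function () { showLinks(""); });
  GM_registerMenuCommand("Extract links by filter", function () {
    showLinks(prompt("Show links containing:") || "");
  });
}
```

The two commands mirror the "all links" and "by filter" choices of the extension, just reached through the Greasemonkey drop-down instead of a toolbar button.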

By the way, this extension was updated on May 19th to version 1.3.3, so if you have an older version, you might want to test the current one.

§
Posted on: 01/06/2014
Edited on: 01/06/2014

I tested the new version; same problems.

I wish it worked properly, because it has the exact functionality I want.

§
Posted on: 01/06/2014
I tested the new version; same problems.

I wish it worked properly, because it has the exact functionality I want.

You did not say what problems you are having. You might be better off contacting the add-on's author and opening an issue about them.

§
Posted on: 06/06/2014

The problem is that it doesn't always extract the links.

One instance is when the links are inside a [code] tag on a vBulletin forum; it just completely ignores them.
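
One likely explanation, stated here as an assumption, is that URLs inside a [code] block are rendered as plain text rather than as `<a>` elements, so anything that only walks the page's anchors will not see them. A rough sketch of a text-based fallback follows; the function name and the regular expression are invented for illustration, and a simple `http`/`https` pattern like this will miss or over-match some URLs.

```javascript
// Hypothetical fallback: pull http(s) URLs out of plain text.
function extractUrlsFromText(text) {
  const matches = text.match(/https?:\/\/[^\s<>"']+/g) || [];
  // Trim punctuation that commonly trails a URL in running text.
  const cleaned = matches.map(function (url) {
    return url.replace(/[.,;:!?)\]]+$/, "");
  });
  return Array.from(new Set(cleaned));                // de-duplicate, keep first-seen order
}

// In a browser console (not run here): extractUrlsFromText(document.body.textContent)
```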
