General URL Cleaner

Vexe

Posted: 2016-08-28

Code fixes for Newegg, eBay and Amazon URLs

Newegg

Need to filter on just /Product/Product.aspx to avoid breaking search pages.

// @include        /^https?://www\.newegg\.c(om|a)/Product/Product\.aspx.*$/
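
As a quick sanity check of the idea, here is a minimal sketch that tests a similar pattern against two made-up URLs. The sample URLs and the name productPage are assumptions for illustration, and the pattern is written as a JavaScript literal, so the slashes are escaped.

```javascript
// Hypothetical pattern: only the single-product page should match.
const productPage = /^https?:\/\/www\.newegg\.c(om|a)\/Product\/Product\.aspx/;

// Made-up sample URLs, not taken from the thread.
const itemUrl = 'http://www.newegg.com/Product/Product.aspx?Item=EXAMPLE';
const listUrl = 'http://www.newegg.com/Product/ProductList.aspx?Submit=ENE';

console.log(productPage.test(itemUrl)); // true
console.log(productPage.test(listUrl)); // false
```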

eBay

Preserve "orig_cvip=true" in the URL parameters so that "view original listing" on a sold or completed item will work.

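function cleanEbayItem(url) {
    // Query string to keep, if any
    var tail;
    if (url.match(/orig_cvip=true/))
    {
        tail = "?orig_cvip=true";
    }
    else
    {
        tail = "";
    }
    // The query string has to come before the #anchor, otherwise it becomes part of the anchor
    return 'http://' + url.split('/')[2] + '/itm' + url.match(/\/[0-9]{11,13}/) + tail + (url.match(/#[A-Za-z]+$/)||'');
}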

Amazon

Preserve in-page anchors (#) in URLs so that clicking the link to jump to reviews works.

function cleanAmazonItemdp(url) {
    // In-page anchor to keep, if any
    var tail;
    if (url.match(/#.*/))
    {
        tail = url.match(/#.*/);
    }
    else
    {
        tail = "";
    }
    return 'https://' + url.split('/')[2] + url.match(/\/dp\/[A-Z0-9]{10}/) + tail;
}

Thanks so much for continuing to keep such a great script updated!!

KnowbodyAuthor

Posted: 2016-08-29

Edited: 2016-08-30

Instead of writing an if-else block like this:

if (thing3) { tail = thing3; } else { tail = ''; }
return thing1+thing2+tail;

You can just add it directly like this:

return thing1+thing2+(thing3||'');

And if thing3 is null, it just uses the blank string.
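
A minimal sketch of that fallback, using made-up example.com URLs and a hypothetical helper name (hashOrEmpty). It relies on String.prototype.match returning null when nothing matches:

```javascript
// match() returns null on no match, so `|| ''` swaps in an empty string.
function hashOrEmpty(url) {
    return url.match(/#[A-Za-z]+$/) || '';
}

const withHash = 'x' + hashOrEmpty('http://example.com/page#reviews');
const withoutHash = 'x' + hashOrEmpty('http://example.com/page');
// Without the fallback, the null gets concatenated as the text "null".
const noFallback = 'x' + 'http://example.com/page'.match(/#[A-Za-z]+$/);

console.log(withHash);    // x#reviews
console.log(withoutHash); // x
console.log(noFallback);  // xnull
```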

But I've modified a few of the URL cleaning functions so that instead of taking in just the href attribute, they take in the entire a (link) object. That makes it easy to get certain bits from the URL, and it gives better performance.

So, you can parse the href attribute, or you can parse other attributes, such as origin, protocol, host, pathname, search, or hash (which each contain a part of the full href string, and are faster to get and faster/easier to parse than the full href string).

For example:

a.href = 'https://www.google.com/search?q=bleep#q=meep'
a.protocol = 'https:'
a.host = 'www.google.com'
a.origin = 'https://www.google.com'
a.pathname = '/search'
a.search = '?q=bleep'
a.hash = '#q=meep'
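
The same parts can be checked outside a page with the standard URL constructor, which exposes the same property names as a link element. This is only a sketch; using URL as a stand-in for the a object is an assumption for illustration.

```javascript
// URL objects expose the same parts as an <a> element's properties.
const a = new URL('https://www.google.com/search?q=bleep#q=meep');

console.log(a.protocol); // https:
console.log(a.host);     // www.google.com
console.log(a.origin);   // https://www.google.com
console.log(a.pathname); // /search
console.log(a.search);   // ?q=bleep
console.log(a.hash);     // #q=meep
```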

See if this works for eBay, on version 2.6+:

function cleanEbayItem(a) {
    return a.origin+'/itm'+a.pathname.match(/\/[0-9]{11,13}/)+(a.search.match(/\?orig_cvip=[^&]+/)||'')+a.hash;
}

Doing it this way means instead of using a regex to parse the href string to get the hash, I can just use a.hash, which gives better performance. And if there is no hash, it's already a blank string, so you don't have to worry about dealing with "null".

Also, how are Newegg searches broken? The script should already be checking for path=='/Product/Product.aspx' (where path is either a.pathname or location.pathname).

Vexe

Posted: 2016-09-09

Edited: 2016-09-09

I'm trying to figure out what was broken in Newegg, and now I can't repro, of course...

I am definitely not a good JS programmer, especially for things like in-browser performance. So I defs appreciate your much-improved version. :)

If Newegg is still borked somewhere I'll make sure to post.

Addendum - the eBay fix didn't work.

Addendum 2 - this works:

function cleanEbayItem(a) {
    return a.origin+'/itm'+a.pathname.match(/\/[0-9]{11,13}/)+'?'+(a.search.match(/orig_cvip=[^&]+/)||'')+a.hash;
}

The version you posted above expects orig_cvip= to be the first querystring parameter, but it doesn't look like that's the case with most eBay URLs. Adding the question mark doesn't appear to affect page anchors, either, so ebay.com/itm/$num?#anchor works.
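
To illustrate the difference, here is a small sketch that runs both patterns on a made-up item URL where orig_cvip is the second parameter. The URL and the variable names are assumptions, and a URL object stands in for the link element.

```javascript
// Stand-in for an <a> element; the item URL is made up.
const link = new URL('http://www.ebay.com/itm/Some-Item/123456789012?hash=item1&orig_cvip=true#rwid');

// Anchored on "?": only matches when orig_cvip is the first parameter.
const anchored = link.search.match(/\?orig_cvip=[^&]+/) || '';
// Not anchored: matches the parameter wherever it appears.
const unanchored = link.search.match(/orig_cvip=[^&]+/) || '';

const itemPath = link.origin + '/itm' + link.pathname.match(/\/[0-9]{11,13}/);
const firstVersion = itemPath + anchored + link.hash;
const secondVersion = itemPath + '?' + unanchored + link.hash;

console.log(firstVersion);  // http://www.ebay.com/itm/123456789012#rwid
console.log(secondVersion); // http://www.ebay.com/itm/123456789012?orig_cvip=true#rwid
```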

KnowbodyAuthor

Posted: 2016-09-09

Well, I wasn't a very good JS programmer when I originally started writing userscripts to clean URLs. I'm still not sure I'd call myself an expert. I'm learning more as I keep this script maintained.

One way to do it is by searching for either "?" or "&" using a character class [?&], and then replacing "&" with "?":

a.search.match(/[?&]orig_cvip=[^&]+/)[0].replace('&','?')

But probably the best way to do it is just replace all the "&" characters with "?" first, and then get the regex match (which works because we only need the one parameter):

(a.search.replace('&','?').match(/\?orig_cvip=[^?]+/)||'')

This also means it doesn't include a "?" in the URL if it doesn't match anything.

Vexe

Posted: 2016-09-20

I figured out where Newegg breaks - it happens on main product pages, like http://www.newegg.com/Components/Store - the product links on the page there are incorrectly cleaned up, so they end up all being http://www.newegg.com/Products/Product.aspxnull.

The same thing happens to ProductList.aspx pages - the url is rewritten in the address bar to the same null address. Rewrites should only be done on http://www.newegg.com/Products/Product.aspx.

KnowbodyAuthor

Posted: 2016-09-21

Edited: 2016-09-21

I didn't see any URLs with null in them, but I changed the way Newegg link cleaning works so it now looks for and deletes parameters from the search portion of the URL, which also works on a lot more of the Newegg URLs.

I also added a hash deleter, which allows hash links to work, but immediately deletes them from the page URL after they're used.
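
For readers wondering what a hash deleter might look like, here is a rough sketch. It is not the script's actual code; the helper name withoutHash and the use of history.replaceState are assumptions about one possible approach.

```javascript
// Returns the same URL with its #fragment removed.
function withoutHash(href) {
    const u = new URL(href);
    u.hash = '';
    return u.href;
}

// In a browser, after the page has jumped to the anchor, something like this
// could tidy the address bar without reloading (assumed usage):
//   history.replaceState(null, '', withoutHash(location.href));

const cleaned = withoutHash('https://example.com/page?x=1#reviews');
console.log(cleaned); // https://example.com/page?x=1
```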

Vexe

Posted: 2017-03-27

Edited: 2017-03-27

a.search.match(/[?&]orig_cvip=[^&]+/)[0].replace('&','?')
But probably the best way to do it is just replace all the "&" characters with "?" first, and then get the regex match (which works because we only need the one parameter):
(a.search.replace('&','?').match(/\?orig_cvip=[^?]+/)||'')
This also means it doesn't include a "?" in the URL if it doesn't match anything.

This doesn't actually work. Not sure why, but I edited the newest version of your script with my previously posted code, and it properly includes the orig_cvip querystring parameter.

return a.origin+'/itm'+a.pathname.match(/\/[0-9]{11,13}/)+'?'+(a.search.match(/orig_cvip=[^&]+/)||'')+a.hash;
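
One likely reason the quoted version fails: when replace() is given a plain string as the pattern, it only replaces the first occurrence, so any "&" after the first one stays in place. The sketch below shows this on a made-up query string where orig_cvip is the third parameter; the sample string and variable names are assumptions.

```javascript
// Made-up query string; orig_cvip is the third parameter.
const search = '?hash=item1&nma=true&orig_cvip=true';
const pattern = /\?orig_cvip=[^?]+/;

// A string pattern replaces only the first "&".
const firstOnly = search.replace('&', '?'); // '?hash=item1?nma=true&orig_cvip=true'
// A regex with the g flag replaces every "&".
const everyAmp = search.replace(/&/g, '?'); // '?hash=item1?nma=true?orig_cvip=true'

const fromFirstOnly = String(firstOnly.match(pattern) || '');
const fromEveryAmp = String(everyAmp.match(pattern) || '');
console.log(JSON.stringify(fromFirstOnly)); // ""
console.log(JSON.stringify(fromEveryAmp));  // "?orig_cvip=true"

// Another option: let URLSearchParams do the parsing.
const value = new URLSearchParams(search).get('orig_cvip');
console.log(value); // true
```

With only two parameters the first-occurrence replace happens to be enough, which may be why the problem did not show up on every URL.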

Also, the Amazon cleaner breaks sponsored links. The code below identifies sponsored links (they include "picassoRedirect" in the URL) and rewrites them as standard links.

function cleanAmazonRedir(url) {
    return (url.replace(/\/gp(.*)%2Fdp%2F/,"/dp/")).replace(/%2(.*)/,'');
}

...

    amazon:function(a) {
        if (amazon.test(a.host)) {
            if (a.pathname.includes('/dp/')) a.href = cleanAmazonItemdp(a);
            else if (a.pathname.includes('/gp/product')) a.href = cleanAmazonItemgp(a);
            // Sponsored links go through a picassoRedirect URL
            else if (a.pathname.includes('picasso')) a.href = cleanAmazonRedir(a.href);
            else if (a.search) a.href = cleanAmazonParams(a.href);
            if (a.pathname.includes('/ref=')) a.pathname = cleanAmazonParams(a.pathname);
        }
    },
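
Here is a quick trace of cleanAmazonRedir on a made-up sponsored link, as a sketch only. The sample URL is an assumption about the general shape of these links (the real product URL percent-encoded inside a url= parameter), not one copied from Amazon.

```javascript
function cleanAmazonRedir(url) {
    // Step 1: collapse everything from "/gp" through the encoded "/dp/" into "/dp/".
    // Step 2: drop everything from the next "%2" (the encoded "/" after the ID) onward.
    return (url.replace(/\/gp(.*)%2Fdp%2F/, "/dp/")).replace(/%2(.*)/, '');
}

// Hypothetical sponsored-link URL.
const sponsored = 'https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf'
    + '?ie=UTF8&url=https%3A%2F%2Fwww.amazon.com%2FSome-Product%2Fdp%2FB00EXAMPLE%2Fref%3Dsr_1_1';

const cleanedLink = cleanAmazonRedir(sponsored);
console.log(cleanedLink); // https://www.amazon.com/dp/B00EXAMPLE
```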
