Help With RegEx

stuzbot

Posted: 26/03/2020

I'm trying to knock together a script which will block certain sites from appearing on Hacker News.

I found a really basic userscript which used a RegEx with a hardwired list of domains [apologies for the screengrab but it's just one ridiculously long line and will completely knacker the page layout if I paste it in a code block!]:

It works, but it's obviously not ideal. It's pretty nasty to add or remove domains, especially given the whole RegEx part is on a single line of code as long as your arm.

I'm trying to come up with something slightly more user friendly by storing the domains in an array and then looping through them, applying the RegEx to each one. I should say at this point that my knowledge of JavaScript is practically zero. Combine that with the fact that RegExes are involved, which melt most people's brains at the best of times, and you'll understand that I'm floundering here.

This is what I've come up with so far. I've snipped the list of domains for brevity:

// ==UserScript==
// @name            Hacker Choose
// @description     Get rid of wanky links from Hacker News
// @license         MIT
// @match         https://news.ycombinator.com/*
// ==/UserScript==

console.log('Hacker Poos!');

var domains = [
"arstechnica.com",
"bbc.com",
"washingtonpost.com",
"wired.co.uk",
"wired.com",
"wsj.com",
];


domains.forEach(iter);

function iter(value, index, array)
{
var reggo =`/<tr class="spacer"(?:(?!<tr)[\\s\\S])+<tr(?:(?!<tr)[\\s\\S])+<a\ href="from\\?site=` + value + `">.+?<tr[\\s\\S]+?<\\/tr>/`;

document.body.innerHTML = document.body.innerHTML.replace(new RegExp(reggo, "g"), "");

console.log(reggo);
}

As you can see, I've pretty much just lifted the RegEx from the script I found and tried to get it to work with passing in each domain in turn in the loop. But it's not blocking any of the domains on my list. The output of the console.log(reggo) line is as follows:

/<tr class="spacer"(?:(?!<tr)[\s\S])+<tr(?:(?!<tr)[\s\S])+<a href="from\?site=arstechnica.com">.+?<tr[\s\S]+?<\/tr>/

/<tr class="spacer"(?:(?!<tr)[\s\S])+<tr(?:(?!<tr)[\s\S])+<a href="from\?site=bbc.com">.+?<tr[\s\S]+?<\/tr>/

/<tr class="spacer"(?:(?!<tr)[\s\S])+<tr(?:(?!<tr)[\s\S])+<a href="from\?site=bloomberg.com">.+?<tr[\s\S]+?<\/tr>/

which, as far as I can see, is identical to the RegEx in the original script I was working from:

/<tr class="spacer"(?:(?!<tr)[\s\S])+<tr(?:(?!<tr)[\s\S])+<a href="from\?site=(<snip>)">.+?<tr[\s\S]+?<\/tr>/

I can only assume that JavaScript's RegEx parser is interpreting it differently to what my browser is outputting to the console. But, to be honest, I'm pretty lost here, so any pointers would be much appreciated.

PS: I've tried backslashing the dots in the domain names array [as per the original script I'm working from], but it made no difference.

woxxom (Mod)

Posted: 27/03/2020

Edited: 27/03/2020

The simplest and fastest approach is to work with the DOM directly. Replacing innerHTML is usually slow and always inefficient: the browser has to recreate the entire DOM subtree, and any event listeners attached from JavaScript are lost (not a problem here, though).

const domains = new Set([
  'arstechnica.com',
  'bbc.com',
  'washingtonpost.com',
  'wired.co.uk',
  'wired.com',
  'wsj.com',
]);

for (const elSite of document.querySelectorAll('.sitestr')) {
  if (domains.has(elSite.innerText)) {
    const tr = elSite.closest('tr');
    for (const el of [tr.previousElementSibling, tr, tr.nextElementSibling]) {
      if (el) el.remove();
    }
  }
}

While the above code works, it may briefly show the removed items while the page loads, so a more robust solution is to run the script at document-start and use a MutationObserver:

// ==UserScript==
// @name        filter ycombinator.com
// @match       https://news.ycombinator.com/
// @grant       none
// @run-at      document-start
// ==/UserScript==

const domains = new Set([
  'arstechnica.com',
  'bbc.com',
  'washingtonpost.com',
  'wired.co.uk',
  'wired.com',
  'wsj.com',
]);

const mo = new MutationObserver(onMutation);
onMutation([{
  addedNodes: [document.documentElement],
}]);

function onMutation(mutations) {
  const toRemove = [];
  for (const {addedNodes} of mutations) {
    for (const n of addedNodes) {
      if (n.className === 'sitestr') {
        if (domains.has(n.innerText)) toRemove.push(n);
      } else if (n.firstElementChild) {
        for (const el of n.getElementsByClassName('sitestr')) {
          if (domains.has(el.innerText)) toRemove.push(el);
        }
      }
    }
  }
  mo.disconnect();
  for (const n of toRemove) {
    const tr = n.closest('tr');
    for (const el of [tr.previousElementSibling, tr, tr.nextElementSibling]) {
      if (el) el.remove();
    }
  }
  mo.observe(document, {subtree: true, childList: true});
}
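
As a side note on why the original attempt matched nothing: one likely cause is that the pattern string passed to `new RegExp(...)` starts and ends with `/`. Those slashes only delimit a regex literal; inside a string they become literal characters that the page HTML would have to contain at exactly those positions. The sketch below illustrates this on a simplified pattern (not the full one from the question) and also shows one way to escape the dots and join the array into a single alternation. The names `escapeRegExp`, `buildSitePattern` and `sample` are made up for this illustration and are not part of either script above.

```javascript
// Hypothetical helper names; this is an illustration, not a drop-in script.
const domains = ['arstechnica.com', 'bbc.com', 'wired.co.uk'];

// Escape characters that are special inside a regular expression,
// so the dot in "bbc.com" only matches a literal dot.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one pattern for the whole list. There are no leading or trailing
// slashes here: they belong to regex literal syntax, not to the pattern.
function buildSitePattern(list) {
  return new RegExp('from\\?site=(?:' + list.map(escapeRegExp).join('|') + ')"');
}

// Simplified stand-in for a link as it might appear in the page HTML.
const sample = '<a href="from?site=bbc.com"><span class="sitestr">bbc.com</span></a>';

// Same mistake as in the question: the slashes end up inside the pattern.
const withSlashes = new RegExp('/from\\?site=bbc\\.com"/');
const pattern = buildSitePattern(domains);

console.log(withSlashes.test(sample));                     // false
console.log(pattern.test(sample));                         // true
console.log(pattern.test('<a href="from?site=bbcxcom">')); // false
```

Even with the slashes removed, the string returned by innerHTML may not match the page source character for character, so the DOM approach above is still the safer route.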
