
How to detect when an infinite-scrolling website has finished loading everything?

§
Posted: 08/12/2020
Edited: 09/12/2020

Let's use these two users as examples:
https://myanimelist.net/animelist/deg?status=2

https://myanimelist.net/animelist/abystoma2?status=2

The list of the user deg is really small and loads fast, but the list of the user abystoma2 takes a lot of time to load; every time I scroll down, the next page of content takes longer and longer to load. This means it's impossible to use something like setTimeout/setInterval to run the scroll-down function after a fixed delay, because a timing that works for deg's list wouldn't work for abystoma2's.

At first I was fetching the user profile, because it shows how many items are in their completed list.
So in our example I was fetching these links:
https://myanimelist.net/profile/deg
https://myanimelist.net/profile/abystoma2

var TotalCompletedAnimes; //Will hold the total number of completed animes shown on the profile page
async function getVariables() //Gets the needed variables from the user profile page
{ //Starts the function
  const response = await fetch('https://myanimelist.net/profile/deg'); //Fetches the profile page
  const html = await response.text(); //Gets the fetch response as text
  const newDocument = new DOMParser().parseFromString(html, 'text/html'); //Parses the fetch response into a document
  TotalCompletedAnimes = newDocument.querySelectorAll(".di-ib.fl-r.lh10")[1].textContent; //Reads the completed count from the profile stats
} //Finishes the function
getVariables();



But I figured out that my script doesn't always work, because https://myanimelist.net only updates the actual TotalCompletedAnimes value on the profile page occasionally, not every second. This is a problem for my script, because it runs an interval every 0 ms and checks whether the number of currently loaded items equals TotalCompletedAnimes: if (TotalCompletedEntries != TotalCompletedAnimes)

This is what I did

console.log('Scrolling Down. Please Wait!'); //Shows a message in the console for dev purposes
var interval = setInterval(function() { //Starts the function that automatically "presses the End key"
  var TotalCompletedEntries = document.querySelectorAll("td.data.number").length; //Counts the items currently loaded in the page
  if (TotalCompletedEntries != TotalCompletedAnimes) //Loose comparison because TotalCompletedAnimes was read as text; true while the list is still incomplete
  { //Starts the if condition
    window.scrollTo(0, document.body.scrollHeight); //Scrolls the page down so the next chunk loads
  } //Finishes the if condition
  else //When the whole list is loaded
  { //Starts the else condition
    console.log('Full List Loaded! Stopping Scrolling Down Now!'); //Shows a message in the console for dev purposes
    clearInterval(interval); //Stops the timer that scrolls the page down every 0 ms
    scrape(); //Runs the scraping function
  } //Finishes the else condition
}, 0); //Finishes the interval that runs every 0 ms


But now I've just figured out another way to get the most up-to-date number; the TotalCompletedEntries value is now always 100% accurate.
The problem is that I need to use an async fetch in a loop to get the accurate TotalCompletedEntries value. For the user abystoma2 this takes some time, but for the user deg the process is really fast.
This is what I did


while (true) { //Starts the loop that gets the total number of entries in the user's completed list
  console.log('This user has more than 300 Completed Entries\nGetting the Total Completed Entries Number...'); //Shows a message in the console for dev purposes
  const html = await (await fetch('https://myanimelist.net/' + type + 'list/' + username + '/load.json?status=2&offset=' + nextpagenum)).json(); //Fetches the next chunk of the user's completed list
  nextpagenum += increaseby; //Increases the offset for the next page
  totalanimestwo = html.length; //Number of entries in the chunk that was just fetched
  TotalCompletedEntries += totalanimestwo; //Adds the chunk size to the running total
  if (totalanimestwo !== 300) //If the chunk has fewer than 300 entries, the end of the list was reached
  { //Starts the if condition
    console.log('Finished Getting the Total Completed Entries Number!'); //Shows a message in the console for dev purposes
    return; //Exits once the last (partial) chunk has been counted
  } //Finishes the if condition
} //Finishes the while loop


My question is: is there a better and faster way to identify when the website has finished loading everything?
* I've noticed that document.querySelector("div.loading-space") changes slightly every time the page is scrolled down, but I'm not sure whether that can help in any way...

Is there a "universal" way to detect this on any website that has infinite scrolling (i.e. until the last page has loaded)?

wOxxOm (Mod)
§
Posted: 09/12/2020

There's no special event to indicate that the page has finished loading, because these changes are made directly in the DOM after the page has already loaded. They can keep occurring indefinitely, which is why it's called infinite scroll.

The closest thing to a universal detection method is MutationObserver.

wOxxOm (Mod)
§
Posted: 09/12/2020

...and the usual addendum to MutationObserver is a sliding timeout, a.k.a. debounce: once the changes stop occurring within the specified interval, you consider the page truly loaded:

function waitForTrueLoad() {
  return new Promise(resolve => {
    let timer;
    const mo = new MutationObserver(restartTimer);
    if (document.readyState === 'complete') {
      onLoad();
    } else {
      window.addEventListener('load', onLoad, {once: true});
    }
    function restartTimer() {
      clearTimeout(timer);
      timer = setTimeout(onTrueLoad, 250);
    }
    function onLoad() {
      restartTimer();
      mo.observe(document, {childList: true, subtree: true});
    }
    function onTrueLoad() {
      window.removeEventListener('load', onLoad);
      mo.disconnect();
      resolve();
    }
  });
}
§
Posted: 09/12/2020

Why are you asking this? What is your goal?

Do you need to parse the whole list? Then just open the devtools panel and check how the loading continues.

https://myanimelist.net/animelist/abystoma2/load.json?&status=2&offset=6183

load.json returns at most 300 entries, and the "offset" value specifies how many entries the website should skip. For example, the link above skips the whole list except the last entry. So just make requests with a growing offset while the returned JSON is not empty.
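
A minimal sketch of that loop (assuming it runs as a userscript on myanimelist.net, so the requests are same-origin, and that 300 is the chunk size load.json returns):

async function fetchFullList(username, status) { //Collects the whole list by paging load.json
  const entries = [];
  for (let offset = 0; ; offset += 300) {
    const url = 'https://myanimelist.net/animelist/' + username + '/load.json?status=' + status + '&offset=' + offset;
    const chunk = await (await fetch(url)).json();
    entries.push(...chunk);
    if (chunk.length < 300) break; //The last (partial or empty) chunk was reached
  }
  return entries;
}

//Usage: fetchFullList('abystoma2', 2).then(list => console.log(list.length));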

§
Posted: 09/12/2020
Edited: 09/12/2020

function waitForTrueLoad()

It will fail if there is a small temporary network issue (server-side or client-side, it doesn't matter).

wOxxOm (Mod)
§
Posted: 09/12/2020

It won't "fail". It'll simply resolve early. There are two usual solutions for this: increasing the timeout and overriding (hooking) the XMLHttpRequest methods.
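
For the second option, a rough sketch of hooking XMLHttpRequest (assuming the site loads its list chunks through XHR rather than fetch; the helper name is hypothetical):

function hookXHR(onRequestDone) { //Calls onRequestDone every time any XHR finishes
  const origSend = XMLHttpRequest.prototype.send;
  XMLHttpRequest.prototype.send = function (...args) {
    this.addEventListener('loadend', onRequestDone); //Fires on success, error or abort
    return origSend.apply(this, args);
  };
}

//e.g. call hookXHR(restartTimer) from inside waitForTrueLoad so the timer only settles once requests stop finishing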

§
Posted: 09/12/2020
Edited: 09/12/2020

There's no special event to indicate that the page has finished loading, because these changes are made directly in the DOM after the page has already loaded. They can keep occurring indefinitely, which is why it's called infinite scroll.

The closest thing to a universal detection method is MutationObserver.

I also thought about using MutationObserver, but that is pretty much like setInterval or setTimeout...

I will check your mutation observer script, but as I previously said
"The list of the user deg is really small and loads fast, but the list of the user abystoma takes a lot of time to load, every time that I scroll down the pages takes more and more time to load the next page contents, this means that it's impossible to use something like settimout/setinterval to run the function to scroll down after some time, because the timing that would work for the deg user list, wouldn't work for the user abystoma."

I also need the detection to work 100% of the time, otherwise my script won't work reliably.

§
Posted: 09/12/2020
Edited: 09/12/2020

Why are you asking this? What is your goal?

Do you need to parse the whole list? Then just open the devtools panel and check how the loading continues.

https://myanimelist.net/animelist/abystoma2/load.json?&status=2&offset=6183

load.json returns at most 300 entries, and the "offset" value specifies how many entries the website should skip. For example, the link above skips the whole list except the last entry.
So just make requests with a growing offset while the returned JSON is not empty.

My goal is to improve this script
https://greasyfork.org/en/scripts/407957-generate-a-list-with-the-animes-mangas-titles-that-were-re-watched-re-read

I need to parse the whole list, either by:
1. scrolling down the website until it's fully loaded,
2. or somehow making https://myanimelist.net/animelist/abystoma2/load.json?&status=2&offset=6183 show me the rewatched counts,
3. or getting the rewatch values somehow through this API: https://jikan.docs.apiary.io/, but I wasn't able to make the request, because I don't know how to get the rewatched animes of a list.
THEN I can run the most important part, the scrape function, once one of the three things above has finished, so I can get the rewatch values. For now I have to use options 1 and 2 together, and using both takes much more time than if I could just use option 2 or 3.

var request = new XMLHttpRequest();

request.open('GET', 'https://api.jikan.moe/v3/user/hacker09/animelist/completed/rewatch_value');

request.onreadystatechange = function () {
  if (this.readyState === 4) {
    console.log('Status:', this.status);
    console.log('Headers:', this.getAllResponseHeaders());
    console.log('Body:', this.responseText);
  }
};

request.send();


I'm already "making the requests with a growing offset while the returned JSON is not empty", as you can see below and as I said above, but this takes a lot of time for users with huge lists. I want to improve my script so that I can remove this code and use something else. load.json doesn't return the rewatch values in its response, so I'm only doing this to know how many elements the page will have once it has finished loading; when the page has finished loading, my 0 ms interval starts the scrape function: if (TotalCompletedEntries != TotalCompletedAnimes) //run the scrape function once this condition is false; while it's true, keep scrolling the page down until the page has all the elements it's supposed to have

while (true) { //Starts the loop that gets the total number of entries in the user's completed list
  console.log('This user has more than 300 Completed Entries\nGetting the Total Completed Entries Number...'); //Shows a message in the console for dev purposes
  const html = await (await fetch('https://myanimelist.net/' + type + 'list/' + username + '/load.json?status=2&offset=' + nextpagenum)).json(); //Fetches the next chunk of the user's completed list
  nextpagenum += increaseby; //Increases the offset for the next page
  totalanimestwo = html.length; //Number of entries in the chunk that was just fetched
  TotalCompletedEntries += totalanimestwo; //Adds the chunk size to the running total
  if (totalanimestwo !== 300) //If the chunk has fewer than 300 entries, the end of the list was reached
  { //Starts the if condition
    console.log('Finished Getting the Total Completed Entries Number!'); //Shows a message in the console for dev purposes
    return; //Exits once the last (partial) chunk has been counted
  } //Finishes the if condition
} //Finishes the while loop

§
Posted: 10/12/2020

Did you guys understand?

§
Posted: 10/12/2020

@wOxxOm
I've just tested your script, but it only detects when the page is opened and fully loaded; it doesn't do anything when I scroll down and the page keeps loading.

wOxxOm (Mod)
§
Posted: 10/12/2020

That's because this is what the script was intended to do: it disconnects after the first pause. If you want to always react to DOM changes then simply use MutationObserver directly, no need for the extra code.
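
For example, reacting to every batch of DOM changes directly would just be (sketch):

const mo = new MutationObserver(function (mutations) {
  console.log(mutations.length, 'mutation records'); //e.g. check here whether the loaded row count matches the expected total
});
mo.observe(document, {childList: true, subtree: true});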

§
Posted: 10/12/2020

Did you guys understand?

Actually, not really... And unfortunately, neither load.json nor api.jikan.moe is able to show a rewatch count.

Also, if you look at the screenshot, you will see that even the website itself can't properly handle the rewatch count ordering. That's bad news for you.

§
Posted: 10/12/2020
Edited: 10/12/2020

I've just tested the waitForTrueLoad() function, but it only detects when the page is opened and fully loaded; it doesn't do anything when I scroll down and the page keeps loading

While testing it you should also have launched some endless function in the background that keeps scrolling to the end of the page, and you need to increase the timer delay to at least 5000 instead of 250, because during that time two actions must complete: scrolling to the end, and downloading the next part of the list.

Scroll to end command: window.scrollTo(0, document.body.scrollHeight);
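
Put together, a rough test sketch could be (assuming the 250 in waitForTrueLoad above is raised to 5000, and that a scrape() function already exists in the userscript):

const scroller = setInterval(function () { //Keeps jumping to the bottom so the next chunk loads
  window.scrollTo(0, document.body.scrollHeight);
}, 1000);
waitForTrueLoad().then(function () { //Resolves once no DOM changes happened for the (raised) timeout
  clearInterval(scroller); //Nothing new appeared, so stop scrolling
  scrape(); //And run the scraping step
});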

§
Posted: 10/12/2020
Edited: 10/12/2020

@wOxxOm
You are right, I've just modified your code a little bit and it worked perfectly. But this isn't really what I want... I can do the same using a setInterval of 0 to scroll down infinitely.

My question is: how can I know when the website is fully loaded and there's nothing else to load, even if I keep scrolling down? (I mean, even if the setInterval of 0 keeps scrolling down, nothing more is loaded because the website is already fully loaded; how can I detect that?)

I only need the onTrueLoad() function of your code to run just once, after it's 100% sure that there's nothing else to be loaded...


function waitForTrueLoad() {
  return new Promise(resolve => {
    let timer;
    const mo = new MutationObserver(restartTimer);
    if (document.readyState === 'complete') {
      onLoad();
      console.log('waitForTrueLoad');
    } else {
      window.addEventListener('load', onLoad, {once: true});
      console.log('else');
    }
    function restartTimer() {
      clearTimeout(timer);
      timer = setTimeout(onTrueLoad, 250);
      console.log('restartTimer');
    }
    function onLoad() {
      restartTimer();
      mo.observe(document, {childList: true, subtree: true});
      console.log('onLoad');
    }
    function onTrueLoad() {
      window.removeEventListener('load', onLoad);
      console.log('onTrueLoad');
      //mo.disconnect();
      resolve();
    }
  });
}
waitForTrueLoad()

§
Posted: 10/12/2020
Edited: 10/12/2020

api.jikan.moe is not able to show a rewatch count.

The apiary documentation for api.jikan.moe says it is able to show a rewatch count, I just don't know how...

That's because you tried to filter by the rewatch value, not the rewatch count; they are two very different things. The value means how worthwhile the user thought rewatching the anime was, while the rewatch count is how many times the user rewatched the anime. There's no filter to sort the list only by the rewatch count, which is why I needed to make/improve this script... A user can add a rewatch count without adding a rewatch value, or the opposite, as your screenshot shows...

What the scrape function of my script does is click on all the "more info" buttons and scrape that...

Maybe instead of scrolling the page down I could just fetch load.json until a chunk has fewer than 300 entries in it, meaning the script has already fetched the full list. Then I could loop another fetch request over every single anime id I got while fetching load.json, fetch only the "more info" HTML content, and scrape that. But I'm not sure whether this method would be faster than the one I'm using now...

var response, html, newDocument;
async function AddFinishDate() //Add The Finished Date When Completed Is Selected
{ //Starts the async AddFinishDate function
  response = await fetch("https://myanimelist.net/includes/ajax-no-auth.inc.php?t=6", {
    "headers": {
      "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    },
    "body": "color=1&id=2796&memId=4673502&type=anime&csrf_token=2f4bd443f98d7dd9f81f5474c91921b36c300f88",
    "method": "POST"
  }); //Finishes the fetch
  //memId is the member id; open the user profile page and click on Report, that gives you the url with the member id, so I would also need to fetch the user profile page once
  html = await response.text(); //Gets the fetch response
  newDocument = new DOMParser().parseFromString(html, 'text/html'); //Parses the fetch response
  newDocument.querySelector('body > table > tbody > tr > td > div > a > strong').innerText.split('<')[0]; //Reads the text to scrape from the parsed response
} //Finishes the async AddFinishDate function
AddFinishDate()
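
Building on the snippet above, the "no scrolling" idea could be sketched roughly like this (hypothetical helper; the anime id, memId and csrf_token would have to be collected first, as noted in the comment above, and the requests should run one at a time to avoid freezing the tab):

async function fetchMoreInfo(animeId, memId, csrfToken) { //Fetches the "more info" block of one entry
  const response = await fetch('https://myanimelist.net/includes/ajax-no-auth.inc.php?t=6', {
    method: 'POST',
    headers: {'content-type': 'application/x-www-form-urlencoded; charset=UTF-8'},
    body: 'color=1&id=' + animeId + '&memId=' + memId + '&type=anime&csrf_token=' + csrfToken,
  });
  return new DOMParser().parseFromString(await response.text(), 'text/html'); //Parsed HTML to scrape from
}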

§
Posted: 11/12/2020
Edited: 11/12/2020

The apiary documentation for api.jikan.moe says it is able to show a rewatch count

Where does it say that?

That's because you tried to filter by the rewatch value, not the rewatch count; they are two very different things

OK, now I understand.

Maybe instead of scrolling the page down I could just fetch load.json

I'm not sure if this method would be faster than the method I'm using now

It would be better for sure.

First, when you scroll the website down, the page size grows and so does RAM consumption.

Second, you will not need to "detect when an infinite-scrolling website has finished loading everything" at all, because now you are the one handling the fetching process. I mean, when the website detects that the user's screen has reached the end of the page, it fetches the next part of the list: XHR.open(`https://myanimelist.net/animelist/abystoma2/load.json?&status=STATUS_VAR&offset=OFFSET_VAR`)

Once you realize that, you can just do this yourself. Your script's algorithm should be like this:

  1. A user presses a "Generate rewatched list" button
  2. The script notifies the user that the process may take some time and asks for confirmation
  3. After confirmation, show a modal window with a loading progress bar, for example like the one in the Discord app. You can even make it accurate; below I will explain how. You should also add a big shadow to the modal and block page scrolling for better UX: document.body.style.overflowY = 'hidden'
  4. Behind the scenes the script does the following:
  • Every N seconds, fetch the rewatch count values from the anime list
  • If the current part of the anime list is exhausted, load more from load.json?offset=PARSED_LIST_LENGTH and go back to the previous step
  • Repeat the previous steps until load.json returns an empty list
  5. The PARSED_LIST of rewatches should now be ready. You can show it right on the page, or even better, let the user download it as HTML. I also recommend keeping the last parsed rewatch list in Tampermonkey memory (a small sketch follows below), because it is easy to lose and the user would have to wait again to get it back.
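
A minimal sketch of that Tampermonkey caching, assuming the script declares @grant GM_setValue and @grant GM_getValue (the function and key names here are just hypothetical):

function cacheRewatchList(username, list) { //Saves the parsed list in Tampermonkey storage
  GM_setValue('rewatchList_' + username, JSON.stringify({savedAt: Date.now(), list: list}));
}

function loadCachedRewatchList(username) { //Returns the cached list, or null if nothing was saved yet
  const raw = GM_getValue('rewatchList_' + username, null);
  return raw ? JSON.parse(raw) : null;
}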

§
Posted: 11/12/2020

How to make a loading progress bar for this script

First you need to find out how many entries are in the anime list. If the website does not show that counter, you need to do this manually by fetching load.json. Algorithm:

First we need to find the length of the user's anime list:

  1. If the list that is already preloaded in the page is shorter than 300 entries, the end of the list has already been reached, because 300 is the chunk size the website uses in load.json. But if it's not:
  2. Fetch load.json with an offset around an average user's list length, for example 600.
  3. If the returned list is empty or shorter than 300 entries, the end of the list was skipped or reached.
  4. Otherwise, if the returned list length is exactly 300, the list is bigger than 600 entries (or equal, but we can't know yet) and you need to make another request.
  5. Which offset to try next is up to you. I think you should load the chunks with an offset step of 300 (600 > 900 > 1200...) and also store them immediately, because you will need them later.

Your goal is to find a response whose chunk is shorter than 300 entries but longer than 0. When you get it, the user's list size is OFFSET + LAST_CHUNK_LENGTH; call it FULL_LIST_LENGTH. Let's say your progress bar is a rectangle with an 800 px max width. Then one step of the progress bar equals 800 / FULL_LIST_LENGTH pixels: PROGRESS_STEP_IN_PX. So after each update of the rewatched list you add this value to the progress rectangle's width.
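
In code, the per-entry progress step could look roughly like this (just a sketch; the 800 px width is the example above, and the element/function names are hypothetical):

function makeProgressBarUpdater(barElement, fullListLength, barWidthPx = 800) {
  const stepPx = barWidthPx / fullListLength; //PROGRESS_STEP_IN_PX
  let processed = 0;
  return function onEntryProcessed() { //Call this once per parsed entry
    processed++;
    barElement.style.width = Math.min(processed * stepPx, barWidthPx) + 'px';
  };
}

//Usage: const tick = makeProgressBarUpdater(document.querySelector('#progress'), 6184); //6184 = OFFSET + LAST_CHUNK_LENGTH
//       tick(); //advances the bar by PROGRESS_STEP_IN_PX pixels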

You can also calculate an estimated time, but I am tired of writing explanations. I think you can handle it if you want to.

§
Posted: 11/12/2020
Edited: 11/12/2020

@Konf

Copy this rewatch_value, then press Ctrl+F and Ctrl+V on the Jikan apiary website.

Would fetching https://myanimelist.net/includes/ajax-no-auth.inc.php?t=6 something like 1300 times really be better than scrolling down? I'm not sure; usually when I do that many fetches, the tab/my browser also freezes for a short period of time...

I can try to make the script work by only fetching load.json and https://myanimelist.net/includes/ajax-no-auth.inc.php?t=6, but that will probably take a lot of time, because I will need to modify the whole script and the scrape() function.

My script's algorithm is already exactly like you said...

I would like to learn more about the thing you mentioned, "keeping the last parsed rewatch list in Tampermonkey memory". I've never tried to use GM_listValues() before, but I probably need to learn how, because I'm making another script similar to this one and it would be nice to have something like that in Tampermonkey memory there too. For now I can almost say that I don't know anything about GM_listValues().

My script already shows a loading image, just without a progress bar; it works exactly like you said.
But everything you said about "How to make a loading progress bar for this script" is only the first part. I can't show a progress bar if I don't know how many animes are in the user's list, so first I need to get that information/variable; then I know how many fetches I will make to https://myanimelist.net/includes/ajax-no-auth.inc.php?t=6, and that's when I can show a progress bar if I want to. But this is the least important thing: first of all I need to make the script faster, then I can think about adding a progress bar.

This is how the progress bar would work from my point of view (a rough sketch follows the list):
1. I get the total number of animes in the user's list by doing what you said (I'm already doing this)
2. Let's say the user has 1300 animes... I can do something like 1300 * 10 (10 seconds for every fetch request I will make to https://myanimelist.net/includes/ajax-no-auth.inc.php?t=6)
3. Convert the total seconds to minutes and display that to the user
4. Decrease the total seconds every time the function runs
5. Do the math to convert the total seconds to minutes and display the new time left to the user
6. Loop steps 4-5
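
A rough sketch of that countdown (the 10 seconds per fetch is just my own estimate, and the function names are hypothetical):

function makeEtaCountdown(totalFetches, secsPerFetch) { //e.g. makeEtaCountdown(1300, 10)
  var secsLeft = totalFetches * secsPerFetch;
  return function onFetchDone() { //Call this every time one fetch finishes
    secsLeft = Math.max(secsLeft - secsPerFetch, 0);
    var mins = Math.floor(secsLeft / 60);
    var secs = secsLeft % 60;
    return mins + 'm ' + secs + 's remaining'; //Show this next to the loading image
  };
}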

The problem with GM_listValues() is that I still have no idea how I would display the list to the user, sorted with the most rewatched animes at the top, exactly like I'm doing now (there's a screenshot on the script page).
