How to make multiple web.archive.org/save requests at once?

Posted: 2023-12-02
Edited: 2023-12-02

I have read their old API documentation, searched online, and analyzed a couple of other scripts that do the same thing, but they all make only a single fetch request, not multiple ones like I am trying to do. Is there a way to do this, or is there an API limitation I don't know about? Does the "concurrent captures limit (limit=3)" mean that I can only save 3 pages per minute?

Below is what I found out about their API.

Anonymous users have a lower concurrent captures limit (limit=3) compared to authenticated users (limit=5). The limit of daily captures for anonymous users is 5k. The size of screenshots is limited to 4MB; bigger screenshots are not allowed due to system overload. If a target site returns HTTP status=529 (bandwidth exceeded), we pause crawling that site for an hour. If a target site returns HTTP status=429 (too many requests), we pause crawling that site for a minute. All requests for the same host in that period get a relevant error message. Previously, we started these captures later, adding a delay of 20-30sec.
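Given the limit=3 noted above, one way to read it is "at most 3 captures in flight at once," not "3 per minute." Below is a minimal sketch (not from the original script) of capping in-flight work with a worker pool; `runWithLimit` is a hypothetical helper name, and the dummy tasks stand in for real capture calls (GM_xmlhttpRequest/fetch against `https://web.archive.org/save/...`):

```javascript
// Run async tasks with at most `limit` in flight at a time.
// Each "worker" repeatedly claims the next unstarted task until none remain,
// so no more than `limit` tasks ever run concurrently.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // shared index of the next unclaimed task
  async function worker() {
    while (next < tasks.length) {
      const i = next++;            // claim a task index
      results[i] = await tasks[i](); // run it and store the result in order
    }
  }
  // Start at most `limit` workers and wait for all of them to drain the queue.
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

// Example with dummy tasks; real use would map page URLs to capture calls.
const demoTasks = [1, 2, 3, 4, 5].map(n => async () => n * 2);
runWithLimit(demoTasks, 3).then(results => console.log(results));
```

Results come back in input order even though completion order varies, which keeps the `index + 1` logging in the script below meaningful.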

    // Test URL: https://myanimelist.net/profile/hacker09
    // Fix HTTP 429 Too Many Requests by spacing captures 10s apart
    async function SaveToIA() {
      const urls = [/* relative page paths to archive (list elided in original) */];
      const SaveMALPages = await Promise.all(urls.map((url, index) => {
        return new Promise(resolve => {
          setTimeout(() => {
            console.log('request sent Pages Fetched ' + (index + 1));
            GM_xmlhttpRequest({ // Tampermonkey request; the call name was missing in the original
              url: `https://web.archive.org/save/${location.host}/` + url,
              method: 'GET',
              headers: {
                "content-type": "application/x-www-form-urlencoded"
              },
              onload: function(response) {
                console.log('request made Pages Fetched ' + (index + 1));
                if (response.status === 200) {
                  console.table([{'Pages Fetched': index + 1, "Archived page!": `https://web.archive.org/save/${location.host}/` + url, "Saved in": response.finalUrl}]);
                } else {
                  console.table([{'Pages Fetched': index + 1, "Archiving failed for page": `https://web.archive.org/save/${location.host}/` + url, "Status": response.status}]);
                  if (response.status === 429) {
                    console.error('Cool down! The I.A. is being rate limited!!');
                  }
                }
                resolve(response); // resolve so Promise.all can finish
              }
            });
          }, index * 10000); // stagger each capture by 10 seconds
        });
      }));

      await fetch(`https://api.allorigins.win/raw?url=https://anime.plus/${username}/queue-add`); // username is defined elsewhere in the script
      await fetch(`https://api.allorigins.win/raw?url=https://www.mal-badges.com/users/${username}/update`);

      const profileTextResponse = await (await fetch('https://myanimelist.net/editprofile.php')).text();
      const profileDocument = new DOMParser().parseFromString(profileTextResponse, 'text/html');
      GM_setValue("ProfileBBCodes", profileDocument.querySelectorAll("textarea")[1].value);
      console.log('program complete'); //close(); //Close the actual tab
    }
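The script above only logs when a 429 comes back; since the API notes say a 429 pauses crawling of that host for a minute, retrying after a wait is an option. This is a sketch under that assumption, with hypothetical names (`captureWithBackoff`, `doCapture` standing in for the real GM_xmlhttpRequest call):

```javascript
// Retry a capture after HTTP 429 with exponential backoff.
// doCapture is any function returning a promise of an object with a
// numeric `status` (e.g. a wrapped GM_xmlhttpRequest or fetch call).
async function captureWithBackoff(doCapture, maxRetries = 3, baseDelayMs = 60000) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await doCapture();
    if (response.status !== 429) return response; // success or a non-rate-limit error
    // A 429 pauses crawling of that host for a minute (per the API notes),
    // so wait at least that long, doubling on each retry.
    await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
  throw new Error('Still rate limited after ' + maxRetries + ' retries');
}
```

Usage would be `await captureWithBackoff(() => saveOnePage(url))` inside the loop, instead of calling the capture directly.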

Posted: 2023-12-02
Edited: 2023-12-02

See the Google Doc "regstuff/Wayback Machine SPN2 API Docs".

You can also refer to this article, which does the same thing you're trying to do, though you'll need a translator to read it: https://qiita.com/yuki_2020/items/73307ddb2d286d79a5a9
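Per the SPN2 docs linked above, authenticated captures go through a POST to the save endpoint with an `Authorization: LOW accesskey:secret` header (keys from archive.org/account/s3.php), which also raises the concurrent limit to 5. A hedged sketch of building such a request; `buildSPN2Request` is a hypothetical helper, and the field names (`url`, `capture_all`) should be verified against the current docs:

```javascript
// Build fetch() options for an authenticated SPN2 capture request,
// as described in the SPN2 API docs. Returns a plain options object
// so the request shape can be inspected before sending.
function buildSPN2Request(targetUrl, accessKey, secretKey) {
  return {
    method: 'POST',
    headers: {
      'Accept': 'application/json',
      'Authorization': `LOW ${accessKey}:${secretKey}`, // S3-style API keys
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    // capture_all=1 asks SPN2 to also capture error pages (per the docs)
    body: new URLSearchParams({ url: targetUrl, capture_all: '1' }).toString()
  };
}

// Usage (keys are placeholders):
// const resp = await fetch('https://web.archive.org/save', buildSPN2Request('https://myanimelist.net/profile/hacker09', ACCESS_KEY, SECRET_KEY));
```

The JSON response includes a job id that can be polled for capture status, which is how the docs suggest tracking multiple saves.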

Posted: 2023-12-02


Well, I'm pretty sure that it would capture way too much trash that I don't care about, and it would make the program take much longer to complete, so I don't think it would work.
