Deleting old script versions?

§
Posted: 2017-01-27

Greasy Fork currently keeps all versions of all scripts, unless deleted by the author. This will eventually use all the storage space of the known universe. I'm thinking of putting in a process to trim this data.

I was thinking of something along the lines of: if a script version is more than X old, and there are more than Y newer versions of that script, then it is eligible for deletion. Possibly X = 2 years and Y = 5.
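A minimal sketch of the proposed rule, assuming each version is represented only by its posting date. The function name `eligible_for_deletion` and its parameters are hypothetical and are not part of Greasy Fork's code; X and Y default to the values floated above.

```python
from datetime import datetime, timedelta

def eligible_for_deletion(version_dates, now,
                          max_age=timedelta(days=2 * 365), keep_newer=5):
    """Return posting dates of versions that are older than max_age (X)
    and have more than keep_newer (Y) newer versions of the same script."""
    ordered = sorted(version_dates, reverse=True)  # newest first
    cutoff = now - max_age
    # In a newest-first list, a version's index equals the number of
    # newer versions (assuming distinct posting dates).
    return [posted for index, posted in enumerate(ordered)
            if index > keep_newer and posted < cutoff]

# Eight versions, one posted on January 1 of each year from 2010 to 2017.
yearly = [datetime(2010 + i, 1, 1) for i in range(8)]
prunable = eligible_for_deletion(yearly, now=datetime(2017, 1, 27))
```

With this input only the 2010 and 2011 versions qualify: the 2012 version is old enough but has exactly five newer versions, not more than five.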

Any thoughts?

wOxxOm (Mod)
§
Posted: 2017-01-27

Sounds good. The alternative is kinda hard to implement: store only diffs for the old versions, or store the text compressed using a gzip- or LZ-like library.
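The "store only diffs" idea could look roughly like the sketch below, which keeps the newest version in full and describes an older version as copy/insert operations against it. This is an illustration built on Python's `difflib`, not anything Greasy Fork implements; `make_delta` and `apply_delta` are hypothetical names.

```python
from difflib import SequenceMatcher

def make_delta(base, target):
    """Describe target (a list of lines) as operations against base."""
    delta = []
    matcher = SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))           # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))  # store new lines literally
        # "delete" needs no entry: those base lines are simply never copied
    return delta

def apply_delta(base, delta):
    """Rebuild the target lines from base plus a delta from make_delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

newest = ["// ==UserScript==", "// @version 1.1", "// ==/UserScript==",
          "console.log('hello');", "console.log('world');"]
older = ["// ==UserScript==", "// @version 1.0", "// ==/UserScript==",
         "console.log('hello');"]

delta = make_delta(newest, older)   # only the changed line is stored literally
restored = apply_delta(newest, delta)
```

The "hard to implement" part is everything around this: reading an old version means walking a chain of deltas, and deleting a version in the middle of the chain means rewriting its neighbours.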

q1k
§
Posted: 2017-01-27

It's one of the things I like about Greasy Fork: I can go back to any version at any time. Besides, I don't think they need that much space. Scripts are basically just text.

I like wOxxOm's idea much better: store them in a compressed archive.

§
Posted: 2017-01-28

The code is stored in a compressed table, which brings 18GB of text down to 5GB.

§
Posted: 2017-01-29

if a script version is more than X old, and there are more than Y newer versions of that script, then it is eligible for deletion. Possibly X = 2 years and Y = 5.

I don't think this method works, unless there are many scripts with many versions created 2 years ago.

§
Posted: 2017-01-30

Why not compress anything more than two years old into archives grouped by first letter? Then offer those up for download if someone wants an older version. Of course, they will end up downloading much more than they want.

Or fine-tune it a bit more and just do monthly archives. Anything older than two months would be removed from the site and only be accessible through the archive. Anyone interested in a specific version could easily tell which archive they need to download, since the date is on the version.

§
Posted: 2017-01-31
I don't think this method works, unless there are many scripts with many versions created 2 years ago.

It's not so much a problem right now, but the size will just keep growing.

Why not compress anything more than two years old into archives grouped by first letter?

Because everything is already compressed and so this would actually end up using more space?

wOxxOm (Mod)
§
Posted: 2017-01-31
Edited: 2017-01-31

Can you plug in git? It stores only the deltas so 1000 slightly changed versions of a script would occupy like 10+ times less space than a compressed table or a 7z/xz archive. This will probably require scheduling a git gc to repack frequently changed histories.

§
Posted: 2017-01-31

Do you have any evidence to back up the idea that it would take less space in git? While slightly different versions of a script could be stored efficiently with deltas, it seems like regular compression would do a real good job, too.

wOxxOm (Mod)
§
Posted: 2017-01-31
Edited: 2017-01-31

Do you have any evidence to back up the idea that it would take less space in git?

Well, duh, this is like 2+2. Just look at any huge repo e.g. chromium: 2GB of code with only 6GB git data for the whopping 665,000 commits. On the average it's 9kB per commit (compression ratio ~222,000x).
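The two figures in the post above can be reproduced as follows. The sketch assumes decimal gigabytes and, for the compression ratio, that every one of the 665,000 commits is counted as a full 2GB snapshot of the tree, which appears to be how the ~222,000x figure was derived. The variable names are illustrative.

```python
GB = 10**9  # assumption: decimal gigabytes

code_size = 2 * GB    # size of a current Chromium checkout, per the post
git_size = 6 * GB     # size of the git data, per the post
commits = 665_000

# Average amount of git data attributable to each commit.
bytes_per_commit = git_size / commits

# Ratio of "every commit stored as a full snapshot" to the actual git size.
snapshot_ratio = code_size * commits / git_size

print(f"{bytes_per_commit:.0f} bytes per commit, ratio {snapshot_ratio:,.0f}x")
```

This gives roughly 9kB per commit and a ratio of about 222,000x, matching the post.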

Your table is compressed at only 3.5x (18GB of text down to 5GB), which is typical for a single code file. Or does it contain all script versions? But even so, chromium repo has 9 times more files (245k files vs 27k scripts on GF) and arguably as many times more versions (commits) while its size is about the same as the GF's table.

7zip/xz-like utilities can compress n files using the same dictionary thus compressing O(n) times better in case of slightly changed files. Obviously, this is not what happens with your greasyfork data.
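The shared-dictionary effect can be seen with a small experiment. This is a sketch on synthetic data, not Greasy Fork's scripts: `make_versions` is a hypothetical helper that produces 50 versions of a fake script, each differing from the previous one by a single line, and Python's `lzma` module stands in for 7zip/xz.

```python
import lzma
import random

def make_versions(count=50, lines=200, seed=0):
    """Build `count` versions of a fake script, changing one line per version."""
    rng = random.Random(seed)
    base = ["var v%d = %d;" % (i, rng.randrange(10**9)) for i in range(lines)]
    versions = []
    for n in range(count):
        base = list(base)
        base[rng.randrange(lines)] = "var changed%d = %d;" % (n, rng.randrange(10**9))
        versions.append("\n".join(base).encode("utf-8"))
    return versions

versions = make_versions()

# Each version compressed on its own: nothing is shared between versions.
separate = sum(len(lzma.compress(v)) for v in versions)

# All versions in one stream: later versions are mostly back-references.
solid = len(lzma.compress(b"\n".join(versions)))

print("separate:", separate, "bytes; solid:", solid, "bytes")
```

The trade-off is access: pulling one version out of a solid stream means decompressing everything before it.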

Git repos are somewhere in the middle because git additionally stores an index/whatnot for efficient access.

P.S. What is the uncompressed size of all current script versions on GF? And what is the total number of old versions stored in the table? Comparing these numbers to the chromium git repo would provide a quantifiable answer to your question I guess.

§
Posted: 2017-02-01
Just look at any huge repo e.g. chromium: 2GB of code with only 6GB git data for the whopping 665,000 commits. On the average it's 9kB per commit (compression ratio ~222,000x).

Your table is compressed at only 3.5x (18GB of text down to 5GB)

You're using two separate calculations. If you compare full-size-all-versions vs. compressed-size-all-versions, then Greasy Fork is compressed at 3.5x and Chromium is compressed at 3x.

There are:
- 24,676 scripts
- 164,206 separate versions posted
- 192,101 code entries (2 per version, but shared when unchanged)
- Total code size: 18GB

Only counting the latest version of every script:
- There are 29,893 code entries
- Total code size: 750MB

wOxxOm (Mod)
§
Posted: 2017-02-01

My point was comparing the relative difference like this:

             files   current size   versions     packed size
chromium     245k    2200MB         665k * avg   6GB
greasyfork   30k     750MB          192k         5GB

Here avg is the average number of files per commit (counting the actual average could take hours). It is arguably bigger than 2, but even if it's just 1, GF loses by (2200/750) * (665/192) * (5/6) ≈ 8.5x. Now multiply that by at least 2 and you'll get the picture.
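The product in that post can be checked directly. The numbers are the ones from the comparison table; the dictionary keys are illustrative names, and avg is taken as 1 as in the post.

```python
chromium = {"current_mb": 2200, "versions_k": 665, "packed_gb": 6}
greasyfork = {"current_mb": 750, "versions_k": 192, "packed_gb": 5}

# How much more data Chromium holds (current size times version count),
# scaled by how much packed space each one needs for it.
factor = (
    (chromium["current_mb"] / greasyfork["current_mb"])
    * (chromium["versions_k"] / greasyfork["versions_k"])
    * (greasyfork["packed_gb"] / chromium["packed_gb"])
)

print(f"{factor:.2f}x")
```

The result is about 8.47, so the rounded figure is closer to 8.5x than to 8x.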

§
Posted: 2017-02-02

Keep something like 3 to 5 versions, and only delete data that is 3 months old or more?

§
Posted: 2017-02-02
Edited: 2017-02-02

Deleting some of the version history of a popular script could be problematic, so I think it would make more sense to also check the script's daily installs.
So something like:
If a script is more than X years old and has fewer than Y daily installs, then all of its past versions except the most recent are eligible for deletion.
Maybe X = 2 and Y = 30?
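As a rough sketch, the install-based rule could be expressed like this. `can_prune_history` and its parameters are hypothetical names, and the defaults are the X = 2 and Y = 30 suggested above.

```python
from datetime import date

def can_prune_history(created, daily_installs, today,
                      min_age_years=2, max_daily_installs=30):
    """True if a script is old enough and unpopular enough that every
    version except the most recent one could be deleted."""
    age_years = (today - created).days / 365.25
    return age_years > min_age_years and daily_installs < max_daily_installs

today = date(2017, 2, 2)
old_unpopular = can_prune_history(date(2014, 1, 1), 5, today)
old_popular = can_prune_history(date(2014, 1, 1), 100, today)
recent = can_prune_history(date(2016, 6, 1), 5, today)
```

Only the first of these three example scripts would have its history pruned.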

In addition, any scripts that are older than 3 or 4 years probably don't even work on the most recent versions of web browsers. So in this case, you could probably delete the entire script as long as the daily installs are less than Y.

I'm not sure how practical this is based on your server setup or how much of a difference in memory this would make, but you could also delete accounts that haven't been active in a certain number of years.

§
Posted: 2017-02-03

Gigabytes now cost about as much as air, but if air is too expensive, we have to limit its consumption :smiley:

§
Posted: 2017-02-03

Does openuserjs.org delete scripts?

§
Posted: 2017-02-03

@ are you sure you haven't got the wrong address?

Does openuserjs.org delete scripts?

§
Posted: 2017-02-10
Edited: 2017-02-10

So an update here... I discovered that I am inept.

1. Script codes were not deleted when the associated scripts were permanently deleted.
2. Every time a new version was created, it would create double the number of script codes necessary.
3. The production database was not set to support MySQL COMPRESSED tables, so it was not actually compressed.

I've fixed all these issues and deleted the extra script codes that were not used by anything. This has greatly reduced the amount of space needed to store the data, so any change can be put off for a long while.
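The cleanup of unused script codes might look something like the sketch below. Everything about the schema is an assumption made for illustration: the table names `script_codes` and `script_versions`, the two code columns per version, and the use of an in-memory SQLite database in place of the production MySQL one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: each version points at up to two code rows.
    CREATE TABLE script_codes (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE script_versions (
        id INTEGER PRIMARY KEY,
        code_id INTEGER,
        rewritten_code_id INTEGER
    );
    INSERT INTO script_codes (id, code)
        VALUES (1, 'a'), (2, 'b'), (3, 'orphan'), (4, 'another orphan');
    INSERT INTO script_versions (id, code_id, rewritten_code_id)
        VALUES (1, 1, 2), (2, 1, 1);
""")

def delete_orphaned_codes(conn):
    """Delete code rows that no version references; return how many went."""
    cursor = conn.execute("""
        DELETE FROM script_codes
        WHERE id NOT IN (SELECT code_id FROM script_versions
                         WHERE code_id IS NOT NULL)
          AND id NOT IN (SELECT rewritten_code_id FROM script_versions
                         WHERE rewritten_code_id IS NOT NULL)
    """)
    conn.commit()
    return cursor.rowcount

deleted = delete_orphaned_codes(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM script_codes ORDER BY id")]
```

The `IS NOT NULL` filters matter: a single NULL inside a `NOT IN` subquery makes the comparison unknown for every row, so nothing would be deleted.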

The 18GB/5GB numbers were based off my local version, where 1 and 2 were problems but not 3. So the actual numbers for storing code are now 13.5GB, 3.5GB compressed.

I still want to look at the relative storage requirements between a git repo and a MySQL COMPRESSED table.

§
Posted: 2017-02-10

Is your server code open source? If so, it would allow the community to find bugs like these quite a bit easier.

§
Posted: 2017-02-18

Does openuserjs.org delete scripts?

oujs doesn't have any previous-version support on the website; you need to connect it to GitHub if you want old versions saved. I prefer the simple way this site handles it (like the old userscripts.org).
