HIT Scraper WITH EXPORT

Snag HITs.

This is the version submitted on 2015-02-22.

Author
Tjololo
Ratings
0 0 0
Version
1.6
Created
2014-06-02
Updated
2015-02-22
Size
99.4 KB
License
N/A

Hit_scraper with the hit export script added CUZ IT'S MORE CONVENIENT! There are a couple of guides, one on mturkforum.com and one on mturkgrind.com.

v1.6 update!

This one is more UI changes than anything else, but I added in some functionality people have been champing at the bit for ;) Also got something working that's sure to excite the masses!!! See below!

Features Added:

  • Ponified the "ding" (most important)
  • Changed "Sort Types" to a dropdown instead of radio buttons
  • Split "masters" into "require" and "show". "Require" restricts the search to Masters hits only (same as checking the box on mturk). "Show" chooses whether Masters hits that come up in the search are shown or hidden (for when logged out, thanks to Kerek for the suggestion)
  • Added "hide" button to hide the interface (thanks to Kerek for the code and suggestion)
  • Added a very preliminary sort by TO, due to extremely popular demand. See below for notes
  • Added a very preliminary minimum TO, due to extremely popular demand. See below for notes.

Notes:
For the "sort by TO", again it's very preliminary. It sorts the table as the TO results come in, so the table can keep changing after it's been populated if your system is slow. Keep that in mind; there's no way around it if you want to sort by TO.

It also places the requesters with no TO data on the bottom. I couldn't put them on the top for some reason, so they're down there.
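
For the curious, the "no TO data goes on the bottom" behaviour can be sketched with a comparator like the one below. This is not the script's actual code. It assumes each row carries a single numeric TO rating (which TO rating is used for sorting isn't stated here) or null when there's no data, and the row shape and function name are made up.

```javascript
// Hypothetical row shape: { requester: string, to: number | null }.
// `to` is null when there is no TO data for the requester.
function sortByTO(rows) {
  // Copy first so the caller's array is left untouched.
  return rows.slice().sort((a, b) => {
    const aMissing = a.to === null;
    const bMissing = b.to === null;
    if (aMissing && bMissing) return 0; // keep their relative order
    if (aMissing) return 1;             // a sinks below b
    if (bMissing) return -1;            // b sinks below a
    return b.to - a.to;                 // highest rating first
  });
}

const rows = [
  { requester: "A", to: 3.2 },
  { requester: "B", to: null },
  { requester: "C", to: 4.7 },
];

console.log(sortByTO(rows).map((r) => r.requester)); // [ 'C', 'A', 'B' ]
```

Re-running a sort like this each time a TO result arrives is what makes the table shift around while results are still coming in.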

For the "Minimum TO": It operates as you'd expect. Put in a number between 0 and 5, and it will remove all items which are below that number, except for "No data" or "TO down", which are not removed. If you want to see the items again, click the "Show hits below TO threshold" button. That will bring them back, but won't hide them again. I didn't think that would be necessary.
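
A minimal sketch of that rule, assuming each row holds either a numeric TO rating or one of the placeholder strings "No data" / "TO down". The function names and row shape are hypothetical and only meant to show the logic, not how the script does it.

```javascript
// Hypothetical row shape: { title: string, to: number | "No data" | "TO down" }.
function applyMinimumTO(rows, minTO) {
  const visible = [];
  const hidden = [];
  for (const row of rows) {
    // Only numeric ratings can fall below the threshold; the placeholder
    // strings "No data" and "TO down" are never removed.
    const belowThreshold = typeof row.to === "number" && row.to < minTO;
    (belowThreshold ? hidden : visible).push(row);
  }
  return { visible, hidden };
}

// "Show hits below TO threshold": bring the hidden rows back.
function showHidden(result) {
  return { visible: result.visible.concat(result.hidden), hidden: [] };
}

const result = applyMinimumTO(
  [
    { title: "Survey", to: 4.1 },
    { title: "Tagging", to: 1.9 },
    { title: "Transcription", to: "No data" },
  ],
  3
);

console.log(result.visible.map((r) => r.title)); // [ 'Survey', 'Transcription' ]
console.log(showHidden(result).visible.length);  // 3
```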

There are still apparently some issues with items duplicating and such. I can't seem to reproduce these issues, so I can't test for them. They're fringe cases as far as I can tell and shouldn't really matter to the majority of people, so I'll solve them as they come up, but I'm not gonna spend a huge amount of time troubleshooting.

Hopefully this update works! Happy turking!


v1.5 update!

This one isn't as big, but I felt it warranted a minor release because there was some updated functionality.

Features added:

  • Added a new column, "M?", which shows if hits are Masters or not (more useful for those of us with Masters, but useful for both)
  • Added qual listing, mouseover the "M?" column to see the quals for that hit.
  • Added TO listing, mouseover TO link to see all the ratings
  • Minor tweaking, some verbiage fixes, stuff you probably won't see/notice

v1.5.1: Fixed "notqualified" link processing
v1.5.2: Hopefully fixed the Firefox issue, changed the way values are stored/recalled. This will have the unfortunate effect of clearing everyone's blocklist, but hopefully the storage format will not need to change again in the future.
v1.5.3: Fixed storage to properly handle requesters with commas in their names, I didn't realize.
v1.5.4: Updated to fix it not scraping when you're logged out, because apparently that's a thing people do.
v1.5.5: Updated to fix 1.5.4 again, because Amazon changed the way links work. Now, if you're not qualified (i.e. logged out), clicking the title (and/or exporting) should give you a link to the requester page, instead of a non-functioning link.
v1.5.6: Fixed 1.5.5 again, hopefully now it'll work when you're logged out again. Made it so "qualified" is not default when not logged in, hopefully fixed "duplicate" issue (where a hit will sometimes show up twice)
v1.5.7: https://www.youtube.com/watch?v=ipADNlW7yBM
v1.5.9: added in IRC exporter based on clickhappier's and cristo's work, changed colors of M? columns to make a little more sense.
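
The v1.5.3 note above is about requester names that contain commas. The notes don't show the script's actual storage format, so this is only a sketch of one way to avoid that kind of bug: store the list as JSON instead of a comma-joined string. The helper names and the requester "Example Labs, Inc." are made up for illustration.

```javascript
// Hypothetical helpers; the script's real storage code is not shown in these notes.
function serializeBlocklist(names) {
  // JSON quotes each name, so commas inside a name are harmless.
  return JSON.stringify(names);
}

function parseBlocklist(stored) {
  if (!stored) return [];
  try {
    const parsed = JSON.parse(stored);
    return Array.isArray(parsed) ? parsed : [];
  } catch (e) {
    // Corrupt or old-format data: fall back to an empty list.
    return [];
  }
}

const names = ["oscar smith", "Example Labs, Inc."];

// Naive comma-joining splits the second name in two:
console.log(names.join(",").split(",").length); // 3

// JSON round-trips both names intact:
console.log(parseBlocklist(serializeBlocklist(names)).length); // 2
```

The serialized string could then be handed to whatever storage the script uses (localStorage, for instance).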

TODO:

  • Add in functionality to remove hits that match certain TO criteria
  • Add in sort by TO functionality
  • Add in "max reward" functionality

These are going to be rough, as they involve editing the list after it's been populated, and I have trouble with that. Also, expect bug fixes/subreleases in the near future with more enhancements/fixes.

v1.4 update!

Big update here, I added a lot of cosmetic and other functionality to it. Let me know if there are any issues, new functions will probably be buggy until taken care of.

Features added:

  • Ability to block by title and requester (so you can block individual hits you've done)
  • Ability to view only certain requesters with Include list (Must add requesters to list individually for the moment, if there's a desire I'll add in a button like the blocklist)
  • Ability to make scraper make a "ding" noise when it finds new work.
  • Tied in with HitDB so clicking the R/T at the end will show you the work you've done for that requester (only for green items, might not work on Firefox)
  • Added A-Z sort
  • Added inverse sort
  • Added checkbox for "Correct For Skips" (mouseover the checkbox to see what it does, or try it out! On by default, will change to off by default if necessary).
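
As a rough illustration of how the blocklist and Include list could work together, here is a minimal sketch. It assumes a blocklist entry can match either a requester name or a HIT title, that the Include list only holds requester names, and that matching is case insensitive. The function name and the HIT object shape are hypothetical, not taken from the script.

```javascript
// Hypothetical HIT shape: { title: string, requester: string }.
function filterHits(hits, blocklist, includeList) {
  const norm = (s) => s.trim().toLowerCase();
  const blocked = new Set(blocklist.map(norm));
  const included = new Set(includeList.map(norm));

  return hits.filter((hit) => {
    const requester = norm(hit.requester);
    const title = norm(hit.title);
    // Drop the HIT if either its requester or its title is blocked.
    if (blocked.has(requester) || blocked.has(title)) return false;
    // An empty Include list means "show everything that is not blocked".
    return included.size === 0 || included.has(requester);
  });
}

const hits = [
  { title: "Transcribe audio", requester: "Crowdsource" },
  { title: "Short survey", requester: "Some Lab" },
  { title: "Tag images", requester: "Other Lab" },
];

console.log(filterHits(hits, ["crowdsource"], []).length);           // 2
console.log(filterHits(hits, ["crowdsource"], ["SOME LAB"]).length); // 1
```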

Cosmetics:

  • Re-organized a bit of the header section with some | characters to separate things
  • Added some helpful "status" messages to explain some things a bit (i.e. why it's scraping more than the pages you told it to)
  • Moved the status messages to below the header

Subtle:

  • Made it pull the blocklist every time you run it so you can have multiple instances and they'll work together properly.

TODO:

  • Save everything to localstorage so you won't have to set it up individually each time
  • Add capability for multiple export templates (so you can have one scraper for a bunch of sites)
  • Make it easier to theme (Add a table with colors you can edit and such)

v1.4.1 changes:

  • Initial theming support. Put all the color values up at the top of the code, with descriptors, so they can be changed easily

v1.4.2 changes:
Nothing really. Just a bit for some of our...friends...You shouldn't see anything different really.

v1.4.3 changes:
Reverted v1.4.2

v1.4.4 changes:
Added another descriptive status message.

Older update logs:

Updated to fix an issue with the export not getting the proper quals for the proper hit.

Updated so it wouldn't clobber the normal hit export script

Updated to fix a bug, and now the requester list is case insensitive.

Added description as mouseover text for title link. Hold the mouse over the title to see it.

1.3.0.10: Added ability to block requesters dynamically, and revert to the blocklist set in the code. Default blocklist contains:
"oscar smith", "Diamond Tip Research LLC", "jonathon weber", "jerry torres", "Crowdsource", "we-pay-you-fast", "turk experiment", "jon brelig"

To clear any of those from the default, just remove them from the code (line 18, remove the " marks and comma as well). To add a requester to the block list, click the "BLOCK" button next to their name. To reset to default, click "Reset blocklist" at the top.
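
Roughly, the default / BLOCK / reset behaviour could look like the sketch below. The default names are the ones listed above; `createBlocklist` and its methods are hypothetical, and the real script also has to persist the list between runs, which this leaves out.

```javascript
// Default names copied from the list above.
const DEFAULT_BLOCKLIST = [
  "oscar smith", "Diamond Tip Research LLC", "jonathon weber", "jerry torres",
  "Crowdsource", "we-pay-you-fast", "turk experiment", "jon brelig",
];

// Hypothetical in-memory blocklist; persistence is left out of this sketch.
function createBlocklist() {
  let current = DEFAULT_BLOCKLIST.slice();
  return {
    // What a "BLOCK" button click might do.
    block(name) {
      if (!current.includes(name)) current.push(name);
    },
    // What "Reset blocklist" might do: go back to the defaults in the code.
    reset() {
      current = DEFAULT_BLOCKLIST.slice();
    },
    list() {
      return current.slice();
    },
  };
}

const blocklist = createBlocklist();
blocklist.block("Some Requester");
console.log(blocklist.list().length); // 9
blocklist.reset();
console.log(blocklist.list().length); // 8
```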

1.3.0.11: Added a line (line 24) to change the hit export to text symbol to whatever you'd like.

1.3.0.12: Changed such that the "reset blocklist" is now a confirm dialog in case you misclick.

1.3.0.13: Fixed an error with hits that have no TO data.

1.3.0.14: Initial method of editing the existing blocklist to add/remove requesters manually. I'd like a better way of doing it, expect that to be coming.

1.3.0.15: Added "hits available" to default template per request.

1.3.1.0: Major release because of all the changes so far. This one has logical updating of the block list. What's that mean? It means when you click "Edit Blocklist" you'll get a textarea you type in. Remove requesters, add requesters, whatever you'd like. Then just click save and it saves.
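
Turning the textarea contents back into a list might look something like this. It assumes one requester per line, which is a guess (the separator the script actually uses isn't stated here), and `parseBlocklistText` is a made-up name.

```javascript
// Hypothetical parser: one requester per line (an assumption).
function parseBlocklistText(text) {
  const seen = new Set();
  const result = [];
  for (const line of text.split(/\r?\n/)) {
    const name = line.trim();
    const key = name.toLowerCase();
    // Skip blank lines and case-insensitive duplicates.
    if (name === "" || seen.has(key)) continue;
    seen.add(key);
    result.push(name);
  }
  return result;
}

console.log(parseBlocklistText("oscar smith\n\n  Crowdsource \nCROWDSOURCE"));
// [ 'oscar smith', 'Crowdsource' ]
```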

1.3.1.1: Updated with Miku's new API link.

1.3.1.2: Fixed correct for skips to accurately reflect the pages you select.