THIS SCRIPT IS OBSOLETE! Please update to the new and improved version here. If you really want to keep this script, you can still download it, but keep in mind that it will no longer be updated.
Hit_scraper with the hit export script added, CUZ IT'S MORE CONVENIENT! Here are a few guides, one on mturkforum.com and one on mturkgrind.com. There is also a screencast video located here where I ramble on about hit scraper for a while, giving a good overview of all the functionality it offers.
Additionally, there is a script by clickhappier located here that uses scraper's blocklist to block hits on the regular mturk search results interface as well.
v2.0 Major Update
Please see below for updates past v2.0, including graphical changes and other features
I've been doing a lot of work (with copious help from clickhappier and others), and I figure enough's been done to go for a major version release. You'll find the changelog down below, but a major rundown of all features follows.
What is Hit Scraper WITH EXPORT and why should I download it?
Hit Scraper WITH EXPORT (hereafter referred to as HS) at its core is really just a different way of looking at mturk pages. Its purpose was to take the place of several other scripts people were using every day, and to make a unified, easy-to-understand interface that everyone can use with minimal training. That being said, HS still has a ton of features to enhance your turking and make your life a lot easier.
How do I use HS?
To use HS, you need to visit This URL. Bookmark it so you don't forget. If HS doesn't load right away, try refreshing a few times. If it still doesn't load, there might be an issue and I'll try to see if I can figure it out.
When you get to that page, you'll see the main interface. (The screenshot is actually very old; I will be redoing it at some point.) The interface should be pre-populated with some default data. You can start right away by clicking "Start", or you can customize it as shown below.
Option | Default | Description |
---|---|---|
Auto-refresh delay | 0 | How many seconds will elapse before the page starts scraping again. 0 is manual scrape only. E.g. 10 = scrape 10 seconds after the last scrape finished |
Pages to scrape | 3 | How many pages you want HS to look at. Default is 3 pages |
Correct for skips | No | If you have a lot of hits on your blocklist, a lot of results may end up being skipped. "Correct for skips" will search additional pages to "fill up" your results. If correct for skips is off, it will ONLY search the number of pages you select in "Pages to scrape" |
Minimum batch size | 100 (not specified) | For searching for batches. This does not matter unless you sort by most available. |
Minimum reward | None | Minimum dollar reward you want HS to show. E.g. 1 = don't show hits under $1; .2 = don't show hits under $0.20 |
Qualified | Yes if logged in, No if logged out | If yes, only show hits you're qualified for. If no, show all hits regardless of whether you qualify |
Masters Require | No | If yes, only show masters hits. If no, show all hits |
Masters Show | Show | If set to "Show", it will show both masters and non-masters hits (not applicable if you don't have masters and have "Qualified" checked). If set to "Hide", it will remove masters hits from the results |
Sort types | Latest | Latest sorts by time created, newest first. Most available is by number of hits available, most first. Reward is by monetary reward, highest first. Title is alphabetical by title, A first |
Invert | No | Reverses the order of the sort type. Latest = oldest hits first; Most available = fewest hits available first; Reward = lowest reward first; Title = Z first |
New HIT Highlighting | 300 | Hits that are new to the scrape show up in bold. This number determines how long they will remain that way, in seconds. |
Sound on new hit | No | Play a sound when a new hit is discovered. The sound is only played once for each "screen" of new hits. For example, if two new hits are found in one scrape, the sound will play once. If one of the hits goes away but the other remains, and it's still new based on the New HIT Highlighting number, the sound will not play because it already has. |
Ding | Ding | Which sound you want to hear: the old-style "Ding", or the new-style "Squee" (best pony approved) |
Sort by TO pay | No | Sorts hits by TO pay with lowest numbers on top, highest numbers on the bottom, and "No TO" requesters at the very bottom. When selected, you get the option to change the sort between ascending and descending. |
Sort by TO overall | No | Sorts hits based on Feihtality's TO calculations. They're wizardry; I'm not even really sure how they work. I think they take all the TO categories, as well as the number of reviews and weightings, into account. |
Min pay TO | None | Allows you to set a minimum "Pay" TO threshold (0-5). Any hits with a "Pay" TO below that threshold will be hidden. You can click on the "Show hits below TO threshold" button to see them. This button only appears if you're using this option. See important note below. |
Hide no TO | No | Hides requesters who do not have a TO (not recommended). See important note below. |
Disable TO | No | Turns off TO checking altogether; the TO Pay column will report "TO Disabled". Used for when TO is blocked; should speed things up a bit by not querying the TO server. This will invalidate any other TO configurations. |
Display export buttons | No | Shows/hides different export buttons. If, for example, you only wish to export to IRC, you can select only that button, and it will hide the rest. Note that, if logged out, VB is disabled regardless of whether it is selected or not. See below for an explanation. |
Search Terms | None | Allows you to search mturk for given terms. This is the same as searching the mturk interface. All results will contain one or more of your terms. |
Restrict to includelist | No | Allows you to only show requesters on your "include list". You must have an include list set before using this option or you will get no results. It will do normal searches, but any requester NOT on your include list will be ignored. |
Hide blocklisted | Yes | Enables/disables the blocklist. If you are not using the blocklist, any hits that WOULD have been blocked are outlined in red. |
Highlight Includelisted | No | Adds a highlight to any requester on your include list, even when "Restrict to includelist" is not checked. |
Start | Button | Starts scraping |
Hide Panel | Button | Hides everything above the buttons to give you more room. It's a toggle, so clicking it once will hide, and clicking it again will show. |
Edit Blocklist | Button | Opens the blocklist for manual editing if you need to remove a name or something. Blocklist and include list items are delimited by the ^ symbol. |
Edit Includelist | Button | Opens the include list for manual editing to add or remove requesters. Blocklist and include list items are delimited by the ^ symbol. |
Edit Current Theme | Button | Opens the theme editor on the right. Mouseover each box to see what it refers to, click to get a color selector. Also, click "revert to default" to go back to default settings (a rescrape may be required) |
Settings | Button | Change the way the TO calculations work, and enable/disable the blocklist wildcard functionality. |
Show TO-hidden hits | Hidden Button | See Min pay TO |
Stopped | Status message | Shows you the status of hit scraper: whether it's stopped, scraping, running, waiting, etc. |
Status messages | Status message | Very "dumb" status message indicator attempting to shed some light on why some things work and others don't, and also on why hit scraper's doing something it "shouldn't be". |
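To make the interplay of "Minimum reward", "Sort types", and "Invert" a bit more concrete, here is a minimal sketch. It is not HS's actual code: the hit object shape (`title`, `reward`, `available`, `created`) and the names `comparators` and `filterAndSort` are assumptions made purely for illustration.

```javascript
// Illustrative only: the hit fields and function names are hypothetical,
// not taken from the Hit Scraper source.
const comparators = {
  latest: (a, b) => b.created - a.created,         // newest first
  available: (a, b) => b.available - a.available,  // most hits available first
  reward: (a, b) => b.reward - a.reward,           // highest reward first
  title: (a, b) => a.title.localeCompare(b.title), // A first
};

function filterAndSort(hits, { minReward = 0, sortType = "latest", invert = false } = {}) {
  const compare = comparators[sortType];
  if (!compare) throw new Error(`Unknown sort type: ${sortType}`);
  // Drop hits below the minimum reward; filter() returns a new array,
  // so the caller's list is left untouched by the sort below.
  const kept = hits.filter((hit) => hit.reward >= minReward);
  // "Invert" simply flips the sign of whichever comparator is active.
  kept.sort((a, b) => (invert ? -compare(a, b) : compare(a, b)));
  return kept;
}

// Example data (made up).
const hits = [
  { title: "Survey", reward: 0.5, available: 1, created: 300 },
  { title: "Batch tagging", reward: 0.05, available: 900, created: 100 },
  { title: "Transcription", reward: 2.0, available: 12, created: 200 },
];

const byReward = filterAndSort(hits, { minReward: 0.2, sortType: "reward" });
// byReward titles: ["Transcription", "Survey"]
const byRewardInverted = filterAndSort(hits, { minReward: 0.2, sortType: "reward", invert: true });
// byRewardInverted titles: ["Survey", "Transcription"]
```

With a minimum reward of .2, the $0.05 batch is dropped before sorting, which matches the "don't show hits under $0.20" behavior described in the table.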
Some of the elements in the settings list have informative mouseover text as well.
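Since the blocklist and include list are stored as `^`-delimited text, it may help to see roughly what that format looks like when it is split apart. The sketch below is only an illustration: the function names (`parseList`, `serializeList`, `isListed`) are made up, and the case-insensitive exact matching against requester name or title is an assumption, not a description of how HS really matches (wildcards are not covered here).

```javascript
// Illustrative only: function names and matching rules are assumptions,
// not the Hit Scraper implementation.

// Split a "^"-delimited string into clean entries, ignoring empty pieces.
function parseList(raw) {
  return raw
    .split("^")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}

// Join entries back into the "^"-delimited form used in the editor.
function serializeList(entries) {
  return entries.join("^");
}

// Assumed rule: a hit is listed if its requester name or its title
// matches an entry exactly, ignoring case.
function isListed(entries, hit) {
  const wanted = new Set(entries.map((entry) => entry.toLowerCase()));
  return wanted.has(hit.requester.toLowerCase()) || wanted.has(hit.title.toLowerCase());
}

// Example with made-up names.
const blocklist = parseList("Spammy Requester^Find the email address^");
// blocklist: ["Spammy Requester", "Find the email address"]
const blocked = isListed(blocklist, { requester: "spammy requester", title: "Short survey" });
// blocked: true
```

So when editing a list by hand, removing a name just means deleting that entry and one of the `^` symbols next to it.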
The hit table comes under the status information. It's laid out like so:
Column | Links to | Description | Mouseover |
---|---|---|---|
Requester | Requester Page | Shows the requester name and links to their page. R and T buttons allow for blocking Requester and Title respectively | None |
Title | Hit preview page OR requester page | Shows the hit preview page if one can be created/viewed, OR the requester page if one cannot. Will note if the requester link is substituted. VB and IRC buttons open the hit exporter for forums and IRC respectively | Description of hit |
Reward | None | Shows how much the hit pays | None |
HITs Available | None | Shows how many hits are available at the time the page was scraped | None |
TO pay | Requester TO page | Shows the TO value for "pay" for that requester | Shows all TO ratings, number of reviews, and number of TOS flags for that requester |
Accept HIT | Requester Preview and Accept (PANDA) page OR requester page | As with "Title", it shows the PANDA link OR the requester page. See "Title" to know if the requester link is substituted | None |
M? | None | N means a non-masters hit, Y means a masters hit | Shows all qualifications for the hit |
R | HitDB search for requester OR nothing | If green, you've done a hit that matches that requester name; click it to view. If red, you haven't, and clicking does nothing | None |
T | HitDB search for title OR nothing | If green, you've done a hit that matches that title; click it to view. If red, you haven't, and clicking does nothing | None |
Not Qualified | None | Shows hits you are not qualified for. Only shows up if there are non-qual'd hits | None |
IMPORTANT NOTE
Past v2.0 updates: Hit scraper has gotten a MASSIVE makeover thanks to feihtality and his CSS wizardry, as well as his coding expertise. He (and click) have done a fantastic job helping me segue these changes into MY version of hit scraper to give them to all of you!
First of all, there's a big graphical update with theming. The themes are accessed via the square on the upper right; just mouseover to see the theme presets. You can also click the "Edit Theme" button to set your own color values, mouseover to see what each box does, and click in it to get a color picker to change colors on the fly.
Secondly, there have been some updates to how TO works. Feihtality did a major overhaul of how TO is calculated, how sorting works, etc. There was a lot of backend work done that doesn't affect everyone, but just understand TO may work differently than you're expecting. You can change TO calculations etc. back to normal via the "Settings" menu button. You can also use that to enable wildcard-based blocklisting, but I haven't really tested that (other than making sure it saved).
Finally, the logged out scraper has been castrated because of Mturk's changes. You can no longer get requester IDs or the number of hits available, so TO doesn't work, VB export doesn't work, and IRC/HWTF export have severely limited functionality.
v2.3.7.4-2.4.7: Various updates, graphical and non. See above for an abbreviated list.
v2.4.7.1: Fixed a typo, new hit counter in the status area should work again
v2.4.7.2: Changed ns4t function, should be faster now.
v2.4.7.3: Added a plugin on ns4t.net to allow for bulk URL processing. What this means is that, instead of 4 individual server calls when doing IRC export, you only have one. Cuts time down, reduces load on ns4t, everyone wins! This is still in testing, so if you have any problems with it please let me know.
v2.4.7.4: Fixed for logged out scrapers
v2.4.7.5/6: Fixed an issue with requester IDs and the ns4t bulk shortening method.
v2.4.7.7: Updated to work with the new hitDB, refactored the hitDB search sections.
v2.4.7.8: Commented out line 2147 to fix logged out scraper
v2.4.8: Added PANDA to default vbulletin template
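For anyone curious what the v2.4.7.3 change (one bulk call instead of 4 individual ones) amounts to, here is a rough sketch of the idea. The real ns4t.net request format is not described here, so `send` is a hypothetical stand-in for whatever performs the HTTP request, and the fake transport, the example.com URLs, and the `short:` placeholders are all invented for the demonstration.

```javascript
// Illustrative only: `send` is a hypothetical transport that takes an array
// of URLs and resolves to an array of shortened URLs in the same order.
// None of this reflects the real ns4t.net API.

// Old approach: one request per URL, so four links mean four server calls.
function shortenIndividually(urls, send) {
  return Promise.all(urls.map((url) => send([url]).then((short) => short[0])));
}

// Bulk approach: a single request carries every URL.
function shortenInBulk(urls, send) {
  return send(urls);
}

// Fake transport for demonstration: counts calls and returns placeholders.
function makeFakeSend() {
  const state = { calls: 0 };
  const send = (urls) => {
    state.calls += 1;
    return Promise.resolve(urls.map((url) => `short:${url}`));
  };
  return { send, state };
}

const urls = [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
  "https://example.com/d",
];

const individual = makeFakeSend();
shortenIndividually(urls, individual.send);
// individual.state.calls is now 4

const bulk = makeFakeSend();
shortenInBulk(urls, bulk.send);
// bulk.state.calls is now 1
```

Both functions resolve to the same list of shortened links; the only difference is how many times the server gets hit.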