HIT Scraper WITH EXPORT Legacy

Snag HITs.

Author: Tjololo
Daily installs: 0
Total installs: 30,112
Ratings: 28 0 1
Version: 2.4.9
Created: 2014-06-02
Updated: 2016-03-10
Size: 252 KB
License: N/A

THIS SCRIPT IS OBSOLETE! Please update to the new and improved version here. If you really want to keep this script, you can still download it, but keep in mind that it will no longer be updated.

Hit_scraper with the hit export script added, CUZ IT'S MORE CONVENIENT! Here are a few guides, one on mturkforum.com and one on mturkgrind.com. There is also a screencast video located here where I ramble on about hit scraper for a time, giving a good overview of all the functionality it offers.

Additionally, there is a script by clickhappier located here that uses scraper's blocklist to block hits on the regular mturk search results interface as well.

v2.0 Major Update

Please see below for updates past v2.0, including graphical changes and other features.
I've been doing a lot of work (with copious help from clickhappier and others), and I figure enough's been done to go for a major version release. You'll find the changelog down below, but a major rundown of all features follows.

What is Hit Scraper WITH EXPORT and why should I download it?
Hit Scraper WITH EXPORT (hereafter referred to as HS) at its core is really just a different way of looking at mturk pages. Its purpose was to take the place of several other scripts people were using every day, and to make a unified, easy-to-understand interface that everyone can use with minimal training. That being said, HS still has a ton of features to enhance your turking and make your life a lot easier.

How do I use HS?
To use HS, you need to visit this URL. Bookmark it so you don't forget. If HS doesn't load right away, try refreshing a few times. If it still doesn't load, there might be an issue, and I'll try to figure it out.

When you get to that page, you'll see the main interface. (The screenshot is actually very old; I will be redoing it at some point.) The interface should be pre-populated with some default data. You can start going right away by clicking "Start", or you can customize it as shown below.


Option | Default | Description
Auto-refresh delay | 0 | How many seconds will elapse before the page starts scraping again. 0 is manual scrape only. E.g. 10 = scrape 10 seconds after the last scrape finished.
Pages to scrape | 3 | How many pages you want HS to look at.
Correct for skips | No | If you have a lot of hits on your blocklist, you might end up blocking a lot of hits. "Correct for skips" will search additional pages to "fill up" your results. If "Correct for skips" is off, it will ONLY search the number of pages you select in "Pages to scrape".
Minimum batch size | 100 (not specified) | For searching for batches. This does not matter unless you sort by most available.
Minimum reward | None | Minimum dollar reward you want HS to show. E.g. 1 = don't show hits under $1; .2 = don't show hits under $0.20.
Qualified | Yes if logged in, No if logged out | If yes, only show hits you're qualified for. If no, show all hits regardless of whether you qualify.
Masters Require | No | If yes, only show masters hits. If no, show all hits.
Masters Show | Show | If set to "Show", it will show both masters and non-masters hits (not applicable if you don't have masters and have "Qualified" checked). If set to "Hide", it will remove masters hits from the results.
Sort types | Latest | Latest sorts by time created, newest first. Most available is by number of hits available, most first. Reward is by monetary reward, highest first. Title is alphabetical by title, A first.
Invert | No | Reverses the order of the sort type. Latest = oldest hits first; Most available = fewest hits available first; Reward = lowest reward first; Title = Z first.
New HIT Highlighting | 300 | Hits that are new to the scrape show up in bold. This number determines how long they will remain that way, in seconds.
Sound on new hit | No | Play a sound when a new hit is discovered. The sound is only played once for each "screen" of new hits. For example, if two new hits are found in one scrape, the sound will play once. If one of the hits goes away, but the other remains, and it's still new based on the New HIT Highlighting number, the sound will not play because it already has.
Ding | Ding | Which sound you want to hear: the old-style "Ding", or the new-style "Squee" (best pony approved).
Sort by TO pay | No | Sorts hits by TO pay with lowest numbers on top, highest numbers on the bottom, and "No TO" requesters at the very bottom. When selected, you get the option to change the sort to ascending/descending.
Sort by TO overall | No | Sorts hits based on Feihtality's TO calculations. They're wizardry, I'm not even really sure how it works. I think it takes all the TO categories, as well as #reviews and weightings, into account.
Min pay TO | None | Allows you to set a minimum "Pay" TO threshold (0-5). Any hits with a "Pay" TO below that threshold will be hidden. You can click on the "Show hits below TO threshold" button to see them. This button only appears if you're using this option. See important note below.
Hide no TO | No | Hides requesters who do not have a TO (not recommended). See important note below.
Disable TO | No | Turns off TO checking altogether; the TO Pay column will report "TO Disabled". Used for when TO is blocked; should speed things up a bit by not querying the TO server. This will invalidate any other TO configurations.
Display export buttons | No | Shows/hides different export buttons. If, for example, you only wish to export to IRC, you can select only that button, and it will hide the rest. Note that, if logged out, VB is disabled regardless of whether it is selected or not. See below for an explanation.
Search Terms | None | Allows you to search mturk for given terms. This is the same as searching the mturk interface. All results will contain one or more of your terms.
Restrict to includelist | No | Allows you to only show requesters on your "include list". You must have an include list set before using this option or you will get no results. It will do normal searches, but any requester NOT on your include list will be ignored.
Hide blocklisted | Yes | Enables/disables the blocklist. If you are not using the blocklist, any hits that WOULD have been blocked are outlined in red.
Highlight Includelisted | No | Adds a highlight to any requester on your include list, even when "Restrict to includelist" is not checked.
Start | Button | Starts scraping.
Hide Panel | Button | Hides everything above the buttons to give you more room. It's a toggle, so clicking it once will hide, once more will show.
Edit Blocklist | Button | Opens the blocklist for manual editing if you need to remove a name or something. Blocklist and include list items are delimited by the ^ symbol.
Edit Includelist | Button | Opens the include list for manual editing to add or remove requesters. Blocklist and include list items are delimited by the ^ symbol.
Edit Current Theme | Button | Opens the theme editor on the right. Mouse over each box to see what it refers to, click to get a color selector. Also, click "revert to default" to go back to default settings (a rescrape may be required).
Settings | Button | Change the way the TO calculations work, enable/disable the blocklist wildcard functionality.
Show TO-hidden hits | Hidden Button | See Min pay TO.
Stopped | Status message | Shows you the status of hit scraper: whether it's stopped, scraping, running, waiting, etc.
Status messages | Status message | Very "dumb" status message indicator attempting to shed some light on why some things work and others don't. Also why hit scraper's doing something it "shouldn't be".
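
To make the filter and sort options above a bit more concrete, here is a minimal Python sketch of how "Minimum reward", "Masters Show", "Sort types" and "Invert" could combine. This is not HS's actual code (HS is a userscript); the `Hit` record, its field names, and the `apply_options` function are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record; field names are assumptions, not HS internals.
    requester: str
    title: str
    reward: float     # dollars
    available: int    # HITs available when the page was scraped
    created: int      # stand-in timestamp, larger = newer
    masters: bool = False


def apply_options(hits, min_reward=None, masters_show=True,
                  sort_type="latest", invert=False):
    """Filter and sort hits the way the options table describes."""
    kept = [
        h for h in hits
        if (min_reward is None or h.reward >= min_reward)
        and (masters_show or not h.masters)
    ]
    # Each sort type maps to (key, whether it is descending by default).
    sorts = {
        "latest": (lambda h: h.created, True),           # newest first
        "most available": (lambda h: h.available, True),  # most first
        "reward": (lambda h: h.reward, True),             # highest first
        "title": (lambda h: h.title.lower(), False),      # A first
    }
    key, descending = sorts[sort_type]
    # "Invert" simply flips whatever the default direction is.
    return sorted(kept, key=key, reverse=descending != invert)


hits = [
    Hit("A", "beta", 0.10, 5, 1),
    Hit("B", "alpha", 1.50, 200, 2, masters=True),
    Hit("C", "gamma", 0.25, 50, 3),
]
# Minimum reward of .2 drops the $0.10 hit; the rest come back newest first.
print([h.title for h in apply_options(hits, min_reward=0.2)])
```

Keeping the default direction next to each sort key means "Invert" needs no special cases per sort type.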




Some of the elements in the settings list have informative mouseover text as well.
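
The "New HIT Highlighting" and "Sound on new hit" rows are easier to follow with a small model. The Python sketch below is only an illustration of the behavior described in the table, not HS's real implementation. The `NewHitTracker` class and its hit ids are hypothetical, and it assumes a hit that disappears and later comes back is not treated as new again.

```python
class NewHitTracker:
    """Toy model of new-HIT bolding and the once-per-discovery sound."""

    def __init__(self, highlight_seconds=300):
        self.highlight_seconds = highlight_seconds
        self.first_seen = {}  # hit id -> time (seconds) it was first scraped

    def scrape(self, hit_ids, now):
        """Return (ids to show in bold, whether the sound would play)."""
        discovered = [h for h in hit_ids if h not in self.first_seen]
        for h in discovered:
            self.first_seen[h] = now
        # A hit stays bold until highlight_seconds have passed since first seen.
        bold = {
            h for h in hit_ids
            if now - self.first_seen[h] < self.highlight_seconds
        }
        # One sound per scrape that discovers anything, however many hits.
        return bold, bool(discovered)


tracker = NewHitTracker(highlight_seconds=300)
print(tracker.scrape(["a", "b"], now=0))    # two new hits, one sound
print(tracker.scrape(["b"], now=10))        # "b" still bold, no sound
print(tracker.scrape(["b", "c"], now=400))  # "b" expired, "c" is new
```

The sound decision depends only on whether the current scrape found something never seen before, which is why a hit that is still bold from an earlier scrape does not trigger it again.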

The hit table comes under the status information. It's laid out like so:


Column | Links to | Description | Mouseover
Requester | Requester page | Shows the requester name and links to their page. R and T buttons allow for blocking Requester and Title respectively | None
Title | Hit preview page OR requester page | Shows the hit preview page if one can be created/viewed, OR the requester page if one cannot. Will note if the requester link is substituted. VB and IRC buttons open the hit exporter for forums and IRC respectively | Description of hit
Reward | None | Shows how much the hit pays | None
HITs Available | None | Shows how many hits are available at the time the page was scraped | None
TO pay | Requester TO page | Shows the TO value for "pay" for that requester | Shows all TO ratings, number of reviews, and number of TOS flags for that requester
Accept HIT | Requester Preview and Accept (PANDA) page OR requester page | Similarly to the "Title", it shows the PANDA link OR the requester page. See "Title" to know if the requester link is substituted | None
M? | None | N means a non-masters hit, Y means a masters hit | Shows all qualifications for the hit
R | HitDB search for requester OR nothing | If green, you've done a hit that matches that requester name, click it to view. If red, you haven't, and clicking does nothing | None
T | HitDB search for title OR nothing | If green, you've done a hit that matches that title, click it to view. If red, you haven't, and clicking does nothing | None
Not Qualified | None | Shows hits you are not qualified for. Only shows up if there are non-qual'd hits | None
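
As a rough illustration of how the "TO pay" value interacts with the "Min pay TO" and "Hide no TO" options, here is a short Python sketch. It is a guess at the behavior, not HS's code. The `split_by_to` function is hypothetical, and it assumes that requesters without a TO pass the threshold unless "Hide no TO" is on, and that both kinds of hidden hits end up in the same hidden group.

```python
def split_by_to(hits, min_pay_to=None, hide_no_to=False):
    """Split hits into (shown, hidden) titles.

    hits: list of (title, pay_to) pairs, where pay_to is a number
    from 0 to 5, or None when the requester has no TO.
    """
    shown, hidden = [], []
    for title, pay_to in hits:
        if pay_to is None:
            # Assumption: no-TO requesters are only hidden by "Hide no TO".
            (hidden if hide_no_to else shown).append(title)
        elif min_pay_to is not None and pay_to < min_pay_to:
            # Below the threshold: hidden, but still retrievable.
            hidden.append(title)
        else:
            shown.append(title)
    return shown, hidden


hits = [("a", 4.5), ("b", 2.0), ("c", None)]
print(split_by_to(hits, min_pay_to=3))
print(split_by_to(hits, min_pay_to=3, hide_no_to=True))
```

Returning the hidden group instead of discarding it mirrors the "Show hits below TO threshold" button, which lets you look at what was filtered out.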





IMPORTANT NOTE

Past v2.0 updates: Hit scraper has gotten a MASSIVE makeover thanks to feihtality and his CSS wizardry, as well as his coding expertise. He (and click) have done a fantastic job helping me segue these changes into MY version of hit scraper to give them to all of you!

First of all, there's a big graphical update with theming. The themes are accessed via the square on the upper right; just mouse over it to see the theme presets. You can also click the "Edit Theme" button to set your own color values, mouse over each box to see what it does, and click in it to get a color picker to change colors on the fly.

Secondly, there have been some updates to how TO works. Feihtality did a major overhaul of how TO is calculated, how sorting works, etc. There was a lot of backend work done that doesn't affect everyone, but just understand TO may work differently than you're expecting. You can change TO calculations etc. back to normal via the "Settings" button. You can also use that to enable wildcard-based blocklisting, but I haven't really tested that (other than making sure it saved).
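
For anyone curious what a `^`-delimited blocklist with optional wildcards might look like in practice, here is a small Python sketch. It is purely illustrative. The function names are made up, and both the `*` glob syntax (via Python's `fnmatch`) and the case-insensitive matching are assumptions, since the wildcard rules HS actually uses aren't spelled out here.

```python
from fnmatch import fnmatchcase


def parse_list(raw):
    """Split a ^-delimited blocklist or include list into clean entries."""
    return [item.strip() for item in raw.split("^") if item.strip()]


def is_blocked(requester, title, blocklist, wildcards=False):
    """True if the requester name or the hit title matches any entry."""
    for entry in blocklist:
        pattern = entry.lower()
        for value in (requester.lower(), title.lower()):
            if wildcards:
                # Assumed glob-style matching, e.g. "survey about *".
                if fnmatchcase(value, pattern):
                    return True
            elif value == pattern:
                return True
    return False


blocklist = parse_list("Bad Requester^Survey about *^")
print(blocklist)
print(is_blocked("Bad Requester", "Anything", blocklist))
print(is_blocked("Good Co", "Survey about cats", blocklist, wildcards=True))
```

With wildcards off, the entry "Survey about *" only matches a title that is literally that string, so turning the setting on changes which hits get blocked.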

Finally, the logged out scraper has been castrated because of Mturk's changes. You can no longer get requester IDs or #hits available, so TO doesn't work, VB export doesn't work, and IRC/HWTF export have severely limited functionality.

v2.3.7.4-2.4.7: Various updates, graphical and non. See above for an abbreviated list.
v2.4.7.1: Fixed a typo, new hit counter in the status area should work again
v2.4.7.2: Changed ns4t function, should be faster now.
v2.4.7.3: Added a plugin on ns4t.net to allow for bulk URL processing. What this means is that, instead of 4 individual server calls when doing IRC export, you only have one. Cuts time down, reduces load on ns4t, everyone wins! This is still in testing, so if you have any problems with it please let me know.
v2.4.7.4: Fixed for logged out scrapers
v2.4.7.5/6: Fixed an issue with requester IDs and the ns4t bulk shortening method.
v2.4.7.7: Updated to work with the new hitDB, refactored the hitDB search sections.
v2.4.7.8: Commented out line 2147 to fix logged out scraper
v2.4.8: Added PANDA to default vbulletin template
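
The bulk URL processing mentioned under v2.4.7.3 comes down to batching: send all of the links for one export in a single request instead of one request per link. The Python sketch below shows the idea with an in-memory stand-in. `FakeShortener` and the example URLs are hypothetical and have nothing to do with the real ns4t.net interface.

```python
import zlib


class FakeShortener:
    """In-memory stand-in for a URL shortener that counts server calls."""

    def __init__(self):
        self.calls = 0

    def shorten(self, url):
        self.calls += 1  # one simulated round trip per URL
        return self._code(url)

    def shorten_bulk(self, urls):
        self.calls += 1  # one simulated round trip for the whole batch
        return [self._code(u) for u in urls]

    @staticmethod
    def _code(url):
        # Deterministic fake short code; a real service would assign its own.
        return format(zlib.crc32(url.encode()), "08x")


# Four placeholder links standing in for the ones an IRC export needs.
urls = [
    "https://example.com/preview",
    "https://example.com/panda",
    "https://example.com/requester",
    "https://example.com/to",
]

one_by_one = FakeShortener()
individual = [one_by_one.shorten(u) for u in urls]

batched = FakeShortener()
bulk = batched.shorten_bulk(urls)

print(one_by_one.calls, batched.calls)
```

Both approaches return the same short codes; only the number of round trips differs, and that is where the time savings and the reduced server load come from.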