HIT Scraper WITH EXPORT

Snag HITs. mturk.

Author: feihtality
Daily installs: 0
Total installs: 54,910
Ratings: 51 5 4
Version: 4.1.4
Created: 2015-06-24
Updated: 2017-12-22
License: N/A

worker sub-domain compatible

To use the script once installed, append hit_scraper, hit-scraper, or hitscraper to the path of any mturk URL. Any of the following URLs will work and are equally valid for initializing the script:

  https://worker.mturk.com/hitScraper
  https://worker.mturk.com/hit_scraper
  https://www.mturk.com/hit-scraper
  https://www.mturk.com/mturk/findhits?match=true&hit_scraper
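
The matching rule implied by these examples can be sketched in a few lines. This is an illustration in Python, not the script's own code: the pattern, the function name `is_scraper_url`, and the case-insensitive match (suggested by `hitScraper` appearing among the valid URLs) are all assumptions.

```python
import re
from urllib.parse import urlparse

# Hypothetical trigger pattern: "hit", an optional "_" or "-", then "scraper".
# Case-insensitive matching is an assumption based on the example URLs.
TRIGGER = re.compile(r"hit[_-]?scraper", re.IGNORECASE)


def is_scraper_url(url: str) -> bool:
    """Return True if an mturk URL carries one of the trigger keywords."""
    parts = urlparse(url)
    host = parts.hostname or ""
    on_mturk = host == "mturk.com" or host.endswith(".mturk.com")
    # The keyword may sit in the path or, as in the last example, the query.
    return on_mturk and TRIGGER.search(parts.path + "?" + parts.query) is not None


EXAMPLE_URLS = [
    "https://worker.mturk.com/hitScraper",
    "https://worker.mturk.com/hit_scraper",
    "https://www.mturk.com/hit-scraper",
    "https://www.mturk.com/mturk/findhits?match=true&hit_scraper",
]

for url in EXAMPLE_URLS:
    print(url, is_scraper_url(url))
```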

User Guide


Understanding the Interface


The top section, which holds the search settings and options, is internally called the Control Panel. It contains the options that users are likely to change more often (on a per-search or per-scrape basis) than the items in the Settings Panel, which is accessed through the Settings button.

Control Panel options

Auto-refresh delay
Controls how often (in seconds) a scrape will automatically be run. Setting this to 0 will force the scraper into manual mode, turning off automatic scraping.


Pages to scrape
Sets the minimum number of pages to retrieve.


Correct for skips
If more than 66% of the retrieved HITs are blocked by the blocklist, additional pages are retrieved until the blocked HITs make up less than 66% of the total results.
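
The interaction between Pages to scrape and Correct for skips can be sketched as a loop. This is a rough Python illustration under stated assumptions, not the script's implementation: `fetch_page`, `is_blocked`, and the `max_pages` safety cap are hypothetical, and the behavior at exactly 66% is a guess.

```python
def scrape_with_skip_correction(fetch_page, is_blocked, min_pages,
                                max_pages=10, threshold=0.66):
    """Fetch at least min_pages, then keep adding pages while too many
    results are blocked.

    fetch_page(n) is a hypothetical callable returning the HITs on page n
    (1-based), or an empty list when no more pages exist. max_pages is an
    assumed safety cap that the guide does not mention.
    """
    hits = []
    page = 0
    while page < max_pages:
        page += 1
        batch = fetch_page(page)
        if not batch:
            break  # ran out of results
        hits.extend(batch)
        if page < min_pages:
            continue  # the configured minimum has not been reached yet
        blocked = sum(1 for hit in hits if is_blocked(hit))
        if blocked / len(hits) <= threshold:
            break  # enough unblocked results, stop adding pages
    return hits, page


# Toy data: page 1 is mostly blocked, so a second page is pulled in.
PAGES = {1: ["spam"] * 8 + ["ok"] * 2, 2: ["ok"] * 10, 3: ["ok"] * 10}
hits, pages_used = scrape_with_skip_correction(
    lambda n: PAGES.get(n, []), lambda hit: hit == "spam", min_pages=1)
print(pages_used, len(hits))
```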


Results per page
Controls the number of results retrieved per page, up to a maximum of 100. It is typically better to increase the number of results per page than to increase the number of pages to scrape.


Minimum reward
Sets a minimum pay threshold.


Qualified
Limits results to only HITs for which you are qualified.


Masters Only
Limits results to only HITs that require the Masters qualification.


Hide Masters
Filters out HITs that require the Masters qualification while keeping all other HITs for which you may not be qualified. This setting is mutually exclusive with the Qualified setting. If both are selected, the Qualified setting will take precedence.
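
The precedence between Qualified and Hide Masters can be sketched as follows. This is a hedged Python illustration: the `Hit` record and the function name are hypothetical, and the real script may apply these options at query time rather than as a local filter.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record; field names are illustrative only.
    title: str
    qualified: bool
    requires_masters: bool


def apply_qualification_filters(hits, qualified_only=False, hide_masters=False):
    """Apply the Qualified and Hide Masters options to a list of HITs."""
    if qualified_only:
        # Qualified takes precedence when both options are selected.
        return [h for h in hits if h.qualified]
    if hide_masters:
        # Drop Masters HITs but keep other HITs the worker is not qualified for.
        return [h for h in hits if not h.requires_masters]
    return list(hits)


SAMPLE_HITS = [
    Hit("Survey", qualified=True, requires_masters=False),
    Hit("Masters task", qualified=False, requires_masters=True),
    Hit("Locked task", qualified=False, requires_masters=False),
]

print([h.title for h in apply_qualification_filters(SAMPLE_HITS, hide_masters=True)])
```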


Hide Infeasible
Filters out HITs with qualifications you can neither request nor take a test to obtain. Useful for filtering out location-based qualifications.

Minimum batch size
Sets a threshold for the number of HITs per HIT group. All HIT groups that contain fewer HITs than specified will be filtered out. This setting only applies when the Search by option is set to Most Available.
  • Global - forces the Minimum batch size value to apply to all search options, not only Most Available
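
As a small sketch of how Minimum batch size and Global interact, the check below keeps a HIT group unless the threshold is active and the group is too small. The function and parameter names are hypothetical and written in Python for illustration only.

```python
def passes_batch_size(hits_available, min_batch, search_by, apply_globally=False):
    """Return True if a HIT group survives the Minimum batch size filter."""
    # Without Global, the threshold only applies to Most Available searches.
    if search_by != "Most Available" and not apply_globally:
        return True
    # Groups with fewer HITs than the threshold are filtered out.
    return hits_available >= min_batch


print(passes_batch_size(5, 20, "Latest"))
print(passes_batch_size(5, 20, "Latest", apply_globally=True))
```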


New HIT highlighting
Sets the amount of time (in seconds) for which new HITs will be highlighted. Highlighted HITs appear in bold and in a larger font. Their cells in the results table are also outlined with a white dotted line, which is more prominent on some themes than others.


Sound on new HIT
Plays an audio alert when new HITs are found. There are two options: Ding and Squee.


Disable TO
Skips directly to displaying the scrape results without retrieving Turkopticon data.


Search by
Controls the method by which to query HITs from mturk.
  • Latest - HIT creation date (newest first)
  • Most Available - number of HITs available (most first)
  • Reward - reward amount (most first)
  • Title - alphabetical by title (A-Z)
  • Invert - inverts the ordering of the above selection
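
The orderings listed above, and the effect of Invert, can be mimicked locally with a short Python sketch. The script itself passes the choice to mturk as part of the query, so this only illustrates the resulting order; the dictionary keys and the `order_hits` name are hypothetical.

```python
# Each entry maps a Search by option to (sort key, descending by default?).
SORT_KEYS = {
    "Latest": (lambda h: h["created"], True),            # newest first
    "Most Available": (lambda h: h["available"], True),  # most first
    "Reward": (lambda h: h["reward"], True),             # most first
    "Title": (lambda h: h["title"].lower(), False),      # A-Z
}


def order_hits(hits, search_by, invert=False):
    """Return hits ordered by the chosen option, flipped when invert is set."""
    key, descending = SORT_KEYS[search_by]
    return sorted(hits, key=key, reverse=(descending != invert))


SAMPLE = [
    {"title": "b task", "reward": 0.50, "available": 10, "created": "2017-12-01"},
    {"title": "A task", "reward": 1.00, "available": 3, "created": "2017-12-20"},
]

print([h["title"] for h in order_hits(SAMPLE, "Reward")])
print([h["title"] for h in order_hits(SAMPLE, "Reward", invert=True)])
```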


Min pay TO
Sets a threshold on requesters' Turkopticon pay rating and hides all results with requesters below the specified value. Their visibility can be toggled via the Toggle Ignored HITs button.
Note: Requesters that have not been rated will not be affected by this setting.


Hide no TO
Hides all results from requesters that have no reviews on Turkopticon. Their visibility can be toggled via the Toggle Ignored HITs button.
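
Min pay TO and Hide no TO cover complementary cases, which a small decision function makes explicit. This Python sketch uses hypothetical names and assumes, for simplicity, that a requester with no Turkopticon reviews is represented by a pay rating of `None`.

```python
def is_ignored(pay_rating, min_pay_to=None, hide_no_to=False):
    """Return True if a result would be hidden as an ignored HIT.

    pay_rating is the requester's Turkopticon pay rating, or None when the
    requester has no reviews (a simplifying assumption of this sketch).
    """
    if pay_rating is None:
        # Unrated requesters are untouched by Min pay TO and are only
        # hidden when Hide no TO is enabled.
        return hide_no_to
    return min_pay_to is not None and pay_rating < min_pay_to


print(is_ignored(2.5, min_pay_to=3.0))
print(is_ignored(None, min_pay_to=3.0))
```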


Sort by TO pay
Sorts the results by Turkopticon pay rating.


Sort by overall TO
Sorts the results by overall Turkopticon ratings.


Search Terms
Keywords used to search for specific HITs or requesters.


Hide blocklisted
Hides all results that trigger a match against the blocklist.


Restrict to includelist
Hides all results that do not trigger a match against the includelist. If the includelist is empty, all results will be hidden.


Highlight includelist
Results that trigger a match against the includelist will be enclosed in a thick, green, dashed outline.
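
The three list options can be combined into one classification step, sketched below in Python. The names are hypothetical, and matching by case-insensitive substring is an assumption; the guide does not say how the script matches list entries or which list wins when a result matches both.

```python
def matches(text, terms):
    """Assumed matching rule: case-insensitive substring against any term."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def classify(text, blocklist, includelist, hide_blocklisted=False,
             restrict_to_includelist=False, highlight_includelist=False):
    """Decide whether a result is hidden and whether it is highlighted."""
    included = matches(text, includelist)
    hidden = (hide_blocklisted and matches(text, blocklist)) or (
        restrict_to_includelist and not included)
    return {"hidden": hidden, "highlighted": highlight_includelist and included}


print(classify("Quick survey by Acme", ["spam"], ["survey"],
               hide_blocklisted=True, highlight_includelist=True))
```

With Restrict to includelist enabled and an empty includelist, `matches` returns False for every result, so everything is hidden, which agrees with the behavior described above.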


Results Table

--

Additional Settings


Settings Panel options are already pretty well explained. This section is probably not necessary.