A general-purpose web content crawler that can batch-scrape novels, forum posts, and other content from any site and save it as TXT files.
Thanks for the suggestion. Great work!
I don't think it's a good idea to add Turndown to this project. This script is meant for novel sites, and most of them are crammed with advertisements. If I converted the content with full Markdown support, the output would inevitably be cluttered.
Thank you.
Sometimes I do have to manually edit the Markdown files produced by Turndown, due to JavaScript and CSS that were caught in the process.
The HTML seems to work as expected most of the time, though I should improve it.
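For what it's worth, the stray JavaScript and CSS could probably be filtered out before conversion instead of by hand. Below is a minimal sketch of that idea in Python, using only the standard library. The names `VisibleTextExtractor` and `extract_text` are hypothetical and are not part of Turndown or of either script; it only illustrates dropping `<script>` and `<style>` contents while keeping the readable text.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML fragment, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<div><style>.ad { display: none; }</style>"
    "<p>Chapter 1</p><script>trackVisit();</script>"
    "<p>It was a dark night.</p></div>"
)
print(extract_text(sample))
```

In a userscript the same filtering would have to happen on the JavaScript side. Turndown documents a `remove` filter (for example `turndownService.remove(['script', 'style'])`) that should serve the same purpose, though I have not tested it against these sites.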
Add Markdown support.
Please take a look at my script, which supports plain text, Markdown, and HTML.