JanitorAI - Text to Speech - Built-in/ElevenLabs/GeminiTTS

Text to Speech (TTS) integration for JanitorAI using built-in voices, ElevenLabs TTS, and Gemini TTS with emotion analysis and audio segmentation.

Question/comment

§
Posted: 2025/11/09

For some reason, it is not working for me anymore. It doesn't auto-generate TTS, and if I manually click the play button, it shows "playing" for two seconds (I don't hear any audio) and then stops by itself, even though there should be 30 or so seconds of audio.

It has only been two months since I last used this userscript, and the userscript hasn't been updated during that time so I don't know why it isn't working anymore.

§
Posted: 2025/11/09
Edited: 2025/11/09

Hi! That's because JAI updated their chat code: they changed the names of some HTML elements (IDs and classes), so the script can't find the text it's supposed to filter and apply TTS to. I'll update it when I get back home. (Also, I forgot to upload the ElevenLabs support update, mb lol. I'm also working on a script that adds a Live2D model and does lip sync with the TTS.)

§
Posted: 2025/11/09
Edited: 2025/11/09

Oh, okay. It makes sense that some element IDs and class names change. If you can find something you know won't change, you can sometimes use it alongside parent() to target the element that does change. For example, if element#neverchanges never changes and it is the child of element#sometimeschanges, you could call parent() on element#neverchanges to target element#sometimeschanges, and the userscript will still work even if #sometimeschanges is renamed.

I don't like paying for subscriptions or services, so I don't use ElevenLabs, but it is good that you are working on it for those who do. Zira's voice sounds like complete crap though, which is why I stopped using this userscript a couple of months ago. I wish you could figure out what this Chrome extension did: 'https://chromewebstore.google.com/detail/ms-edge-tts-text-to-speec/oajalfneblkfiejoadecnmodfpnaeblh', because for English > USA they have a 'Michelle' voice that is absolutely divine, and it doesn't cost anything to use. It's really weird, because that 'Michelle' voice is a SAPI5 voice, which is not supported in Chrome, but that extension found a way to add it, or a clone of it, anyway.

As for the Live2D model and lip syncing stuff, that has NEVER interested me. JanitorAI has its own thumbnail images that you can enlarge and look at during chat. For me, seeing some anime character covering my screen and talking at me creeps me out. But I do believe a lot of people will like that feature, including yourself, so I can understand if you keep working on it. I just hope it won't distract you or overwhelm you on the more important matters. Anyway, thank you for the reply. Have a great day.

Oh, another thing about your userscript is that it waits until the AI reply has finished typing before speaking it aloud. My AI responses are super long, so it's a long wait. If you ever figure out a way to make it speak at the same time it is typing, it would speed things up. But I think that would take a lot of work and code knowledge to do.
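
The parent() idea from this post can be sketched roughly as follows. This is only an illustration, not the userscript's actual code: `NodeLike` and `ancestorOf` are hypothetical names, and the sketch uses the standard DOM `parentElement` property in place of jQuery's parent(). The IDs `#neverchanges` and `#sometimeschanges` are the placeholder names from the post.

```typescript
// Minimal structural type, so the helper accepts real DOM Elements
// as well as plain objects (handy for testing outside a browser).
interface NodeLike {
  id?: string;
  parentElement: NodeLike | null;
}

// Walk up `levels` parents from a stable anchor element.
// Returns null if the anchor is missing or the chain is too short.
function ancestorOf(anchor: NodeLike | null, levels: number = 1): NodeLike | null {
  let node: NodeLike | null = anchor;
  for (let i = 0; i < levels && node !== null; i++) {
    node = node.parentElement;
  }
  return node;
}

// In a browser, usage might look like this (assumed selector):
//   const container = ancestorOf(document.querySelector("#neverchanges"));
// `container` is then the parent, whatever its current ID or class is.
```

The catch is that this only helps while the stable element keeps the same position relative to the one being targeted; if the site restructures its markup, the number of levels has to be adjusted.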

§
Posted: 2025/11/26

Hi! Okay, so I figured out how to use Edge TTS in the script. The thing is, it uses an NPM library, so you'll have to disable the Content Security Policy (CSP) in your Tampermonkey settings if you want it to work. Another problem is that it's not very stable: the audio sounds 'glitched', or like it has 'interference', when the text is around 100 words or more. Oh, and I also added an option to use TTS while the text is streaming. It applies TTS with a small, configurable delay (to ensure that there are no pauses). I'll run some tests to make sure it's compatible with my Live2D script before posting.
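
One way to speak while text is still streaming is to buffer the incoming chunks and hand only complete sentences to the TTS engine. The sketch below is an assumption about how that could look, not the script's real implementation: `SentenceBuffer` is a hypothetical name, and it uses sentence boundaries rather than the configurable delay mentioned above.

```typescript
// Collects streamed text and releases it one complete sentence at a time.
class SentenceBuffer {
  private pending = "";

  // Append a streamed chunk; return the sentences that are now complete.
  push(chunk: string): string[] {
    this.pending += chunk;
    const complete: string[] = [];
    // Treat ., ! or ? followed by whitespace as the end of a sentence.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.pending.slice(start, end).trim();
      if (sentence.length > 0) complete.push(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(start);
    return complete;
  }

  // Call once the stream has ended to get any remaining text.
  flush(): string {
    const rest = this.pending.trim();
    this.pending = "";
    return rest;
  }
}
```

Each sentence returned by `push` could be queued for speech right away, with `flush` called when the reply finishes. The boundary rule is deliberately naive, so abbreviations such as "e.g. " would be split early.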

§
Posted: 2025/11/26

It sounds like you made some major progress on many different things. Awesome. I don't use Tampermonkey. I was using Violentmonkey, but then I switched to ScriptCat around 1.5 or 2 weeks ago. I don't know if it has a CSP setting or not. Even if it did, I don't feel comfortable disabling security features. But I can test it in ScriptCat and report to you if it works or not. It is unfortunate that the sound gets glitchy after 100 words. My responses are like 5-6 paragraphs long, so that is definitely more than 100 words. But yeah, it sounds like you made some major changes. By the way, I still like JanitorAI, but I've mostly been using AI Dungeon for the last one or two months. You should give it a try. It's more advanced and is better overall.

§
Posted: 2025/12/06

By the way, you should consider moving this userscript to SleazyFork. GreasyFork and SleazyFork are the same thing, but GreasyFork is for PG website userscripts and SleazyFork is for NSFW website userscripts. Both websites use the same login information. If you do move it to SleazyFork, please let me know the new URL to your userscript. I haven't used JanitorAI or your userscript in ages, but I still want to keep tabs on it.

§
Posted: 2025/12/11

I used the userscript today for the first time in many months.
- The Playback Speed feature is nice and seems to be working. But it is only available for the Built-In TTS provider.
- The TTS doesn't work while the text is streaming. It waits until the AI response is fully typed out before it starts playing the voice.
- A Volume slider would be nice too. I like to keep my web browser at 100% but adjust the volume for each page or video that I am watching. If this had a volume slider, I could set the volume low while keeping the web browser at 100% volume.
- I could not get the Gemini TTS provider to work. I don't know where to get the API key from. I just copied my Google AI Studio API key and tried both Gemini Flash and Gemini Pro, but nothing happened. It would not play any voices at all, not even 0.0001 seconds of voice.
- I did not see "Michelle" listed as a voice. She's a voice that Edge TTS had that I liked a lot. I guess that's okay if the Gemini TTS voices are as good. But I couldn't hear any of them, so I don't know.
- I am not at all interested in Live2D stuff, so it would actually make me happy not to see that. But I did not see it, and it weirds me out because the userscript changelog said that it had been added. So basically just wondering why I don't see it if it has been added. I am on version 3.9.5.
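
For the volume slider request above, a minimal sketch of the mapping is shown here. It relies on the fact that both `SpeechSynthesisUtterance.volume` and `HTMLMediaElement.volume` in browsers accept values from 0 to 1; `sliderToVolume` is a hypothetical helper, and whether the script plays ElevenLabs or Gemini audio through an audio element is an assumption.

```typescript
// Convert a 0-100 slider position into the 0-1 range used by
// SpeechSynthesisUtterance.volume and HTMLMediaElement.volume.
function sliderToVolume(percent: number, fallback: number = 1): number {
  // Guard against NaN or Infinity, e.g. from an empty input field.
  if (!Number.isFinite(percent)) return fallback;
  // Clamp so out-of-range slider values never throw or distort.
  return Math.min(1, Math.max(0, percent / 100));
}

// In a browser, usage might look like this:
//   const utterance = new SpeechSynthesisUtterance("Hello");
//   utterance.volume = sliderToVolume(40);   // built-in voices
//   audioElement.volume = sliderToVolume(40); // assumed <audio> playback
```

This keeps the page's TTS quieter without touching the browser or system volume, which is the behavior the request describes.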

§
Posted: 2025/12/11
Edited: 2025/12/11

Ty for the feedback! I'll keep it in mind for a future patch :D

  • About the EdgeTTS voices: I originally planned to add them, but the needed library was quite unstable and caused audio glitches with long texts so I decided to remove them from the final version.

  • Similarly, I actually implemented a feature to speak while the text was streaming, but I had to remove it: different providers have different streaming speeds, which often caused choppy audio, and I wanted to ensure a smoother experience.

  • About Live2D: It works correctly, but it requires a separate script I made: [https://greasyfork.org/scripts/557179-janitorai-live2d-avatars-for-characters-with-tts-lip-sync]. Don't worry, it's completely optional! However, just keep in mind that if you do want to use the Live2D script you'll need to have this TTS script installed as well.

  • About GeminiTTS: AI Studio has been experiencing some problems lately. Many users report issues with it, especially on the free tiers.


You're absolutely right about the playback speed though. It seems to be an error that I'll definitely try to fix. I'll also make sure to keep in mind the volume slider feature you suggested. Thanks again for your feedback!
