
As a sighted person, I consume all text as images when I read, so this kind of passes the "evolution does it that way" test. Maybe we shouldn’t be that surprised that vision is a great input method?

Actually, thinking more about that: I consume “text” as images and also as sounds. I kind of wonder whether, instead of rendering and OCR-ing like this suggests, we could do TTS and just encode something like an MP3 sample of the vocalization of each word, and whether that would be fewer bytes than the rendered-pixels version. It probably depends on the resolution and sample rate.
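A rough back-of-the-envelope sketch of that comparison, in Python. Every number here is an assumption picked for illustration (8x16 one-bit glyphs, 150 spoken words per minute, a 32 kbps MP3 stream, 5 letters plus a space per word), and both function names are made up for this example. It ignores image compression entirely, so treat it as a way to plug in your own resolution and bitrate rather than as a result.

```python
def image_bytes(chars, glyph_w=8, glyph_h=16, bits_per_pixel=1):
    """Uncompressed size of text rendered as fixed-width bitmap glyphs."""
    return chars * glyph_w * glyph_h * bits_per_pixel // 8


def audio_bytes(words, words_per_minute=150, bitrate_kbps=32):
    """Size of a constant-bitrate audio stream speaking the same words."""
    seconds = words * 60 / words_per_minute
    return round(seconds * bitrate_kbps * 1000 / 8)


words = 1000
chars = words * 6  # assumed: 5 letters plus one space per word

print("raw text bytes:", chars)
print("rendered pixels:", image_bytes(chars))
print("spoken audio:", audio_bytes(words))
```

With these particular assumptions the audio version comes out larger than the pixel version, but a higher-resolution or full-color render, or a lower-bitrate speech codec, could shift the balance either way.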



Funny, I habitually read while running TTS on the same text. I even made a Chrome extension for web reading: it highlights the text and reads it aloud while keeping the current position in the viewport. I find that using two modalities at the same time improves my concentration. The TTS is sped up to 1.5x to match my reading speed. Maybe it is just because I want to reduce visual strain. Since I consume a lot of text every day, it can be tiring.


This feature is also built into Edge (and I agree it's great, but I mostly use it so I can listen to pages while doing chores around the office or closing my eyes).

What I would love is an easy way to just convert the page to an MP3 that queues into my podcast app, so I can listen while taking a walk or driving. It probably exists, but I haven't spent a lot of time looking into it.


I do this too. It's great. The term I've seen used to describe this is 'Immersion Reading'. It seems to be quite a popular way for neurodivergent people to get into reading.


Any chance you could share the source?

I found that I can read better if individual words or chunks are highlighted in alternating pastel colors while I scan them with my eyes.


What’s your extension? Sounds interesting!


Just FYI, Firefox reader mode does the same thing. It's a little button in the address bar.


Reading mode in Chrome does this too, although the TTS sounds like it's far behind SOTA.


Probably because it needs to run locally on older CPUs, so it's likely using an old-school phonemizer that will run on a 15-year-old computer.


The pixels-to-sound conversion would pass through “reading”, so there might be information loss. It is no longer just pixels.



