raybb
2 hours ago
I don't have an iPhone to try this, but I've been a long-time user of Tasks.org on Android, particularly because it supports CalDAV and works so well offline.
However, while we're on the topic of planning apps, you should know that Todoist added the best use of AI I've ever seen. It's called Ramble mode: you can just talk and it instantly starts showing a list of tasks that updates as you go. It is extraordinary. I'm considering switching away from Tasks.org for this one feature.
Here's a short video of it: https://www.youtube.com/watch?v=DIczFm3Dy5I
You need a paid plan (a free trial is OK) and to enable experiments before you can access it.
Anyone know how they might have done this?
sburud
2 hours ago
That’s cool! Slight fear of replicating the Dropbox comment here, but all you really need to do is run whisper (or some other speech2text), then once the user stops talking jam the transcript through an LLM to force it into JSON or some other sensible structure.
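Something like this, as a minimal sketch of that two-step pipeline using the OpenAI API for both steps (the model names, prompt, and task schema here are just examples, not what Todoist actually does):

    # Sketch: transcribe a finished utterance, then force the transcript
    # into structured tasks via JSON-constrained output.
    import json
    from openai import OpenAI

    client = OpenAI()

    def ramble_to_tasks(audio_path: str, existing_tasks: list[str]) -> list[dict]:
        # Step 1: speech-to-text on the recorded utterance.
        with open(audio_path, "rb") as f:
            transcript = client.audio.transcriptions.create(
                model="whisper-1", file=f
            ).text

        # Step 2: jam the transcript through an LLM, constrained to JSON.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": (
                    "Extract todo items from the user's rambling. Return JSON: "
                    '{"tasks": [{"title": str, "due": str | null}]}. '
                    f"Tasks already on the list: {existing_tasks}"
                )},
                {"role": "user", "content": transcript},
            ],
        )
        return json.loads(resp.choices[0].message.content)["tasks"]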
raybb
an hour ago
"once the user stops talking" is a key insight here for me. When using this I wasn't intentionally pausing to let it figure out an answer. It seemed to just pop up while I was talking. But upon experimenting some more it does seem to wait until here's a bit of a pause most of the time.
However, it's still wild to me how fast and responsive it is. I can talk for 10 seconds and then in ~500ms I see the updates. Perhaps it doesn't even transcribe, and instead feeds the audio to a multimodal LLM along with whatever tasks it already knows about? Or maybe it's transcribing live as you talk, and when you stop it sends the transcript to the LLM.
Anyone have a sense of what model they might be using?
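My guess at the "send on pause" part: run a cheap voice-activity detector over small audio frames and fire the LLM call once enough consecutive silence accumulates. A minimal sketch with webrtcvad; the frame size, the 500ms threshold, and the send_to_llm callback are all my assumptions:

    import webrtcvad

    SAMPLE_RATE = 16000          # webrtcvad supports 8/16/32/48 kHz mono PCM
    FRAME_MS = 30                # webrtcvad accepts 10/20/30 ms frames
    BYTES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 16-bit samples
    SILENCE_MS_TO_FIRE = 500     # the ~500ms pause observed in the app

    def stream_with_endpointing(frames, send_to_llm):
        """frames yields BYTES_PER_FRAME-sized chunks of raw PCM audio."""
        vad = webrtcvad.Vad(2)   # aggressiveness 0-3
        buffered, silent_ms = [], 0
        for frame in frames:
            buffered.append(frame)
            if vad.is_speech(frame, SAMPLE_RATE):
                silent_ms = 0
            else:
                silent_ms += FRAME_MS
            # Enough silence, and the buffer holds more than just silence:
            # treat the utterance as finished and update the task list.
            if silent_ms >= SILENCE_MS_TO_FIRE and len(buffered) > silent_ms // FRAME_MS:
                send_to_llm(b"".join(buffered))
                buffered, silent_ms = [], 0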
makingstuffs
an hour ago
I can't remember the exact number off the top of my head, and am clearly too lazy to google it, but there is a specific length of time after which, if no new sound comes through, the human brain processes it as a pause/silence.
I want to say 300ms, which would line up with your 500ms observation.
wisemang
20 minutes ago
This definitely depends on the individual. It’s one reason that during some conversations people can never seem to get a word in edgewise, even if the person speaking thinks they’re providing opportunities to do so. A mismatch in “pause length” can make for frustrating communication.
I am also too lazy to google or AI it, but it’s something I remember from when I taught ESL long ago.
SteveMorin
an hour ago
LLM to types and done
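If "LLM to types" means constrained, typed output, the OpenAI SDK can parse the reply straight into Pydantic models, skipping the hand-rolled JSON step entirely. A sketch; the Task/TaskList schema is invented for illustration:

    from openai import OpenAI
    from pydantic import BaseModel

    class Task(BaseModel):
        title: str
        due: str | None

    class TaskList(BaseModel):
        tasks: list[Task]

    client = OpenAI()
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Extract todo items from the rambling."},
            {"role": "user", "content": "gotta call the dentist and buy milk friday"},
        ],
        response_format=TaskList,
    )
    print(completion.choices[0].message.parsed)  # typed TaskList instance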