An early adopter's thoughts on Rewind.ai's $350m pivot

For their AI assistant: local & desktop context is OUT; cloud & audio context is IN

Apr 15, 2024

Disappointed Rewind team cuts screen recording and on-device data storage in pivot to Limitless.ai
Optimistic that Limitless.ai will be competitive in audio assistants with Otter.ai and Tab
Building Rewind-inspired Dozy.ai, a desktop emotional support animal to make staying on-task and hitting goals delightful

Today, Rewind CEO Dan Siroker announced a new cloud-based app for meeting recording - Limitless.ai.

Limitless is a major pivot from the local-first screen + audio recording app that led Rewind.ai to win Product Hunt's 2022 Most Innovative Award and raise a Series A at a $350m valuation in May 2023.

While Rewind will continue to be supported for now, the phase-out has begun. Dan's announcement on the Rewind Slack says, "We hope over time you’ll come to agree with us that the Limitless approach is superior and you will use it exclusively."

The commitment to the pivot is proven by the Rewind X bio:

Surprise

Some members of the Rewind community had suspected a pivot ever since this cryptic tweet from Dan in December.

The rate of Rewind updates had dropped off dramatically in the last several months. The last update to the macOS app changelog was in February. Deep linking was the major feature this year. The lack of updates had not gone unnoticed on the Rewind Slack:

Some of the most anticipated Rewind features had not been delivered:

Local/swappable LLMs
Large multilingual Whisper model option
Better battery life efficiency
More granular privacy settings
Transparency and customizability of search
Windows support

We knew the Rewind team was working on something big, but a rename and an entirely new app came as a surprise to my friends and me. Limitless seems closer to a spiritual successor to Scribe.ai, the cloud-based meeting transcription tool that Dan and team were working on before Rewind in 2020/2021. The Limitless roadmap shows more audio-focused features to come.

Comparing Rewind and Limitless

Upsides of the Limitless approach:

The swap away from OpenAI's mid-level Whisper model will be a substantial improvement in transcription quality
Users can take their meetings away from their laptop and still get transcription
Supports Windows + web in addition to macOS
Lower battery impact and system requirements
Saves device storage
Proprietary Confidential Cloud claims to be subpoena-proof (published audit would be nice)
Reads your Gmail

Downsides:

Reads your Gmail
No mobile app
While Limitless promises to keep your data safe, Rewind's privacy model required little trust from users, and that is weakened now that data is being uploaded to the cloud
Third-party transcription partners will get users' anonymized audio data and may keep it for up to 30 days. It is unclear whether de-anonymization is possible.
Desktop context in the form of screenshots and current-app data is no longer retrieved. This may come in the summer. Possibly meeting-only. Dan says, “…maybe even capturing what you see on your screen, similar to how Rewind works”
Limitless currently only supports English transcriptions and meetings from a single Google account
No clear migration path from Rewind to Limitless

Analysis

Why such a big pivot?

A year after launch, the team may have concluded that most paying users didn't prioritize the desktop context features or local-only transcription. Otter.ai, fireflies.ai, and others have proven a market in the cloud-based meeting transcription segment. Limitless may hope to differentiate with the Pendant audio-recording wearable and audio-context-based agent functionality, thereby emerging as an innovator in a lucrative segment.

Rewind also uses a substantial amount of on-device storage (~11 GB/month), and a cloud storage approach makes it easier to justify charging, build user lock-in, and store more data.

The Pendant assistant focus merited a rename. The cloud-based storage model merited an application rewrite.

Reflections as a Month 1 Rewind user

I'm disappointed.

What made Rewind such a standout was its bold step towards the vision of a trusted home base for all your personal context. You could trust it more, because the raw data was (mostly) kept on-device in an encrypted database. The screen recording aspect also gave a sense of safety that you could always find something lost. There was the expectation of integrating other sources of data - maybe if the transcription was weak, you could import cloud-based transcriptions, etc.

The future seemed bright. I dreamed of what personal assistants could rise from this trustable, data-rich soil. In the limit, I imagined Rewind understanding everything that I did, finding opportunities to support me. I imagined it as the individual-centric, instead of organization-centric, version of Task Mining; passively identifying automation opportunities in my workflow.

Even if vision is added to Limitless, I bet it will use cloud storage. Maybe even cloud vision, but I think the unit economics are too tough there and privacy ask too extreme. Limitless could win some developer hearts and minds by promising to release a self-hosted version, like Bitwarden.

The most common reason I heard for users quitting Rewind was performance - Rewind turned their new MacBooks “in to a toaster” and/or drained too much battery. The value that Rewind created was not quite enough to compensate. Computer vision (OCR), even when optimized for M1, is computationally intensive, and my finding is that the Rewind team only used the most basic heuristic (screenshot hash comparison) to spare compute.
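To make the "screenshot hash comparison" heuristic concrete, here is a minimal Python sketch of what such a check could look like. It is my own illustration, not Rewind's code: the class name `FrameDeduplicator` and the choice of SHA-256 are assumptions. The idea is simply to skip OCR whenever a new screenshot is byte-for-byte identical to the previous one.

```python
import hashlib


class FrameDeduplicator:
    """Hypothetical helper: skip OCR when a frame is identical to the last one."""

    def __init__(self) -> None:
        self._last_digest = None

    def should_process(self, frame_bytes: bytes) -> bool:
        # Hash the raw screenshot bytes; any single changed pixel changes the digest.
        digest = hashlib.sha256(frame_bytes).hexdigest()
        if digest == self._last_digest:
            return False  # identical to the previous frame, so OCR can be skipped
        self._last_digest = digest
        return True
```

The weakness of this approach is visible in the sketch: a blinking cursor or a ticking clock changes the hash, so the whole screen is sent through OCR again even though almost nothing moved.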

The second most common reason users I talked to quit Rewind was that the LLM features weren't good enough. Ask Rewind and Meeting Summaries were impressive prototypes, but for me they always hovered right at the edge of the accuracy needed for reliable use. Rewind doesn't give users visibility into, or control over, how exactly their data is marshaled to create these texts, so improvement felt intractable. Local voice transcription in 2024 is not nearly as accurate as cloud transcription, so, in their defense, garbage in means garbage out. However, anyone who has used macOS's built-in Vision OCR (VNRecognizeTextRequest) knows it's quite good at getting text from screenshots, far better than Tesseract, so the Ask Rewind limitations feel frustrating.

Overall it looks like the team got hooked (again) into meeting intelligence. Dan is a visionary, and I'd never bet against him and the stellar Limitless team in finding innovation and success in the audio context space. I'll miss their pioneering in the desktop context space, and especially in the local data movement. Perhaps Limitless will be back with their developer platform later this year.

What's next?

I believe there is tremendous appetite for applications built atop desktop context, and not just because I’ve read every feature request in the Rewind Slack.

Apple has a research team publishing on AI that leverages screen context to improve audio assistant performance.

The 193 Hacker News comments (and 555 upvotes) on the launch of Rem, one of several open-source Rewind competitors, show that many engineers have homebrewed screen recording tools for record keeping. Desktop productivity trackers like Rize explore parts of how current-app context can be used to coach people.

What I think is coming next is a desktop engine for high-context personal assistants. Think of the data that Rewind's macOS app collects, but in an extensible data platform like Segment. In other words, a private control center for desktop context and automation. It would allow granular permissions, so I can decide which local or cloud-based assistants get what kind of information about me and what they can automate on my computer.

I envision a single platform per device that takes screenshots, runs computer vision, filters sensitive information, and provides the clarified, permissioned data to 3rd-party personal AI assistants, for example in email, productivity, timesheeting, entertainment, or scheduling. It doesn't make sense for each assistant to run its own pipeline because of the substantial resource demands of on-device vision.
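As a rough illustration of the "filter, then permission" idea, here is a small Python sketch. Everything in it is hypothetical: the names `ContextBroker`, `ScreenEvent`, and `CARD_PATTERN` are mine, permissions are simplified to per-app grants, and the only redaction rule is a regex for card-like numbers. A real platform would need far richer policies.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Assumed redaction rule: mask anything that looks like a 13-16 digit card number.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


@dataclass
class ScreenEvent:
    app: str   # app that was in the foreground when the text was captured
    text: str  # OCR text extracted from the screenshot


@dataclass
class ContextBroker:
    # Maps an assistant name to the set of apps it is allowed to see.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, assistant: str, app: str) -> None:
        self.grants.setdefault(assistant, set()).add(app)

    def view_for(self, assistant: str, event: ScreenEvent) -> Optional[ScreenEvent]:
        # No grant for this app means the assistant sees nothing at all.
        if event.app not in self.grants.get(assistant, set()):
            return None
        # Sensitive text is redacted before any assistant receives it.
        return ScreenEvent(event.app, CARD_PATTERN.sub("[REDACTED]", event.text))
```

In this sketch a scheduling assistant granted access to "Calendar" would receive redacted Calendar text and nothing from Mail, while the expensive capture and OCR work happens only once per device.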

Bonus: My connection

I'm currently working on a desktop virtual emotional support animal called Dozy.ai that makes staying focused and hitting goals feel delightful.

Before building Dozy, I spent months reimplementing the locally encrypted Rewind vision + current-app stack to be sure I'd have the data needed to be helpful (Rewind is not extensible). It was admittedly overkill - Dozy doesn’t use vision yet - and I call it the TopSecret engine.

In apples-to-apples performance testing, the full TopSecret engine used 2.5x less battery than Rewind.ai at identical image quality, OCR accuracy, and compressed storage use. The tricks, at a high level, are using a performant Swift screenshotting library and a suite of screenshot pixel-diffing heuristics that only run OCR on regions of the screen that have changed. It turns out to have worked really well, saving over an hour of battery life on my M1 Max 16”. Feels good!
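For readers curious what "only run OCR on regions that have changed" can mean in practice, here is a simplified Python sketch of tile-based pixel diffing. It is not the TopSecret engine (which is written in Swift and uses several heuristics); the function name `changed_tiles`, the tile size, and the grayscale list-of-lists frame format are all assumptions made for illustration, and both frames are assumed to have the same dimensions.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale pixels as rows of 0-255 values


def changed_tiles(
    prev: Frame, curr: Frame, tile: int = 2, threshold: int = 0
) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, width, height) boxes for tiles whose pixels differ."""
    height, width = len(curr), len(curr[0])
    boxes = []
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            h = min(tile, height - y)  # clip tiles at the bottom edge
            w = min(tile, width - x)   # clip tiles at the right edge
            # Sum of absolute pixel differences inside this tile.
            diff = sum(
                abs(curr[yy][xx] - prev[yy][xx])
                for yy in range(y, y + h)
                for xx in range(x, x + w)
            )
            if diff > threshold:
                boxes.append((x, y, w, h))  # only these regions would need OCR
    return boxes


prev = [[0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[3][3] = 255  # simulate a small change in the bottom-right corner
print(changed_tiles(prev, curr))  # [(2, 2, 2, 2)]
```

Compared with hashing the whole screenshot, a small change such as a blinking cursor sends one tile to OCR instead of the full screen, which is where the compute savings would come from.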

Activity Monitor screenshot after 33 minutes of active macOS use. Rewind audio recording disabled. Compare the “12 hr Power” column.

The plan is to focus on making Dozy.ai awesome, but maybe someday TopSecret could be spun out. Let me know if you'd like to know more.

Thanks for reading! Let's talk more here or on X, especially if you're excited about desktop AI assistants.

Andrew’s Newsletter
