AI Companion
Chapter 4 of 16

Rise of Smart Devices

The evolution from smartphones to smart homes


In the previous chapter, we looked at voice interfaces, cloud connectivity, and mobile applications.

In this section, we’ll look at how consumer hardware devices began appearing, allowing these technologies to come into our homes and pockets.

Project A


Amazon founder and CEO Jeff Bezos, in a 1998 speech at Lake Forest College:

I firmly believe that at some point the vast majority of books will be in electronic form. I also believe that that’s a long way in the future, many more than 10 years in the future. And the reason that that gets held back today is that paper is just the world’s best display device. It turns out that today with the state of the art in display devices, dead trees just make the best display devices. They’re higher resolution, they’re portable, [they’re] high contrast and so some day when computer displays will catch up with that and then I think electronic books will be extremely successful.

A little short of ten years later, in 2007, that prophecy led Amazon to introduce an electronic book-reading device: the Amazon Kindle.

Commoncog Case Library
MIT Technology Review

The idea of a digital book reader had been around for a while. But the Kindle was innovative in that it offered:

  • An always-on E-Ink display, offering a week of reading without needing a charge.
  • A user interface that differed from PCs or phones and focused on distraction-free reading.
  • A digital bookstore where new books could be discovered and purchased, then wirelessly downloaded to the device and read offline.

When network access was needed (to download more books or, eventually, browse the Internet), Kindle used a new type of network, Amazon’s Whispernet, which employed an AnyData EVDO wireless modem. It allowed users to browse for books on-device and download them whenever they wanted, without signing up for a cellular service or entering account credentials. This is what later came to be known as a frictionless user experience.

It just worked, and it was included at no extra charge.

Whispernet was the equivalent of building a wireless phone into each Kindle. No one had ever done this before. And if that wasn’t enough, Bezos decided that Amazon would cover the cost of the data plan. While establishing relationships with wireless carriers was difficult, the total cost wasn’t as onerous as the team expected. E-book files were relatively small, resulting in very modest fees. In an interview two years after the first Kindle was launched, Bezos reflected on Whispernet’s role in the Kindle’s success: "I believe that’s a big, a big part of the success of Kindle. Because it makes it a seamless device where you don’t even need a computer. And you don’t ever have to hook it up to your computer with a cable. You don't have to fuss with any of that."

Source

There were many other technical innovations:

Amazon’s portable, handheld reader, which allows users to download digital versions of books, newspapers, and magazines, represents one of the first consumer uses of a low-power, easy-to-read electrophoretic display. The $399 device is a breeze to use, and though the company has not disclosed sales numbers, demand quickly outstripped supply. However, the success of the Kindle may depend on consumers’ willingness to bear the price of using it: though e-books, at $9.99, cost less than most physical books, newspapers, blogs, and other content available free on the Internet will cost money (for instance, $1.99 per month for Slashdot and $13.99 per month for the New York Times).

In a 2009 interview, Bezos described the design objectives:

“And the key feature of a physical book is that it gets out of the way. It disappears. When you are in the middle of a book, you don’t think about the ink and the glue and the stitching and the paper. You are in the author’s world. And this device, we knew four years ago when we set about to design it, that that was the number one design objective. Kindle also had to get out of the way and disappear so that you could enter the author’s world. And the design team accomplished that.”

Why is the Kindle relevant here?

Because:

  • It established that connected devices and custom user interactions could appeal to consumers.
  • All that talk of ‘getting out of the way’ meant that its design was frictionless and user-centric.
  • It allowed the functionality to be expanded through purchased ‘add-on’ content (in this case, books, but later, multimedia and apps).

Lab126

In 2004, Amazon established Lab126 in Sunnyvale, California, as a standalone R&D lab to develop consumer hardware.

The actual book software for the Kindle was developed by the French company Mobipocket (hence the file suffix .mobi on Kindle E-book files). The company was purchased by Amazon in 2005.

In 2009, Amazon added text-to-speech – which ended up being a controversial feature in the writing and publishing community – to its Kindle 2 e-book reading device. The technology was based on the Nuance system.

Text-to-speech conversion unlocked yet another user-interaction mode.

Acqui-Hires

Between 2010 and 2012, Amazon went on an acquisition spree.

2010: Ivona

Amazon Lab126

In 2002, ‘five guys from Gdansk’ (Poland) were inspired:

in my head I had a vision straight from sci-fi movies, like “2001: A Space Odyssey”, in which you had a computer that could talk and understand, i.e. a prototype of a voice assistant.

We wanted to develop both speech synthesis and speech recognition, which were necessary to create a friendly assistant. But that challenge would be over our heads; we would also need financial support. So we focused on synthesis.

Unlike other technologies focusing on input, Ivona concentrated on output.

The technology utilizes fragments of actual recorded speech utterances, dynamically reassembling them based on the input text. Four years later, they had developed a system that outperformed competing voice generation systems from IBM and Microsoft, as well as many international research projects in head-to-head competitions.
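
As a rough illustration of that unit-selection idea (a toy sketch, not Ivona's actual algorithm or data), a concatenative synthesizer picks, for each sound unit of the input, the recorded fragment whose original context best matches its neighbors, then joins the fragments:

```python
# Minimal sketch of unit-selection (concatenative) synthesis.
# The inventory, units, and "cost" heuristic are illustrative, not Ivona's design.

# A toy inventory: each phonetic unit maps to candidate recorded fragments,
# each tagged with the unit that followed it in the original recording.
inventory = {
    "HH": [{"audio": b"...", "next": "EH"}, {"audio": b"...", "next": "AH"}],
    "EH": [{"audio": b"...", "next": "L"}],
    "L":  [{"audio": b"...", "next": "OW"}],
    "OW": [{"audio": b"...", "next": None}],
}

def synthesize(units):
    """Pick, for each unit, the candidate recorded in the most similar
    context, then concatenate the audio fragments."""
    chosen = []
    for i, unit in enumerate(units):
        following = units[i + 1] if i + 1 < len(units) else None
        candidates = inventory[unit]
        # Prefer a fragment originally recorded before the same following unit
        # (a stand-in for the join-cost minimization real systems use).
        best = min(candidates, key=lambda c: 0 if c["next"] == following else 1)
        chosen.append(best["audio"])
    return b"".join(chosen)

print(len(synthesize(["HH", "EH", "L", "OW"])))  # "hello" as a phoneme string
```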

Amazon approached the company in 2010, ostensibly to license the technology for a subsequent edition of Kindle. But it ended up outright acquiring Ivona Software. Ivona was a competitor to Nuance and supported 44 voices in 17 languages, which gave it a broad international reach. Ivona’s work was instrumental in the development of the Echo AI Assistant and, later, the Amazon Polly voice-generation service.

2011: Yap

Amazon Snaps Up Yap And Its Voice-Recognition Technology:

The month before Apple revealed its “intelligent assistant,” Siri, Amazon was quietly buying out a company whose software performs voice-to-text actions like transcribing voicemails, whatever those are. In fact, its only user-facing product did just that, and its users received notice last month that the service would soon be discontinued.

But then:

The developer of Yap, Jeff Adams, started working for Amazon after his company got acquired. He revealed that Amazon had acquired Yap because it required the skills of the team to create Alexa. The founder of Yap recounts in an interview how he and his team had to essentially start over.

2012: Evi

Evi was acquired, apparently for its expertise in going from voice to text, and from text to intent.

How Amazon’s Alexa was ‘born’ and where voice-controlled tech will take us next:

When it reached the market in 2012, the technology, Evi, was positioned as a contender to Apple’s Siri - although not by Tunstall-Pedoe, 47, who says he set out to build something new, not to compete.

There’s no deep understanding that comes from reading a document. We haven’t solved that problem, but the knowledge that powers the Evi platform is a knowledge base of structured data, including common-sense knowledge, that’s in a form computers can understand. So it’s not going to a collection of documents - it’s not like a search engine.

The other thing that’s pretty unique is its ability to reason with knowledge. So we can take a question that has never been asked before, find multiple facts in the knowledge base and chain them together, combining them to create new knowledge that’s needed to answer. Our ability to exploit that knowledge base is where the power comes from. This results in many more of the user’s questions being answerable than would otherwise be the case.
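
As a toy sketch of that fact-chaining idea (the facts, predicates, and rule below are invented for illustration; Evi's knowledge base and reasoner were far richer), two stored facts can be combined to answer a question that neither answers on its own:

```python
# Toy illustration of chaining structured facts to answer a new question.
facts = {
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
}

def located_in_country(thing):
    """Chain located_in + capital_of to infer which country a thing is in."""
    for s, p, o in facts:
        if s == thing and p == "located_in":
            city = o
            for s2, p2, o2 in facts:
                if s2 == city and p2 == "capital_of":
                    return o2
    return None

# "Which country is the Eiffel Tower in?" is never stored directly,
# but it can be derived by combining the two facts above.
print(located_in_country("Eiffel Tower"))  # France
```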

The Big Reveal

The first-generation Echo was finally released to Amazon Prime members in 2014. It was a standalone device with speaker-independent voice recognition, the ability to translate voice to text, transform text into user intent, fetch responses, and perform text-to-speech of the result in near real-time.
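
Conceptually, the round trip looked something like the sketch below; every function here is a placeholder for a stage (ASR, intent extraction, fulfillment, TTS), not Amazon's actual implementation, and the real Echo split this work between the device and the cloud:

```python
# Conceptual sketch of an Echo-style request pipeline; all stages are stubbed.

def speech_to_text(audio: bytes) -> str:
    return "what's the weather in seattle tomorrow"   # ASR stage (stubbed)

def text_to_intent(text: str) -> dict:
    return {"intent": "GetWeather", "location": "seattle", "time": "tomorrow"}

def fulfill(intent: dict) -> str:
    return "Tomorrow in Seattle: light rain with a high of 55."

def text_to_speech(text: str) -> bytes:
    return text.encode()                               # TTS stage (stubbed)

def handle_utterance(audio: bytes) -> bytes:
    text = speech_to_text(audio)        # 1. voice -> text
    intent = text_to_intent(text)       # 2. text -> intent + parameters
    answer = fulfill(intent)            # 3. fetch a response
    return text_to_speech(answer)       # 4. text -> voice, in near real time

print(handle_utterance(b"...").decode())
```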

ℹ️ # Side Note

Full Disclosure: I owned one of those early Echo devices and have since owned many more. FWIW, I also own Home Assistant, as well as Google and Apple assistants, for development/research purposes.

Watson

Voice assistants entered the mainstream when, in 2011, IBM’s Watson computer became a Jeopardy TV quiz show champion, beating two former champions, Brad Rutter and Ken Jennings.

Ken Jennings, a former software engineer, holds several Jeopardy records and has since gone on to host the show. Watson beating the champions helped smooth the way for public acceptance of computer-based voice recognition and knowledge systems.


OK, Google

Google’s foray into voice and search dates back to 2007 with the introduction of the GOOG-411 service. It pre-dated smartphones, allowing users with standard phones to use voice to search for and connect to local businesses. The ostensible goal of the project was to let Google collect a database of phoneme data that could be used to train machine-learning models. Having accomplished that goal, the GOOG-411 service was mothballed in 2010.

The technology was then incorporated into Voice Search (allowing users to search Google by voice instead of typing), then Voice Input (filling any text field in Android), and then Voice Actions (controlling the phone with spoken commands). Voice Actions was later renamed Google Voice Search and launched in 2012.

In 2010, Google introduced Voice Actions for Android as a series of speaker-independent voice-activated commands for Android phones.

ℹ️ # Side Note

If you’re as confused as I am, you’re not alone.

Google has a habit of creating many identical-sounding products and services and adding Google to the name. This has much to do with the company’s organizational divisions between Search, Mobile/Android, and Google Labs, which itself was closed after a re-org in 2011.

There’s a Killed By Google website dedicated to all the projects Google has stopped supporting.

Moving on…

Google Now

In 2012, Google introduced Google Now, which was based on Google Voice Search and consolidated several services. It ran as a built-in part of the Search service on Android and as a companion app on iOS. What was unique about it was the creation of custom Activity Cards, which were built on top of the Google Knowledge Graph database.

Activity Cards presented data relevant to the user’s current context. Visually, each card type had its own format:

[Examples of Google Now card layouts]

Cards were designed to adapt to different form factors and interaction models, including voice and wearables:

Google WearOS

Three years later, in 2015, Google added support for third-party apps, enabling them to generate Custom Cards. This launched with 40 different app integrations, allowing Android users to interact with those services through voice.

This opened the door to…

Google Assistant

It wasn’t until 2016 that Google launched Google Assistant, a Siri-like experience.

The technology was integrated into Google Home smart speakers (later rebranded as Google Nest, following the 2014 purchase of Nest Labs). The smart speaker was a direct competitor to the Amazon Echo and functionally comparable to the Echo Dot.

Amazon
Google

Cortana

In 2009, Microsoft began work on its own personal assistant, code-named Cortana. It was named after a well-known character in its Halo gaming franchise (and voiced by the same actor).

Cortana debuted on Windows Phone 8.1 in 2014 and was integrated into Windows 10 in July 2015 and, eventually, the Xbox gaming console.

However, unlike Amazon and Google, Microsoft did not create its own branded Smart Speaker device. Instead, it partnered with well-known speaker manufacturer Harman Kardon to make the Invoke.

Leading the Cortana effort was Larry Heck, who had also worked on the SRI CALO project and R&D at Nuance.

Anticipating More from Cortana:

“The base technologies for a virtual personal assistant include speech recognition, semantic/natural language processing, dialogue modeling between human and machines, and spoken-language generation,” [Heck] says. “Each area has in it a number of research problems that Microsoft Research has addressed over the years. In fact, we’ve pioneered efforts in each of those areas.”

Cortana was tightly integrated with Microsoft’s own Bing search engine, making integrated search directly available to users. Google could also do this, but competitors Apple (Siri) and Samsung (Bixby) did not have their own search engines to tie into. Like the other services, Cortana could tie into email and calendars via Outlook and Office 365 integrations.

Like Alexa, Cortana also had a third-party skills store. Cortana, however, was not a major success, and was eventually shut down.

Heck eventually moved on to Google, where he worked on voice recognition for the Google Assistant, and then to Samsung to work on its own virtual assistant, Bixby.

ℹ️ # Full Disclosure

Microsoft made a concerted effort to encourage developers to create software for their Windows Phone and Cortana devices, going so far as to provide free devices and offer outright payments. I was given several Windows Phone devices by Microsoft and offered a Cortana device (which I declined).

By then, most professional developers I knew were already overloaded with iOS, Android, Alexa, Google Assistant, and HomeKit development.

There just wasn’t any more bandwidth.

Bixby and Viv

Samsung announced Bixby in 2017 as a replacement for the S Voice assistant (released in 2012), which itself was initially based on the Vlingo voice recognition system and then the Nuance engine.

What made Bixby unique was that it had early support for recognizing individual users’ voices and was integrated across Samsung’s vast range of products, including phones, cameras, and home appliances.

In 2016, Samsung also acquired Viv Labs, a mere five months after its initial release. Viv, which had been announced to great fanfare, was a startup founded by Dag Kittlaus and Adam Cheyer, two of Siri’s co-founders, who left Apple shortly after the Siri purchase.

Viv was unique in that it was built on the foundation of integrating with third-party extensions, allowing it to perform multi-part sequences (what you might call agentic today).

Viv Labs filed several patents that may be relevant to future agentic integrations. The patents related to third-party developers, in particular, may have relevance to future attempts to extend AI Assistants through Extensions, which we will cover in a later section.

A year after the Viv acquisition, Samsung announced that Bixby 2.0 would be rebuilt on top of Viv’s technology and would be headed by (small world) Larry Heck, who had joined Samsung after his stints working on Cortana at Microsoft and Assistant at Google.



‘M’ is for manual

Facebook

Not to be left behind, in 2015, Facebook entered the fray by purchasing wit.ai and using it to build a service they code-named M. What made M different was that it could complete a sequence of complex tasks, but when it hit a block or could not complete the steps (reportedly, some 70% of the time), it took a different track:

M is so smart because it cheats. It works like Siri in that when you tap out a message to M, algorithms try to figure out what you want. When they can’t, though, M doesn’t fall back on searching the Web or saying “I’m sorry, I don’t understand the question.” Instead, a human being invisibly takes over, responding to your request as if the algorithms were still at the helm. (Facebook declined to say how many of those workers it has, or to make M available to try.)
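
Schematically, the pattern behind M was automation with a human escalation path; the confidence threshold, queue, and canned answers below are invented for illustration:

```python
# Schematic of M's "cheat": try the automated path first, and hand the
# request to a human operator when confidence is too low.

human_queue = []

def automated_answer(request: str):
    """Return (answer, confidence); stubbed out here."""
    if "reservation" in request:
        return ("Booked a table for two at 7pm.", 0.9)
    return ("", 0.2)   # the reported ~70% of requests landed here

def handle(request: str) -> str:
    answer, confidence = automated_answer(request)
    if confidence >= 0.8:
        return answer
    human_queue.append(request)          # a person quietly takes over
    return "Working on it..."            # the user never sees the seam

print(handle("Make a dinner reservation"))
print(handle("Plan my sister's wedding"), human_queue)
```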

Having humans in the loop to complete tasks was not scalable, and Facebook discontinued the service in 2018 after stating:

We learned a lot.

In 2019, the Natural Language Processing service developed by wit.ai would be offered for free to developers, ostensibly to:

[U]nderstand and extract meaningful information (dates, time, and more) from messages that your business receives. You can use this information to identify intent to implement the messaging experience needed for the conversation.

This would be available on Facebook business pages.

ℹ️ # Side Note

A similar technique would go on to be used by Amazon’s ‘Just Walk Out’ shopping technology and, more recently, by the high-flying startup Builder.ai.


Skills

ℹ️ # Note

The following section focuses on Amazon’s Alexa service for illustration purposes. Similar functionality, such as Google Actions, Bixby Capsules, or SmartThings Custom Capabilities, is available in other assistants.

The Alexa Skills Kit allowed third parties to extend the functionality of the basic Assistant.

Amazon

Skills support sample phrases with blank slots that are filled from a user’s utterance. This allows the system to determine the user’s Intent along with specific parameters.

For example, a user asking for the weather for a specific location and time might ask for:

Alexa, what’s the weather forecast for San Francisco tomorrow?

Once the voice had been converted into text, a processor would take over to extract the information the user was requesting. It would look for a specific skill with:

| Slot Name | Value |
| --- | --- |
| Intent name | weather |
| Variable: location | San Francisco |
| Variable: time | tomorrow |

The system would then invoke the code for the skill and pass it the named parameters. The result would be returned as text, then converted to speech in the user’s chosen voice.

For example:

Tomorrow, the weather forecast for San Francisco is sunny, with a high of 70, but with early morning fog that will burn off by 10 am.
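
Behind the scenes, the skill code receives little more than the intent name and the filled slots, and returns text to be spoken. A minimal, framework-free sketch (the real Alexa Skills Kit wraps this in a JSON request/response envelope and SDK handler classes; the forecast lookup here is stubbed):

```python
# Minimal sketch of skill fulfillment: intent name + slot values in,
# response text out. A real skill would call a forecast API and use the
# Alexa Skills Kit request/response format.

def get_forecast(location: str, time: str) -> str:
    return "sunny, with a high of 70, but with early morning fog"

def handle_intent(intent: str, slots: dict) -> str:
    if intent == "weather":
        forecast = get_forecast(slots["location"], slots["time"])
        return (f"{slots['time'].capitalize()}, the weather forecast for "
                f"{slots['location']} is {forecast}.")
    return "Sorry, I can't help with that yet."

print(handle_intent("weather", {"location": "San Francisco", "time": "tomorrow"}))
```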

There are situations where the interaction may be more complex, or the necessary values to proceed with a task have not been provided. For example, if the user asks:

Alexa, what’s the weather forecast?

Here, the location and time values have not been specified. The system can surmise those values based on past requests, but for many queries, it may require interactive prompts using the Auto Delegation process.

Amazon
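
In rough terms, delegation means checking which required slots are still missing and either inferring them from context or prompting the user for them before fulfilling the request. A simplified sketch, with invented prompts and structure:

```python
# Simplified sketch of slot elicitation: required slots that are missing are
# either inferred from prior context or prompted for before fulfillment.

REQUIRED_SLOTS = {
    "weather": {
        "location": "For which city?",
        "time": "For what day?",
    }
}

def next_prompt(intent: str, slots: dict, history: dict):
    """Return the next question to ask, or None when all slots are filled."""
    for slot, prompt in REQUIRED_SLOTS[intent].items():
        if slot not in slots:
            if slot in history:              # surmise from past requests
                slots[slot] = history[slot]
            else:
                return prompt                # delegate back to the user
    return None

slots = {}
print(next_prompt("weather", slots, history={"location": "San Francisco"}))
# -> "For what day?"  (location was inferred, time still needs a prompt)
```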

One problem with the skill mechanism is that the list of parameters may vary depending on the context in which it is being used. This is why support for Dynamic Entities was added.

Another problem is that if the user says the phrase in a different combination, an unfamiliar form, or with a heavy accent, the request may fail, and the user may become frustrated with the product.

This is where modern LLMs help. They are far more tolerant about what a phrase should be, can better handle ambiguity in pronunciation, and can be used to generate much more natural responses that don’t sound like canned text.
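
For example, rather than matching fixed sample phrases, an assistant can hand the raw transcript to an LLM and ask for the intent and slots back as structured data. A sketch of the idea, where call_llm is a stand-in for whichever model API is actually used:

```python
import json

# Sketch of LLM-based intent extraction. `call_llm` is a placeholder for a
# real model call; the prompt and the output schema are illustrative.

PROMPT = """Extract the intent and slots from the user's request.
Reply with JSON only, e.g. {"intent": "weather", "location": "...", "time": "..."}.
Request: {utterance}"""

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns a canned answer here.
    return '{"intent": "weather", "location": "San Francisco", "time": "tomorrow"}'

def extract_intent(utterance: str) -> dict:
    raw = call_llm(PROMPT.replace("{utterance}", utterance))
    return json.loads(raw)   # tolerant of wording a fixed slot matcher would reject

print(extract_intent("any idea if I'll need an umbrella in SF tomorrow?"))
```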

Physical World

Alexa skills can be extended to Internet of Things (IoT) devices that can be controlled remotely via WiFi and Bluetooth.

This allows voice control of Smart Home devices and the physical world. There have been other efforts to automate the physical world through projects like Google Home/Google Nest, Apple Home, SmartThings (subsequently purchased by Samsung), and the open-source Home Assistant. These all follow the same general pattern (with some minor variations).
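
Stripped of any particular vendor's API, the general pattern is a registry that maps a spoken intent onto a network command for a device. A hedged sketch (device names, addresses, and transport details are illustrative; each platform defines its own discovery and command schema):

```python
# Generic sketch of voice -> device control; everything here is illustrative.

DEVICES = {
    "living room lamp": {"protocol": "wifi", "address": "192.168.1.40"},
    "front door lock":  {"protocol": "bluetooth", "address": "AA:BB:CC:DD:EE:FF"},
}

def send_command(device: dict, command: str) -> str:
    # Placeholder for the actual WiFi/Bluetooth/Zigbee call.
    return f"sent '{command}' to {device['address']} over {device['protocol']}"

def handle_smart_home_intent(intent: str, slots: dict) -> str:
    device = DEVICES.get(slots["device"])
    if device is None:
        return f"I couldn't find a device called {slots['device']}."
    if intent == "TurnOn":
        return send_command(device, "power_on")
    if intent == "TurnOff":
        return send_command(device, "power_off")
    return "Sorry, I can't do that."

print(handle_smart_home_intent("TurnOn", {"device": "living room lamp"}))
```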

ℹ️ # Side Note

The architecture of Smart Home devices and assistants was very similar, until Project Matter came around. Diving into Matter is outside the scope of AI Assistants, but it is a fascinating example of trying to catalog and categorize interactions. We’ll cover Taxonomies and Intents in a later section.


Personal Notes

ℹ️ # #1

I was one of the early backers of SmartThings on Kickstarter. The programming was quirky and very much a DIY thing, but it was one of the first scriptable smart home platforms.

For many years, SmartThings helped by sending push notifications every time someone (i.e., small children with quiet feet) tried to sneak past a motion sensor outside my home office. For some reason, giving me a surprise heart-attack was considered funny.

Home Assistant has taken scriptability to a whole new level.

ℹ️ # #2

As I write this, I have one of each of the following devices wired into my home lab, along with a raft of Zigbee, ZWave, Bluetooth, and Matter sensors and peripherals:

  • Google Nest Hub
  • Echo Show
  • Echo Dot
  • Apple HomePod Mini
  • Ikea Dirigera Hub
  • Home Assistant Hub (on a Pi4)
  • ESP32-S3-BOX voice assistant
  • ESP-AVS DevKit

Yes, it is an illness.

ℹ️ # #3

The urge to add voice to any device would lead to devices like the Alexa-enabled AmazonBasics Microwave Oven.

To this day, I am still baffled by it.

Amazon
ℹ️ # #4

In 2019, I joined Amazon Lab126 and worked on technologies related to Echo, FireTV, and Kindle devices. I left to join Amazon Web Services, where I worked on connected devices and IoT solutions. Occasionally, I would return to Lab126 and give seminars on the latest cloud technologies.

Obviously, all descriptions and materials in this series are based on publicly available sources that I have directly linked.

China

There have been significant efforts at creating AI Assistants in China. Up-to-date, reliable details on the technology are somewhat difficult to find outside the country, but what is available paints a fascinating picture of very advanced technologies designed for domestic consumption. Anyone studying or working in the AI Assistant field would be foolish to dismiss or ignore these products.

Alibaba Tmall Genie

The Alibaba Tmall Genie X1 was introduced in 2017. At the 2018 Neural Information Processing Systems conference, Jin Rong, head of Alibaba’s Machine Intelligence and Technology Lab, demonstrated an assistant capable of handling complex user interactions:

Within 30 seconds, the agent has smoothly handled three common, and tricky, conversational ingredients: interruption, nonlinear conversation, and implicit intent. Interruption is self-explanatory: the agent can respond to the customer’s interruption and continue relaying relevant information without starting over or skipping a beat.

The nonlinear conversation occurs when the customer asks, “Who are you?” This requires the agent to register that the customer is not answering the preceding question but rather starting a new line of inquiry. In response, the agent reintroduces itself before returning to the original question.

The implicit intent occurs when the customer responds, “I’m not home in the morning.” He never explicitly says what he actually means—that home delivery won’t work—but the agent is able to read between the lines and follow up sensibly.

The capability of the chatbot far exceeded anything Alexa or Google Assistant could do at that time:

Alibaba is also developing digital assistants for other aspects of its business, including a food-ordering agent that can take your order in noisy restaurants and stores; a humanlike virtual avatar that can field questions about Alibaba products; and a price-haggling chatbot that is already used by 20% of sellers on Alibaba’s resale platform Xianyu.

As of 2019, it was reported that the AliGenie platform was running on some 200 million devices. Alibaba has also formed partnerships to use a dedicated, custom voice-processing chip.

The Tmall Genie team is pushing ahead with Quark:

Alibaba is reportedly working on AI-powered smart glasses, with development led by its Tmall Genie team and a launch planned for late 2025.

Alibaba

This will be based on Alibaba’s Quark app, using the Qwen LLM:

With the reasoning proficiency from Qwen, the revamped Quark offers advanced capabilities such as AI chatbot, deep thinking, deep research, and task execution into an easy user interface. It aims to handle tasks ranging from academic research to document drafting, image generation, presentations, medical diagnostics, travel planning, and problem-solving.

It allows users to ask complex, multi-part questions and follow-ups with more in-depth information on a topic directly within the search engines. Compared to other mainstream AI chatbots, it excels at providing real-time, precise, and comprehensive information drawn from multiple online sources, complete with reference links embedded in the responses for easy verification and further exploration.

The upgraded Quark reimagines the traditional search experience by transforming it into an all-in-one AI super assistant designed to fulfill the work-life needs of more than 200 million users in China.

The Qwen LLM offers a range of functionality:

  • Text creation: writes stories, documents, emails, scripts, and poems.
  • Text processing: polishes text and summarizes text.
  • Programming assistance: writes and optimizes code.
  • Translation: provides translation service among various languages such as Chinese, English, Japanese, French, and Spanish.
  • Dialogue simulation: engages in role-playing for interactive dialogues.
  • Data visualization: creates charts and visualizes data.

What sets the Alibaba offering apart is that, much like Google, the whole search/software/hardware/wearable AI Assistant stack is vertically owned and operated by a single entity.

Amazon’s Alexa+ and Apple’s Siri may want to take note.

Baidu DuerOS

Baidu

The DuerOS Conversational AI Platform is embedded in a range of Baidu devices, including:

  • Smartphone apps
  • Smart speakers
  • Smart TVs
  • Wearable activity trackers
  • Smart Home devices
  • Vehicles
  • Appliances such as refrigerators, air conditioners, washing machines, vacuums, and air purifiers

DuerOS allows any of these devices to function as AI assistants, provided they support a microphone array and speakers. This implies that, much like Amazon, Google, and Apple devices, most processing is done in the cloud.

Baidu claims that it has unique advantages:

Baidu began investing in AI at a very early stage and has recruited top talent from around the world. It possesses unique advantages in the three key factors that drive today’s AI revolution: algorithms, computing, and data. Baidu’s algorithms are built on top of ultra-large-scale neural networks, trillions of parameters, and hundreds of billions samples. Baidu’s computing power stems from hundreds of thousands of servers and the largest high performance GPU cluster in China. As the world’s largest Chinese search engine, Baidu has access to more than a trillion webpages, billions of search queries, images, video content, and positioning data. DuerOS synthesizes the best of Baidu technologies – speech recognition, image recognition, natural language processing, user profile, and other advanced technical skills – to create one of the most advanced conversational computing platforms available today.

The Baidu Smart Living Group was separately valued at 20 billion yuan (about $2.9 billion) in 2020. Baidu licensed the technology to other providers for embedding in their own devices via embeddable circuits. The company has been actively working on ambitious plans.

Baidu made quite a splash back in 2017-2018 but has been relatively quiet since then. This doesn’t mean they have been slowing down. In 2025, they claimed DuerOS X to be the first AI-native operating system.

They also announced Ernie X1, their home-grown Deep Reasoning model.

Xiaomi - Xiao AI

Between Xiaomi’s Hyper AI and MiMo Reasoning Model, the company has been actively pursuing its own path in creating underlying AI technologies.

As of this writing (2025), most of their AI-Assistant speakers appear to have switched to using Google Assistant and Nest technologies. However, they have a wide range of products and core technologies, ranging from smart home to phone and automobile products, and their AI Assistant strategy may change.

Keep an eye on them.

No Bueno

AI Assistant devices have now been with us for over a dozen years.

However, most have not been financially successful. In 2022, Amazon reportedly was on pace to lose $10 billion that year in its Alexa division.

Ars Technica

Google also announced in 2022 that it would shut down its Cloud IoT Core service, used by device manufacturers as a back-end for many Smart Home projects and connected directly into their AI Assistants:

IOT World Today

Apple’s HomePod assistant was last updated over two years ago. There was speculation ahead of WWDC 2025 about whether there would be news of a new homeOS. (Announcer: there wasn’t.)

Alexa+

Amazon has announced and is slowly rolling out its next-generation Alexa+ service, now with generative AI support. However, the old skills will no longer work, and third-party integrations will need to be rebuilt.

The reviews have been mixed:

PC World
USA Today
The Verge
New York Times - Wirecutter

I got access to Alexa+ in October 2025. During initial provisioning, you are prompted to provide access to your contacts, emails, images, and documents, and to allow the system to detect your individual voice.

The Next Generation

Whisper and ChatGPT

In September 2022, OpenAI announced the Whisper Automatic Speech Recognition (ASR) system, trained using semi-supervised learning on data collected from the web. The system was lauded as a breakthrough in speech-to-text technology and was rapidly adopted for use in a range of applications, including healthcare.
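
Part of the appeal was that transcription became a few lines of code. A minimal sketch using the open-source whisper package (assuming it and ffmpeg are installed, and that the audio file exists; the model size and file name are illustrative):

```python
# Minimal sketch of local transcription with the open-source Whisper model.
# Assumes `pip install openai-whisper` and ffmpeg are available.
import whisper

model = whisper.load_model("base")          # downloads weights on first use
result = model.transcribe("meeting.mp3")    # language is auto-detected
print(result["text"])
```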

Perhaps a little too rapidly.

AP

SAN FRANCISCO (AP) — Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near “human level robustness and accuracy.”

But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments.

Two months after the release of Whisper, in November 2022, OpenAI announced ChatGPT:

The interactions were fluid and responsive. You were no longer limited to specifically formatted slot-based queries.

The same issues with hallucinations arose, a problem that persists to this day.

NYTimes

Rabbit R1

In January 2024, at the annual Consumer Electronics Show (CES), where most consumer electronics companies congregate every year, Jesse Lyu, the founder of AI startup Rabbit, presented a standalone device that let you interact with an AI service in the cloud without having to use your phone.

His presentation video (above) was compelling and made some audacious claims, including the claim that they were not using an actual LLM (Large Language Model) but had come up with what they called a Large Action Model. The first batch of 10,000 pre-orders at $199 sold out very quickly, as did a second batch with later delivery times, leading to a third round.

The Verge

The tech press was mystified and excited (but also a little cautious):

TechRadar
ℹ️ # Side Note

My first reaction to the R1 was to question the company’s business acumen. A fixed $199 price for a connected device?

We covered the math in the Sustainability section on Money. Unless they devised a separate recurring revenue model, each device would be running at a loss within 2-3 years (depending on their back-end OPEX costs).
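
As a back-of-the-envelope sketch, with every number below an assumption for illustration rather than Rabbit's actual costs, the arithmetic looks roughly like this:

```python
# Back-of-the-envelope economics for a one-time-price connected device.
# Every number below is an assumption for illustration only.

price = 199.00            # one-time revenue per device
hardware_cost = 100.00    # assumed build + logistics cost
monthly_opex = 5.00       # assumed cloud/LLM/back-end cost per active device

margin = price - hardware_cost
months_until_loss = margin / monthly_opex

print(f"Gross margin per device: ${margin:.2f}")
print(f"Months before the device runs at a loss: {months_until_loss:.0f}")
# ~20 months with these assumptions; the break-even horizon depends almost
# entirely on per-device OPEX and on how long people keep using the device.
```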

The only way it could work was if their business model counted on selling a lot of devices, with people stopping using them long before the two-year mark. Think of it as a reverse plot from The Producers.

PlayBill

Once actually released, the backlash was fierce:

The Verge
The Verge
The worst was yet to come.
Arnica

In April 2024, a security consultant raised a flag about the way the R1 had users log in and give it access to their favorite applications:

But here’s the thing that has me a bit concerned. Instead of using a nice, secure method like OAuth to link accounts, the r1 has you log into services through VNC in their portal.

Don’t get me wrong, I love the convenience of being able to connect applications to an AI device. But having it snapshot your credentials or session data is… not great from a security standpoint.

This meant that Rabbit’s servers could potentially obtain and store a user’s account credentials for connected third-party services. Even if you assumed the folks at Rabbit were trustworthy, it wasn’t clear how those login credentials were stored or how secure the Rabbit servers were.

Then it got worse.

On June 25th, 2024, a group called Rabbitude disclosed that on May 16th they had gained access to Rabbit’s code base and discovered a number of hardcoded API keys granting access to Eleven Labs, Azure, Yelp, and Google Maps.

Why are these API Keys important?

These keys are issued by companies offering cloud services to govern access to their services. In this case, anyone holding them could access those suppliers on Rabbit’s behalf and do whatever the keys allowed.
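
To make the problem concrete: a key baked into source code or firmware grants anyone who reads it the same access the vendor has, which is why keys are normally injected from the environment or a secrets manager instead. A schematic contrast (the key value is obviously fake):

```python
import os

# Anti-pattern: a credential baked into code ships with every device and every
# copy of the repository. (Key value is fake.)
TTS_API_KEY = "sk-live-EXAMPLE-DO-NOT-DO-THIS"

# Safer pattern: the key lives outside the code (environment variable,
# secrets manager) and can be rotated without touching the source.
def get_tts_api_key() -> str:
    key = os.environ.get("TTS_API_KEY")
    if key is None:
        raise RuntimeError("TTS_API_KEY is not configured")
    return key
```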

While the exposure of hardcoded API keys is, in itself, a major concern for any organization, the breadth of permissions associated with the exposed keys further amplifies the severity of the exposure. According to Rabbitude, the exposed keys provide the ability to “read every response every r1 has ever given, brick all r1s, alter the responses of all r1s, replace every r1’s voice.”

The issue was made public when the group felt that the critical breach was not getting the priority it needed (the company took more than a month to even acknowledge the problem):

As of the disclosure, Rabbitude claims that while they’ve been working with the Rabbit team, no action has been taken by the company to rotate the keys and they remain valid. Rabbitude gets a little chippy at the end of the disclosure stating that they felt compelled to publicize the company’s poor security practices and that while they were not planning to publish any details of the data, this was “out of respect for the users, not the company.”

Even after that, Rabbit’s response was to blame an employee for leaking the data to a hacktivist group.

If you think that was a freak, one-time issue: just a few months later, another vendor with an AI-enabled device went even further:

MGDProductions

This time, the problem was API keys embedded inside the device software, which were relatively easy to extract using freely available, standard development tools.

MGDProductions

These could be discounted as anomalies, but they point to a larger issue: the safety of what we might consider private rests on a very fragile web of interconnected services. Once you’ve spoken or typed into that assistant, we have no idea what route that data takes.

Humane AI Pin

TechCrunch

The Humane AI Pin was developed by a team consisting of a former Apple designer and many former Apple engineers. It was pitched as a way to perform tasks usually relegated to a phone, without having to have a phone. The company had been around since 2017, but it only came out of stealth in 2021, with the announcement that it was working on a new, unique connected device. The main product was revealed in April 2023 during a TED talk.

Humane

The product was ambitious, involving the development of a custom operating system, a laser interface that projected onto surfaces (including one’s own hand), and a wearable, haptic-touch device. It also had a standalone cellular modem requiring a monthly service subscription.

The initial reviews were not kind.

NYTimes
Ars Technica

It’s a voice assistant box, so that means it has a microphone and speaker. There’s no hot word, and it’s not always listening, so you’ll be pressing a button to speak to it, and you’ll get a response back. There’s also a camera, and because you’re expected to mount this on your clothing at chest level via a magnetic back piece, you’ll be creepily pointing a camera at everyone the whole time you’re using it. It claims to be “screenless,” but it has a pretty cool 720p laser projection system that seems to function as a fine monochrome screen that projects a smartwatch-like UI onto your hand.

Once the device actually started shipping, the reviews got worse.

The Verge
404 Media

It did not help that the charging case was recalled due to its lithium battery posing a fire hazard.

US Consumer Product Safety Commission

Humane did not last long.

It stopped shipping in February 2025, and the cloud services were shuttered by the end of that month. The technology, patents, and team were acquired by HP and merged into the HP IQ team.

Voice Transcribers

There are a number of wearable devices capable of recording and transcribing every word we say and hear. I won’t go into every single one, since some may be gone by the time you read this. Some listen continuously, and others require user interaction to start and stop processing.

| Device (alphabetical order) | Price | Subscription | Form factor | Privacy | Unique | Note |
| --- | --- | --- | --- | --- | --- | --- |
| Bee | $49.99 | None | Bracelet | Processed and not saved | No subscription | Acquired by Amazon |
| Botslab AI Note Taker | $129.99 | None | Credit-card size | Privacy Policy | Standalone recorder with built-in storage. Record on-demand. 120 languages. 10 industry-specific terms. | Company also makes dash cams, security cameras, and video doorbells |
| Fieldy AI Pin | $149 | Free for 150 minutes; $15.99/mo unlimited | Pendant | May be processed internationally | HIPAA compliant. Connects to calendar | |
| Friend | $129 | None | Pendant | Encrypted with device key but may be used for research | Device is optional | Pricey domain name, controversial NYC subway ads, and brazen CEO claims |
| HumanPods | $199 | $19/mo | In-ear headphones | Personal Agent will learn about you, but data can be erased | NatureOS with AI Personas | |
| IPEVO Vocal Wearable | $109 | Vurbo.ai free for 300 minutes; $89-$199/mo unlimited | Pin; requires phone or laptop attachment | Data stored on local computer | Real-time language translation | |
| Limitless | $199 | $29/mo | Pendant | Recordings retained | HIPAA compliant | Formerly Rewind |
| Omi | $89 | None | Necklace or glued to temple | Recordings not stored, but metadata kept | Claims to use brainwaves to start interactions | |
| Plaud Notepin | $159 | Free for 300 minutes; $19.99/mo unlimited | Necklace, wristband, clip, or pin | May analyze and use data commercially | | Not always-on |
| Plaud NotePro | $179 | Free for 300 minutes; $19.99/mo unlimited | Credit-card size | May analyze and use data commercially | Multimodal input (audio, notes, images) | Not always-on |
| Senstone Scripter | $189 | Free basic; $14.90/mo for Pro Plan | | Audio and transcription retained in app | Integration with Office and Google services | Industry-specific training |
| Soundcore Work | $159 | $15.99/mo for collecting transcripts together | Pendant or pin | Not available yet | Coin-sized recorder (width) | Early review. Made by Anker, a large maker of rechargeable batteries |
| TicNote AI Voice Recorder | $159.99 | Free 600 minutes/mo and 10 AI chats/day; $79/yr and $239/yr plans for extended recording and unlimited AI chat | Credit-card size | Privacy Policy | | The manufacturer, Mobvoi, also makes sports watches and treadmills |

When using these devices, you should be cognizant of potentially recording others without their knowledge.

AI Glasses

This is a fast-moving field. I will add a list of the players once things have settled. In the meantime, here are some reviews:

One More Thing

The San Francisco Standard

In this section, we looked at the evolution of hardware Companion devices, from ‘smart’ speakers to mobile devices and wearables.

Next, we will dive under the hood and look at how they work.

Stay tuned.

Title Photo by Sylwia Bartyzel on Unsplash