From the Decoder podcast interview with David Luan: https://www.theverge.com/decoder-podcast-with-nilay-patel/761830/amazon-david-luan-agi-lab-adept-ai-interview

But if you want it to be an actual successful decision-making agent, these models need to learn the true causal mechanism. It’s not just cloning human behavior; it’s actually learning if I do X, the consequence of it is Y. So the question is, how do we train agents so that they can learn the consequences of their actions? The answer, obviously, cannot be just doing more behavioral cloning and copying text. It has to be something that looks like actual trial and error in the real world.

What we’re doing is, rather than doing more behavioral cloning or watching YouTube videos, we’re creating a giant set of RL [reinforcement learning] gyms, and each one of these gyms, for example, is an environment that a knowledge worker might be working in to get something useful done. So here’s a version of something that’s like Salesforce. Here’s a version of something that’s like an enterprise resource planning (ERP) system. Here’s a computer-aided design program. Here’s an electronic medical record system. Here’s accounting software. Here is every interesting domain of possible knowledge work as a simulator.

… Now, instead of training an LLM just to do text stuff, we have the model actually propose a goal in every single one of these different simulators as it tries to solve that problem and figure out if it’s successfully solved or not. It then gets rewarded and receives feedback based on, “Oh, did I do the depreciation correctly?” Or, “Did I correctly make this part in CAD?” Or, “Did I successfully book the flight?” to choose a consumer analogy. Every time it does this, it actually learns the consequences of its actions, and we believe that this is one of the big missing pieces left for actual AGI, and we’re really scaling up this recipe at Amazon right now.
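A minimal sketch of what one of these gyms might look like, assuming the standard reset/step interface from RL toolkits. The `InvoiceGym` environment and its depreciation task are invented here for illustration; this is not Amazon’s actual setup.

```python
# Hypothetical knowledge-work "gym": the agent acts in a simulated app
# and is rewarded only when the task is verifiably done correctly.
import random

class InvoiceGym:
    """Toy accounting environment: the agent must enter the correct
    straight-line depreciation for a randomly generated asset."""

    def reset(self):
        self.cost = random.choice([1000, 5000, 12000])
        self.years = random.choice([3, 5, 10])
        # Observation: what the simulated screen shows the agent.
        return {"asset_cost": self.cost, "useful_life_years": self.years}

    def step(self, action):
        # action: the annual depreciation amount the agent enters.
        correct = abs(action - self.cost / self.years) < 0.01
        reward = 1.0 if correct else 0.0  # reward = verifiable task success
        done = True                       # single-step toy task
        return {}, reward, done, {"correct": correct}

# Trial-and-error loop: the agent proposes, the environment scores it.
env = InvoiceGym()
for episode in range(3):
    obs = env.reset()
    guess = obs["asset_cost"] / obs["useful_life_years"]  # a competent policy
    _, reward, done, info = env.step(guess)
    print(f"episode {episode}: reward={reward}")
```

The key property is that the reward comes from checking the outcome, not from matching a human demonstration, which is the “consequences of its actions” point in the quote above.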

Lots of reliance on RL (reinforcement learning) to accumulate knowledge of how a task works. That’s also how self-driving cars learn to navigate problems.

But at its heart, RL is an automated, sped-up version of the expert systems that led to the Second AI Winter. It does not allow for anomalies (cf. taxonomy) or for creative decomposition (which requires deep semantic knowledge). At some point, you run out of physical learning scenarios, so you create a system to generate all the potential scenarios it can think of and use that output to train an RL model.
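A sketch of that loop, assuming a simple combinatorial generator feeding a training step; all names here are illustrative, not any real system.

```python
# Synthetic scenarios feed the RL model: the generator enumerates
# combinations of a few axes of variation, and each combination becomes
# a training episode.
import itertools
import random

def generate_scenarios():
    """Yield an endless stream of synthetic traffic-like scenarios.
    The combinatorics run out long before the real world does --
    which is the point about anomalies above."""
    weathers = ["clear", "rain", "fog"]
    actors = ["pedestrian", "cyclist", "stalled_car"]
    speeds = [15, 30, 45]
    for weather, actor, speed in itertools.cycle(
            itertools.product(weathers, actors, speeds)):
        yield {"weather": weather, "actor": actor, "speed_mph": speed}

def run_episode(policy, scenario):
    """Stand-in for a simulator rollout; returns a reward."""
    return random.random()  # placeholder outcome

policy = {}  # stand-in for model parameters
for scenario in itertools.islice(generate_scenarios(), 5):
    reward = run_episode(policy, scenario)
    # A real trainer would update the policy on this outcome here.
    print(scenario, f"reward={reward:.2f}")
```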

This is how Waymo is training its vehicles for autonomous self-driving. I bet that if you threw the MIT random traffic system at them, it would surface cases where the cars lock up. But as the learning gets more varied, it will cover 80%, no 85%, no 90% of scenarios (and we’re back to fake benchmarks that make mediocrity acceptable).

So one of the big missing pieces, in my opinion, right now in AI, is our lack of creativity with product form factors, frankly. We are so used to thinking that the right interface between humans and AIs is this perpendicular one-on-one interaction where I’m delegating something, or it’s giving me some news back or I’m asking you a question, et cetera. One of the real things we’ve always missed is this parallel interaction where both the user and the AI actually have a shared canvas that they’re jointly collaborating on. I think if you really think about building a teammate for knowledge workers or even just the world’s smartest personal assistant, you would want to live in a world where there’s a shared collaborative canvas for the two of you.
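One way to make the “shared canvas” concrete is as a single document that both parties edit through the same operation log, rather than a turn-taking chat. A minimal sketch, with invented `Op`/`Canvas` names; nothing here reflects an actual product design.

```python
# Shared-canvas sketch: human and AI both append edits to one jointly
# owned document instead of exchanging request/response turns.
from dataclasses import dataclass, field

@dataclass
class Op:
    author: str   # "human" or "agent"
    action: str   # e.g. "insert", "annotate"
    payload: str

@dataclass
class Canvas:
    ops: list = field(default_factory=list)

    def apply(self, op: Op):
        self.ops.append(op)  # both parties mutate the same shared state

    def render(self) -> str:
        return "\n".join(f"[{o.author}] {o.action}: {o.payload}" for o in self.ops)

canvas = Canvas()
canvas.apply(Op("human", "insert", "Q3 budget draft"))
canvas.apply(Op("agent", "annotate", "Travel line looks 12% over last year"))
canvas.apply(Op("human", "insert", "Cut travel to $40k"))
print(canvas.render())
```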

How Alexa+ Works

That’s a good question. Alexa Plus has the ability to, for example, if your toilet breaks, you’re like, “Ah, man, I really need a plumber. Alexa, can you get me a plumber?” Alexa Plus then spins up a remote browser, powered by our technology, that then goes and uses Thumbtack, like a human would, to go get a plumber to your house, which I think is really cool. It’s the first production web agent that’s been shipped, if I remember correctly.
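For flavor, here is roughly what “spins up a remote browser and uses Thumbtack like a human would” could look like using the open-source Playwright library. This is emphatically not Amazon’s Nova Act API, and the thumbtack.com selectors below are invented placeholders; a production agent would have a model choose actions from the rendered page instead of hard-coding them.

```python
# Hedged sketch of a web agent: drive a real browser through a site the
# way a person would. Uses Playwright's sync API (a real library); the
# page selectors are hypothetical.
from playwright.sync_api import sync_playwright

def find_plumber(zip_code: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # the "remote browser"
        page = browser.new_page()
        page.goto("https://www.thumbtack.com")
        # Hypothetical selectors -- stand-ins for model-chosen actions.
        page.fill("input[name='zipCode']", zip_code)
        page.click("text=Plumbing")
        page.wait_for_load_state("networkidle")
        top_result = page.inner_text("css=.result:first-child")
        browser.close()
        return top_result

print(find_plumber("94110"))
```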

The early response to Alexa Plus has been that it’s a dramatic leap for Alexa but still brittle. There are still moments where it’s not reliable. And I’m wondering, is this the real gym? Is this the at-scale gym where Alexa Plus is how your system gets more reliable much faster? You have to have this in production and deployed to… I mean, Alexa has millions and millions of devices that it’s on. Is that the strategy? Because I’m sure you’ve seen that the early reactions to Alexa Plus are that it’s better, but still not as reliable as people would like it to be.

Alexa Plus is just one of many customers that we have, and what’s really interesting about being within Amazon is, to go back to what we were talking about earlier, web data is effectively running out, and it’s not useful for training agents. What’s actually useful for training agents is lots and lots of environments, and lots and lots of people doing reliable multistep workflows. So, the interesting thing at Amazon is that, in addition to Alexa Plus, basically every Fortune 500 business’s operations are represented, in some way, by some internal Amazon team. There’s One Medical, there’s everything happening on supply chain and procurement on the retail side, there’s all this developer-facing stuff on AWS.

Nova Act: https://www.theverge.com/news/639688/amazon-nova-act-ai-agent-web-browser

Agents are going to require a lot of private data and private environments to be trained. Because we’re in Amazon, that’s all now 1P [first-party selling model]. So they’re just one of many different ways in which we can get reliable workflow data to train the smarter agent.
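If “reliable multistep workflows” are the training fuel, one plausible shape for that pipeline is turning logged workflow runs into reward-labeled episodes for offline RL or filtered behavioral cloning. A sketch under that assumption; the log format is invented.

```python
# Turn workflow traces into training episodes: each logged run becomes
# a trajectory whose final status supplies the reward signal.
workflow_logs = [
    {"task": "process_refund",
     "steps": ["open_order", "verify_receipt", "issue_refund"],
     "status": "success"},
    {"task": "process_refund",
     "steps": ["open_order", "issue_refund"],
     "status": "failed_verification"},
]

def to_episodes(logs):
    """Convert traces into (state, action) steps plus an episode reward,
    suitable for offline RL or success-filtered cloning."""
    episodes = []
    for log in logs:
        reward = 1.0 if log["status"] == "success" else 0.0
        trajectory = [
            {"state": tuple(log["steps"][:i]), "action": step}
            for i, step in enumerate(log["steps"])
        ]
        episodes.append({"trajectory": trajectory, "reward": reward})
    return episodes

for ep in to_episodes(workflow_logs):
    print(ep["reward"], [t["action"] for t in ep["trajectory"]])
```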

Counterpoint: This could all be more sophistry. Just moving the goalposts, from the Turing Test to “agents” doing things for us. Did anyone ask for this?


© 2025, Ramin Firoozye. All rights reserved.