Robotics

When AI assistants get physical: robotics and embodied AI

(Domo arigato, Mr. Roboto) Thank you very much, Mr. Roboto
For doing the jobs that nobody wants to
And thank you very much, Mr. Roboto
For helping me escape just when I needed to
Thank you! Thank you, thank you!
…
The problem’s plain to see
Too much technology
Machines to save our lives
Machines dehumanize…

The Rise of the Robots

Aristotle’s Automata

Now instruments are of various sorts; some are living, others lifeless; in the rudder, the pilot of a ship has a lifeless, in the look-out man, a living instrument; for in the arts the servant is a kind of instrument. Thus, too, a possession is an instrument for maintaining life. And so, in the arrangement of the family, a slave is a living possession, and property a number of such instruments; and the servant is himself an instrument which takes precedence of all other instruments. For if every instrument could accomplish its own work, obeying or anticipating the will of others, like the statues of Daedalus, or the tripods of Hephaestus, which, says the poet,

“of their own accord entered the assembly of the Gods; “

if, in like manner, the shuttle would weave and the plectrum touch the lyre without a hand to guide them, chief workmen would not want servants, nor masters slaves. Here, however, another distinction must be drawn; the instruments commonly so called are instruments of production, whilst a possession is an instrument of action. The shuttle, for example, is not only of use; but something else is made by it, whereas of a garment or of a bed there is only the use. Further, as production and action are different in kind, and both require instruments, the instruments which they employ must likewise differ in kind. But life is action and not production, and therefore the slave is the minister of action.

– Aristotle - Politics, Book I, Part IV, 350 BCE

The concept of tools that could work unsupervised and of their own accord has been with us for millennia. But it wasn’t until the early 20th Century that notions of autonomous thinking and organic automata were interwoven.

El Ajedrecista

The first device with its own decision-making capability was a chess-playing machine built in 1912 by the Spanish civil engineer Leonardo Torres y Quevedo.

The Scientific American, in 1915:

M. Torres claims that he can make an automatic machine which will "decide" from among a great number of possible movements to be made, and he conceives such devices, which if properly carried out, would produce some astonishing results.

Interesting even in theory, the subject becomes of great practical utility, especially in the present progress of the industries, it being characterized, in fact, by the continual substitution of machine for man; and he wishes to prove that there is scarcely any limit to which automatic apparatus may not be applied, and that at least in theory, most or all of the operations of a large establishment could be done by machine, even those which are supposed to need the intervention of a considerable intellectual capacity.

The claim was that the technology behind the chess machine could be extrapolated to any and all human interactions.

Let’s revisit the Dreyfus First Step Fallacy:

First step thinking has the idea of a successful last step built in. Limited early success, however, is not a valid basis for predicting the ultimate success of one's project. Climbing a hill should not give one any assurance that if he keeps going he will reach the sky. Perhaps one may have overlooked some serious problem lying ahead. There is, in fact, no reason to think that we are making progress towards AI or, indeed, that AI is even possible, in which case claiming incremental progress towards it would make no sense.

R.U.R.

DOMIN: Ah now, young Rossum; that was the start of a new age. After the age of research came the age of production. He took a good look at the human body and he saw straight away that it was much too complicated, any good engineer would design it much more simply. So he began to re-design the whole anatomy, seeing what he could leave out or simplify. In short, Miss Glory … I’m not boring you, am I?

HELENA: No, quite the opposite, this is fascinating.

DOMIN: So young Rossum said to himself: Man is a being that does things such as feeling happiness, plays the violin, likes to go for a walk, and all sorts of other things which are simply not needed.

HELENA: Oh, I see!

DOMIN: No, wait. Which are simply not needed for activities such as weaving or calculating. A petrol engine doesn’t have any ornaments or tassels on it, and making an artificial worker is just like making a petrol engine. The simpler you make production the better you make the product. What sort of worker do you think is the best?

HELENA: The best sort of worker? I suppose one who is honest and dedicated.

DOMIN: No. The best sort of worker is the cheapest worker. The one that has the least needs. What young Rossum invented was a worker with the least needs possible. He had to make him simpler. He threw out everything that wasn’t of direct use in his work, that’s to say, he threw out the man and put in the robot. Miss Glory, robots are not people. They are mechanically much better than we are, they have an amazing ability to understand things, but they don’t have a soul. Young Rossum created something much more sophisticated than Nature ever did - technically at least!

R.U.R. by Karel Čapek - 1920.

The play, published in 1920, introduced the term Robot into the English language. A movie is in the works.

The author, though able to glance far into the future, was sadly not so fortunate in his own time:

Čapek was cursed to live in fascinating times. He spent his entire life in the same geographical area, but during those years it went from being a region of the Austro-Hungarian Empire, to the independent country called Czechoslovakia, to a captive vassal of greater Nazi Germany. A patriot, a democrat and a personal friend of Czechoslovakian president T. G. Masaryk, Čapek had a doctorate in Philosophy from Charles University. He made his career as a journalist, playwright and novelist. Čapek used his pen to advocate emancipation of his country after the First World War, and later to defend it from threatened German invasion. Always in ill health, Čapek died from tuberculosis shortly after the 1938 invasion; but it seems that he had by then lost the will to live as he watched Czechoslovakia betrayed by the Allied Powers. He would surely have been imprisoned by the Nazis, like his brother Josef, who died in 1943 in the Belsen Concentration Camp after years of abuse.

– Tim Madigan - Philosophy Today

Adam Link

Some 20 years later, the idea re-emerged in the pages of Amazing Stories:

Internet Archives: Amazing Stories - 1939, Issue 13

The story heavily paralleled Mary Shelley’s Frankenstein, but told from the perspective of the manufactured device:

Two months after my awakening to life, Dr. Link one day spoke to me in a fashion other than as teacher to pupil; spoke to me as man to man.

“You are the result of twenty years of effort,” he said, “and my success amazes even me. You are little short of being a human in mind. You are a monster, a creation, but you are basically human. You have no heredity. Your environment is molding you. You are the proof that mind is an electrical phenomenon, molded by environment. In human beings, their bodies–called heredity–are environment. But out of you I will make a mental wonder!”

…

“It was not long ago that I completed your ‘brain’-an intricate complex of iridium-sponge cells. Before I brought it to life, I had your body built by skilled artisans. I wanted you to begin life equipped to live and move in it as nearly in the human way as possible. How eagerly I awaited your debut into the world!” His eyes shone.

“You surpassed my expectations. You are not merely a thinking robot. A metal man. You are--life! A new kind of life. You can be trained to think, to reason, to perform. In the future, your kind can be of inestimable aid to man and his civilization. You are the first of your kind.”

Positronic Brains

Isaac Asimov, inspired by the Binder brothers, began the series of Robot stories that would lead to the famed Three Laws of Robotics:

Powell’s radio voice was tense in Donovan’s ear: “Now, look, let’s start with the three fundamental Rules of Robotics – the three rules that are built most deeply into a robot’s positronic brain.” In the darkness, his gloved fingers ticked off each point.

“We have: One, a robot may not injure a human being, or, through inaction, allow a human being to come to harm.”

“Right!”

“Two,” continued Powell, “a robot must obey the orders given it by human beings except where such orders would conflict with the First Law.”

“Right!”

“And three, a robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.”

In Runaround - From The Complete Robot by Isaac Asimov

When it came to describing the technology behind the positronic brain, Asimov would apply a healthy dose of the rare mineral, handwavium.

Robots Today

Let’s skip past a continuous stream of advancements in pre-programmed and remotely controlled devices in reality, commerce, and fiction, and move to the present.

ℹ️ # Side Note

Drop me a note if you want me to do a separate, exhaustive post on those. I’m totally game, but that level of geekery is not for everyone.

What distinguishes a robot from a hardware Assistant is the Robot’s ability to move, either through controlled programming or autonomously.

Let’s define what we consider a robot, and what we don’t:

It can perform a pattern of physical actions with or without human intervention.
It can augment human ability.
If autonomous, it uses sensors and feedback loops to determine the best path to achieve a goal.

There are a number of devices today that fit this broad description:

Aethon Hospital Supply Robot

ℹ️ # Side Note

Personally, I would have also added:

Ability to learn and advance from its environment and its mistakes.

Can handle multi-modal input and outputs.

Can interact with other robots and form social meshes.

But that would automatically zero out all the ones listed.

Robots are the next logical step for AI, where Intents and Agentic Operations map onto concepts like Tasks, Goals, and Actions.

Shakey

Shakey the Robot, built by SRI between 1966 and 1972, was the first mobile robot with the ability to perceive objects and reason about their location.

Despite it looking anything but humanoid, Life Magazine labeled it The First Electronic Person (‘The fascinating and fearsome reality of a machine with a mind of its own’ and ‘Computers will be playing office politics’ – and you thought AI hype was a new thing!)

Minsky was not a big fan:

If the group at SRI hadn’t built Shakey, the first autonomous robot, we would have had more progress. Shakey should never have been built. There was a failure to recognize the deep problems in AI; for instance, those captured in Blocks World. The people building physical robots learned nothing. - Hal’s Legacy

Humanoid Robots

I was going to skip ahead, but I really can’t resist pointing out two attempts at creating humanoid behavior. I’m pretty sure the developers are aware of the uncanny valley, but I personally don’t see the need to mimic human features and movement. Frankly, it’s creepy, and it distracts from the extraordinary technological accomplishments.

Sophia

Sophia has been making the rounds since 2016, as a humanoid robot with the ability to recognize and react to human facial expressions.

It has been welcomed at the United Nations and granted citizenship by Saudi Arabia.

David Hanson, the CEO of Hanson Robotics, states:

“On the tree of robotic life, human-like robots play a particularly valuable role. It makes sense. Humans are brilliant, beautiful, compassionate, loveable, and capable of love, so why shouldn’t we aspire to make robots human-like in these ways? Don’t we want robots to have such marvelous capabilities as love, compassion, and genius?”

However, it is also worth mentioning that Facebook’s head of AI, Yann LeCun, is not a fan.

Boston Dynamics Atlas

Boston Dynamics has spent years posting videos of its wireless robots building, leaping, and dancing.

They’ve even entered the popular mainstream:

The robots fluctuate between eliciting sympathy for the number of times they’ve been abused and alarm at their clones being used as military weapons.

Basic Concepts

Closed Loops

A robot takes physical inputs, figures out what is happening, and then performs an action. It does this continuously, in realtime, and across a range of concurrent events. At the highest level, how this information is taken and what happens to it can be described as Closed or Open loops.

In a closed-loop system, the system senses some value and then makes adjustments until a desired state is reached.

For example, a thermostat is set to maintain a certain temperature. If the ambient temperature falls below a threshold, the heat is turned on until the temperature rises back into range. The system monitors temperature (converted to a digital number) and passes it to a controller that has a pre-set threshold value. If the number is out of range, the controller sends a signal to turn on the heating element. It keeps looping and reading the ambient temperature until the value climbs back to the threshold. Then it sends a signal to turn off the heater.

To avoid continuously turning the heater on and off, a certain amount of tuning or damping is needed. Hysteresis (a dead band around the threshold) keeps the controller from flipping state on every small change, and techniques like Kalman Filtering smooth noisy sensor values.
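Here is a minimal sketch of the thermostat loop with a hysteresis band, just to make the idea concrete. The class name `Thermostat`, the 0.5-degree band, and the toy room model in `simulate` are all my own assumptions for illustration, not how any particular product works.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Bang-bang heater controller with a hysteresis (dead) band."""
    setpoint: float           # target temperature, e.g. degrees Celsius
    band: float = 0.5         # half-width of the dead band (assumed value)
    heater_on: bool = False

    def update(self, temperature: float) -> bool:
        """Take one sensor reading and return the new heater state."""
        if temperature < self.setpoint - self.band:
            self.heater_on = True        # too cold: switch on
        elif temperature > self.setpoint + self.band:
            self.heater_on = False       # warm enough: switch off
        # Inside the band the previous state is kept, which prevents chatter.
        return self.heater_on


def simulate(thermostat: Thermostat, temperature: float, steps: int):
    """Toy room model (assumed): +0.4 degrees per step heating, -0.2 otherwise."""
    history = []
    for _ in range(steps):
        heater_on = thermostat.update(temperature)   # sense and decide
        temperature += 0.4 if heater_on else -0.2    # act on the "room"
        history.append((temperature, heater_on))
    return history
```

Calling `simulate(Thermostat(setpoint=20.0), 18.0, 50)` shows the temperature climbing and then oscillating in a narrow range around the setpoint. Without the band, the heater would flip on every tiny crossing of the threshold.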

<a href="https://blog.febucci.com/2018/08/easing-functions/" target="_blank">Febucci</a>

Conversely, there are Easing functions to help smooth out the rate of change without too much lurching about (technical term).
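As a rough illustration, here is one common easing curve (cubic ease-in-out) applied to moving a joint from one angle to another. The function names and the 0-to-90-degree example are assumptions of mine, and `steps` is assumed to be at least 1.

```python
def ease_in_out_cubic(t: float) -> float:
    """Map linear progress t in [0, 1] to eased progress in [0, 1]."""
    t = min(max(t, 0.0), 1.0)            # clamp out-of-range input
    if t < 0.5:
        return 4.0 * t ** 3              # slow start, accelerating
    return 1.0 - ((-2.0 * t + 2.0) ** 3) / 2.0   # decelerating finish


def eased_path(start: float, end: float, steps: int) -> list:
    """Intermediate setpoints from start to end, small at the ends, larger mid-way."""
    return [
        start + (end - start) * ease_in_out_cubic(i / steps)
        for i in range(steps + 1)
    ]
```

With `eased_path(0.0, 90.0, 10)`, the first and last increments are tiny and the middle ones are large, so the motion starts and stops gently instead of jerking to full speed.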

The same principle applies for a vacuum cleaner robot designed to avoid obstacles, or move to the edge of a room, then stop, based on its camera or radar measurements.

In vehicles, a driver-assist system is told to maintain the centerline of a road and continuously monitors input sensor data. A decision-making system evaluates the data in realtime, and sends signals to move the steering wheel left or right and adjust the speed until the car is safely centered.

This isn’t limited to hardware devices. In Kubernetes, a cloud infrastructure orchestration system, there is the concept of Reconciliation Loops: the desired state of a system is declared up front, and at runtime controllers continuously compare it with the observed state and adjust resources automatically to close the gap.
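The reconciliation idea can be sketched in a few lines. To be clear, this is not the Kubernetes API. The functions `reconcile` and `control_loop`, and the use of plain dictionaries mapping a service name to a replica count, are assumptions made for illustration.

```python
def reconcile(desired: dict, actual: dict) -> dict:
    """Diff desired vs. observed replica counts; return {name: delta} to apply."""
    deltas = {}
    for name in set(desired) | set(actual):
        delta = desired.get(name, 0) - actual.get(name, 0)
        if delta:
            deltas[name] = delta
    return deltas


def control_loop(desired: dict, actual: dict, max_step: int = 1, max_rounds: int = 100):
    """Observe, diff, correct, repeat; change at most max_step replicas per round."""
    actual = dict(actual)                  # do not mutate the caller's state
    rounds = 0
    while rounds < max_rounds:
        deltas = reconcile(desired, actual)
        if not deltas:
            break                          # observed state matches desired state
        for name, delta in deltas.items():
            step = max(-max_step, min(max_step, delta))
            actual[name] = actual.get(name, 0) + step
        rounds += 1
    return actual, rounds
```

Starting from `{"web": 1, "cache": 2}` with a desired state of `{"web": 3, "worker": 1}`, the loop scales `web` up, starts a `worker`, and winds `cache` down to zero over two rounds. It is the same sense, compare, adjust pattern as the thermostat.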

Let’s compare this to how an AI Assistant is constantly monitoring user voice. Once it hears the wake-word, it translates raw audio data into user intent, processes it, and takes action. Replace voice with sensor data, Intent Recognition with the control loop, and Intent Execution with performing adjustments.

The main difference is that in a robotics system, the process is continuous, whereas an AI Assistant waits for user input to create a response.

As these two worlds converge, the same Intent Execution block could be replaced with a Closed Loop system.

Open Loops

In an open-loop system, there is no continuous monitoring or feedback. The system runs a pre-set routine, such as a fixed duration, without checking the result. For example, having a fan run for 10 minutes every hour. These types of systems often require human intervention, whereas closed-loop systems can be autonomous. The main difference is how the decision is made to adjust or change.
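For contrast, here is the fan example as a tiny sketch. The function name `fan_schedule` and its parameters are hypothetical. The point is that nothing is measured: the output depends only on the clock.

```python
def fan_schedule(minutes_on: int = 10, period: int = 60, total_minutes: int = 180) -> list:
    """Open-loop control: one on/off value per minute, driven only by elapsed time."""
    return [(minute % period) < minutes_on for minute in range(total_minutes)]
```

Over three hours, `fan_schedule()` turns the fan on for the first 10 minutes of each hour and off for the rest, whether the room is stuffy or not. A closed-loop version would take a sensor reading as input, like the thermostat above.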

In the physical world, there is a range of inputs that can be used to determine if any change needs to be made. Here’s a sampling:

Reinforcement Learning (RL)

[ NEED MORE HERE ]

https://en.wikipedia.org/wiki/Reinforcement_learning

The direct feedback model works when you have a set of rules to react to various input signals. The problem is the world is full of messy, random signals.

[ TALK ABOUT Reinforcement learning AS WELL AS ADVERSARIAL DEEP RL AND RLHF ]

Peek Under the Hood

The reason we care about Robots is that they occupy a unique place in the AI Assistant universe. They occupy physical space, whereas software and device Assistants are mere, fixed-place observers. From a functional point of view, the two worlds are converging. What happens in the robotics world will have a direct relevance to the AI Assistant universe.

Let’s dive in.

In the robotics world, there are several middleware systems designed to handle multiple overlapping tasks and realtime requirements. The most popular are:

These systems differ in the details, for example in their support for particular hardware or for hard realtime guarantees. But for the purpose of illustration, we’ll focus on the most popular open-source system: ROS (technically ROS2).

ROS Architecture

Let’s drill down a little into ROS (Robot Operating System). What makes it different from your standard LLM/MCP system is that it leaves the door open for realtime messaging, event-based applications, and potentially realtime reactions. IMO, starting with ROS concepts offers a lot more runway when the time comes to build highly responsive and adaptive systems.

Here are some basic concepts.

Code Packaging

In the ROS universe, code and resources are bundled as Packages. These have a manifest that describes metadata and dependencies.

Multiple, related packages are bundled in Stacks (in newer ROS releases, including ROS2, this role is played by metapackages). These also have their own manifest, describing group metadata.

If a system uses multiple packages, multiple stacks can be grouped together into a Distribution. This way, someone else can just download and install everything in one go.

Finally, packages, stacks, and distributions can be deployed to their own Repository. This way, organizations can have their own private or shared distribution mechanism. Think of each one of these as an App Store.

So right off the bat, we’ve got ways to share and deploy hunks of functionality.

Nodes

If you need something processed, you can run it as a Node. A typical system contains multiple nodes communicating with each other and performing actions. Nodes can act as conduits to physical objects, like motors, sensors, buttons, and gauges.

Nodes have three modes of communicating with each other:

Service: This is your classic, standard Request/Response mechanism. Think of it as HTTP or REST. You make a single-shot request (passing parameters), and get a response (status and result) back.

Action: This is where a Node asks another Node to do something that might take a while. The responses are returned as a series of updates, followed by a final Response. An analog might be a WebSocket that takes a single request, performs it while streaming progress updates, sends a final result, and then closes down. Another would be MCP’s Streamable HTTP mechanism.

Messages: The classic pub/sub or event-based model. This is where a node Broadcasts or Publishes data to a defined Topic and passes a Message along. Those interested in receiving the data can Subscribe to the Topic and receive the Message when available. Unlike one-to-one messaging like Service or Action, this allows multiple recipients to receive the same message.
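To make the three modes concrete, here is a toy, in-process sketch in plain Python. It is deliberately not the ROS API: `MiniBus`, its method names, and the `move_arm` handler are all hypothetical stand-ins, and a real system would use the ROS client libraries and run nodes as separate processes.

```python
from collections import defaultdict


class MiniBus:
    """Toy in-process stand-in for ROS-style communication (not the ROS API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic name -> list of callbacks
        self._services = {}                    # service name -> handler function
        self._actions = {}                     # action name -> generator function

    # Messages: publish once, every subscriber of the topic receives the message.
    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)                  # number of recipients reached

    # Service: one request in, one response out.
    def advertise_service(self, name, handler):
        self._services[name] = handler

    def call_service(self, name, request):
        return self._services[name](request)

    # Action: one goal in, a stream of feedback, then a final result.
    def advertise_action(self, name, handler):
        self._actions[name] = handler

    def send_goal(self, name, goal):
        return self._actions[name](goal)       # an iterator of (kind, value) pairs


def move_arm(goal_degrees):
    """Hypothetical action handler: rotate in 30-degree steps, reporting progress."""
    position = 0
    while position < goal_degrees:
        position = min(position + 30, goal_degrees)
        yield ("feedback", position)
    yield ("result", f"reached {goal_degrees} degrees")
```

Registering `move_arm` under a name like `/move_arm` and iterating over `send_goal("/move_arm", 90)` yields three feedback updates followed by one result, which is the shape of an Action. A Service call returns a single value, and a published Message fans out to however many subscribers happen to be listening.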

Many device manufacturers ship drivers for ROS Controllers to allow plug-n-play support for devices.

ROS2 also has mechanisms for discovery, authentication, authorization, and attestation. It also models the system as a graph, so you can detach events from interfaces, which helps with simulation, testing, and command-line tooling. You can also create a mesh, so more than one unit can share responsibility.

But the most important part is that the core of ROS has nothing to do with robots!

It’s a general-purpose system for implementing asynchronously coordinating services. In fact, ROS2 is built on top of the standard OMG Data Distribution Service (DDS) middleware, and the actual DDS implementation can be swapped in and out to fit the environment, for example, DDS For Extremely Resource Constrained Environments.

Other Cool Stuff

ROS has a few other pretty neat features, like:

Bags: Built-in way to record and play back raw events. Great for simulation and testing!
Gazebo Simulator: build a complete system before you even get to hardware.
ROS MCP Server: giving access for LLMs to control and get status using plain text.
SLAM (simultaneous localization and mapping): allows you to build a 3D model of the environment and position the robot inside it.

How does this relate?

ROS (and DDS) allow for the creation of an event-based system capable of handling multiple tasks and coordinating communication between them. This is a much more functional architecture than the Request/Response model in MCP and LLMs.

ROS, however, has some technical debt (for example, reliance on verbose XML for specific configurations) and is missing certain features that you would need in a future with the sort of dynamism and seamless user experience we’ve discussed before.

For example, ROS would need support for routing invocation based on functionality instead of by name. What I mean is, if there are multiple ROS Nodes capable of handling a function, there needs to be a way at runtime to dynamically choose between them without having to stop and ask the user.
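Here is a rough sketch of what routing by functionality could look like. None of this exists in ROS as far as I know: `NodeInfo`, `CapabilityRouter`, the `load` metric, and the capability strings are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeInfo:
    """What a (hypothetical) registry would know about a running node."""
    name: str
    capabilities: set = field(default_factory=set)
    load: float = 0.0         # 0.0 idle .. 1.0 saturated (assumed metric)
    healthy: bool = True


class CapabilityRouter:
    """Pick a node by what it can do, not by what it is called."""

    def __init__(self):
        self._nodes = {}

    def register(self, node: NodeInfo) -> None:
        self._nodes[node.name] = node

    def resolve(self, capability: str) -> NodeInfo:
        candidates = [
            node for node in self._nodes.values()
            if node.healthy and capability in node.capabilities
        ]
        if not candidates:
            raise LookupError(f"no healthy node offers {capability!r}")
        # Simple policy: least-loaded node wins; name breaks ties deterministically.
        return min(candidates, key=lambda node: (node.load, node.name))
```

A caller would ask `router.resolve("grasp")` and get back whichever healthy node is least busy at that moment. The selection policy is the interesting design choice: load is just one option, and cost, latency, or trust level could slot into the same `key` function.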

Another example would be to allow for communication beyond a single device: dynamically discovering nodes across domains, and loading and invoking nodes outside a single distribution. It would also need more robust authentication and attestation, a way to handle payment, and more robust support for multi-threading inside Nodes.

OTOH, ROS is ideal for embedding an LLM on-device and having it process requests. And instead of it invoking tools via MCPs, it could just make calls to other Nodes via Messages, Services, and Actions.

These can all be layered on top of ROS, but at some point, someone may want to fork the project and bring it up-to-date with LLM and modern service architectures.

ℹ️ # Side Note

Imagine two robot arms on a table. The first is complete; the other is in random block parts scattered about the table. The first one picks up each piece, inspects it, then snaps it into a base. It picks up another, rotates it, inspects it through its camera, then snaps it into the first piece.

This goes on until all the parts have been clicked together in the shape of a robot arm. The second arm comes to life and goes through motions to calibrate. The second arm then proceeds to take apart the first arm, piece by piece, snapping it into small blocks and randomly placing it around the table.

Once done, it pauses, shakes as if waking from a dream. It then goes about assembling the other robot…¹

Inspired by M.C. Escher’s Drawing Hands.

¹ I’ve been wanting to build this for ages. If any artist program wants to sponsor building such an installation, feel free to get in touch.

Title Photo by Andy Kelly on Unsplash