The thinking I’ve done so far tells me that the earlier ELIZA system (while canned in some respects) is grossly inefficient in many others. Its total ignorance of grammar, for example, cannot be underestimated. Also, the structure of ELIZA scripts (e.g., absence of threads) argues from an entirely new perspective, i.e., from that of belief structures.
That did not stop the media from dramatizing the public reaction.
The shocks I experienced as DOCTOR became widely known and “played” were due principally to three distinct events.
A number of practicing psychiatrists seriously believed the DOCTOR computer program could grow into a nearly completely automatic form of psychotherapy.
[…]
I was startled to see how quickly and how very deeply people conversing with DOCTOR became emotionally involved with the computer and how unequivocally they anthropomorphized it.
[…]
Another widespread, and to me surprising, reaction to the ELIZA program was the spread of a belief that it demonstrated a general solution to the problem of computer understanding of natural language. In my paper, I had tried to say that no general solution to that problem was possible, i.e., that language is understood only in contextual frameworks, that even these can be shared by people to only a limited extent, and that consequently even people are not embodiments of any such general solution. But these conclusions were often ignored.
The reality was much more prosaic.
ELIZA used a basic pattern-matching system to pull fragments out of whatever the user typed and reflect a variation of them back. Its talent was reframing whatever you entered as a question of its own, as if the answer were at the tip of its tongue – if you could just divulge a little more…
Despite knowing no grammar rules, it adjusted the person and tense of its responses to give its discourse the aura of an educated conversationalist.
What made it different was that it could pick out core nuggets of what you had entered, remember them, and regurgitate them later in the conversation – what in today’s parlance might be called context and memory.
If you asked ELIZA a complex question, tried to challenge it, or reached a conversational dead-end, it simply ignored what you were saying and flipped back to a previous point as if recalling a memory. It was a neat parlor trick, as if it were paying attention and remembering what you had told it, like a Good Listener.
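To make that concrete, here is a minimal sketch in Python of the kind of machinery described above: pattern matching, person swapping, and a memory stack to fall back on at dead-ends. The patterns and wording are invented for illustration; this is not Weizenbaum’s actual DOCTOR script.

```python
import random
import re

# A tiny, invented subset of an ELIZA-style script: (pattern, response templates)
PATTERNS = [
    (r"i need (.*)", ["Why do you need {0}?", "Would getting {0} really help you?"]),
    (r"i am (.*)",   ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (r"my (.*)",     ["Tell me more about your {0}.", "Your {0}?"]),
]

# Swap first and second person so the echoed fragment reads back naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}

memory = []  # remembered fragments, replayed at conversational dead-ends

def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(text: str) -> str:
    for pattern, responses in PATTERNS:
        match = re.match(pattern, text.lower().strip(".!?"))
        if match:
            fragment = reflect(match.group(1))
            memory.append(fragment)            # the "memory" trick
            return random.choice(responses).format(fragment)
    if memory:                                 # dead-end: flip back to an earlier topic
        return f"Earlier you mentioned {memory.pop(0)}. Tell me more about that."
    return "Please go on."

print(respond("I need a vacation"))   # e.g. "Why do you need a vacation?"
print(respond("Blah blah blah"))      # falls back to memory, or "Please go on."
```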
ELIZA was an attempt at codifying the famous Turing Test (aka Imitation Game) to see if a computer could fool a human. However, anyone spending more than a few minutes with ELIZA could see the repeating patterns and its failure to provide meaningful answers to questions.
Weizenbaum famously decried how this could be conflated with true intelligence:
[O]nce a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away; it stands revealed as a mere collection of procedures, each quite comprehensible.
His warnings were prescient:
“Don’t use computers to do what people ought not do.”
That caution applies today, just as much as it did back then.
My first encounter with ELIZA was in the early 80s, running on a Digital VAX 11/750 computer (by then, the size of a half-height refrigerator). It was easy to find the logical holes in the program and quickly get it into a loop (reminiscent of modern Coding Assistants and Dead Loops).
My Intro to Programming Language class was booooring. I pitched the professor that I would spend the rest of the semester on an Independent Study program developing a program that would capture everything in the course’s curriculum. He agreed, as long as I showed progress every two weeks.
ELIZA’s pattern-based scheme was my inspiration for developing a PoetryBot written in Assembly Language on an Apple IIe. It used a Chomsky-style context-free grammar to generate (literally) reams and reams of poetry, spit out on Z-fold paper from a DECwriter dot-matrix terminal. A primitive grading system tried to assess whether the output was sound. It often failed.
Most of the output was grammatically correct gibberish, but every once in a while, there were genuine surprises. Like the dog that always fetches a sock or a pillow, then one day actually delivers the right slipper.
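For flavor, here is a rough sketch in Python of how a small context-free grammar can churn out that kind of grammatically correct gibberish. The grammar and vocabulary are invented; the original PoetryBot was Apple IIe assembly and is long gone.

```python
import random

# A tiny, invented context-free grammar; each symbol expands to one of its productions.
GRAMMAR = {
    "LINE": [["NP", "VP"]],
    "NP":   [["Det", "Adj", "Noun"], ["Det", "Noun"]],
    "VP":   [["Verb", "NP"], ["Verb", "Prep", "NP"]],
    "Det":  [["the"], ["a"]],
    "Adj":  [["pale"], ["burning"], ["quiet"]],
    "Noun": [["moon"], ["river"], ["machine"], ["sock"]],
    "Verb": [["devours"], ["remembers"], ["sings to"]],
    "Prep": [["beneath"], ["against"]],
}

def expand(symbol: str) -> str:
    """Recursively expand a symbol until only terminal words remain."""
    if symbol not in GRAMMAR:            # terminal word
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return " ".join(expand(s) for s in production)

# Reams and reams (well, four lines) of grammatically correct gibberish:
for _ in range(4):
    print(expand("LINE"))
```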
I wish I had kept some of the output, but Z-fold paper was expensive, and the flip side of a page could be used to print TPS Reports.
I did pass the course.
You’re Probably Wondering How We Ended Up Here
I can write a dozen posts on the history of present-day AI, and lots of them have been written.
Instead, for those who like original sources, here’s a reading list of seminal 20th-century Points of Inflection:
Attention Is All You Need - introduced the Transformer architecture, which uses self-attention to process sequential data in parallel. Kicked off the current GPT craze. (A bare-bones sketch of the attention math follows below.)
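For the curious, here is that attention math in Python/NumPy, with toy data. It illustrates scaled dot-product attention only, not a full Transformer.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, the core of the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # how much each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # weighted sum of the values

# Toy example: 3 tokens with 4-dimensional embeddings (random, illustrative data)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(tokens, tokens, tokens))  # self-attention
```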
But let’s not beat around the bush. All this academic talk is for…
The singular event most responsible for sparking the imagination of generations of researchers was a murderous computer in space.
I’m Sorry, Dave…
The AI Assistant from Hell arrived a mere two years after ELIZA, in 1968, in the form of HAL 9000. The evil sentient computer in Stanley Kubrick’s seminal 2001: A Space Odyssey (and the novel of the same name by Arthur C. Clarke) was exciting and unnerving. It clearly contradicted Isaac Asimov’s benevolent Three Laws of Robotics:
A robot may not injure a human being or, through inaction, allow a human being to come to harm.
A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
It murdered. It disobeyed orders. And it tried to protect itself at all costs, while singing Daisy, Daisy.
Audrey used custom voice recognition hardware built by AT&T Bell Labs to recognize the spoken digits ‘0’ to ‘9’. It had to be trained on a specific person’s voice, but amazingly, it was 97% accurate. The circuitry was the size of a room, hidden away behind a wall.
IBM’s Shoebox, shown at the IBM Pavilion at the 1962 Seattle World’s Fair, could recognize 16 words, including digits. Laugh all you want, but that was a remarkable feat, given the state of the technology.
PARRY was based on the same core pattern-matching technology as ELIZA, but it headed into darker terrain: its name came from ‘paranoia’, and it simulated a paranoid schizophrenic patient.
At one point, PARRY had a head-to-head conversation with DOCTOR that might seem familiar to today’s observers of online discussion groups:
[PARRY] I used to go to the track every week.
[ELIZA] What does that suggest to you?
[PARRY] I avoid racetracks now because the races are fixed.
“You think you’ve got problems? What are you supposed to do if you are a manically depressed robot? No, don’t try to answer that. I’m fifty thousand times more intelligent than you and even I don’t know the answer. It gives me a headache just trying to think down to your level.”
– Marvin the Paranoid Android, The Hitchhiker’s Guide to the Galaxy
This was one of the first commercial continuous speech-to-text recognition systems, but it did need speaker-specific training. It initially had no semantic knowledge of the content and operated based on matching speech patterns.
As it happened, James K. Baker, a rising force in the world of speech recognition technology, was finishing his PhD thesis at Carnegie Mellon during the DARPA-funded research boom. In his landmark 1975 dissertation, “Stochastic Modeling as a Means of Automatic Speech Recognition,” Baker explored the use of Hidden Markov Models (HMMs) to recognize words from sequences of speech sounds. This foundational research led to the first commercially viable speech recognition software.
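Very roughly, the HMM idea is that each word or phoneme sequence is a chain of hidden states, and the decoder picks the state sequence most likely to have produced the observed acoustics. The sketch below, in Python, uses a tiny two-state model with invented probabilities purely for illustration; it is not taken from Baker’s thesis.

```python
import numpy as np

# Toy HMM: two hidden phoneme-like states, three observable acoustic symbols.
# All probabilities are invented; real systems learn them from recorded speech.
states = ["S1", "S2"]
start  = np.array([0.6, 0.4])                 # P(first state)
trans  = np.array([[0.7, 0.3],                # P(next state | current state)
                   [0.4, 0.6]])
emit   = np.array([[0.5, 0.4, 0.1],           # P(observed symbol | state)
                   [0.1, 0.3, 0.6]])

def viterbi(observations):
    """Return the most likely hidden state sequence for a list of observation indices."""
    n, m = len(observations), len(states)
    prob = np.zeros((n, m))
    back = np.zeros((n, m), dtype=int)
    prob[0] = start * emit[:, observations[0]]
    for t in range(1, n):
        for j in range(m):
            candidates = prob[t - 1] * trans[:, j]
            back[t, j] = candidates.argmax()
            prob[t, j] = candidates.max() * emit[j, observations[t]]
    path = [int(prob[-1].argmax())]            # trace back the best path
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi([0, 1, 2]))  # e.g. ['S1', 'S1', 'S2']
```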
In 1982, Baker and his wife, Janet MacIver Baker, formed Dragon Systems Inc. In 1997, Dragon released the first consumer-grade commercial voice recognition product, Dragon NaturallySpeaking. This software’s selling point was that for the first time in decades of speech recognition research and development, the user did not need to speak haltingly with unnatural pauses for the benefit of the machine. Dragon’s software was the first to process continuous natural speech and remains in use today.
The company had to go through a series of acquisitions and mergers, starting with Lernout & Hauspie, then ScanSoft and Nuance, before finally landing at Microsoft.
Jabberwacky (a variation on the Lewis Carroll poem) began as a conversational program on a Sinclair ZX81, but it evolved over time to learn from human conversation.
Rollo Carpenter, creator of Jabberwacky, predicting a future of chattering bots:
“It will then start to have a home in physical objects, little robots that are a talking pet.”
…
“If I have my way, people will be walking around, sitting, cooking and more with one on their shoulder, talking in their ear.”
Talking Moose was an early Mac companion that popped onto the screen, narrating what was happening on the system with humorous quips. It used MacinTalk text-to-speech technology and made a good novelty demo to show friends (and their 6-year-old kids who found it hilarious).
What made it especially notable was that it could access system menus and windows.
This is where an assistant encroaches on the enclosing operating system, a capability Apple later added to Siri and incorporated into iOS and macOS.
Developed by Creative Labs (of Sound Blaster fame) to show off the capabilities of their PC Sound Cards, it was one of the first chatbots to marry ELIZA-style interactions with text-to-voice output.
ALICE (Artificial Linguistic Internet Computer Entity) aka Alicebot (1995)
ALICE was a rule-based chatbot, famously inspiring the 2013 Oscar-nominated Spike Jonze movie Her. The movie featured Scarlett Johansson as the voice of the AI chatbot Samantha.
A decade later, emulating her voice would cause a legal dust-up between Johansson and OpenAI.
Microsoft was looking to simplify the PC’s user experience and make it more user-friendly. Bob was a re-imagining of the operating system interface. It featured several animated assistant characters, including a dog called Rover.
Bob was built on the character technology that later became Microsoft Agent, which incorporated speech recognition, text-to-speech, and access to the Office and Windows environments.
Experimenting with human-machine interfaces was big in the mid-90s, as everyone tried to break out of the all-too-familiar office desktop metaphor and the industrial dashboards chock full of sliders and gauges.
Back then, I was at NASA Ames Research Center in Mountain View, working on methods for searching and accessing large amounts of satellite imagery, daily images, and video data. A sort of Virtual Digital Library.
The Proof-of-Concept, running on a Silicon Graphics workstation, was a live, 3D-rendered library with wood-grain media racks, a librarian’s desk, and printers and CD writers you could drag-drop files onto – all things that later came to be known as skeuomorphism. You could walk through virtual stacks and look for, say, GEOS Satellite Images, historical lunar data, or CAD diagrams. These all seem pretty ordinary today, but this pre-dates the web. State-of-the-art was terminal-based Gopher.
With the 3D Virtual Library, if you felt like it, you could browse around, then turn 180 degrees and pick something else off the digital shelf, just like in a real library. All the content was live-rendered from records in the databases and archives. The idea was to capture the sense of discovery that comes from walking through a library and finding something interesting.
Unfortunately, NASA HQ was going through one of those periodic cost-cutting spasms and the budget request for a full rollout was turned down.
Clippy was Microsoft’s attempt to integrate an embedded assistant that would help new users unfamiliar with Microsoft Office. Clippy used the same Microsoft Agent technology as Bob and, unfortunately, faced similar criticisms.
Clippy, however, was foisted onto millions of standard Windows computers running Office 97, much like U2’s Songs of Innocence was crammed onto iTunes without users asking for it.
This was a Windows assistant that flew around the screen, squawking as it read aloud messages and offered to help with desktop tasks. Yes, it was as annoying as it sounds.
This was the first time an assistant was integrated with external services to receive fresh data. This functionality is one of the primary use-cases for MCP today.
AI Winter, Part Deux
It was around 2001 that AI Assistant technology took a nearly decade-long break. The Internet started taking off, and as we all know, the iPhone appeared.
I was at MacWorld Expo 2007 in San Francisco when the iPhone dropped. It set me off on a decade-long run of mobile apps and hardware adventures. Once the App Store opened to third-party apps, I got a solo booth at the MacWorld Expo and ended up with a Best of Show award. That led to an App Store front page that paid off all the development costs.
It was the first (and last) time my parents got to see what I actually did for a living.
It was also the first time I had a chance to talk face-to-face with end-users of something I had worked on. It was scary, exhausting, dispiriting, thrilling, and above all, highly educational. If you are a builder or creative, I can’t recommend it enough!
The App Store ended up as a means for adding approved binary extensions (apps) to a running system (iOS). Let’s keep that in mind.
Meanwhile…
Text-to-voice and voice-to-text technology in the 1980s needed to get much, much better.
And it did.
Enter DECTalk, a standalone hardware device that let you send it a string of text (with embedded ‘escape’ characters) via serial port commands. These could change voices, intonation, timing, pauses, and other variables. The technology behind it was hardcore fascinating.
DECTalk was useful to the burgeoning Interactive Voice Response (IVR) market. Enterprises were looking to save call center costs and allow customers to use their touch-tone phones to navigate phone menus on their own. IVR applications would accept user input (in the form of digits 0-9, *, and #), look up information from a database, fill in a template, and have DECTalk speak it back in a human-sounding voice.
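A minimal sketch of that IVR loop might look like the following, in Python. The account data, the serial port name, and the use of the pyserial package are all assumptions for illustration, and DECTalk’s actual voice-control escape sequences are omitted rather than guessed at.

```python
import serial  # third-party 'pyserial' package, assumed installed

# Hypothetical account database an IVR application might consult.
ACCOUNTS = {"1234": {"name": "Pat", "balance": 42.17}}

def handle_keypress(digits: str) -> str:
    """Fill a spoken-response template from the caller's touch-tone input."""
    account = ACCOUNTS.get(digits)
    if account is None:
        return "Sorry, I could not find that account. Please try again."
    return f"Hello {account['name']}. Your current balance is {account['balance']:.2f} dollars."

def speak(text: str, port: str = "/dev/ttyS0") -> None:
    """Send plain text to a DECTalk-style unit over a serial port.
    Real deployments embedded voice and intonation escape codes in the text;
    those are omitted here."""
    with serial.Serial(port, baudrate=9600, timeout=1) as tty:
        tty.write(text.encode("ascii") + b"\r\n")

speak(handle_keypress("1234"))  # caller keyed in '1 2 3 4' on a touch-tone phone
```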
It was also used by the National Weather Service and, famously, Stephen Hawking.
Good text-to-speech was a necessary step on the path to having Assistants like Alexa or Siri respond in a natural voice.
My first real job out of college was near Page Mill Road and Foothill Expressway in Palo Alto, down the street from Xerox PARC. I would regularly bump into people from Hewlett-Packard, SRI, and Apple. This was an era of technical seminars at nearby Stanford, poetry readings and author talks at Printer’s Inc. bookstore, and later, drinks at Antonio’s Nut House, The Oasis, or Varsity Theater (sadly, all gone).
My day job was to work on new technologies that could be used in industrial Man-Machine Interfaces. Being a DEC shop, we got access to an early release of the latest version of DECTalk.
I was assigned the task of making it handle custom utterances, since many of the technical terms that had to be spoken would come out garbled unless spelled phonetically. Instead of creating a hand-crafted list of pronunciations, I thought I’d build a Grapheme-to-Phoneme learning model. Thing was, the DECTalk documentation wasn’t that great (more like practically non-existent), and it was easy to get the device to lock up so that it had to be power-cycled.
What was supposed to take only a few weeks was stretching into months. Being new and green, I started fretting about keeping the job. Adding to my anxiety was that I had just moved to the SF Bay Area, didn’t know anyone, and was one of the few single people in the company.
To find more of a social life, I signed up to take a bartending class at a local vocational school. It wasn’t that hard. You mainly had to memorize how to mix a small set of popular drinks (none of the Celery Root Infusion or Hibiscus Liqueurs you see today). With certificate in hand, I landed a side-job moonlighting at the Varsity Theater on University Avenue in downtown Palo Alto.
The location was gorgeous, in the 1920s Mission Revival and Spanish Renaissance style, designed by the architects of the original San Francisco Cliff House and the Hotel del Coronado in San Diego. There wasn’t much else happening downtown at night, so the place was crawling with people my age and older. There was a movie theater, live music concerts, and a very popular open-air bar where you could have conversations about anything from the music of Michael Hedges (who played there live) to the nature of the cosmos. The best part for me was that my little studio apartment was just three blocks away.
One evening while working at the Varsity (after another day of battling NLP and DECTalk), I overheard some patrons at the bar mention their work at DEC-WRL (DEC Western Research Lab). Turns out the research office was just down the street on Alma.
My ears perked up! I made sure I served them extra-strong Irish Coffees and Long Island Iced Teas. After the second or third round, I took a chance. I remember stunned silence, then gales of laughter as their bartender asked them about the inner workings of the DECTalk Vocal Tract Model.
The next morning, I practically raced to work, eager to try out their tips. Hooray! They all worked.
Ever since then, I’ve been a huge believer in Serendipity.
In this section, we covered the fundamental technologies that were needed before anyone could start thinking about AI Companions.
Next, we’ll stay in historical mode and look at how these pieces were put together into software and hardware products. There were a lot of misses, but also a few familiar hits.