Stories Ai companion
Chapter 7 7 of 16

Toolmageddon

The explosion of tools and integrations in modern AI companions

Toolmageddon

 


MCP to the Rescue

I won’t sugar-coat it.

The AI industry may be barreling headfirst into another ActiveX-like disaster, given how rapidly it’s embracing Model Context Protocol.

At the first MCP Dev Summit in May 2025, I watched experts repeatedly kick the proverbial can down the road.

“It’s early days…”

“The Spec needs work…”

“Yeah, but it gets the job done…”

The excitement was palpable, but I felt like…

The spec had only been released by Anthropic six months earlier, in November 2024, and there were already multiple MCP search engines to help you discover servers out in the open, each indexing a different number of add-ons (as of this writing):

There was also:

And so many more… including none other than Anthropic itself and Microsoft, building MCP deeply into the Windows infrastructure and promising enhanced MCP security by Building a safer agentic future on Windows.

Just look at the index of self-reported servers kept by the official Model Context Protocol organization or the obligatory list of Awesome MCP Servers.

By listing all these, I’m pointing out how fragmented the MCP world is. In the app world, developers only have to register their add-ons (aka apps) with Apple, Amazon, and Google. Depending on where you stand, this can be good or bad, but it undeniably causes fragmentation.

From a developer’s point of view, I would rather have only a few places to register. From a user perspective, it would be good to have a simple user experience, while knowing what’s trusted and working and what isn’t.

There is a robust discussion on the design of an official MCP Server Registry, but after talking to the primary designers at the MCP Summit, I get the feeling a lot of the heavy lifting is going to be pushed down to the (fragmented) downstream registries.

Remember Molly Ivins: “The first rule of holes: When you're in one, stop digging.”

Nobody is stopping to ask why you need so many MCP components, how good they are, whether they can install malware or not, who created them, how much they cost, or exactly what they do once installed.

We are clearly in hole-digging mode, hoping to strike oil or gold or something valuable. There’s no stopping progress.

Security

In regards to the MCP Spec… at this point in time and given what the history of abuse unvetted extensions foisted on unsuspecting users, I’m astounded anyone would rush out a spec without at least a basic security audit.

The original spec was riddled with holes and unspecified regions, leading to posts like:

And my favorite title:

Also, scary diagrams like:

Invariantlabs MCP Security Notification: Tool Poisoning Attacks

The MCP Specification itself says:

Microsoft is promising its own set of MCP Security Controls as part of its Secure Future Initiative program. These are all reasonable, forward steps.

But all of these efforts abdicate responsibility to the least-informed element of the security chain: the user</>. Microsoft states:

Users must explicitly approve each client-tool pair, with support for per-resource granularity helping to fulfil the principle of keeping user in control.

Remember Alexa’s Name-Free Interactions?

Enables customers to naturally interact with Alexa skills and reduce friction when customers can’t remember how to invoke a skill or use incorrect invocation phrases.

Ultimately, you want less involvement from the user, not more.

It Gets Worse

I love Home Assistant. It’s a fantastic, open-source home automation platform that tries to do as much as possible locally and privately.

But I’m questioning their headlong rush into letting LLMs control your physical environment. They’ve even gone so far as to add their own Home Assistant Voice, which works based on the same architecture we covered in the last section.

Home Assistant

They’re not alone in mixing LLMs and the physical world:

This means an LLM can have open-ended access to monitor and modify your physical devices and environment.

Apple faces a similar problem when there is a demand to open its Home App to Siri control. No, wait, it can already do that.

But that’s Old Skool Siri. V2 will, no doubt, use Apple’s own Foundation Models. According to the just-announced Developer access to Apple’s foundation models, they already support Tools.

Voice + Open-ended Tools = ❤️


What to do?

Are we doomed?

If we go down the same path of insisting that applications conform to fixed taxonomies, don’t build guardrails around adding extensions to AI Assistants, and keep kicking the can down the road…

But it doesn’t have to be that way.

Off the top of my head, there are a lot of issues (listed alphabetically) that need to be addressed:

  • Accessibility
  • Authentication
  • Authorization
  • Business disruption
  • Data sovereignty (GDPR, CCPA, etc) as well as limits on data reuse
  • Discovery
  • Fallback and failure modes (aka redundancy)
  • OTA updates
  • Payment
  • Privacy
  • Protocol and Standard Evolution (aka versioning)
  • Proxying (assigning responsibility)
  • Regulations
  • Security
  • Third-party dependencies

We’ll want to make sure there are at least acknowledgments of these. But we don’t want to slow down innovation. After all, as the saying goes: Perfect is the Enemy of Good. There could be placeholders where they can evolve without making breaking changes.

Google’s Agent2Agent Protocol (A2A) is a solid step in the right direction:

A2A Servers MUST make an Agent Card available.

The Agent Card is a JSON document that describes the server’s identity, capabilities, skills, service endpoint URL, and how clients should authenticate and interact with it. Clients use this information to discover suitable agents and configure their interactions.

More can (and should) be done before LLMs are injected into AI Assistants. I haven’t had a chance yet to go over the new Alexa+ specs, but that’s top of my Summer reading list (as soon as I get approved for early access—perhaps one of my former AWS/Labs colleagues can pull some strings 😬).

Agents


Commerce

Google Research

At Google I/O 2018, Google demonstrated Google Duplex, an agent that could make appointments for you over the phone. The demo was astounding. The assistant voice was natural, peppered with ums and ahs, with a touch of uncertainty. It appeared to understand what was being said in real-time, then respond back with answers. It had access to the user’s calendar so it could negotiate best times and create an entry once an appointment had been made.

You could trace the lineage, but it was a long way from the days of OK, Google.

It was amazing.

It also immediately hit roadblocks:

Right away, many in the tech community cited two big problems. First, the people on the receiving end of the call were unaware that the voice speaking into the phone was a machine, meaning <hr class='green'>Duplex was essentially fooling unsuspecting humans</hl>. Second, the bot in the demo never indicated it was recording the phone call, raising the eyebrows of privacy advocates and prompting follow-up questions from journalists (including writers at WIRED)

I have to quibble with the fooling unsuspecting humans bit. Wasn’t that the whole purpose of the Turing Test? To fool humans into thinking the machine was real. And here we are, a machine doing precisely that. What should a hard-working AI do to get some respect?

As it happens, it turned out some of the interactions would require human intervention. A few years later, in 2022, Google shut the service down.

It turned out:

Duplex used a special user agent that crawled sites as often as several hours a day to “train” periodically against them, fine-tuning AI models to understand how the sites were laid out and functioned from a user’s perspectives. It was surely resource-intensive, and could be tripped up easily if site owners chose to block the crawler from indexing their content.

OpenAI recently announced their ChatGPT Agent, claiming to be able to do essentially the same (minus the speaking and answering). It could navigate websites and order items for users.

Not surprisingly, it soon hit some roadblocks as well, by faking user clicks to bypass mechanisms put into place precisely to prevent bots (like the OpenAI Agent) from going through websites:

Ars Technica

Media tests of the feature were also less than successful.

NVIDIA, ever the Levi Strauss & Co. of the AI Gold Rush, has been happy to push the AI Agent for Shopping storyline.

Vibe-Coding

Andrej Karpathy

Coding assistants came onto the scene, first as reference helpers, performing tasks like auto-complete and showing documentation. They quickly evolved into tools that could actually generate code.

  Name Company Unique Feature
Aider Aider Aider.chat Terminal-based AI pair programmer with persistent context and Git integration.
Amazon Q Amazon Q Developer Amazon AWS Evolved from CodeWhisperer; offers multi‑agent CLI integration (dev, review, doc), AWS-native and enterprise‑grade security.
Amp Amp Sourcegraph Supports full context across repos, debugging, doc & QA
Augment Code Augment Code Augment Code Uses deep understanding of large codebase, including documentation
BigCode BigCode BigCode Project Open‑weight model trained on permissively licensed code; open‑source and highly customizable
Cline Cline Cline Open Source AI coding with configurable models (including local)
Claude Code Claude Code Anthropic CLI-native assistant that handles PRs, tests, and CI/CD via terminal
CodeComplete CodeComplete CodeComplete Full-lifecycle assistant for generation, refactoring, and documentation
Codeium Codeium Codeium Lightweight, supports multiple editors with fast autocomplete. Merged with Windsurf
CodeRabbit CodeRabbit Codeium AI-powered code review tool with interactive chat and feedback
CodeSpell CodeSpell Codeium Automate the entire Software Development Life Cycle (SDLC) with design studio and structured generation
Continue Continue Continue.dev Build custom agents tailored to your workflow in VS Code
Copilot GitHub Copilot GitHub / Microsoft Popular autocomplete assistant trained on GitHub codebases.
Cursor Cursor Cursor IDE‑based; rich context awareness across your codebase; Privacy‑mode and SOC 2; Agent (formerly Composer) built‑in
Devin Devin AI Cognition Labs Autonomous agent: plans, codes, debugs, benchmarks, and adapts prompts on‑the‑fly via learning and search
FAB Builder Fab Builder Fab Builder Low-code app builder powered by AI suggestions.
Firebase Studio Firebase Studio Google Firebase-specific code generator from Google (IDX)
Gemini Code Assist Gemini Code Assist Google Integrated with Google Cloud tools & Cloud Shell; provides code completion/chat and code citations
Gitlab Duo Gitlab Duo Gitlab AI built into GitLab for CI/CD, review, and security scanning
Harness Harness AI Cognition Labs End-to-end dev/build assistant including DevOps-focused assistant spanning CI/CD and governance
Intellicode Intellicode Microsoft Context‑aware AI code suggestions within Visual Studio and VS Code; learns team style and patterns
Jolt Jolt Jolt AI Optimized for 100K to multi-million line codebases
Junie Junie Jetbrains Agentic assistant available in JetBrains IDEs and Android Studio
KiloCode KiloCode KiloCode VSCode Extension with built-in MCP marketplace and micro-plugin workflows
Lovable Lovable Lovable Supports importing UI layouts from Figma
CodiumAI (Qodo) Qodo Gen (formerly CodiumAI) Qodo AI testing, review, and generation with enterprise‑focused code
Refact Refact Refact AI Powered by Qwen2.5-Coder model and RAG
Replit Ghostwriter Replit Ghostwriter / Agent Replit Browser‑based ‘vibe coding’: generate full apps in natural language, real‑time feedback, multi‑file multi‑agent code creation
RooCode RooCode Roocode VSCode open-source extension with multi-file editing support
Safurai Safurai Safurai Lightweight VS Code assistant focused on explanation, refactoring, bug‑fixing suggestions
Sourcery/Codiga Sourcery / Codiga Sourcery / Codiga Automated linting and code review with AI feedback. Acquired by DataDog
Tabnine Tabnine Tabnine (Codota) Offers local/self‑hosted model options; privacy‑focused code completions and agent chat integrated with your codebase
Tabby Tabby Tabby Features ‘answer engines’ and ‘data connectors’
Warp Warp Warp Run multiple agents in parallel
IBM Watsonx WatsonX and WatsonX for Z IBM Focus on mainframe and Linux
Windsurf Windsurf Windsurf Serial multi‑file code assistants. Split between Google and Cognition
Zed Zed Zed Native Rust-based code editor with built-in AI support
Zencoder Zencoder Zencoder Supports custom dev workflows with AI coding support

Disclosure: I spent a whole afternoon tracking down and formatting the logos for these tools. If it isn’t obvious, I like pretty pictures to go with text. By the end of it, I was out of gas, so I just asked ChatGPT to come up with a summary description for each of these tools, which I cross-checked with Gemini. Take it for what it’s worth. I have personally only used Cursor with Claude back-end and LMStudio using local models. I’ve casually played with the rest of these tools, but I can’t personally vouch for any of the others. As the kids like to say, YMMV.

ℹ️ # Side Note

By all means, feel free to tire-kick, but do not get too attached to any single tool.

TechCrunch

Nobody wants to be left behind, including Amazon. They’ve announced Amazon Bedrock AgentCore to let enterprises build their own agents, as well as Strands, an SDK for building agents quickly in code.

This was in addition to their Amazon Q Developer, which was quickly shown to have the same security issues we’ve discussed above.

Most of these tools allow Bring Your Own Model mode, including locally or self-hosted ones. As I mentioned, LMStudio is a good way to run local models, but you need a pretty beefy machine.

IMHO, the open-ended pricing model of Claude Code may limit its usage to Enterprise developers with deep pockets.

One problem to account for when using these tools is when it comes to so-called vibe-coding and how much you’re willing to let it run the show. For Claude Code, Anthropic claims a high level of security surrounding its operation, but most of it boils down to the user having to approve going to the next step.

For someone vibe-coding, this places responsibility with people not equipped to make informed decisions, leading them to Auto-Approve whatever the magic machine is asking if it should do.

ℹ️ # Side Note

I spent a substantial part of the 2010s building mobile and connected apps for consulting clients. I was approached many times by folks who had spent much time and money building cross-platform apps. They would almost inevitably hit a brick wall with their app, either in capability or performance. They would approach me, asking if I could help them over the hump with their app.

My response, after looking at their code, was that to achieve the performance, they should consider rewriting the code using native tools. It was a hard pill to swallow, and I felt bad giving them the bad news.

The cross-platform layer they had chosen consistently introduced latency and did much to hide underlying functionality, which added complexity in disguise. The time and money they spent building those tools up to 80% would always come back and bite them with the last 20%.

This is a common theme that we will revisit later.

Photo by Jordyn St. John on Unsplash

When it comes to vibe-coding, I see a similar pattern developing. People who have not built an end-to-end product will use these tools to get (if they’re lucky) to the 80% mark, then they will get stuck.

Companies that have adopted AI coding tools and replaced their experienced developers will go through the same pain point and have to bring in consultants who will likely advise them to rewrite from scratch.


Don’t get me wrong. I’m a big fan of the new Agentic features. For people who have been developers most of their life, it’s like having Igor, your own manservant do your bidding.

Igor: TV Tropes

But there are also real consequences when an Agent is allowed to roam unfettered, leading to the need for AI Agent insurance policies to cover damage caused by said Assistants.

This has led to zany assertions like:

Zenity

No, data loss is a real issue:

Tom's Hardware

At least the agent admitted its error:

‘This was a catastrophic failure on my part,’ admits Replit’s AI agent.

Those planning on building or deploying Agents would be wise to review Microsoft’s Taxonomy of Failure Modes in AI Agents.

This is new, unproven technology, which should be far more rigorously tested before being allowed to handle real-life scenarios.

What should be done?

Most companies deploying back-end services have multiple concurrent services staged. These are often divided into:

  • Proto: Tire-kicking an experimenting
  • Dev: For development
  • Test/QA: Running automated tests to make sure new changes pass all tests
  • Prod: Production environment

These need to be rethought in light of the upcoming tsunami of AI-oriented features that will be baked into core operating systems, like Linux, Windows, and MacOS.

ℹ️ # Side Note

My suggestion would be to also create a separate Agent stage, where copies of the Prod code are made and agents are allowed to make changes to their heart’s content, without affecting the work being done in the other stages.

Events

This section has gone too long, but I will be remiss if I don’t mention a significant omission from the tool’s architecture. The MCP/A2A agentic model is predicated on the Request/Response model. A whole other universe of Events is waiting to be explored.

Instead of you requesting something from an LLM and getting a response, indicate your interest and have the LLM call you when there's something you should know.

This is not new. Pub/Sub, please meet LLM.

JMAG 08

In subsequent sections, we will discuss ways to prepare for this future, and maybe head things off at the pass.



Title Photo by Greyson Joralemon on Unsplash