VIDEO: Web MCP: Google + Microsoft Protocol Making Websites Native to AI Agents (Chrome Canary, Gemini, Claude Code) URL: https://www.youtube.com/watch?v=uprZUcv0FSc DATE_INGESTED: 2026-02-23 AI agents have started to integrate with every part of our lives now, and one of the biggest areas that's happened in is the browser. Every major AI company has realized that the browser is the one tool everyone uses every single day. So why not put AI into it? But the truth is they all suck, and it's not a matter of optimization. There's a fundamental problem that no amount of tuning is going to fix. But Google, in collaboration with Microsoft, just released something called WebMCP. Instead of trying to make agents better at using websites, it makes websites better at talking to agents. That's a completely different approach, and what it enables is something we haven't seen before. So here's a simple HTML page running on a local server. Opening the extensions tab, we have the WebMCP extension. Opening it, below the name of this site we have one tool: book table. We connected this WebMCP bridge to Claude Code and told it that we had a restaurant booking form open with WebMCP tools available. We gave it the task of booking a table for two with a date, a name, and a special request. All of those fields are there in the form. It confirmed the date, used the WebMCP tool that the site provided, filled out the fields, and successfully made the reservation. Right now, an agent has two ways to figure out what's on screen. The first is vision-based: the agent takes a screenshot of the entire page, annotates every element it can see, and feeds that image to a model that tries to figure out what to click. The second is DOM parsing: the agent pulls the raw HTML of the page, and if you've ever opened inspect element on any website, you know what that looks like: thousands of lines of code. The agent reads through all of that and tries to identify the right button.
Both of these approaches have the same fundamental problem: they're nondeterministic. The agent is making its best guess every single time. The reason none of this works consistently is that the entire internet was built for human eyes. Every website assumes a person is looking at it. There's no structure for machines. So every agent, no matter how good the model is, is stuck trying to interpret something that was never designed to be interpreted by a machine. With WebMCP, instead of the agent trying to figure out your website, your website registers its available actions as tools. When an agent lands on a page, it doesn't guess. It just reads the available tools and calls them directly. Right now, WebMCP is available for early preview only. As the agentic web evolves, websites also need to evolve with it. And as you already saw, by defining those tools, we give agents better access to interact with our sites. The demo worked because it was a simple HTML form, but most real websites aren't that simple. So WebMCP actually has two different approaches depending on what you're working with. The declarative API is for simple workflows like the HTML forms you just saw. The imperative API is for full-scale web apps with multiple pages, and those require some extra implementation that we'll get into further on. As of right now there's no official documentation, but there's a repository of WebMCP tools under Google Chrome Labs with two demos, only one of which is actually hosted. There's a simple flight search demo and an official model context tool inspector extension. After you install that, you'll be able to detect the tools on whatever websites have WebMCP implemented, and you'll be able to do some other cool stuff as well. The input schema for each tool shows up right there. Right now, there's only one tool on this page: the search flights tool.
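To make the idea concrete, here is a minimal sketch of what registering a tool like the flight-search demo's might look like. The WebMCP API surface is still unreleased, so the registration point (`modelContext`), the `registerTool` signature, and the tool name are all assumptions for illustration; the sketch falls back to a plain `Map` so it runs outside Chrome Canary.

```javascript
// Tools registered by the page itself, instead of inferred by the agent.
const registry = new Map();

// Stand-in for the browser-provided registration point (name assumed);
// falls back to a local registry when no such API exists.
const modelContext = (typeof navigator !== "undefined" && navigator.modelContext) || {
  registerTool(tool) { registry.set(tool.name, tool); },
};

modelContext.registerTool({
  name: "search_flights",
  description: "Search for flights between two airports on a given date.",
  inputSchema: {
    type: "object",
    properties: {
      origin: { type: "string", description: "IATA code, e.g. SFO" },
      destination: { type: "string", description: "IATA code, e.g. JFK" },
      date: { type: "string", description: "Departure date, YYYY-MM-DD" },
    },
    required: ["origin", "destination", "date"],
  },
  // The callback would trigger the same code path a human click does.
  async execute({ origin, destination, date }) {
    return `Searching flights ${origin} -> ${destination} on ${date}`;
  },
});
```

The key point carried over from the video: the agent never parses the DOM or a screenshot; it reads the tool's name, description, and input schema and calls `execute` directly.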
They've given two options to use this. You can either give custom input arguments that the AI model has to fill out, or you can set your Gemini API key, give a user prompt in plain English, and the page will be controlled according to that. So right now it has these default inputs. We swapped them out, and it actually searched for flights and got a bunch of results. I went back, and this time the WebMCP travel site had four tools available, three of which are filters that can be applied. The input arguments for the page had also changed. I added another argument, and it gave us a notification that the filter settings were updated. No flights matched those filter settings, but all of them were applied. We switched between Zen browser and Chrome throughout this, because while they've released WebMCP as an open protocol that any browser could use, right now it only works on Chrome's Canary version. That's until they release the standard so that everyone can use it. So that's as far as the official tooling goes right now: no documentation, only two demos, it only works on Chrome Canary, and you can't use it with Claude Code directly, because it's actually intended to be used by browser agents. So we found a community-built WebMCP bridge that you can install on your system, and it gives you an MCP server and an extension as well. This is what allows Claude Code to use WebMCP and navigate and use the tools that any website offers, which lets us show how sites actually implement this. We'll start with the simpler approach, the declarative API, which you saw with the HTML form. All you really have to do is declare three things inside the HTML form: the tool name, the tool description, and the tool parameter descriptions. You don't need to dive deep into them. You just need to make sure your agent adds them in. We had two guides made, reverse-engineered from the demos in the WebMCP repo, and we gave Claude Code access to those.
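A sketch of what those three declarations could look like on the restaurant form from the opening demo. This is illustrative only: the spec is unpublished, so `toolname` and `tooldescription` are placeholder attribute names, not confirmed syntax; only the idea (name, description, and per-field descriptions live directly in the markup, and the browser reads them) comes from the video.

```html
<!-- Placeholder attribute names; the real spec may differ. -->
<form toolname="book_table"
      tooldescription="Book a restaurant table for a given date.">
  <input name="date" type="date"
         tooldescription="Reservation date">
  <input name="name" type="text"
         tooldescription="Name for the reservation">
  <textarea name="special_request"
            tooldescription="Any special request, e.g. a window seat"></textarea>
  <button type="submit">Book</button>
</form>
```

With the declarative approach, that is the whole implementation: no JavaScript is needed, because the browser itself turns the annotated form into a callable tool.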
Now, during that process we actually ran into some common problems and had to fix these guides along the way. Both of them are available in AIABS Pro, which is our community where you get ready-to-use templates you can plug directly into your projects, for this video and all previous ones. The main teaching is all here in the video, but if you want the actual files, the link is in the description. If your agent adds in these declarations, the rest is up to the browser, which reads them from the HTML. The second way is the imperative API, for cases where you need more complex interactions and JavaScript execution. We had a Next.js app initialized, gave Claude Code the Next.js guide, and that was all it needed to implement it. In React apps, it creates a new file in the library folder where it declares all the tools the site needs: these are all the functions, and these are their definitions. But since these web apps can become so big, potentially with more than a hundred tools, we get the same problem we get in Claude Code, where the context overloads everything and breaks the whole thing. So instead of loading all the tools a website has, it's better to load only the tools a single page has. This concept is called contextual loading. So this is the Next.js app we had Claude build. It's a fully functional small demo app with the backend implemented. Right now we're on the main homepage, and this site only has three tools available. I went into the cart page, and this time we had four tools, and the names had also changed. The availability of tools changes based on the page you're on. This is where the registration functions come in. Whenever you land on a page like the homepage, it runs the register home tools function, and when you leave, it runs unregister home tools, based on which tools belong to that page. It just registers and then unregisters them. This is why, in this case, it doesn't depend on the browser alone; the code also handles the integration.
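The register/unregister pattern above can be sketched as follows. The registry, the tool names, and the page functions are all illustrative (the video doesn't show the demo app's actual tool names); in a real app the `Map` would be the browser's WebMCP registration point, and in React the enter/leave calls would live in a `useEffect` mount/cleanup pair.

```javascript
// Contextual loading: each page registers only its own tools on entry
// and removes them on exit, so the agent never sees the full tool list.
const activeTools = new Map();

function registerTools(tools) {
  for (const t of tools) activeTools.set(t.name, t);
}
function unregisterTools(tools) {
  for (const t of tools) activeTools.delete(t.name);
}

// Hypothetical tool sets for a small shop app (3 on home, 4 on cart,
// mirroring the counts shown in the video).
const homeTools = [
  { name: "search_products", description: "Search the product catalog." },
  { name: "view_deals", description: "List current deals." },
  { name: "open_cart", description: "Navigate to the shopping cart." },
];
const cartTools = [
  { name: "update_quantity", description: "Change the quantity of a cart item." },
  { name: "remove_item", description: "Remove an item from the cart." },
  { name: "apply_coupon", description: "Apply a coupon code." },
  { name: "start_checkout", description: "Begin the checkout flow." },
];

// In React these pairs would be a useEffect's body and its cleanup.
function enterHomePage() { registerTools(homeTools); }
function leaveHomePage() { unregisterTools(homeTools); }
function enterCartPage() { registerTools(cartTools); }
```

The payoff is the same one Claude Code gets from scoped context: an agent on the cart page sees four relevant tools instead of every action the whole site supports.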
We're not actually using WebMCP with a browser agent, which is what Google wants and what each browser would implement itself. We're actually using a bridge that connects Claude Code to WebMCP, and this is how we control websites. And by the way, if you want to get more out of Claude Code itself, we actually have a video on the ten most up-to-date ways to gain an advantage with it. This bridge is a community project, and with the imperative API it has a problem: tool switching doesn't really work with this MCP server. When I opened the site, we were on the checkout page and initialized the Claude Code session there. When we asked it to navigate back to the homepage, it couldn't see the tools available on the homepage. Then, while we were on the homepage, I went into the product page and we got an add-to-cart button, but when it was on the product page, it couldn't really see that button. So we had to manually add an item to the cart to demo this. But when we asked it to complete the checkout, it automatically filled in the details, placed the order, and completed the whole shopping flow. So that's one limitation of this MCP, which brings us to another point. WebMCP is open source, with major browser vendors and tech companies listed as participants. But right now the only browser that supports it is Chrome Canary, and the intended agent is Gemini, Google's own AI built directly into the browser. If you're a website owner and you implement WebMCP today, the only agent that can use your tools natively is Gemini. Claude Code needs a community-built bridge that breaks when contextual loading kicks in. Every non-Google agent is at a disadvantage. Now, could Claude catch up? Sure, they have their own browser extension, and since that's also a browser agent, it could potentially discover these tools the same way Gemini does.
But the question is, how many people are going to deliberately install a Claude browser extension versus just using the Gemini that's already built into Chrome? Chrome has billions of users; they don't need to install anything. In our opinion, Google isn't locking anyone out. They're just taking advantage of the architecture and the audience they already have: an open standard that works best inside the browser they already own, with the agent they already ship. That doesn't mean you shouldn't implement it. The standard itself is genuinely useful, and making your site agent-accessible is smart regardless of which agent benefits first. There are a few things worth knowing if you implement this. The spec recommends no more than 50 tools per page. This isn't meant to expose your entire application; it's meant for focused, specific actions, the things someone would actually want to do on that page. Tool descriptions also matter more than you'd think. Agents read those descriptions to decide which tool to call. Vague descriptions mean the agent picks the wrong tool or skips it entirely. Write them like you're explaining the action to someone who's never seen your site. And this is still experimental. The API surface will change. Chrome 146 ships in March with broader support, but until then, this is a dev trial. Don't ship it to production yet. If you follow this channel, you know that keeping up with AI requires a strong technical foundation. That's why I love Brilliant. It's an interactive platform with hands-on lessons crafted by world-class teachers from MIT, Harvard, and Stanford. I highly recommend their Clustering and Classification and How AI Works courses. They teach you to uncover hidden patterns and understand the logic behind large language models interactively. As you can see in the catalog on screen, they offer a massive variety of courses covering everything from foundational math to advanced data science and computer science.
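The advice about description quality can be shown side by side. Both tool objects below are invented for illustration (neither appears in the video); the point is only the contrast between a description an agent can match against a user request and one it cannot.

```javascript
// Vague: an agent deciding between tools has almost nothing to match on.
const vague = {
  name: "submit",
  description: "Submits the form.",
};

// Specific: states what the action does, what it needs, and in what format,
// written as if explaining the action to someone who has never seen the site.
const specific = {
  name: "book_table",
  description:
    "Book a restaurant table. Requires a reservation date (YYYY-MM-DD), " +
    "the guest's name, and party size; optionally accepts a special " +
    "request such as a seating preference.",
};
```

Given a prompt like "book us a table for two on Friday," an agent can map it to `book_table` from the description alone, while `submit` forces it back to guessing.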
Brilliant is also giving our viewers 20% off an annual premium subscription, providing unlimited daily access to everything on the platform. To learn for free on Brilliant for a full 30 days, go to brilliant.org/iliabs. Scan the QR code on screen or click the link in the description. Build a real learning habit today and take your skills to the next level by heading over to Brilliant. That brings us to the end of this video. If you'd like to support the channel and help us keep making videos like this, you can do so by using the super thanks button below. As always, thank you for watching and I'll see you in the next one.