So, Anthropic decided to rain on Google Gemini 3's parade today when they released Claude Opus 4.5, their strongest model to date. Now, I already know what you're thinking: "All right, it's a Claude Opus model. I know it's going to be powerful, but it's going to be prohibitively expensive and slow, so I'm never going to be able to use it as my daily driver." But that is where you might be wrong. In this update, not only is Claude Opus 4.5 more powerful than its predecessor, it is three times cheaper and uses fewer tokens than Sonnet 4.5. Put that all together and it means we can finally use Anthropic's most powerful model, a model which exceeds Gemini's performance in many cases, on any task we want without worrying about breaking the bank.

So, in today's video, I'm not only going to go into detail about these Opus 4.5 upgrades, we're also going to put it to the test and see how well it can create AI agents and automations from simple prompts inside of n8n. It has been a wild week in AI, so let's just keep the ball rolling.

Like I alluded to in the opener, there are three big changes with Opus 4.5: its performance, its price, and its efficiency. Those are the three things we're going to talk about before we go into the live tests and see how it creates n8n agents and automations.

First, performance. The first benchmark they talk about is SWE-bench Verified. This is a test where they give the large language model a GitHub repo that has some issue in it, and they see whether the LLM can solve it. They tested Opus 4.5 against all the major players; the only one you don't see here is Grok. Opus 4.5 got 80.9% accuracy, and that was number one. The runner-up was Codex Max at 77.9%, and of note, Gemini 3 Pro scored 76.2%. We also notice a big jump from Opus 4.1 all the way to Opus 4.5. So here are a few more benchmarks.
Out of these nine tests that they show us (and of note, Gemini 3 Pro showed us 20 tests, so it'll be interesting to see a larger set of benchmarks later), Opus 4.5 comes first in six out of the nine. And of those six, you'll notice they are all heavily coding-related. Claude has always had the reputation of being the coder's model of choice, so no surprise to see 4.5 leading the pack. The only three benchmarks where 4.5 didn't win were graduate-level reasoning, visual reasoning, and multilingual Q&A.

So, here's a look at the pricing. Opus 4.5 input is $5 per million tokens and output is $25 per million tokens. Again, this is literally three times cheaper than Opus 4.1, which was $15 input and $75 output. That was prohibitively expensive; you could not use Opus 4.1 for everything. But now, and I'm about to show you in a second with the efficiency, you can use this over Sonnet 4.5.

And this efficiency jump is what I think is the most important development with the release of Opus 4.5. You'll see right here: when set to a medium effort level, Opus 4.5 matches Sonnet 4.5's best score on SWE-bench Verified but uses 76% fewer output tokens. 76% fewer output tokens. And remember the pricing: $5 per million input tokens versus Sonnet's $3. With that token reduction, it's literally cheaper in terms of executing a task, if that actually holds, and that's at the same performance level. At its highest effort level, Opus 4.5 exceeds Sonnet 4.5's performance by 4.3 percentage points while still using 48% fewer tokens. This is huge. The biggest issue with Opus in the past was that it was this lumbering giant that just sucked tokens and dollars out of your bank account. That's no longer the case. Now, when it comes to setting this effort level, there is a link inside the Anthropic article, which I will also link below in the description, that explains how you can manipulate the effort level.
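To see why fewer output tokens can outweigh a higher per-token price, here's a quick back-of-the-envelope calculation. The per-million prices are the published ones discussed above; the token counts for the example task are made up purely for illustration.

```python
# Hypothetical per-task cost comparison. Token counts are invented for
# illustration; only the per-million-token prices come from the video.

def task_cost(input_toks, output_toks, in_price, out_price):
    """Dollar cost of one task, with prices per million tokens."""
    return input_toks / 1e6 * in_price + output_toks / 1e6 * out_price

# Suppose a task makes Sonnet 4.5 ($3 in / $15 out) emit 100k output tokens.
sonnet = task_cost(20_000, 100_000, 3, 15)
# Opus 4.5 ($5 in / $25 out) at medium effort: same score, 76% fewer output tokens.
opus = task_cost(20_000, int(100_000 * (1 - 0.76)), 5, 25)

print(f"Sonnet 4.5: ${sonnet:.2f} | Opus 4.5 (medium): ${opus:.2f}")
# → Sonnet 4.5: $1.56 | Opus 4.5 (medium): $0.70
```

Even at a higher sticker price, the 76% output reduction makes the Opus run cheaper for this hypothetical task, which is the "literally cheaper" point above.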
Of note, setting effort to high produces exactly the same behavior as omitting the effort parameter entirely. So by default, if you're inside of Claude Code or you're using Claude Opus 4.5 via the API, it's already going to be at high. Just understand you're already getting max performance if you do nothing. If you want to change it to a lower effort level and really cut down your token usage, the documentation walks you through step by step how to do that.

So now you understand what's going on with Opus 4.5 when it comes to performance, price, and efficiency. Let's actually put it to the test and see how well it can create n8n AI agents and automations from simple prompts.

Now, I'm inside of Claude Code on the Claude Max plan. You're going to need to update if you haven't already, because you'll see right away that Opus 4.5 is now the default model. If you don't see Opus 4.5 here, and you don't see it when you run /model, just know you need to update.

So what are we actually going to have Opus 4.5 build? We want it to build this, or its version of this. This is an n8n AI automation that we had Sonnet 4.5 build in the past, and before that we also tested Opus 4.1. It's an automation that lets you find jobs on LinkedIn. You, the user, want to find a job. You fill out a form that says, "Hey, I want this job, this title, this salary, this location, whatever." Then the automation finds those jobs and brings them back to you. You decide, "Out of those 10 jobs the AI brought me, I want jobs one through seven." It then takes those seven jobs you chose, goes out and searches the internet for the hiring managers, and does some research on them and their company.
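For the API side, here's a minimal sketch of what wiring up an effort level might look like. The exact parameter name and accepted values are assumptions on my part, as is the model ID string; check the effort documentation Anthropic links in the article before relying on any of it.

```python
# Build a Messages API request body with an optional effort level.
# NOTE: the "effort" field name and its values are assumptions here;
# confirm the real request shape against Anthropic's effort docs.

def build_request(prompt, effort=None):
    body = {
        "model": "claude-opus-4-5",   # model ID string is an assumption too
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Omitting the parameter behaves the same as setting it to high,
    # so only attach it when you want to dial effort down.
    if effort is not None:
        body["effort"] = effort
    return body

default_body = build_request("Fix the failing test in this repo")
medium_body = build_request("Fix the failing test in this repo", effort="medium")
print("effort" in default_body, medium_body["effort"])  # → False medium
```

The design point is the one from the article: doing nothing already gets you max performance, and the parameter only exists to trade some performance for tokens.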
And then it's going to draft custom letters based on your resume, your information, and the information on the hiring managers, and send those out. The idea being: we're going to do our best to bypass the great wall of résumés everyone is trying to get through and instead go straight to the hiring manager. That's the idea. A lot of moving parts, but this is a great test of how well this model can handle these sorts of things.

We're using Claude Code in combination with the n8n MCP server to do all this. Now, in this video I'm not going to go step by step through setting up the n8n MCP server inside of Claude Code. I've done that before, and I'm going to link the video above. If you want to follow along, check out that video first; it takes you step by step through setting this up on your own machine.

What I am going to do is give it this prompt. I'm telling it, "Hey, I want you to create an n8n workflow with the n8n MCP server," big picture, and then I go into detail about what I just explained: search for jobs, find hiring managers, and so on. We have it in plan mode. We hit go, and we see what it comes back with.

As always, it comes up with its plan for breaking this down step by step. And right away, I love that it starts doing a web search for these Apify job scrapers. If you don't know what Apify is, it's a marketplace for web scrapers. You want web scrapers for LinkedIn, for Apollo (well, not Apollo anymore), for anything, Apify is the place to get them. So it's already searching the internet to see which ones are real; it's not just going to hallucinate these things.

All right. So, during the planning phase, it's come up with four sets of questions that it's going to ask me before it even creates this. Now, in my prompt, I didn't even tell it to do this, right?
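For reference, connecting an MCP server to Claude Code usually comes down to a small JSON entry in your project's `.mcp.json`. This is only a sketch: the `n8n-mcp` package name and the environment variable names are assumptions, so follow the linked setup video and the server's own README for the real values.

```json
{
  "mcpServers": {
    "n8n": {
      "command": "npx",
      "args": ["-y", "n8n-mcp"],
      "env": {
        "N8N_API_URL": "http://localhost:5678",
        "N8N_API_KEY": "<your n8n API key>"
      }
    }
  }
}
```

Once the entry is in place, the MCP server's workflow tools show up inside Claude Code and the model can create and edit n8n workflows through them.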
I didn't even say, "Hey, let's go back and forth." It just knew intuitively that there are a number of different pathways we could go down and that it needs some user feedback. So, it's asking me how the user should provide their job search criteria; we'll say simple keywords. How should the user select which jobs they're interested in after scraping? We could do a form, we could do Sheets, we could do an AI auto-filter; we'll say a form. Then it wants to know how we should identify the hiring managers; we'll do it by department head search. Then it asks, "Hey, do you already have an Apify account and an API key?" Yes, I do. Otherwise, it would give me guidance on how to set one up, which is super nice. Then there was another series of questions about the email side: how should the personalized outreach message be generated? AI-generated. What information should be included in the outreach email, a job-focused pitch or a networking approach? Let's do both. And how should the data be stored? We'll just do Gmail drafts.

Now, right away, the depth of the questions it asked me is already a step above what I had gotten with Sonnet 4.5 when I ran this test in another video. I really like that, because I think one of the things that trips people up the most, especially if they're a quote-unquote vibe coder with no real technical background, is that you don't know what you don't know. If AI can help shine a light in those areas and ask you the questions you should have thought of but probably haven't if you're new to the space, that's a huge positive. And that's something AI systems really didn't do in the past unless you had a very strong prompt that said, "Hey, I want you to go back and forth with me. I want you to identify potential pitfalls."
Because a lot of the time, it was very much just the strength of your prompt, right? If it sucked, the output would suck, and there's nothing the AI would really do to help you along. Opus 4.5, on the other hand, really seems to do a good job with that.

And here is the comprehensive plan Opus 4.5 has come up with. We see it's broken down into multiple phases. It shows us the exact actor it wants to use. It even breaks down what the JSON is going to look like, the polling pattern, the selected job categories. This is very, very detailed; as I scroll down, this is all part of its pitch. Overall, this is going to be 22 nodes, and it breaks down the name and purpose of each one. Six months ago, if you had an AI system try to do this as a generalized thing, without overfitting to your exact prompt, it was impossible. Yet Opus 4.5 has pretty much nailed it. So we're going to have it create this, and then we're going to see what it actually looks like inside of n8n.

Again, this is what Sonnet 4.5 created for the same exact prompt, and here's Opus 4.5's version. Right away, what do we notice? More nodes. A little more complicated, right? It's one workflow instead of two. But more complicated, more nodes, isn't necessarily a good thing or a bad thing. What I do like are the little sticky notes it added. A, it put them in the correct places, and B, I like having something that breaks down what it was thinking. So, let's go through the automation it gave us, untangle the logic, and at the end I'll give you my final thoughts on how Opus 4.5 does in these scenarios, because I actually ran it through a number of other tests.

So, phases one and two: the job search input. The user enters their criteria, and then it does the actual LinkedIn search.
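Since the plan calls out a polling pattern (scrapers like Apify actors run asynchronously, so the workflow kicks off a run, waits, and checks back until it's done), here's a minimal sketch of that loop in Python. The status strings and timings are assumptions for illustration, not the scraper platform's exact API.

```python
import time

def wait_for_run(get_status, poll_seconds=10, timeout=600):
    """Poll an async scraper run until it finishes.

    get_status is any callable returning the run's current status;
    the status strings here are illustrative, not Apify's exact ones.
    """
    waited = 0
    while waited <= timeout:
        status = get_status()
        if status == "SUCCEEDED":
            return True
        if status in ("FAILED", "ABORTED", "TIMED_OUT"):
            raise RuntimeError(f"scraper run ended with status {status}")
        time.sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("scraper run did not finish before the timeout")

# Simulate a run that needs two polls before it completes.
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_run(lambda: next(statuses), poll_seconds=0))  # → True
```

In the generated workflow, the same idea is spread across the "wait for job scraper" and "get job results" nodes instead of living in one function.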
So, if I take a look at the job search form, we see right away that it actually did a good job of filling all these things out. And if I look at the actual scraper it used, it's this one inside of Apify. It's a real scraper; it didn't hallucinate any of these things. Now, Opus 4.5 does love these Set nodes. You notice them all over the place when it creates these workflows, which isn't necessarily a bad thing, right? It's doing its best to keep the data organized. It's just a funny idiosyncrasy I noticed. So, we scrape the jobs, we wait for the job scraper, and we get the job results. I wish this were just one "get data set" node, but hey, it still does the job.

Then it moves on to phases three and four, where it processes and selects the jobs. It splits the results up, uses another Set node to clean the data, and removes duplicates, which is nice. Then it runs everything through some code before giving us an actual form to fill out saying, "Hey, I want jobs six, seven, and eight out of 10." We then parse the job selections before moving on to phases five and six.

Now, this is the part where we've chosen the jobs we like. It presented us with some; we liked a few. Now it's going to go find the hiring managers. So what does it do? It checks whether a company URL exists, scrapes company employees filtered by HR titles, scores the contacts, and then selects the best hiring contact. Actually, a lot is happening here. It's got a loop set up where it processes each selected job. Inside the loop, either it has a company URL or it doesn't. If it does, it uses this employee scraper. So, let's see if this Apify actor actually exists. And yes, it does. However, it's rated 1.8 stars, which gives me a little pause, but you can see why it went for it: "ideal for HR and recruitment," and we're kind of in that funnel right now.
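That "score the contacts, select the best one" step is worth a closer look, because it's the kind of logic the workflow tucks inside a Code node. Here's a rough Python sketch of how title-keyword scoring might work; the keywords and weights are my own guesses, not what Opus 4.5 actually generated.

```python
# Hypothetical contact scoring: rank scraped employees by how likely
# their job title is to belong to a hiring decision-maker.
TITLE_WEIGHTS = {
    "hiring manager": 5,
    "head of people": 4,
    "recruiter": 3,
    "talent": 3,
    "hr": 2,
}

def score_contact(contact):
    title = contact.get("title", "").lower()
    return sum(w for kw, w in TITLE_WEIGHTS.items() if kw in title)

def best_contact(contacts):
    """Return the highest-scoring contact, or None if nobody matches."""
    scored = [(score_contact(c), c) for c in contacts]
    scored = [pair for pair in scored if pair[0] > 0]
    return max(scored, key=lambda pair: pair[0])[1] if scored else None

people = [
    {"name": "Ada", "title": "Software Engineer"},
    {"name": "Bob", "title": "Senior Technical Recruiter"},
    {"name": "Cy", "title": "Head of People"},
]
print(best_contact(people)["name"])  # → Cy
```

Returning None when nobody matches mirrors the workflow's "no company URL / no contact found" branch, where there's simply nobody worth emailing.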
Now, going back through this again: it loves the Set node. It cleans up that data, eventually gets the results, parses them, merges the data, and then writes the custom outreach message. This is the part I'm interested in. Why? Because what it should be doing is sending a bunch of variable data to the model, mapping fields that change with each job, so the email itself is customized. Let's see if it did that. We still need to choose a model here, and here is the prompt, just the basic instructions it's working from. Okay, that makes sense; no data in there, though. But if I look at the user prompt: here we go. "Write a personalized outreach message for this job opportunity." And it's mapping the title, the company, the location, the description, the recipient's name, and the background. That's perfect. That's exactly the type of information I would want passed to the AI for it to create these custom emails.

From there, it parses the response and creates a Gmail draft. It doesn't actually send anything off; we just want a draft so we can see what's going on. And not only does it create those drafts, it also creates a summary email of everything it's done across these jobs. That's what you see here: aggregating the results, building the summary, and sending a summary email. So it's keeping you in the loop, essentially using Gmail as a database to keep you aware of what's been going on.

Now, what do we think about this? As we walked through it, it seems logically sound, right? In reality, if I run through all these nodes, will it work? Will it not? If it works, awesome.
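To make the "map variable data into the prompt" point concrete, here's roughly what that user-prompt construction looks like if you pull it out of the workflow. The field names are illustrative stand-ins; they aren't the exact expressions the generated n8n node used.

```python
def outreach_prompt(job, candidate_background):
    """Assemble the per-job user prompt for the email-writing model.
    Field names are hypothetical stand-ins for the workflow's mappings."""
    return (
        "Write a personalized outreach message for this job opportunity.\n"
        f"Job title: {job['title']}\n"
        f"Company: {job['company']}\n"
        f"Location: {job['location']}\n"
        f"Job description: {job['description']}\n"
        f"Recipient: {job['contact_name']}\n"
        f"Candidate background: {candidate_background}"
    )

prompt = outreach_prompt(
    {
        "title": "Backend Engineer",
        "company": "Acme Corp",
        "location": "Remote",
        "description": "Build APIs in Go.",
        "contact_name": "Jordan Lee",
    },
    "8 years of Go and distributed systems experience.",
)
print("Acme Corp" in prompt and "Jordan Lee" in prompt)  # → True
```

The system prompt stays static while every job-specific field flows in through the user prompt, which is why each draft comes out customized instead of templated.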
But frankly, as someone who does this for a living, I'm less concerned about one-shotting these things and more concerned about the AI being able to build a logically coherent automation that gets me 80 to 90% of the way there. It is highly unlikely, even if it one-shots this, that it's the exact automation I want. What I want my AI system to do is exactly this: get me 80 to 90% of the way there, and by doing so in a logically coherent manner, get my mind turning about what I should do to get this across the finish line. What didn't I think about? What works? What doesn't? That's what we're really going for.

We don't always need it to one-shot these things, because whether it can one-shot something has a lot more to do with how good your prompt was and how detailed you were about the output you want. If I get extremely detailed, take it to the extreme, telling it exactly which node I want for every single thing, then yeah, it's going to work. But what I want is to give it a generalized prompt, like I did, and have it come out with something like this. And frankly, I would argue this is a step above what we got from Sonnet 4.5, which is huge, especially when you realize the price is essentially the same or less, based on the token efficiency Opus 4.5 now brings us.

So, that is where I will leave you. I think Opus 4.5 is awesome. It did an amazing job creating the skeleton of a pretty complicated AI automation. But what gets me most excited is the change from Opus 4.1 to 4.5, specifically the efficiency and the cost reduction. This is great, especially from a company like Anthropic. They're not Google; they don't have a giant ad empire that funds everything.
Anthropic just has Claude, and yet they have still been able to get these prices down, which bodes well for everyone in the cutting-edge AI space who's looking at these large language models and asking, "All right, is it always going to be this pricey? Is it going to come down? Is it eventually going to go up?" So, this is great news. This is great news for everybody.

So, definitely check out Opus 4.5, and definitely check out my previous videos showing you how to connect Claude Code and all these things. And lastly, make sure to check out my school; there are tons of free resources there. I also have Chase AI Plus if you're looking for more advanced workflows or trying to figure out how to start your own AI agency and actually make money in this space. And as always, let me know what you thought of this video.