No‑BS AI Briefing

AI Agents Scale: Google, Meta, PwC, Microsoft & Builders

Vikash


This episode of No-BS AI Briefing dives into the acceleration of agentic AI. Host Vikash Sharma unpacks major developments, including Google's upcoming Gemini Spark, Meta's privacy-first Incognito Chat, and PwC's massive enterprise deployment of Anthropic's Claude agents across 30,000+ professionals. We also cover how Microsoft's specialized MDASH system outperformed GPT-5.5 on a cybersecurity benchmark, highlighting the power of multi-agent architectures. For builders, this episode provides crucial insights into practical applications, privacy considerations, and the strategic implications of scaling AI agents in regulated environments. Discover how these advancements are shaping the future of product development. Plus, get a concrete takeaway to experiment with building your own multi-agent system this week. Don't miss out – follow No-BS AI Briefing in your podcast app for concise, actionable AI news.


Vikash Sharma

AI agents are stepping out of labs and into the real world, with Google prepping an always-on assistant, Microsoft's multi-agent system outperforming advanced models, and PwC deploying thousands of Claude agents across its workforce. That's the kind of practical AI progress that really matters for builders like us.

No-BS AI Briefing, brought to you by Proactive AI.

Welcome back. I'm your host, Vikash Sharma, and this is where builders get straightforward AI news without the fluff.

First up, Google is preparing to launch a new always-on AI agent called Gemini Spark, and it's designed to transform your daily productivity. I mean, think about that for a second. This isn't just another chatbot. It's an assistant that lives in the Gemini app and proactively manages tasks like triaging your inbox or automating online workflows. It's expected ahead of Google I/O 2026, so it's a big push. For builders, this signals a massive shift towards an agentic AI paradigm. We're likely to see new APIs and SDKs emerging from Google, creating an ecosystem for these autonomous assistants that act on users' behalf across Gmail, Calendar, Drive, and more. This isn't just a Google thing either. It's going to put competitive pressure on every other platform to ship its own proactive agent capabilities. So if you're building anything that touches user workflows, you really need to be paying attention here.

Next, Meta just launched Incognito Chat, bringing privacy-first AI to WhatsApp and the Meta AI app. Now, this is a direct response to growing user privacy concerns and regulatory pressure, and honestly, it's a smart move. Incognito Chat uses something called Private Processing within secure, isolated environments, and Meta explicitly states that it cannot read or save your data. This is huge. Why it matters for builders: privacy by design isn't just a nice-to-have anymore. It's rapidly becoming table stakes, especially for AI applications dealing with sensitive user information.
On-device or secure-enclave processing, where the AI work happens locally or in a highly protected bubble, is quickly becoming a major differentiator. If you're building an AI product, you absolutely need to consider how to implement comparable privacy guarantees in your own applications. It's not just about compliance, it's about trust, and trust is hard to earn and easy to lose.

Also, PwC and Anthropic have expanded their alliance, deploying Claude agents across more than 30,000 professionals globally. This isn't a pilot program or a small-scale trial. We're talking about a massive enterprise deployment. PwC is integrating Anthropic's Claude Code and Claude Cowork into core operations like deal making, cybersecurity, and finance, and they're reporting delivery improvements of up to 70%. Wild, right? For builders, this is a landmark validation of enterprise agent adoption at serious scale. It offers a clear blueprint for how to design, train, and deploy AI agents effectively in highly regulated environments. This case study demonstrates tangible commercial viability and a clear return on investment for enterprise agent platforms. It means that if you are pitching AI solutions to large companies, you now have a real-world example to point to for how agents can transform operations and drive efficiency.

And then we've got Microsoft's MDASH system, which just outperformed GPT-5.5 on a key cybersecurity benchmark. MDASH isn't a single monolithic model, it's a system of over 100 specialized AI agents. It scored 88.45% on the CyberGym benchmark, beating Anthropic's Mythos, which got 83.1%, and OpenAI's GPT-5.5, which came in at 81.8%. What MDASH does is autonomously identify and verify software vulnerabilities. So what does this mean for you, the builder? It's a powerful signal that specialized multi-agent approaches can genuinely outperform single general-purpose models on complex, high-stakes tasks.
It suggests an architectural shift, moving away from relying solely on a single large language model towards building swarm or team architectures for domains like security, compliance, or even complex creative tasks. This isn't just about bigger models, it's about smarter, distributed systems.

Finally, Anthropic and the Gates Foundation have launched a four-year, $200 million partnership focused on AI for public good. The goal is to deploy AI for global health, education, and economic mobility. This isn't just a research grant. It includes grant funding, Claude usage credits, and technical support. From a builder's perspective, this is important because you can expect new datasets, new benchmarks, and specialized tools to emerge from this initiative. It validates AI's potential for large-scale social impact, which helps attract further philanthropic funding and talent. For builders specifically working in health or education tech, this could open doors to accessing Claude credits and technical support through the program, giving you a serious leg up on projects that align with the foundation's goals.

Now let's deep dive into what I think is the most important story of the batch: PwC's enterprise agentic AI deployment. What happened here is truly significant. PwC and Anthropic have expanded their alliance to integrate Anthropic's Claude Code and Claude Cowork into PwC's operations, impacting over 30,000 professionals globally. They're not just experimenting, they're building agentic operating models for critical areas like deal making, cybersecurity, and finance. And the claim: they're reporting delivery improvements of up to 70%. I mean, think about that for a second. 70% improvements. That's a staggering number in any enterprise context. Why this matters right now is all about validation and scaling. We've heard a lot about AI agents in theory, seen them in demos, maybe even played with some basic versions.
But this move by PwC takes agentic AI from proof of concept to large-scale production. It's the first major case study that demonstrates agents moving from isolated pilots into core business operations across a massive organization. For many enterprises that have been hesitant, this offers a tangible blueprint for how you can not just adopt but also achieve significant ROI from these agent systems. It's changing the conversation from "can AI agents work?" to "how do we implement them effectively?"

So who should really care about this? Well, if you're a founder, this is market validation for enterprise agent platforms. You can now confidently pitch your agentic solutions as productivity multipliers with proven ROI. For product managers, this means re-evaluating workflows: how can agents automate steps, reduce friction, and free up human talent for higher-value tasks? It's about designing products that aren't just AI-powered but AI-orchestrated. If you're an engineering leader or an infrastructure engineer, you're thinking about the architecture required to support such a deployment. How do you manage, secure, and scale thousands of agents? And even for indie hackers, this shows that if a large enterprise can do it, you can find a smaller niche problem in a similar domain and tackle it with agents, proving immediate value.

How I'd think about it as a builder: I'd see this as a clear signal to move beyond simple chatbots and start thinking about agent-centric product design. Instead of just exposing an LLM, how do you design a team of agents that can collaboratively solve a larger problem? Imagine an agent that drafts a contract, another that cross-references legal databases, and a third that flags compliance risks, all working together. The value accrues not just in the underlying model but in the intelligent orchestration of these agents, their workflow integration, and the specific change management needed to get humans to adopt them.
Don't just build an AI tool, build an AI colleague. The risks, of course, include overpromising on those "up to 70%" improvements without understanding the nuances of implementation and the need for rigorous testing in regulated environments. This isn't just plug and play.

My no-BS take on this is simple. This isn't just hype. This is a real, tangible shift in how enterprise software will be built and deployed. The commercial viability for enterprise agent platforms is now clearer than ever. Don't get caught up in the generic "AI will change everything" narrative. Focus on the concrete workflows where an agent or a team of agents can deliver measurable, repeatable improvements. That's where the real opportunity is for builders.

If you're finding this useful, hit follow in your podcast app right now. It takes two seconds and it's the best way to make sure you don't miss the next briefing.

If you want one practical takeaway from today's episode, here it is. Experiment with building a multi-agent system for a complex task in your own team. We just saw Microsoft's MDASH system beat out advanced general-purpose models by using a swarm of specialized agents. This is a crucial insight. Here's how to try it in under 60 minutes.

First, pick a complex, repetitive task within your team or product that a single model currently struggles with or that requires multiple manual steps. Think beyond just summarization. Maybe it's triaging a complex support ticket, vetting user-generated content, or even an internal code review process.

Second, break that task down into three to five distinct sub-roles. Imagine you're assigning these to different human specialists. So for code review, you might have one agent for syntax, another for security vulnerabilities, and a third for best practices, each with its own specific prompt and context.

Third, orchestrate these agents with a simple coordinator.
This coordinator can be a small script or even a master agent that assigns subtasks, collects outputs, and synthesizes the final response. You don't need anything fancy. A basic sequential chain or a simple conditional flow will do.

Finally, run this multi-agent system on a few real-world examples and compare its performance, accuracy, and output quality against what a single general-purpose model would do, or even against your current manual process.

Why is this specific experiment worth your time right now? Because it forces you to think architecturally about AI beyond just calling a single API. It helps you understand how specialization and collaboration among AI entities can unlock performance that monolithic models can't achieve. You'll gain a tangible understanding of how to tackle genuinely complex problems with AI, moving you closer to building truly intelligent, robust products. This isn't just about tweaking prompts, it's about designing entire AI workflows, and that's a skill that's going to be invaluable moving forward.

That's it for today's No-BS AI Briefing. If this helped, follow the show in your podcast app and share it with one builder you know. And if you've got questions or topics you want covered, connect with me on LinkedIn and send them over. See you in the next briefing.
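
For anyone who wants to try the code review experiment from the takeaway, here is one rough way the steps could look in Python. It is a sketch, not something described in the episode: the `Agent` class, the `coordinate` and `recall` helpers, and especially `fake_model` are hypothetical names made up for illustration. `fake_model` is a toy rule-based stand-in so the script runs without an API key; in a real test you would replace it with a call to whichever model provider you use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical signature for "a model": (role prompt, input text) -> findings.
ModelFn = Callable[[str, str], List[str]]


def fake_model(role_prompt: str, code: str) -> List[str]:
    """Toy rule-based stand-in for an LLM call (an assumption, not a real API)."""
    findings: List[str] = []
    if "syntax" in role_prompt and code.count("(") != code.count(")"):
        findings.append("unbalanced parentheses")
    if "security" in role_prompt and "eval(" in code:
        findings.append("use of eval() on untrusted input")
    if "best practices" in role_prompt and "except:" in code:
        findings.append("bare except clause")
    return findings


@dataclass
class Agent:
    """One specialist sub-role with its own prompt."""
    name: str
    role_prompt: str

    def review(self, code: str, model: ModelFn) -> List[str]:
        return model(self.role_prompt, code)


def coordinate(agents: List[Agent], code: str, model: ModelFn) -> Dict[str, List[str]]:
    """Sequential coordinator: assign the task, collect outputs, synthesize."""
    report = {agent.name: agent.review(code, model) for agent in agents}
    report["summary"] = [
        f"{name}: {issue}" for name, issues in report.items() for issue in issues
    ]
    return report


def recall(report: Dict[str, List[str]], expected: Set[str]) -> float:
    """Share of hand-labeled issues that the agents actually found."""
    found = {
        issue
        for name, issues in report.items()
        if name != "summary"
        for issue in issues
    }
    return len(found & expected) / len(expected) if expected else 1.0


# Three sub-roles, mirroring the syntax / security / best-practices split.
AGENTS = [
    Agent("syntax", "You review code for syntax errors."),
    Agent("security", "You review code for security vulnerabilities."),
    Agent("style", "You review code against best practices."),
]

# A small sample input with two deliberate problems.
SAMPLE = "try:\n    result = eval(user_input)\nexcept:\n    pass\n"

if __name__ == "__main__":
    report = coordinate(AGENTS, SAMPLE, fake_model)
    for line in report["summary"]:
        print(line)
```

To run the comparison step, send the same handful of real examples through a single generic prompt as well, label the issues you expect by hand, and score both setups with something like `recall`. With the toy `fake_model` the scores mean nothing; they only become informative once a real model is plugged in.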