Summary:
This week on the Sidecar Sync, Amith Nagarajan and Mallory Mejias explore Wells Fargo’s virtual assistant “Fargo” and how it stacks up against Klarna’s AI tool from a year ago. With 250 million fully automated interactions and measurable impact on customer engagement and bias reduction, Fargo offers a powerful case study in applied AI. Amith reflects on what’s now possible for associations, why a narrow pilot project is a smart first move, and how “human in the loop” isn’t just a safety net—it’s strategic. The duo also breaks down Microsoft’s new Phi-4 reasoning models, which pack PhD-level performance into incredibly compact packages that can run on your phone. If you're wondering where the AI trend line is heading, this one’s for you.
Timestamps:
00:00 - Introduction
03:47 - Meet Fargo: Wells Fargo’s AI Assistant
05:59 - Comparing Fargo with Klarna’s Assistant
08:57 - The State of AI Agents in Associations
13:05 - Event Support: A Smart Use Case for AI
15:00 - Human-in-the-Loop: Not Optional, But Essential
23:44 - Private AI: Local vs. Cloud Deployment
26:46 - Microsoft’s Phi-4 Models: Small and Mighty
32:50 - Why Small Models are a Big Deal
43:54 - AI Trendlines and the Future for Associations
🎉 More from Today’s Sponsors:
CDS Global https://www.cds-global.com/
VideoRequest https://videorequest.io/
🔎 Check out Sidecar's AI Learning Hub and get your Association AI Professional (AAiP) certification:
📕 Download ‘Ascend 2nd Edition: Unlocking the Power of AI for Associations’ for FREE
📅 Find out more about digitalNow 2025 and register now:
https://digitalnow.sidecar.ai/
🛠 AI Tools and Resources Mentioned in This Episode:
Fargo ➡ https://sites.wf.com/fargo/
Klarna AI Assistant ➡ https://www.klarna.com
Microsoft Phi-4 Reasoning Models ➡ https://huggingface.co/microsoft
https://www.linkedin.com/company/sidecar-global
https://twitter.com/sidecarglobal
https://www.youtube.com/@SidecarSync
⚙️ Other Resources from Sidecar:
- Sidecar Blog
- Sidecar Community
- digitalNow Conference
- Upcoming Webinars and Events
- Association AI Mastermind Group
More about Your Hosts:
Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.
📣 Follow Amith on LinkedIn:
https://linkedin.com/in/amithnagarajan
Mallory Mejias is passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.
📣 Follow Mallory on Linkedin:
https://linkedin.com/in/mallorymejias
Read the Transcript
🤖 Please note this transcript was generated using (you guessed it) AI, so please excuse any errors 🤖
[00:00:00] Amith: The most important thing for all of you associations to note is that you have options. You have ways of doing secure private AI inference. There's a number of ways to do this, and you can even do it locally on device. Welcome to Sidecar Sync, your Weekly Dose of Innovation. If you're looking for the latest news, insights, and developments in the association world, especially those driven by artificial intelligence, you're in the right place.
[00:00:26] We cut through the noise to bring you the most relevant updates with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions. I'm Amith Nagarajan, chairman of Blue Cypress, and I'm your host. Greetings and welcome to the Sidecar Sync, your source for content at the intersection of all things artificial intelligence and the world of associations.
[00:00:52] My name is Amith Nagarajan.
[00:00:55] Mallory: and my name is Mallory Mejias.
[00:00:57] Amith: And we are your hosts. And as always, we've prepared an awesome episode for you guys with some really interesting topics at the forefront of AI. And we're gonna talk all about how they apply to you in the world of associations. So excited to get into that.
[00:01:12] But first of all, Mallory, how you doing today?
[00:01:15] Mallory: I'm doing pretty well myself, Amith. It's a nice chilly day in Atlanta, so I'm enjoying that. Been getting outside a lot recently since the weather's been mostly warm. And yeah, I've had some, some fun auditions come through on the acting front, so it's, it's been a good, productive, uh, weekend for me.
[00:01:32] What about you?
[00:01:34] Amith: Fantastic. Well, you know, I joke around a lot of times when I'm in New Orleans, which is home base for me, that that's the center of the universe for associations. Of course, it really isn't. Uh, but at the moment I actually am in the center of the universe, at least the center of the association universe here in the United States, in Washington DC, and, uh, I'm here for a bunch of meetings and also to visit family.
[00:01:53] So I, I always love coming to town here for a few days and, uh, it's been great so far and the weather is, uh, much cooler than New Orleans as well, so.
[00:02:03] Mallory: I know you appreciate that. Are you meeting with any associations while you're there?
[00:02:08] Amith: Yeah, I had a breakfast chat with, uh, somebody and, uh, have a few more meetings lined up, meeting with some of our team members across our company.
[00:02:15] So it's always a productive time in DC. It's pretty much nonstop from early morning till late in the evening when I get into town.
[00:02:22] Mallory: Yeah, I was, I was saying to Amith before we started recording, I didn't know how he was possibly gonna squeeze in this podcast with his schedule today of all these meetings.
[00:02:31] But, uh, you showed up, Amith. I'm really, I'm really happy we're here.
[00:02:34] Amith: Well, episode 81, we gotta keep the streak going and, uh, this is so much fun to record and, uh, I'm, I'm always interested in making time for it. My audio quality may not be as good as normal, unfortunately for this episode, so apologies in advance if that is the case and that is your experience.
[00:02:50] But, uh, I'll be back to my normal recording setup, uh, shortly, but, uh, for now I am on the road.
[00:02:57] Mallory: We take the Sidecar Sync all over. I don't know if we've ever done it internationally yet. I'm trying to think on my end. I don't think I've ever recorded in another country. What about you, Amith?
[00:03:08] Amith: I don't believe so, but that sounds like a challenge.
[00:03:10] I think I need to book a flight somewhere, so
[00:03:13] Mallory: Yeah. Hey, hey, let's do it. We'll do like maybe a, a Mexico version of the Sidecar Sync. That'd be fun. Well, today, as Amith mentioned, we have some exciting topics lined up for you. We're gonna first be exploring this Wells Fargo AI assistant, and then doing a little bit of a reflection on an episode we did, it was actually episode 21, where we talked about Klarna's AI assistant, just to do a little compare and contrast.
[00:03:36] And then we will be talking about the latest from Microsoft, the Phi-4 family of models, with some great naming conventions, as we always chat about on the Sidecar Sync podcast. So first and foremost, the Wells Fargo AI assistant is called Fargo. It's an advanced virtual assistant integrated into the Wells Fargo mobile app that helps customers with a wide variety of banking tasks through both voice and text interactions, from checking balances to processing payments and handling refunds, and also providing personalized financial guidance.
[00:04:08] Fargo serves as a 24 7 banking assistant for Wells Fargo customers. The assistant uses a model agnostic architecture, and it employs different specialized LLMs for various tasks. So that's that multi-agent framework that we talk about often on this podcast. It has a privacy first design. So no personally identifiable information is exposed to external language models and sensitive data is processed locally before any cloud interaction.
[00:04:37] They're seeing some impressive results so far. So there have been 245.4 million interactions with the assistant in 2024, which is actually double what they projected. And these are interactions entirely without human intervention. So. Around 250 million interactions without human intervention. They're seeing deep engagement with their AI assistant, so 2.7 interactions per session on average, and across the board with their AI initiatives, they're seeing a three to 10 x increase in customer engagement.
[00:05:10] And something that's also interesting to note: we've talked about bias that's built into AI models because of the material they're trained on. We've also talked about bias with humans, right? Because when we're making decisions, we're pulling on all of our previous experience as well. Something that's been interesting with their AI initiatives at Wells Fargo is they're seeing
[00:05:30] some bias reduction in certain areas. So the AI has led to fairer lending decisions when it comes to loans, which I think is quite interesting to note. Uh, behind the scenes, Pega is the company, particularly their Customer Decision Hub, behind all these AI initiatives at Wells Fargo, and it helps them analyze billions of interactions to determine the next best conversation for each customer, making Fargo's responses highly personalized and relevant across channels.
[00:05:59] And as I mentioned, it was episode 21, and right now we're recording episode 81, so a long time ago, 60 weeks ago, we talked about the Klarna AI assistant. Uh, and I wanted to do a little bit of a reflection. There aren't a ton of stark differences, but there are a few. So despite serving fewer customers, Wells Fargo has about 70 million customers,
[00:06:21] Klarna has 150 million, so substantially different. Wells Fargo handled that 250 million interactions that I mentioned in the whole year of 2024, compared to Klarna's two million-ish in its first month. They haven't published their full number for 2024, but even comparatively, 2 million in one month, if it continued on that trend or even considerably increased each month, 250 million interactions at Wells Fargo is
[00:06:47] pretty impressive for their 70 million customer base. Klarna also publicly stated that its AI assistant was doing the work of about 700 full-time customer service agents, handling two thirds of all customer service chats. Wells Fargo has not published, uh, a specific equivalent number of agents replaced, but given Fargo's scale,
[00:07:09] far exceeding Klarna in both total interactions and per customer engagement, I would say it's reasonable to infer that Fargo automates work that would require potentially thousands of agents. And something also worth noting is the feature evolution in both. So the Klarna assistant has expanded from customer service to shopping recommendations,
[00:07:29] a personalized shopping feed, multilingual support, and, uh, ChatGPT integration for shopping advice. The Wells Fargo assistant has added AI-driven spending insights, actionable financial tips, improved money movement, and financial insight summaries. So going beyond the basic customer service routine interaction and really providing further value to the consumer, which I think is quite interesting.
[00:07:52] So, Amith, you've been talking about virtual assistants, AI agents, really from the beginning, from the beginning of this podcast for sure. How have you seen that conversation evolve, particularly over the last year?
[00:08:06] Amith: You know, it's interesting you mentioned, uh, episode 21 versus 81, so it's exactly 60 episodes there, roughly 60 weeks ago when we talked about Klarna.
[00:08:14] I think back then, both of us were struck by what Klarna had achieved. I mean, it was GPT-4, if I recall correctly, but, um, you know, it was, compared to what we have now, a very rudimentary model, and what they achieved was, was pretty remarkable. And so now, it's good to have that perspective, because in 60 weeks we've had roughly a little bit over two AI doublings, uh, in power.
[00:08:37] So, um, a lot of fascinating things to unpack here. Um, the feature set increase is definitely something I think makes sense, because if you can engage people in a way that they find pleasing, that they find useful, they'll come back more. And if you have more functionality to offer, then you can go deeper.
[00:08:57] So, you know, Wells Fargo is able to, for example, provide spending insights, uh, directly in their, uh, their platform. That could be really useful for a lot of people, especially if you have a, a credit card and a bank account with Wells Fargo or maybe some other things. Um, that broader set of insights that you could get from your bank could be pretty powerful and
[00:09:17] help you make, uh, better decisions with respect to investing, even. And, uh, those are things that, uh, third-party apps have been doing for a while, products like Mint or a number of others, uh, Rocket Money's another one, that have some AI features. But this is an opportunity for a platform like a bank to bring some of that
[00:09:37] engagement back to the bank as the core platform for most people's, you know, primary financial interactions. So I think that's interesting. Um, in my experience, the association community has been moving a little bit slower than I'd like in terms of member service agents overall. Uh, people have been doing bits and pieces, you know. Uh, we have products in our own family of companies,
[00:10:00] uh, one of which is obviously Betty, which has about a hundred associations, uh, working with it now and, and growing quickly. And, and Betty is definitely in this realm, has been the knowledge agent, the expert agent in terms of all things association knowledge. We've mentioned previously on this podcast, we're launching, uh, something specifically for member service that deals with
[00:10:20] routing incoming asynchronous messages like emails and, and, uh, SMS and so forth. Uh, but, you know, I'd say that we're still in super, super early innings. So if you're an association that's thinking about this, saying, hey, we'd love to have something like the Wells Fargo assistant or like the Klarna assistant, um, you've still got plenty of time ahead of you.
[00:10:40] Uh, but I wouldn't, you know, spend all year thinking about it. I'd run an experiment. Um, to me, what's so powerful about this particular use case is it's, it's both sides of the value equation. The one side is cost reduction or efficiency improvements, but the other side is improving the value to the customer, um, which is the biggest thing. When you see people using a service more and more, that should
[00:11:02] light up a, you know, a light bulb for you that says, hey, there's something here. When you see, for example, engagement in a web-based search tool compared to a knowledge agent, where the knowledge agent has, like, literally 50x longer session times than a search tool, that should tell you something about the value you're creating.
[00:11:19] It's not that it takes 50 times longer to get the information. The knowledge agent is actually much, much faster at getting people the information they want. It's rather that people found value, a low time to value for the customer, so they come back more. And if they come back more, there's more opportunities to engage, more opportunities to create value, and you have a, a reinforcing cycle.
[00:11:41] So I find it to be, uh, a really, really exciting area for associations to jump into. But as I said, I think it's still super early.
[00:11:49] Mallory: Okay. I was gonna say, I'm sure we have some listeners thinking, well, great, Wells Fargo and Klarna did it with their 70 million and 150 million customers respectively. That's feasible for them.
[00:11:59] You said it's still early stages for associations. Can you contextualize what you mean by that? So what would you say is currently feasible right now for a pilot project with a member service agent?
[00:12:11] Amith: I think you could stand up, um, a member service agent over the next three to six months in your association
[00:12:17] a number of different ways. Uh, there are a number of tools you could use for this, using either off-the-shelf tools and just stringing them together with, you know, different kinds of agent frameworks. Uh, you could certainly partner with companies, uh, that specialize in this, either in the association market, like our companies, or outside the association market,
[00:12:40] focused on kind of large enterprise, like the one that you mentioned. Uh, there's also a company called Decagon, and Sierra is another one, that do customer service agents kind of at the very high end of the market. Um, you know, and, and people in the association market I think are gonna have association-specific solutions more and more.
[00:12:55] Obviously what we're focused on is that, but you're gonna see more and more choice there. Uh, so I think there's, there's off-the-shelf stuff.
[00:13:05] I think this is a great opportunity for a, uh, experimentation round where you could do something really, really small. Um, don't try to boil the ocean and solve all customer service or member service inquiries, but focus on a pain point. For example, um, many associations have a highly seasonal, uh, volume of activity that comes in around their annual conference. So,
[00:13:27] prior to the annual conference, uh, they might have a fairly reasonable inflow of inquiries, but right before and during and after the conference, they might have, let's say, a 30 or 60 day window of time on the calendar where it's just completely crazy. Well, what if we could put in place a great member slash event service, uh, AI that could help field
[00:13:49] 70% of those questions that are fairly repetitive? That's a super achievable thing. And within the narrower context of events, uh, the domain of questions is usually far narrower. So I think that's an easy thing to go experiment with. Um, overall what I'd say is, uh, to me, the thing that you have to remember is, yes, you're an association.
[00:14:09] You're not Wells Fargo. Yes, you're an association. You're not Amazon. But the technologies have come down so much in cost, are so much more accessible, and are so much more powerful, that not only can you do this as an association, but you're going to be expected to. Um, your members don't care that you're not Wells Fargo or Amazon or Netflix or Klarna.
[00:14:28] They just expect the same quality of experience from you that they expect from their largest consumer experiences. And it may not be fair, but fairness doesn't really matter in the eye of the consumer. Uh, the expectation, the bar, has been set at this level, and they're going to expect it soon enough from you.
[00:14:44] So might as well get ahead of that and provide them something slightly before they might expect it from the association.
[00:14:50] Mallory: And then you can provide that additional value, those insights, things that really, really help your members in their profession or industry to further create that, that value based relationship.
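To make the kind of narrow pilot Amith describes concrete, here is a minimal sketch of an event-support agent that answers only from a small, curated FAQ and escalates everything else to a person. It assumes the OpenAI Python SDK and an API key in the environment; the model name, prompt, FAQ entries, and function are illustrative placeholders, not any vendor's actual implementation.

```python
# Minimal sketch: answer event questions only from a curated FAQ,
# and hand anything else (or anything uncertain) to a human.
from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EVENT_FAQ = """
Q: When does early-bird registration close?  A: August 15.
Q: Can I bring a guest to the welcome reception?  A: Yes, guest passes are $75.
Q: Where do I check in on site?  A: Registration desk, Hall B, opens at 7:30 a.m.
"""

SYSTEM_PROMPT = (
    "You are an event-support assistant for an association's annual conference. "
    "Answer ONLY using the FAQ below. If the answer is not in the FAQ, or the "
    "member sounds frustrated, reply with exactly the word ESCALATE.\n" + EVENT_FAQ
)

def answer_or_escalate(member_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model works
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": member_message},
        ],
    )
    reply = response.choices[0].message.content.strip()
    if reply == "ESCALATE":
        return "Routing you to a member services rep..."  # human in the loop
    return reply

print(answer_or_escalate("What time does check-in open on site?"))
```

The key design choice is the narrow system prompt: the agent is told exactly what it is allowed to answer, and everything else defaults to a human.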
[00:15:00] Amith: I'd say this is also a great time to reinforce a concept we've talked about on the pod, Mallory, a number of times, which is how do you prioritize your energy? Uh, your energy might be classified as human labor, like your team's time, your volunteers' time, uh, also your dollars. That's part of the energy flow, right?
[00:15:19] Where do you invest? Uh, a lot of people are saying, well, our infrastructure is so terrible, we've got ancient systems, you know, we've got a really old AMS and we've gotta replace that thing, and, uh, or, or an old LMS, you know. And, and while those, those older systems are clearly things that you need to look at over time, um, if they worked last year, there's a decent chance they'll still work
[00:15:41] this year and next year. And the question is, instead of replacing a major system like that, which is, you know, significant effort, sometimes takes 18 to 24 months to fully go through a process like that, sometimes longer, uh, what if you didn't do that, right? And you said, hey, we're going to deprioritize some of those classical association IT things and instead invest
[00:16:02] some time, time being the most important ingredient, to experiment with this, right? Go figure out how to make a member services agent work for you as your priority for, let's say, the next six months, rather than an AMS selection or implementation. The amount of value you create for members from this technology is so much higher.
[00:16:23] It's dramatically different than what an internal system replacement might yield. Uh, so again, I'm not suggesting that you work with an unstable, shaky foundation with ancient technology forever. But if you have to choose between something like that, which frankly nobody's gonna really notice on the external side, and this, uh, I'd focus on this. And, uh, maybe you don't have to choose between the two in your group and your association, but most people do have to choose between those kinds of priorities.
[00:16:50] So I thought it'd be a good time to remind people that you can attack these things if you're willing to say no to stuff. You just have to draw a line in the sand and say, you know what? We're gonna put a pause on all of these old, classical types of systems and projects. We're gonna keep them running, obviously, but we're not gonna invest big dollars and big energy in these older technologies.
[00:17:08] Instead, we're gonna focus on making these new AI things work. Last thing I'll say about that is once you do these kinds of new projects, you'll actually reframe what you think you need. When it comes time to replace some of that infrastructure, you might think you know what you want in that next-generation AMS, but frankly, you probably don't.
[00:17:25] When you build an AI technology or two and deploy it into production, you'll get so much better of a sense of where your members want you to go, and that might change the requirements for what that new AMS is going to do. And the other thing related to that is the way vendors are evolving their AMS or
[00:17:46] some other type of solution. Um, everybody in these types of database applications is working really hard right now to figure out how to AI-enable their systems. So if you give it a bit, you'll have better choice and you'll have more visibility.
[00:18:03] Mallory: Hearing you talk about that. Uh, we, we've just hit the one year mark of moving to Atlanta and it made me think of our experience last year of moving into an apartment we had never seen and trying to furnish it before we were there and realizing sometimes you just need to be there physically in the space before you realize, ah, okay, we need this size couch, we need a TV right here.
[00:18:24] It makes me think of you, or of associations specifically, trying to replace their AMS and then perhaps getting to that point and thinking, oh gosh, now with AI we realize we need all these other features and all this other infrastructure. So I think it's a really valid point, Amith. And I wanna talk a little bit about this pilot project that you mentioned.
[00:18:42] I can definitely resonate, um, having been the primary point person at Sidecar who would take in a lot of inquiries approaching digitalNow, the conference, before the event and after the event. Um, however, I would think if you came to me and said, we're gonna, you know, roll out this AI agent and there's going to be potentially no human intervention, right?
[00:19:03] We're just gonna roll it out. I would be intimidated by that, and, to be honest, scared that it wouldn't work. And I'm sure our association listeners feel the same way. What is your thought on the pilot project of, of trying to roll out an agent that has no human intervention versus trying to roll out an agent that does the routing, like you kind of briefly mentioned earlier? Uh, is no human intervention the goal?
[00:19:26] Talk me through that.
[00:19:28] Amith: I don't think that's the goal at all in, in almost all cases. I don't think that's the case either for Klarna or Wells Fargo, as I understand their models. It's more about, uh, making available instant and high-quality responses for most things. Mm-hmm. But at the same time, being able to interact with a human agent when, when appropriate. Uh, and
[00:19:45] this might sound like we're, we're, uh, trying to find a silver lining in terms of the employment side of the equation here, um, in saying that the humans can focus on higher value activities. That's oftentimes consultant-speak for saying they're gonna be laid off. Um, in reality, uh, there's some of that that might happen in the association market, probably not so much.
[00:20:04] In the broader market, you know, if you have 10,000 people in a call center, maybe you don't need 10,000, maybe you need 2,000, maybe you need your best 2,000 people. So there's some issues there for sure when you think about that across an entire sector. But for the association world, I think of it this way: you know, your member services folks, your event services folks,
[00:20:23] they have a lot more to offer than just answering rote inquiries, like people asking, hey, when do I need to register? Where can I check in? Can I bring my spouse to this particular function? What's the guest registration fee? Um, you know, where can I find this particular article? All these kinds of basic help desk questions, and AI can nail all those things, and the people who are asking those questions are gonna be happier with a better answer
[00:20:46] that's nearly instant. But those member services reps, those event folks, can have conversations with people, can learn more from those members, can take time to actually have live synchronous phone calls and video calls, to really be the concierge, to help provide an experience so that it feels like you're checking into the Four Seasons when they come to your event, rather than checking into the Red Roof.
[00:21:11] So, you know, the whole idea is, is that you want to level up the caliber of service and the quality of service that you provide. And you can do that. You can, you know, punch way above your weight class by using AI to take care of the rote stuff. Coming back to your point, um, that's where this concept in agentic systems called human-in-the-loop is so critical.
[00:21:30] That's for key decision making, but it's also for escalation, where the AI should be trained, and can easily be trained, to be smart enough to not try to take on everything, right? Where you can tell the AI, hey, for these three or four different kinds of inquiries, we can answer it in these different ways.
[00:21:46] These are the ways, these are the tools that are available. Uh, you might have a knowledge agent, you might have capabilities around database lookups. There might be two or three different things that the agent is really good at, but we can tell the agent to err on the side of escalating, uh, if it can't determine the
[00:22:07] purpose of the call or the inquiry. Uh, let's just say that the AI detects a tone of frustration. Uh, let's say that there's two or three iterations of emails and the AI detects that the person's just not particularly happy. You know, AI is really, really good at reading the emotion, uh, from just plain text.
[00:22:26] Uh, and that's even more true with audio. If, if the AI hears this audio and is able to detect, Mallory's not super happy with me right now, then the AI can forward this message to somebody else, um, you know, a human in that case, right, to, to help Mallory out.
[00:22:45] Mallory: Mm-hmm. Yep. As you said, AI's pretty good at detecting sentiment.
[00:22:50] It's not something you'd think it would be good at, but with word choice, and especially if it has more information through audio or video, it, it does a pretty decent job at it.
[00:22:59] Amith: You can also tell, like, if someone's coming back two or three times and they feel like they're asking the same thing repetitively, or they're even using a simple phrase like "as I said," right? When I say "as I said," I, I feel like I'm repeating myself, and, and I find myself doing that with customer service reps,
[00:23:16] um, in that, you know, kind of ongoing infinite loop of emailing people, uh, who don't have a great idea of what I'm after or don't address my issue in some way. So I think there's a lot of opportunity here, but yes, to your point, Mallory, you make a really important one. I wouldn't try to, like, hand this whole thing over to a robot, but rather
[00:23:39] use it to level up what the humans do.
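Here is a hedged sketch of the escalation side of human-in-the-loop: hand the thread to a person when frustration cues (like "as I said") or too many back-and-forth rounds show up. The cue list and thresholds below are illustrative assumptions, not a production policy; a real system would more likely use a sentiment model than keyword matching.

```python
# Minimal sketch of an escalation gate for a member service thread.
from dataclasses import dataclass, field

FRUSTRATION_CUES = ("as i said", "again", "still waiting", "third time", "frustrated")

@dataclass
class Thread:
    messages: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.messages.append(text)

    def should_escalate(self, max_ai_turns: int = 3) -> bool:
        latest = self.messages[-1].lower() if self.messages else ""
        frustrated = any(cue in latest for cue in FRUSTRATION_CUES)
        too_many_rounds = len(self.messages) > max_ai_turns
        return frustrated or too_many_rounds

thread = Thread()
thread.add("Hi, can you resend my registration receipt?")
print(thread.should_escalate())  # False: fine for the AI to keep handling
thread.add("As I said, I still haven't received it and I'm getting frustrated.")
print(thread.should_escalate())  # True: route to a human
```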
[00:23:45] Mallory: My last question here, Amith, is that the ability to process sensitive data locally before any cloud interaction is a major privacy advancement and is of course essential for things like banking or buy now, pay later, when you're dealing with people's payment information. Do you think that this is a necessity for associations?
[00:24:05] Amith: I, I think it's an important concept that associations should be aware of. A lot of people make an assumption. They'll say to me, for example, oh, I love the idea of, uh, whatever the application is, but they'll say, I have all this sensitive data, uh, like patient data or banking data or something like that. Or it might not be sensitive; it might be just, we have a large
[00:24:25] knowledge repository, we don't wanna send that to ChatGPT or to Claude, we just don't trust them. And that's a reasonable concern. But people make the assumption that that's a dead end, right? That that's the end of the conversation. Whereas there are ways of doing private deployment of your own models in the cloud, where you could say, hey, I'm gonna run Llama or a number of other models in a private cloud deployment. And on top of that,
[00:24:49] these models are shrinking. Their capabilities are growing and they're shrinking in size. You can actually run them locally on a phone, in a web browser, um, and in ways that also provide additional privacy. So, um, I don't know exactly what Wells Fargo's doing, but, uh, Apple's strategy around this sounds similar.
[00:25:06] What they'll do is, on the phone itself, the LLM that's running locally, a very, very small LLM, will try to get the essence of what you've asked, um, and then determine if it can answer the question locally, or if it will need to promote a portion of that information, abstracting out anything personal you may have shared and keeping just the general concept, to
[00:25:27] get higher order knowledge from a remote LLM that's also operating privately, so that
[00:25:38] the personal information never really left the local environment. Um, the most important thing for all of you associations to note is that you have options. You have ways of doing secure, private AI inference. There's a number of ways to do this, and you can even do it locally on device, and that's gonna continue to be the case.
[00:25:55] There's this, uh, this growing, uh, collective body of, of language models that you can run that are smaller and smaller, that run extremely efficiently on, you know, desktop computers and laptops, and even on phones.
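As one concrete way to do fully local, private inference, here is a minimal sketch using Ollama's Python client. It assumes Ollama is installed and a small open model (here "llama3.2") has already been pulled; the tool and model name are illustrative choices, and nothing in the exchange leaves the machine.

```python
# Minimal sketch of on-device inference: the prompt and the response
# stay on the local machine, no cloud API involved.
import ollama  # pip install ollama; assumes the Ollama daemon is running locally

question = (
    "Summarize this member request in one sentence: "
    "'I moved last month and need to update my mailing address before renewal.'"
)

reply = ollama.chat(
    model="llama3.2",  # any small local model that has been pulled will do
    messages=[{"role": "user", "content": question}],
)

print(reply["message"]["content"])  # generated entirely on this machine
```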
[00:26:09] Mallory: Well, you, you really set me up perfectly there, Amith, to go to topic two, which is Microsoft's Phi-4 models, small language models with big reasoning power.
[00:26:18] So Microsoft just released a new family of Phi-4 models, including these great names: Phi-4 Reasoning, Phi-4 Reasoning Plus, and Phi-4 Mini Reasoning. Those aren't too bad. I've seen worse, I would say, come out of OpenAI. Uh, the Phi-4 reasoning models are very much a part of the broader trend toward reasoning, or thinking, models, call them either one, that can perform advanced reasoning, analyze complex scenarios, apply structured logic, and solve problems in a way that resembles human thinking.
[00:26:49] So to break down that Phi-4 family, we've got Phi-4 Reasoning, which is a 14 billion parameter open-weight model, fine-tuned for complex reasoning, math, science, and coding tasks. It uses supervised fine-tuning with high-quality curated data, enabling it to generate detailed reasoning chains and match or surpass much larger models
[00:27:10] on benchmarks. Then we've got Phi-4 Reasoning Plus, which builds on that Phi-4 Reasoning model that I just mentioned, further trained with reinforcement learning and able to use 1.5x more tokens for even higher accuracy. It matches or exceeds the performance of much larger models like DeepSeek R1, which we've covered on the podcast, and which, as a note, has 671
[00:27:35] billion parameters compared to the 14 billion parameters of this model, uh, and OpenAI's o3-mini, on several key benchmarks. And then we've got Phi-4 Mini Reasoning, a compact 3.8 billion parameter model optimized for mathematical reasoning and educational use, suitable for deployment on resource-limited devices like mobile phones and edge hardware.
[00:27:57] So Amith was already kind of gearing up to mention a lot of the practical benefits of smaller models: they can run locally on PCs, mobile devices, and edge hardware. They're also designed for offline use on Copilot+ PCs. And of course there are lower computational requirements that make them more accessible and cost-effective.
[00:28:16] All three of these models are openly available under permissive licenses, and they can be accessed through Azure AI Foundry and Hugging Face. So Amith, what are your initial thoughts on the, uh, Phi-4 family of models?
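For anyone who wants to try one of these, here is a minimal sketch using the Hugging Face transformers pipeline. The repo id below is how Phi-4 Mini Reasoning appears to be listed on the Hub at the time of writing, so verify it before running; expect a multi-gigabyte download and, realistically, a GPU.

```python
# Minimal sketch: run the 3.8B Phi-4 Mini Reasoning model locally with transformers.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-reasoning",  # check the exact repo id on the Hub
    device_map="auto",                        # GPU if available (needs accelerate)
)

messages = [
    {
        "role": "user",
        "content": "A conference has 1,240 registrants and about 3% usually cancel. "
                   "Roughly how many attendees should catering plan for?",
    }
]

result = generator(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])  # the model's reasoned answer
```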
[00:28:30] Amith: Well, uh, first of all, let's, uh, let's spell this out for folks, because Phi-4 might be, uh, it might be, uh, pronounced or, or spelled differently than people expect.
[00:28:39] It's P-H-I, dash, the number four. Um, and that's, I think, part of what makes it so hard to pronounce: there was Phi-3, and then it almost sounds like you're saying five, five,
[00:28:48] Mallory: four,
[00:28:49] Amith: but, um, ah, yeah, you're right. So, yeah, that's what I thought when I first started hearing this.
[00:28:57] Um, my thoughts are, wow, this is really exciting. So what you said, and that's one of many really interesting points, is that, um, not across all benchmarks, but across several important benchmarks, uh, Phi-4 Reasoning Plus, which I'll talk about in more detail in a second, matches or exceeds DeepSeek R1, which, if you recall, uh, shook the world back in the January, February timeframe.
[00:29:21] People freaked out because it approached the
[00:29:25] most powerful, uh, AI reasoning models, the, the o1 and o3-mini, uh, capabilities. So here's the deal. Here's the way to think about this: this is a tiny model. 14 billion parameters in today's, uh, model sizes is really small, capable of being run probably on some phones, but definitely on a PC. And, um, one of the ways they make it perform well is by giving it more time
[00:29:51] when you ask the question. So reasoning slash thinking models, it sounds like some new category of model, this really cool, complex thing. Uh, in, in reality, actually, it's not all that different from the models we've had in the past. It's, it's essentially saying, hey, model, I want you to spend time to think through
[00:30:10] this problem, to be able to spend time just, like, going deeper and thinking through the problem, breaking it down step by step into small chunks, uh, and then compiling the results of each of those into an answer. Uh, another way to think about it is that the model is able to revise, uh, something that it thought about previously.
[00:30:29] When we interviewed Ian Andrews, he used an analogy that I've repeated a number of times, which is that it gives the model a backspace key, where the model can edit its response instead of simply answering as fast as it can. That's what reasoning models do. And the way to think about it for you is that, you know, you have access in a 14 billion parameter model to something that previously, literally two months ago, required a 671 billion parameter model, uh, which makes it possible to run all sorts of workloads on smaller and smaller devices.
[00:31:01] Um, so, and then the, the mini model is a much smaller version of the
[00:31:08] Phi-4 model, but it also is trained to use more, uh, compute resources when you ask questions, so it can reason through problems. And that model is suitable for running on edge hardware, which would include phones and, and other devices that have much smaller memory and, and computational ability. So I find all of this to be super exciting.
[00:31:28] It just reinforces the trend line of what we talk about. I've been saying this for a few years now, that I'm actually more excited about the compact, small, super efficient, lightweight models becoming smarter than I am about frontier models like Claude 3.7 Sonnet or Gemini 2.5 Pro. Those are awesome.
[00:31:45] The fact that these superpowered models that run only in the cloud are getting smarter is of course exciting. But the fact that these small, really efficient models can do so much more, uh, is just stunning. I mean, what you have in Phi-4 Reasoning is better than what you had six months ago in the very best models, and
[00:32:04] you now can run that on your own hardware.
[00:32:12] Mallory: I know you, you and I, we like to geek out about all the, the minute details of all these models because that's part of our job and I think we just enjoy learning about it. But you mentioned the trend lines, and I always think it's important with these model conversations to zoom out a little bit and, and look at the bigger picture.
[00:32:27] So what you just said is really profound, but what do you think this trend line means with the smaller, more powerful models specifically for associations?
[00:32:36] Amith: Well, I think, you know, going back to the last conversation, if there are certain types of data that you have in your organization that you're not comfortable sharing with any of the providers, Google or OpenAI or any of the others, you can take this, you can run it even on your own
[00:32:55] physical hardware if you wanted to, or you could run it in a virtual private cloud environment in one of the major cloud providers, where it's completely contained and as secure as anything else, any other computer program that you run. Most people have gotten pretty comfortable with secure private cloud deployment, where in a cloud like, you know, Google or AWS or Azure, you can set up resources that are a hundred percent secured and private,
[00:33:20] and by most measures, far more secure than computers you run physically, like on your own physical hardware, um, and run whatever programs you want, right? Uh, traditional computer programs. And an AI model is just a computer program. It works differently than a traditional computer program, but it is a computer program, and you can run it on hardware you have absolute control over.
[00:33:38] You have absolute control over. So if you have that ability, it opens up a class of applications that associations have often told me that they're uncomfortable with, which is things related to clinical data that they might have access to if they're a healthcare association, if they're a financial association, maybe there's benchmarking data that they receive from some of their members that they don't passing to open AI or anybody else for applications now.
[00:34:09] Incredible accuracy. So it opens up a.
[00:34:20] Subsume your content into their corpus of training data that they'll use for future models, which by the way, you have legal protection around, but let's just say you're still not comfortable. And that's, that's probably reasonable, right? To be a little bit skeptical. Uh, even if the legal agreement says that it can't be used in certain ways, um, you might say, okay, well I'd rather just be totally sure and I'm just gonna run this type of model on my, so it opens up a doors.
[00:34:44] The other thing.
[00:34:50] Smaller models run faster with less energy, less you know, resources and are cheaper to run. So if you have a little model like this that's as smart as what was previously requiring a a giant model and you can now run it with a really cost-effective small model, you can do more, right? You might have a hundred million documents that you know, go back to the beginning of your association's formation.
[00:35:13] And you might like to analyze them in all sorts of new ways that you previously would've thought to be totally, uh, unattainable because you might've said, well, we have this idea in mind where, you know, we have a million documents of every paper we've ever published and every opinion that's ever been written on every paper.
[00:35:28] We would like to ask certain questions of every one of those papers, right? Have a detailed analysis done of each of those papers in order to capture like some metadata or some structured insight from all those papers. And let's just say a year ago you thought about this idea, it was a cool idea, but then you're like, yeah.
[00:35:45] It costs like between two and three dollars to do that per paper, and we have a couple million pieces of content; that's just not gonna scale. But now, if you have a 97, 98% cost reduction, which is basically what you get here, now it might cost you a few thousand dollars, right? Or maybe $10,000. You might say, you know what, that's actually pretty reasonable.
[00:36:03] And if you wait six more months, it might be, you know, even less. So
[00:36:09] as well as the privacy, it opens up the door to just use way, way more of this inference that we keep talking about.
[00:36:16] Mallory: Mm-hmm. Amith, this is just not something I'm up to speed on, so I'll ask in case we have some listeners as well that have the same question. But when you talked about running models privately in your own cloud environment versus running them locally, is one as secure as the other, or is one more secure than the other?
[00:36:33] Amith: You know, there's, there's pros and cons to each approach. So let's say that I have the old school way of doing it, that in my office I have a computer server and I run that physical server. I am responsible for site security to make sure no one physically enters that location. I'm responsible for network security.
[00:36:51] I'm responsible for the whole thing. Traditionally, IT departments in associations did that. They would run, you know, they'd have server rooms where they'd have, you know, racks of these servers, and they'd run them, and they're responsible for all of that, you know, the site security and, and the digital security.
[00:37:06] And I would argue that, generally speaking, that is going to be less secure than a modern cloud provider that has, you know, incredibly rigorous, you know, like military-grade physical security around their sites, way, way more than any association's ever gonna have. Um, and from a digital security perspective, implementing your own, you know, approach to cybersecurity is really important for your own resources,
[00:37:29] but cloud service providers tend to have really, really good built-in security architectures that are a good starting point. So I'd generally say a well-implemented cloud environment beats running a secure local server yourself. I think most security experts would tend to agree with that, certainly for SMBs, like small to medium-sized businesses, which associations fit into. Um, there are exceptions to every statement,
[00:37:54] obviously. There are some organizations who would argue, you know, we have even stronger site security and digital security than any cloud provider, and certain information we have justifies this. And that may be true, but on the flip side,
[00:38:17] you can leave a wide open back door without thinking about it, you know, where you're just like, oh, I'm just gonna, you know, post a password to my website, you know, on Reddit and let anyone log in. I mean, that sounds totally stupid, but, uh, the reality is there's all sorts of human factors that go into compromising security all the time.
[00:38:33] Uh, and that can affect you either way. Um, local inference on a device that an end user uses, though, adds another layer. Let's say
[00:38:46] I'm talking to my banking assistant on my phone, and I have all this information about, like, my salary or my investment strategy. Maybe that information isn't really what that local AI needs help with. Maybe it needs help reasoning through some general ideas that can then guide what the local LLM does.
[00:39:05] So instead of sharing my salary and my net worth from the local conversation with, let's say, the remote AI, what it does is just, hey, I'm working with the consumer, um, they're working through these kinds of problems, uh, can you gimme some general guidance on A, B, and C? And then the remote LLM, which throws way more compute at it and comes up with a stronger answer, feeds it back to the local one and says, hey, here's the direction you should go with.
[00:39:29] And then the local LLM takes that private data, infuses it back into the answer from the remote LLM, and gives me an experience that's really, really high quality on my phone, right? Uh, and my personal data never left my phone. And the same thing can be done for healthcare. And that could be a complement that associations could take advantage of.
[00:39:48] Uh, let's say that you're a medical association and you wanna provide capabilities for, uh, your members to have chats with you where you
[00:40:01] don't want any of their healthcare data to ever come back to you, right? So what if you had a local LLM that did part of the processing and, just like I said, abstracted out the problem, removing patient-specific data, then went to a knowledge agent that has, you know, a tremendous amount of content and, and, you know, compute capability to formulate an answer, and then reinfused that back with the local data?
[00:40:22] There's, there's ways to do that as well. Um, and there's applications for associations for sure.
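Here is a minimal sketch of that split-inference pattern: redact the personal specifics locally, send only the generalized question to a remote model, then merge the guidance back with the private details on-device. The toy regex redactor and the OpenAI-compatible client are illustrative assumptions; a real system would use a local model or an NER step for redaction.

```python
# Minimal sketch: private details are stripped locally before anything goes
# to a remote model, then re-attached locally when presenting the answer.
import re
from openai import OpenAI  # assumes any OpenAI-compatible remote endpoint

client = OpenAI()

def redact(text: str) -> str:
    """Toy redactor: mask dollar amounts and simple First Last name patterns."""
    text = re.sub(r"\$[\d,]+", "<AMOUNT>", text)
    return re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", "<NAME>", text)

private_question = "Maria Lopez earns $98,000. How much should she save monthly?"
general_question = redact(private_question)  # "<NAME> earns <AMOUNT>. How much..."

guidance = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder remote model
    messages=[{
        "role": "user",
        "content": f"Give general budgeting guidance for this situation: {general_question}",
    }],
).choices[0].message.content

# The salary figure never left the device; it is re-attached only locally.
print(f"For a salary of $98,000: {guidance}")
```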
[00:40:28] Mallory: Yep. I've mentioned my husband's in healthcare and he is just waiting for the day that he has exactly what you mentioned, where he could drop in some patient info, uh, and get that resolved with, you know, better accuracy perhaps than he could have found doing some searches online.
[00:40:41] Amith: I was just gonna say one other thing, Mallory, that I think our listeners might find interesting. You know, uh, for those of you that have heard me talk about the Acquired podcast before, or heard Mallory mention it,
[00:40:56] um, there's a new episode they just dropped, uh, about Epic, the big software company
[00:41:07] in the healthcare space. Oh, wow. And, uh, what's super interesting, there's a lot of interesting things about that particular episode, but, um, there's a lot of talk about AI and a company like Epic and what they're gonna do in the healthcare field. They're, by far, the dominant player in providing tools like MyChart for patients and, you know, the EMRs, EHRs, and billing systems and so forth in hospitals and health systems, and those will
[00:41:31] soon feature AI capabilities for doctors to use. So you might ask the question, well, what is the role of the association? How do we provide value when the hospital might have an AI bot built into their secure EHR/EMR system? And the answer in my mind is to complement that, where, you know, you have certain things that nobody else has, particularly your content.
[00:41:52] Um, and over time you'll have forms of experiential data that would be unique to you and be complementary to what people get out of an EHR/EMR. So I think there's actually a very bright story there. Whether those things can interoperate and integrate with the experience, uh, that a doctor or a medical practitioner member may have is a big question, because companies like Epic specifically are famously very guarded about integrations.
[00:42:16] But I think there's an opportunity here for, um, you know, many associations in a similar capacity outside of healthcare as well to do things where you, you can complement a line-of-business system that your members use every day.
[00:42:29] Mallory: I'm gonna have to tell Bailey about that episode. I've actually gotten him onto the Acquired podcast.
[00:42:33] He listened to the Costco episode as well and really enjoyed it, so he will certainly enjoy the Epic one. Amith, my last question was about this trend line again. It seems like, when you zoom out, we're seeing smaller and smaller models become more and more powerful. In your mind, in the next five years, if you could zoom out, do you think we'll be looking at
[00:42:54] tons of models that are, you know, millions of parameters and, like, more powerful than we could possibly imagine? I guess I'm asking, are we trending toward creating as small of models as we can, or is there a place for the giant ones and the small ones as well?
[00:43:10] Amith: I think it's both. I think that, you know, you can further compact these models down to the point where, let's say, 12 months from now, well, you said five years,
[00:43:19] I don't know that I can think that far out, I think about the next year, and even that was a hard question. Yeah, it's hard to predict even what's gonna happen in the next 12 months. But let's just say in the next 12, 18, 24 months, fairly near term, let's say that an equivalent model to the Phi-4, uh, reasoning model is available at a hundred million parameters,
[00:43:40] 10x or even 50 or a hundred x smaller than the current Phi-4 model, that could run in a web browser, that could run on really, really lightweight phones, not even like an iPhone 16, but something much smaller than that. And so if that's the case, then now you have really high-end reasoning capability in a super compact form.
[00:43:59] You know, you could have it running pretty much everywhere. You could have that capability in your earbuds, you know. So, uh, those capabilities becoming smaller and smaller is good. Uh, I think what you're also gonna see is that the state-of-the-art frontier models will keep getting smarter and smarter. Uh, you know, one of the stats that I think has been missed by a lot of folks is, I think it was the most recent o3 release by OpenAI, uh, and Gemini 2.5 Pro and Claude 3.7 in extended thinking, performing at the
[00:44:39] 70th, 80th percentile of PhDs. That means that, you know, if you put the average PhD, which is, you know, no slouch, typically right in the middle, that's the 50th percentile. So o3 is at the 70th to 80th percentile of performance of PhDs, and not just in one field, but across a number of different disciplines, ranging
[00:44:57] from philosophy to various forms of science and engineering. So it's pretty stunning what you have. And that's in o3, which is a big, heavy, expensive reasoning model. Um, but if you could have that capability distilled down into smaller and smaller and smaller models, even if these models didn't get any smarter, right?
[00:45:15] That's, that's pretty darn smart. And if you make it super, super fast, small, cost-efficient, the doors that open are really compelling.
[00:45:25] Mallory: That is a great place to wrap up this episode. What would you do if you had all those PhDs at your fingertips running on your phone and your earbuds? I don't know. We might be there pretty soon.
[00:45:36] Everybody. Thank you for tuning in to today's episode, and we'll see you all next week.
[00:45:43] Amith: Thanks for tuning into Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org. It's packed with insights to power your association's journey with AI.
[00:45:59] And remember, sidecar is here with more resources from webinars to bootcamps to help you stay ahead in the association world. We'll catch you in the next episode. Until then, keep learning, keep growing, and keep disrupting.

May 9, 2025