
Summary:

In this jam-packed episode of Sidecar Sync, Amith and Mallory break down two of the biggest AI stories of the moment: DeepSeek’s R1 model and OpenAI’s new Operator agent. R1 has been making waves with its impressive reasoning abilities, low training costs, and major impact on AI stocks—leaving industry giants scrambling. Meanwhile, OpenAI’s Operator brings a new level of automation, allowing AI to browse the web and complete tasks like a human. What does this mean for associations and AI adoption? Tune in as we explore the implications, risks, and opportunities these developments bring to the table.

Timestamps:

00:00 - Introduction
03:44 - The DeepSeek R1 model: why it's a big deal
13:35 - Can we trust R1’s $6 million training claim?
17:22 - The future of AI: nearly free and everywhere
24:32 - Will R1 be used in Sidecar’s AI tools?
31:11 - OpenAI Operator: AI that can use the web for you
37:38 - Should you be cautious about AI agents?
42:17 - What Operator means for the future of work
44:46 - Closing thoughts


🔎 Check out Sidecar's AI Learning Hub and get your Association AI Professional (AAiP) certification:
https://learn.sidecar.ai/

📕 Download ‘Ascend 2nd Edition: Unlocking the Power of AI for Associations’ for FREE
https://sidecar.ai/ai

📅 Find out more about digitalNow 2025 and register now:
https://digitalnow.sidecar.ai/ 

🛠 AI Tools and Resources Mentioned in This Episode:
DeepSeek R1 ➡ https://www.deepseek.com/
OpenAI Operator ➡ https://openai.com
Claude’s Computer Use (Anthropic) ➡ https://claude.ai
Llama 3.3 (Meta AI) ➡ https://ai.meta.com/llama

👍 Please Like & Subscribe!
https://twitter.com/sidecarglobal
https://www.youtube.com/@SidecarSync
https://sidecarglobal.com

Follow Sidecar on LinkedIn


More about Your Hosts:

Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.

📣 Follow Amith on LinkedIn:
https://linkedin.com/amithnagarajan

Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.

📣 Follow Mallory on Linkedin:
https://linkedin.com/mallorymejias


Read the Transcript

Amith: 0:00
AI will be in every single experience in life, and if you don't provide it, you're going to seem like you don't. It'll be as remarkable as if someone didn't have a website today. Right? Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights and developments in the association world, especially those driven by artificial intelligence, you're in the right place. We cut through the noise to bring you the most relevant updates, with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions. I'm Amith Nagarajan, Chairman of Blue Cypress, and I'm your host. Greetings and welcome to the Sidecar Sync, your home for content all about associations and artificial intelligence. My name is Amith Nagarajan.

Mallory: 0:52
And my name is Mallory Mejias.

Amith: 0:55
And we are your hosts. Now, before we get into our exciting episode, let's take a moment to hear a quick word from our sponsor.

Mallory: 1:03
If you're listening to this podcast right now, you're already thinking differently about AI than many of your peers. Don't you wish there was a way to showcase your commitment to innovation and learning? The Association AI Professional, or AAiP, certification is exactly that. The AAiP certification is awarded to those who have achieved outstanding theoretical and practical AI knowledge as it pertains to associations. Earning your AAiP certification proves that you're at the forefront of AI in your organization and in the greater association space, giving you a competitive edge in an increasingly AI-driven job market. Join the growing group of professionals who've earned their AAiP certification and secure your professional future by heading to learn.sidecar.ai. Amith, it has been a crazy week in the world of AI. How are you doing on this fine Wednesday?

Amith: 1:59
I'm doing great, other than the fact that I can't see a whole lot, because I just got done going to the ophthalmologist, and they did this medieval stuff to me and put stuff in my eyes. So I walked out into the street in New Orleans (it was a bright, sunny day), had to walk a couple blocks, and I'd forgotten my sunglasses. So, other than all that, I am doing fantastic. The world of AI was not top of mind when I couldn't see, let's put it that way. How about you?

Mallory: 2:22
I thought the world of AI was always top of mind for you, Amith. I'm doing pretty well.

Mallory: 2:27
It's a nice, what, 60 degrees here in Atlanta. I'm happy to just go outside without a puffer coat on and walk my dog, so I feel like things are good. And honestly, I've been excited for this episode, as I am every week, because I realized I don't fully process all this AI news until you and I do this podcast. It's almost like this is our time to really digest everything and break it down. So I started telling my husband about R1 and then realized: wait, wait, wait, I need to do the podcast first. We'll come back to this after that and I'll have more.

Amith: 3:00
Yeah, that's awesome. Well, when you have to present content or a topic or whatever you're doing, it definitely forces your mind to think a little bit differently. There's probably a different neural pathway that you use when you're teaching, training, communicating, anything that's even somewhat formal. In this podcast we keep it pretty light, but still, we're presenting to a whole bunch of people, and so it makes you distill down the content a little bit differently. You have to think about how you're presenting it, of course, but also what's the right way to summarize things, because you and I both spend countless hours every week reading content on AI, thinking about how it applies to associations, talking with association leaders, building educational content. So yeah, it's a great touchpoint each week. I really enjoy it.

Mallory: 3:44
And if you have been living under a rock, everyone, you probably wouldn't know what we're talking about today, but I already kind of teased it. Our first topic for this episode is DeepSeek's R1 model, which has been everywhere: it's all over my LinkedIn, and every news post I see is about this model. So we're going to spend a good bit of our time today talking about that, and then we will also be talking about OpenAI's Operator agent, and you all know we love talking about agents on this pod. So that's going to be a fun discussion as well. But first let's kick it off with the infamous DeepSeek R1. So, to give you some context, DeepSeek is a Chinese artificial intelligence company, and it released its R1 model on January 20th of this year.

Mallory: 4:26
DeepSeek is a relatively new player in the AI field, having been founded in May of 2023 as a spinoff of the Chinese quantitative hedge fund High-Flyer. The company has quickly gained attention in the AI community for its high-quality language models and innovative approaches to model training. So as I went through this topic for today, I broke out what I thought were the reasons why there's so much news around R1. Amith, I'll be interested to see if you're in agreement. One of those is performance, then cost, then the fact that it's open source, and finally, of course, its impact on the US stock market. But starting with performance: R1 outperforms OpenAI's O1 model, Anthropic's Claude 3.5 Sonnet model and other models in the majority of reasoning benchmarks. During training, R1 developed the ability to spend more time on complex problems by re-evaluating its initial approach, a behavior that actually emerged naturally.

Mallory: 5:25
Something interesting about R1 is that it explains its thought process as it works through problems and questions, which is interesting to see, and it has been shown to complete some interesting puzzles or questions that have traditionally stumped AI models in the past. One of those is how many R's are in the word strawberry, which I think we've talked about on the pod before. That puzzle question is actually the inspiration behind the name Project Strawberry, which was formerly Q-Star, and which is now known as OpenAI's O1. So R1 can answer that question. And then also on LinkedIn, I saw a post by Mark Heaps, Chief Tech Evangelist at Groq (Groq with a Q), and he said he likes to run this puzzle with AI models to see if they can get it correct. It's really simple to the human brain: there are three people in a room. One is reading, one is playing chess. What is the third person likely doing? The answer is playing chess, because it takes two people to play chess. And he said in most scenarios, AI models will create elaborate stories to answer that question. R1 was able to get it right: it went through its thought process and said, well, that person must be playing chess. That's the performance summary.

Mallory: 6:36
Now, talking about cost: DeepSeek claimed to have developed R1 for only $6 million, which we know is significantly less than the up to billions invested by US tech giants. Interestingly enough, the cost to train the R1 model is actually less than what some leaders make at some of these big AI companies in the US. That reduced cost trickles down, so R1 is significantly cheaper to use than O1, for example, costing about 27 times less for input and output tokens. As I mentioned, R1 is also open source, which means it's poised to accelerate innovation in the AI sector and potentially disrupt the current landscape dominated by closed source models.

Mallory: 7:17
And then we saw quite the stock market impact. The release of the R1 model caused significant disruption, particularly affecting AI-related stocks. On January 27th, the tech-heavy NASDAQ index fell by approximately 3%, resulting in a $1 trillion loss in market capitalization. NVIDIA, a leading AI chip manufacturer, saw its stock plummet nearly 17% on January 27th. The single-day drop wiped out almost $600 billion from NVIDIA's market value, marking the largest daily loss in Wall Street history, and we know this model's ability to achieve high performance with less advanced hardware challenged the perceived value of premium AI chips. We also saw impacts on other tech companies like Microsoft and Alphabet, which is Google's parent company. Now, all of you can access DeepSeek's R1 model on desktop and on mobile. You can go download the app if you would like to (there are probably questions there on whether you should or not), and it's also available on the Hugging Face platform. So that's a lot of information. Amith, people are losing their minds over R1. I tried to break down why I think that is, but I'm interested to hear your take.

Amith: 8:30
Well, that was a great summary, Mallory, and I think that ultimately, if you can get something that's perceived to be worth X for a fraction of X, that's interesting. And if people spend a lot of time and money building that first product that now you can deliver at a fraction of the cost, it undermines the perceived value of the thing that people are selling to begin with, on the one hand, but it also questions the production methods, right? In this case, really advanced chips and lots of them. And so NVIDIA got hammered, because the perception was that if you can train R1, which is, in concept at least, as good as O1, without those huge clusters, then maybe you don't need all those advanced chips. There's independent verification happening, and some has already happened. It seems to be legit. But if that's true, then what does that mean for the future of the most advanced frontier models, right? What does that mean for the future of frontier labs, investments and the most advanced models? So, a couple of things.

Amith: 9:29
The trend line is something we've been covering on this pod and our other forms of content on Sidecar for some time. I've been one to say that I'm more excited about small to midsize models than I am about large models, and it isn't that I don't get excited about these new O1, O3 and other things coming out. I do get excited about those things, because it's always interesting to see what the absolute edge of capabilities is. But models today are already sufficiently advanced to do so much more than we do with them, and so the application development opportunities that exist on top of even current models like GPT-4o and Llama 3.3 and others that are out there, which are becoming cheaper and cheaper, are mind-boggling. You can remake the entire way a business operates with current AI, even if it didn't get better at all. And so, as those capabilities that were once frontier now become available not only in open source but in smaller and faster and cheaper models, that's really good for everyone, because that makes the incremental cost per unit of AI, if you will, dramatically more accessible for everyone on Earth. And there's this thing called Jevons paradox, which is this idea that as a technology becomes more efficient, and therefore less expensive, demand radically increases, and we've seen that to be true across pretty much every technology that our species has invented over time. As we've found efficiency in production, we've found decreases in cost and we've found massive increases in demand. Think about the automotive sector, think about energy utilization, think about traditional computing and now AI. So we're going to see that, and as a result, there's going to be a massive increase in demand. What we've seen so far is a tiny sliver of the demand that anyone's expecting, but this is a nonlinear curve, meaning that it's far greater in its impact than what people tend to perceive. That's why these exponentials are so hard to visualize. And so, ultimately, I don't think there's anything to worry about.

Amith: 11:29
If you're a leading producer of hardware or applications, that is. I do think, and I've said this for a while, that if you're a producer of the fundamental models, you've got a lot to worry about, because that's a hyper-competitive space. There's almost no differentiation. I already view it as a commodity. I view the models as totally being commoditized. I mean, when we're helping organizations think through software architecture, picking the model is by far not the highest concern, as it was originally. It used to be that you'd have to go after pretty much the most performant model, the fastest model, but also really the smartest model, more than anything. That's no longer the thing you think about, because these models are comparable across the board, and, like we've talked about in recent episodes, the Llama 3.3 70 billion parameter version that came out in December is comparable to GPT-4o. And it's unbelievable what you can do even with that small model.

Amith: 12:21
So I guess the point here is this class of models called reasoning models.

Amith: 12:26
They do a little bit more than the regular LLMs, but the same exact thing is happening.

Amith: 12:32
They just essentially have new techniques that are being incorporated in them. So, whether it's out of China, or out of Africa, or out of South America, or out of some other place, we're going to have more and more innovation coming from places that have far fewer resources in the traditional sense, but are innovative, creative and coming up with new ways to do things. With the DeepSeek folks, there's a variety of things that led to their breakthrough, but they really have some smart approaches to the engineering under the hood that were novel, that people outside of their organization hadn't thought of. So I think there's a lot of good that comes from that. Yann LeCun, who's the head of AI for Meta, was saying that this isn't a victory of China over the US, but a victory of open source over closed source, because what he's really pointing to is that when you have this massive sea of people who are all collaborating and sharing, it's very hard for any proprietary company, or all proprietary companies combined, to compete with that, and I think he's right.

Mallory: 13:33
There's so much to unpack there, Amith. Let me start with the independent verification you mentioned right at the top that's going on around this whole process. Some AI leaders right now are speculating that this whole $6 million to train the model thing is not true or accurate. As someone who has more understanding of how that process works for training a model, do you feel like what we're seeing is true, or do you think they could just slap a number on there and say, sure, it was $6 million?

Amith: 13:58
Yeah, I mean, first of all, no one knows exactly what they trained it on. We don't even know exactly what the cluster size was or what the chips were. We don't know how long it trained for, so we don't know how much money they spent. So you have to take it on whether they're telling the truth or not. And even if you believe them, the question is: what were the inputs? Because as models get better, there are predecessor models for every model, and that lineage directly impacts how you're able to train newer, faster, smarter models. Llama 3.3 is dramatically smarter than Llama 3.1, and that's the same family of models, but they were able to use Llama 3.1, the bigger model, the 405 billion parameter model, to generate data that was used to help train the smaller 3.3 version. That's a process called distillation, and some people are claiming that distillation looks like something that, at least in part, was responsible for DeepSeek's performance. One of the things that was shown as potential evidence of that (though it's, of course, just a screenshot) is the DeepSeek R1 model actually quoting OpenAI's policy framework for why it can or cannot answer certain questions, which would be indicative of having consumed content that came out of OpenAI's models. Now, obviously, I don't know if that's true or not, but it wouldn't surprise me if there were techniques like that being used with OpenAI's models, or probably with Claude, probably with other models as well. Because, essentially, if the mindset is take whatever you can possibly get your hands on and use that to improve models, there are people who are going to do that, whether you like it or don't like it, whether you agree with it or don't agree with it. It's going to happen. Did it happen in this case? I don't know, but it is happening in general, and it does violate the terms of service for all these commercial model providers. But distillation, the concept, is very powerful. If you try to put legal structures around something to prevent something that has economic reasoning behind why it's happening, the legal structures are going to fall over immediately, because the power of the economic incentive is so great, and that's what's happening here. Distillation is a really powerful technique. It's not the only technique, but it's something that absolutely helped the R1 model be as good as it is.
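To make the distillation idea concrete, here's a minimal sketch in Python: a large "teacher" model answers a set of prompts, and those answers become supervised training pairs for a smaller "student" model. The model names and the OpenAI-style client here are assumptions for illustration, not DeepSeek's actual pipeline.

```python
# A minimal sketch of distillation: a large "teacher" model generates
# synthetic answers, which become training pairs for a smaller "student".
# Model names are illustrative; this is not DeepSeek's actual pipeline.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPTS = [
    "Explain step by step why it takes two people to play chess.",
    "How many R's are in the word 'strawberry'? Show your reasoning.",
]

def generate_teacher_pairs(prompts, teacher_model="gpt-4o"):
    """Collect the teacher's answers as (prompt, completion) training pairs."""
    pairs = []
    for prompt in prompts:
        resp = client.chat.completions.create(
            model=teacher_model,
            messages=[{"role": "user", "content": prompt}],
        )
        pairs.append({"prompt": prompt,
                      "completion": resp.choices[0].message.content})
    return pairs

# Write the pairs out; a standard supervised fine-tuning toolkit would then
# train a smaller open model (say, a Llama-family checkpoint) on this file,
# so the student absorbs some of the teacher's reasoning behavior.
with open("distillation_data.jsonl", "w") as f:
    for pair in generate_teacher_pairs(PROMPTS):
        f.write(json.dumps(pair) + "\n")
```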

Amith: 16:12
Ultimately, for our association listeners in particular, a lot of that, aside from maybe being interesting trivia, is not necessarily useful, other than to come back to the through line and the trend, which is that associations, and everyone else for that matter, can count on very powerful models being available for nearly free. That is the thing you can count on, based on the competition, based on the advancements, based on the hardware going forward, based on economies of scale. It's just very obvious that these models are going to be essentially nearly zero cost. And so why is that important for you?

Amith: 16:44
If you're an association CEO, or anyone who's thinking about the business side of the organization rather than the technology, you might be saying: well, wouldn't it be great if we could do these things if it was very inexpensive, if GPT-4o didn't cost so much that I couldn't process all of my data with it, because that would cost me millions of dollars or it would take too long?

Amith: 17:07
Or the question of like well, I wish I could do that with AI, but I don't really want to send my data over to one of these commercial providers.

Amith: 17:13
I sure wish I could run these AI models myself on hardware or environments I control. Well, you can do that, and you can do it for fractions of what it cost even six months ago. So the reason that's so important to get your head wrapped around is that you need to plan your strategy based on what will be available in the coming 6 to 12 months, of course, but even the next few years, and this stuff is going to get more powerful and cheaper, which means there are more business applications that you can count on having available to you. And a lot of people have this fallacy where they say, oh, I wish I could do AI, but I'm too small, I don't have the budget, I don't have this and that. All of those assumptions are false now, and they're going to be more false in the future.

Mallory: 17:55
So it sounds like the list of excuses is dwindling.

Amith: 17:58
Certainly from a cost perspective. And what that also means is everyone else is going to be adopting these technologies, these techniques and these strategies, and that means the consumers of the world are going to expect AI everywhere. Every CRM system is going to have AI baked into it. So if you're a CRM or an AMS vendor and you don't have a really deep and well-thought-out AI strategy that you are scurrying about implementing as fast as possible, you'd better watch out, because that's going to be the expectation in those types of tools. It's going to be the expectation in every piece of software you use and every vendor you deal with, whether it's a software experience, an e-commerce site or an app. The interaction between brands and individuals is going to be defined by the quality of the experience, but also by the absence of friction.

Amith: 18:43
Friction is something that's caused by constraints or choke points. Sometimes it's caused by design. For example, you can't just go buy a product from Hermès, because they don't want you to be able to, in order to create artificial scarcity, which is a really cool business model if you can have that, but most people can't do that. So in reality, friction is generally a really bad thing in the consumer experience. So what I would point to is this: here we have a classic example of the association saying, poor, poor me, I'm too small and too under-resourced to handle doing anything meaningful with AI. But nobody cares outside of your organization.

Amith: 19:21
The reality is your consumers, your members, your customers, your users, whatever you want to call them, are more and more accustomed to a low-friction, high-quality environment, which is increasingly driven by AI, if not entirely driven by AI, from everything else in their lives, not just their professional lives but their personal lives. So what I'm trying to say is really simple: this trend line affects the world, and it affects the expectations of the consumer, because AI will be in every single experience in life, and if you don't provide it, you're going to seem like you don't. It'll be as remarkable as if someone didn't have a website today. You want to talk to a vendor and you want to buy something, and there's no website. Maybe they don't even have a phone. That's how far back in time you will seem to be to your members and to your customers if you don't have a deep AI strategy.

Mallory: 20:10
I find it a bit ironic that this model was released right after the TikTok ban, because we just saw one of the most popular apps in the world get banned in the US, if only for 12 hours, but a surprise nonetheless. Are you comfortable using a Chinese AI model? I mean, is that something that you're thinking about, Amith? Do you think it's safe to use?

Amith: 20:31
Sure. So I'm totally comfortable using the model so long as it's not running in China. If you go to DeepSeek's website, or if you download the app from the App Store or the Android store, you are connecting to servers that are controlled in China, and ultimately the data is going back to China. Now, I don't have any theory or knowledge that says DeepSeek is controlled by the CCP or is not, but generally speaking, you can assume that if your data is going to China, it could be accessed by the government, and it could be used for purposes outside of what the company even intends. I don't know how much of that is actually happening versus not, but it's absolutely a possibility. So you have to be thoughtful about that, even as a consumer; as a business, most certainly, you have to be thoughtful about where your AI is running. And it may be compelling: they've announced a free pricing plan for nonprofits as well, and I don't know if that's totally free forever at all levels of inference, or if it's free only up to a certain level. But that is potentially a major, major trap. And again, I'm not suggesting that the people behind this have nefarious ideas or plans. They might be wonderful people who have the best intentions in the world, but they still live in an environment where, ultimately, they don't have control. So it's a situation you have to be thoughtful about.

Amith: 21:47
Now, as far as running the model: remember, it's open source, and you can run the R1 model in a lot of other places. In fact, the folks over at Groq created a distillation that distills parts of the R1 model into Llama 3.3, which they took live on their website already, and that's something worth checking out. That's not the R1 model itself; you can inference the R1 model itself through a number of providers in the United States that have already spun it up. But on the Groq platform you can inference Llama 3.3 distilled with R1, which essentially means they took Llama 3.3, got a bunch of sample content out of the R1 model by running it over and over again, and then did additional training on Llama 3.3 to make it smarter, using some of the brainpower from R1. They essentially leveled up Llama 3.3: they gave it more reasoning skills, they gave it capabilities. So this is the power of distillation, right? In this case it's two open source models, so that's another interesting thing to play with. And on the Groq platform everything goes 10x as fast as it does anywhere else, so that's worth checking out. As for where that inference runs: they have a data center in Saudi Arabia now as well, but you can choose to inference only in the United States if you want with them.
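If you want to try the distilled model Amith mentions, it takes only a few lines, since Groq exposes an OpenAI-compatible endpoint. A hedged sketch; the model ID is an assumption based on Groq's naming at the time of writing and may change.

```python
# Sketch: inference an R1-distilled Llama model on Groq, which exposes an
# OpenAI-compatible API. The model ID reflects Groq's naming at the time of
# writing and is an assumption; check Groq's model list before relying on it.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GROQ_API_KEY",                # from console.groq.com
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",      # assumed ID for the R1 distill
    messages=[{"role": "user",
               "content": "Three people are in a room. One is reading, one is "
                          "playing chess. What is the third likely doing?"}],
)
print(resp.choices[0].message.content)  # reasoning models show their thinking
```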

Amith: 23:03
To your question: yes, you have to be very thoughtful about where the AI is running and where you're sending your data. It doesn't mean you can't use models that have a Chinese provenance at all. I don't think that's the right way to think about it. There's another great model called Qwen that's available, and the Qwen 2.5 model is fantastic. It outperforms a lot of other models. It's kind of neck and neck with GPT-4o. They just released the new Qwen 2.5 Max, which is a bigger version that's supposed to be in the O1 category. So there's a lot of competition and stuff coming from China. If it's open source, it's great.

Amith: 23:33
Some people might have theories: oh, it's going to phone home somehow and transfer data.

Amith: 23:36
That's easy to detect, and it's easy to firewall this stuff off, so I wouldn't worry about anything like that. And I would also have a provider that runs this for you that knows what they're doing. I wouldn't try to download it yourself and run it in your own data center; at least most of our listeners who are in the association realm, who don't have that in-house expertise, shouldn't do that. But there's lots of options. That's the beauty of this: open source models separate the decision of inference provider from model provider. Prior to the point where you had these different models that can run anywhere, you picked OpenAI or Claude or Google, right, and you used one of those providers. They are the model provider, and they also run the model on your behalf. But with open source models, you can run them anywhere, and so that separation of concerns creates more competition and creates more choice, which is ultimately a wonderful thing.
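That separation is visible in code: switching inference providers for the same open-source model can be as small as changing a base URL. A sketch with illustrative endpoints and model IDs; each provider uses its own naming for the Llama 3.3 70B weights.

```python
# Sketch: the same open-source model family served by two different inference
# providers, selected by base URL. Endpoints and model IDs are illustrative.
from openai import OpenAI

PROVIDERS = {
    "groq": ("https://api.groq.com/openai/v1",
             "llama-3.3-70b-versatile"),
    "together": ("https://api.together.xyz/v1",
                 "meta-llama/Llama-3.3-70B-Instruct-Turbo"),
}

def ask(provider: str, question: str) -> str:
    base_url, model = PROVIDERS[provider]
    client = OpenAI(api_key=f"YOUR_{provider.upper()}_KEY", base_url=base_url)
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": question}]
    )
    return resp.choices[0].message.content

# Same weights, two hosts: the decision about who runs the model is now
# separate from the decision about which model to use.
print(ask("groq", "Summarize Jevons paradox in one sentence."))
```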

Mallory: 25:21
So we're in agreement, right, that the reasoning-based performance of the model is pretty impressive, and at a lower cost. So my question for you, Amith, the one I always like to ask, is: are you considering running R1 in an environment for any of the products that we have?

Amith: 25:37
Yeah, we've actually been thinking about it already, and we've thought about O1, and potentially O3 down the road, for certain tasks. So we've talked a lot on this pod about agents, and I want to talk for a moment about reasoning models, just to refresh from our strawberry episode and our more recent episode where I think we talked a little bit about O3, maybe a couple of weeks ago (I could be hallucinating that as well), and then R1. These so-called reasoning models, really what they're doing goes back to the conversation we had on strawberry, with what we called essentially the AI's equivalent of system one versus system two thinking. Going back to that work: in our brain, system one thinking is that instinctive, immediate reaction we have that doesn't really require any higher-order processing. Someone says A, I say B, right? It's this instantaneous reaction. You say one thing, someone reacts a different way. It's instinctive and nearly instant. Whereas our system two thinking is that which requires some reflection; it requires some reasoning.

Amith: 26:34
Sometimes we take minutes, hours, days, weeks, months, years to think through problems when they're more complex. Machines haven't had that capability. Machines haven't had the ability themselves to say: oh, that's an interesting problem and it's kind of complicated, let me think through that for a minute, let me check my work. Now, you can kind of approximate that even with older LLMs, where you say "think step by step", the so-called chain-of-thought prompting, and it actually helps. But it's more of a trick than anything else; it doesn't actually cause the model to work differently.

Amith: 27:00
But reasoning models are actually trained to iterate. So what they do is they say: hmm, Mallory has asked me to count the number of R's in the word strawberry. How should I do that? Well, let me look at all the letters. Let me go count them. And then they go and check their work. They say: did I actually do what Mallory asked me to do? So there's this kind of iterative process where the model is checking its own work and it's coming up with a plan.
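You can approximate that loop from the outside, even with a non-reasoning model. Below is a rough sketch of the generate-critique-revise pattern; it's an illustration of the idea, not how R1 or O1 are implemented internally.

```python
# Sketch of an external "check your work" loop, approximating what reasoning
# models learn to do internally: draft an answer, critique it, revise it.
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set
MODEL = "gpt-4o"    # any capable chat model works for this illustration

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def solve_with_self_check(question: str, max_rounds: int = 3) -> str:
    answer = ask(f"Answer this carefully, step by step: {question}")
    for _ in range(max_rounds):
        critique = ask(
            f"Question: {question}\nProposed answer: {answer}\n"
            "Check the answer step by step. Reply with the single word "
            "APPROVED if it is correct; otherwise explain the mistake."
        )
        if critique.strip().startswith("APPROVED"):
            break  # the model has checked its own work and is satisfied
        answer = ask(
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nGive a corrected answer."
        )
    return answer

print(solve_with_self_check("How many R's are in the word 'strawberry'?"))
```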

Amith: 27:24
Well, if you recall, when we talk about agents and multi-agentic software architectures, it's basically the same idea: take a problem, break it down into component parts, execute those parts, potentially loop back and redo some of them if there were failures, and then ultimately come to an answer and give you a more complex output. What's happened, essentially, is the model is doing some of this reasoning, this iterative, looping kind of capability, in the model itself, right? And that's extremely powerful, because it makes it easier for people to use. But if you have a multi-agent architecture, it may not be the right thing to use. A good example of this is our Skip software, which I'm very excited about right now, because we're literally a handful of business days away from a soft launch of what we're calling a private preview, then a public preview, of the new version of Skip, which is crazy new and improved and incredibly powerful. It's also going to be a completely turnkey SaaS option. People will be able to go to the website, click Get Started, and click a couple buttons to connect Skip to their SaaS solution of choice, like Salesforce, HubSpot, etc.

Amith: 28:34
A few minutes or longer later, depending on the size of the database, Skip will be synchronized and you can start talking to Skip. So Skip has been rebuilt. Skip has only existed for around 18 months since we started building it, a little bit longer than that, and we've rebuilt Skip and thrown everything away now four times. The reason is that not only does the architecture need to change, but the underlying models have changed. It's actually a little bit of an overstatement to say we threw everything out, but we threw away lots of it, and that's really important when things are changing rapidly.

Amith: 29:02
But in the case of the latest Skip architecture, the way the architecture is set up, it's possible for Skip to do extraordinarily complex things for you. Sometimes they take minutes to do, because Skip is able to look at what you're talking about, take the whole context of not only your conversation but literally all the data in your enterprise, look at all the types of data you have in your Salesforce system or your HubSpot or whatever, and then figure out a plan and execute the plan step by step, which might require 3 steps, 5 steps, 50 steps. Then, after Skip executes all those steps, it comes back through, checks its work and ultimately assembles the final work product, which is almost always some kind of analytics or report. It's like a report that says: hey, here's my member churn for the last three years, here's my predicted member churn, something like that. So Skip is able to do those things because of these techniques, and because of that, the individual steps that exist within that kind of agentic framework don't require an R1-level capability, because we've broken down the problem through the agent layer so that the LLMs underneath it can be much simpler. Almost all of Skip now runs on Llama 3.3, which is the 70 billion parameter model we've talked about, compared to previously requiring OpenAI.

Amith: 30:21
OpenAI was the only game in town last year for Skip. We needed the level of horsepower they had, and nobody else had a GPT-4o-caliber model. Now you have four or five options that are all really good, and for Llama 3.3, again mentioning Groq with a Q, we are using them exclusively for inference. We do go up from that a little bit for a couple of key tasks Skip does, but because of our architecture, we can get a lot of horsepower out of these smaller models.

Amith: 30:46
Now, that being said, there are definitely situations where it'd be helpful to have an R1 or an O1 or an O3-caliber model, and our teams are always looking at this stuff, always playing with it and figuring out where you can solve novel problems with it. So it's a lot of fun. But I think people tend to do this: they think, oh, now that R1's out, I've got to use it. Well, actually, R1 and O1 are really bad at certain things. If you want a near-instant response to a fairly simple question, they don't make any sense; you don't need the model to go through that whole thinking process. A GPT-4o-class LLM is perfectly good at answering that, and it doesn't do any of those steps. It's nearly instant, comparatively speaking. So there are different tools for different problems, essentially, is what I'm trying to say.

Mallory: 32:26
Yeah, that makes sense. You mentioned a private preview for Skip. Is there any way for any of our listeners to be a part of that group, if they're interested?

Amith: 32:36
They can ping me on LinkedIn. We're taking just a very, very small number of people. We already have a number of people that are signed up for the private preview. That's only going to last a handful of weeks as we get through that, and then we're going to go to a public preview and then we'll go to a full release. All those steps will happen in Q1. So it's happening pretty quickly, but if anyone's interested in that, they can ping me on LinkedIn.

Mallory: 32:57
Awesome. Moving to our second topic of today, we're talking OpenAI Operator, a groundbreaking AI agent unveiled on January 23rd of this year, designed to automate various tasks by navigating the web like a human. It can perform activities like planning vacations, filling out forms, making restaurant reservations and ordering groceries. Really fun activities. Operator is powered by a new model called Computer-Using Agent, or CUA, which combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. This allows Operator to see web pages through screenshots and interact with them using virtual mouse and keyboard inputs, just as a human would.

Mallory: 33:41
What are some key features of Operator? Well, it can autonomously browse websites to complete tasks without requiring custom API integrations. It can leverage reasoning capabilities to overcome challenges and correct mistakes, and when it encounters difficulties, it can hand control back to the user for assistance. Users can also add custom instructions for specific sites or save prompts for repeated tasks. Currently, Operator is available exclusively to ChatGPT Pro users, which right now is $200 per month, through operator.chatgpt.com, but OpenAI does plan to expand access to other tiers and integrate it directly into ChatGPT in the future. So, booking a reservation for two at a seafood restaurant, finding a few tickets for a concert, sending mom flowers on her birthday: these are a few examples that the OpenAI team shared in their demo. Amith, I'm curious, on your end, if you test this out and realize it works really well, what are some everyday tasks, personal or professional, that you could see yourself using this for?

Amith: 34:48
First of all, I've got to get on my soapbox for a minute. I think these guys really need some help with picking examples. Those are so dumb. Every single one of those tasks is actually something I'd really like to spend time on. It's fun to pick a restaurant. It's fun to find music that you're going to go see. And I don't want anybody that I send flowers to to know that an AI did it, that's for sure.

Amith: 35:09
Certainly my wife, you know. So I don't know, I think they need a little bit of help picking examples that are actually tasks that are good ideas to automate. But aside from that, the reason they pick those things is because they're good examples of consumer experiences that happen in most people's lives at different times. But the point is this: the world around us we have shaped, for hundreds of thousands of years, in our image, to work well and to have a user interface that's intuitive for people. The way things work in the physical world works well for the human form, which is why humanoid robots are such an interesting area of product development: if you can plug a robot into the human world, the robot has way more utility than if it's only able to interact in very narrow ranges of environments. So, similarly, if an AI in the digital world can interact through human interfaces, or interfaces designed for humans, which would include, of course, websites, but also desktop software, the crusty old AMS that you don't like very much, all those products that are out there, right? If an AI could learn those desktop software tools of various flavors, or mobile tools or whatever, that's really interesting, because then the AI is able to literally do all the digital labor that a person would do through a keyboard and mouse. And we know that AIs can see, we know they can speak, we know they can hear; now they can type and now they can move the mouse. So this is a very interesting thing.

Amith: 36:40
I want to point to another rival lab of OpenAI's called Anthropic, which is the maker of Claude. Last year they revealed something very similar to this called Computer Use, which is available in Claude, I think also for their premium tier. I personally haven't used either of these, but the idea is the same. Now, the way they've implemented it is different. In the case of Claude's Computer Use, it controls your local desktop, so you're actually handing over the reins of your computer to the AI, which I personally find very scary: the idea that the AI would get into files, or send an email on my behalf, or just something gone awry, right? Even malware-type scenarios. But what OpenAI did, I think, is really smart; they have a different approach.

Amith: 37:22
The way Operator works is it does not control your computer; it controls a remote browser, and that remote browser runs in an environment where you can see what it's doing, and you can interrupt it at any time. It can ask you to help at any time. So it's kind of like, you know, you're the co-pilot, right? Instead of you using a co-pilot, the AI is doing the work and you're the co-pilot. Effectively, it flips the roles. But that's kind of cool, because in that context there's less risk, because you're not logged in to any website. So, like with OpenTable: they didn't really talk about this in their demo when they released the Operator agent, but how was that developer who was demoing it logged in? Well, someone has to log in. Of course, they omitted that from the demo, which I understand, but the reality is there are downsides to all this stuff.

Amith: 38:11
I think this is brand spanking new. So I personally haven't used it, because I don't have a lot of utility for it immediately. Plus, I'm a little bit nervous about it, to be perfectly honest, just in terms of cyber risk. But, that being said, I do see use cases, which we can talk about. That's kind of my opening salvo, and I think it's a really important innovation because of the human interface thing I was starting with, right? Like robots in the physical world being able to meld into the whole world.

Amith: 38:40
That's designed for us, and the same thing goes for computers. Computers are designed for us to use. It's incredibly inefficient for one computer to talk to another through a user interface designed for people. It's kind of silly, actually, because APIs are a thousand times faster and more efficient, but the world of websites and applications is probably tens of thousands of times bigger than the number of things you can do through an API. So it just opens up a lot of possibilities for automation and legacy systems. I was kind of kidding about the crusty old AMS comment, but kind of not, because if you could have the bot actually do some of the stuff with your old system, which maybe people don't like using, maybe it frees them up, and they don't really care about replacing the old AMS as much anymore because you've automated it.

Mallory: 39:24
Well, I will say crusty old AMS is probably going to be one of my top-ranking phrases that's ever come out of this podcast. I'm going to chuckle about that later tonight. I think some of our listeners might be surprised to hear that this is something you're a bit more wary about. I feel like generally you're an optimist, right, and a realist as well, but generally you don't seem to be overly cautious with any of these new releases. Not saying that you are now, but I'm curious: if you are cautious about this, are you suggesting that our listeners be as well, that this is not something they should go out and try?

Amith: 39:58
Well, so I'm also a cheap entrepreneur, so paying OpenAI $200 this month is not something I'm willing to do right now. I just don't see enough value. I don't really care about using Sora personally, and this tool isn't enough; it's not meaningful enough for me to go try it out. Plus, I'm kind of busy with other things. So if I had a little more free time, I might go play with it and pay them for a month, but I really don't think it's that interesting, because I understand how it works, and I think everyone's going to have this.

Amith: 40:21
There are actually already open source versions of Operator that you can download and inference locally with Llama, which are worth checking out as well. That's a little bit more involved to set up, but you can run Operator-like things, and that's actually existed for months; it's not something that happened right after OpenAI came out with Operator. So there are ways to do this. I also think it's super, super early infancy days for this technology. It's really crude, it's really slow. I don't think the utility is super high. But, like I was saying earlier about R1, and how its predecessors and what's going to happen later are on this crazy timescale, the same thing is going to happen with this. By the end of this year, these kinds of agents will be very common, they will work quite well, and they'll be very fast.

Amith: 41:03
So, rather than moving at this pokey pace that you saw in the demo, where it's like: oh, that's what Operator does?

Amith: 41:10
That Operator really doesn't know how to use the computer too well, right? Maybe we should send it to a Mavis Beacon Teaches Typing class, because that Operator kind of sucks at using the keyboard.

Amith: 41:19
But what I'm serious about, though, is that that's because the AI is so slow right now, and it's so resource intensive. What it's doing is essentially taking a screenshot multiple times per second, running the screenshot through the model, and then getting back JSON with instructions, essentially, telling it what to do: which pixel to move the mouse to, which button to click, and all that kind of stuff. It's incredibly inefficient, but it's a great concept. And so over the course of the next six months we'll get another AI doubling, another one the second half of the year, and probably by the end of the year these things will be quite good. So, jumping back to my comment about cybersecurity: the risk is actually mainly with Computer Use, Anthropic's model, running on my local computer. That risk doesn't exist in the OpenAI example, because you're running in a remote browser. At least it's not the same kind of risk.
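Conceptually, that loop is simple: screenshot in, structured action out, repeat. Here is a heavily simplified sketch of the pattern; the JSON action schema is invented for illustration and is not OpenAI's actual CUA interface.

```python
# Conceptual sketch of a computer-use loop: capture the screen, let a vision
# model choose the next action as JSON, execute it, repeat. The action schema
# here is invented for illustration; it is NOT OpenAI's actual CUA protocol.
import base64
import json
import time

import pyautogui            # pip install pyautogui
from openai import OpenAI

client = OpenAI()

def screenshot_b64() -> str:
    """Capture the screen and return it as a base64-encoded PNG."""
    pyautogui.screenshot("frame.png")
    with open("frame.png", "rb") as f:
        return base64.b64encode(f.read()).decode()

def next_action(goal: str) -> dict:
    """Send the latest frame to a vision model and parse its JSON action."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f'Goal: {goal}. Reply ONLY with JSON such as '
                         '{"type": "click", "x": 100, "y": 200} '
                         'or {"type": "done"}.'},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64," + screenshot_b64()}},
            ],
        }],
    )
    return json.loads(resp.choices[0].message.content)

goal = "Click the browser's address bar"
while True:
    action = next_action(goal)
    if action["type"] == "done":
        break
    if action["type"] == "click":
        pyautogui.click(action["x"], action["y"])  # move mouse to pixel, click
    time.sleep(1)  # crude pacing; real agents run this loop far more tightly
```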

Mallory: 42:03
Yeah, it just seems like this is one of the first steps we've seen recently for agentic solutions or agent-like actions, in terms of actually being able to do things on your computer in consumer-grade technology. Would you agree with that?

Amith: 42:17
Yeah, totally. I mean, with the examples people use, they're trying to make it relatable as well, which is why they use the how-to-automate-sending-your-mom-a-birthday-gift example, which I think is just the funniest one. It's crazy, but that concept is relatable, I guess, and we're talking about it, so maybe they knew all along that it would be something people would laugh at. But the point I would make is there are a lot of business use cases for this.

Amith: 42:41
So think about this: you get an email, and the email says, I'd like to renew my membership, and you, in the member services assistant or member services specialist role, have to handle that. So what do you do? Well, you have the email open on one screen, and then you open up your AMS, crusty or not, on the other screen, and then you do the stuff, right? You search for the member, you swear at the computer because it couldn't find the member based on the email or the phone number, then you finally find the member, then you go into the record, and then you figure out how to do the renewal, whatever that process is in that system. So there's anywhere from three to 15 steps that you go through in the AMS.

Amith: 43:22
Well, what if you could just say: hey, Operator, watch me do this. This is what happens a lot: I get these emails, and this is what I need to do in the AMS. This is the AMS system I use, so this is what I want you to do.

Amith: 43:39
And Operator says: yep, got it, I have now been trained, just like a new employee. And the next time an email comes in like that, you can just click on it and send it to Operator, and Operator takes care of it. It sounds a little bit sci-fi, and it wouldn't work well at all right now if you tried to do that with Operator; it's way too simplistic of an agent today. But that won't be true in 6 or 12 months, certainly not in a couple of years. So yes, you should totally be thinking about this stuff as a way to radically change your workflows, because those types of things require so much manual effort and take up such a large percentage of your staff's time. That kind of task I just mentioned, somewhat jokingly but seriously, probably chews up half of people's time in 50% of the positions in your organization. Like, 25% of your staffing budget goes towards those kinds of tasks that could probably be automated. Think about what you could do with getting that time back.

Mallory: 44:25
And I would say too, if you're a leader listening to this podcast, you've got to take a stance on this, you've got to educate your team on this. Because what you don't want to happen is this: the technology gets a little bit better, and maybe your staff goes off and starts using Operator, and maybe they're not disclosing that, and maybe Operator does exactly what Amith says and makes a mistake. And then your staff comes to you and says: well, I used Operator, so am I in trouble? Is Operator in trouble? What's our staff usage policy around this? I would say this is something you want to get ahead of, for sure.

Amith: 44:54
Yeah, I would add to that. I think policy and guidelines are key, because you have really three categories of people that I've seen, and this isn't just for this technology, but it's especially true here. You have people that are doing nothing and probably actually very happy to do nothing. Then you have people who really want to do something but aren't doing anything, because there's no policy to help them know whether it's the right or the wrong thing to do. And then you have people who are just going to do whatever the hell they want, regardless of what your policy says, and your policy is probably not going to affect them. They exist in every organization, and actually sometimes those people are really helpful, if you can get them to say, hey, what's working and what's not. But in any event, without a policy, without guidelines, without training as well, you're leaving people in the dark, and so I agree wholeheartedly with what you're saying, Mallory.

Amith: 45:39
I talk to people about a lot of different technologies, a lot of different AI models and software architectures and business strategies and blah, blah, blah, but every single time, the conversation ends with: all right, where do we get started? And it's always the same thing. Get some policies and guidelines in place. Don't spend a million years on it. Don't go and convene a special panel of 300 of your favorite members to talk about what AI should look like, spend the next six months on it and amend your bylaws. Don't do that. Just create something simple really quickly, and tell people it's going to change constantly, because AI is changing fast. Roll it out, and then invest in some training for yourself and for your team. It can be Sidecar's, it can be anyone else's. Just go get trained and get a guideline in place, and you'll be dramatically better off.

Mallory: 46:23
Yeah, and guidelines aren't always meant to stifle people's innovation by any means. They can actually give people a lot of freedom by creating those parameters that they can act within.

Amith: 46:32
Yep.

Mallory: 46:33
Well, Amith, it was a good, jam-packed episode. You were on your comedy game. We talked about R1, crusty old AMSs, Operator. Lots of good stuff. Thanks for tuning in to our audience, and we will see you all next week.

Amith: 46:49
Thanks for tuning in to Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book Ascend: Unlocking the Power of AI for Associations at ascendbook.org. It's packed with insights to power your association's journey with AI. And remember, Sidecar is here with more resources, from webinars to bootcamps, to help you stay ahead in the association world. We'll catch you in the next episode. Until then, keep learning, keep growing and keep disrupting.


Post by Mallory Mejias
January 30, 2025
Mallory Mejias is the Director of Content and Learning at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space. Mallory co-hosts and produces the Sidecar Sync podcast, where she delves into the latest trends in AI and technology, translating them into actionable insights.