Summary:
In this episode of the Sidecar Sync, co-hosts Amith Nagarajan and Mallory Mejias dive deep into the cutting edge of AI-generated audio and video. Mallory demos the stunning new ElevenLabs V3 voice engine and the lifelike hand gestures of HeyGen's Avatar 4, while Amith shares behind-the-scenes insights from a recent hackathon focused on building agentic software. The pair also unpack Apple’s eyebrow-raising research paper, “The Illusion of Thinking,” and debate whether AI models are truly reasoning—or just pattern-matching very well. Whether you're building AI learning hubs or just wondering when Siri will finally get it together, this episode’s for you.
Timestamps:
00:00 - Introduction
05:39 - Hackathon Preview and Agentic Software Vision
08:16 - ElevenLabs V3: Emotion, Expression & Global Reach
09:58 - Live Demo of ElevenLabs V3
14:12 - Real-World Application in Sidecar's Learning Hub
22:37 - HeyGen Avatar 4: Hand Gestures & Hyperrealism
24:55 - Avatar Demo & Ethical Considerations
31:25 - Real-Time AI Instructor Interactions
37:21 - Apple’s “Illusion of Thinking” Paper
45:05 - Siri, AI Utility, and the Road Ahead
🎉 More from Today’s Sponsor:
Member Junction https://memberjunction.com/
📅 Find out more about digitalNow 2025 and register now:
https://digitalnow.sidecar.ai/
🤖 Join the AI Mastermind:
https://sidecar.ai/association-ai-mas...
🔎 Check out Sidecar's AI Learning Hub and get your Association AI Professional (AAiP) certification:
📕 Download ‘Ascend 2nd Edition: Unlocking the Power of AI for Associations’ for FREE
🛠 AI Tools and Resources Mentioned in This Episode:
ElevenLabs V3 ➞ https://www.elevenlabs.io
HeyGen Avatar 4 ➞ https://www.heygen.com
Claude ➞ https://claude.ai
OpenAI Voice Mode ➞ https://openai.com/chatgpt
Synthesia ➞ https://www.synthesia.io
https://www.linkedin.com/company/sidecar-global
https://twitter.com/sidecarglobal
https://www.youtube.com/@SidecarSync
⚙️ Other Resources from Sidecar:
- Sidecar Blog
- Sidecar Community
- digitalNow Conference
- Upcoming Webinars and Events
- Association AI Mastermind Group
More about Your Hosts:
Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.
📣 Follow Amith on LinkedIn:
https://linkedin.com/amithnagarajan
Mallory Mejias is passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.
📣 Follow Mallory on LinkedIn:
https://linkedin.com/mallorymejias
Read the Transcript
🤖 Please note this transcript was generated using (you guessed it) AI, so please excuse any errors 🤖
[00:00:00] Mallory: Because you can see avatars of other people. I think when you see them of yourself though, and you see how realistic they look, that's when you can really gauge just how crazy this technology is.
[00:00:12] Amith: Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights, and developments in the association world, especially those driven by artificial intelligence, you're in the right place.
[00:00:24] We cut through the noise to bring you the most relevant updates with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions. I'm Amith Nagarajan, Chairman of Blue Cypress, and I'm your host. Greetings and welcome to the Sidecar Sync. My name is Amith,
[00:00:44] Mallory: and my name is Mallory Mejias.
[00:00:47] Amith: We are your hosts today. As always, we have some exciting stuff for you guys at the intersection of associations and AI. How are you doing, Mallory?
[00:00:57] Mallory: I'm doing pretty well, Amith. I know it's, it's been a minute since you and I have recorded an episode. Sometimes we have to bulk batch these episodes, a few in one week.
[00:01:05] So I haven't seen you in a bit, but all is well over here in Atlanta. How have things been for you?
[00:01:11] Amith: Things have been good. I've been bouncing around a little bit. Uh, helped out an association out on the west coast with a keynote at their conference. Uh, I think two weeks ago. It all blends together and have been bouncing around doing things and about to head west again to participate in a hackathon for a week, working with a number of our colleagues across our family of businesses, building all sorts of cool new things.
[00:01:33] In AI. So I always enjoy those. Those are fun opportunities to think creatively and, and do some fun engineering work as well. So excited about that. And uh, yeah, there's just been a lot going on. So it's, it's hot down here in New Orleans. I dunno if Atlanta's caught up. Last time we discussed this topic of weather, I had said it was annoyingly hot already here in the Big Easy, but, and you, you had said that the weather was awesome in Atlanta.
[00:01:56] I think that particular day, is it still nice?
[00:01:58] Mallory: It's getting hotter. I'll say it is hot here in Atlanta. I think though, it's funny how quickly people adjust because it's still only right now, uh, let's see, 82 degrees, right? So in New Orleans, 82 is like, ooh, 82. This is kind of nice outside. Now I'm spoiled and I think, oh, 82 degrees is so hot.
[00:02:18] 75 degrees is kind of hot too. So I think it's all relative, but things are getting warmer here in Atlanta. But as I've mentioned, I really like to do outdoor activities, so I'm excited to, to kick out those new bikes we got. We just got some new helmets, so we're gonna be riding around Atlanta safely. Um, I wanted to ask Amith about your keynote, because you mentioned on the pod before you work on your keynotes well in advance, but then you're also kind of fine tuning them down to the last minute because things are changing so quickly.
[00:02:48] So what did you present on? Was it more of a general AI talk or did you focus in on like any recent development?
[00:02:55] Amith: Usually what associations ask me to do is to give them a picture of the landscape of ai. Okay? So certain fundamental elements of that are quite durable. I would say, you know, the overall exponential landscape of change, and what I talk about a lot is how to think about ai, not so much in terms of tool number one or tool number two, but more about like what will happen, how do you anticipate the change and the rate of change, and therefore when you plan about, you know.
[00:03:24] What are you gonna go build or how are you going to utilize this stuff? Um, how to build that thinking into your roadmap and your adoption process. Most people are used to thinking about adopting tech by studying the technology and then thinking about a roadmap for that technology. That's essentially a snapshot in time.
[00:03:40] If you were to evaluate a new CRM software, you might say, okay, well let me pick between Salesforce and HubSpot and Pipedrive, and so on and so forth, and then you choose one and then you implement it. It might take you three months, six months, longer to implement it. Uh, and then your plan to use it would be based upon the assumption that the functionality is really pretty much what you evaluated when you first looked at it.
[00:04:01] Um, and in the world of traditional software, that's okay. I mean, software changes pretty quickly, even pre ai. But in the world of ai, when models are doubling in power every six months, almost like clockwork, it's very difficult to use that mindset. You have to look at it with much more of a predictive approach where you say.
[00:04:18] I'm going to build an implementation plan, what will likely be in six months, 12 months, 24 months time, and therefore build that into the roadmap and essentially build to what you anticipate to have, which is a little bit scary for people actually. So I spent a lot of time talking about trying to acclimate to that new reality.
[00:04:36] Um, that part generally stays the same, but then the examples change. Mm-hmm. And then what I try to do for each audience is to tailor the content in a way where it really hits home, where I can use examples that are in their world, uh, which is a bit of a stretch for me. And actually part of the fun is that, you know, you learn about these different groups and what they do, whether it's in medicine or law or accounting or science somewhere.
[00:04:55] So it's, it's good fun. I really enjoy it and it helps me a lot in leading the Blue Cypress family, thinking about our vision, thinking about where we want to go, because a lot of what I need to do is understand, uh, with as much depth as possible, uh, the challenges and opportunities associations have. So when I'm in the mix as part of the delivery of one of their most important services, which
[00:05:14] conferences tend to be, uh, that's an incredible feedback loop for me to bring back ideas, saying, Hey, well what about these things that, uh, our team may, may, uh, get value from? And, uh, quite a number of us on the leadership team across Blue Cypress and various companies within BC are out there, uh, providing these kinds of keynotes to, uh, association conferences.
[00:05:33] So we have a, a really good network of folks bringing that kind of feedback, uh, back into the mix.
[00:05:39] Mallory: So going into the hackathon next week, which I think when this episode drops, it'll be next week. So, uh, you're at the hackathon right now. If you're listening to the Sidecar Sync podcast, are you focusing on any specific visions that you have or what's the direction If you wanna share, maybe it's a secret.
[00:05:57] Amith: There's, there's no secret really. It's, uh, you know, well, my, my vision at the moment is mountain biking and e-foiling on the lakes up in Utah where I'll be. That's, uh, two of my favorite activities in the summer. And so I'm envisioning doing a lot of that next week. I don't know how much I'll actually get to do, because hackathons are all about, you know, writing code for 16 hours a day and eating far too much pizza.
[00:06:17] Mm-hmm. Uh, but we'll get out there a little bit and do some of that. But, uh, in terms of, uh, AI specifically. I think that we'll be focusing almost exclusively on agentic software, meaning, so we talk a lot about agents here. Uh, we have a number of AI agents already in our portfolio at Blue Cypress, like Betty and Skip, and we're adding a couple more that are in the earlier stages.
[00:06:38] Uh, and we, what we've built is, uh, our open source framework. Member Junction is our AI data platform. It's the freely available, uh, uh, AI data platform we've talked about quite a few times in this pod. Uh, we've added a number of infrastructure features there for building agents where the goal is essentially for a, um, a, a business person, not a technology person, to be able to construct their own agents inside an enterprise-ready AI data platform.
[00:07:04] So not building simple agents in a consumer tool, but being able to use the same conversational style of interface. Talking to an AI to say, Hey, I wanna build an agent that does these things, but you're in a totally secure data ecosystem with your data from all over your organization brought together.
[00:07:20] Uh, so we're working on that and then we're building some additional agents on top of that, uh, foundation layer. So quite exciting stuff.
[00:07:28] Mallory: Absolutely. Dare I say, is that vibe coding, having a, a business person go in and build their own agent?
[00:07:34] Amith: We'll be up in the mountains, so there'll be a very good vibe. Uh, and we'll be working with Claude Code.
[00:07:39] I'm sure we'll be working with some of the other advanced agentic coding tools that are out there. Um, so yes, I guess you could say that although everyone there is a professional developer and, uh, so, you know, we'll be getting into the code, actually doing some coding as well. Mm-hmm. Side by side with our, our good friend Claude.
[00:07:56] Mallory: Good vibes all the way around. All right, well, in today's episode, we've got some exciting topics lined up. First, we're gonna be talking about what's new in ai, video and audio, one of our favorite topics, and then we're going to be having a discussion around an interesting paper that was released called The Illusion of Thinking that I'm excited to talk about.
[00:08:16] So first and foremost, what's new in AI video and audio. I'm gonna spend a little bit of time talking about ElevenLabs, one of our faves, and then HeyGen, another one of our faves as well. So ElevenLabs has launched Eleven V3 alpha, and this new model offers unprecedented expressiveness, emotional depth, and fine-grained control over speech delivery, making it particularly well suited for creative and professional applications like audiobooks, video narration, educational content, and interactive storytelling.
[00:08:47] Eleven V3 introduces advanced audio tags, so think whispers, angry, giggles, laughs, sighs, and this allows users to direct tone, emotion, and pacing directly within the script. This enables more nuanced performances, including non-verbal reactions and mid-sentence mood shifts. The model supports multi-speaker conversations with natural interruptions and emotional transitions, uh, allowing for seamless interaction between characters.
[00:09:16] It also supports over 70 languages up from 33 in previous versions, covering approximately 90% of the world's population. This makes it highly accessible for global content creation and multilingual projects. The model's new architecture allows for better stress, cadence, and expressivity from text input resulting in more human-like and dynamic speech.
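For anyone who wants to see what directing tone "directly within the script" can look like in practice, here is a minimal sketch of a text-to-speech request with inline audio tags. The endpoint shape follows ElevenLabs' public text-to-speech API, but the eleven_v3 model ID, the specific tag names, and the voice ID are illustrative assumptions to check against the current docs, not details confirmed in this episode.

```python
# Minimal sketch: generate a V3-style clip with inline audio tags.
# Assumptions: the "eleven_v3" model_id, the tag names, and VOICE_ID are
# illustrative placeholders -- verify against ElevenLabs' current documentation.
import os
import requests

VOICE_ID = "YOUR_VOICE_ID"  # any voice from your ElevenLabs voice library

script = (
    "[whispers] Here's a little secret about our new course. "
    "[excited] The entire audio track was regenerated in minutes! "
    "[laughs] No studio time required."
)

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": script, "model_id": "eleven_v3"},  # model ID assumed, not verified
    timeout=120,
)
resp.raise_for_status()

with open("narration.mp3", "wb") as f:
    f.write(resp.content)  # endpoint returns raw audio bytes
```

The same script without the bracketed tags would still synthesize; the tags are the layer that hints at delivery, which is also why the "Enhance" style of auto-annotation discussed later is mostly a matter of inserting them for you.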
[00:09:40] I will say achieving the best results with ElevenLabs, or Eleven V3, requires more precise prompting and script annotation compared to earlier models. As much as I can explain about this, I feel like the best way to understand what Eleven V3 is about is to insert a quick clip from their demo. So I'm gonna do that now.
[00:09:59] Demo Video: Hey Jessica, have you tried the new Eleven V3? I just got it. The clarity is amazing. I can actually do whispers now, like this. Ooh, fancy. Check this out. I can do full Shakespeare now. To be or not to be, that is the question. Nice, though I'm more excited about the laugh upgrade. Listen to this. That's so much better than our old ha-ha-ha
[00:10:26] robot chuckle. I know, right? And apparently we can do accents now too. Fancy a cup of tea? Wow. V2 me could never. I'm actually excited to have conversations now instead of just talking at people.
[00:10:43] Mallory: You'll see that demo is pretty impressive. You're hearing the whispering, you're hearing the wit, the personality, the back and forth between the narrators.
[00:10:51] I went in and played around with Eleven V3 myself, and I told Amith this before the recording, what I was able to create was not as good as what was shown in the demo. I don't think that's anything shocking. Right. You always put your best foot forward in the demo. It took me probably, I wanna say, like 15 to 20 tries to get the clip that I'm gonna show you a little bit later in this podcast.
[00:11:14] But overall, I really like the added annotations. Just being able to provide that guidance and direction is incredibly helpful. Eleven V3 is still undergoing refinement, so some users are reporting audio tags might not always be interpreted as intended, which was similar to the experience that I had.
[00:11:32] But overall, really impressive and exciting on the ElevenLabs front. So, Amith, I wanna know what your thoughts are. Uh, do you think ElevenLabs is maintaining its spot as a leader in AI and audio?
[00:11:45] Amith: I think so, I think this, uh, the demo is incredible and, you know, we, we say that each time there's a new kind of step change in quality, but, you know, we, our, our natural modality as humans is audio, right?
[00:11:57] We, we speak to each other, we listen to each other, and it's various forms of audio. And so the expressiveness. Of this new model is really what's unprecedented. The ability to control the model, to hint at the model, to be able to give it these different extra tags is a dream come true. Actually, if you're doing text to speech, it's, it's something we've been talking about and thinking about as users of a variety of text to speech technologies for some time.
[00:12:23] So I find it exciting. I think the, the race is so close that new models from other vendors are, uh, both coming and coming in parallel. You know, OpenAI introduced a new text to speech model just about a month or so ago, so I certainly would not count them out. It's a major, major category that. All the major labs I think are going to play in at some level.
[00:12:43] So I find that exciting. Competition is a great thing. ElevenLabs has been a leader in this category. I love their hyper-focus. The fact that this is, this is the thing they do, it really keeps them, I think, uh, on edge and moving ahead and experimenting in new ways. You pointed out it's an alpha release, which in software land means that it's not even good enough to be a beta, so
[00:13:03] if you use beta software and you've kind of, you know, felt like occasionally you get bonked on the head, because it's a beta software tool, alpha tools are even more raw. So it's definitely, it's kind of in test phase. Yeah, I wouldn't use it for anything production-like, you're definitely not supposed to do that.
[00:13:18] But it's nonetheless at the speed at which AI moves. It'll be beta if they even do that, and then production quality probably within a handful of months, if not sooner. So I'm, I'm super pumped about this. I think this is an opportunity to communicate more effectively and to add really more enjoyment to a lot of communications that would otherwise potentially be very dry.
[00:13:38] Mallory: Mm-hmm. I agree with that. I had the experience. Probably two years ago now of working on a short film with a good friend of mine, and I didn't realize how important audio is. We did a lot of research, you know, looking at low budget, short films and things that you could do to help bring up the quality.
[00:13:56] And essentially it wasn't necessarily the type of camera you used, though that's important. It was the audio. If you had bad audio quality, if people couldn't hear well, then your film's not gonna do well. So I think having AI audio that's so impressive is really essential for associations. I know we rolled out our new AI Learning Hub content using ElevenLabs for audio.
[00:14:19] Everyone, if you haven't checked out our most recent episode I did with Jason Marky about content creation and like the whole workflow around that, please check it out. It was an incredible episode. But I'm curious from your take, Amith, we rolled it out I guess a few weeks ago at this point, maybe a month ago.
[00:14:35] What has the reception been on using AI audio and AI avatars, which we're gonna discuss as well.
[00:14:42] Amith: A couple different things on that. So we, at the beginning of every one of our courses introduce the instructor, and the instructor is an AI avatar. Each avatar is given a persona, essentially a name, a personality, and, and obviously an avatar, which is the, the video and the, and a voice.
[00:14:57] And so the idea there is to build the personality of those instructors over time so you get to know them and different instructors will have different areas of expertise to essentially give them human-like qualities. While at the same time introducing themselves very, very clearly as AI avatars, that's kind of an AI ethics conversation, right?
[00:15:15] That you really should in, in our opinions anyway, you should really disclose the use of ai. I don't see it as a negative at all. I just see it as an imperative to tell people that it is ai, uh, to be on the up and up about that. Uh, but we have received really positive feedback that, uh, the quality of the audio, the quality of the content is really good.
[00:15:32] We can make changes quickly. That's the reason, that was our motivation in doing that. We like talking plenty over here at Sidecar, but uh, we like to be able to change, uh, the content extremely rapidly to keep up with the pace of AI. So ElevenLabs, along with HeyGen, which I know we're gonna be talking about a little bit later, um, have given us the ability, through a number of, of software innovations we've created to stitch it all together,
[00:15:55] uh, a mechanism to publish new content really, really fast. So it's been positive and I've been excited about that because we were a little bit fearful of, uh, not, I wouldn't call it so much that we thought we might have a backlash, but we just thought we might have some people that, uh, felt it wasn't ready for prime time.
[00:16:09] And we've gotten a little bit of critical feedback, but primarily it's been super positive. Uh, and some of the nitpicking, I wouldn't actually, nitpicking might be the wrong word to use, I think it's kind of nitpicking, but, um, it's basically saying things that are like, you know, certain words weren't expressed perfectly and all that.
[00:16:24] Totally get it. And I agree with it. We're gonna get to the point where it's, it's a hundred percent perfect. But I would say simply that, you know, I can speak for myself, I use far more filler words than I would like to, uh, when I listen to recordings of myself, I can't stand it because there's uhs and ums and.
[00:16:40] All sorts of other annoying things. And, uh, AI does not do that unless you tell it to, right? If you tell AI to be human-like, it'll probably start doing all the things that you hear me do in any event. Um, I think that it's, it's really positive. Uh, I think that at the same time, what we're about to be able to do in the next generation, once we get access to a production version of V3, will be really cool because
[00:17:04] We will be able to put all sorts of new forms of expression into the same exact audio script, but by adding these kinds of tags, we'll be able to. Make the, the really key points hit home better. Mm-hmm. But I think we'll be able to make it a lot more fun. Right. You know, if you think back to maybe even high school or middle school, and you think about some of your experiences there from a, from a learning perspective that were most formative, oftentimes it's with a teacher who, uh, found a way to engage the classroom through humor.
[00:17:32] Still teaching a very important lesson, whatever it was. And that's, that's been my experience, certainly in the classroom as a student, uh, years ago. And I think that if we can find ways to scale that, there's incredible power in delivering better and better educational experiences.
[00:17:47] Mallory: Yeah, absolutely. I can think back to my high school teachers that were really funny, even in a class like AP Statistics, right?
[00:17:54] That can't be that fun, but there's ways to incorporate humor. Um, wow. And I will say as someone who tried out Eleven V3, as I mentioned, something that's pretty neat that you can do, because it might seem a bit daunting to have to put your whole script into Eleven V3 and then manually go through and add all your own tags.
[00:18:14] You can actually press a button, I think it's called Enhance, and then ElevenLabs just puts tags where it thinks they belong. And it at least gives you a start. So you can say, okay, well no, I don't wanna laugh here, but I do want excitement here. Maybe I want a British accent. I don't know. But you can do that all within ElevenLabs, which is pretty neat.
[00:18:32] Amith: That is super cool. Uh, I find that very exciting. I think that, uh, you know, the ai, there's multiple layers of AI happening here, right? So there's a, you mentioned this earlier in the overview of this new model's availability, that there's better understanding of the text itself in order to suggest to you where it should insert those tags to add emphasis or add a laugh or, or whisper or shout or whatever it is.
[00:18:56] Those are things where deeper understanding of the content has to be in place. So this is a good example of the compounding of these models. Mm-hmm. And the leverage of using one model to train another model to make another model smarter. So ElevenLabs V3 is in the audio modality, specifically text to speech, but it's sitting on top of all of the innovations that are happening in language models and perhaps other things that we're not aware of.
[00:19:20] Um, the other comment I wanted to make about this is that when we were working on building the technology to enable this new form of content production for all of our, all of our LMS content, um, we did not believe that the con, the technology at the time, ElevenLabs and HeyGen, was good enough to do what we wanted to do.
[00:19:39] We actually thought that it was fairly poor quality because we started talking about this last fall. This is not a concept that was new to us. We're like, yeah, this is amazing. We, at some point we'll be able to create all sorts of content this way. Early this year. We started really late last year, early this year, we started experimenting with the software pipeline to be able to stitch all this together, what you and Jason spoke of in the prior episode, which I would also encourage everyone to check out.
[00:20:02] It's really cool. Um, but in that process, we knew that it would take us probably two, three months to really build the software. And what we were betting on was that either at that point in time when the software was ready or soon thereafter. The underlying models, the audio and video models would be good enough.
[00:20:19] And we were essentially riding that wave looking to see not so much where the audio and video models are today. Mm-hmm. But what we are fairly certain they will be able to do in six months or 12 months time in building for that. So it kind of reinforces what I was saying earlier about what I try to teach audiences about anticipating the AI curve.
[00:20:35] Mm-hmm. And then building for what will likely be in the near future. And there's a little bit of risk to that because you might be a little bit early and you might have to wait. But that's actually a much better place to be, in my opinion, than being late to the party with something that's underwhelming.
[00:20:48] Mallory: Mm-hmm. Building with a vision in mind. And now that you have that infrastructure, you get to ride this wave of just better and better quality. Like when this is production ready, roll out Eleven V3 in the content production line, and then HeyGen avatars, if that's what you wanna do. So I feel like you're, you know, of course that's what we think on the Sidecar Sync podcast, but I think arriving early is better than late.
[00:21:09] Amith: No surprise there.
[00:21:11] Mallory: This is a good segue to HeyGen Avatar 4, which is a next generation avatar engine that transforms a single photo into a lifelike talking video with synchronized voice, facial expressions, and, for the first time, pretty realistic hand gestures, all without the need for a camera or motion capture.
[00:21:31] Avatar 4's diffusion-inspired audio-to-expression engine analyzes vocal tone, rhythm, and emotion to generate photorealistic facial movements, head tilts, pauses, micro-expressions, making the avatar's performance feel pretty authentic and human-like. As I mentioned, Avatar 4 is introducing those natural hand movements that are synchronized with the speech, adding depth and expressiveness to every video.
[00:21:57] It's funny as I'm saying this, if you can see, I mean, I'm moving my hands, uh, might be priming myself there. It supports a wide range of avatars, including hyperrealistic human clones, which is the example I did, stylized characters, anime, and even animal avatars. It works with portrait, half body and full body images.
[00:22:16] The model can also adapt to various image perspectives, front-facing, tilted heads, profiles, angled poses, delivering a faithful render regardless of the original photo's orientation. So, a quick overview of how this works. One, upload your photo, like I said, portrait, half body, full body. You can type in your script directly in HeyGen, or you can upload an audio file from another platform.
[00:22:41] So you can probably see where I'm headed here. I used ElevenLabs V3 and then I combined it with HeyGen Avatar 4. Uh, I'll share that video in just a moment, and then you'll see the video at the end where the avatar speaks, reacts, and gestures in sync with your audio input and produces that video in just a few minutes.
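The two-step pipeline Mallory describes, script to ElevenLabs audio to HeyGen avatar video, can be stitched together programmatically. The sketch below only illustrates the shape of that workflow: the ElevenLabs call mirrors the earlier example, while the avatar endpoint URL and payload fields are hypothetical placeholders rather than HeyGen's documented API, so treat it as a diagram in code, not copy-paste integration.

```python
# Sketch of an audio-to-avatar pipeline: narration in, talking-photo video out.
# The avatar-service endpoint and fields below are HYPOTHETICAL placeholders;
# consult HeyGen's actual API documentation for the real interface.
import os
import requests

def make_narration(text: str, voice_id: str) -> bytes:
    """Generate narration audio with ElevenLabs text-to-speech."""
    r = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": text, "model_id": "eleven_v3"},  # model ID assumed
        timeout=120,
    )
    r.raise_for_status()
    return r.content

def request_avatar_video(photo_id: str, audio_url: str) -> str:
    """Ask an avatar service to animate a single photo with the given audio
    (placeholder endpoint and fields, standing in for a HeyGen-style call)."""
    r = requests.post(
        "https://avatar-api.example.com/v1/videos",  # placeholder URL, not HeyGen's real endpoint
        headers={"X-Api-Key": os.environ["AVATAR_API_KEY"]},
        json={"photo_id": photo_id, "audio_url": audio_url},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["video_id"]  # poll this ID until the render finishes

if __name__ == "__main__":
    audio = make_narration("I'm AI avatar Mallory...", voice_id="YOUR_VOICE_ID")
    with open("narration.mp3", "wb") as f:
        f.write(audio)
    # In a real pipeline you would upload narration.mp3 somewhere reachable,
    # then pass its URL to the avatar service along with your uploaded photo ID.
    print(request_avatar_video("YOUR_PHOTO_ID", "https://example.com/narration.mp3"))
```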
[00:22:59] To be honest, standout capabilities, in my opinion, are the fact that, kind of what we were talking about with Eleven V3, Avatar 4 interprets the script. So it's not just syncing the words, but it's also reacting to the tone, pacing, and emotion, and that gets you a more natural delivery in the end. So I used Eleven
[00:23:18] V3 to create this voice, and then I took it into HeyGen. I used my LinkedIn professional headshot, and I'm going to include the video now of what I created. I'm AI avatar Mallory. What you see on screen was created using one single photo in HeyGen. My audio is brought to you via ElevenLabs V3. Time for me to get some avatar coffee.
[00:23:42] See ya. It's a pretty short clip as you can see. I think it was, uh, maxing me out at 15 seconds for the video, so I had to make the audio script pretty short. Pretty impressive. Amith, I know when I shared an earlier version of this with you, you were impressed as well. Of course there's always the uncanny valley of, of watching an avatar version of yourself, but I almost feel like it's an essential step to do because you can see avatars of other people.
[00:24:09] I think when you see them of yourself though, and you see how realistic they look, that's when you can really gauge just how crazy this technology is. So, Amith, based on this topic, we're seeing huge quality improvements with AI audio and video. We're experimenting a lot at Sidecar with it. Do you currently see a lot of associations dabbling with AI audio and video?
[00:24:34] Amith: I am starting to see more and more. I think the dabbling is on very small elements initially. Um, video is really an amazing area of progress for AI, and, uh, HeyGen, along with Synthesia and a number of other companies, are really pushing the boundaries of what's possible with, uh, specialized models around avatars, which I think are worth calling out
[00:24:56] versus, uh, Veo 3 or other, uh, general purpose video models, which are also super interesting. But these avatar-oriented videos are focused specifically on, on really you as a person or, or an individual and facial expressions and now, uh, body movements, uh, hand gestures, these kinds of things. Uh, and we've seen some major improvements there.
[00:25:18] When you sent me that first version, Mallory, if I didn't know your voice quite well, I would actually, probably have been, I would've been, it would've been possible to fool me. I knew it wasn't real because the voice was obviously not your audio, right? You hadn't yet tried to train in an ElevenLabs version of yourself and all that.
[00:25:33] So, but I think we're, we're almost through the uncanny valley in a sense, right? Because it's so good that it is, uh, if it's of a, if the avatar is, is, uh, a lookalike of a person, uh, the ability for most of us to detect that it's AI is, uh, I think we're well past that actually in, mm-hmm, many respects. Um, this is also another point in time where it's worth maybe, uh, touching on ethics for a second.
[00:25:55] This is also where I think, uh, there's different approaches to this. I think there's value in AI avatars that represent us where you. If you choose to do so with your own likeness to create your own avatar, create your videos from there. Uh, if you have permission to do so from people and they fully understand what it means that you can create videos of any kind, then, then that's great.
[00:26:14] Uh, but I also think there's plenty of opportunity for entirely AI generated characters to be formed and to use those, even if some of those are intentionally a little bit less human, where yeah, maybe they're a little bit cartoonish or something like that, where you use these in order to express the ideas, the content, the entertainment, uh, but.
[00:26:32] Intentionally have a gap between the truest expression of, of human-like form. So there's options there. There's opportunities and there's options and there's things to be thinking about just because we can, uh. Impersonate a video of a true human to such a degree of perfection doesn't necessarily mean that that's what we should do.
[00:26:50] In all cases, I think there's, it's an option. Um, by the way, just as a, a side note, uh, to call out for, for all of our audio only friends that are listening on various, uh, podcast channels, we do have a YouTube channel, and it is worth checking out our YouTube channel from time to time, particularly for episodes like this because you'll get to not only see Mallory and myself speaking, but you'll also see these demos.
[00:27:14] Uh, with a little bit more, uh, resolution than you can get through audio only. Uh, but my, my general point of view is this, is that audio and video are these two modalities that obviously go hand in hand. They train off of each other. Improvements just keep on coming. I think the opportunities are there.
[00:27:29] Coming back to your question, I anticipate seeing an explosion of use cases in the association sector. I think there's some hesitancy around privacy and around ethics, and that's why I, I kind of point to those topics a lot when I speak to audio or video use cases. Because it is appropriately a very sensitive, topical area.
[00:27:49] Uh, and I think the approach is, number one, be totally transparent. And number two, be creative about it. You don't necessarily have to clone your most, uh, well-known speakers at your annual conference. You can do other things. So there's a lot of opportunity here.
[00:28:03] Mallory: I also wanna note, Amith, I know you and myself, we created HeyGen avatars probably in 2023 sometime.
[00:28:11] I think it was around there, mid 23. Uh, and the process to create it wasn't difficult, but you had to submit a video of yourself speaking. It was like two minutes long. You had to take a break after every sentence, close your mouth. So just in less than two years, basically, just to be able to drop a single photo into HeyGen and then create this whole video of yourself or of another character like you mentioned, I think is really neat.
[00:28:37] Amith, I'm curious, because we all know you're an avid AI user like myself. Do you ever use your AI avatar for any business purposes?
[00:28:46] Amith: No, I, I never have used it for business purposes. Actually, it's interesting, uh, when I recorded my AI avatar with HeyGen a while back, I think it was a little bit, uh, over 18 months ago, I did have one of my dogs in the background on a chair and I was, I was quite interested to see whether or not HeyGen would pick up on his movements.
[00:29:05] 'cause he happened to be moving around a little bit. And it did, it not only picked up. Like, so when you watch recordings or generations, I should say, not recordings of, from that avatar, it shows me speaking and it's a pretty low grade version of an avatar compared to what we're doing now. It's very obviously not real, uh, but it's, but it's close.
[00:29:21] It was, it was interesting. It also shows my dog moving around, so, um, it's pretty cool. But, uh, in any event, yeah, I, I have not used it yet. It's not that I'm really not comfortable with it. I just don't know that I've had a reason to.
[00:29:34] Mallory: Yeah, I don't, I was thinking maybe, and maybe this already exists, but I don't know if you had your video off in a Zoom call or a teams call for whatever reason.
[00:29:43] Maybe you have your avatar there speaking while you speak. I don't know if that's quite possible yet, but I don't really want an AI avatar version of myself, to be totally honest. I think I'm just good with, with the real thing for now.
[00:29:54] Amith: Yeah. I'm not there either, but that doesn't mean that others wouldn't be quite compelled by this for different reasons.
[00:29:59] And I, I do think one of the use cases that's gonna be very interesting is, um, real-time interaction with avatars. So for example, going back to the learning scenario, uh, we're not doing this yet with Sidecar's Learning Hub, but it's definitely on the roadmap when the technology is sufficiently far along, which is to say, hey, we have our AI instructors, and there's, right now there's five different AI instructors across our courses.
[00:30:20] And we're, by the way, we have a whole bunch of courses in development that are super exciting. We're gonna be going from seven different courses on AI for associations on the learning hub to, uh, over a dozen as the year progresses and add more and more from there. Um, but as different instructors, you know, have their.
[00:30:35] Areas of expertise, if you will, in terms of what they focus on in the courses. And what if we had a panel discussion amongst those avatars discussing different topics, and they're all AI avatars situated in kind of a panel seating kind of arrangement. And what if you could ask them questions yourself?
[00:30:50] Right? Or what if you could ask an individual instructor questions and get answers? Uh, we rolled out, uh, Betty on our learning hub about two weeks ago, right around the same time we rolled out the AI generated content. And so Betty, if you're not familiar, is an AI knowledge assistant that can pop up on a website as a little chat conversational assistant.
[00:31:11] And so the idea was we wanted to be able to have a tutor in every lesson. So you can just ask any question and Betty understands which lesson you're in, and will give you, uh, a Socratic kind of, uh, dialogue to help guide you. Rather than giving you the answer outright like you might get on a chat interface on a website, you'd get, you know, more of, uh, a nudge in the direction you might want to go, and it asks good questions of the learner.
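To make the "tutor in every lesson" idea a bit more concrete: a lesson-aware, Socratic assistant of this kind is typically just an LLM call wrapped with a system prompt that carries the current lesson's context and instructs the model to nudge rather than answer outright. The sketch below is not how Betty is actually built; it's a hypothetical illustration using the Anthropic SDK, with the model ID and lesson text as placeholders.

```python
# Hypothetical sketch of a lesson-aware Socratic tutor (NOT Betty's real implementation).
# Assumes the Anthropic Python SDK; the model ID and lesson text are placeholders.
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

LESSON_CONTEXT = "Lesson 3: How retrieval-augmented generation grounds AI answers."  # placeholder

SYSTEM_PROMPT = f"""You are a course tutor embedded in this lesson:
{LESSON_CONTEXT}

Guide the learner Socratically: ask one clarifying or leading question at a time,
point them to the relevant part of the lesson, and only give a full answer if the
learner is clearly stuck after a couple of exchanges."""

def tutor_reply(learner_question: str) -> str:
    """Return a nudge-style reply to the learner's question."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model ID; use whatever is current
        max_tokens=400,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": learner_question}],
    )
    return response.content[0].text

print(tutor_reply("Why does grounding answers in source documents reduce hallucinations?"))
```

Swapping the lesson context per page is what would make the same prompt feel like a different tutor in every lesson.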
[00:31:34] Um. And so the idea there is that's, that's great as a starting point, but what if we could actually have like a full, live, interactive conversation At any point in time when you're watching a video, you could just ask a question and the avatar in that video just starts talking to you, right? It just, it sounds very sci-fi.
[00:31:51] And today you cannot do this. Not, uh, I mean, if you have infinite computing resources at your disposal personally, you could probably do it. But for consumers and for associations not quite there yet, um, we will be, I think that technology, it's not a scientific challenge at this point. It's an engineering challenge and it's a scaling challenge to be able to do interactive live, uh, types of things like I'm describing.
[00:32:12] So I would say my prediction is, is by the end of 2026, uh, we'll probably be there where you can have those kinds of experiences. So plan on that now, because think about how powerful that is, if people could have, you know, two-way discussions or group conversations where you have a, a group of, uh, human students who are learning something with the AI instructor and you're having a group conversation, uh, that could be incredibly powerful.
[00:32:34] Mallory: Mm-hmm. I wanna acknowledge too, because I'm sure we have some listeners that perhaps, like me, there's just some part of them deep inside, maybe not so deep, that thinks, you know, there's something about human delivery. I mean, obviously we're on this podcast right now. You and myself, Amith, we could have AI avatars right now doing this podcast, but there's something about
[00:32:56] connecting with people and personal relationships and human banter that can be entertaining and informative for people. And I hear you all on that. But again, go check out that episode with Jason, the previous episode, because he brought up a really great point about using AI, not just for content creation, but content delivery, and actually pointing out that often human instructors, we have the best intentions, but we might go off on a tangent.
[00:33:21] We might not be totally up to speed in the latest research backed instructional design principles, and maybe an AI avatar can be, or with the AI avatar, the video and the voice. So I think there's value in that as well. And. I don't know, maybe sometimes a bit of our ego thinks, well, I could do a better job.
[00:33:39] And maybe that's true, but I do think there's a ton of value in allowing AI to deliver content too.
[00:33:46] Amith: I agree with that, and I think in the context of associations doing what they do, that's a hundred percent correct. And I also think there's opportunities in the fields, the professions, the industries that the associations represent.
[00:33:57] Think about an AI avatar representing all of the world's knowledge on medicine, being able to provide real-time, you know, state-of-the-art medical care to somebody. There's obviously lots of challenges, concerns around that, but there's also an immense opportunity, because no human doctor, no matter how well trained, no matter how well intentioned, can compete with that. It doesn't mean that humans have no role in this process.
[00:34:19] It just means that this is a, an incredibly complementary technology, if you think about it from the viewpoint of what AI can do at scale, that we are not interested in attempting to keep up with because it's not what we do well. Mm-hmm. But I agree with what you said about the pod and other forms of media like this, is that there is an opportunity not only for human expression, um, and for connecting, but there, there are things that just seem to be the domain of people that seem fun to do.
[00:34:43] And that certainly is for me. And you know, it's one of those things that I think a lot of people enjoy connecting with when they're on, um, the listener or viewer side as well.
[00:34:52] Mallory: Mm-hmm. With the AI side, the human side, great for delivery, great for connection. But I think with educational content in particular, and I mentioned this on that episode too, not to spoil everything, but the fact is you, you spend all this time creating a course and perhaps a new model is released or a mo, a new functionality, and you're not gonna go back and recreate that whole course with humans.
[00:35:15] You're just not going to. So having the ability to use AI to bring your members the most recent, cutting-edge education that you can, I mean, who could argue with that, that that's beneficial for them?
[00:35:26] Amith: For sure. I mean, you and I went through that exact experience multiple times, recording the sidecar learning content in prior iterations.
[00:35:33] And I remember last fall, we rerecorded everything along with a couple of our other colleagues, uh, here at Blue Cypress. And we, you know, we had all the content ready to go. I think it was September. Um, and then a bunch of new things happened. We're like, man, you know, well, we're outta date already. Yeah. So.
[00:35:49] Mallory: Yeah, every, like a week later, Amith, it was all these new things and I thought, man, that would've been great to include in the Learning Hub. Uh, I wanna move on to the next topic, which is the illusion of thinking. So Apple dropped a research paper with a provocative title, The Illusion of Thinking. The researchers put current AI reasoning models to the test and found something fascinating.
[00:36:11] These models that seem to think through complex problems are actually just very sophisticated pattern matching machines. When Apple's team made small, irrelevant changes to math problems like changing names or adding extra details that shouldn't matter, the model's performance collapsed dramatically according to Apple.
[00:36:29] What looks like reasoning is really just the models following learned patterns from their training data. The moment you step outside those patterns, the thinking falls apart. Now this critique fits perfectly with Apple's typical playbook. They're rarely first to market with new technology, but they wait.
[00:36:48] They analyze and then aim to create the perfect version that meets their exacting standards. While everyone else races to ship AI features, Apple seems to be taking notes on what's broken and potentially building something better in their labs. The question is, are they making a valid point about fundamental flaws in current AI reasoning, or are they setting up their own grand entrance into the AI race?
[00:37:10] Amith, you shared this paper on LinkedIn. I know you had some thoughts about it. So what's your initial take on this paper?
[00:37:18] Amith: Well, you know, I think there's room for everyone's opinion when it comes to AI, and it's worth having, uh, really good, uh, open debate around these topics. I think Apple has an agenda here because they're way behind in terms of adopting the current AI, and perhaps, as I said on LinkedIn and I, I'll say it here again, perhaps deep in a lab somewhere in Cupertino, California, they are building
[00:37:41] a new generation of architecture that is so incredibly advanced beyond what we are aware of, that they'll shock the world in a year, two years, three years with some new Apple Intelligence that's truly remarkable and, um, you know, not of this genre. And that is possible. In fact, many researchers are in fact working on the exact same thing.
[00:38:02] We've talked about that exact topic on this pod, that, you know, current, uh, language models, vision models, audio models all are probabilistic machines. Uh, another term is, is they're autoregressive next-token predictors, which essentially is to say they're very fancy autocomplete. And that is technically true.
[00:38:19] And there's iterations of this where, with, uh, test-time compute, inference-time scaling, with so-called reasoning models, you're just providing the model more time to think, or more time to do more probabilistic computation, as well as to hit the backspace key and edit the prior answer, which is a big thing.
[00:38:35] Um, but the bottom line is, it's not incorrect to say that these are pattern matching algorithms. We've been saying that for a long time. No one is surprised by this. Great. Uh, Yann LeCun from Meta, who's the head of Meta's AI, has said many, many times publicly, and you know, uh, that, that he doesn't believe that language models, large language models, transformer-based architectures, are the path
[00:38:57] to AGI or ASI. Uh, he believes that world models and other forms of AI that need to be developed are a path to true reasoning. And he's right, he's right about that. Um, but that doesn't mean that the utility from the current generation of the technology isn't incredibly stunning. Everything we've been talking about on this episode is powered by probabilistic
[00:39:18] pattern matching, right? That's what it is. Um, and when we start talking about AI, we say there's these things called emergent properties of models. Now, back in the earlier days of ChatGPT, uh, there was more conversation around this. People would say, well,
[00:39:37] GPT-3.5 illustrated emergent patterns such as X, Y, and Z. We didn't realize it would be great at language translation, for example, from English to French, because we never asked it to do that, yet it was really good at English to French or English to 10 other languages. Uh, other emergent properties that initially were not expected by the community were the expertise that AI had in code.
[00:39:55] Being able to code was not the initial goal for many of the early language models, yet that was one of the first things that came out as a really important use case. So so-called emergent properties are just really our inability to look ahead as humans to understand what probabilistic next-token or next-word predictors might actually result in.
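For anyone who wants to see what "next-token predictor" means mechanically, here is a toy sketch of the autoregressive loop: score every candidate next token given the tokens so far, pick one, append it, and repeat. A "reasoning" model doesn't change this mechanism; it mostly spends more of these steps (more inference-time compute) before committing to a final answer. The tiny bigram table below is a stand-in for a real model, used purely for illustration.

```python
# Toy autoregressive next-token loop. A real LLM replaces next_token_scores with a
# trained neural network over a vocabulary of ~100k tokens; the loop itself is the same.

def next_token_scores(context: list[str]) -> dict[str, float]:
    """Stand-in 'model': score candidate next tokens given the context so far."""
    bigrams = {
        "the": {"association": 0.8, "conference": 0.2},
        "association": {"hosts": 1.0},
        "hosts": {"a": 1.0},
        "a": {"conference": 1.0},
        "conference": {"<eos>": 1.0},
    }
    return bigrams.get(context[-1], {"<eos>": 1.0})

def generate(prompt: list[str], max_steps: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_steps):                  # one "model call" per generated token
        scores = next_token_scores(tokens)
        token = max(scores, key=scores.get)     # greedy pick; real systems often sample
        tokens.append(token)
        if token == "<eos>":                    # stop at the end-of-sequence token
            break
    return tokens

print(" ".join(generate(["the"])))  # -> "the association hosts a conference <eos>"
```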
[00:40:15] Um, so I would say this, they're not wrong. But I don't think they're right either. And why I say they're not right is because the, it misses the point, which is really what I said on LinkedIn. The point of the current AI is that the utility, the economic value is unbelievable. And so to come in and say, well, actually, the technology behind it isn't really doing what you think it's doing.
[00:40:38] Is not incorrect, but it's largely irrelevant. Mm-hmm. That would be my point of view. Now, I do think that there is tons of room for improvement in the model architectures to go beyond this potentially. Uh, but also to scale the current models in ways that will blow people away, and they'll have, uh, unbelievable opportunity to build.
[00:40:57] Uh, in fact, I would say that we're probably 1% of the way into taking advantage of current state AI. So if you take the most recent frontier models like Claude 4 or OpenAI's o3 Pro that was just released, these are amazing pieces of technology. They have incredible potential. Um, I would guess that we're in the single digits, maybe not 1% for everyone, but even for ourselves who, you know, we're leading the pace in, in a lot of, uh, respects in AI adoption in association land, we're probably 10%, you know, of, uh, the potential of the current state of AI.
[00:41:31] So even if the model never got a bit better, the amount of upside potential there, just because we haven't discovered it yet, is enormous. So what I don't like about this paper is the fact that it's causing people who've been sitting on the sidelines to remain on the sidelines. The most important thing you can do with AI right now is to go learn it, experiment with it, and put it to work for you, right?
[00:41:52] If someone else is using AI to write code a hundred times faster than you, or to produce content in other modalities a hundred times faster than you, you might be right that it's using pattern matching. But if the value created is incredible, who cares? What's the difference? So that's my point of view. I was kind of annoyed by it.
[00:42:09] Honestly. It probably came through in what I just said and probably came through in my LinkedIn post, mainly because, again, from a research perspective, they're not wrong. They, they've shown something interesting about the math, but at the same time, um, I do think, especially because it comes from Apple, who is a laggard in this
[00:42:26] phase of the AI race anyway, uh, it could result in people staying on the sidelines, and that's, that's problematic. People need to jump into this and experiment and learn. So that's my main issue with the paper. Um, bottom line for me is they're not wrong, but they're also definitely not right.
[00:42:43] Mallory: Mm-hmm. Yeah, I think that was really well said.
[00:42:46] And if we have any Apple superfans here, just know Amith just bought a MacBook, so he's still, he's still a fan. Uh, my thoughts were similar to yours, Amith. It was a bit like a duh moment. Haven't we been saying, if you, I don't know if you've been listening to our podcast, we, no one's debating this. And I'm with you on the fact that if, well, it's helping us, you know, write and code.
[00:43:09] Does it really matter that it's not truly thinking? I don't know. I, I guess I wasn't under the impression that it was truly reasoning, but you're right. It's cool that they mapped it all out and did the math behind it.
[00:43:18] Amith: There's quite a number of other papers that have come out in the last six months that attempt to say similar things.
[00:43:23] Uh, yet Apple's paper, of course, because it's Apple got a lot of attention, so I don't. I, I wouldn't go so far as to suggest that the researchers who did this work have any agenda other than showing something, uh, that is true and correct and, and sharing their findings. However, uh, from a business perspective, there's definitely an agenda here to support the idea that, hey, you know, there's a reason we're kind of hanging out.
[00:43:46] Um, I would simply say this, please give me a new version of Siri. I'm a big Apple fan. I love my iPhone. I do have a Mac as well. I would like it if Siri wasn't terrible. That would be cool.
[00:44:07] Just give me a better version of Siri please.
[00:44:10] Mallory: We have a, what is it, like a HomePod in our kitchen, and typically when we interact with Siri it's, Hey Siri, can you feed the dogs apples, like some variation of that. And Siri will say, playing Apples on Spotify. And you know, with as advanced a society as we're currently living in, it's hard to believe.
[00:44:28] So, yes, please, better Siri. Thank you.
[00:44:30] Amith: Well, and if you think about it, you know, when you look at the things we just got done talking about in terms of the progression of text to speech, and there's also speech to text and language understanding. And if you use ChatGPT, which is in my opinion still the leader in kind of real-time interactive voice mode, where you go to ChatGPT's mobile app and you have a conversation,
[00:44:48] it's understanding deeply nuanced, complex topics and having, you know, full-on human caliber conversations with you on a wide array of topics. I mean, the technology is there, and there's open source solutions that are pretty close to the caliber of OpenAI's real-time voice that are available out there, uh, some of which can run on a phone.
[00:45:07] So I don't think there's any excuse for this. I think that, uh, Apple has had a number of missteps. It's disappointing, but it's also Apple. They've got a couple dollars in the bank. They'll be okay, at least for now. Uh, they need to invest some of those bucks into
[00:45:22] figuring out this problem in the near term. You know, Apple's got their next big event coming up in September with the release of what will likely be the next generation of the iPhone, if patterns of the past are likely to be predictive of the Apples of the future. Uh, and so what I would expect from them is, uh, maybe an upgrade to Siri that's actually useful, where you can have a longer conversation and
[00:45:40] All of that, you know, and Apple does say, and I think they're right about this point, they do say that they're very privacy forward and that they're focused on ensuring that the use of AI doesn't undermine their values when it comes to privacy. Love that about Apple. It's one of my favorite things about the company, and I actually believe them that largely they are deeply concerned with consumer privacy.
[00:45:58] I don't think that's true for any of the other major, you know, companies at that tier of, of size. Uh, but I really think it is true for Apple. Um, but at the same time, I think you can give consumers choice where you can say, listen, um, we have a better version of Siri. It does require processing more of the conversation on the server.
[00:46:16] We will delete the conversation within five minutes after your conversation ends or whatever. Are you okay with that? And if so, then opt in and you get Siri. That's. Right. Whereas if you don't, then you can stick with Siri from 2012 or whatever it was. Um, but I do think that you can, you know, give people the privacy that they want, but also inform the consumer about the options and, you know, level 'em up a little bit.
[00:46:45] Mallory: So it sounds like your take is that anybody listening doesn't need to look at this paper and say, uh-huh, I knew it. These models are not reasoning, therefore we're not gonna be using them.
[00:46:56] Amith: I'm sure millions of people will say that, and there are people who are incredibly bright. Some of the smartest people I know that are very much on the sidelines of ai, a lot of them are engineers that are incredibly bright technical people, uh, who say, no, no, no.
[00:47:09] I can't use AI to code. I can't use AI for this. It's not perfect yet. And you know, my question to them, which I usually don't ask, is, are you perfect? And the answer is, of course not. Uh, but AI is honestly mo more perfect than we would expect from a human colleague in almost all tasks already. So, uh, yeah, that's my point of view is that I think that unfortunately this paper will have the effect of causing people who are, you know, kind of the, the people who are detractors, who are like, you know, saying, no, no, no, I'm not gonna use AI either for.
[00:47:40] For various reasons, but they're, they're in the camp, they're in. But there's a lot of people that are thinking, Hey, I, I really need to get on top of this AI thing, and they haven't made the leap yet. And then Apple comes out and says, Hey, there's this concern, you know, these current models, eh, they're really not that great.
[00:47:54] It's kind of how it's gonna be interpreted. And because it's Apple, a lot of people are listening. Mm-hmm. So that's my, that's my issue with this, is that it's, I think it is, uh, opportunistic to use their brand in such a way, which makes it look like what they're doing is, is okay. I, I don't agree with that.
[00:48:09] So that's my point of view. I could be totally wrong about their motivations, but that's how I interpreted it.
[00:48:15] Mallory: Yep, I hear that. Well everybody, thank you for tuning into today's episode. If you haven't, check out the prior episode that we've mentioned only eight times on today's pod. Amit, wishing you a good hackathon next week and uh, everyone will see you soon.
[00:48:33] Amith: Thanks for tuning into Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org. It's packed with insights to power your association's journey with AI. And remember, Sidecar is here with more resources, from webinars to bootcamps, to help you stay ahead in the association world.
[00:48:56] We'll catch you in the next episode. Until then, keep learning, keep growing, and keep disrupting.

June 20, 2025