
Timestamps:

00:00 - Introduction
05:27 - Project Strawberry: AI’s Next Big Leap
07:35 - System One vs. System Two Thinking in AI
12:53 - Mixture of Experts and How They Shape AI
19:44 - Hugging Face’s Speech-to-Speech Model Explained
32:08 - Real-World Use Cases for Speech-to-Speech AI
40:52 - AI’s Role in Transforming Grid Infrastructure
50:09 - Future of Small AI Models and Sustainable AI Use

 

Summary:

In this special hurricane edition of Sidecar Sync, Amith and Mallory dive into the intersection of AI and associations. From the upcoming launch of OpenAI’s Project Strawberry, set to redefine how AI handles reasoning, to Hugging Face's cutting-edge speech-to-speech model, the hosts cover all the latest AI advancements. They also explore how AI is transforming grid infrastructure, fueling climate tech investments. Plus, you'll get a behind-the-scenes look at how AI could help associations manage power grids and customer service more efficiently.

 

 

 

 

Let us know what you think about the podcast! Drop your questions or comments in the Sidecar community.

This episode is brought to you by digitalNow 2024, the most forward-thinking conference for top association leaders, bringing Silicon Valley and executive-level content to the association space.

Follow Sidecar on LinkedIn

🛠 AI Tools and Resources Mentioned in This Episode:

Hugging Face Speech-to-Speech ➡ https://huggingface.co
Project Strawberry ➡ https://openai.com
OpenAI ➡ https://openai.com
The Hottest Sectors in Climate Tech? Follow the VC Money ➡ https://shorturl.at/nzmKu
‘Strawberry’ Article ➡ https://shorturl.at/ouPay


 

More about Your Hosts:

Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey. Follow Amith on LinkedIn.

Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space. Follow Mallory on LinkedIn.

 

Read the Transcript

Amith Nagarajan: Welcome back to the Sidecar Sync, your resource for all things association and artificial intelligence.

My name is Amith Nagarajan.

Mallory Mejias: And my name is Mallory Mejias.

Amith Nagarajan: And we are your hosts, and we have another action-packed episode lined up for you today, with three interesting topics at that intersection of AI plus associations. Before we dive in, let's take a quick moment to hear from our sponsor.

Mallory Mejias: Hello, listeners. This is a special hurricane edition of the Sidecar Sync podcast for Amith, who's in New Orleans and is about to experience Hurricane Francine. How's it going, Amith?

Amith Nagarajan: Well, right now it's the calm before the storm. The storm's supposed to come through here in a few hours, I think maybe later this evening at this point. So at the moment we have power, at the moment I've got an internet connection, and so here we are recording the podcast. I've got my kids at home.

Their school's closed today, probably tomorrow as well. And hopefully the whole area will fare reasonably well. I think it's going to be a Category 1 when it hits the coastline and then quickly downgrade to a tropical storm. So hopefully everyone in the area stays safe and there's a minimal amount of damage.

But these things usually do hit us, you know, roughly once a year we get something, maybe not a direct hit, but something pretty close by as, as you know, from living here. And, uh, it's just kind of part of the, the flow of New Orleans life. And I'm sure there's a lot of people in the French Quarter just hanging out, drinking hurricanes.

Mallory Mejias: I do remember one year we had a set of mutual friends that had their wedding fall in the French Quarter over a hurricane, and they still chose to have it. And I didn't go, but apparently it was a great time because the storm was not that bad, of course. Thankfully. And, you know, I'm sure you're right.

There will be people in the French Quarter having a grand old time.

Amith Nagarajan: Well, you know, if history is a guide for structural stability, the French Quarter can be, you know, a good place to think about because it's, a lot of those buildings are 250 plus years old and they're still there. Um, it is a little bit higher in the French Quarter than it is in other parts of the town.

Up here, I'm in Uptown, close to the universities, and our ground is just a touch higher. I think maybe we're a couple feet above sea level instead of, you know, five feet below sea level, but yeah, it's dicey when you're in New Orleans and there's a big storm coming. So hopefully AI will help us figure out how to fix that over time.

Mallory Mejias: And it really should. I will say this has made me happy that I'm in Atlanta. This is the first time having, basically being from Louisiana, having grown up in Louisiana for most of my life, that I am not impacted by this hurricane. So that feels good, but we're absolutely thinking of everybody in the storm's path.

And it got me thinking, Amith: this is episode 47 of the Sidecar Sync podcast, and we have never missed a week of posting, come internet problems, hurricanes, other bad weather, or travel. And so I was just thinking, I'm pretty proud of us.

Amith Nagarajan: Yeah, that's pretty awesome. And, and also you say 47, I'm thinking that's pretty amazing. Cause I don't keep track of the episode numbers that, uh, closely in my head, but I do have a general idea of where we're at. I knew we're approaching a year, so we'll have to think of something fun to do for the 52nd episode.

Mallory Mejias: And the 50th, I feel like the 50th is going to be a

Amith Nagarajan: 50 is good too, yeah.

Mallory Mejias: So maybe we'll do two back-to-back promos. Y'all stay tuned. All right, today we will be talking first about OpenAI's Strawberry, then about Hugging Face's new speech-to-speech model, and then finally we'll wrap up with a talk around AI and grid infrastructure tech.

OpenAI's Project Strawberry is an upcoming AI model that's generating a lot of interest in the tech world due to its reported advanced capabilities. As a reminder, OpenAI is the company behind ChatGPT. And you may recall, as a listener of the Sidecar Sync podcast, that we covered Strawberry in an earlier episode, when the name of the model was Q*.

Now I want to give a little disclaimer: while Project Strawberry seems like it will be quite impressive, it's important to note that a lot of the information we have so far is based on reports and leaks. The full extent of its capabilities will become clearer upon its official release and public testing.

So what is it? Project Strawberry is said to possess significantly improved reasoning and problem-solving abilities compared to current AI models. Some of its reported capabilities include advanced mathematical skills (it is said to be able to solve unseen math problems that current chatbots struggle with), improved programming abilities, the capability to solve complex word puzzles like the New York Times Connections, the ability to perform deep research autonomously on the web, and enhanced reasoning for answering subjective questions, like those related to product marketing strategies.

In terms of development and release, it may be integrated into ChatGPT 5, potentially releasing as early as fall of this year. OpenAI is aiming to launch Strawberry as part of a chatbot, potentially within ChatGPT, and the project is said to be in the final stages, with a possible release within the next two weeks. So we might have a very interesting episode coming up soon.

Strawberry is described as a reasoning model, representing the second of five stages of AI innovation defined by OpenAI, and it appears to have the ability to trigger self-talk reasoning steps multiple times throughout a response. Now, OpenAI is reportedly considering high-priced subscriptions for access to its next-generation AI models, including Strawberry.

Executives are weighing charging users potentially as much as $2,000 over an undetermined period of time for access to their most advanced models. So, Amith, when you sent this to me, we had a quick discussion around the idea of System 1 and System 2 thinking, so I wanted to define that really quickly for our listeners.

You can think of System 1 thinking as fast, intuitive, and automatic. It's your brain's quick response system, requiring little conscious effort, so you can think fight or flight. System 2 thinking is slower, more deliberate, and analytical, and involves conscious reasoning and problem solving.

And you mentioned to me, Amith, that the current models that we see right now use System 1 thinking, or that's the way that you can think about it, whereas something like Strawberry falls into the System 2 category. Can you talk a little bit about this?

Amith Nagarajan: Sure. And for those that aren't particularly familiar with this classification of System 1 and System 2, this is drawing directly on the work of Daniel Kahneman, a famous psychologist who recently passed, who wrote a book called Thinking, Fast and Slow, and we'll include a link to that in the show notes.

I would highly recommend that book to anyone who's interested in digging deeper and understanding how this works in the biological brain. And the similarity with AI, I think, is worth noting in some ways, in that we're simply borrowing those terms to help us understand the way AI works currently.

So think of System 1 as the thinking-fast side, where it's intuitive. You're not actually thinking about thinking, you're just doing stuff, right? It's kind of like a reaction. I think I mentioned this to you in the text thread when we were talking about it: if you see an alligator on the street in New Orleans, which doesn't happen too often (though you definitely see them in the ponds), and you were very close to it, you might react by running away, or you might react by jumping back.

You did not stop and say, "Oh no, there's an alligator, it might eat me. I better leave." You didn't think through that. You just jumped away, right? So there's basically a rapid response. And then System 2 might be, "We need to figure out how to plan the digitalNow conference, and we need to figure out how to market it well so that we get hundreds of people there and it's an awesome event."

That's complex. And we say, well, how do you break that down? How do you break that into smaller pieces? What is the series of steps and thoughts? Or take a problem that has no known solution, where you say, "We want to create a room-temperature superconductor," or something like that. You have a very different approach to how you break down a problem like that.

So System 2 thinking, I think more broadly, is something that requires metacognition, or thinking about thinking, where you have to basically come up with a plan. And current models, in spite of their incredible capability, do not plan. They do not reason; they simply predict the next token.

As we've talked about, essentially think about the current generation of AI that we have available to us right now as very powerful statistical models. These are doing probabilistic next-token or next-word prediction. Now, what's happened is these models have scaled so much and they're so powerful that they actually have these amazing capabilities to write prose and poetry, to write code, to interact with us, to be brainstorming partners.

All the amazing stuff everybody's basically been freaking out about for the last two, three years. But they don't think. They do not reason. And they most definitely do not plan. And so they get stumped on problems as simple as: how many R's are in the word strawberry? Which is why the Strawberry name is thrown out there.

So how many R's are in the word strawberry? Well, if you said, okay, let me look at the word and let me count the R's, that's a pretty easy problem to solve. But that's not what the model is doing. It's just asking what the answer to that question would be based upon the corpus of text that it has, which tells it what number it should predict.

It's not based on actually breaking down the problem, saying, well, to figure out how many R's are in the word strawberry, I have to look at the word, I have to count the number of instances of the letter R, and then I have to tell you that number. That's what a deterministic algorithm would do. That's what we would do also if we were thinking about the problem first.
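For contrast with next-token prediction, here is a minimal sketch of the deterministic approach described above: look at the word, count the matches, report the number. The function name `count_letter` is made up for illustration and is not from the episode.

```python
def count_letter(word: str, letter: str) -> int:
    """Count how many times `letter` appears in `word`, ignoring case."""
    target = letter.lower()
    count = 0
    for ch in word.lower():   # step 1: look at each character in the word
        if ch == target:      # step 2: check whether it is the letter we want
            count += 1
    return count              # step 3: report the number


print(count_letter("strawberry", "r"))  # prints 3
```

The answer here comes from inspecting the input itself, not from how often a similar question appeared in any training text.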

The reason these models, including many of the most recent models, fail at that is because they don't have enough of a corpus of content that has that answer pre-written. It's a weird question, right? How many R's in the word strawberry? Like, how many times has that been asked prior to all of this stuff going on?

So that's why these systems are failing: the training data doesn't have that. In comparison, though, it's a simple computation. So what if a model was able to determine if it should slow down and think, as opposed to immediately react, right? Don't just jump back from the alligator, that type of instantaneous reaction, but take a breath, think about it, and let your bigger brain kick in to do the harder work, right?

The longer range planning, the step by step, breaking things down into complex parts. AI models do not do that right now. They are about to. And Strawberry is, you know, possibly going to be the first model that has this capability. OpenAI's been talking about it for a long time. Possibly through well architected leaks and drama or whatever.

You can have whatever theory you want about why they've been doing what they've been doing. But I think they've got something interesting. And I don't just think, I know, from reading what people are talking about across the community, that this is generally the thing researchers are working on in terms of model design.

So if you can bake actual reasoning, planning, and all these capabilities into the model, it's astounding what that's going to do.

Mallory Mejias: Well, I can say I really like the name Strawberry a lot more than Q*. It's more fun. It's very memorable. Am I understanding you correctly that Strawberry is not a next-word predictor, or is there some element of it that will be, but there are also these other pieces in the background?

Amith Nagarajan: The short answer is we don't know yet, because we don't know what Strawberry is or isn't. But my suspicion is that it's an ensemble or a hybrid, where you have multiple different models within it. We've talked on this podcast before, Mallory, about mixture-of-experts models, where we have different kinds of models that are combined. Mistral had the first broadly available open-source model that was an MoE model, with eight underlying models.

And there's a lot of belief that GPT-4 in its first edition was an MoE model as well. A mixture-of-experts model is basically multiple models that are combined together with an intelligent routing mechanism. So the question or the prompt comes in, and the first thing the model does is decide which subcomponent or subcomponents of its mixture of experts it's going to use.

It's like saying, hey, I got a question in my email inbox, and my email inbox might get questions about our upcoming meeting. But it also might get questions about our knowledge base within our domain. And it might be a complaint, you know. And so each of these types of emails that come in might need to be handled by different people on my team, right?

So I might say, well, this needs domain expertise. I'm member services; that's not my strength. So I'm going to send it to our publications person, who comes from the field, and they can probably answer this. That's the idea. It's a team-based approach. So model architectures have already embraced this idea, but the mixtures of experts thus far have been mixtures of next-token predictors, so they're just tuned differently, with different capabilities as their strengths. And as powerful as that is, this will probably be a similar kind of architecture, where there's some kind of intelligent routing, and then within that model architecture you'll have the ability to run longer-range planning, do step-by-step reasoning, and execute things.
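The email-inbox analogy can be sketched as a toy router. This is only an illustration under loud assumptions: the expert functions, keyword sets, and scoring rule below are invented for the example, and production mixture-of-experts layers use a learned gating network that typically routes individual tokens inside the model rather than whole prompts by keyword.

```python
import math
import re

# Hypothetical "experts": plain functions standing in for specialized sub-models.
def meetings_expert(prompt: str) -> str:
    return f"[meetings] {prompt}"

def knowledge_expert(prompt: str) -> str:
    return f"[knowledge base] {prompt}"

def complaints_expert(prompt: str) -> str:
    return f"[complaints] {prompt}"

# Each expert is paired with made-up keywords the toy gate uses for scoring.
EXPERTS = {
    "meetings": (meetings_expert, {"meeting", "conference", "agenda", "register"}),
    "knowledge": (knowledge_expert, {"standard", "guideline", "research", "article"}),
    "complaints": (complaints_expert, {"complaint", "refund", "unhappy", "cancel"}),
}

def gate_scores(prompt: str) -> dict:
    """Score every expert for this prompt, then normalize with a softmax."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    raw = {name: len(words & keywords) for name, (_, keywords) in EXPERTS.items()}
    total = sum(math.exp(score) for score in raw.values())
    return {name: math.exp(score) / total for name, score in raw.items()}

def route(prompt: str, top_k: int = 1) -> list:
    """Send the prompt only to the top_k highest-scoring experts."""
    scores = gate_scores(prompt)
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [(name, EXPERTS[name][0](prompt)) for name in chosen]

print(route("When is the annual meeting agenda posted?"))
```

The design point is that only the selected experts do any work for a given prompt, which is what lets the team-based approach stay fast.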

If you give the model more time, it's just like how I'd work with the team. I might say, "Mallory, I'd really appreciate it if you could develop a plan for marketing the digitalNow conference. Can you do that over the next week and come back?" And you would put some thought into that.

You come back and say, What resources do we have? What are our goals? What types of things might we want to do? And you'll come back with a plan in a week. If I say I need it like in five seconds. Well, you're just gonna blurt out whatever it is you can immediately think of, right? Um, yeah. And both of those capabilities have purpose and value, but if the model is going to completely self determine, should it take more resources or not, or perhaps it's something where you have to tell the model, take your time thinking about this and do the following, right?

But I suspect that it will probably be able to self determine if the nature of the prompt is such that it requires or would benefit from longer range thinking and planning. Does that kind of make sense?

Mallory Mejias: It does. It does. I think the part I kept getting hung up on was System 2, the thinking slowly, and understanding how a model needs to think slower, because, right, it's technology. It just probably happens as quickly as it can, or as quickly as a chip can support. But with the idea of mixture of experts, you provide a prompt and then all of these experts are, let's say, "consulting" with one another, plus the routing that you mentioned. It makes sense that that is the slowdown, and that is why we're getting the more thoughtful output.

Amith Nagarajan: Exactly. I need to have an editorial break right here after the exactly because I need to let my dog out. He's starting to

Mallory Mejias: Perfect. Okay.

Amith Nagarajan: So I think that was a good discussion on that topic. I think the other thing you could ask or discuss is, um, how does that compare to what multi agent systems do? Because we've talked a lot about multi agent systems, which are very, very similar to this actually.

Mallory Mejias: Yeah, to me, it sounds the same, so that's probably a good question. Okay. Amith, how does this kind of system, if we're talking about mixture of experts architecture here, compare to multi agentic systems, which we've discussed at length on the pod? Anyway.

Amith Nagarajan: They are basically similar. If you think about it, you have units of capability that are constructed together, kind of like building blocks. Think of them as Lego blocks, right? In a multi-agent system, you have prompting, you have memory, you have all these resources, which are kind of like Lego blocks, and you construct them to create an agent that does certain things.

What we're saying here is that the underlying model itself will have the capacity to handle a broader range of tasks. And because the model itself will be capable of doing this, you end up potentially with much more powerful capabilities. Because in an agentic system, what you're doing is taking a fairly low-powered model, in the grand scheme of things, like a GPT-4o or a Claude 3.5 Sonnet. They're fairly low-powered models.

And we're trying to make them smarter. What we're doing is essentially putting a layer of software on top of that to say, "Mallory's asked for a marketing plan for digitalNow." GPT-4o can certainly give you an answer to that question.

But what if we had a multi-agent system that said, "Okay, I'm a marketing planning expert. The first thing I'm going to do when I get a prompt is ask a model to break this down into a set of tasks." So that's the first thing we do: we go to, like, Claude 3.5 or Gemini or the 405B from Llama and say, give us a breakdown of what we should do.

Let's not try to solve the problem. Let's break this down into a set of steps: a chain of thought, a chain of tasks, a tree of thought. These are all the different terms that get chucked around like candy in this field, and basically they all mean the same damn thing, which is essentially that you're going to break it down into a sequence of steps. It's the simplest concept in the world, and complexity has been added for reasons that make sense, because they're all subtly different, but also because it's helpful, I think, for people to feel like it's really cool and more complex than it really is.

Basically, it's just breaking the thing down into small pieces. So for the agent, the first step would be to break it down into a plan. And then what would the agent do? The agent would say, okay, step one, think about our goals. What should our goals be? Maybe that gets broken down into five other steps. And so the agent breaks it down into small enough tasks that it thinks it can run sequentially or in parallel. And so it might say, okay, now I've broken it down into a series of tasks. I want to do paid advertising. I want to do email marketing. I want to create a landing page. I want to do these eight different things. And then the agent says, okay, I need to do those eight different things. And it starts firing off those eight different tasks to eight different prompt strategies, to maybe the same LLM or different LLMs.

And then it pulls those results back in and constructs them into one integrated marketing plan, right? Well, if the model itself had the capacity to do that, it would have a higher level of understanding. Because right now the model is only doing one little narrow task. So the model only knows that little bit that you've given it.

So in theory, if the model itself was doing the entire broader task, it potentially could be far more robust in its capability, because it actually has visibility and understanding. In this kind of agentic system, each step in the process has zero understanding of where that step is relative to all the other steps.

That's just kind of the nature of how these things work at the moment. And the way you construct long-term memory, the way you construct state, and all this other stuff that you do in agent systems is a software layer above the model. But if you bake that into the model itself, then in theory that will result in far more powerful emergent capabilities, and the model will have the ability to use tools itself, like researching on the web.

In concept, this could help us do original science, where a model is expressing hypotheses, um, and the hypotheses aren't just predicted token sequences, but are based on, like, some kind of reasoning steps, um, and then go experiment. There's all sorts of cool stuff that could come from it, but to answer your question, At surface level, it's actually very much the same as what a multi agent system does.

But under the hood, it's very different because the model itself has far more sophistication. And from a performance perspective, it should allow us to scale, um, in an interesting way. So, I'm really pumped about it. I think, you know, OpenAI tends to be the leader in a lot of new research, but there's a lot of other companies working on the exact same problems.

So I expect to see a flurry of this this fall.

Mallory Mejias: And to really simplify this, when we're talking about agents or multi-agent systems, we just mean multiple models in one system. When we're talking about something like Strawberry, we think we're talking about one model with, let's say, experts within it. So one model versus multiple: is that a simplified way to think about it?

Amith Nagarajan: Yeah, you can definitely think of it that way and, and the word model in this particular context could be in quotes in the sense that they might all be the same actual model, it might be Claude working across all of those, it's just a different prompting strategy. So like, you know, the way you would prompt Claude to give you an email campaign would be a different prompt than what you would use for creating a web page or a different prompt for creating, you know, something else.

And so the idea is that Claude may be doing all of those things for you. It's one underlying language model, but you're prompting it in different ways. And an agent system is essentially software that does all of these steps for you and then pulls back those results. So if you want to simulate what an agent does, you could say, okay, I as a user am going to break down this complex task.

I'm going to ask an LLM for a set of steps. Then I'm going to go ask that same LLM for the output for each of those other things. Then I'm going to take the response from each of those separate prompts, I'm going to pull them back up, and now I'm going to go back to the first level, and I'm going to say, here's the results for all these pieces, bring it all together for me, and the LLM will do that.
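The manual loop just described (ask for steps, run each step as its own prompt, then ask for everything to be combined) can be sketched in a few lines. This is a rough illustration only: `fake_llm` is a hypothetical stub standing in for a real model call, and the `PLAN:`, `DO:`, and `COMBINE:` prefixes are an invented convention, not any vendor's API.

```python
from typing import Callable

LLM = Callable[[str], str]

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, so the sketch runs offline."""
    if prompt.startswith("PLAN:"):
        return "paid advertising\nemail marketing\nlanding page"
    if prompt.startswith("DO:"):
        return f"draft for {prompt[len('DO:'):].strip()}"
    if prompt.startswith("COMBINE:"):
        return "Integrated plan:\n" + prompt[len("COMBINE:"):].strip()
    return ""

def run_agent(goal: str, llm: LLM) -> str:
    # 1. Ask the model to break the goal into steps, one per line.
    steps = [s.strip() for s in llm(f"PLAN: {goal}").splitlines() if s.strip()]
    # 2. Run each step as its own separate prompt.
    results = [llm(f"DO: {step}") for step in steps]
    # 3. Hand all the pieces back and ask for one integrated answer.
    return llm("COMBINE:\n" + "\n".join(results))

print(run_agent("Market the conference", fake_llm))
```

Note that each `DO:` prompt carries only its own step and nothing about the overall goal or the sibling steps, so any shared context has to be added by the software layer around the model.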

That's basically what agents are. They're actually very simple pieces of software. They're kind of like the general contractor that's constructing the house, while the subcontractors put up the framing, do the siding, do the roof, do the electricity, and so forth.

So the agent system is just overseeing a complex task, but it's super inefficient in a lot of ways. It's very much lacking in terms of context sharing, so there's a lot to be desired. And by the way, agent systems don't go away if Strawberry is successful. What happens is agent systems can still be built on top of layers like Strawberry. This has happened with software since the beginning of time: you build something, it becomes more powerful, and you build another layer on top of it.

So embedding these capabilities into the model would mean that some use cases for agents don't need to be agents anymore. But it also means that you can do so much more with agents. Agents like Betty and Skip in our ecosystem will become just crazy powerful if we plug a model like Strawberry into them. They already have multi-agentic structures, like I'm describing, but they'll become far, far more powerful if the underlying model is smarter.

Mallory Mejias: And I was just curious on this last piece, Amith. Well, it sounds like you just kind of answered it, but when Strawberry is released, if it's at the price point I read in one article of $2,000 per month, is this something you're going to run and try out and plug into your products? Do you have any other use cases that you personally would want to use something like this for?

Amith Nagarajan: Yeah, I mean, as soon as anything comes out, our group will collectively go and experiment with it pretty much immediately, if it's from a major player or if it's something particularly interesting. So yeah, for sure. For us, the price is not really relevant to whether or not we'll test it, because we'd look at it and say, hey, this is kind of like a fine wine.

We'll use it sparingly, in certain cases. We're not going to pour it out of the faucet like water. So I think that you're still going to have uses for models like GPT-4o mini or Llama 7B. I would say, collectively, I'm super excited about what something like Strawberry is going to bring us.

I'm actually more excited about the scaling of the power in small models. And I've said this a number of different times on this pod over the last 47 episodes. Small models are more exciting because of what they do. Take Llama 3.1 7B, the seven-billion-parameter model.

It's the smallest Llama model. Or take Microsoft Phi, which is P-H-I, in its third generation, and many others. These small open-source models are super fast, super cheap, and they're open source. And most importantly, the capability of Llama 3.1's smallest model is about on par with earlier versions of very high-end models, right?

So the compression of size, while retaining functionality and capability from the larger models of the prior generation, just keeps happening consistently. So it's a really exciting time. And actually, I misspoke: it's Llama 3.1 at 8 billion parameters, and then it's 70 billion. I always forget and flip the two around. But the small model is unbelievably powerful.

It's as good as GPT-3.5 Turbo from last summer. And Llama 70B is about on par with the first version of GPT-4. So why do I say that's more exciting? Well, not that I have to choose, because it's kind of a cool time. We can do both. But those models you can use at light speed, and you can use them all over the place in solutions.

And then the fine wine might be strawberry, right? Where you say, okay, I'm gonna use that only in situations where I know I need that layer of true power. So if it's super expensive, that's fine. You just use it sparingly.

Mallory Mejias: Okay, editorial pause, Amith. Do you have a hard stop? Do you think we should, okay, so you want to do all three? Because we could just

Amith Nagarajan: Yeah, I'd say let's, let's keep going. Let me just double check to make sure I'm not telling you something that's false. Yeah, I don't have anything till one, so I'm good. I don't think we should go till one, but I think

Mallory Mejias: Oh, we could try. We'll stock some

Amith Nagarajan: marathon. That's what we do for our 50th episode

Mallory Mejias: Oh gosh. And then we'll chop it up and post them, and we won't have to meet for the next six months. Okay, next we're talking about Hugging Face speech-to-speech. Hugging Face is a leading platform in the field of AI and machine learning, particularly focused on natural language processing.

You can think of it like a model hub. Hugging Face provides a vast repository of pre-trained AI models for natural language processing (NLP) tasks. These models can be easily accessed, downloaded, and fine-tuned for specific applications. Hugging Face recently released multilingual speech-to-speech, so its cross-platform pipeline for running GPT-4o-like experiences on device can now seamlessly switch languages mid-conversation with only a 100-millisecond delay. They released a short video recently on LinkedIn showing this quick switch in languages between Chinese, English, and French. And I will say there is a slight lag, but honestly it's pretty imperceptible.

And I would say I hear longer lags in just phone trees when you're calling for customer service reasons in English, in the same language. And Hugging Face announced that Speech to Speech has been received really well on GitHub, and it's working to expand the languages included in Speech to Speech.

Amith, we know LLMs, or large language models, can interact with users right now in multiple languages. So what would you say is novel, in your opinion, about speech to speech on Hugging Face? Is it its ability to detect language? The reduced latency? Is it all of those things? What do you think is impressive?

Amith Nagarajan: You can inference or run it on almost any device and it's going to get faster. It's going to go from a hundred milliseconds, probably down to half of that. And then half of that, it's, it's going to keep, you know, progressing at the rate we've been talking about.

Uh, the languages will explode, and it'll basically have all known languages at some point, right? And these will be available from multiple different vendors. I think the other key thing here is that it's speech to speech. It's not speech to text to text to speech, right? Where, um, traditionally what we've been doing is essentially using multiple models to communicate via speech.

So if you use ChatGPT's voice mode, uh, they're working on a native GPT-4o speech capability. That, I mean, it's out there, but most people don't have access to it yet. In the current version people have access to, where you talk to ChatGPT, what it's doing is, first, you're talking to it, um, then it's converting that to text, then it's running the text prompt, then it's getting text back, and then it's converting that back to audio.

Um, and so the idea here is that speech to speech allows you to do translation, but it also allows you to do direct, um, you know, you're not losing the modality of speech. So think about it from an information theory perspective for a moment. You say, okay, where is there higher information density, in text or in audio?

You say, well, clearly audio can be converted to text, and you lose information because you don't have tone, you don't have pauses, you don't have all the things that make audio a richer modality. When you go from text to audio, of course, then you're upscaling, so to speak, and you're trying to gain information, but at the same time you're losing a lot.
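To make the information-loss point concrete, here is a toy sketch of the speech to text to text to speech cascade. Everything in it is a stand-in invented for illustration: the `Audio` class, the `tone` label, and the three stage functions are hypothetical and do not reflect how ChatGPT's voice mode or the Hugging Face pipeline is actually built. It only shows that once speech is flattened to text, two utterances with the same words and different tone become indistinguishable.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    words: str
    tone: str  # hypothetical label for delivery, e.g. "calm" or "worried"


def speech_to_text(audio: Audio) -> str:
    # Transcription keeps the words and drops the tone.
    return audio.words


def text_model(prompt: str) -> str:
    # Stand-in for a text-only language model.
    return f"Here is an answer to: {prompt}"


def text_to_speech(text: str) -> Audio:
    # Synthesis has to pick a tone; nothing upstream tells it which one.
    return Audio(words=text, tone="neutral")


def cascaded_reply(audio: Audio) -> Audio:
    # Speech -> text -> text -> speech, the multi-model approach.
    return text_to_speech(text_model(speech_to_text(audio)))


calm = Audio("can you help me with this", tone="calm")
worried = Audio("can you help me with this", tone="worried")

# Same words, different tone: the cascade cannot tell them apart.
print(cascaded_reply(calm) == cascaded_reply(worried))
```

A native speech to speech model would, in principle, receive the audio itself rather than only the `words` field, which is the difference being described here.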

So if I go to a model and I say, um, you know, "How should I prepare for the hurricane?" just like that, um, you know, it might interpret it one way. But if I said it kind of like, "Hey, how should I prepare for this hurricane?" you know, it's a little bit different. It's kind of like understanding my tone.

Same exact words. The text would not capture that, but the audio input would, and therefore the model potentially can take into account a lot more of what it is that I'm thinking, doing, feeling, et cetera. Um, and the same thing is true also for translation. With literal, word-for-word translations, one of the reasons they've historically been really, uh, unsuccessful compared to human translators is that, you know, they don't really capture the context of what's going on in the conversation.

Whereas these AI-based translators have been really good because they have this broader context window that can understand, like, pretty much the full conversation at this point, and be super accurate at translation because of that. Um, so it's really powerful on a lot of levels. It actually ties to my comments in the last topic pretty well, that small models are where a lot of the action is, because they allow you to do things on device and at low cost, or in some cases effectively no cost.

That to me is exciting. You can build a lot of applications. So for associations you might say, okay, what do we do with this? What if you had in your meetings app a real-time, um, translator? So you have an international crowd coming to your event and people are speaking whatever language. So you build a simple tool that basically pipes in real-time audio from the AV system to the device, and that's translated in real time.

It's got a hundred millisecond lag or whatever. But it's imperceptible, and so I can keep an earbud in and listen to it in whatever language. And it sounds like the person speaking, right? As opposed to, you know, a translation. So it's, I mean, translation in terms of the voice and all that. That, that would be amazing.

And there's a lot of other applications, but that's just one that comes to mind.

Mallory Mejias: Um, I, in my mind, I was thinking that even though this is speech to speech, that it was still going through that text phase. So that's really interesting to think about. So this model, we can assume was trained on lots of audio to develop this capability, like not

Amith Nagarajan: assuming so.

Mallory Mejias: Okay.

Amith Nagarajan: Yeah, I haven't read the paper. They published a paper on it as well, which is another great thing about the open source community: you not only get the software and the weights, but you also typically get a paper that explains how they trained and what they trained on. And sometimes it's interesting just to kind of skim those and see, um, what they talk about, but they'll tell you about their training approach with the content, where they get it.

And these guys are big open source advocates, um, based in France. And they are, uh, publishing tons of research all the time, but they are, as you described earlier, a hub where pretty much everyone posts stuff as soon as it's ready for consumption.

Mallory Mejias: in Hugging Face?

Amith Nagarajan: Yeah, I have, uh, I've never done inference, like, from a developer perspective through their platform, but I've used their HuggingChat, which is a pretty cool, uh, ChatGPT-like experience where you can pick the model you want out of their library of all the models. And I think it's still free. It was free when they first launched it.

That's worth checking out. Actually, I think there was news maybe in the last day or so that the Firefox browser actually includes HuggingChat built right in. So that's

Mallory Mejias: Wow.

Amith Nagarajan: a cool thing to check out. We'll include that in the show notes as well.

Mallory Mejias: Now I know you mentioned, uh, translating keynote sessions as one potential use case. And then my mind immediately goes as well to customer service calls and being able to translate those as well, in real time. Um, and this leads me to kind of a broader thought, and I don't know if you know a ton about this, Amith, but it's just something I think about a lot, especially given how advanced AI is getting: we still have to call these major players like Verizon and Delta, and we're still working through these phone trees, and it seems like you have to say the right keyword, right, for them to move you on to the customer service rep.

Do you know why we aren't seeing major companies use, or at least it doesn't seem like they are using, AI in their customer service calls?

Amith Nagarajan: You know, my suspicion is that for the larger players, there's cultural roadblocks. Uh, there's also complexity of systems integration. For Delta Airlines to implement this would be a lot harder than for someone like Klarna, who we talked about before, and they implemented a customer service bot. Um, but someone like that, who's a more nimble startup, they have plenty of resources, but they're a much smaller company.

Rolling it out, though, also creates an expectation in the mind of the consumer, right? Think about the mainstream consumer and what their expectations are, and those expectations quickly turn into demands by consumers. You will see adoption of these tools both because it's better for the consumer and because it saves a lot of money. So ultimately, those companies 100 percent are going to be deploying this kind of tech.

Um, you know, I don't know about Verizon, actually, but everyone else probably will, so

Mallory Mejias: Don't get me started on Verizon.

Amith Nagarajan: Yeah, exactly. So, you know, the way I describe customer service tech and technology in general, and I think this is a true statement, um, is that the first priority for companies has always been to serve themselves. Meaning, when you think about your, um, interactive voice response, the IVR phone tree that you're talking about, um, those systems were never built to optimize for customer experience.

They were built to hopefully minimize how bad the experience was, but they were built to optimize for cost. Um, and if you think about the original chatbots that have been on the web for 10 plus years, right, and longer in some cases, they're horrendous, and they're not optimized to serve the customer or to improve the experience.

They're optimized to reduce cost primarily, right? I mean, there's other motivations that could exist. It's like, oh, we don't have people on, uh, on the clock 24 hours a day and this thing can help in the meantime. But generally speaking, they're of extremely limited value, and, like, the first thing I always do is I say "agent" and I say "give me an actual person."

Mallory Mejias: But they're better now. Amith, if they make you say it like 25 times before you get to the agent. Oh, so frustrating.

Amith Nagarajan: Proving my point, right? It's not about the experience of the customer. It's about reducing costs, because if they do that I might go away, and then I don't have to call them again or, you know, and they don't really care, right? So, I mean, I don't want to be a cynic about all companies. There's definitely plenty of companies who do value creating, at low cost, a great customer experience.

But generally speaking, over the course of time, the technology has been focused on how do you optimize cost and how do you optimize other priorities internally, which rarely are about the customer experience, with some exceptions, obviously. What I would say now is that you're entering a world where, fortunately, the priorities are aligned, where the customer experience is improved and the company saves a crazy amount of money.

Um, and the Klarna case study is a great one. We included it in the book and we talked about it in this podcast. Klarna is a buy now, pay later company. They power a lot of, uh, e-commerce brands for people who want to purchase a product, but essentially pay for it in installments over time. They're one of several companies in the BNPL space.

And they implemented a, uh, ChatGPT 4-based, uh, chatbot, and they found that the chatbot was able to, in its first month of operation, uh, answer the equivalent number of inquiries as 700 people, um, which is, I think, about a third of the number of people they actually employ in that role. And what was really interesting, so that's the cost, that's the cost saving side of it, right?

And so that's exciting. That's great. Um, the flip side of it is that the customers were happier with the outcome of the AI. Why? Not necessarily because the bot did a better job responding, although over time it most certainly will do a better job of responding. But it was about the amount of time it took.

Like, when you call Delta, would you rather have your problem solved in 90 seconds or 10 minutes, right? You want to be off the phone or off of the chatbot as quickly as possible with your resolution, you know, you want the information you went there for, or you have your transaction handled, or whatever it is.

Um, the chatbot from Klarna was able to resolve cases in an average of, I think, about 2 minutes, if I'm remembering correctly, whereas the human agents took an average of 11 minutes. So that is a massive improvement in customer experience. And to me, that's a big opportunity coming back to this topic of speech to speech because it opens up a new modality.
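As a quick worked check of those resolution times, the short sketch below computes the saving per case. The two inputs are the figures as recalled in the conversation (about 2 minutes for the chatbot, about 11 for human agents), so treat the result as approximate rather than as a reported statistic.

```python
# Figures as recalled in the conversation; treat them as approximate.
bot_minutes = 2
human_minutes = 11

minutes_saved = human_minutes - bot_minutes
percent_less_time = round(100 * minutes_saved / human_minutes)

print(f"{minutes_saved} minutes saved per case, about {percent_less_time}% less time")
```

On those numbers, the chatbot resolves a case in roughly a fifth of the time a human agent takes.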

And again, it's a richer form of information: audio coming in that isn't downscaled to text and then upscaled later back to audio doesn't lose the information, right? Um, and so it's more likely that a speech to speech model, or a system that has one as a component, right, will be able to serve customers in a brilliant way, and to do it so that, you know, you call Delta Airlines down the road, hopefully in the not too distant future, and a quote unquote person answers, which is an AI, and you just have a conversation, and you're like, "Yeah, I'd like to change my flight," or "I'm having a problem. I didn't get miles on my last flight," or whatever it is.

They're like, "Oh, let me look it up." They authenticate you quickly, and they're like, "Oh, Mallory, sure. Here you go. We're going to solve that problem for you," and you're done. Um, it's like having your own dedicated Delta expert. And then, also, right, when you go through those phone trees, you know very well that you either struck gold, that you got, like, one of their good customer service reps, or every once in a while you get someone who, like, doesn't know how to spell Delta, right?

Like, they just have no idea how to serve you. You're like, damn, I'm going to have to hang up and basically get back on the call to, you know, roll the dice again. So if everyone could get the best customer service rep instantly, wouldn't that be cool? And this is a technology that will enable exactly that outcome.

For associations, you're not Fortune 500 companies, but people expect from you the same that they expect from the largest brands in the world, which may not be fair, but it's reality. And so understanding this technology is important because it can not only help you play defense, where you're thinking about how you provide a comparable experience to what people are looking for, but it also allows you to create a broader surface area to serve your customers and members in new ways, right?

Translating your content, like we talked about, uh, translating content in the context of a conference, but also think about, like, you know, an LMS, for example, being able to dynamically translate and have conversations with an AI agent, voice to voice, about the content, so that the LMS really becomes like a synchronous platform where the best tutor in the world on your domain of expertise is available side by side with the prerecorded content. That's coming, and probably all the mainstream LMS providers will have that in the next couple of years, in my estimation.

Mallory Mejias: Lots of use cases there. From your experience, would you say customer service, or we could call it member service, is a big pain point for a lot of associations that you've spoken with, Amith?

Amith Nagarajan: Yeah, for sure. I mean, it's an area of significant time investment. So if you think about where associations invest their staff's time, a big chunk of it is interacting with members and answering questions. And what I get excited about with this type of technology isn't that you're eliminating that, but that you're freeing up your people to actually have meaningful conversations with people.

So rather than answering an endless stream of emails that are asking you basically the same 20 or 30 questions, you can have an AI just take care of that. Give people a better experience because it's immediate. Um, and then have your people focus on, first of all, figuring out, like, better things they can do to serve members.

But also when you do have those interactions, you're not hurried. You can actually take the time to go deeper with the member and make sure that you're really connecting them with the best possible outcome. It's kind of like, you know, all of our experiences when you go to the doctor's office. You know, I always feel rushed.

It's like, oh, you're going to spend 10 minutes with the doctor, if you're lucky. They don't remember anything about you. Maybe they remember your name. Maybe not. They pretend like they do, and you have 10 minutes to tell them all about whatever it is that ails you. Um, wouldn't it be great if it didn't feel that way and you just had all the time in the world, at your pace, to address whatever the issue is?

And that's a customer service problem in the medical world. Same thing there. So I just get pumped about this because I think it can democratize access to great services, and associations can be a big part of that.

Mallory Mejias: All right. Topic three, AI and grid infrastructure tech. Recent reports highlight an interesting trend in climate tech investments, particularly related to AI. While overall investment in climate tech startups has declined since peaking in 2021, one sector is seeing significant growth, and that is grid infrastructure technology.

Grid infrastructure startups raised $2.73 billion globally in the first half of 2024, with U.S. deals accounting for almost half of that. The investment is driven largely by the increasing power demands of AI and electric vehicles. According to John McDonough, a senior analyst at PitchBook, VC investment in electric grid infrastructure is on track to surpass last year's $4.37 billion.

A significant portion of this funding is going toward battery energy storage, new battery technology, AI and software for grid management, and hardware for grid management. The surge in electricity demand due to AI, particularly for data centers, is actually driving interest in technologies to strengthen and manage the power grid more efficiently, which underscores how AI is a driver of innovation in other sectors, particularly in managing and improving our power infrastructure to meet growing energy needs.

So Amith, when you sent this to me, you mentioned that many associations, rightfully so, and many association leaders are concerned about the environmental impact of AI, which I don't feel like we, we definitely haven't spent like a whole dedicated topic on, I think, on this podcast. So can you speak a little bit to those concerns?

Yeah.

Amith Nagarajan: gonna be the next bigger version, the next super computer, the next cluster, the next, you know, the next thing, right?

So these things are all very, very energy intensive. Um, and that is a valid concern, both in terms of the financial costs, but then, you know, energy right now is a scarce resource, effectively. It's expensive, and it's scarce in the context, independent of the financial side, that what's produced, generally speaking, is not clean. You know, most energy is still not clean. So, um, there's valid concerns there.

At the same time, I think that, you know, some of the points you've made about the investment being made in this sector, uh, coupled with, uh, the growth in AI itself actually being a massive assistant in solving for these problems, I'm very optimistic that, um, all of this AI investment, in terms of the energy side and the carbon footprint of it, is going to pay massive dividends, um, you know, probably in the next decade.

And so you think about what you said, uh, storage. So one of the problems with our energy infrastructure is that when power is available isn't necessarily when it's going to be consumed. So think about solar. Solar is generated during the day, and assuming that there are no clouds in the sky and so forth. That isn't necessarily when it needs to be consumed.

Sometimes it is, sometimes it isn't. Um, so storage is a helpful thing, because if I can say I'm going to have a solar farm, I'm going to store the energy, and then I can redistribute it when I need to, that's very interesting. Uh, another problem is transmission. So we lose a large percentage of energy in transmission because the way we transmit, um, energy is the way we've done it for a long time, and there's loss. There's loss in transmission for every additional mile, there's a percentage of lossiness, and so we can't generate power, uh, too far away from the consumption of that power. Even though we have a national grid, it's really a bunch of localized grids that are connected, uh, because the farther you try to transmit the power, the more loss there is. Uh, you know, the loss basically generates a lot of heat, uh, so that has its own issues, but really, more than anything, you're just losing a large percentage of it. It's like, you know, you're driving an oil tanker down the road and there's a hole, and so you started off with a million gallons of gas or whatever it is, and by the time it gets to where it's going, it loses a certain amount, right?
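Taking the "a percentage lost for every additional mile" framing at face value, a rough sketch of how that compounds with distance might look like the following. The per-mile rate is a made-up number chosen only to show the shape of the curve, not a measured figure for any real transmission line, and real losses depend on voltage, current, and conductor material, which this toy model ignores.

```python
def delivered_fraction(miles: float, loss_per_mile: float) -> float:
    """Fraction of generated power that arrives if a fixed share is lost each mile."""
    return (1.0 - loss_per_mile) ** miles


ASSUMED_LOSS_PER_MILE = 0.0002  # made-up illustrative rate, not a measured figure

for miles in (10, 100, 1000):
    share = delivered_fraction(miles, ASSUMED_LOSS_PER_MILE)
    print(f"{miles:>5} miles: {share:.1%} delivered")
```

Under this simplification, the delivered share only ever shrinks as the distance grows, which is the intuition behind generating power close to where it is consumed.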

That's kind of the analogy that I'd paint a picture of in people's minds. You see transmission wires overhead. You assume that, well, how do you lose energy? Well, it's just the nature of it: it's not 100 percent efficient in terms of the way that material is transmitting or conducting the electricity. Which is, by the way, one of the reasons, uh, there was so much excitement about this idea of a room temperature superconductor, which didn't actually turn out to be true, but last year we talked about it a little bit on one of our early pod episodes.

Um, the theory behind a room temperature superconductor: superconductivity essentially means there's zero loss in transmission of electrical current. Uh, and this has been produced in the lab at, like, basically close to absolute zero temperature. So if you could do superconductivity with materials that are affordable at scale, right, at room temperature, or, like, in the real world, then you can transmit power with no loss.

And that solves one of the major problems. Also, it means that with the chips that we have, if we have superconductivity at that level, you're gonna lose a lot less energy. Chips can be smaller, faster, more energy efficient. So that's interesting. Why do I bring that up? It's because it's not part of the particular investments you're talking about.

That's a materials science problem more than anything else. And so materials science is an area that's exploding in large part due to AI. So I'm excited about that. And then the last thing I'd say is there's a lot of optimizations in terms of the grid management you mentioned, the software and hardware, um, being able to be smarter about how we manage the power that we do have and how we're transmitting it.

Um, so that's good in terms of better quality of service, fewer brownouts and blackouts, but it's also a good thing from an efficiency perspective, which both saves money and reduces the carbon footprint. So I'm pumped about this article and the statistics about the investment, because where capital flows, innovation goes.

Mallory Mejias: What a, what a good segue. I'm thinking for those of our listeners who are not, I fully understand, right, that there's a huge energy consumption piece to AI, but it's always helpful for me then to go out and like find some stat or data point. So I felt like this would be helpful to share with listeners as well, who might be trying to bring this back down to the ground.

By 2025, it's estimated that data centers will consume between 3 and 13 percent of global electricity, which is a large range, but think about the 13 percent. So for me, that definitely helps put this into terms that I can understand when thinking about greater environmental impact.

I think if you ask the average person, right, what can you do, um, to help the environment? They might say things like recycle more or eat less meat. Or maybe drive electric vehicles, even though, Amith, I think you told me that they're actually not as good for the environment as you would think. But I want to know, in terms of this conversation around energy consumption and AI, is there anything an association can do or keep in mind as they're implementing AI, or even an individual, or do we just kind of have to let this play out?

Amith Nagarajan: Well, I mean, I'll come back to the analogy of the fine wine versus tap water thing that I talked about earlier. That might have a financial cost, and so you allocate the fine wine more closely and more carefully, um, because you don't want to spend money or you don't want to wait as long for the response. Uh, and so you build systems that use that resource a lot more sparingly. The same thing applies to our energy consumption.

If you use the smaller models, they're much more energy efficient, so you will both save money and have a smaller carbon footprint. So, uh, it's easy to just say, oh, yeah, we're gonna go with, uh, you know, ChatGPT or Claude. We're just gonna use the most intense model because it's the most powerful.

Um, you know, using another metaphor, it's like flying an Airbus A380 double decker that can seat 600 people and putting one person on it. You don't need the Airbus A380 for all missions. You might have a Cessna that works just fine. And so it's the same thing. Small models have their purpose. Um, and I get excited about small models, especially if you inference them locally on your own computer, you know, running them on your phone, right?

Apple had their event this week where they talked about Apple Intelligence and the silicon that they have, which is capable of running their own models on device, which are quite capable models. Um, that excites me because that efficiency is going to result in a lot of the workloads, uh, being done by super efficient chips.

Um, I'm also excited about hardware innovations like, we talked about Groq, with a Q, G-R-O-Q, groqcloud.com. We're big fans of those guys because they have a radically more efficient hardware architecture for AI inference. Uh, these massive data centers you're talking about, they do two things. They train models, and then they serve models, or what people call inference, which is basically when you run the model.

And so, you know, the ratio of training versus inference, uh, is constantly changing. But as demand grows, it's gonna be a massive amount of inference. And so if you're able to be more efficient the way Groq is, it's both way faster and less energy, uh, intensive. So I think associations should be, especially as they're looking to scale their use of AI, right?

Like, when you're in the experiment phase, it's not a big deal. You're not even, you're not even a rounding error. But if you're going to scale your use of AI in a significant way, um, you should definitely be thinking about these issues. It's part of the responsible AI framework, where you're thinking about the ethics of data privacy, you're thinking about models and how they affect people, you're thinking about biases, and you should also be thinking about carbon footprint, energy consumption.

The good news is, is that there is an incentive to save money, which everyone's tuned into around the world, and that same incentive will drive people to use the more efficient models. So that's exciting as well.

Mallory Mejias: So your advice then is not to stop using these powerful large language models, but simply to identify opportunities where you can use small models, if possible.

Amith Nagarajan: Yeah, last time I checked, horses have a smaller carbon footprint than vehicles that we drive, but I'm not suggesting you go back in time and ride a horse around. I mean, you could, but, um, I don't even know if that's true. I think it is, but I'm assuming that the horse's output is lesser than a typical car.

But my point would be that the technology is going to go forward. And, you know, if you want to be relevant in your field, you have to use AI to be relevant and to be effective in your field. But you can be smart about it. You don't need to, like, just say, oh, we've got plenty of money, so let's just, like, turn on the car engine, and yeah, I'm not driving it right now. I don't keep my car running in my driveway just because I want the air conditioning on all summer long in New Orleans because I never want to experience not having AC in my car.

It's like, that's ridiculous. Like, I don't know if anyone, hopefully no one does that, but, like, you know, even if I had an unlimited amount of money and I didn't care about wasting the money, that would be just crazy, like, to run your car all the time. The same thing is true with wasted resources in AI.

If I use GPT-4o for every single inference request I had, it's unnecessary, it's wasteful, and it has a bigger environmental impact. So there are a lot of choices, and one of the choices you can make is using a mixture of different models to serve your association's needs. And it's not the simplest topic in the world to be thinking through, so I don't recommend it as a concern for people who are in that early, early phase, which is almost everyone right now still. Um, most people are in the very, very beginning of the journey.

You should flag this as an area of importance. You should note it as something you should be thinking about actively as you grow your use of AI. But don't let it stop you from getting started in the journey.
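The idea of using a mixture of models, with the small one as the default and the large one reserved for hard requests, can be sketched roughly as below. The model labels, the relative cost numbers, and the `needs_deep_reasoning` flag are all hypothetical placeholders invented for illustration; they are not real product names, prices, or energy measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    name: str             # hypothetical label, not a real product name
    relative_cost: float  # assumed cost or energy per request, small model = 1


SMALL = ModelChoice("small-model", relative_cost=1.0)
LARGE = ModelChoice("large-model", relative_cost=20.0)


def pick_model(needs_deep_reasoning: bool) -> ModelChoice:
    # Default to the small model; reserve the large one for hard requests.
    return LARGE if needs_deep_reasoning else SMALL


def total_cost(requests: list[bool]) -> float:
    # Each entry says whether that request needs deep reasoning.
    return sum(pick_model(flag).relative_cost for flag in requests)


# 100 requests, of which 5 are assumed to need the large model.
requests = [True] * 5 + [False] * 95
routed = total_cost(requests)
always_large = LARGE.relative_cost * len(requests)
print(f"routed: {routed:.0f}, always large: {always_large:.0f}")
```

With these assumed numbers, routing costs about a tenth of sending everything to the large model. In practice the hard part is deciding which requests really need the bigger model, which this sketch leaves as a simple flag.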

Mallory Mejias: Anything else you want to say there? Are we good to wrap up? Cool. All right, everyone. Thank you for tuning in to today's episode. For any of our listeners that are in the storm's path, stay safe and we will see you all next week.

Post by Emilia DiFabrizio
September 12, 2024