Sidecar Blog

From Hackathon to Live Product in 60 Days: A Playbook for Association AI Experiments

Written by Mallory Mejias | Mar 24, 2026 10:30:00 AM

In late January, a group of 14 people gathered at a beach house in Florida for six days. They worked 16- to 18-hour days, mostly self-directed, with one shared focus: experiment with audio AI. No product brief. No deliverables. No requirement to build anything that would last beyond the week.

About 60 days later, that experiment became Grace — a voice AI agent live on Sidecar's website, handling real conversations with visitors, displaying visuals in real time, and processing purchases through Stripe. She launched just days ago as a research preview. Whether she ultimately succeeds as a product is still an open question. But the process of getting her live this quickly taught us a lot, and we think parts of that process are worth sharing.

We should be upfront: Sidecar had some real advantages here. We're part of the Blue Cypress family of companies, which has software development teams, leadership with deep AI experience, and a culture that was already oriented toward rapid experimentation. Not every association has those things, and we don't want to pretend this was easy or universally replicable. But we do think the underlying approach — the way the team was structured, how decisions were made, what was prioritized — has pieces that any organization can adapt.

It Started with a Sandbox, Not a Product Brief

Blue Cypress runs hackathons at least a couple of times a year. Not just for software development — they've done them for marketing, sales, finance, and other functions. The format is simple: take a group of people, isolate them from other priorities, and give them focused time to work on a single area.

The January hackathon was dedicated to audio AI. The opening session framed why audio AI was approaching a tipping point — the models were getting powerful enough to be useful, fast enough for real-time conversation, and nuanced enough to adapt to different kinds of people. But there was no directive to build a specific product. The first instruction was just: go experiment.

Some people worked with ElevenLabs. Others tested local AI models for text-to-speech and speech-to-text. People explored creative applications without any expectation of building something durable. That lack of pressure is what made space for the idea that eventually became Grace.

One early-career developer had his flight canceled due to weather on the way to Florida. He arrived late, running on caffeine and overnight travel. On the plane, he'd already built a portion of what would eventually become Grace. His demo was a simulated computer store where you could browse, search, and buy a laptop entirely through voice conversation in a browser. He'd used a feature from ElevenLabs called client-side tools, which allows the AI to trigger visual actions in the browser during a conversation. Most of the team, including leadership, had no idea that capability existed.
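The pattern behind that demo can be sketched as a registry of named functions the voice agent is allowed to call while it talks, each one translating a tool call into a visual change on the page. This is an illustrative sketch of the concept only; the tool names, parameter shapes, and dispatch function here are invented, not the actual ElevenLabs SDK surface.

```typescript
// Sketch of the client-side tools pattern: the voice platform streams tool
// calls (a name plus parameters) to the browser, and a registry of plain
// functions turns them into UI changes alongside the spoken response.
// Tool names and shapes are hypothetical, not the real ElevenLabs API.
type ToolParams = Record<string, string>;
type ToolHandler = (params: ToolParams) => string;

// UI state the tools mutate; in a real page these would touch the DOM.
const uiState = { view: "home", product: "" };

const clientTools: Record<string, ToolHandler> = {
  // The agent says "let me show you that laptop" and fires this in parallel.
  show_product: (p) => {
    uiState.view = "product";
    uiState.product = p.sku ?? "";
    return `showing ${uiState.product}`;
  },
  // The agent navigates the visitor back to browsing.
  show_catalog: () => {
    uiState.view = "catalog";
    return "showing catalog";
  },
};

// Dispatcher that a voice SDK's tool-call callback would feed into.
function dispatchToolCall(name: string, params: ToolParams): string {
  const handler = clientTools[name];
  if (!handler) return `unknown tool: ${name}`;
  return handler(params);
}
```

In the real integration, a registry like this is handed to the SDK when the conversation session starts, which is what lets the model pair spoken answers with visual actions in the browser.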

That single demo opened up a wave of thinking for the rest of the week. The idea that an AI could talk to you and show you things at the same time, adapting both the conversation and the visuals to your specific questions, changed what everyone thought was possible. Grace grew out of that moment.

A Small Team with Protected Time

Two people worked on Grace full-time over the two months following the hackathon. Six to eight others contributed fractionally — design, content, strategy, testing. Leadership's involvement was limited to setting the initial vision and clearing the path for the team to move fast. The day-to-day decisions were made by the people closest to the work.

A functional, live AI product — voice conversation, visual display, payment processing — built by essentially two dedicated people in two months, with some part-time support around them. Again, those two people were experienced developers within a company that already had AI infrastructure in place. An association without that baseline would face a steeper climb. But the point is that this didn't require a 20-person team or a year-long timeline. The scope was deliberately constrained, and the team was deliberately small.

The tools themselves are commercially available: ElevenLabs for voice, Claude for reasoning, Stripe for payments. None of them requires an enterprise contract or specialized technical knowledge to get started. The barrier for most organizations is less about access to the technology and more about carving out the time and permission to use it.

Decisions That Shaped the Build

Every project involves tradeoffs, and the team behind Grace made theirs with speed in mind.

They chose ElevenLabs over other audio AI platforms because the hackathon gave them hands-on experience with several options. ElevenLabs had the right combination of real-time capability, voice quality, and the client-side tools feature that made visual interaction possible.

They chose Claude Haiku 4.5 as the reasoning model — the smallest and fastest in Anthropic's lineup — because real-time conversation demands speed over raw intelligence. A more powerful model would have produced more sophisticated responses but introduced latency that breaks the feel of a natural conversation.

They built a curated, lightweight knowledge base rather than connecting Grace to a deep retrieval system. Sidecar has Betty, a knowledge tool with access to essentially everything the organization has ever published. But Betty's depth comes with processing time that doesn't work for real-time voice. So Grace got a focused subset of knowledge — enough to hold substantive conversations, without the latency.
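The shape of that tradeoff can be sketched as a deliberately small, in-memory knowledge base: a handful of curated entries matched by simple keyword overlap on every conversational turn, with no retrieval service in the loop to add latency. The topics, keywords, and answers below are invented for illustration.

```typescript
// A curated, lightweight knowledge base: small enough to search synchronously
// on every turn of a real-time voice conversation. Entries are invented.
interface KbEntry {
  keywords: string[];
  answer: string;
}

const knowledgeBase: KbEntry[] = [
  { keywords: ["pricing", "cost", "price"], answer: "Summary of pricing tiers." },
  { keywords: ["course", "learning", "certificate"], answer: "Summary of the course catalog." },
  { keywords: ["membership", "join"], answer: "Summary of membership options." },
];

// Score each entry by how many of its keywords appear in the utterance,
// and return the best match (or null if nothing overlaps).
function lookup(utterance: string): string | null {
  const words = new Set(utterance.toLowerCase().split(/\W+/));
  let best: KbEntry | null = null;
  let bestScore = 0;
  for (const entry of knowledgeBase) {
    const score = entry.keywords.filter((k) => words.has(k)).length;
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best ? best.answer : null;
}
```

A deep retrieval system would answer more questions and answer them better, but each lookup would cost time the conversation can't afford; constraining the corpus is what keeps the response inside a natural conversational pause.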

They pre-built about 30 visual slides rather than attempting dynamic image generation. Each slide was tagged with a title and description so Grace could select the right one based on the conversation flow. Faster to build, easier to control, and good enough to ship.
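One lightweight way to implement that selection, assuming the model is simply asked to name a slide: render the tagged slides into a manifest the model can see, then resolve whatever the model answers back to a known slide, with a safe fallback so a mismatched answer never breaks the visual flow. The slide ids, copy, and default here are invented for illustration.

```typescript
// Pre-built slides tagged with a title and description so the model can pick
// one by name mid-conversation. Slide ids and copy are invented.
interface Slide {
  id: string;
  title: string;
  description: string;
}

const slides: Slide[] = [
  { id: "welcome", title: "Welcome", description: "Default opening visual." },
  { id: "pricing", title: "Pricing", description: "Tiers and what each includes." },
  { id: "courses", title: "Courses", description: "Overview of the course catalog." },
];

// Render the manifest into a prompt fragment so the model knows its options.
function slideMenu(): string {
  return slides.map((s) => `${s.id}: ${s.title} - ${s.description}`).join("\n");
}

// Resolve the model's (possibly messy) answer to a real slide, falling back
// to the default so a hallucinated slide name still shows something sensible.
function resolveSlide(modelAnswer: string): Slide {
  const wanted = modelAnswer.trim().toLowerCase();
  return slides.find((s) => wanted.includes(s.id)) ?? slides[0];
}
```

Pre-building and tagging the deck keeps the hard part (choosing) with the model and the risky part (rendering) deterministic, which is what makes this approach faster to build and easier to control than generating visuals on the fly.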

Every one of these decisions followed the same logic: what gets us to a functional product fastest, with quality that's good enough to learn from? We weren't trying to build the final version of Grace. We were trying to build the first version — one that was useful enough to put in front of real people and honest enough about its limitations to label as a research preview.

The Culture That Makes Speed Possible

Moving this fast wasn't just a resource decision. It was a values decision, and we realize that's easier for some organizations than others.

Blue Cypress has a core value called "progress over perfection." The thinking behind it is that core values should function as an operating system for culture — they tell you what's different about how the organization works. Progress over perfection is the value that gives teams explicit permission to ship things that aren't finished.

Grace launched with known imperfections. She can't listen and speak at the same time (a limitation called half duplex), so there's a brief delay when you interrupt her. Her pronunciation of proper nouns and acronyms is inconsistent — during a podcast demo, she said "AAP" instead of "AAIP." Her knowledge base is intentionally limited. The team knew all of this before going live and launched anyway, labeling the experience a research preview and recording every conversation to learn from.

That tolerance for imperfection has boundaries. Production software at rasa.io, another Blue Cypress company, processes tens of millions of emails daily — that operation demands near-perfection. Finance, legal, compliance — those functions require rigor. The line that progress over perfection tries to draw is between contexts where precision is non-negotiable and contexts where learning through experimentation matters more than polishing in isolation.

We know many associations default to the precision end of that spectrum for everything, including early-stage AI projects. And honestly, there are good reasons for that instinct — associations carry a responsibility to their members that makes caution feel appropriate. The suggestion here isn't to abandon that instinct entirely. It's to notice whether the standard you're applying to an AI experiment is the same one you'd apply to a policy position or a credentialing standard, and whether that's proportional.

What Associations Can Take from This

You don't need a beach house in Florida or a week-long hackathon. The transferable elements are more fundamental than the format.

Protected time matters. Not "work on this when you have a chance," but genuinely cleared calendars. Even two days where a small group steps away from email, meetings, and daily operations to focus on a single area of exploration. The January hackathon was six days, but the insight that led to Grace could have surfaced in a shorter window. The important thing is that people weren't multitasking.

Permission to experiment without promising outcomes matters. The hackathon didn't start with "build us an audio AI product." It started with "go play with audio AI and see what you find." That freedom is what allowed an early-career developer to build a laptop store demo on a plane — something nobody asked for, which turned out to be the most important demo of the week. If every experiment has to justify itself with a projected ROI before it begins, the most unexpected ideas never get explored.

Starting with something small and visible matters. Grace was designed to be the easiest possible use case for audio AI: an inbound guide on a website. It wasn't trying to replace a complex internal system or overhaul the member experience in one shot. It picked a single touchpoint where the impact would be obvious to anyone who tried it. Associations that start with something similarly scoped build organizational confidence for bigger initiatives later.

We don't know yet whether Grace will be a huge success for Sidecar. She's been live for days, not months, and there's a long list of things we want to improve. But we've already learned more from a few days of real conversations than we could have from another month of internal testing. And that, more than any specific technical decision, is probably the most useful takeaway for any association thinking about their own AI experiments.