Grape expectations: How genAI reshaped a wine company’s customer service team

Wine Enthusiast, an online retailer of all things wine, used genAI to monitor tens of thousands of customer calls, gaining insights into why consumers were calling and using that data to quickly catch and fix product defects.


New York-based Wine Enthusiast offers online customers what it calls everything they need to live the wine lifestyle — from the vino itself to corkscrews, glasses, wine cellars, furniture, and even two magazines on the topic. The company also receives 100,000 customer service inquiries annually.

During the COVID-19 pandemic, the 45-year-old online retailer's presence boomed. Consumers were staying home, nesting, building out their perfect office spaces, and drinking more.

For more than a year, Wine Enthusiast had been using a SaaS-based platform from San Francisco-based startup Pathlight for performance management metrics of its customer-facing teams. Then Pathlight pitched a new generative artificial intelligence (genAI) product called Conversation Intelligence; it could transcribe every customer service conversation, grade customer reps against company metrics, and flag potential problems.

The large language model (LLM) that underpins the tool uses Wine Enthusiast's own data to learn company policies and procedures and determine whether a representative followed them — and whether the customer left the call happy, according to John Burke, head of customer service and systems at Wine Enthusiast.

Historically, the company had to comb through each customer service call manually to discern trends or problems, an impossible task at scale. As a result, Wine Enthusiast barely scratched the surface of analyzing customer service conversations. And when complaints came in, they were anecdotal; finding persistent problems was next to impossible.

Now, genAI tools essentially act as "autonomous analysts," Burke said. The LLMs the tools use can quickly sift through the bulk of customer conversations, analyze content, and synthesize the transcripts into reports that surface consumer trends and product issues. 


John Burke, head of CX at Wine Enthusiast

Computerworld spoke with Burke about the rollout of genAI at Wine Enthusiast, the project history, hurdles, and benefits.

What problem were you trying to solve with genAI? “The company had a relatively small customer service footprint, and it wasn’t really able to manage the volume of people coming in. People think of customer service as point-of-sales service, but we’re talking product warranties and supporting these products. Wine cellars are built to last. Some will last 10 to 15 years and they’re going to need parts and maintenance.

“My role coming in was to figure out how do we responsibly grow this part of the business to meet the demand, the expectation of customers — especially in the Amazon world that we live in today of immediacy and technology — without going out and hiring 60 more people?"

How did you go about solving the issue? "Phase one was getting us on a set of tools and platforms just to better communicate with customers. We’d moved on to Zendesk as our platform, and one of the first challenges we ran into — even though Zendesk was great in allowing us to communicate with the customer — was knowing what those customers were contacting us about.

“That started with us really putting the onus on our service team that when you complete a conversation, answer a couple questions to tell us what you talked about. To no surprise, what we found was that 90% of the reason for conversation was questions. But questions about what?

“I don’t blame the team. They’re moving from call to call and they don’t want to have to stop and answer six or seven questions.

“My focus is not just on call volume or how many tickets you address. It’s about what quality you’re delivering to customers, and with what consistency. We came across Pathlight because it had a really cool coaching platform that would basically take all these different metrics that mattered to us and roll them into the concept of what they call a Health Score, which is a digestible way for the team to understand where they stand.

“Instead of saying, 'you’re doing really well in first-contact resolution but not so well in chat response time,' we say, 'your overall Health Score is 90 and here’s where you want to improve.'
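The Health Score rollup Burke describes can be sketched as a weighted average of normalized per-agent metrics. The metric names, weights, and normalization below are illustrative assumptions, not Pathlight's actual formula:

```python
# Hypothetical sketch of rolling several agent metrics into one Health Score.
# Metrics are assumed pre-normalized to a 0-1 scale where 1.0 is best.

def health_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of normalized metrics, scaled to 0-100."""
    total_weight = sum(weights.values())
    score = sum(metrics[name] * w for name, w in weights.items()) / total_weight
    return round(score * 100, 1)

# Illustrative weights, not Pathlight's.
weights = {
    "first_contact_resolution": 0.4,  # share of issues resolved on first contact
    "chat_response_time": 0.3,        # normalized: 1.0 = fastest, 0.0 = slowest
    "csat": 0.3,                      # customer satisfaction survey score, 0-1
}

agent = {"first_contact_resolution": 0.95, "chat_response_time": 0.80, "csat": 0.92}
print(health_score(agent, weights))  # prints 89.6
```

Collapsing the metrics this way is what lets a manager say "your Health Score is 90" instead of reciting each metric separately, while the per-metric breakdown still shows where to improve.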

“About a year into our relationship with Pathlight..., they said they were developing a product that, in addition to assessing the [service] agent's performance, would also analyze every conversation that happens. So, it can tell you what they’re talking about, what their sentiment is, what the resolution looks like, and if they’re adhering to our policies and procedures. That’s what started us on our journey with AI.”

Are the majority of service rep conversations conducted by voice or messaging app? “Our channel mix is 70% voice and 30% everything else. For us, the challenge was how do we get something meaningful from these phone conversations that in some cases are 20, 30, or even 40 minutes long?

“That’s where the pain point started for us. With Pathlight, we have like a rubric and we can digitally grade our reps. But my leadership team was coming to me saying, 'John, I just spent 20 minutes listening to a phone call and I graded one conversation. How am I supposed to do my job and also evaluate the team?'

"Historically, the only time we looked at a [service] recording was if the customer complained, and we went back and tried to figure out what happened. We were always looking at the worst of the worst conversations and using them to evaluate our team, when they’re having hundreds of conversations that are perfectly pleasant."


How much work was involved prior to the genAI rollout evaluating agents? “I have a relatively small leadership team, and I’d say they spent half their time evaluating. So much of that evaluation was about recovery. This customer is upset because their order didn’t arrive on time. Many of the team felt like they were a lawyer trying to build a case against a client. It truly did become half [the management team's] time either trying to determine who are their top performers and who are the ones on the team who need additional coaching and training, or who’s adhering to our processes?

"For us, we look at certain business metrics. We want to make the customer happy, but we also don’t want to give away the store — finding the balance of how to make the customer feel fulfilled when something goes wrong without an immediate knee-jerk reaction of giving a full refund."

In what ways did your older method of evaluating customer support fall short of your goals? “We ended up in a space where we were just looking at the worst of the worst. The bigger challenge for me — I sit in on our marketing and commerce meetings — was the question always raised: what are the products people are liking or disliking, and what are the key issues we’re seeing? I knew we had a problem when, before that meeting every week, I was Slacking my team and asking, 'What have you guys heard this week?'

“It was so anecdotal, and I felt silly presenting that to the marketing team. Their immediate follow-up questions were always, ‘How many? What customers? What product lines?’ Every time, all I could say was, ‘That’s all I’ve got for you.’”

When did you begin your rollout of genAI and when did you complete it? “We started it in August of last year and it took us about a month of experimenting. Then we went fully live around September. We’ve been live ever since. I’d say we’re pretty well done tweaking the prompt. We’ve got it pretty well dialed in as to who we are and what conversations should look like, which has been really helpful.

“We’ve effectively eliminated manual grading. We don’t do it anymore. We just let the system do it.”

Were you concerned that Pathlight’s cloud-based LLM would use your proprietary data to train itself, and could potentially expose your data later on? “I’ve been a student of AI and I like to be on the emerging end of technology. So, I’ve kept myself educated on privacy concerns and ethical limitations of AI — governance and things like that. I didn’t immediately have that concern, partially because we’re not a bank. We’re not in insurance or healthcare. If the language model wanted to learn against our customer base, I wasn’t particularly concerned about that.

"The concern I did have — and Pathlight was very transparent about it up front — was what about customer credit card information? They assured us the model was trained to detect those patterns and remove them from its learnings, which gave me a little comfort. That’s the only thing we don’t own, the customer’s personal information. Getting that walled off gave us the comfort to say, ‘We’re fine to proceed.’”

Did you create a genAI team to deploy the platform, or did you mostly rely on Pathlight for the expertise? “Being a medium-sized business, we didn’t have the luxury of saying this new space is going to get a new team. It was largely myself and a couple of my managers. We sort of fell into it very early on with Pathlight. The first conversation we had where they were analyzing our calls and showing us what the readouts looked like was not even in a Pathlight product. It was still a prototype. So we were seeing the sausage being made on the back end. We hope we helped develop aspects of the product early on.”

You’ve called your genAI tech “autonomous analysts." Why? How does it work? “The way Pathlight pitched the product to us ended up being the reverse in terms of value. They looked at it and said this is going to be the way to eliminate the manual process of evaluating your team, and the byproduct will be that you’ll know more about what your customers are talking about.

“The value for us was the exact opposite: I want to know what our customers are talking about. I want to fix those issues up front. And then, naturally our team is going to perform better because of that.


“So, having this robot in the background listening to calls all day long and surfacing the stuff most important to us both on the agent and customer level was incredibly helpful to us, especially when my team's biggest complaint before was they were spending half their day or more not even doing the work, just listening and scrubbing through calls and then having to go through the manual process of evaluating. That’s another area we struggled in.

“My leadership team has different backgrounds. They have different management styles. One of my managers who has been in this industry for 40 years is a tough grader. It takes a lot to impress her. So, when I looked at scores when manually graded, the agents she evaluated were generally graded a lot lower than one of our other managers who is a little more forgiving.

“When we switched to AI, that bias was removed. What we were seeing was the actual analysis of the conversation without the human nature of thinking, ‘Well, the agent has had a tough week.’ Or ‘the customer was really laying into them, and I think they really did well enough.’ We removed that element from the equation.”

How do you store your customer service interactions, and how is Pathlight’s LLM able to sift through them? “We currently use a cloud-based telephony system called Aircall. Aircall and Pathlight integrate through APIs. So, basically the conversations are recorded securely on the Aircall side, and we give Pathlight access to those recordings for a brief period of time to analyze them and move on.

“That was something important to us: we didn’t have to change the way we were operating. We could still use our same phone system and our ticketing system and just allow Pathlight secure access to only the data it needed to accomplish the assessment.”

Were there any hurdles? Did you need to tag your data for better recognition, for example? “Admittedly to this day, we’re still tweaking it. So much of the value comes down to the prompts that we put into the AI model on the front end. For us, it was educating the model on what our business was. It’s not as straightforward as, ‘We sell wine.’ You’re going to hear about corkscrews. You’re going to hear about furniture. You’re going to hear about magazine articles. You’re going to hear about refunds.

“It took us a couple iterations with the support of Pathlight to say, ‘It’s just not getting our customer yet.’”

“The other area where we really had to train the model was around our procedures. Early on, we were finding the AI wasn’t able to tell us if the customer’s issue was resolved. That was because it didn’t really understand what ‘resolved’ meant to us. Was that a return? Was that a refund? Is that a credit? So, over time we continued to tweak the prompts, even to the point of helping the system understand a customer doesn’t have to leave the conversation happy if we have accomplished certain goals that protect the business, provide the customer a good experience. They could still be annoyed, but we could have still delivered on what our expectation was.”

“I think that was a learning process for us. We had an initial prompt we built, but it wasn’t until we started seeing the output that we realized we needed to tell it a little more about our business, a little more about our products, for it to really understand what we were looking for.”
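The prompt iteration Burke describes, teaching the model that "resolved" is defined by policy and outcome rather than the customer's mood, can be sketched as a classification prompt template. The categories, wording, and helper below are illustrative assumptions, not Wine Enthusiast's or Pathlight's actual prompt:

```python
# Hypothetical sketch of a resolution-classification prompt that encodes a
# business-specific definition of "resolved", per the interview.

RESOLUTION_PROMPT = """You are analyzing a customer service call transcript
for a wine-lifestyle retailer (wine, corkscrews, glassware, wine cellars,
furniture, magazines).

Classify the outcome as RESOLVED if ANY of these occurred:
- a return, refund, or store credit was issued per policy
- a warranty part, repair, or replacement was arranged
- the customer's question was answered completely

Note: the customer may still sound annoyed. Judge resolution by whether the
agent followed policy and delivered the outcome, not by the customer's mood.

Transcript:
{transcript}

Answer with one word: RESOLVED or UNRESOLVED."""

def build_prompt(transcript: str) -> str:
    """Fill the template with one call transcript before sending to the model."""
    return RESOLUTION_PROMPT.format(transcript=transcript)
```

Each tweak Burke mentions (what "resolved" means, that an annoyed customer can still be a resolved call) becomes an explicit rule in the template, which is why the team could keep dialing it in over a couple of iterations without retraining anything.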
