Case study

How we revamped Booking.com’s customer help experience

Customer service experiences tend to be fraught, especially when you’re dealing with a problem that could make or break an expensive trip. Many of us know this from personal experience, and yet too many Help Centre interactions lean more towards infuriating than actually helpful. In the travel industry, the stakes are even higher thanks to the seemingly endless number of things that can and do go wrong.

Reducing this friction is not only better for the customer, it can also have a significant impact on a company’s bottom line, as my experience at Booking.com showed.

The mission

We set out to prove that classifying customer service queries into “things that could be solved with automated tools” and “things that actually need complex human assistance” would be good for business. By doing this, we could help customers more effectively, which would save the company money and improve the overall user experience, leading to fewer cancellations, more profit and long-term customer loyalty.

Proof-of-concept

We built and launched a chatbot to triage incoming customer service contacts. The bot would be the first point of contact, handling low-stakes topics like parking, check-in/out times and basic information requests like “can I bring my dog?”. At the time, chatbots were a relatively novel idea. They seemed interesting enough to pique a user’s attention and cheap enough to build for leadership to get behind.

My role

Define, validate and finalise the user flows and content for each topic that the chatbot could process, then organise and manage the localisation process for over 40 languages.

The process 

Working from existing customer service data and within our tech constraints, we defined the most “solvable” topics for the bot. These were things like requesting a parking space, specifying an arrival time and surfacing check-in and check-out times.


Together with the UX designer, I crafted the interactions for each flow and then drafted the initial content. I used an almost frustratingly broad user persona as a guide (18–40 years old, at least vaguely tech-literate, probably a mobile app user, likely to be stressed or anxious) to create a simple messaging hierarchy. With buy-in from relevant UX and product stakeholders, this became a guiding light for the bot.

As soon as we had testable prototypes for each flow, we began user testing. Once each one was debugged and validated, we moved on to the next topic until the MVP came together.

Validation in a data vacuum

Research was critical to our process before we even began. That said, validation is hard to come by for a new product when there is no data yet and no one wants to commit expensive research resources until you can prove it will be worth it. Our solution: rapid, guerrilla-style user testing.


Leveraging the size of the company and the pace of hiring at the time, we set up a weekly feedback session every Friday for months, inviting colleagues (the newer, the better) from outside our department to come and test what we were working on. We’d have between six and ten people at each session, which gave us a good enough sense of whether the content was resonating the way I’d hoped. Though not entirely scientific, this approach worked: if six or more people tell you you’re doing something weird, chances are it’s weird, but if they all love it, you might be onto something.


The success of these sessions, reflected in our increased velocity and peer review feedback, eventually earned us more buy-in from leadership, which turned into more research resources. Before long, we were running real lab sessions and travelling to other markets to validate our work.

This all seemed a lot cooler before ChatGPT

The result

The MVP proved its value: initial users were cancelling less, using more automated tools and contacting Customer Service less often. This meant we could add more topics, increasing in complexity to cover things like cancellations, airport shuttle reservations and various other booking amendments.


The product eventually became a messaging-based alternative to the Help Centre itself, and maintaining both wasn’t sustainable. The solution was to take what we had learned from the messaging bot and apply it to revamping the regular Help Centre interface. This included repurposing much of the original bot content and interactions, which actually wasn’t as painful as it sounds.

A note on localisation

As the product grew, localisation became integral to providing a relevant, resonant experience across all of our languages and markets. Over the course of about a year, I coordinated localisation efforts in 42+ languages and participated in numerous international research sessions. In collaboration with our in-house localisation coordinator, I created language guidelines for the different language teams to use as they each worked through the 3,000+ strings.


It felt like a massive undertaking, but with an airtight calendar aligned with our release schedule, it all came together. We established delivery timelines with plenty of room for testing in each language, recruiting members of each localisation team to participate in research sessions for the major languages before the bot went live. Oh, how far we’d come since those scrappy Friday prototype testing sessions!

Same ideas, different interfaces

We were able to rework most of the chatbot user flows for the new and improved Help Centre project.

The scenarios were the same: the user needs help, identifies their topic and gets the most relevant type of help for that topic, so the content needed to be somewhat similar.


This was the point at which I became very insistent that all interfaces are technically conversational. It’s all just a form of dialogue with the person pushing buttons somewhere out there in the real world. To prove this, I ran a series of A/B experiments, updating the old Help Centre content to reflect the concepts I had seen work well for the chatbot. With reduced access to qualitative research, I came to rely more on the in-house experimentation tool. See the examples below:

Pragmatism = empathy: Staying focused on what the user needed to do to resolve their own query helped reduce customer service contact rates. For topics like parking, agents would previously just do the very thing we were now asking the customer to do themselves.

Setting clear expectations: Explaining what was required, and making that information easily available BEFORE the customer started the phone call, made a huge difference here.

The broader information architecture (aka the navigation blueprint for the new and improved Help Centre experience) was a different story. Overall, the Help Centre was capable of a lot more, with fewer restrictions and more functionality available. The way users engaged with it was completely different, too: they were faster, able to solve a broader range of problems and generally a little less patient.


It’s no secret that everyone thought chatbots were dumb and thus treated them as such, while expectations for a complete app/web experience were much higher. Creating intuitive, logical paths to solve whatever problems our users were facing was paramount.

Using a combination of quantitative data and existing qualitative research insights, we figured out what worked, then honed it with card-sorting exercises, design critiques and online user testing. I also had the luxury of seeing the impact on behavioural metrics such as click-through rates, bounce rates and various interactions with topics throughout the help funnel. This gave us a solid baseline, which could then be optimised as the product matured.

Technical challenges

The biggest challenges of working on the Help Centre came from the complexity of the backend services lurking beneath the interface, coupled with a shortage of researchers and developers. The backend code dictated what we could and couldn’t do, unintentionally complicating even the simplest idea. At the same time, our devs were navigating all-consuming code migrations and had limited capacity for anything that wasn’t a major business priority.


That said, I did eventually manage to get the tech lead on board to help me set up copy experiments as needed. It was great; he was practically gleeful about it. Having seen a few significant wins from experiments I had already run, he quickly understood the value of copy experimentation in this space. Mind you, I also suspect setting up my metrics was a nice reprieve from the more boring depths of a Perl-to-Java migration.


In a way, these technical complications were a blessing in disguise: they helped me understand my product a lot better and empathise with some of my closest colleagues’ perspectives, even though their regular work days looked very different to mine.

Conclusion

At this point, our work was far from done (isn’t it always?), but it was ready for a new team to pick up and run with. The best parts of what had worked in our original proof-of-concept had been reworked into a shiny new help experience. There was a clear thread of proactive empathy that could be traced right back to our first chatbot user flows, now helping the thousands of people who used the Help Centre each week. I was very proud to look back and see how far it had all come.


I learned so much throughout this experience. I got comfortable with all kinds of data and with facilitating all kinds of research sessions. I got great at managing my own and others’ calendars. And I became a better, stronger UXer thanks to my peers and the various product leaders who took the time to share ideas and support my curiosity.