Understanding and following up on our customers’ intentions

FARFETCH Technology
Feb 7, 2023


By Pedro Azevedo

This is the second part of our series about iFetch, a project building multimodal conversational agents for the online fashion marketplace. If you haven’t started the series yet, read the first article before going further.

In the near future, AI agents will mediate the majority of interactions between large organisations and their users. This belief is not driven only by online shopping taking over whole markets. More customers than ever are also communicating with businesses through text messages, and these customers are more likely to make a purchase. Industry reports show that conversational agents are valued, reflecting a growing demand for human service (Attentive, 2022; Liveperson, 2022; McKinsey, 2022).

To mimic the high-end apparel follow-up service that human fashion assistants offer in brick-and-mortar stores, the iFetch Project is creating a cutting-edge multimodal conversational agent. Our first question is: “How can we use dialogue to better understand customer intent and make more informed decisions about the agent’s dialogue actions?”

Exploring FARFETCH Product Catalogue through dialogue

Our project started by delivering a conversational AI agent that uses the knowledge in our catalogue to inform customers about the size, fit, measurements, and composition of apparel. These questions arise on the product page, the most iconic moment in the entire customer journey.

Product display pages are critical. The more engaged a customer is with a product, the higher the probability that they will add it to their shopping cart, and brands do not have much time to capture customers’ appetites (Iron, 2022). As Rosara Joseph, Content Strategist at VentureWeb, puts it: “The paramount goal for your product pages should be to build user confidence by providing all the information necessary for a purchasing decision and making the process as intuitive and straightforward as possible.”

Mockups by Sérgio Pires

We began by evaluating Natural Language Understanding (NLU) models for intent detection and slot filling on a task we dubbed “product information.” We customised JointBERT, a well-established and straightforward model, for our problem. For intent prediction, BERT’s latent representation of the [CLS] token is fed to a softmax classifier. For slot filling, the latent representation of each utterance token is fed to a second softmax classifier over the slot labels. The learning objective is to maximise the joint conditional probability of both tasks, which is achieved by minimising the cross-entropy loss. For more details, see Chen (2019).
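To make the JointBERT objective concrete, here is a minimal sketch of the two classification heads and their joint loss in plain Python. It assumes the BERT encoder has already produced a [CLS] vector (`h_cls`) and one vector per token (`h_tokens`). The toy dimensions, the weight names `W_i`, `b_i`, `W_s`, `b_s`, and the function names are hypothetical; a real implementation would use a deep learning framework and BERT’s full hidden size. Summing the intent cross-entropy and the per-token slot cross-entropies equals the negative log of p(intent | x) multiplied by the product of p(slot_n | x), so minimising it maximises the joint conditional probability.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           h: Sequence[float]) -> List[float]:
    """Computes W h + b, with one row of W per output label."""
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]


def joint_loss(h_cls, h_tokens, intent_label, slot_labels, W_i, b_i, W_s, b_s) -> float:
    """Intent cross-entropy plus the summed per-token slot cross-entropies.

    Equals -log[p(intent | x) * prod_n p(slot_n | x)], so minimising it
    maximises the joint conditional probability of both tasks.
    """
    # Intent head: softmax classifier over the [CLS] representation.
    loss = -math.log(softmax(linear(W_i, b_i, h_cls))[intent_label])
    # Slot head: a second softmax classifier applied to every token representation.
    for h_n, y_n in zip(h_tokens, slot_labels):
        loss -= math.log(softmax(linear(W_s, b_s, h_n))[y_n])
    return loss


# Toy example (hypothetical): 2-dim "BERT" vectors, 3 intents, 4 slot labels, 2 tokens.
W_i, b_i = [[0.0, 0.0] for _ in range(3)], [0.0] * 3
W_s, b_s = [[0.0, 0.0] for _ in range(4)], [0.0] * 4
loss = joint_loss([0.5, -1.0], [[1.0, 0.0], [0.0, 1.0]], 1, [0, 2], W_i, b_i, W_s, b_s)
# With all-zero weights every head is uniform, so loss == log(3) + 2 * log(4).
```

The all-zero weights make a handy sanity check: every classifier is uniform, so the loss depends only on the number of labels and tokens.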

For this task, the language processing layer behaves as shown in the following examples:

Mockups by Sérgio Pires

Given the prior state and the current NLU output, the Dialogue State Tracker (DST) defines a new dialogue state. The available slots are split into two categories: filled and requested.

Requested slots are those the system must fill. For example, if a customer says, “I want to know the product’s available sizes,” the NLU model will detect an “available_sizes” slot, and our DST logic will recognise it as a requested slot for the system to fill. Filled slots, on the other hand, carry information the user has already provided (e.g., “I like the colour blue”).
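The split between requested and filled slots can be sketched as follows. Purely for illustration, we assume the NLU returns each slot as a `(name, value)` pair, with no value when the customer asks about an attribute and a value when the customer states one. The `split_slots` function and the `colour` slot name are hypothetical, not our production code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical NLU output format: (slot name, value). We assume a slot the user
# asks about has no value, while a slot the user states comes with one.
NluSlot = Tuple[str, Optional[str]]


def split_slots(nlu_slots: List[NluSlot]) -> Tuple[List[str], Dict[str, str]]:
    """Separates slots the system must fill from slots the user already filled."""
    requested: List[str] = []
    filled: Dict[str, str] = {}
    for name, value in nlu_slots:
        if value is None:
            requested.append(name)   # the system must provide this information
        else:
            filled[name] = value     # the user already provided this information
    return requested, filled


requested, filled = split_slots([("available_sizes", None), ("colour", "blue")])
print(requested)  # ['available_sizes']
print(filled)     # {'colour': 'blue'}
```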

When the system retrieves the required information, the requested slot is dropped, as is the user act. During a question-answering task, the state is reset when the product slot differs from the one in the previous state; during a retrieval task, it is reset when the user starts looking for items in a different category. The state is also updated after each system action: because every action uses certain slots, each slot keeps a record of its previous usage. This record matters because it lets the system avoid reusing the same slots unless it has to.
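The update rules above can be sketched as a small state tracker. This is a minimal illustration under our own assumptions: the `DialogueState` fields, the task names `question_answering` and `retrieval`, the `product` and `category` slot names, and all function names are hypothetical, and slot usage is reduced to a simple counter per slot.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DialogueState:
    filled: Dict[str, str] = field(default_factory=dict)   # values the user provided
    requested: List[str] = field(default_factory=list)    # slots the system must fill
    user_act: Optional[str] = None                         # latest user dialogue act
    usage: Dict[str, int] = field(default_factory=dict)    # times each slot was used


def update_with_user_turn(state: DialogueState, task: str, user_act: str,
                          requested: List[str], filled: Dict[str, str]) -> DialogueState:
    """Builds the new state from the prior state and the current NLU output."""
    # Question answering: reset when the product differs from the previous one.
    if task == "question_answering":
        previous = state.filled.get("product")
        if previous is not None and "product" in filled and filled["product"] != previous:
            state = DialogueState()
    # Retrieval: reset when the user moves to a different category.
    elif task == "retrieval":
        previous = state.filled.get("category")
        if previous is not None and "category" in filled and filled["category"] != previous:
            state = DialogueState()
    merged_requested = state.requested + [s for s in requested if s not in state.requested]
    return DialogueState({**state.filled, **filled}, merged_requested, user_act, dict(state.usage))


def update_after_system_action(state: DialogueState, answered: List[str],
                               used_slots: List[str]) -> DialogueState:
    """Drops answered requests, clears the user act once something is answered,
    and records which slots the action used."""
    usage = dict(state.usage)
    for slot in used_slots:
        usage[slot] = usage.get(slot, 0) + 1
    remaining = [s for s in state.requested if s not in answered]
    user_act = None if answered else state.user_act
    return DialogueState(dict(state.filled), remaining, user_act, usage)


def least_used(state: DialogueState, candidates: List[str]) -> str:
    """Prefers the candidate slot the system has used least so far."""
    return min(candidates, key=lambda s: state.usage.get(s, 0))


# Example: a size question about product "p1", then a new question about "p2".
state = update_with_user_turn(DialogueState(), "question_answering", "request",
                              ["available_sizes"], {"product": "p1"})
state = update_after_system_action(state, answered=["available_sizes"], used_slots=["product"])
after_answer = state  # requested == [], usage == {"product": 1}
state = update_with_user_turn(state, "question_answering", "request",
                              ["composition"], {"product": "p2"})
# The product changed, so the state was reset: filled == {"product": "p2"}, usage == {}
```

Keeping usage as a counter turns “don’t reuse slots unless necessary” into a one-line `min` over candidate slots; a richer usage history could replace the counter if needed.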

Concluding remarks

These two layers of a conversational system, natural language understanding and dialogue state tracking, are crucial for a well-functioning, coherent, and consistent dialogue. Because the NLU is probabilistic, the system will not always identify the user’s intent with high accuracy. With this in mind, we created a methodology that lets the interaction continue without frustrating the customer: we first try to proceed with the available slot information, and if that does not lead to a solution, the system keeps its behaviour and selects the intent with the highest likelihood. This hybrid approach keeps the conversation on track while working with the most reliable information the system has at that moment. It avoids misinterpretations and reduces confusion for the user, helping the system provide the most accurate information for the user’s intentions.
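The fallback strategy can be sketched in a few lines. It assumes a hypothetical `SLOT_TO_INTENT` table mapping requested slots to the intents that answer them, and an NLU output given as a dictionary of intent probabilities; the intent names and the function `choose_intent` are illustrative, not our production configuration.

```python
from typing import Dict, Iterable

# Hypothetical mapping from a requested slot to the intent that answers it.
SLOT_TO_INTENT: Dict[str, str] = {
    "available_sizes": "ask_sizes",
    "composition": "ask_composition",
    "measurements": "ask_measurements",
}


def choose_intent(intent_probs: Dict[str, float], requested_slots: Iterable[str]) -> str:
    """Prefers slot evidence; otherwise falls back to the most likely intent."""
    # First, try to resolve the request from the available slot information.
    for slot in requested_slots:
        if slot in SLOT_TO_INTENT:
            return SLOT_TO_INTENT[slot]
    # No slot-based solution: select the intent with the highest likelihood.
    return max(intent_probs, key=intent_probs.get)


# The NLU slightly prefers another intent, but the requested slot points to sizes.
print(choose_intent({"ask_sizes": 0.4, "ask_composition": 0.6}, ["available_sizes"]))  # ask_sizes
print(choose_intent({"ask_sizes": 0.4, "ask_composition": 0.6}, []))  # ask_composition
```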

We will continue this journey in forthcoming posts.

References

Attentive (2022). The State of Conversational AI in 2022. Accessed: 2022-12-04.

Chen (2019). BERT for Joint Intent Classification and Slot Filling. arXiv preprint arXiv:1902.10909.

Iron (2022). Product Display Pages: The Secret to Boosting Online Sales and Brand Equity. Accessed: 2022-12-04.

Liveperson (2022). 2020 Consumer Preferences: How they view Conversational Commerce and AI. Accessed: 2022-12-04.

McKinsey (2022). State of Grocery Europe 2022: Navigating the market headwinds. Accessed: 2022-12-04.

Originally published at https://www.farfetchtechblog.com on January 12, 2023.
