Building Similarity into an AI-Driven Marketplace with Dr. Selcuk Kopru of eBay

Posted Oct 06, 2021 | Views 5K
# TransformX 2021
# Breakout Session
speaker
Selcuk Kopru
Head of ML and NLP @ eBay

Selcuk is leading the Machine Learning and Natural Language Processing teams at eBay. His teams are responsible for developing horizontal ML and NLP capabilities that are consumed by buyer and seller experience teams. Prior to his current role, he was a Principal Applied Researcher at eBay. Selcuk earned his Bachelor's, Master's, and PhD degrees in computer science from Middle East Technical University.

SUMMARY

In this session, Dr. Kopru explores why AI-driven similarity clustering, indexing, and search are essential to creating the user experiences needed for today's online marketplaces. Some of these vital AI-driven experiences include search and ranking by text, metadata, or images, product review filtering and summarization, fraud detection, and member-to-member communications. He discusses how AI can be applied to text and image similarity searches, or even multi-modal searches, for example, measuring the similarity of a product image and its listing title. Dr. Kopru dives into how eBay uses transformer models to drive similarity capabilities. What are the advanced use cases enabled by powerful similarity searches? How do you run a similarity search across billions of items? How can you use smaller models for individual product categories without significantly losing accuracy? Join this session to learn how to apply off-the-shelf transformer models to drive large-scale similarity searches.

TRANSCRIPT

Nika Carlson (00:15): Next up, we're excited to welcome Selcuk Kopru. Selcuk is the head of M-L and N-L-P at eBay. He leads the teams responsible for the development of M-L and N-L-P capabilities, which are key to the experiences of both eBay buyers and sellers. Prior to his current role, he was a principal applied researcher at eBay. Selcuk earned his bachelor's, master's, and P-H-D degrees in computer science from Middle East Technical University. Selcuk, over to you.

Selcuk Kopru (00:50): Hello. Today, I will talk about an important tool that we use in solving many challenging problems while making a global marketplace. This presentation is about similarity, and I will explain how we are building and using this important tool to solve many tough problems. At eBay, we are working very hard to provide an A-I driven marketplace to our customers, and building the latest and the greatest tools and technologies is one of the key pillars in this effort. My main message in this talk is, in order to build a marketplace that scales well, performs well, and that is being used by millions of users every day throughout the world, you have to productionalize a lot of A-I capabilities, and Similarity is one of the most practical tools in our journey from research to reality. This is what we at eBay are doing for our global buyers and global sellers to provide an A-I driven marketplace.

Selcuk Kopru (01:49): My name is Selcuk Kopru, and I'm leading the M-L and N-L-P teams here at eBay. I've been in the A-I industry for the last 25 years. Before starting, please allow me also to emphasize that all the errors in this presentation are my own and the great results are from many folks across eBay. Here's the outline of my presentation. I'm going to cover three main topics in this presentation. First, I will explain what I mean by an A-I driven marketplace. Then, I will describe Similarity in the context of e-commerce, and how it helps us in solving many challenging problems in this domain. Finally, I'll go into the details of the implementation and use cases. Okay, let's start with the first model. This conceptual graph with concentric circles depicts the A-I driven marketplace. It's like a flywheel. It will get better and better as each of these components becomes stronger, and it will fail if any of the components fail. In the center, we have the infrastructure. Next, we have the data and the A-I platform there, which serve different A-I domains, like natural language processing, computer vision, A-I ops, business intelligence, and many others.

Selcuk Kopru (03:14): Then, we have the layer where core A-I domains are fused with specific business logic, like search, ads, catalog, payments, trust, and others. Finally, the most outer circle represents the user experience for our buyers and sellers across the board. I will now play a video and try to show you how the shopping experience is connected to those A-I capabilities. Many components in this experience are driven and powered as of today by A-I. Okay, let's start the video.

Selcuk Kopru (03:53): In this video, I'm searching for my favorite espresso machine. The search experience is one of the main domains that heavily depends on state of the art A-I capabilities. Retrieving results based on deep semantic understanding and ranking the results requires A-I services. The item view is composed of many textual and image elements. Each of these elements is a great source of data features for any machine learning model, and retrieving items similar to the current one, or items based on buyer purchases, depends on powerful retrieval and ranking algorithms. Automated extraction of item aspects, or automated generation of a description summary, requires A-I capabilities. The full item description is too big to display, for example, in this view item page. Generating a shorter, representative description summary is essential. Filtering product reviews for bad content, feature extraction from product reviews, shipping and delivery estimation, payments, fraud detection, member to member communications, they all benefit from A-I. Showing similar items based on the user's most recent actions and suggestions based on the current item needs a lot of A-I for a great e-commerce experience. Now, let me stop the video and continue to the next slide.

Selcuk Kopru (05:31): At this point, I want to highlight the importance of scalability and performance, because those are the most important challenges in our journey from research to reality. eBay is a truly global marketplace dealing with huge volumes of transactions and big velocity, therefore latency and throughput numbers play a critical role for any A-I model to go to production, especially for online use cases where we need to be extra careful in building efficient and lightweight models that are optimized for the right infrastructure. Therefore, before deploying any A-I system into production, many optimizations are performed. During this presentation, I will also very briefly touch on some of those optimizations that we are doing while working with the big transformer models. As I said at the beginning, a very important tool that we use to solve some of the toughest problems is Similarity. Problems like search, displaying of promoted listings, or matching items to products in the catalog benefit a lot from similarity.

Selcuk Kopru (06:41): It can serve as the first step in retrieving the best candidates for further processing and ranking in the search and ads pipelines. Let me explain to you how we're attacking this challenging Similarity problem. Similarity can be defined between any two entities based on a distance function. The first step in building Similarity is to represent the entities as vectors in the same vector space. In this graph, for example, for the sake of simplicity, text entities are represented in a two-dimensional vector space. Of course, in a real application, this can increase to hundreds of dimensions. The second step is to define a distance function, such as cosine distance or Euclidean distance. This graph depicts the similarity between item titles, but how does this help us? Well, the similarity between item titles will help in matching the listings in the marketplace to products in the catalog, or, if we can calculate the difference between product titles, we can identify duplicate products, which can be used to de-duplicate them in our catalog. Having a clean catalog is very important in any e-commerce experience.
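
The two steps described here (vectors in a shared space, then a distance function) can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the 2-D vectors and item titles below are invented, and a real system would use learned embeddings with hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 2-D embeddings of three item titles (values invented).
titles = {
    "espresso machine 15 bar": [0.9, 0.1],
    "15 bar espresso maker": [0.8, 0.2],
    "mens olive wrist watch": [0.1, 0.9],
}

def most_similar(name):
    """Return the other title whose vector is closest in cosine similarity."""
    query = titles[name]
    others = [t for t in titles if t != name]
    return max(others, key=lambda t: cosine_similarity(query, titles[t]))
```

With these toy vectors, the two espresso titles are nearest neighbors of each other, which is the kind of signal that could flag candidate duplicates in a catalog.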

Selcuk Kopru (08:00): Similarity can also be defined between different kinds of entities. For example, we can define it between a query text and a title. In this case, if we can calculate the similarity between a query and the title, this will help us to retrieve more relevant search results. Calculating the similarity between two queries is also very useful. It can help us to rewrite the queries that do not return any results. Later in this presentation, I will show how we're using these ideas to enhance the experience for our buyers and sellers. Vector representation is not limited to the text modality only. For example, in this slide, we see that images can also be represented as vectors, and this will enable us to calculate the similarity between images, which has many other potential use cases. For example, the search by image experience is a good one, or we can look into the similarity between an image and a title. Being able to calculate the similarity between a title and an image will open new doors for us.

Selcuk Kopru (09:14): This can improve the listing experience for sellers, where you can make quality title suggestions based on the image they have uploaded. Let's take this concept one step further. We can even extend the same idea and talk about the similarity of complex entities. Here in this graph, each vector represents the entire item listing on eBay, which contains many text and image elements. This capability can bring a whole new dimension to recommendations. Showing promoted items or similar items to the users can benefit a lot from this tool. Okay, we have defined the concept, let's have a deep dive into some of the details of the implementation.

Selcuk Kopru (10:04): Of course, there are many details in implementing a Similarity system and I will try to explain this as simply as possible. In its simplest form, Similarity can be implemented by learning the vector representations for the specific task, and then putting them into a K-N-N index for querying. The first step in this flow is to learn the representations or the embeddings using, probably, some transformer models. Next, using the learned representations, we build a K nearest neighbor index. This is the offline index building stage. In the online retrieval stage, the query is run through the learned model, and the vector embedding is computed. This vector is the input to the K-N-N search index. Finally, we retrieve from the index the K nearest neighbors. Meanwhile, real-time updates, like adding new vectors to the index, are also possible.
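
The offline and online stages described here can be sketched as follows. This is a minimal illustration under stated assumptions, not eBay's system: `toy_embed` is a hypothetical stand-in for the learned transformer encoder, `VOCAB` is invented, and the index does an exact brute-force scan rather than the approximate search a production K-N-N backend would use.

```python
import heapq
import math

VOCAB = ["espresso", "machine", "watch", "olive", "lamp"]  # invented toy vocabulary

def toy_embed(text):
    """Hypothetical stand-in for the learned model: counts vocabulary words."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

class BruteForceKnnIndex:
    """Exact K-N-N over unit-normalized vectors, scored by cosine similarity."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector)) or 1.0
        return [x / norm for x in vector]

    def add(self, item_id, vector):
        """Works both for offline bulk builds and for real-time additions."""
        self._ids.append(item_id)
        self._vectors.append(self._normalize(vector))

    def search(self, query_vector, k):
        """Return the ids of the k most similar indexed vectors, best first."""
        q = self._normalize(query_vector)
        scored = (
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self._ids, self._vectors)
        )
        return [item_id for _, item_id in heapq.nlargest(k, scored)]

# Offline stage: embed the items and build the index.
index = BruteForceKnnIndex()
index.add("item-1", toy_embed("espresso machine"))
index.add("item-2", toy_embed("olive watch"))
index.add("item-3", toy_embed("table lamp"))

# Online stage: run the query through the same model, then search the index.
top_hit = index.search(toy_embed("espresso maker machine"), k=1)

# Real-time update: a new listing becomes searchable without a rebuild.
index.add("item-4", toy_embed("olive lamp"))
after_update = index.search(toy_embed("olive lamp"), k=2)
```

The important design point is that queries and items pass through the same encoder, so their vectors live in the same space and can be compared directly.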

Selcuk Kopru (11:08): Let me now go into the details of model training and representation learning. We start the representation learning with transformer models, like the widely known BERT, G-P-T two, G-P-T three, and many others. We use a large amount of data. For the purpose of this presentation, I'll be using BERT and eBERT as the transformer models in focus. EBERT is our in-house built version and uses a lot of e-commerce data, in addition to the Google Books corpus and the Wikipedia data. In training of the transformer models, we rely on G-P-Us. However, for inferencing, we take a hybrid approach and we utilize both C-P-Us and G-P-Us. For performance benchmarking purposes, we also keep an eye on G-P-Us. Pre-training the transformer models is an expensive task. One way of controlling the costs for us is to make sure that all pre-trained models are easily accessible by all applied researchers and engineers across the company.

Selcuk Kopru (12:20): Therefore, we have built the ecosystem to share and access those large pre-trained M-L models in a very effective way. All the applied researchers, engineers at eBay can access and start fine tuning the models right away with a few lines of Python code. EBERT model, as many other big transformer models, is not the best model for online use cases, because of its size. Therefore, we use some techniques like model distillation and quantization to improve the throughput of the model. Teacher-student models allow us to build much smaller models with little accuracy trade-off. For example, we can claim three times faster models and three times more throughput can be achieved by giving up 3% of the accuracy.
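
The talk does not say which distillation recipe is used. As one common formulation of teacher-student training (an assumption here, not a description of eBay's setup), the student can be trained to match the teacher's temperature-softened output distribution. A minimal sketch of that loss, with invented logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Invented logits for a 3-class task.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

A student whose outputs track the teacher's gets a loss near zero, which is what lets a much smaller model inherit most of the larger model's behavior.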

Selcuk Kopru (13:16): This trade-off is a good one based on the benchmarks that we have measured in many of those use cases. Speaking of benchmarks, let me describe how we are measuring the performance of any model that we are building. Inspired by the original GLUE and SuperGLUE benchmarks, we have built eBay GLUE, which contains standardized tasks relevant to e-commerce. The tasks can be similarity tasks, classification, generation, tagging, you name it. We have standardized solvers to use the pre-trained models. You can plug in the transformer model and automatically fine tune for the appropriate task. It is extensible in the sense that it is very easy to add new e-commerce tasks and the relevant solvers. It also enables us to use the off the shelf transformer models and compare against them with the help of the Hugging Face Transformers library.

Selcuk Kopru (14:19): For similarity tasks, we use the pre-trained model and data pairs in a siamese setup to learn the representations. We train to maximize cosine similarity between positive examples and to minimize it between negative examples. The negative examples are acquired from the other pairs in the batch. Therefore, for good performance, it is crucial to have a large enough batch size and to select good negative examples. In the siamese setup, we usually use eBERT in the towers and we use contrastive loss productization. Let's look into the K-N-N serving part. The H-N-S-W library is one of the many K-N-N backends that we have looked into. We also looked into the FAISS backend and into the ScaNN backend. However, in this presentation, I will only talk about the H-N-S-W approximate K-N-N backend. We collaborated with Intel to optimize the existing H-N-S-W implementation for eight bit integer quantization, and we also utilize the D-L Boost library.
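
The idea of taking negatives from the other pairs in the batch can be illustrated with a small sketch. The exact loss used at eBay is not given in the talk; the softmax-over-the-batch form and the `temperature` value below are assumptions, and the tower outputs are invented 2-D vectors rather than real encoder outputs.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def in_batch_contrastive_loss(left, right, temperature=0.1):
    """Mean cross-entropy where left[i] should match right[i].

    Every right[j] with j != i acts as a negative for left[i], so a larger
    batch automatically supplies more negatives per positive pair.
    """
    left = [normalize(v) for v in left]
    right = [normalize(v) for v in right]
    total = 0.0
    for i, anchor in enumerate(left):
        logits = [
            sum(a * b for a, b in zip(anchor, other)) / temperature
            for other in right
        ]
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(left)

# Invented tower outputs for a batch of two (query, title) pairs.
query_vectors = [[1.0, 0.0], [0.0, 1.0]]
matching_titles = [[0.9, 0.1], [0.1, 0.9]]
mismatched_titles = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = in_batch_contrastive_loss(query_vectors, matching_titles)
loss_misaligned = in_batch_contrastive_loss(query_vectors, mismatched_titles)
```

The loss is small when each query is closest to its own title and large when the pairing is wrong, which is the pressure that pulls positives together and pushes in-batch negatives apart.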

Selcuk Kopru (15:28): Compared to the vanilla implementation, we have observed up to 2.5 times speed up in terms of latency, from 17 milliseconds to seven milliseconds, as shown in the left plot. The plot on the right side shows the gain in throughput. With 16 cores, the throughput is increased from 1000 to over 2000. The dotted line in the same plot tells us that the gain is dropping slightly as the number of threads is increasing. All these experiments are performed on an index with eight million vectors, each having 768 dimensions.
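
To give a feel for what eight bit integer quantization buys on an index of this size, here is a generic symmetric quantization sketch together with the storage arithmetic. The scheme is an assumption for illustration; the talk does not describe how the Intel-optimized implementation quantizes vectors.

```python
def quantize_int8(vector):
    """Map floats to integer codes in [-127, 127] with one scale per vector."""
    scale = max(abs(x) for x in vector) / 127.0 or 1.0  # avoid a zero scale
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

# Invented example vector.
original = [0.52, -1.0, 0.25, 0.003]
codes, scale = quantize_int8(original)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))

# Raw vector storage for the index size quoted in the talk
# (8 million vectors, 768 dimensions), ignoring graph overhead.
num_vectors, dims = 8_000_000, 768
float32_bytes = num_vectors * dims * 4
int8_bytes = num_vectors * dims * 1
```

Each value is stored in one byte instead of four, so the raw vectors shrink by a factor of four, and the rounding error per value stays within half a quantization step.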

Selcuk Kopru (16:09): Finally, in this section, I want to share a few latency and throughput numbers in a billion-scale index. Billion-scale is important for us because of the number of active listings at any time at eBay: we have 1.5 billion listings. The graph here shows some example results that are run on an index with over a billion vectors with 96 dimensions. Recall at one goes over 0.95 with 2.5 milliseconds, and throughput goes up to 7000 queries per second with 32 threads. These latency and throughput numbers play an important role in scaling the similarity solution to the eBay level. In this part of the presentation, I will share some use cases where we apply similarity to real life problems. This table shows some of the examples. For example, if you solve the user to item similarity problem, you can show better promoted listings to the buyers.

Selcuk Kopru (17:15): A few other use cases are listed in the table, but let's see some of them with a little more detail. Here in this slide, I'm explaining the query to title similarity. We have a siamese setup where we use eBERT as the encoder for queries and titles. The model parameters are shared in each tower of the tilted siamese setup. One tower is for the queries and the other one is for the titles. We use behavioral data such as items clicked together, or items bought together as a result of a single search query. Because of the transformer models used in this setup, the keyword based search is augmented with deep semantic understanding, that's very important. Let's see this with an example query. Here in this slide, I have a query, "Citizen eco drive olive." Without the similarity model, the last token is almost not taken into account. You would need to use synonym dictionaries or heavy query rewrites to retrieve the best results. However, with the similarity tool, because the transformer model has already the knowledge of olive being a color in this context, it can easily retrieve results that are green.

Selcuk Kopru (18:43): Let's see another use case. We have also developed a query to query similarity model. Why would you need to create a query similarity model? This similarity model is especially useful for queries which do not return any results, or very few results. The amount of no result queries and low result queries constitutes an important portion of all queries in an e-commerce marketplace. In this model, the towers of the siamese network are used to encode queries. How we are sampling the query pairs for the training set is interesting. We have first trained an item to item similarity system, and then we have paired queries whose item result sets are very similar, according to a threshold. Let me show you how this query to query system performs in reality. Our search query, in this case, is, "Raise the red lantern blu-ray," which is a movie title and the format. The results without similarity return only one relevant item and the other items in the result set are not relevant, because they were included after term dropping.
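
The pairing step can be sketched as follows. This is a simplification under stated assumptions: the talk compares result sets using a trained item to item similarity system, whereas this sketch uses plain Jaccard overlap of item ids, and the queries, item ids, and `threshold` value are all invented.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two result sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def sample_query_pairs(results_by_query, threshold=0.5):
    """Pair up queries whose result sets overlap at least `threshold`."""
    pairs = []
    for q1, q2 in combinations(sorted(results_by_query), 2):
        if jaccard(results_by_query[q1], results_by_query[q2]) >= threshold:
            pairs.append((q1, q2))
    return pairs

# Invented search logs: query -> ids of the items it returned.
results_by_query = {
    "red lantern blu-ray": [101, 102, 103],
    "raise the red lantern bluray": [102, 103, 104],
    "rechargeable table lamp": [900, 901],
}

training_pairs = sample_query_pairs(results_by_query)
```

Pairs produced this way can serve as positive examples for the siamese query encoder, so a query with no results can be rewritten to a nearby query that does return items.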

Selcuk Kopru (19:58): However, all the items in the results set with similarity are relevant to the query as you can see in the graph. This is a great example showing the positive impact of the similarity approach. Let's see another example. This time the search query is, "Taotronics rechargeable table lamp." Again, in the result set where we don't use the similarity enhancement, many items pop up in the top rank although they are not very close to the query. On the other hand, in the results set where we use the similarity enhancement, we get a much better result set.

Selcuk Kopru (20:41): So far we have seen use cases where we only used text modalities. In this example, we attempted to use different modalities, such as the title text and the image from an item, in an unsupervised setting using multitask learning. The first part of this architecture is a dual encoder, not exactly in a siamese setting, because the towers are different. One of them is to encode item titles and the other one is to encode images. The aim here is to map images and titles into a multimodal space with a contrastive loss, similar to the open-source paper and project. At the end of the first part, we are building an item embedding by concatenating title and image embeddings, and then add a linear layer for size reduction. In the second part, we have the item aspects, which are name-value pairs. At this point, we enrich the embeddings by training them for prediction of the randomly masked name-value pairs.

Selcuk Kopru (21:52): This entire architecture is trained together in one step. At the end of the training, we get the item embedding, which is at the center of the architecture. These item embeddings are proven to be quite useful in mapping item listings to the products in the catalog. This listing to product matching is one of the most challenging tasks that you can see in any e-commerce application. The architecture in this slide is a simplified version of the previous one. It is, again, an unsupervised solution, but this time with different text modalities, such as item title, category, item description, aspects, and queries. This time, we don't have an image here. Compared to the previous text similarity architectures, the advantage of this system is that, because it is unsupervised, there's no noise coming from user behavioral data, or bias from the search engine. What we do is we randomly split items into their components based on the modalities and feed them into the architecture. The contrastive loss is used to maximize similarity for positive examples and to minimize it for negative examples. This approach also works very well in many eBay GLUE tasks.

Selcuk Kopru (23:18): The final use case that I want to talk about is user to item similarity. We use this similarity architecture in retrieving the most relevant promoted listings for a user. We utilize the user events to encode every buyer as a vector representation. For example, after a user submits a query, many items are shown in the search result page. The user is clicking an item and not clicking an item, or adding an item to the cart. All these different events are defining a user. This way, we represent the buyers and the items in the same vector space, and then we are using the similarity between those two vectors to decide what promoted listings to show or not to show. Okay. This was the last use case that I wanted to share in this presentation.
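
How the user encoder works is not detailed in the talk. As a deliberately simple stand-in (an assumption, not eBay's model), a buyer can be represented as a weighted average of the vectors of the items they interacted with, which places users and items in the same space. The event weights, item ids, and vectors below are all hypothetical.

```python
import math

# Hypothetical weights: stronger purchase intent counts for more.
EVENT_WEIGHTS = {"view": 0.5, "click": 1.0, "add_to_cart": 2.0}

def user_vector(events, item_vectors):
    """Weighted average of the item vectors touched by the user's events."""
    dims = len(next(iter(item_vectors.values())))
    total = [0.0] * dims
    weight_sum = 0.0
    for event_type, item_id in events:
        weight = EVENT_WEIGHTS[event_type]
        for d, value in enumerate(item_vectors[item_id]):
            total[d] += weight * value
        weight_sum += weight
    return [value / weight_sum for value in total]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_promoted(user, candidate_ids, item_vectors):
    """Order candidate promoted listings by similarity to the user vector."""
    return sorted(
        candidate_ids,
        key=lambda item_id: cosine(user, item_vectors[item_id]),
        reverse=True,
    )

# Invented item embeddings and one buyer's recent events.
item_vectors = {
    "watch-a": [1.0, 0.0],
    "watch-b": [0.9, 0.1],
    "lamp": [0.0, 1.0],
}
events = [("click", "watch-a"), ("add_to_cart", "watch-a")]

buyer = user_vector(events, item_vectors)
ranking = rank_promoted(buyer, ["lamp", "watch-b"], item_vectors)
```

Because the buyer only interacted with a watch, the other watch ranks above the lamp among the promoted candidates.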

Selcuk Kopru (24:27): In conclusion of my talk, I want to repeat two important key takeaway messages from this presentation. My first message is about optimizations. In our journey from research to reality, we have to optimize for scale and performance. Fortunately, the tools and methods that we know of are very successful. Finally, my second message is about similarity. It is a powerful and flexible tool that helps in solving many A-I problems in a marketplace. We have seen many use cases in e-commerce and many others exist. I'm sure they also exist in other domains. Similarity is also a flexible tool, because different modalities and entities can be combined and represented in the same vector space, which enables many A-I applications. Thank you for watching and stay safe.
