Several AI applications share a common thread, and eBay is solving search and query problems with a similarity approach: nearest-neighbor search, which uses algorithmic distance functions to determine how similar two or more objects are to each other. In a recent talk at Scale TransformX, an AI and machine learning conference, Selcuk Kopru, head of ML and NLP at eBay, described how the global marketplace uses transformer models to drive its similarity capabilities and how powerful similarity search can be applied to a wide range of use cases.
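To make the nearest-neighbor idea concrete, here is a minimal sketch of exact (brute-force) k-nearest-neighbor search. It assumes items have already been turned into embedding vectors and uses cosine distance as the distance function; the talk summary does not say which distance function or index eBay uses, and the item IDs and vectors below are hypothetical toy data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Exact k-NN search: score every item, keep the k closest."""
    # Cosine distance = 1 - cosine similarity, so smaller means more similar.
    scored = [(item_id, 1.0 - cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 3-dimensional embeddings standing in for real model outputs.
toy_index = {
    "iphone-13-case": [0.9, 0.1, 0.0],
    "iphone-13-screen-protector": [0.8, 0.3, 0.1],
    "garden-hose": [0.0, 0.2, 0.9],
}
top2 = nearest_neighbors([1.0, 0.2, 0.0], toy_index, k=2)
```

A linear scan like this is easy to reason about but touches every vector for every query; at the scale of millions of vectors, production systems typically switch to an approximate nearest neighbor index instead.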
“To build a marketplace that scales and performs well, it is important to 'productionalize' a lot of AI capabilities, and similarity is one of the most practical tools in our journey from research to reality,” Kopru said.
The figure below shows the different components that are important in the AI-driven marketplace and how the majority of them are powered by AI.
Figure. Concentric circles depict the AI-driven marketplace, which gets better as each component gets stronger. Image credit: eBay
The shopping experience is connected to those AI capabilities, Kopru said, stressing the importance of scalability and performance.
eBay is using similarity to create more effective searches and display listings, and to match items to other products, he said.
In a real-world context, eBay can use similarity to match item titles from marketplace listings to products in its catalog. That way, duplicate items can be flagged. A clean catalog is important in any e-commerce experience, he said.
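The title-to-catalog matching described above can be sketched as "find the most similar catalog product, and accept it only above a threshold." In this sketch a bag-of-words vector is a hypothetical stand-in for a learned title encoder, and the 0.8 threshold and catalog entries are illustrative assumptions, not eBay's values.

```python
import math
from collections import Counter
from typing import Dict, Optional, Tuple


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a learned title encoder: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_to_catalog(
    title: str, catalog: Dict[str, str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Return (product_id, score) for the closest product, or None if below threshold."""
    title_vec = embed(title)
    best_id, best_score = None, 0.0
    for product_id, product_title in catalog.items():
        score = cosine(title_vec, embed(product_title))
        if score > best_score:
            best_id, best_score = product_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None


catalog = {"P1": "Apple iPhone 13 128GB Blue", "P2": "Nike Air Max 90 White"}
match = match_to_catalog("apple iphone 13 128gb blue unlocked", catalog)  # ("P1", ~0.91)
no_match = match_to_catalog("vintage leather jacket", catalog)            # None
```

A listing that matches an existing product with high similarity is a candidate duplicate to flag or merge; a listing with no match above the threshold may describe a product the catalog does not yet contain.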
Similarity can also be defined between other entities. In a consumer context, it might be the similarity between a text query and an item title; for a seller, it could be the similarity between an image and a title. Calculating the similarity between two queries is useful because a query that returns no search results can be rewritten into a similar query that does, and query similarity can also help retrieve more relevant search results. The approach can also let someone take ambiguous data types and boil them down to a common format, according to Anirudh Jain, machine learning engineer at Scale AI.
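Query rewriting of this kind can be sketched as a nearest-neighbor lookup restricted to past queries that are known to return results. The sketch below assumes queries are already embedded as vectors; the logged queries, their vectors, the result counts, and the 0.8 cutoff are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rewrite_null_query(
    query_vec: Sequence[float],
    logged_queries: Dict[str, Sequence[float]],
    result_counts: Dict[str, int],
    min_similarity: float = 0.8,
) -> Optional[str]:
    """Return the most similar logged query that has results, or None."""
    best_query: Optional[str] = None
    best_score = min_similarity
    for text, vec in logged_queries.items():
        if result_counts.get(text, 0) == 0:
            continue  # rewriting into another null query would not help
        score = cosine_similarity(query_vec, vec)
        if score >= best_score:
            best_query, best_score = text, score
    return best_query


# Hypothetical query embeddings and result counts.
logged = {
    "iphone 13 case": [0.9, 0.1, 0.0],
    "phone cover": [0.7, 0.5, 0.1],
    "garden hose": [0.0, 0.1, 0.95],
}
counts = {"iphone 13 case": 5400, "phone cover": 12000, "garden hose": 0}

rewritten = rewrite_null_query([0.88, 0.15, 0.02], logged, counts)  # e.g. a misspelled "iphnoe 13 cse"
fallback = rewrite_null_query([0.0, 0.05, 1.0], logged, counts)     # only close match has no results
```

The minimum-similarity cutoff guards against rewriting a query into something unrelated, which would hurt relevance more than showing no results.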
Being able to calculate the similarity between a title and an image “will open new doors for us,” Kopru said.
Scalability and performance are key challenges for a global marketplace like eBay, which handles a huge volume and velocity of transactions, he said. That means latency and throughput numbers play a critical role in whether any AI model can go to production.
For online use cases, data scientists need to build efficient, lightweight models that are optimized for the target infrastructure. So any AI system should be optimized before it is deployed to production, he said.
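Since latency and throughput decide whether a model ships, it helps to measure them the same way every time. Below is a minimal, single-threaded benchmarking sketch; the `benchmark` helper, its warm-up count, and the nearest-rank p99 are illustrative choices, not a description of eBay's tooling.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[], object], n_requests: int = 200, warmup: int = 20) -> Dict[str, float]:
    """Time repeated calls to fn; report median/p99 latency and sequential throughput."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before measuring
    latencies_ms: List[float] = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed_s = time.perf_counter() - start
    latencies_ms.sort()
    # Nearest-rank 99th percentile on the sorted latencies.
    p99_index = max(0, math.ceil(0.99 * n_requests) - 1)
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
        "throughput_per_s": n_requests / elapsed_s,
    }


# Hypothetical workload standing in for a model inference call.
stats = benchmark(lambda: sorted(range(10_000), reverse=True), n_requests=50, warmup=5)
```

Throughput measured this way reflects one worker calling the function back to back; serving systems usually raise it further by running many requests in parallel across cores.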
eBay approaches similarity problems at scale and optimizes as much as possible, using eBERT, an in-house model based on Bidirectional Encoder Representations from Transformers (BERT). BERT is a deep learning model in which every output element is connected to every input element, and the weightings between them are calculated dynamically based on those connections, a mechanism known as attention. eBay trains its transformer models on GPUs and CPUs.
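The "dynamically calculated weightings" in BERT come from scaled dot-product attention, softmax(QKᵀ/√d_k)·V, as defined in the original Transformer architecture. This sketch shows a single attention head in plain Python; it is a generic illustration, not eBERT's implementation, and it skips the learned query/key/value projections and multiple heads a real BERT layer uses.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = len(k[0])
    # Every query position is scored against every key position (all-to-all).
    scores = [
        [sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the value rows.
    output = [
        [sum(w * v_row[j] for w, v_row in zip(w_row, v)) for j in range(len(v[0]))]
        for w_row in weights
    ]
    return output, weights


# Toy sequence of three 2-dimensional token vectors, used directly as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `attn` sums to 1, so every output token is a mixture of all input tokens, weighted by how strongly they relate to it.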
Because pre-training transformer models is expensive, eBay keeps costs down by making its pre-trained models available to all of its applied researchers and engineers, who can start fine-tuning them right away with a few lines of Python code, Kopru said.
For benchmarking, Kopru’s team was inspired by the original GLUE and SuperGLUE benchmarks and used them to build eBay GLUE, which he said contains standardized tasks relevant to e-commerce. For similarity tasks, the team fed pairs of data to the pre-trained model in a Siamese setup to learn the representations. To achieve good performance, it is crucial to have a large enough sample size and good negative examples, he said.
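In a Siamese setup, both sides of a pair go through the same encoder with the same weights, and training pushes matching pairs together and non-matching pairs apart. The talk summary does not name eBay's loss function, so this sketch swaps in a common choice, triplet margin loss, together with hard-negative selection. The toy token-vector encoder, the vocabulary, and the 0.6 margin are hypothetical; the point is why good negative examples matter.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy token vectors; a real setup would use a fine-tuned transformer.
TOKEN_VECTORS: Dict[str, List[float]] = {
    "iphone": [1.0, 0.0, 0.0],
    "case": [0.0, 1.0, 0.0],
    "cover": [0.0, 1.0, 0.0],
    "charger": [0.0, 0.0, 1.0],
}
DIM = 3


def encode(text: str) -> List[float]:
    """Shared (Siamese) encoder: sum token vectors, then L2-normalize."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        for i, value in enumerate(TOKEN_VECTORS.get(token, [0.0] * DIM)):
            vec[i] += value
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float) -> float:
    """Zero once the positive beats the negative by at least `margin`."""
    return max(0.0, margin - dot(anchor, positive) + dot(anchor, negative))


def hardest_negative(anchor: Sequence[float], candidates: List[str]) -> str:
    """Pick the non-matching candidate the encoder currently finds most similar."""
    return max(candidates, key=lambda text: dot(anchor, encode(text)))


anchor = encode("iphone case")
positive = encode("iphone cover")
hard = hardest_negative(anchor, ["charger", "iphone charger"])  # "iphone charger"
hard_loss = triplet_loss(anchor, positive, encode(hard), margin=0.6)        # about 0.1
easy_loss = triplet_loss(anchor, positive, encode("charger"), margin=0.6)   # 0.0
```

The easy negative already sits far from the anchor and contributes zero loss, so it teaches the model nothing; the hard negative, a different product that shares a word with the anchor, still produces a training signal.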
Ultimately, eBay has observed up to a 2.5x speedup in latency, from 17 milliseconds to 7 milliseconds. With 16 cores, throughput increased from 1,000 to more than 2,000, he said. All of the team’s experiments were performed on an index of eight million vectors, each with 768 dimensions.
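A quick back-of-the-envelope check puts these numbers in context. The latency and throughput ratios come straight from the figures above; the index size and per-query cost assume flat, uncompressed float32 vectors and brute-force search, which the talk summary does not confirm.

```python
def speedup(before: float, after: float) -> float:
    """How many times smaller the new latency is than the old one."""
    return before / after


def flat_index_bytes(n_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of an uncompressed vector index (assumes 4-byte float32 values)."""
    return n_vectors * dims * bytes_per_value


latency_speedup = speedup(17.0, 7.0)            # about 2.43x
throughput_gain = 2_000 / 1_000                 # 2.0x, a lower bound since it was "more than 2,000"
index_bytes = flat_index_bytes(8_000_000, 768)  # 24,576,000,000 bytes
index_gib = index_bytes / 2**30                 # about 22.9 GiB
mults_per_query = 8_000_000 * 768               # 6,144,000,000 multiply-adds for one exact scan
```

A 17 ms to 7 ms drop is about 2.4x, in line with the "up to 2.5 times" figure, and an exact scan over this index would cost billions of multiply-adds per query, which is why latency optimization matters at this scale.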
Augmenting keyword search with deep semantic understanding is key to solving similarity problems, and it lets eBay show buyers better promoted listings. eBay also developed a query-to-query similarity model that is especially useful for queries that return no results or very few results, Kopru said. Without similarity, a search query may return only one relevant item alongside other items that are not relevant, he said.
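One simple way to augment keyword search with semantic understanding is to blend a lexical match score with an embedding similarity score. This is a swapped-in illustrative technique, not eBay's ranking formula; the listings, their vectors, and the blending weight `alpha = 0.3` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def keyword_score(query: str, title: str) -> float:
    """Fraction of query tokens that literally appear in the title."""
    q_tokens = query.lower().split()
    title_tokens = set(title.lower().split())
    if not q_tokens:
        return 0.0
    return sum(token in title_tokens for token in q_tokens) / len(q_tokens)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_rank(
    query: str,
    query_vec: Sequence[float],
    listings: Dict[str, Tuple[str, Sequence[float]]],
    alpha: float = 0.3,
) -> List[Tuple[str, float]]:
    """Rank by alpha * keyword score + (1 - alpha) * semantic score, highest first."""
    scored = []
    for listing_id, (title, vec) in listings.items():
        score = alpha * keyword_score(query, title) + (1 - alpha) * cosine_similarity(query_vec, vec)
        scored.append((listing_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical listings with toy 2-dimensional title embeddings.
listings = {
    "L1": ("Nike Air Max running shoes", [0.95, 0.05]),
    "L2": ("sneakers cleaning kit", [0.2, 0.9]),
    "L3": ("Adidas sneakers size 10", [0.9, 0.1]),
}
ranking = hybrid_rank("sneakers", [1.0, 0.0], listings)
```

A keyword-only ranker would score L2 and L3 equally and drop L1, which shares no words with the query; the blended score keeps the exact match on top, surfaces the semantically relevant shoe listing next, and pushes the accessory down.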
Similarity is ideal for solving AI problems in a marketplace. In a large company like eBay, it is a powerful and flexible tool because different modalities and entities can be combined and represented in the same vector space. Similarity can be used to solve many marketplace problems, such as retrieving the best candidates for assessing and ranking search results and identifying duplicate products, and Kopru said it has enabled many AI applications. These tools and techniques also help optimize for scale and performance, which are other critical elements of running a global marketplace.
For more details on how eBay is using similarity techniques, watch Kopru’s talk, “Building Similarity into an AI-Driven Marketplace,” and read the full transcript here.