When you’re shopping online with Beni, we need to figure out what you’re shopping for so that we can show you relevant secondhand versions of it. When you’re on a merchant page (think Nike, Patagonia, etc.), we collect some product-level information from the page, such as the product image, the product title, the price, and the brand, and use that information to perform a search. All of that is factored into our search and ranking algorithm (learn more about our ranking algorithm in our previous post).
Here’s a great visual example of the kinds of information that are relevant to us:
Our product catalog is structured much like how you might organize your closet: we have all of our pants in one “drawer”, all of our shirts in another, etc. When you are shopping for pants, we are looking for matching pants (but secondhand and for half the price), and we don’t want to search through our whole catalog of 200M+ listings to find that exact item because it would take us way too long (and you probably won’t be waiting around). Instead, we want to go to our ‘pants drawer’ and search there specifically. That makes determining the category of the product you are looking at quite important for us: when you’re shopping for pants, we don’t want to be searching where we store the shirts and showing you shirts.
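To make the “drawer” idea concrete, here is a minimal sketch in Python. The listings, field names, and helper functions are all hypothetical and only illustrate the principle of searching one category at a time; they are not our actual catalog or search code.

```python
from collections import defaultdict

# Hypothetical listings; the fields and values are made up for illustration.
LISTINGS = [
    {"id": 1, "category": "pants", "title": "Slim chino pants"},
    {"id": 2, "category": "shirts", "title": "Oxford button-down shirt"},
    {"id": 3, "category": "pants", "title": "Relaxed fit jeans"},
    {"id": 4, "category": "jackets", "title": "Quilted puffer jacket"},
]


def build_drawers(listings):
    """Group listings by category so each 'drawer' can be searched on its own."""
    drawers = defaultdict(list)
    for listing in listings:
        drawers[listing["category"]].append(listing)
    return drawers


def search_drawer(drawers, category, query):
    """Look for the query only inside one category's drawer."""
    query = query.lower()
    return [
        item
        for item in drawers.get(category, [])
        if query in item["title"].lower()
    ]


drawers = build_drawers(LISTINGS)
pants_hits = search_drawer(drawers, "pants", "jeans")
```

Only the listings in the pants drawer are ever scanned here, which is the whole point: the shirts and jackets never enter the search.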
In most cases, we can figure out pretty easily that you’re looking at pants by performing a keyword search with our bank of keywords defining pants (pants, trousers, jeans, etc.) against the text descriptors that we can find on the page (breadcrumbs, product title, etc.). In other cases, the text descriptors that are provided are not actually super helpful in determining the product category.
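A keyword match of this kind can be sketched roughly as follows. The keyword bank and the `detect_category` helper are assumptions made for illustration (only the pants keywords come from the text above), and real matching would likely need more care around plurals and multi-word terms.

```python
import re

# Hypothetical keyword bank. Only the "pants" entries come from the post;
# the other categories and keywords are invented for this example.
CATEGORY_KEYWORDS = {
    "pants": ["pants", "trousers", "jeans"],
    "shirts": ["shirt", "shirts", "tee", "blouse"],
    "jackets": ["jacket", "jackets", "coat", "parka"],
}


def detect_category(text_descriptors):
    """Return the category whose keywords appear most often, or None."""
    text = " ".join(text_descriptors).lower()
    # Split into whole words so that "tee" does not match inside "steel".
    words = set(re.findall(r"[a-z]+", text))
    scores = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for keyword in keywords if keyword in words)
        if hits:
            scores[category] = hits
    if not scores:
        return None
    return max(scores, key=scores.get)


category = detect_category(["Men > Pants", "Slim Fit Chino Trousers"])
```

When none of the keywords appear in the page text, the helper returns `None`, which is exactly the situation where the text descriptors alone are not enough.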
Here’s an example. On QVC lives a product called the Bernardo Hooded Quilted Puffer Walker. The product page looks like this (also, check out that 50% off steal!):
In this case, we use a generative AI model, namely GPT, to help us extract additional product descriptors from the information that we have available. What is GPT? You might have heard of it in the context of ChatGPT, but it’s essentially a Large Language Model (LLM) that can generate new, coherent, and contextually relevant text based on the inputs it has received. It’s pre-trained on a large corpus of text data, encompassing diverse topics and domains, which enables the model to learn the general structure and nuance of human language.
Here’s how it works. We feed it the information we do have (you can see it’s not super useful):
- Product Title: Bernardo Hooded Quilted Puffer Walker
- Breadcrumbs: Product Detail | A566808
And we ask it to return some additional product tags. It responds with:
- Puffer Jackets
- Bernardo Hooded Quilted Puffer Walker
Aha! Now we have a lot more information about what the potential product category might be, and we can go searching through our catalog of jackets. And that’s how we use GPT to help us figure out what you’re shopping for with Beni!
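For the curious, the flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prompt wording, the helper names, and the keyword sets are hypothetical, and `fake_llm` is a stand-in that simply returns the tags from the example instead of calling a real model.

```python
# Hypothetical keyword sets used to map tags back to a catalog category.
CATEGORY_KEYWORDS = {
    "pants": {"pants", "trousers", "jeans"},
    "jackets": {"jacket", "jackets", "coat"},
}


def build_prompt(title, breadcrumbs):
    """Assemble the page information into a prompt (wording is an assumption)."""
    return (
        "Suggest a few short product tags that describe this item.\n"
        f"Product Title: {title}\n"
        f"Breadcrumbs: {breadcrumbs}\n"
        "Return one tag per line."
    )


def parse_tags(response_text):
    """Turn a bulleted or plain multi-line response into a list of tags."""
    tags = []
    for line in response_text.splitlines():
        tag = line.strip().lstrip("-•* ").strip()
        if tag:
            tags.append(tag)
    return tags


def fake_llm(prompt):
    """Stand-in for a real model call; returns the tags from the example."""
    return "- Puffer Jackets\n- Bernardo Hooded Quilted Puffer Walker"


def category_from_tags(tags):
    """Return the first category whose keywords show up in any tag."""
    for tag in tags:
        words = set(tag.lower().split())
        for category, keywords in CATEGORY_KEYWORDS.items():
            if words & keywords:
                return category
    return None


prompt = build_prompt(
    "Bernardo Hooded Quilted Puffer Walker", "Product Detail | A566808"
)
tags = parse_tags(fake_llm(prompt))
category = category_from_tags(tags)
```

In this sketch the original title and breadcrumbs contain no category keyword at all, but the “Puffer Jackets” tag does, so the lookup lands in the jackets drawer. Swapping `fake_llm` for a real model call is the only piece that would change in practice.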