Product attribute extraction - Kiwi Data

Project Description

Bol.’s product catalog contains more than 37 million products across thousands of categories. Each product carries attributes: general ones such as brand and colour, but also highly product-specific ones such as the storage size of a laptop. Enriching products with these attributes is time-consuming, and it impacts searchability.

Our project aimed to automate attribute extraction from product titles and descriptions, improving overall content quality. The algorithm had to be transparent, scalable, and adaptable to diverse categories and new content. To address these challenges, we developed a hybrid solution: traditional regex patterns, built from Bol.’s rich product content databases, combined with binary machine learning classifiers that predict the probability that each match is correct.
Despite the simplicity of this architecture, our clear and transparent model won the support of business stakeholders, allowing us to move swiftly to production. Following a philosophy of starting simple and small, this approach served as a solid foundation for future innovations involving more intricate NLP models.
A training pipeline kept attribute values and related statistics up to date, while our prediction pipeline regularly processed new product data to refine and enrich attribute information. The model was also accessible via an API integrated into other internal portals.
In just six months, our compact team of two data scientists and two engineers successfully designed, developed, and implemented our attribute extraction system. The system automatically enriched millions of attribute values, leading to significant commercial improvements, enhanced product content quality, and a reduction in manual workload.
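
The hybrid idea described above, regex matches scored by a binary classifier, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the production system: the attribute values in KNOWN_VALUES, the single position feature, and the hand-set BRAND_WEIGHTS, which stand in for weights that a trained per-attribute classifier would supply.

```python
import math
import re

# Hypothetical known values per attribute, standing in for a product content database.
KNOWN_VALUES = {
    "brand": ["Lenovo", "Apple", "HP"],
    "colour": ["black", "silver", "apple green"],
}


def build_pattern(values):
    # Longest alternatives first, so multi-word values are preferred over shorter ones.
    alternatives = sorted((re.escape(v) for v in values), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


PATTERNS = {attr: build_pattern(vals) for attr, vals in KNOWN_VALUES.items()}


def find_candidates(text):
    """Return every regex match as an (attribute, matched text, start offset) triple."""
    return [
        (attr, m.group(0), m.start())
        for attr, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def features(text, start):
    # Illustrative features only: a bias term and the match's relative position.
    return [1.0, start / max(len(text), 1)]


def predict_proba(weights, x):
    # Logistic function over a linear score, as in a binary logistic classifier.
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: brand mentions early in a title are treated as more likely correct.
BRAND_WEIGHTS = [2.0, -6.0]


def score_brand_candidates(text):
    """Attach a probability to each brand match found by the regex step."""
    return [
        (value, start, predict_proba(BRAND_WEIGHTS, features(text, start)))
        for attr, value, start in find_candidates(text)
        if attr == "brand"
    ]


TITLE = "Apple MacBook Air 13 inch silver with apple green sleeve"
SCORED = score_brand_candidates(TITLE)
```

In this example the brand pattern matches "Apple" twice: once at the start of the title and once inside the colour "apple green". The regex step alone cannot tell the two apart, while the classifier step assigns the second match a much lower probability. Each score is also easy to trace back to a visible match and a small set of features, which fits the transparency requirement.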

Project Details

Skills Needed: