Project Description

Bol, whose catalog exceeds 37 million items, relied on a laborious manual process to categorize its products. The goal of this project was to build a product category classification model and expose it through a REST API.

We built an ensemble of three machine learning models. Two NLP models were trained on product titles and descriptions, and the third was trained on product images using transfer learning. We also built a training pipeline in Dataflow on GCP to retrain and deploy the models.
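The description does not say how the three models' outputs are combined, so the following is only a minimal sketch of one common option, weighted soft voting, in which each model's class probabilities are averaged and the highest-scoring category wins. The function name `ensemble_predict`, the category labels, and the equal weights are all hypothetical and chosen for illustration.

```python
from typing import Dict, List


def ensemble_predict(model_probs: List[Dict[str, float]],
                     weights: List[float]) -> str:
    """Weighted soft voting (an assumed combination rule, not the project's).

    model_probs holds one {category: probability} dict per model,
    e.g. title model, description model, image model.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    combined: Dict[str, float] = {}
    for probs, weight in zip(model_probs, weights):
        for category, p in probs.items():
            # Accumulate each model's normalized, weighted vote per category.
            combined[category] = combined.get(category, 0.0) + weight * p / total

    # Return the category with the highest combined score.
    return max(combined, key=combined.get)


# Illustrative probabilities only; these are not real model outputs.
title_probs = {"books": 0.6, "toys": 0.4}
description_probs = {"books": 0.5, "toys": 0.5}
image_probs = {"books": 0.2, "toys": 0.8}

best = ensemble_predict([title_probs, description_probs, image_probs],
                        weights=[1.0, 1.0, 1.0])
print(best)  # toys
```

With equal weights the image model's strong vote outweighs the title model's weaker one. In practice the weights would be tuned on a validation set, or replaced by a learned meta-classifier.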

We served the model through a Flask API that supports batch predictions. Other teams integrated the API into internal portals and into external APIs for partners. As a result, product uploads required significantly less manual work, and correcting misclassified products improved the accuracy of product content.
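As a rough illustration of what a batch prediction endpoint in Flask can look like, here is a minimal sketch. The route `/predict`, the request shape (`{"products": [...]}`), and the `classify` stand-in are assumptions made for this example; the actual API contract and model are not described above.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(product: dict) -> str:
    """Hypothetical stand-in for the ensemble model (a trivial keyword rule)."""
    title = str(product.get("title", "")).lower()
    return "books" if "book" in title else "unknown"


@app.route("/predict", methods=["POST"])
def predict():
    # Assumed request body: {"products": [{"title": ...}, ...]}
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify(error="expected a JSON object"), 400

    products = payload.get("products")
    if not isinstance(products, list) or not all(
        isinstance(p, dict) for p in products
    ):
        return jsonify(error="'products' must be a list of objects"), 400

    # One prediction per product, returned in the same order as the input.
    predictions = [classify(p) for p in products]
    return jsonify(predictions=predictions)
```

Accepting a list per request lets callers classify many products in one round trip, and a real model can typically score the whole batch in a single vectorized call. During development the app can be started with `flask run` and exercised by POSTing a JSON body in the assumed shape to `/predict`.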