Data-Driven Approach to Managing Best-Selling Beauty Categories: Price, Rating, Review, and Stock
Abstract
The beauty industry in Indonesia is growing rapidly, particularly through e-commerce platforms such as Tokopedia. Even so, many businesses still rely on intuition for product management, including stock and pricing decisions. This study develops a machine learning-based classification model to identify beauty products with high sales potential on Tokopedia, considering factors such as price, rating, review count, and stock availability. Ten classification algorithms are applied: Naive Bayes, SVM, K-Nearest Neighbors, Decision Tree, Random Forest, XGBoost, LightGBM, CatBoost, Extra Trees, and Multi-Layer Perceptron (MLP). The data are processed using Python on Google Colab. The results show that ensemble algorithms, particularly Random Forest, LightGBM, and Extra Trees, achieve prediction accuracy above 91% and are effective at predicting best-selling products. Based on this model, businesses can optimize stock and pricing management so that best-selling products remain available, thereby improving operational efficiency in a highly competitive market. This research offers a data-driven approach to more strategic, evidence-based product management on e-commerce platforms.
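To make the classification setup concrete, the following is a minimal sketch of one of the ten algorithms listed above, Gaussian Naive Bayes, written from scratch with the Python standard library. It is not the authors' pipeline. The data are synthetic, and the feature distributions, the 50/50 class balance, the 800/200 split, and all function names are assumptions made for illustration only. The study itself reports its best results with ensemble methods, which this sketch does not implement.

```python
import math
import random

# The four predictors named in the abstract.
FEATURES = ("price", "rating", "review_count", "stock")


def make_synthetic_products(n, seed=0):
    """Generate hypothetical (features, label) pairs; label 1 = best seller.

    The distributions below are invented for illustration and are not
    taken from the Tokopedia data used in the study.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        best_seller = rng.random() < 0.5
        if best_seller:
            x = [rng.gauss(80_000, 20_000), rng.gauss(4.8, 0.1),
                 rng.gauss(500, 120), rng.gauss(300, 80)]
        else:
            x = [rng.gauss(150_000, 40_000), rng.gauss(4.3, 0.3),
                 rng.gauss(60, 40), rng.gauss(100, 60)]
        rows.append((x, int(best_seller)))
    return rows


def fit_gaussian_nb(rows):
    """Estimate the class prior and per-feature mean/variance for each class."""
    model = {}
    for label in {y for _, y in rows}:
        xs = [x for x, y in rows if y == label]
        stats = []
        for j in range(len(FEATURES)):
            col = [x[j] for x in xs]
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(xs) / len(rows), stats)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += (-0.5 * math.log(2 * math.pi * var)
                      - (value - mean) ** 2 / (2 * var))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


# Rows are generated independently, so a simple slice acts as a random split.
data = make_synthetic_products(1000, seed=42)
train, test = data[:800], data[800:]
model = fit_gaussian_nb(train)
test_accuracy = accuracy(model, test)
print(f"Held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

Because the synthetic classes are deliberately well separated, the accuracy printed here says nothing about real product data. The sketch only shows the shape of the task: four numeric features per product, a binary best-seller label, and evaluation on a held-out split.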

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

