Bachelor's Thesis, 2023
46 pages, Grade: 1.0
0. Introduction
0.1 Research Contributions
0.2 Structure of this Paper
1. Preliminaries
1.1 Uplift Modeling in Machine Learning
1.2 Profit Maximization in Uplift Modeling
1.3 Policy Learning and Multi-Armed Bandit Models
2. Reinforcement Learning for Uplift Modeling
2.1 Policy Learning Approaches to Uplift Modeling
2.2 Multi-Armed Bandit Models for Uplift Modeling
3. Related Literature
3.1 Profit Maximization through Uplift Modeling
3.2 Reinforcement Learning for Uplift Modeling
4. Experiment
4.1 Methodology
4.2 Data Sets
5. Empirical Results
5.1 Regret-Optimality as the Reward Metric for Policy Learning
6. Conclusion
7. Limitations and Further Research
References
The primary aim of this study is to formalize a novel approach to the profit maximization objective in uplift modeling by integrating policy learning and reinforcement learning (RL) techniques. In particular, it evaluates the efficacy of regret-optimal policy learning strategies in real-world business scenarios.
0.1 Research Contributions
This work aims to help bridge the research gap between uplift modeling and policy learning, with a focus on business contexts. To that end, it formalizes a novel approach to the profit maximization objective in uplift modeling that connects it with policy learning and MABs. Notably, existing attempts to tackle this objective with uplift modeling as a supervised learning technique alone have been fragmentary, focusing on individual aspects such as cost optimization (Zhao & Harinen, 2019), optimizing for expected revenue (Gubela et al., 2017), and allowing for multiple treatments (e.g., Olaya, Coussement & Verbeke, 2020; Zhao, Fang & Simchi-Levi, 2017), with few taking a holistic perspective (e.g., Baier & Stöcker, 2022).
Moreover, this study aims to evaluate the comparative efficacy of policy learning and simplified MAB models, with a particular emphasis on causal and contextual bandit models. The research seeks to provide a benchmark to determine the potential advantages or limitations of employing policy learning and MAB models for uplift modeling in real-world scenarios.
The scientific contribution thus consists of three main elements: formalizing the profit maximization objective in uplift modeling in terms of expected revenues and costs, connecting this objective with policy learning and MAB models, and benchmarking these approaches against established supervised learning techniques.
0.2 Structure of this Paper
The remainder of this work is organized as follows:
0. Introduction: Outlines the motivation for connecting uplift modeling with reinforcement learning and defines the research scope and contribution.
1. Preliminaries: Provides the foundational theory for uplift modeling, profit maximization, and policy learning mechanisms within machine learning.
2. Reinforcement Learning for Uplift Modeling: Formulates the uplift modeling problem using the Markov Decision Process (MDP) framework to enable regret-based optimization.
3. Related Literature: Reviews existing methodologies for profit maximization in uplift modeling and current adoptions of reinforcement learning in this field.
4. Experiment: Describes the methodology, including the use of the separate-model approach, the X-Learner, and bandit-based approaches, and introduces the two datasets used.
5. Empirical Results: Analyzes the quantitative performance of RL-based uplift models compared to traditional supervised learning techniques.
6. Conclusion: Summarizes the key findings and the theoretical advancements made by the proposed framework.
7. Limitations and Further Research: Identifies constraints of the current study, such as dataset size and attribute availability, and suggests future research directions.
Keywords: Uplift Modeling, Reinforcement Learning, Causal Learning, Multi-armed Bandit Models, Regret, Profit Maximization, Policy Learning, Machine Learning, Customer Lifetime Value, Business Performance, Supervised Learning, Markov Decision Process.
The thesis focuses on maximizing marketing profits by integrating uplift modeling with reinforcement learning and policy learning strategies, moving beyond traditional supervised learning approaches.
It bridges the gap between uplift modeling (UM) and reinforcement learning (RL), specifically utilizing policy learning and multi-armed bandit (MAB) frameworks.
The primary goal is to formalize a framework for profit maximization in uplift modeling that incorporates expected revenues and costs through regret-optimal policy learning.
The work employs a Markov Decision Process (MDP) framework and compares supervised learning (SL) techniques (like X-Learner) against reinforcement learning approaches (contextual multi-armed bandits, Q-learning).
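To illustrate the bandit side of this comparison, the following is a minimal, context-free epsilon-greedy bandit sketch for treatment assignment. It is a deliberate simplification: the thesis's models are contextual (they condition on customer features), and the class and parameter names here are hypothetical, not taken from the thesis.

```python
# Minimal epsilon-greedy multi-armed bandit for treatment assignment.
# Illustrative sketch only; omits the customer context used by contextual bandits.
import random

class EpsilonGreedyBandit:
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm (treatment option)
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore a random arm with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Usage: arm 0 = hold out, arm 1 = send offer; reward = observed net profit.
bandit = EpsilonGreedyBandit(n_arms=2, epsilon=0.1)
arm = bandit.select_arm()
bandit.update(arm, reward=1.0)
```

A contextual variant would replace the per-arm running means with a reward model over customer features, which is the setting the thesis benchmarks.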
The experiment benchmarks various uplift modeling techniques—including RF-based learners, X-Learners, and contextual bandits—against a new proprietary e-commerce dataset and the public Hillstrom dataset.
Key terms include Uplift Modeling, Reinforcement Learning, Regret, Profit Maximization, Policy Learning, and Causal Learning.
Regret-optimality acts as a performance measure that guides the model to learn a policy performing as close as possible to the optimal, reducing the business cost of sub-optimal decision-making.
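The regret notion above can be made concrete with a toy computation, assuming (hypothetically) that the true expected reward of every action is known in hindsight; cumulative regret is then the total reward gap between an oracle that always picks the best action and the learned policy.

```python
# Toy cumulative regret of a learned policy against an oracle.
# Assumes known true expected rewards per round, purely for illustration.
def cumulative_regret(true_rewards, chosen_actions):
    """true_rewards: one dict {action: expected_reward} per round;
    chosen_actions: the action the policy actually picked each round."""
    regret = 0.0
    for rewards, action in zip(true_rewards, chosen_actions):
        best = max(rewards.values())          # oracle's reward this round
        regret += best - rewards[action]      # gap paid by the policy
    return regret

rounds = [{"treat": 0.8, "hold": 0.5}, {"treat": 0.2, "hold": 0.6}]
print(cumulative_regret(rounds, ["treat", "treat"]))  # ≈ 0.4: round 2 was sub-optimal
```

A regret-optimal policy drives this quantity toward zero, which translates directly into avoided business cost from mis-targeted treatments.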
Incorporating costs allows the model to differentiate between treatments not just by their conversion probability, but by the net business value, preventing the targeting of individuals where the cost of the action outweighs the potential benefit.
Standard metrics like the AUUC or Qini coefficient fail to incorporate business-specific quantities such as expected revenue and operational costs, which this study addresses via financial performance metric (FPM) functions.
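The cost-aware scoring idea behind these two points can be sketched as a one-line expected-profit function. This is a hedged illustration, not the thesis's actual FPM functions; the inputs (estimated uplift, expected revenue per conversion, per-contact cost) are assumptions chosen to show why a positive conversion uplift alone does not justify treatment.

```python
# Toy expected-profit score for treating one customer.
# uplift: treatment-induced change in conversion probability,
# revenue: expected revenue per conversion, cost: per-contact treatment cost.
def expected_profit(uplift, revenue, cost):
    # Net business value of treating: incremental revenue minus action cost.
    return uplift * revenue - cost

# Positive uplift, yet the contact cost outweighs the incremental revenue,
# so a profit-aware policy would not target this customer.
print(expected_profit(uplift=0.02, revenue=40.0, cost=1.5))  # negative
```

A conversion-only uplift model would rank this customer as worth treating; the cost-aware score reverses that decision.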

