Bandit Algorithms
1. Bandit Algorithms
Bandit algorithms (often called multi-armed bandits) are adaptive decision-making strategies used to balance exploration (trying out different options) and exploitation (choosing the best-known option).
- Example: An online marketplace deciding which product recommendation to show. The algorithm “tests” multiple products and gradually leans toward the one with the highest expected payoff.
- Core idea: Maximize cumulative reward over time while learning which choices perform best.
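The explore/exploit trade-off above can be sketched with a minimal epsilon-greedy bandit. This is a generic illustration, not tied to any particular product or library; the arms and their conversion rates are hypothetical.

```python
import random

def epsilon_greedy(reward_fn, n_arms, n_rounds, epsilon=0.1, seed=0):
    """With probability epsilon, explore a random arm; otherwise exploit
    the arm with the best running mean reward."""
    rng = random.Random(seed)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        r = reward_fn(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
        total += r
    return total, values

# Hypothetical arms: arm 1 converts 30% of the time, arm 0 only 10%.
def bernoulli(arm, rng):
    return 1.0 if rng.random() < (0.1, 0.3)[arm] else 0.0
```

Over enough rounds, the running means converge toward the true rates and traffic shifts to the better arm, while the epsilon fraction keeps testing the alternatives.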
2. Business Constraints
In theory, a pure bandit algorithm optimizes only for statistical reward. But in the real world, decisions are subject to constraints such as:
- Processing Fees: Each transaction may incur a cost (e.g., credit card fees, API call costs). The algorithm can’t just maximize user conversions; it must factor in the net profit after fees.
- Contractual Obligations: Sometimes, contracts mandate a minimum number of impressions, clicks, or transactions for a given partner/vendor. Even if the algorithm finds that a certain option performs poorly, it still must allocate traffic to satisfy agreements.
- Fairness/Compliance: Regulations (e.g., anti-discrimination laws, licensing rules) may restrict how outcomes are distributed.
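The fees point can be made concrete: the quantity the bandit should learn is expected net profit per attempt, not raw conversion rate. A sketch with hypothetical numbers:

```python
def expected_net_reward(success_rate, revenue_per_success, fee_per_attempt):
    """Expected profit per attempt: gross expected revenue minus the fee."""
    return success_rate * revenue_per_success - fee_per_attempt

# Hypothetical options: the higher-converting one loses on net profit
# once its larger per-attempt fee is subtracted.
a = expected_net_reward(0.90, 1.00, 0.25)  # 0.90 - 0.25 = 0.65
b = expected_net_reward(0.85, 1.00, 0.05)  # 0.85 - 0.05 = 0.80
```

Here option B wins on net reward despite converting less often, which is exactly the situation a fee-blind bandit would get wrong.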
3. Interaction Between the Two
This is where it gets interesting:
- A bandit algorithm would normally keep optimizing toward the highest-performing arm.
- Business constraints act as hard or soft boundaries that modify the optimization space.
Example:
Suppose a payment app is routing transactions between two providers:
- Provider A: Lower fees, higher success rate (better from the bandit’s perspective).
- Provider B: Higher fees, lower success rate, but contractual obligation says at least 30% of transactions must go through B.
The algorithm has to adapt:
- It can’t just route everything to A.
- It must satisfy the 30% quota for B while still trying to maximize net reward.
This usually requires constrained bandit algorithms (for example, bandits with knapsacks), where the optimization accounts for both rewards and resource or constraint consumption.
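One simple way to enforce such a quota, sketched below with hypothetical net-reward numbers, is to check B's running share each round and force-route to B whenever it falls behind; otherwise the algorithm behaves like an ordinary epsilon-greedy bandit. This is an illustrative heuristic, not the full bandits-with-knapsacks machinery.

```python
import random

def constrained_route(n_rounds, quota_b=0.30, epsilon=0.1, seed=0):
    """Route transactions between provider A (arm 0) and provider B (arm 1),
    guaranteeing B receives at least `quota_b` of total traffic."""
    rng = random.Random(seed)
    net = (0.65, 0.40)  # hypothetical net success rates: A beats B on reward
    counts = [0, 0]
    values = [0.0, 0.0]
    for t in range(1, n_rounds + 1):
        if counts[1] < quota_b * t:
            arm = 1                                           # enforce B's quota
        elif rng.random() < epsilon:
            arm = rng.randrange(2)                            # explore
        else:
            arm = max(range(2), key=lambda a: values[a])      # exploit
        r = 1.0 if rng.random() < net[arm] else 0.0
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]        # running mean
    return counts

counts = constrained_route(10_000)
share_b = counts[1] / sum(counts)  # stays at or above ~0.30
```

Within the remaining 70% of traffic, the bandit still learns that A has the higher net reward and routes the unconstrained volume there, maximizing reward inside the feasible region.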
4. Why It Matters
- Business reality vs. pure optimization: Algorithms must reflect economic, legal, and contractual realities.
- Net profit focus: By incorporating fees and obligations, companies avoid situations where an algorithm appears optimal mathematically but is unsustainable financially or legally.
- Scalability: As operations grow, constraints multiply (e.g., multi-country compliance, partner obligations), so algorithms must evolve accordingly.
👉 In short: Bandit algorithms give you an efficient way to learn and optimize decisions dynamically, but business constraints like fees and contracts reshape the optimization problem into a constrained bandit setup—where the goal is not just to maximize reward, but to maximize feasible reward.