We spent two years optimizing vehicle routing for a large line-haul delivery network. We found that standard OR solvers (such as Google OR-Tools) struggled with the dynamic nature of the requests, while pure reinforcement learning (RL) agents would not converge.
We ended up building a hybrid architecture that splits the logic:
1. MARL (multi-agent reinforcement learning) agents act as "Fleet Managers" that handle high-level strategy (when to dispatch, which cluster to serve).
2. Linear programming acts as a "Bin Packer" that enforces strict physical constraints on the final route.
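To make the split concrete, here is a minimal Python sketch of the two layers. It is an illustration under stated assumptions, not the production code: `choose_cluster` is a hypothetical hand-written rule standing in for the learned MARL policy, and `pack_trucks` uses a first-fit-decreasing heuristic in place of the linear program, purely so the example runs with the standard library. All names, fields, and thresholds are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    shipment_id: str
    cluster: str       # hypothetical delivery-cluster label
    weight_kg: float


def choose_cluster(pending, dispatch_threshold_kg):
    """Strategy layer (stand-in for the learned policy, an assumption).

    Returns the cluster with the most queued weight, or None to hold
    the trucks until more freight accumulates.
    """
    totals = {}
    for s in pending:
        totals[s.cluster] = totals.get(s.cluster, 0.0) + s.weight_kg
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] >= dispatch_threshold_kg else None


def pack_trucks(shipments, capacity_kg):
    """Constraint layer (first-fit decreasing, swapped in for the LP).

    Every returned truck load respects the weight capacity, whatever
    the strategy layer asked for.
    """
    trucks, loads = [], []
    for s in sorted(shipments, key=lambda x: x.weight_kg, reverse=True):
        if s.weight_kg > capacity_kg:
            raise ValueError(f"{s.shipment_id} exceeds truck capacity")
        for i, load in enumerate(loads):
            if load + s.weight_kg <= capacity_kg:
                trucks[i].append(s)
                loads[i] += s.weight_kg
                break
        else:
            # No existing truck has room, so open a new one.
            trucks.append([s])
            loads.append(s.weight_kg)
    return trucks


def dispatch_step(pending, capacity_kg, dispatch_threshold_kg):
    """One decision step: strategy picks a cluster, packer enforces limits."""
    cluster = choose_cluster(pending, dispatch_threshold_kg)
    if cluster is None:
        return None, []
    selected = [s for s in pending if s.cluster == cluster]
    return cluster, pack_trucks(selected, capacity_kg)
```

The point of the structure is that the strategy layer can propose anything, while feasibility is decided only in the constraint layer, so a bad policy output cannot produce an overloaded truck.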
The article details the architecture, the specific reward shaping we used to encourage LTL (Less-Than-Truckload) consolidation, and how we normalized the observation space to achieve zero-shot generalization across different warehouse sizes.
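As a rough illustration of those two ideas, the Python sketch below expresses observations as ratios of each warehouse's own limits, so a small and a large site produce the same vector, and it rewards truck fill rate while penalizing late shipments. The feature choices, function names, and weights are assumptions made for this example; the actual observation space and reward terms are the ones described in the article.

```python
def normalize_observation(queued_weight_kg, warehouse_capacity_kg,
                          trucks_available, fleet_size,
                          oldest_wait_h, max_wait_h):
    """Scale raw counts into [0, 1] ratios (hypothetical feature set).

    Dividing by per-site limits removes absolute warehouse size from
    the observation, which is what lets one policy see all sites alike.
    """
    def ratio(value, scale):
        if scale <= 0:
            return 0.0
        return min(max(value / scale, 0.0), 1.0)

    return [
        ratio(queued_weight_kg, warehouse_capacity_kg),
        ratio(trucks_available, fleet_size),
        ratio(oldest_wait_h, max_wait_h),
    ]


def consolidation_reward(load_kg, capacity_kg, late_shipments,
                         fill_weight=1.0, late_penalty=0.5):
    """Reward fuller trucks, penalize lateness (weights are assumptions)."""
    if capacity_kg <= 0:
        raise ValueError("capacity_kg must be positive")
    fill = min(load_kg / capacity_kg, 1.0)
    return fill_weight * fill - late_penalty * late_shipments


# A small and a large warehouse at the same relative state look identical.
small = normalize_observation(500, 1_000, 2, 4, 6, 12)
large = normalize_observation(50_000, 100_000, 20, 40, 6, 12)
```

The lateness penalty matters in a sketch like this: with a fill-rate term alone, the best strategy is to hold every truck until it is full, regardless of how long shipments wait.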
Happy to answer questions about the stack or the specific failure/success cases we ran into.