HAAM-RL with Ensemble Inference Method


1. Objective

This paper introduces HAAM-RL, a reinforcement learning approach designed to optimize the color batching re-sequencing problem in automobile painting processes. Traditional heuristic algorithms are limited in how accurately they reflect real-world constraints and predict logistics performance. The proposed methodology combines a tailored Markov Decision Process (MDP) formulation, Potential-Based Reward Shaping, heuristic algorithm-based action masking (HAAM-RL), and ensemble inference to improve performance. Across 30 scenarios, HAAM-RL with ensemble inference achieves a 16.25% improvement over conventional heuristic algorithms, with stable and consistent results. The study highlights the performance and generalization capabilities of the proposed approach in optimizing complex manufacturing processes, and suggests future research directions such as alternative state representations and the integration of model-based RL methods. It is also the first case in which an external simulator was connected to our RL MLOps platform.
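
Potential-Based Reward Shaping adds a term of the form gamma * phi(s') - phi(s) to the environment reward. The sketch below illustrates that form only. The state layout, the `potential` function, and the default `gamma` are assumptions made for illustration; the potential actually used in the paper is not stated in this summary.

```python
from typing import Callable, Dict

# Hypothetical state: only a running count of color changes.
State = Dict[str, int]


def potential(state: State) -> float:
    """Hypothetical potential: fewer color changes so far is better."""
    return -float(state["color_changes"])


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    gamma: float = 0.99,
    phi: Callable[[State], float] = potential,
) -> float:
    """Return r + gamma * phi(s') - phi(s), the potential-based shaping form."""
    return reward + gamma * phi(next_state) - phi(state)
```

With this assumed potential, a transition that adds one color change receives a negative shaping term, while a transition that keeps the same color is only affected through the discount factor.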

2. Methodology

The main skills I used in this project are PyTorch, FlexSim, SAC, PPO, MLOps, and FastAPI.

Three main contributions are presented in this paper.

  1. RL MDP
  2. Ensemble Method
  3. RL MLOps with FlexSim

3. Contributions

  • HAAM-RL: When defining the RL MDP, we use heuristic algorithms to reduce the action space through action masking. This made training more stable and sped up the convergence of the RL agent.
  • Ensemble Inference Method: By training with different neural network hyperparameters and reward tunings, we collected numerous RL models. We used two methods to combine these models when evaluating in FlexSim: 1) hard voting and 2) soft voting. Hard voting selects the action chosen most frequently across all models in a given state. Soft voting normalizes the logits of each model's action distribution, sums the normalized values per action index across all models, and selects the action with the largest total.
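
The action masking and the two voting schemes described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the heuristic is represented only as a list of allowed/forbidden flags, the mask is assumed to be applied at inference time as well, and the normalization in soft voting is assumed to be a softmax.

```python
import math
from collections import Counter
from typing import List, Sequence


def mask_logits(logits: Sequence[float], allowed: Sequence[bool]) -> List[float]:
    """Set the logits of actions rejected by the heuristic to -inf."""
    return [x if ok else -math.inf for x, ok in zip(logits, allowed)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalize logits into probabilities; masked (-inf) entries become 0.

    Assumes at least one action is allowed, so the maximum is finite.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_vote(per_model_logits: Sequence[Sequence[float]],
              allowed: Sequence[bool]) -> int:
    """Each model votes for its best allowed action; the most common vote wins.

    Ties are broken in favor of the action that was voted for first.
    """
    votes = []
    for logits in per_model_logits:
        masked = mask_logits(logits, allowed)
        votes.append(max(range(len(masked)), key=masked.__getitem__))
    return Counter(votes).most_common(1)[0][0]


def soft_vote(per_model_logits: Sequence[Sequence[float]],
              allowed: Sequence[bool]) -> int:
    """Sum each model's normalized probabilities and pick the largest total."""
    totals = [0.0] * len(allowed)
    for logits in per_model_logits:
        probs = softmax(mask_logits(logits, allowed))
        totals = [t + p for t, p in zip(totals, probs)]
    return max(range(len(totals)), key=totals.__getitem__)
```

The two schemes can disagree. With three models producing logits `[2.0, 1.9, 5.0]`, `[1.0, 0.9, 5.0]`, and `[0.0, 4.0, 5.0]`, and the heuristic forbidding the third action, hard voting picks action 0 (two of three votes), while soft voting picks action 1, because the third model is far more confident in it than the other two are in action 0.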

  • RL MLOps with FlexSim: A commercial simulator can be integrated with BakingSoDA by exchanging data over HTTP. For BakingSoDA to make calls to the simulator, the simulator side should expose the Connector REST API, implemented according to its specification.
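
To make the HTTP exchange concrete, here is a minimal sketch of a simulator-side connector using only the Python standard library. Everything in it is an assumption for illustration: the `/reset` and `/step` routes, the JSON fields, and the `ToySimulator` class are hypothetical and do not reproduce the actual Connector REST API specification or FlexSim's interface. The project itself lists FastAPI; the standard library is used here only to keep the example self-contained.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Optional


class ToySimulator:
    """Stand-in for the external simulator; counts color changes."""

    def __init__(self) -> None:
        self.last_color: Optional[str] = None
        self.color_changes = 0

    def reset(self) -> Dict[str, Any]:
        self.last_color = None
        self.color_changes = 0
        return {"color_changes": 0}

    def step(self, color: str) -> Dict[str, Any]:
        if self.last_color is not None and color != self.last_color:
            self.color_changes += 1
        self.last_color = color
        return {"color_changes": self.color_changes}


SIM = ToySimulator()


def dispatch(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map a (hypothetical) route to a simulator call."""
    if path == "/reset":
        return SIM.reset()
    if path == "/step":
        return SIM.step(payload["color"])
    raise KeyError(path)


class ConnectorHandler(BaseHTTPRequestHandler):
    """Decode a JSON POST body, call the simulator, and reply with JSON."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        try:
            body, status = dispatch(self.path, payload), 200
        except KeyError:
            body, status = {"error": "unknown route or missing field"}, 404
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Start the connector; the RL platform would POST to this address."""
    HTTPServer(("127.0.0.1", port), ConnectorHandler).serve_forever()
```

Keeping the routing in `dispatch` separates the simulator logic from the HTTP layer, so the same functions could sit behind a different web framework without changes.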

4. Experiment Results

Experiments show that our final algorithm, HAAM-RL, achieved a 16.25% improvement over the heuristic algorithm. The heuristic algorithm required 34 color changes for 100 vehicles, whereas HAAM-RL required 29. As the number of vehicles and the complexity of the environment increase, we expect HAAM-RL to reduce overall cost while increasing production efficiency.

To verify the stability and generalization potential of the result, we also ran experiments on 30 scenarios and analyzed their variance and standard deviation. The results showed a mean of 29.57, a variance of 6.530, and a standard deviation of 2.555. Furthermore, a one-sample t-test gave a value of 0.05, which we take as indicating minimal fluctuation in the data.
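
The summary statistics above can be computed as in the following sketch. The per-scenario values are not given in this summary, so the function takes them as input, and the `reference` value for the one-sample t statistic is a caller-supplied assumption. The use of the sample variance (n - 1 denominator) is also an assumption. The last line only checks that the reported variance and standard deviation are consistent with each other.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float], reference: float) -> Dict[str, float]:
    """Sample mean, variance, standard deviation, and one-sample t statistic."""
    n = len(samples)
    mean = statistics.mean(samples)
    var = statistics.variance(samples)  # sample variance, n - 1 denominator
    std = math.sqrt(var)
    t_stat = (mean - reference) / (std / math.sqrt(n))
    return {"mean": mean, "variance": var, "std": std, "t": t_stat}


# Consistency check on the reported figures: sqrt(6.530) rounds to 2.555.
reported_std = round(math.sqrt(6.530), 3)
```

For example, `summarize([1, 2, 3, 4, 5], reference=3.0)` (toy input, not data from the paper) gives a mean of 3, a variance of 2.5, and a t statistic of 0.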

5. Supplementary Materials

  • arXiv paper