RL-based Palletizing with a real manipulator


1. Objective

After a year of work on a government-funded project, we successfully deployed an RL agent on a real manipulator. We present a solution to the Manufacturer's Pallet Loading Problem (MPLP), which asks how to place the maximum number of boxes on a fixed-size pallet. Traditional heuristic methods often fall short of optimal results because the problem is complex and time-consuming to solve. To address this, we developed a pallet loading algorithm based on reinforcement learning. In extensive simulations, our approach used pallet space more efficiently than conventional heuristic methods. We also implemented the algorithm in a real automatic pallet loading system, demonstrating that it works in practice.

Alongside the technical work, our team planned the product/service form and created a business model around robotics and AI. We met with multiple robotics companies to pitch our ideas and to propose collaborating on a proof of concept (PoC).

The main skills I used in this project are: PyTorch, ROS, RL, PPO, OpenCV, CNN, FastAPI.

2. RL Methodology

The main goal was to create a generalized RL model that can handle any box that needs to be palletized. The MDP used is shown below:

  • State: A 2D array of the pallet size
  • Action: The agent predicts a placement edge (one of the 4 corners of the pallet) and the orientation (0 or 90 degrees) of the box.
  • Reward: A combination of the utility ratio of the pallet, the distance between the boxes, and the placement balance.
  • The RL model was trained with the PPO algorithm.
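
The MDP above can be illustrated with a toy environment. This is a minimal sketch and not the project's implementation: the occupancy-grid encoding, the rule that a box goes to the free position closest to the chosen corner, and the names `PalletEnv`, `CORNERS`, and `step` are all assumptions made for illustration. The reward here uses only the change in utility ratio; the box-distance and balance terms are omitted.

```python
from itertools import product

# (row side, col side) for corners 0..3; 1 means the far edge of the pallet.
CORNERS = [(0, 0), (0, 1), (1, 0), (1, 1)]


class PalletEnv:
    """Toy 2D pallet MDP. Grid encoding and placement rule are assumptions."""

    def __init__(self, rows=6, cols=6):
        self.rows, self.cols = rows, cols
        # State: 2D array of the pallet size, 0 = free cell, 1 = occupied cell.
        self.grid = [[0] * cols for _ in range(rows)]

    def utility_ratio(self):
        """Fraction of pallet cells covered by boxes."""
        filled = sum(sum(row) for row in self.grid)
        return filled / (self.rows * self.cols)

    def _fits(self, r, c, h, w):
        """True if an h x w box with its origin at (r, c) covers only free cells."""
        if r + h > self.rows or c + w > self.cols:
            return False
        return all(self.grid[i][j] == 0
                   for i, j in product(range(r, r + h), range(c, c + w)))

    def step(self, box, corner, rotated):
        """Place box = (h, w) as close as possible to the chosen pallet corner.

        Returns (grid, reward, placed); reward is the gain in utility ratio.
        """
        h, w = (box[1], box[0]) if rotated else box
        far_r, far_c = CORNERS[corner]
        # Origin the box would have if it sat exactly in the chosen corner.
        target_r = self.rows - h if far_r else 0
        target_c = self.cols - w if far_c else 0
        candidates = sorted(
            product(range(self.rows - h + 1), range(self.cols - w + 1)),
            key=lambda rc: abs(rc[0] - target_r) + abs(rc[1] - target_c),
        )
        before = self.utility_ratio()
        for r, c in candidates:
            if self._fits(r, c, h, w):
                for i, j in product(range(r, r + h), range(c, c + w)):
                    self.grid[i][j] = 1
                return self.grid, self.utility_ratio() - before, True
        return self.grid, 0.0, False  # no free position for this box
```

A PPO policy would map the grid and the incoming box dimensions to a `(corner, rotated)` pair, and an episode would end once a box can no longer be placed.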

The system framework is as follows: 1) The box is delivered to the robot via a conveyor belt. 2) An RGBD camera detects the center and the dimensions of the box. 3) This information is sent to the RL agent via FastAPI, which calculates the placement on the pallet. 4) The robot picks up the box and places it on the pallet.
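
The hand-off in step 3 can be sketched as a JSON request and reply. This is an illustration under stated assumptions, not the deployed service: the field names, the `BoxDetection` and `Placement` types, and `stub_policy` (a stand-in for the trained PPO policy) are hypothetical. Only the standard library is used so the sketch stays self-contained; in the real system, a function like `handle_request` would sit behind a FastAPI endpoint.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class BoxDetection:
    center_xyz_m: tuple  # box center from the RGBD camera, meters (assumed units)
    size_lwh_m: tuple    # box length, width, height, meters (assumed units)


@dataclass
class Placement:
    corner: int        # 0..3, pallet corner chosen by the agent
    rotation_deg: int  # 0 or 90


def to_request_body(detection):
    """Perception side: serialize the detection as the JSON request body."""
    return json.dumps(asdict(detection))


def handle_request(body, policy):
    """Agent side: decode the request, query the policy, encode the reply."""
    data = json.loads(body)
    detection = BoxDetection(tuple(data["center_xyz_m"]), tuple(data["size_lwh_m"]))
    placement = policy(detection)
    if placement.corner not in range(4) or placement.rotation_deg not in (0, 90):
        raise ValueError("policy returned an invalid action")
    return json.dumps(asdict(placement))


def stub_policy(detection):
    """Placeholder for the trained policy: corner 0, rotate if longer than wide."""
    length, width, _height = detection.size_lwh_m
    return Placement(corner=0, rotation_deg=90 if length > width else 0)


detection = BoxDetection((0.40, 0.10, 0.90), (0.30, 0.20, 0.15))
reply = json.loads(handle_request(to_request_body(detection), stub_policy))
```

The robot controller would then turn `reply` into a pick-and-place motion. Validating the action before replying keeps a malformed policy output from reaching the manipulator.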

One main challenge was the sim-to-real gap that appeared at the hardware interface. When the RL model's output is deployed to the manipulator, small errors occur when the robot grips the box, and the box is unstable when released. As a result, the actual placement deviates from the location chosen by the RL model, and the errors accumulate over subsequent placements (a snowball effect). To address this, we apply a margin that accounts for minor errors during box placement. In addition, the area around each placed box is designated as a blockage zone to ensure proper placement.
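
One way to picture the margin and blockage zone is as extra cells marked around each placed box on the pallet grid. The sketch below is an assumption about how this could be encoded, not the project's code: the cell values (0 = free, 1 = box, 2 = blocked), the `margin` parameter, and the function name `place_with_margin` are hypothetical.

```python
def place_with_margin(grid, r, c, h, w, margin=1):
    """Mark an h x w box at origin (r, c) and block a ring of cells around it.

    Cell values: 0 = free, 1 = box, 2 = blockage zone.
    Returns False and leaves the grid unchanged if the box does not fit.
    """
    rows, cols = len(grid), len(grid[0])
    if r < 0 or c < 0 or r + h > rows or c + w > cols:
        return False
    # The box footprint must be free: not occupied and not inside a blockage zone.
    if any(grid[i][j] != 0 for i in range(r, r + h) for j in range(c, c + w)):
        return False
    # Walk the footprint plus the margin ring, clipped to the pallet bounds.
    for i in range(max(0, r - margin), min(rows, r + h + margin)):
        for j in range(max(0, c - margin), min(cols, c + w + margin)):
            inside = r <= i < r + h and c <= j < c + w
            if inside:
                grid[i][j] = 1
            elif grid[i][j] == 0:
                grid[i][j] = 2
    return True
```

Blocked cells reduce the space available for later boxes, so the margin trades some utility ratio for tolerance to gripping and release errors.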

3. Experiment Results

[Figures: experiment results]

In our results, the RL-based method placed an equal or greater number of boxes and achieved an equal or higher utility ratio than the heuristic method. Moreover, as shown in the image above, our method considers the balance of the pallet. The skyline method tends to create a large empty space by stacking boxes as tightly as possible, whereas our method distributes the boxes evenly across the pallet.
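
To make the two criteria concrete, the sketch below computes a utility ratio and a simple balance score for two toy layouts. The balance definition (offset of the occupied-cell centroid from the pallet center, normalized by the half-diagonal) is an assumption chosen for illustration, since the exact balance term is not specified here. The layouts `packed` and `spread` are made-up examples, not experimental data.

```python
import math


def utility_ratio(grid):
    """Fraction of pallet cells that are occupied."""
    cells = [value for row in grid for value in row]
    return sum(1 for value in cells if value) / len(cells)


def balance_offset(grid):
    """Centroid offset from the pallet center: 0 is balanced, 1 is a corner."""
    rows, cols = len(grid), len(grid[0])
    occupied = [(i + 0.5, j + 0.5)
                for i in range(rows) for j in range(cols) if grid[i][j]]
    if not occupied:
        return 0.0
    center_i = sum(p[0] for p in occupied) / len(occupied)
    center_j = sum(p[1] for p in occupied) / len(occupied)
    offset = math.hypot(center_i - rows / 2, center_j - cols / 2)
    return offset / math.hypot(rows / 2, cols / 2)


# Same number of occupied cells, different distribution (illustrative only).
packed = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [1, 1, 0, 0],
          [1, 1, 0, 0]]
spread = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
```

Both layouts have the same utility ratio, but `spread` has a lower `balance_offset` than `packed`, which is the kind of difference a balance term in the reward would favor.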

4. Ubiquitous Robots Conference 2023

Our paper was accepted to the Ubiquitous Robots Conference 2023. I attended the conference on behalf of the company and Korea University to give a poster presentation, and I discussed our paper with other attendees during the session.

5. Supplementary Materials

  • Reinforcement Learning Based Pallet Loading Algorithm and its Application to a Real Manipulator System