Training-Free Suction Grasp Detection for Deformed Aseptic Cartons Using Vision–Language Models and Geometric Surface Scoring
Author: Marin Maletić
Date: May 2026
Last Updated: June 2026
This repository accompanies the ICCAS 2026 paper. It provides the complete ROS 2 implementation of a training-free pipeline that detects, segments, scores, and grasps deformed aseptic beverage cartons (Tetra Pak packaging) with a vacuum suction cup on a UR10e manipulator. The target object class is specified at run time through a natural-language prompt; no component is trained on the target domain.

The system decouples target identification from grasp-point selection and operates in four stages:
| Stage | Description | Node |
|---|---|---|
| Perception | An open-vocabulary vision–language model (Gemini Robotics-ER) detects targets from a text prompt; SAM2 refines each detection into an instance mask. Masked depth is back-projected into per-object point clouds. | segmentation_node |
| Surface analysis | Each candidate suction point is scored as flatness × tilt-feasibility. Three interchangeable geometric back-ends are provided: KNN-PCA, Sobel cross-product, and RANSAC plane fitting. | suction_node |
| Grasp selection | Among points clearing a sealing-quality threshold, the one nearest the object centroid is selected to minimise lift torque. | suction_node |
| Execution | A UR10e action server approaches vertically, descends until contact (force-torque or position stop), forms a seal, lifts, and transports the object to a drop-off pose. | arm_node |
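The sketch below illustrates the surface-analysis and grasp-selection stages with a KNN-PCA score and the centroid-nearest selection rule, written in NumPy. The concrete definitions are assumptions and are not necessarily the ones used in `knn.py` or `suction_node`:

- Flatness is 1 - 3·λ0 / (λ0 + λ1 + λ2), computed from the ascending eigenvalues of the neighbourhood covariance. It is 1 for a perfect plane and 0 for an isotropic blob.
- Tilt feasibility falls linearly from 1 at vertical to 0 at a hypothetical `max_tilt_deg`.
- Points are expressed in a frame whose +z axis points up.

The function names are hypothetical.

```python
import numpy as np


def knn_pca_scores(points: np.ndarray, k: int = 20, max_tilt_deg: float = 30.0) -> np.ndarray:
    """Score each point of an (N, 3) cloud as flatness * tilt_feasibility, both in [0, 1].

    Assumes a frame whose +z axis points up, opposite to the vertical approach.
    """
    # Brute-force squared distances via |a|^2 + |b|^2 - 2 a.b (fine for small clouds).
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    knn_idx = np.argsort(d2, axis=1)[:, :k]  # each row includes the point itself

    max_tilt = np.deg2rad(max_tilt_deg)
    scores = np.empty(len(points))
    for i, idx in enumerate(knn_idx):
        cov = np.cov(points[idx], rowvar=False)  # 3x3 neighbourhood covariance
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        normal = eigvecs[:, 0]                   # direction of least variance

        # Surface variation lambda0 / sum(lambda) lies in [0, 1/3]; rescale to flatness.
        variation = eigvals[0] / max(eigvals.sum(), 1e-12)
        flatness = float(np.clip(1.0 - 3.0 * variation, 0.0, 1.0))

        # Angle between the sign-agnostic normal and the vertical axis.
        tilt = np.arccos(np.clip(abs(normal[2]), 0.0, 1.0))
        tilt_ok = max(0.0, 1.0 - tilt / max_tilt)

        scores[i] = flatness * tilt_ok
    return scores


def select_grasp(points: np.ndarray, scores: np.ndarray, threshold: float = 0.5):
    """Index of the point nearest the object centroid among those with score >= threshold."""
    ok = np.flatnonzero(scores >= threshold)
    if ok.size == 0:
        return None  # no point clears the sealing-quality threshold
    centroid = points.mean(axis=0)
    dist = np.linalg.norm(points[ok] - centroid, axis=1)
    return int(ok[np.argmin(dist)])
```

The brute-force neighbour search needs O(N²) memory. For full object clouds, a KD-tree (for example Open3D's `KDTreeFlann` or SciPy's `cKDTree`) is the usual replacement.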
A camera bridge (camera_node) republishes and records RealSense streams, and an operator GUI (gui_node) drives the full pipeline interactively.
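The perception stage back-projects masked depth using the standard pinhole camera model. A pixel (u, v) with depth z maps to x = (u - cx)·z/fx and y = (v - cy)·z/fy. The minimal NumPy sketch below makes three assumptions:

- Depth is aligned to the colour image that SAM2 segments.
- The intrinsics fx, fy, cx, cy are read from the matching CameraInfo.
- Raw depth is in millimetres, which is the ROS convention for 16-bit depth images.

The function name is hypothetical and is not the `segmentation_node` API.

```python
import numpy as np


def backproject_masked_depth(depth, mask, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project the masked pixels of a depth image into an (N, 3) point cloud.

    depth: (H, W) raw depth image; depth_scale converts raw units to metres.
    mask:  (H, W) boolean instance mask.
    Returns points in the camera optical frame (x right, y down, z forward).
    """
    z = depth.astype(np.float64) * depth_scale
    valid = mask & (z > 0)        # drop pixels without a depth return
    v, u = np.nonzero(valid)      # row (v) and column (u) indices, row-major order
    z = z[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack([x, y, z])
```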
```text
ur_suctionbot/
├── ur_suctionbot/
│   ├── camera_node.py        # RealSense bridge + frame-capture service
│   ├── segmentation_node.py  # Gemini detection + SAM2 segmentation service
│   ├── suction_node.py       # Geometric suction-scoring service
│   ├── arm_node.py           # UR10e grasp-execution action server
│   ├── gui_node.py           # Tkinter operator GUI
│   ├── knn.py
│   ├── sobel.py
│   └── ransac.py
├── msg/
├── srv/
├── action/
├── urdf/                     # suction TCP + camera mount
├── srdf/                     # collision overrides for the TCP
├── launch/                   # camera, suction, arm, and driver+MoveIt launch files
├── CMakeLists.txt
└── package.xml
```
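The geometric back-ends live in `knn.py`, `sobel.py` and `ransac.py`. The sketch below illustrates only the general Sobel cross-product idea; the repository's `sobel.py` may differ in kernel, border handling and orientation. It assumes an organized (H, W, 3) XYZ image in the camera optical frame, meaning back-projected depth that keeps the pixel grid. Sobel derivatives along the columns and rows give two tangent vectors, and their cross product gives the surface normal. The function name is hypothetical.

```python
import numpy as np


def sobel_normals(xyz: np.ndarray) -> np.ndarray:
    """Unit normals for an organized (H, W, 3) point cloud in the camera optical frame.

    Sobel derivatives of the XYZ image give the tangents dP/du (along columns) and
    dP/dv (along rows); their cross product is the surface normal. Borders use
    edge replication, and normals are flipped to face the camera (negative z).
    """
    h, w = xyz.shape[:2]
    p = np.pad(xyz, ((1, 1), (1, 1), (0, 0)), mode="edge")

    def shifted(dr, dc):
        # View of the padded image offset by (dr, dc) rows/columns from each pixel.
        return p[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]

    # Horizontal Sobel kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]: derivative along u.
    du = (shifted(-1, 1) + 2 * shifted(0, 1) + shifted(1, 1)
          - shifted(-1, -1) - 2 * shifted(0, -1) - shifted(1, -1))
    # Vertical Sobel kernel (its transpose): derivative along v.
    dv = (shifted(1, -1) + 2 * shifted(1, 0) + shifted(1, 1)
          - shifted(-1, -1) - 2 * shifted(-1, 0) - shifted(-1, 1))

    n = np.cross(du, dv)
    n /= np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)
    n[n[..., 2] > 0] *= -1.0  # orient every normal toward the camera
    return n
```

Unlike the KNN neighbourhood search, this costs a fixed number of array operations per pixel. The trade-off is that it relies on the image grid instead of true 3D neighbourhoods, so depth discontinuities at object edges produce unreliable normals.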
Software
- Ubuntu 24.04 LTS
- ROS 2 Jazzy
- Python ≥ 3.10, CUDA-capable GPU recommended
Hardware (for physical experiments)
- Universal Robots UR10e (6-DOF manipulator)
- Intel RealSense D455 / D455f depth camera, wrist-mounted
- Pneumatic vacuum generator (4.5 bar) terminating in a silicone suction cup
External services
- A Google Gemini API key with access to `gemini-robotics-er-1.6-preview` (the detector is queried through the public API).
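Gemini returns its detections as text. Its documented convention for 2D boxes is `[ymin, xmin, ymax, xmax]`, normalised to the range 0 to 1000. The minimal sketch below converts such a reply into pixel boxes in the XYXY layout that SAM2's image predictor accepts as a box prompt. It assumes the detector is asked to answer with a JSON list of objects carrying `box_2d` and `label` keys; the prompt and schema that `segmentation_node` actually uses are not shown here. The helper names are hypothetical.

```python
import json
import re

# Gemini's 2D convention: box_2d = [ymin, xmin, ymax, xmax], normalised to 0..1000.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_detections(text: str) -> list:
    """Parse the model's JSON reply, tolerating an optional markdown code-fence wrapper."""
    match = _FENCE.search(text)
    return json.loads(match.group(1) if match else text)


def gemini_box_to_pixels(box_2d, width: int, height: int) -> tuple:
    """Convert a normalised [ymin, xmin, ymax, xmax] box to pixel (x0, y0, x1, y1)."""
    ymin, xmin, ymax, xmax = box_2d

    def scale(value, size):
        # Map 0..1000 to 0..size and clamp to the image bounds.
        return min(max(round(value / 1000 * size), 0), size)

    x0, x1 = sorted((scale(xmin, width), scale(xmax, width)))
    y0, y1 = sorted((scale(ymin, height), scale(ymax, height)))
    return x0, y0, x1, y1
```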
Install the ROS 2 and Python dependencies:
```bash
source /opt/ros/jazzy/setup.bash
sudo apt update && sudo apt install -y \
  ros-jazzy-ur ros-jazzy-ur-robot-driver ros-jazzy-ur-moveit-config \
  ros-jazzy-realsense2-camera ros-jazzy-realsense2-description \
  ros-jazzy-moveit ros-jazzy-moveit-servo ros-jazzy-pymoveit2 \
  ros-jazzy-cv-bridge ros-jazzy-tf-transformations
pip install numpy opencv-python open3d pillow google-genai --break-system-packages
```
Clone SAM2 outside the ROS workspace so that colcon does not attempt to build it, then download the tiny checkpoint expected by `segmentation_node`:
```bash
cd ~
git clone https://github.com/facebookresearch/sam2.git
cd sam2 && pip install -e . --break-system-packages
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_tiny.pt
```
The default checkpoint path is `~/sam2/checkpoints/sam2.1_hiera_tiny.pt`. If the checkpoint is installed elsewhere, override the path through the `sam2_checkpoint` parameter.
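Before launching, a quick preflight check can confirm three things: the Python dependencies import, the SAM2 checkpoint is at the expected path, and `GEMINI_API_KEY` (exported in the next step) is set. The helper below is a hypothetical convenience script and is not part of the repository. Its module list mirrors the pip and SAM2 installs above: `cv2` is provided by opencv-python, `PIL` by pillow, and `google.genai` by google-genai.

```python
import importlib.util
import os
from pathlib import Path

REQUIRED_MODULES = ("numpy", "cv2", "open3d", "PIL", "google.genai", "sam2")
DEFAULT_CHECKPOINT = "~/sam2/checkpoints/sam2.1_hiera_tiny.pt"


def preflight(checkpoint: str = DEFAULT_CHECKPOINT, modules=REQUIRED_MODULES) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # parent package (e.g. "google") is missing
            found = False
        if not found:
            problems.append(f"Python module not importable: {name}")
    if not Path(checkpoint).expanduser().is_file():
        problems.append(f"SAM2 checkpoint not found: {checkpoint}")
    if not os.environ.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("MISSING:", problem)
```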
Export the Gemini API key:
```bash
echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.bashrc
source ~/.bashrc
```
Create a workspace, clone the repository into it, and build:
```bash
mkdir -p ~/ros2_ws/src && cd ~/ros2_ws/src
git clone https://github.com/larics/geoSuctionBot.git ur_suctionbot
mkdir -p ur_suctionbot/config
cd ~/ros2_ws
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
source install/setup.bash
```
Open a separate terminal for each of the following blocks and source the workspace (`source ~/ros2_ws/install/setup.bash`) in every one.
Terminal 1: UR10e driver, MoveIt 2, and Servo
```bash
# Real robot
ros2 launch ur_suctionbot ur10e_driver_moveit.launch.py robot_ip:=192.168.0.155
# Bench testing without hardware
ros2 launch ur_suctionbot ur10e_driver_moveit.launch.py use_mock_hardware:=true
```
Terminal 2: camera, segmentation, and suction-scoring nodes (add `gui:=true` for the operator GUI and `rviz:=true` for visualisation)
```bash
ros2 launch ur_suctionbot suction.launch.py gui:=true
```
Terminal 3: arm grasp-execution server (use `dry_run:=true` to plan and log grasps without commanding motion)
```bash
ros2 launch ur_suctionbot arm.launch.py
```
Via the GUI. The AUTO button executes the full segment → compute → grasp cycle. Separate buttons trigger each stage individually. The GUI also lets the operator select the scoring method (KNN / Sobel / RANSAC) and adjust the scoring parameters (sealing threshold, cup diameter, tilt tolerance). Node-status indicators confirm that all four nodes are alive.
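The cup diameter and tilt tolerance exposed in the GUI suggest one way the RANSAC back-end can be scoped to the suction cup. The sketch below is one plausible reading of that design, not a description of `ransac.py`:

1. Fit a plane by RANSAC to the points whose horizontal distance from a candidate lies within the cup radius. The horizontal distance is used because the approach is vertical.
2. Take the inlier fraction as flatness.
3. Scale flatness by a linear tilt-feasibility term.

The resulting score would then be compared against the sealing threshold. `ransac_patch_score`, `inlier_tol` and all default values are hypothetical.

```python
import numpy as np


def ransac_patch_score(points, center, cup_diameter=0.03, inlier_tol=0.002,
                       max_tilt_deg=30.0, iterations=100, seed=0):
    """Score one suction candidate from a RANSAC plane fit to its cup footprint.

    points: (N, 3) object cloud in metres, in a frame with +z up.
    center: (3,) candidate point. Returns flatness * tilt_feasibility in [0, 1].
    """
    rng = np.random.default_rng(seed)
    # Footprint: points within the cup radius, measured in the horizontal plane.
    radial = np.linalg.norm(points[:, :2] - center[:2], axis=1)
    patch = points[radial <= cup_diameter / 2]
    if len(patch) < 3:
        return 0.0

    best_inliers, best_normal = 0, None
    for _ in range(iterations):
        a, b, c = patch[rng.choice(len(patch), 3, replace=False)]
        normal = np.cross(b - a, c - a)
        length = np.linalg.norm(normal)
        if length < 1e-12:  # collinear sample defines no plane
            continue
        normal /= length
        inliers = int(np.sum(np.abs((patch - a) @ normal) <= inlier_tol))
        if inliers > best_inliers:
            best_inliers, best_normal = inliers, normal

    if best_normal is None:
        return 0.0
    flatness = best_inliers / len(patch)
    tilt = np.arccos(np.clip(abs(best_normal[2]), 0.0, 1.0))
    tilt_ok = max(0.0, 1.0 - tilt / np.deg2rad(max_tilt_deg))
    return float(flatness * tilt_ok)
```

Scoring every point this way costs one RANSAC run per candidate. Subsampling the candidates first is a common way to bound run time.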
Via the command line.
```bash
# 1. Detect and segment (an empty prompt uses the default carton prompt)
ros2 service call /ur_suctionbot/segmentation/segment \
  ur_suctionbot/srv/Segment "{prompt: ''}"
# 2. Score suction points
ros2 service call /ur_suctionbot/suction/compute \
  ur_suctionbot/srv/ComputeSuction "{method: 'sobel', threshold: 0.5}"
# 3. Execute the best grasp per object
ros2 action send_goal /ur_suctionbot/arm/execute_grasp \
  ur_suctionbot/action/ExecuteGrasp "{dry_run: false}"
# Return the arm to its home configuration
ros2 service call /ur_suctionbot/arm/go_home std_srvs/srv/Trigger
```
Run-time retargeting. Because target selection is driven entirely by the prompt, the object set can be redefined without retraining, for example with `"{prompt: 'detect all plastic bottles'}"` or `"{prompt: 'heavily deformed cartons only'}"`.
| Interface | Type | Description |
|---|---|---|
| `/ur_suctionbot/segmentation/segment` | `srv/Segment` | Run detection + segmentation for a prompt |
| `/ur_suctionbot/suction/compute` | `srv/ComputeSuction` | Score suction points, return best candidate |
| `/ur_suctionbot/arm/execute_grasp` | `action/ExecuteGrasp` | Execute grasp(s) with progress feedback |
| `/ur_suctionbot/arm/go_home` | `std_srvs/Trigger` | Move to the home configuration |
| `/ur_suctionbot/segmentation/points` | `PointCloud2` | Masked, per-object point cloud |
| `/ur_suctionbot/suction/candidates` | `GraspCandidateArray` | One ranked grasp candidate per object |
| `/ur_suctionbot/suction/visualization` | `PointCloud2` | Score heatmap for RViz |
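The interfaces listed above can also be driven from Python, for example to reproduce the GUI's AUTO cycle (segment → compute → grasp) in a script. The minimal rclpy sketch below assumes the Python types generated from `srv/Segment`, `srv/ComputeSuction` and `action/ExecuteGrasp`. It uses only the request and goal fields that appear in the command-line examples (`prompt`, `method`, `threshold`, `dry_run`). Responses are not inspected, because their fields are not documented here. The function name, node name and 10 s timeouts are hypothetical. `dry_run` defaults to true, so a first run plans without moving the arm.

```python
def run_auto_cycle(prompt: str = "", method: str = "sobel",
                   threshold: float = 0.5, dry_run: bool = True) -> None:
    """Call segment -> compute -> execute_grasp in sequence, like the GUI's AUTO button.

    Requires a sourced workspace; the ROS imports are deferred so that this module
    can be imported on machines without ROS.
    """
    import rclpy
    from rclpy.action import ActionClient
    from ur_suctionbot.action import ExecuteGrasp
    from ur_suctionbot.srv import ComputeSuction, Segment

    rclpy.init()
    node = rclpy.create_node("auto_cycle_client")
    try:
        def call(srv_type, name, request):
            # Block until the service answers; the response is returned unchecked.
            client = node.create_client(srv_type, name)
            if not client.wait_for_service(timeout_sec=10.0):
                raise RuntimeError(f"service {name} not available")
            future = client.call_async(request)
            rclpy.spin_until_future_complete(node, future)
            return future.result()

        call(Segment, "/ur_suctionbot/segmentation/segment", Segment.Request(prompt=prompt))
        call(ComputeSuction, "/ur_suctionbot/suction/compute",
             ComputeSuction.Request(method=method, threshold=threshold))

        action = ActionClient(node, ExecuteGrasp, "/ur_suctionbot/arm/execute_grasp")
        if not action.wait_for_server(timeout_sec=10.0):
            raise RuntimeError("execute_grasp action server not available")
        send_future = action.send_goal_async(ExecuteGrasp.Goal(dry_run=dry_run))
        rclpy.spin_until_future_complete(node, send_future)
        goal_handle = send_future.result()
        if not goal_handle.accepted:
            raise RuntimeError("grasp goal rejected")
        result_future = goal_handle.get_result_async()
        rclpy.spin_until_future_complete(node, result_future)
        node.get_logger().info(f"grasp result: {result_future.result().result}")
    finally:
        node.destroy_node()
        rclpy.shutdown()
```

With the three launch files running, `run_auto_cycle(prompt="", method="ransac")` performs one planning-only cycle. Pass `dry_run=False` only once the logged grasps look correct.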
If you find this work useful in your own research, please cite the paper:
TBD