larics/geoSuctionBot

Training-Free Suction Grasp Detection for Deformed Aseptic Cartons Using Vision–Language Models and Geometric Surface Scoring

Author: Marin Maletić
Date: May 2026
Last Updated: June 2026

This repository accompanies the ICCAS 2026 paper of the same title. It provides the complete ROS 2 implementation of a training-free pipeline that detects, segments, scores, and grasps deformed aseptic beverage cartons (Tetra Pak packaging) with a vacuum suction cup on a UR10e manipulator. The target object class is specified at run time through a natural-language prompt; no component is trained on the target domain.

Pipeline demonstration video
▶ Click to watch the full pipeline in action.


1. Overview

The system decouples target identification from grasp-point selection and operates in four stages:

| Stage | Description | Node |
|---|---|---|
| Perception | An open-vocabulary vision–language model (Gemini Robotics-ER) detects targets from a text prompt; SAM2 refines each detection into an instance mask. Masked depth is back-projected into per-object point clouds. | segmentation_node |
| Surface analysis | Each candidate suction point is scored as flatness × tilt-feasibility. Three interchangeable geometric back-ends are provided: KNN-PCA, Sobel cross-product, and RANSAC plane fitting. | suction_node |
| Grasp selection | Among points clearing a sealing-quality threshold, the one nearest the object centroid is selected to minimise lift torque. | suction_node |
| Execution | A UR10e action server approaches vertically, descends until contact (force-torque or position stop), forms a seal, lifts, and transports the object to a drop-off pose. | arm_node |
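
The surface-analysis and grasp-selection stages can be illustrated with a short NumPy sketch. It is not the repository's implementation: the function names, the flatness measure (one minus the rescaled PCA surface variation), the linear tilt penalty, and the assumption that the approach axis is the z-axis of the point-cloud frame are all choices made here for illustration.

```python
import numpy as np


def score_points(points, k=12, max_tilt_deg=30.0):
    """Per-point score = flatness * tilt-feasibility, both in [0, 1]."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points to fit a local plane")
    k = max(3, min(k, n))
    # Brute-force neighbour search: O(n^2) memory, so subsample large clouds first.
    diff = points[:, None, :] - points[None, :, :]
    nbr_idx = np.argsort((diff ** 2).sum(axis=-1), axis=1)[:, :k]

    scores = np.empty(n)
    for i, idx in enumerate(nbr_idx):
        evals, evecs = np.linalg.eigh(np.cov(points[idx].T))  # ascending order
        # Surface variation lambda_min / sum(lambda) lies in [0, 1/3]; rescale it.
        flatness = np.clip(1.0 - 3.0 * evals[0] / max(evals.sum(), 1e-12), 0.0, 1.0)
        normal = evecs[:, 0]  # direction of least variance = local plane normal
        # Angle between the normal and the assumed vertical approach axis (z).
        tilt_deg = np.degrees(np.arccos(np.clip(abs(normal[2]), 0.0, 1.0)))
        tilt_ok = max(0.0, 1.0 - tilt_deg / max_tilt_deg)
        scores[i] = flatness * tilt_ok
    return scores


def select_grasp(points, scores, threshold=0.5):
    """Index of the above-threshold point nearest the centroid, or None."""
    points = np.asarray(points, dtype=float)
    candidates = np.flatnonzero(np.asarray(scores) >= threshold)
    if candidates.size == 0:
        return None
    centroid = points.mean(axis=0)
    dist = np.linalg.norm(points[candidates] - centroid, axis=1)
    return int(candidates[np.argmin(dist)])
```

Choosing the candidate nearest the centroid keeps the lever arm between the suction point and the centre of mass short, which is the lift-torque argument made in the table above.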

A camera bridge (camera_node) republishes and records RealSense streams, and an operator GUI (gui_node) drives the full pipeline interactively.
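
The back-projection step named in the Perception stage follows the standard pinhole camera model. The sketch below assumes the depth image has already been converted to metres and that the intrinsics fx, fy, cx, cy come from the camera's calibration; the function name is hypothetical and does not come from the repository.

```python
import numpy as np


def backproject(depth_m, mask, fx, fy, cx, cy):
    """Return an (N, 3) array of camera-frame points for valid masked pixels."""
    depth_m = np.asarray(depth_m, dtype=float)
    # Keep only pixels inside the instance mask that carry a usable depth value.
    valid = np.asarray(mask, dtype=bool) & np.isfinite(depth_m) & (depth_m > 0)
    v, u = np.nonzero(valid)  # row (v) and column (u) pixel indices
    z = depth_m[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.column_stack((x, y, z))
```

Pixels with zero or non-finite depth are dropped because depth cameras commonly report missing measurements that way.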

2. Repository structure

ur_suctionbot/
├── ur_suctionbot/
│   ├── camera_node.py          # RealSense bridge + frame-capture service
│   ├── segmentation_node.py    # Gemini detection + SAM2 segmentation service
│   ├── suction_node.py         # Geometric suction-scoring service
│   ├── arm_node.py             # UR10e grasp-execution action server
│   ├── gui_node.py             # Tkinter operator GUI
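│   ├── knn.py                  # KNN-PCA scoring back-end
│   ├── sobel.py                # Sobel cross-product scoring back-end
│   └── ransac.py               # RANSAC plane-fitting scoring back-end
├── msg/                        # custom message definitions
├── srv/                        # custom service definitions
├── action/                     # custom action definitions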
├── urdf/                       # suction TCP + camera mount
├── srdf/                       # collision overrides for the TCP
├── launch/                     # camera, suction, arm, and driver+MoveIt launch files
├── CMakeLists.txt
└── package.xml
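
Section 1 names a Sobel cross-product back-end alongside KNN-PCA and RANSAC. The general technique can be sketched as follows: on an organised point map (one 3-D point per pixel), Sobel filters give the two image-axis tangent vectors, and their cross product gives the surface normal. This is a generic illustration under that assumption, not the contents of sobel.py; the function name and the NaN border handling are choices made here.

```python
import numpy as np


def sobel_normals(xyz):
    """Unit normals for an organised (H, W, 3) point map; border pixels are NaN."""
    p = np.asarray(xyz, dtype=float)
    # 3x3 Sobel derivative along image columns (u), applied to x, y, z channels.
    du = ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]))
    # 3x3 Sobel derivative along image rows (v).
    dv = ((p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]))
    n = np.cross(du, dv)  # tangent_u x tangent_v is perpendicular to the surface
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    n = np.where(norm > 0, n / np.maximum(norm, 1e-12), np.nan)
    out = np.full(p.shape, np.nan)
    out[1:-1, 1:-1] = n
    return out
```

Because it works on image neighbourhoods rather than a nearest-neighbour search, this style of normal estimation is typically cheaper than per-point PCA, at the cost of sensitivity to depth holes inside the 3×3 window.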

3. System requirements

Software

  • Ubuntu 24.04 LTS
  • ROS 2 Jazzy
  • Python ≥ 3.10 (a CUDA-capable GPU is recommended)

Hardware (for physical experiments)

  • Universal Robots UR10e (6-DOF manipulator)
  • Intel RealSense D455 / D455f depth camera, wrist-mounted
  • Pneumatic vacuum generator (4.5 bar) terminating in a silicone suction cup

External services

  • A Google Gemini API key with access to gemini-robotics-er-1.6-preview (the detector is queried through the public API).

4. Installation

4.1 ROS 2 and robot dependencies

source /opt/ros/jazzy/setup.bash
sudo apt update && sudo apt install -y \
  ros-jazzy-ur ros-jazzy-ur-robot-driver ros-jazzy-ur-moveit-config \
  ros-jazzy-realsense2-camera ros-jazzy-realsense2-description \
  ros-jazzy-moveit ros-jazzy-moveit-servo ros-jazzy-pymoveit2 \
  ros-jazzy-cv-bridge ros-jazzy-tf-transformations

4.2 Python dependencies

pip install numpy opencv-python open3d pillow google-genai --break-system-packages

4.3 SAM2

Clone SAM2 outside the ROS workspace so that colcon does not attempt to build it, then download the tiny checkpoint expected by segmentation_node:

cd ~
git clone https://github.com/facebookresearch/sam2.git
cd sam2 && pip install -e . --break-system-packages
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_tiny.pt

The default checkpoint path is ~/sam2/checkpoints/sam2.1_hiera_tiny.pt; override it through the sam2_checkpoint parameter if installed elsewhere.

4.4 Gemini API key

echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.bashrc
source ~/.bashrc
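
A quick check that the key is visible to Python can save a failed first service call. The helper below is a hypothetical convenience, not part of the package; it assumes only that the key is read from the GEMINI_API_KEY environment variable exported above.

```python
import os


def gemini_key_configured(env=None):
    """True if GEMINI_API_KEY is set to something other than the placeholder."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    return bool(key) and key != "your-key-here"


if __name__ == "__main__":
    print("GEMINI_API_KEY configured:", gemini_key_configured())
```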

4.5 Build the workspace

mkdir -p ~/ros2_ws/src && cd ~/ros2_ws/src
git clone https://github.com/larics/geoSuctionBot.git ur_suctionbot

mkdir -p ur_suctionbot/config

cd ~/ros2_ws
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
source install/setup.bash

5. Running the system

Open a separate terminal for each block and source the workspace (source ~/ros2_ws/install/setup.bash) in every one.

Terminal 1 — UR10e driver, MoveIt 2, and Servo

# Real robot
ros2 launch ur_suctionbot ur10e_driver_moveit.launch.py robot_ip:=192.168.0.155

# Bench testing without hardware
ros2 launch ur_suctionbot ur10e_driver_moveit.launch.py use_mock_hardware:=true

Terminal 2 — Camera, segmentation, and suction-scoring nodes (add gui:=true for the operator GUI, rviz:=true for visualisation)

ros2 launch ur_suctionbot suction.launch.py gui:=true

Terminal 3 — Arm grasp-execution server (use dry_run:=true to plan and log grasps without commanding motion)

ros2 launch ur_suctionbot arm.launch.py

5.1 Operating the pipeline

Via the GUI. The AUTO button executes the full cycle segment → compute → grasp. Individual buttons trigger each stage separately, and further controls select the scoring method (KNN / Sobel / RANSAC) and adjust the parameters (sealing threshold, cup diameter, tilt tolerance). Node-status indicators confirm that all four nodes are alive.

Via the command line.

# 1. Detect and segment (empty prompt uses the default carton prompt)
ros2 service call /ur_suctionbot/segmentation/segment \
  ur_suctionbot/srv/Segment "{prompt: ''}"

# 2. Score suction points
ros2 service call /ur_suctionbot/suction/compute \
  ur_suctionbot/srv/ComputeSuction "{method: 'sobel', threshold: 0.5}"

# 3. Execute the best grasp per object
ros2 action send_goal /ur_suctionbot/arm/execute_grasp \
  ur_suctionbot/action/ExecuteGrasp "{dry_run: false}"

# Return the arm to its home configuration
ros2 service call /ur_suctionbot/arm/go_home std_srvs/srv/Trigger

Run-time retargeting. Because target selection is driven entirely by the prompt, the object set can be redefined without retraining — for example "{prompt: 'detect all plastic bottles'}" or "{prompt: 'heavily deformed cartons only'}".
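
The command-line sequence above can also be scripted, for example to try several prompts in a row. The sketch below only wraps the same `ros2` CLI calls with `subprocess`; the helper names are hypothetical, and it assumes the CLI accepts JSON-style request strings (JSON flow mappings are valid YAML). It defaults to dry_run so that no motion is commanded by accident.

```python
import json
import subprocess

NS = "/ur_suctionbot"


def segment_cmd(prompt=""):
    """CLI call for detection + segmentation (empty prompt = default prompt)."""
    return ["ros2", "service", "call", f"{NS}/segmentation/segment",
            "ur_suctionbot/srv/Segment", json.dumps({"prompt": prompt})]


def compute_cmd(method="sobel", threshold=0.5):
    """CLI call for suction-point scoring."""
    return ["ros2", "service", "call", f"{NS}/suction/compute",
            "ur_suctionbot/srv/ComputeSuction",
            json.dumps({"method": method, "threshold": threshold})]


def grasp_cmd(dry_run=True):
    """CLI call for grasp execution; dry_run=True plans without moving the arm."""
    return ["ros2", "action", "send_goal", f"{NS}/arm/execute_grasp",
            "ur_suctionbot/action/ExecuteGrasp", json.dumps({"dry_run": dry_run})]


def run_cycle(prompt="", method="sobel", threshold=0.5, dry_run=True):
    """Run segment -> compute -> grasp, stopping if any CLI call exits non-zero."""
    for cmd in (segment_cmd(prompt), compute_cmd(method, threshold), grasp_cmd(dry_run)):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for p in ("", "detect all plastic bottles"):
        run_cycle(prompt=p)
```

Note that a zero exit code from the CLI only means the call completed; the printed response still has to be inspected to see whether any objects or grasp candidates were found.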

5.2 Principal ROS 2 interfaces

| Interface | Type | Description |
|---|---|---|
| /ur_suctionbot/segmentation/segment | srv/Segment | Run detection + segmentation for a prompt |
| /ur_suctionbot/suction/compute | srv/ComputeSuction | Score suction points, return best candidate |
| /ur_suctionbot/arm/execute_grasp | action/ExecuteGrasp | Execute grasp(s) with progress feedback |
| /ur_suctionbot/arm/go_home | std_srvs/Trigger | Move to the home configuration |
| /ur_suctionbot/segmentation/points | PointCloud2 | Masked, per-object point cloud |
| /ur_suctionbot/suction/candidates | GraspCandidateArray | One ranked grasp candidate per object |
| /ur_suctionbot/suction/visualization | PointCloud2 | Score heatmap for RViz |

6. Citation

If you find this work useful in your own research, please cite the paper:

TBD
