GitHub - larics/geoSuctionBot

Training-Free Suction Grasp Detection for Deformed Aseptic Cartons Using Vision–Language Models and Geometric Surface Scoring

Author: Marin Maletić
Date: May 2026
Last Updated: June 2026

This repository accompanies the ICCAS 2026 paper. It provides the complete ROS 2 implementation of a training-free pipeline that detects, segments, scores, and grasps deformed aseptic beverage cartons (Tetra Pak packaging) with a vacuum suction cup on a UR10e manipulator. The target object class is specified at run time through a natural-language prompt; no component is trained on the target domain.


1. Overview

The system decouples target identification from grasp-point selection and operates in four stages:

Stage	Description	Node
Perception	An open-vocabulary vision–language model (Gemini Robotics-ER) detects targets from a text prompt; SAM2 refines each detection into an instance mask. Masked depth is back-projected into per-object point clouds.	`segmentation_node`
Surface analysis	Each candidate suction point is scored as flatness × tilt-feasibility. Three interchangeable geometric back-ends are provided: KNN-PCA, Sobel cross-product, and RANSAC plane fitting.	`suction_node`
Grasp selection	Among points clearing a sealing-quality threshold, the one nearest the object centroid is selected to minimise lift torque.	`suction_node`
Execution	A UR10e action server approaches vertically, descends until contact (force-torque or position stop), forms a seal, lifts, and transports the object to a drop-off pose.	`arm_node`
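
The back-projection step in the perception stage can be illustrated with a standard pinhole camera model. The sketch below is not taken from `segmentation_node`; the function name `backproject_masked_depth` and the intrinsics arguments (`fx`, `fy`, `cx`, `cy`) are assumptions chosen for illustration, and depth is assumed to be in metres with zero marking invalid pixels.

```python
import numpy as np


def backproject_masked_depth(depth_m, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points.

    Hypothetical helper: assumes a pinhole model, depth in metres,
    and that non-positive or non-finite depth values are invalid.
    Returns an (N, 3) array of [x, y, z] points.
    """
    depth_m = np.asarray(depth_m, dtype=float)
    valid = np.asarray(mask, dtype=bool) & np.isfinite(depth_m) & (depth_m > 0)

    # np.nonzero returns (row, col) indices, i.e. (v, u) in image terms.
    v, u = np.nonzero(valid)
    z = depth_m[v, u]

    # Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

Applying the function once per instance mask would yield one point cloud per detected object, which matches the per-object clouds described in the table above.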

A camera bridge (camera_node) republishes and records RealSense streams, and an operator GUI (gui_node) drives the full pipeline interactively.
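
To make the surface-analysis and grasp-selection stages more concrete, here is a minimal NumPy sketch of a KNN-PCA style score followed by the nearest-to-centroid selection rule. It is an illustration under stated assumptions, not the code in `knn.py`: the flatness measure (one minus the normalised smallest eigenvalue), the linear tilt-feasibility ramp, the `max_tilt_deg` parameter, and the choice of the camera +z axis as the approach direction are all hypothetical.

```python
import numpy as np


def knn_pca_scores(points, k=8, approach=(0.0, 0.0, 1.0), max_tilt_deg=30.0):
    """Score each point as flatness * tilt-feasibility (illustrative only).

    Assumptions: flatness = 1 - 3 * lambda_min / sum(lambdas), and
    tilt-feasibility falls linearly to 0 at max_tilt_deg.
    """
    points = np.asarray(points, dtype=float)
    approach = np.asarray(approach, dtype=float)
    approach = approach / np.linalg.norm(approach)

    # Brute-force k nearest neighbours (fine for small clouds).
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    neighbours = np.argsort(d2, axis=1)[:, :k]  # includes the point itself

    scores = np.zeros(len(points))
    for i, idx in enumerate(neighbours):
        cov = np.cov(points[idx].T)
        evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        total = evals.sum()
        if total <= 0:
            continue  # degenerate neighbourhood, leave score at 0

        # The eigenvector of the smallest eigenvalue approximates the normal.
        normal = evecs[:, 0]
        flatness = np.clip(1.0 - 3.0 * evals[0] / total, 0.0, 1.0)

        cos_tilt = np.clip(abs(normal @ approach), 0.0, 1.0)
        tilt_deg = np.degrees(np.arccos(cos_tilt))
        feasibility = max(0.0, 1.0 - tilt_deg / max_tilt_deg)

        scores[i] = flatness * feasibility
    return scores


def select_grasp(points, scores, threshold=0.5):
    """Return the index of the passing point nearest the centroid, or None."""
    points = np.asarray(points, dtype=float)
    passing = np.flatnonzero(np.asarray(scores) >= threshold)
    if passing.size == 0:
        return None
    centroid = points.mean(axis=0)
    dist = np.linalg.norm(points[passing] - centroid, axis=1)
    return int(passing[np.argmin(dist)])
```

Multiplying the two terms means a point must be both locally flat and reachable within the tilt limit to score well; the centroid rule then picks, among the acceptable points, the one with the shortest lever arm.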

2. Repository structure

ur_suctionbot/
├── ur_suctionbot/
│   ├── camera_node.py          # RealSense bridge + frame-capture service
│   ├── segmentation_node.py    # Gemini detection + SAM2 segmentation service
│   ├── suction_node.py         # Geometric suction-scoring service
│   ├── arm_node.py             # UR10e grasp-execution action server
│   ├── gui_node.py             # Tkinter operator GUI
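│   ├── knn.py                  # KNN-PCA scoring back-end
│   ├── sobel.py                # Sobel cross-product scoring back-end
│   └── ransac.py               # RANSAC plane-fitting scoring back-end
├── msg/                        # ROS 2 message definitions
├── srv/                        # ROS 2 service definitions
├── action/                     # ROS 2 action definitions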
├── urdf/                       # suction TCP + camera mount
├── srdf/                       # collision overrides for the TCP
├── launch/                     # camera, suction, arm, and driver+MoveIt launch files
├── CMakeLists.txt
└── package.xml

3. System requirements

Software

Ubuntu 24.04 LTS
ROS 2 Jazzy
Python ≥ 3.10
CUDA-capable GPU (recommended)

Hardware (for physical experiments)

Universal Robots UR10e (6-DOF manipulator)
Intel RealSense D455 / D455f depth camera, wrist-mounted
Pneumatic vacuum generator (4.5 bar) terminating in a silicone suction cup

External services

A Google Gemini API key with access to gemini-robotics-er-1.6-preview (the detector is queried through the public API).

4. Installation

4.1 ROS 2 and robot dependencies

source /opt/ros/jazzy/setup.bash
sudo apt update && sudo apt install -y \
  ros-jazzy-ur ros-jazzy-ur-robot-driver ros-jazzy-ur-moveit-config \
  ros-jazzy-realsense2-camera ros-jazzy-realsense2-description \
  ros-jazzy-moveit ros-jazzy-moveit-servo ros-jazzy-pymoveit2 \
  ros-jazzy-cv-bridge ros-jazzy-tf-transformations

4.2 Python dependencies

pip install numpy opencv-python open3d pillow google-genai --break-system-packages

4.3 SAM2

Clone SAM2 outside the ROS workspace so that colcon does not attempt to build it, then download the tiny checkpoint expected by segmentation_node:

cd ~
git clone https://github.com/facebookresearch/sam2.git
cd sam2 && pip install -e . --break-system-packages
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_tiny.pt

The default checkpoint path is ~/sam2/checkpoints/sam2.1_hiera_tiny.pt; override it through the sam2_checkpoint parameter if installed elsewhere.

4.4 Gemini API key

echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.bashrc
source ~/.bashrc
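
A missing or placeholder key is easy to overlook until the first detection request fails. The following sketch shows one way to check the variable before launching; the helper name `require_gemini_key` is hypothetical and not part of this repository, and it assumes the node reads the key from the `GEMINI_API_KEY` environment variable exported above.

```python
import os


def require_gemini_key(env=os.environ):
    """Return the Gemini API key from the environment or raise.

    Hypothetical pre-flight check: rejects an unset variable and the
    literal placeholder used in the installation instructions.
    """
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key or key == "your-key-here":
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before starting the "
            "segmentation node"
        )
    return key
```

Running such a check in a fresh terminal confirms that the export in `~/.bashrc` has been picked up by the shell that will launch the nodes.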

4.5 Build the workspace

mkdir -p ~/ros2_ws/src && cd ~/ros2_ws/src
git clone https://github.com/larics/geoSuctionBot.git ur_suctionbot

mkdir -p ur_suctionbot/config

cd ~/ros2_ws
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=Release
source install/setup.bash

5. Running the system

Open a separate terminal for each block and source the workspace (source ~/ros2_ws/install/setup.bash) in every one.

Terminal 1 — UR10e driver, MoveIt 2, and Servo

# Real robot
ros2 launch ur_suctionbot ur10e_driver_moveit.launch.py robot_ip:=192.168.0.155

# Bench testing without hardware
ros2 launch ur_suctionbot ur10e_driver_moveit.launch.py use_mock_hardware:=true

Terminal 2 — Camera, segmentation, and suction-scoring nodes (add gui:=true for the operator GUI, rviz:=true for visualisation)

ros2 launch ur_suctionbot suction.launch.py gui:=true

Terminal 3 — Arm grasp-execution server (use dry_run:=true to plan and log grasps without commanding motion)

ros2 launch ur_suctionbot arm.launch.py

5.1 Operating the pipeline

Via the GUI. The AUTO button executes the full cycle segment → compute → grasp. Individual buttons trigger each stage separately, and further controls select the scoring method (KNN / Sobel / RANSAC) and adjust parameters (sealing threshold, cup diameter, tilt tolerance). Node-status indicators confirm that all four nodes are alive.

Via the command line.

# 1. Detect and segment (empty prompt uses the default carton prompt)
ros2 service call /ur_suctionbot/segmentation/segment \
  ur_suctionbot/srv/Segment "{prompt: ''}"

# 2. Score suction points
ros2 service call /ur_suctionbot/suction/compute \
  ur_suctionbot/srv/ComputeSuction "{method: 'sobel', threshold: 0.5}"

# 3. Execute the best grasp per object
ros2 action send_goal /ur_suctionbot/arm/execute_grasp \
  ur_suctionbot/action/ExecuteGrasp "{dry_run: false}"

# Return the arm to its home configuration
ros2 service call /ur_suctionbot/arm/go_home std_srvs/srv/Trigger

Run-time retargeting. Because target selection is driven entirely by the prompt, the object set can be redefined without retraining — for example "{prompt: 'detect all plastic bottles'}" or "{prompt: 'heavily deformed cartons only'}".
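
When prompts are generated by a script, quoting them by hand inside the YAML request becomes error-prone, especially if a prompt contains an apostrophe. The sketch below builds the argument list for the segmentation call shown above and encodes the request as JSON, which YAML parsers accept as a flow mapping. The helper `segment_command` is hypothetical; the service name, service type, and the `prompt` field are taken from the commands in this section.

```python
import json
import shlex

SEGMENT_SERVICE = "/ur_suctionbot/segmentation/segment"
SEGMENT_TYPE = "ur_suctionbot/srv/Segment"


def segment_command(prompt):
    """Build the `ros2 service call` argument list for a given prompt.

    Hypothetical helper: the request is serialised as JSON so that
    quotes inside the prompt are escaped consistently.
    """
    request = json.dumps({"prompt": prompt})
    return ["ros2", "service", "call", SEGMENT_SERVICE, SEGMENT_TYPE, request]


if __name__ == "__main__":
    # Print a shell-safe command line; pass the list to subprocess.run
    # instead if the call should be executed from Python.
    print(shlex.join(segment_command("detect all plastic bottles")))
```

Returning a list rather than a single string avoids a second layer of shell quoting when the command is executed through `subprocess.run`.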

5.2 Principal ROS 2 interfaces

Interface	Type	Description
`/ur_suctionbot/segmentation/segment`	`srv/Segment`	Run detection + segmentation for a prompt
`/ur_suctionbot/suction/compute`	`srv/ComputeSuction`	Score suction points, return best candidate
`/ur_suctionbot/arm/execute_grasp`	`action/ExecuteGrasp`	Execute grasp(s) with progress feedback
`/ur_suctionbot/arm/go_home`	`std_srvs/Trigger`	Move to the home configuration
`/ur_suctionbot/segmentation/points`	`PointCloud2`	Masked, per-object point cloud
`/ur_suctionbot/suction/candidates`	`GraspCandidateArray`	One ranked grasp candidate per object
`/ur_suctionbot/suction/visualization`	`PointCloud2`	Score heatmap for RViz

6. Citation

If you find this work useful in your own research, please cite the paper:

TBD
