# Example Project: SigLoMa-VLM
SigLoMa-VLM shows a typical ros_base setup for high-level task orchestration. Cameras, bridge nodes, VLM calls, trackers, and UI all run in a single main flow driven by a finite-state machine.
For the complete public entry point, see
SigLoMa-Code. That repository
collects training, deployment, hardware wiring, demo media, and the full
repository map, making it a practical reference for learning how ros_base
supports a real robot system.
- Deployment workflow: docs/deployment.md
- Hardware setup: docs/hardware.md
- Repository map: docs/repositories.md
## 1. Current entry file
The current main entry is:
It defines:
and registers:
### Nodes

- `vlm_node`: `Robot2VLMBridge`
- `joystick`: `JoystickSDKNode`
- `camera`: `CamSubNode`
### Agents

- `vlm_qwen`: `QwenVLMAgent`
- `tracker`: `TrackerAgent`
- `user`: `UIAgent`
## 2. How ros_base is used in this project

### The manager owns lifecycle management
`PickPlaceRUN` stays relatively focused, but it is placed at the critical control point:

- Inject `PickPlaceFSMHandlers`
- Add handshake rules based on the `wait` argument
- Close UI, node, and agent resources together on exit
The key handshake rule is:
### The handler owns the full task-state machine
The actual task flow lives in:
Its internal states include:
- `WAIT_FOR_PICK_TARGET`
- `WAIT_FOR_PLACE_TARGET`
- `ROTATE_TO_PICK`
- `AI_CONFIRM_PICK`
- `GRASP_EXECUTION`
- `ROTATE_TO_PLACE`
- `AI_CONFIRM_PLACE`
- `PLACE_EXECUTION`
- `FINISHED`
This FSM does not depend directly on `manager.state`. Instead, the handler maintains `current_state` and `prev_state` on its own.
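As a minimal sketch, a handler that owns its own state in this way could look like the following; the class and method names are illustrative, not the real `PickPlaceFSMHandlers` API:

```python
from enum import Enum, auto

class TaskState(Enum):
    # States mirroring the pick-and-place flow listed above
    WAIT_FOR_PICK_TARGET = auto()
    WAIT_FOR_PLACE_TARGET = auto()
    ROTATE_TO_PICK = auto()
    AI_CONFIRM_PICK = auto()
    GRASP_EXECUTION = auto()
    ROTATE_TO_PLACE = auto()
    AI_CONFIRM_PLACE = auto()
    PLACE_EXECUTION = auto()
    FINISHED = auto()

class PickPlaceHandler:
    """Keeps its own current/prev state instead of reading manager.state."""

    def __init__(self):
        self.current_state = TaskState.WAIT_FOR_PICK_TARGET
        self.prev_state = None

    def transition(self, new_state):
        # Remember where we came from so state-entry logic can run once
        self.prev_state = self.current_state
        self.current_state = new_state
```

Keeping `prev_state` alongside `current_state` is what lets the handler detect state entry (run setup exactly once when the state changes) without consulting the manager.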
## 3. Three design points worth noticing

### Task logic is separated from the communication layer
- `CamSubNode` only acquires images
- `Robot2VLMBridge` only handles topic communication with the lower-level side
- `QwenVLMAgent` produces detection boxes
- `TrackerAgent` handles tracking and sigma points
- `UIAgent` handles interaction and rendering
The handler only decides who gets called in each state.
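One common way to implement "who gets called in each state" is a plain dispatch table. The component roles below come from the text above, but the method names (`ask_for_target`, `confirm`, `update`) are placeholders:

```python
def make_dispatch(vlm_agent, tracker_agent, ui_agent):
    """Map each FSM state name to the single component that acts in it.
    Method names here are hypothetical, not the project's real API."""
    return {
        "WAIT_FOR_PICK_TARGET": ui_agent.ask_for_target,
        "AI_CONFIRM_PICK": vlm_agent.confirm,
        "GRASP_EXECUTION": tracker_agent.update,
    }

def step(state_name, dispatch):
    """Call only the component registered for the current state."""
    action = dispatch.get(state_name)
    if action is not None:
        return action()
```

The agents never see the FSM; the handler never touches images or topics. The table is the only place where the two meet.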
### Timestamp checks prevent reprocessing the same frame
During grasp and place stages, the handler first checks:
The tracker runs again only when a new camera frame arrives. This pattern is especially useful when a low-frequency camera feeds a higher-frequency main loop.
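The guard itself is small. A minimal sketch of the pattern (names are illustrative; the actual check in the handler may differ):

```python
class FrameGate:
    """Run downstream work only when a camera frame with a newer
    timestamp than the last processed one has arrived."""

    def __init__(self):
        self.last_stamp = None

    def should_process(self, stamp):
        if self.last_stamp is not None and stamp <= self.last_stamp:
            return False  # same or older frame: skip reprocessing
        self.last_stamp = stamp
        return True
```

In a high-frequency main loop this turns "run the tracker every tick" into "run the tracker once per new image", which is exactly what a low-frequency camera feed requires.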
### Visual results are forwarded to the lower-level side through the bridge
Robot2VLMBridge publishes:
- `/control/turn`
- `/control/object_ready`
- `/geometry_msgs/sigma_points`
- `/viz/vlm_bboxes`
and subscribes to:
- `/control/rl_ready`
- `/control/turn_done`
- `/control/grasp_done`
This design keeps the high-level task logic inside one manager without mixing low-level control details directly into the main flow.
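The bridge's role can be illustrated with a tiny in-process stand-in for the pub/sub graph. This is not ROS code; only the topic names are taken from the project, everything else is a mock for illustration:

```python
class MiniBus:
    """Tiny in-process stand-in for a pub/sub graph."""

    def __init__(self):
        self.subs = {}

    def subscribe(self, topic, callback):
        self.subs.setdefault(topic, []).append(callback)

    def publish(self, topic, msg):
        for callback in self.subs.get(topic, []):
            callback(msg)

class Bridge:
    """Forwards high-level results down; records low-level acknowledgements."""

    OUT = ["/control/turn", "/control/object_ready",
           "/geometry_msgs/sigma_points", "/viz/vlm_bboxes"]
    IN = ["/control/rl_ready", "/control/turn_done", "/control/grasp_done"]

    def __init__(self, bus):
        self.bus = bus
        self.acks = {}  # latest message per incoming topic
        for topic in self.IN:
            bus.subscribe(topic, lambda msg, t=topic: self.acks.__setitem__(t, msg))

    def send(self, topic, msg):
        assert topic in self.OUT, f"unknown outgoing topic: {topic}"
        self.bus.publish(topic, msg)
```

Because the handler only ever talks to the `Bridge` object, the list of topics (and the low-level protocol behind them) can change without touching the task FSM.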
## 4. How it works with PoseProcessor
Target points in this project do not stop at image coordinates. The system uses:
to combine:
- selected target boxes
- depth values
- camera extrinsics
- visual-odometry `odom`
into world coordinates.
This is one of the main reasons why the CamSubNode + PoseProcessor + Bridge Node combination works well for task-oriented applications.
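The combination amounts to back-projecting a pixel through the depth value and chaining the camera extrinsics with the odometry pose. A minimal numpy sketch, assuming a pinhole intrinsic matrix `K` and 4x4 homogeneous transforms `T_base_cam` (extrinsics) and `T_world_base` (from odometry); the real PoseProcessor interface may differ:

```python
import numpy as np

def pixel_to_world(u, v, depth, K, T_base_cam, T_world_base):
    """Back-project pixel (u, v) with known depth into world coordinates.

    K            : 3x3 camera intrinsics
    T_base_cam   : 4x4 camera-to-robot-base transform (extrinsics)
    T_world_base : 4x4 base-to-world transform (from visual odometry)
    """
    # Ray through the pixel in normalized camera coordinates
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Scale by depth and make homogeneous: point in the camera frame
    p_cam = np.append(ray * depth, 1.0)
    # Chain extrinsics and odometry to reach the world frame
    return (T_world_base @ T_base_cam @ p_cam)[:3]
```

With identity transforms, the principal-point pixel at depth 2 m maps to `[0, 0, 2]` in front of the camera; in the real system both transforms are non-trivial and updated as the robot moves.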
## 5. A reusable organization pattern
When building a new high-level task project, the following layering pattern is a good starting point:
- Use `BaseManager` for the system heartbeat and handshakes
- Use `CamSubNode` for shared image caching
- Use one bridge node to talk to the lower-level system
- Use `BaseHandlers` to implement the task FSM
- Split detection, tracking, and UI into separate agents
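Put together, the layering can be sketched as one thin composition object. Every name here is a placeholder for the corresponding ros_base type; only the close-everything-together behavior mirrors what the text attributes to `PickPlaceRUN`:

```python
class App:
    """Hypothetical wiring sketch; real ros_base constructors may differ."""

    def __init__(self, manager, camera, bridge, handler, agents):
        self.manager = manager   # heartbeat + handshakes (BaseManager role)
        self.camera = camera     # shared image cache (CamSubNode role)
        self.bridge = bridge     # talks to the lower-level system
        self.handler = handler   # task FSM (BaseHandlers role)
        self.agents = agents     # detection / tracking / UI agents

    def close(self):
        # Release agents first, then the FSM, I/O, and finally the manager,
        # so nothing is torn down while something above it still runs.
        for part in (*self.agents, self.handler, self.bridge,
                     self.camera, self.manager):
            if hasattr(part, "close"):
                part.close()
```

The point of the sketch is the single shutdown path: whatever the task is, every layer is released in one place, in a fixed order.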