Intelligent Scissors: Interactive Image Segmentation Tool

Overview

Implemented an Intelligent Scissors (Live-Wire) interactive image segmentation tool in Java, as a course project at CMU (Spring 2025), with teammates Tingjun Huang and Guoheng Ma. The tool allows a user to click seed points on an image and have the system automatically trace optimal boundaries between objects in real time.

The implementation models the image as an 8-neighborhood weighted pixel graph with edge costs derived from Sobel gradient magnitudes, then computes live boundary paths via Dijkstra and A* shortest-path search from the last seed to the current cursor position.

Results

Processing time, measured on an Intel Core i9-12900H (32 GB RAM, Windows 11) and compared against manual polygon-lasso selection in Adobe Photoshop 2020:

| Test Image | Photoshop | Intelligent Scissors | Speedup |
| --- | --- | --- | --- |
| Image 1 (Mickey Mouse cartoon, low-contrast boundary, 661×494 px) | 551 s | 106 s | 5.2× |
| Image 2 (two parrots, high-contrast textured, 707×480 px) | 279 s | 25 s | 11.2× |

Segmentation accuracy:

| Test Image | IoU | Dice Coefficient |
| --- | --- | --- |
| Image 1 | 0.88 | 0.94 |
| Image 2 | 0.97 | 0.99 |

Both results exceed the 0.85 Dice threshold commonly considered “high-quality” in interactive segmentation literature.

Technical Details

Image Processing and Cost Function:

  • Input image converted to grayscale (ITU-R BT.709 weighted average).
  • Sobel operators compute horizontal and vertical gradients G_x, G_y; gradient magnitude G(x,y) = sqrt(G_x² + G_y²).
  • Pixel cost: C(x,y) = 1 / (1 + G(x,y)) — strong edges yield low cost, making them preferred by the pathfinder.
  • Border regions optionally enhanced by boosting edge strength at image boundaries to ensure complete closed paths.
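The cost computation above can be sketched in Java (illustrative only — this assumes the grayscale image is stored as a `double[][]` and is not the project's exact code):

```java
// Sketch of the cost-map computation: Sobel gradients -> magnitude -> inverted cost.
public class CostMap {
    static final double[][] SOBEL_X = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static final double[][] SOBEL_Y = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    /** C(x,y) = 1 / (1 + G(x,y)) for interior pixels; borders get maximum cost. */
    static double[][] compute(double[][] gray) {
        int h = gray.length, w = gray[0].length;
        double[][] cost = new double[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (y == 0 || x == 0 || y == h - 1 || x == w - 1) {
                    cost[y][x] = 1.0;   // border: no full 3x3 neighborhood
                    continue;
                }
                double gx = 0, gy = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        double v = gray[y + dy][x + dx];
                        gx += SOBEL_X[dy + 1][dx + 1] * v;
                        gy += SOBEL_Y[dy + 1][dx + 1] * v;
                    }
                }
                double g = Math.sqrt(gx * gx + gy * gy);  // gradient magnitude
                cost[y][x] = 1.0 / (1.0 + g);             // strong edge -> low cost
            }
        }
        return cost;
    }
}
```

A pixel sitting on a strong step edge thus costs far less than a pixel in a flat region, which is what steers the shortest-path search along object boundaries.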

Graph and Shortest-Path Search:

  • 8-neighborhood graph: each pixel node connected to 8 neighbors; edge weight = (C(i,j) + C(k,l)) / 2 × d, where d = 1 for orthogonal, √2 for diagonal.
  • Dijkstra computes shortest paths from the active seed to all pixels; results cached as a per-pixel cost map + predecessor map — path retrieval on cursor move is a predecessor-map walk in O(path length), with no re-computation until the seed changes.
  • A* used in ROI-constrained mode with Euclidean-distance heuristic (α = 0.005), focusing search within a dynamic window around the start/end points for faster interactive response.
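The per-seed search and cached retrieval can be sketched as follows — a simplified, self-contained version using `java.util.PriorityQueue` in place of the Princeton `IndexMinPQ`; class and method names are illustrative:

```java
import java.util.*;

// Simplified sketch of the per-seed Dijkstra pass over the 8-neighborhood pixel graph.
public class LiveWire {
    final int w, h;
    final double[][] cost;   // per-pixel cost C from the edge detector
    final double[] dist;     // shortest-path cost from the seed
    final int[] pred;        // predecessor pixel index, -1 at the seed

    LiveWire(double[][] cost) {
        this.cost = cost;
        this.h = cost.length;
        this.w = cost[0].length;
        this.dist = new double[w * h];
        this.pred = new int[w * h];
    }

    /** Run Dijkstra once from the seed; results stay cached until the seed changes. */
    void computeFromSeed(int sx, int sy) {
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        Arrays.fill(pred, -1);
        PriorityQueue<double[]> pq = new PriorityQueue<>(Comparator.comparingDouble(a -> a[0]));
        int seed = sy * w + sx;
        dist[seed] = 0;
        pq.add(new double[]{0, seed});
        while (!pq.isEmpty()) {
            double[] top = pq.poll();
            int u = (int) top[1];
            if (top[0] > dist[u]) continue;          // stale queue entry
            int ux = u % w, uy = u / w;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    if (dx == 0 && dy == 0) continue;
                    int vx = ux + dx, vy = uy + dy;
                    if (vx < 0 || vy < 0 || vx >= w || vy >= h) continue;
                    int v = vy * w + vx;
                    double d = (dx != 0 && dy != 0) ? Math.sqrt(2) : 1.0;
                    // edge weight = mean of endpoint costs, scaled by step length
                    double wUV = 0.5 * (cost[uy][ux] + cost[vy][vx]) * d;
                    if (dist[u] + wUV < dist[v]) {
                        dist[v] = dist[u] + wUV;
                        pred[v] = u;
                        pq.add(new double[]{dist[v], v});
                    }
                }
            }
        }
    }

    /** O(path length) retrieval on every cursor move: walk the predecessor map. */
    List<int[]> pathTo(int cx, int cy) {
        LinkedList<int[]> path = new LinkedList<>();
        for (int p = cy * w + cx; p != -1; p = pred[p])
            path.addFirst(new int[]{p % w, p / w});
        return path;
    }
}
```

On a cursor move, only `pathTo` runs; the expensive `computeFromSeed` pass runs once per seed placement.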

Cursor Snap:
Within radius r, the cursor position is attracted to the pixel maximizing:
S(x′, y′) = E(x′, y′) / (1 + α · d((x, y), (x′, y′)))
where E is precomputed edge strength and α = 0.3 is a decay factor. This snaps the active endpoint to nearby strong edges, reducing required precision and stabilizing boundary tracing along prominent features.
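A minimal sketch of this snap search (the method name and arguments are illustrative, assuming edge strength is precomputed as a `double[][]`):

```java
// Within radius r of the raw cursor (x, y), pick the pixel maximizing
// S = E / (1 + alpha * d), where E is precomputed edge strength.
public class CursorSnap {
    static int[] snap(double[][] edgeStrength, int x, int y, int r, double alpha) {
        int h = edgeStrength.length, w = edgeStrength[0].length;
        int bestX = x, bestY = y;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int yy = Math.max(0, y - r); yy <= Math.min(h - 1, y + r); yy++) {
            for (int xx = Math.max(0, x - r); xx <= Math.min(w - 1, x + r); xx++) {
                double d = Math.hypot(xx - x, yy - y);
                if (d > r) continue;                  // circular snap window
                double s = edgeStrength[yy][xx] / (1.0 + alpha * d);
                if (s > bestScore) { bestScore = s; bestX = xx; bestY = yy; }
            }
        }
        return new int[]{bestX, bestY};
    }
}
```

The distance penalty in the denominator keeps the cursor from jumping to a distant edge when a moderately strong one is already nearby.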

Path Cooling (auto-seeding):
When the live path remains stable across N consecutive frames (measured by Euclidean distance between uniformly sampled points), and a candidate point c satisfies minimum spacing (||c − s_last|| > D_min) and minimum edge strength (E(c) > E_thresh), a new seed is automatically inserted. This progressively locks completed path segments without explicit clicks, reducing total click count.
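The stability test behind Path Cooling can be sketched like this (the sample count, tolerance handling, and class shape are assumptions, not the project's exact API):

```java
import java.util.List;

// Illustrative sketch of the Path Cooling stability detector.
public class PathCooling {
    /** Mean distance between k uniformly sampled points of two paths. */
    static double pathDistance(List<int[]> a, List<int[]> b, int k) {
        double sum = 0;
        for (int i = 0; i < k; i++) {
            int[] pa = a.get(i * (a.size() - 1) / (k - 1));
            int[] pb = b.get(i * (b.size() - 1) / (k - 1));
            sum += Math.hypot(pa[0] - pb[0], pa[1] - pb[1]);
        }
        return sum / k;
    }

    int stableFrames = 0;

    /** True once the live path has stayed within tol for n consecutive frames. */
    boolean update(List<int[]> prev, List<int[]> cur, double tol, int n) {
        stableFrames = pathDistance(prev, cur, 8) < tol ? stableFrames + 1 : 0;
        return stableFrames >= n;
    }
}
```

Once `update` fires, the candidate seed is still gated by the spacing and edge-strength checks described above before it is inserted.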

Performance Optimizations:

  • Precomputed cost matrix cached after edge detection — no recomputation during pathfinding.
  • Dynamic window restriction: graph construction limited to a bounding region around start/end points; window size adapts to the distance between them.
  • IndexMinPQ priority queue for efficient A* open-set management.
  • Boolean mask matrix for O(1) pixel state queries in segmentation and path cooling.

Challenges

  1. Real-time Dijkstra at megapixel scale: A naïve re-run on every cursor move (O(N log N), N ≈ 300K for a 661×494 image) was too slow for interactive use. The seed-caching architecture — precomputing the full Dijkstra result once per seed and recovering paths via predecessor-map traversal — reduced per-frame work from O(N log N) to O(path length).

  2. Path instability in low-gradient regions: In flat-texture regions, small cursor perturbations caused large path jumps. Path Cooling’s stability detection (consecutive-frame similarity check) added path inertia, while tuning the sharpening kernel’s center weight (set to 5) improved edge detectability in low-contrast areas at the cost of some visual noise.

  3. Closed-path formation: The algorithm naturally traces from a seed toward the cursor; closing the path requires the endpoint to meet the starting seed. Implemented arc-length parameterization and a minimum-seed-distance threshold to prevent the path from “collapsing” prematurely near the starting seed.

Reflection and Insights

Two lessons dominated this project: caching architecture matters more than algorithmic choice, and UX features drive perceived performance more than raw compute.

The 5.2×–11.2× speedup over Photoshop came almost entirely from the Dijkstra seed-cache — a single structural insight eliminating redundant computation. Meanwhile, Cursor Snap and Path Cooling — entirely orthogonal to the shortest-path algorithm — were responsible for the majority of improvement in actual segmentation speed during user testing. These two auxiliary features reduced user effort (click count, repositioning) far more than algorithmic refinements to the path cost function did.

This generalizes to interactive system design broadly: the component the user interacts with most frequently (cursor movement, click placement) has disproportionate leverage on end-to-end task time.

Team and Role

Course project at CMU with Tingjun Huang (12112619) and Guoheng Ma (12211611). My contributions: graph construction and Dijkstra/A* implementation, cost matrix caching architecture, Path Cooling algorithm design and implementation, and performance benchmarking.

Stack

Java 17, IntelliJ IDEA, Swing (GUI), Princeton algorithm library (EdgeWeightedDigraph, DijkstraSP, IndexMinPQ)

ROS Robot Intelligent Navigation and Control System

Overview

This project was the final deliverable for the Robot Perception and Intelligence course (EE211) at Southern University of Science and Technology, built on the ROS2 platform. The goal was to develop a fully autonomous robot capable of navigating to a target location, recognizing and grasping an object using a robotic arm, and avoiding obstacles — all with custom-implemented planning and control modules.

[Video: robot navigation and arm control demonstration]

Results

  • Navigation: Successfully navigated to target points using the Nav2 stack with a custom global planner plugin.
  • Object Recognition and Grasping: Detected target objects via Aruco markers; the robotic arm computed inverse kinematics and executed reliable grasps within the reachable workspace.
  • Path Planning: Implemented a custom A* global planner and a trajectory feedback local controller as Nav2 plugins.
  • Extra Challenge: Handled randomly placed objects by dynamically querying IK solvability during slow-approach phases.

Technical Details

  • System Architecture:
    • Finite State Machine (FSM): Coordinated high-level task sequencing (navigate → approach → grasp → return).
    • Navigation: Nav2 stack with tuned parameters for global_costmap, local_costmap, planner_server, and controller_server.
    • Aruco-based Target Recognition: Used camera-based Aruco detection to estimate target pose; TF tree handled all coordinate transformations automatically.
  • Custom A* Planner (MyPlanner):
    • Implemented as a Nav2 global planner plugin in C++.
    • Standard A* graph search on the occupancy grid with heuristic tuning for smooth paths.
  • Custom Trajectory Feedback Controller (MyController):
    • Local controller plugin computing velocity commands to track the reference path.
    • Feedback control based on cross-track error and heading error.
  • Robotic Arm Controller:
    • Queried IK solver (grasp_query_solved()) in a loop during slow approach to determine when the target entered the reachable envelope.
    • Designed custom grasp points with direction information from Aruco pose estimates.
  • PTZ (Pan-Tilt) Tracking:
    • Drove the camera gimbal to track the target during navigation, preventing loss of visibility.
    • Coordinate compensation handled via TF tree rather than manual recalibration.
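The feedback law in MyController can be sketched language-agnostically; here in Java to match the rest of this page (the real controller is a C++ Nav2 plugin, and the gains `kCt`, `kH` and names are illustrative):

```java
// Sketch of a cross-track + heading-error feedback law for path tracking.
public class TrackingControl {
    /** Wrap an angle to (-pi, pi]. */
    static double wrap(double a) {
        while (a > Math.PI) a -= 2 * Math.PI;
        while (a <= -Math.PI) a += 2 * Math.PI;
        return a;
    }

    /**
     * Velocity command from signed cross-track error (m, positive = robot left
     * of path) and heading error (rad, path heading minus robot heading).
     * Returns {v, omega}.
     */
    static double[] command(double crossTrack, double headingError,
                            double vNominal, double kCt, double kH) {
        double omega = kH * wrap(headingError) - kCt * crossTrack;
        // slow down during sharp corrections so the arc stays trackable
        double v = vNominal / (1.0 + Math.abs(omega));
        return new double[]{v, omega};
    }
}
```

The sign convention steers back toward the path: drift left produces a rightward turn command, and a heading lag behind the path heading produces a leftward one.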

Challenges

  • Odometry Drift: Wheel odometry accumulated error over longer paths, causing the robot to lose accurate positioning relative to the target. Resolved by switching reference to the Aruco marker position during the final approach phase.
  • IK Feasibility Window: The robotic arm’s reachable workspace was constrained, requiring continuous IK queries and a slow-approach strategy to enter the feasible zone before executing a grasp.
  • Costmap Configuration: Getting Nav2’s costmap inflation and obstacle layers tuned for the specific robot geometry required iterative testing.

Reflection and Insights

This project provided hands-on experience with the full stack of autonomous robotics: perception, planning, and control. Implementing A* and the trajectory controller as actual Nav2 plugin classes — rather than standalone scripts — deepened understanding of how ROS2’s modular architecture enables component reuse and testing. The challenge of handling coordinate frames across navigation, perception, and manipulation highlighted why a well-structured TF tree is foundational to multi-component robotic systems.

Team and Role

  • Team: Three-person team, each responsible for different subsystems.
  • My Role: Led the development of the custom A* global planner plugin and the trajectory feedback controller; contributed to the FSM design and arm approach strategy.