Researchers from the Indian Institutes of Technology have developed a computationally efficient method to track the three-dimensional movements of surgical instruments using only standard two-dimensional video feeds and geometric principles.
Surgeons and patients are increasingly choosing laparoscopic surgery, also called keyhole surgery, because patients experience less pain and recover faster. When surgeons manipulate robotic arms to guide a tiny tool inside the body in 3D space, they rely on experience and skill to perceive depth from the 2D video captured by a tiny camera, also inserted through a keyhole, at the operation site. While some tertiary healthcare facilities in big Indian cities have high-end robotic surgery systems with 3D visualisation, such facilities are limited and expensive.
To achieve depth perception and 3D visualisation, current surgical setups use stereo (two-camera) systems, expensive sensors, or markers and labels on the tools. Another approach, based on deep learning, demands heavy computation. These methods are costly and rely on high-end resources, making them inaccessible to small healthcare centres.
Dr Shubhangi Nema and Prof Leena Vachhani from the Indian Institute of Technology (IIT) Bombay, and Abhishek Mathur from IIT Goa, have developed a new software technique that uses fundamental geometry to track surgical tools in 3D without the need for expensive sensors or heavy computing power. Their software can estimate the position and orientation of the surgical instruments using a standard video feed. This cost-effective way to track instruments in 3D can enable better virtual reality training systems and, in future, can significantly reduce the cost of 3D visualisation systems in actual surgeries.
“We chose a geometric approach because geometry is fundamentally reliable and interpretable. We leveraged geometric cues such as perspective projection, instrument shape constraints, and interval-based uncertainty modelling (using a range of possible position coordinates instead of exact position),” says Dr Nema.
The researchers found that by treating the surgical instrument as a set of connected geometric shapes, they could calculate the depth and rotation of the instrument directly from changes in the 2D video frames. They developed an algorithm that creates bounding boxes for each part of the instrument, such as the shaft and the attached clasper. By tracking how the boxes change in shape, size, and the angles between them from frame to frame, the algorithm estimates the relative and absolute position and movement of the instrument parts.
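The paper's reference implementation is not published, but the basic bookkeeping can be illustrated with a short sketch. In this hypothetical example (names and structure are illustrative, not the authors' code), each instrument part detected in a frame is reduced to a bounding box whose area, centre, and orientation relative to the other part can be compared across frames:

```python
from dataclasses import dataclass
import math

@dataclass
class Box:
    """Axis-aligned bounding box of one instrument part in a 2D frame (pixels)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    @property
    def centre(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)

def inter_part_angle(shaft: Box, clasper: Box) -> float:
    """In-plane angle (degrees) of the line joining the two part centres,
    a simple stand-in for the inter-part angle the article describes."""
    (x1, y1), (x2, y2) = shaft.centre, clasper.centre
    return math.degrees(math.atan2(y2 - y1, x2 - x1))
```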
The algorithm utilises the principle of perspective: as an object moves further from the camera, it appears smaller, and as it rotates, its projected shape distorts in predictable ways. By measuring the change in the area of these bounding boxes, the algorithm calculates motion in depth: if the box shrinks, the tool is moving deeper into the body; if it expands, it is retracting. The algorithm simultaneously tracks the motion of the box's centre across the screen and analyses changes in the internal angles to determine rotational motion about each axis.
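Under a standard pinhole camera model, the projected size of a rigid part scales inversely with its distance from the camera, so the bounding-box area scales with the inverse square of depth. The following sketch, with made-up numbers and not the authors' implementation, shows how that relationship yields a depth update and how the shift of the box centre converts to a lateral displacement when the focal length is known:

```python
import math

def depth_from_area(prev_depth_mm: float, prev_area_px: float, curr_area_px: float) -> float:
    """Pinhole approximation: projected area ~ 1/depth^2, so
    depth_new = depth_old * sqrt(area_old / area_new)."""
    return prev_depth_mm * math.sqrt(prev_area_px / curr_area_px)

def lateral_shift_mm(prev_centre, curr_centre, depth_mm, focal_px):
    """Convert the on-screen shift of a box centre (pixels) into a lateral
    displacement (mm) at the estimated depth, assuming a known focal length."""
    dx = (curr_centre[0] - prev_centre[0]) * depth_mm / focal_px
    dy = (curr_centre[1] - prev_centre[1]) * depth_mm / focal_px
    return dx, dy

# Example: the shaft's box shrinks from 1200 to 1000 px^2 while about 80 mm away
print(depth_from_area(80.0, 1200.0, 1000.0))  # ~87.6 mm, i.e. moving deeper / away
```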
Accurately estimating the depth from 2D images can be challenging. The object outline may not be clear due to poor lighting, camera noise, or motion blur. “From a single camera view, multiple 3D configurations can produce the same 2D projection. We introduced geometric constraints and interval-based bounds to narrow the feasible solution space,” explains Dr Nema. Instead of saying that the tooltip is at an exact point P, the algorithm gives a range, or an interval, in which the tip can be present. “By incorporating known instrument dimensions and motion continuity, we reduced ambiguity. This approach makes 3D estimation more stable and robust,” says Dr Nema.
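The interval idea can be pictured the same way: rather than committing to a single depth value, the estimator keeps a lower and an upper bound and tightens them by intersecting whatever constraints are available. The sketch below is a hypothetical illustration with invented numbers, not the published algorithm:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def intersect(self, other: "Interval") -> "Interval":
        """Keep only values consistent with both sources of information."""
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

# Depth bound from a noisy bounding-box area (wide, because the outline is uncertain)
from_area = Interval(78.0, 86.0)                  # mm
# Bound from motion continuity: the tip cannot jump far between consecutive frames
from_motion = Interval(80.0 - 2.0, 80.0 + 2.0)    # mm
# Bound from the known physical dimensions of the instrument
from_geometry = Interval(75.0, 84.0)              # mm

estimate = from_area.intersect(from_motion).intersect(from_geometry)
print(estimate)   # Interval(lo=78.0, hi=82.0)
```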
The team’s simulations and experiments demonstrated that their method achieves high accuracy, with an error of less than or equal to one millimetre in displacement, and negligible error in orientation. Furthermore, the system is efficient enough to run on a standard computer processor without specialised graphics hardware, processing video at speeds of roughly 50 frames per second, which is well within the requirements for real-time applications.
To validate their method, the team set up a physical experiment in which known motions of a scaled physical model were recorded with both a highly precise motion capture system and a stationary webcam. They compared the output of their geometric algorithm, applied to the webcam feed, with the ground-truth data from the motion capture sensors. The errors were small enough for the method to be used for labelling and motion tracking of instruments in future applications.
The researchers noted that the accuracy of the 3D tracking depends directly on the precision of the initial 2D segmentation; if the computer creates a poor outline of the tool, the 3D estimation will be less accurate. Additionally, the current mathematical model assumes that the camera focal length is known and fixed. The researchers plan to integrate automated calibration in future iterations.
The researchers plan to implement their strategy in an experimental setup for providing real-time training or assistance to the surgeons. “This work demonstrates that a three-dimensional visual experience for surgeons can be achieved using the existing monocular laparoscopic camera itself, offering a cost-effective and practical pathway toward improved depth perception in minimally invasive surgery,” concludes Prof Vachhani.
Funding Information:
Financial support for this study was provided by the Prime Minister’s Research Fellows (PMRF) scheme, India (PMRF Id no. 1300229 dated May 2019) for pursuing research in higher educational institutions in India.
Prof. Leena Vachhani, Systems and Control, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra 400076, India