Drone Inspector

An edge computing solution for remote drone inspection



About

Drone Inspector is a prototype for a cloudlet-based drone inspection system that would allow engineers to remotely scan a building for defects. The system consists of two parts: an on-site, camera-equipped drone positioned at the building and an off-site client with a 3D model of the building. As the drone flies around the exterior of the structure, its camera view is broadcast to the off-site client, where it is contextualized using the 3D model. This research was done in collaboration with Professor Satyanarayanan and Shilpa George of Carnegie Mellon University and was funded by DARPA. The cloudlet interface code was built on the Gabriel Cognitive Assistant platform. A related paper, Towards Drone-sourced Live Video Analytics for the Construction Industry, is being presented at HotMobile 2019.


How it Works

Localization

The backbone of the localization component is SIFT matching. SIFT is a computer vision algorithm that extracts distinctive 'key points' from an image and computes a descriptor for each one; descriptors from two images can then be matched using approximate nearest neighbor search. Before the drone flight begins, we capture reference images which, when tiled, form an image skin over the surface of the building. These are stored in a database along with the latitude, longitude, altitude, and heading at which they were captured. As the drone flies, the video stream is sent to a server where each frame is matched against the images in the database. Once a best match is established, a homography matrix is computed, letting us determine which section of the reference image appears in the query frame. This information, represented as a bounding box, along with the coordinates, altitude, and heading of the reference image, is then sent to the client for visualization.
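
Conceptually, the matching step looks something like the following OpenCV sketch. This is an illustrative minimal example, not our production Gabriel pipeline; the file names, FLANN parameters, and 0.7 ratio threshold are assumptions.

    import cv2
    import numpy as np

    # Placeholder inputs: the real system matches live video frames
    # against a database of geotagged reference images.
    reference = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
    query = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)

    # Extract SIFT key points and descriptors from both images.
    sift = cv2.SIFT_create()
    kp_ref, des_ref = sift.detectAndCompute(reference, None)
    kp_qry, des_qry = sift.detectAndCompute(query, None)

    # Approximate nearest neighbor matching (FLANN) with Lowe's ratio test.
    flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
    matches = flann.knnMatch(des_qry, des_ref, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]

    # Estimate the homography mapping query pixels to reference pixels.
    src = np.float32([kp_qry[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Project the query frame's corners into the reference image; the
    # resulting quadrilateral is the bounding box sent to the client.
    h, w = query.shape
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    box = cv2.perspectiveTransform(corners, H)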

Visualization


On the client side, a scale 3D model of the building is stored in the Unity engine. Whenever reference image location data is received, a virtual camera is placed in 3D space at the location where the reference image was taken. Using ray casting from that camera, a bounding box can then be drawn showing which part of the 3D model corresponds to the live camera feed.
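
The client itself is written in C# inside Unity, but the underlying geometry can be sketched language-agnostically. The Python sketch below is a simplification assuming a y-up, z-forward coordinate frame, a square field of view, and a flat facade plane; all names in it are hypothetical. It shows the two core operations: casting rays through the bounding box corners and intersecting them with the building surface.

    import numpy as np

    def corner_rays(heading_deg, fov_deg, corners_ndc):
        """Cast unit rays from a virtual camera through normalized
        image coordinates in [-1, 1] (the bounding box corners)."""
        yaw = np.radians(heading_deg)
        forward = np.array([np.sin(yaw), 0.0, np.cos(yaw)])
        right = np.array([np.cos(yaw), 0.0, -np.sin(yaw)])
        up = np.array([0.0, 1.0, 0.0])
        half = np.tan(np.radians(fov_deg) / 2.0)
        rays = []
        for x, y in corners_ndc:
            d = forward + x * half * right + y * half * up
            rays.append(d / np.linalg.norm(d))
        return rays

    def ray_plane_hit(origin, direction, plane_point, plane_normal):
        """Return the point where a ray strikes a plane (a facade)."""
        t = np.dot(plane_normal, plane_point - origin) / np.dot(plane_normal, direction)
        return origin + t * direction

    # Example: a camera 30 m up, facing a facade 20 m ahead of it.
    cam = np.array([0.0, 30.0, 0.0])
    rays = corner_rays(heading_deg=0.0, fov_deg=60.0,
                       corners_ndc=[(-1, -1), (1, -1), (1, 1), (-1, 1)])
    hits = [ray_plane_hit(cam, r, np.array([0.0, 0.0, 20.0]),
                          np.array([0.0, 0.0, -1.0])) for r in rays]

In Unity, the same steps reduce to placing a Camera at the reference image's pose and calling Physics.Raycast against the model's colliders.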


Challenges

One of our main challenges was achieving real-time speed. In its default configuration, SIFT is robust but computationally expensive. Running on a 7th-generation Intel CPU, SIFT matched at about 1 Hz, far below what we considered real time (approximately 15 Hz). By leveraging edge computing, we were able to run SIFT on a GPU-enabled cloudlet without a major hit to round-trip time. This significantly increased throughput, allowing us to reach 15-16 Hz.
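
As a rough illustration of the offloading loop, the sketch below streams frames to a cloudlet and measures achieved end-to-end frame rate. It uses an ad hoc length-prefixed protocol for illustration only; the actual system speaks the Gabriel protocol, and the host, port, and video source here are placeholders.

    import socket
    import struct
    import time

    import cv2

    HOST, PORT = "cloudlet.example.org", 9098  # hypothetical cloudlet address

    def recv_exact(sock, n):
        """Read exactly n bytes from the socket."""
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("socket closed")
            buf += chunk
        return buf

    cap = cv2.VideoCapture(0)          # stand-in for the drone video feed
    sock = socket.create_connection((HOST, PORT))
    frames, start = 0, time.time()
    while frames < 100:
        ok, frame = cap.read()
        if not ok:
            break
        _, jpg = cv2.imencode(".jpg", frame)
        data = jpg.tobytes()
        sock.sendall(struct.pack("!I", len(data)) + data)  # send one frame
        reply_len = struct.unpack("!I", recv_exact(sock, 4))[0]
        recv_exact(sock, reply_len)    # block on the match result
        frames += 1
    print(f"end-to-end throughput: {frames / (time.time() - start):.1f} Hz")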

Another challenge we faced was the inaccuracy of the drone's location data. Out of the box, the GPS error of the Phantom 4 Pro (the drone we used) is about 3 meters. We mitigated this by hovering the drone for long periods while taking reference images, allowing us to average the location readings for a more accurate fix.
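
The averaging itself is simple: assuming the GPS noise is roughly zero-mean and independent across readings, the error of the mean shrinks on the order of 1/sqrt(n). A minimal sketch (the function name is illustrative):

    import numpy as np

    def average_fix(samples):
        """Average (lat, lon, alt) fixes logged while the drone hovers.
        Naive averaging is fine here because all samples lie within a
        few meters of each other (no coordinate wraparound concerns)."""
        return np.asarray(samples, dtype=np.float64).mean(axis=0)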


My Contribution

I was responsible for creating the Unity visualization client. I wrote the code to interpret the results of SIFT and localize them on the 3D model. I was also responsible for choosing which computer vision algorithms to use, and for gathering test footage (via actual drone flight around the Tepper School of Business).