Friday, 12 February 2016

Robert Burke, Sprint 1 - Basic vision system

Team:    QRM
Author:  Robert Burke
Date:      31/01/2016
Post:      #1

Last week’s action item review:
My action item last week was to implement basic vision system interactions to control a pan and tilt servo mechanism. This task was broken down into the following smaller tasks:
  • capture an image.
  • detect a face.
  • determine the centre point of the detected face.
  • translate these coordinates into a range of values suitable for the two servos.


Inspiration for this sprint came from the Pixar-like lamp robot in the video below.

With Python/OpenCV installed on the Raspberry Pi and the RPI camera connected via ribbon cable, basic image and video test scripts were implemented first to get a feel for the software and to verify that all hardware and software worked.

After successfully capturing both still images and a video feed, a basic script was implemented to detect a face and determine its centre point, operating only on still images at first. This script was then modified to perform the same operation on a video feed (continuous image capture) and to detect both face and eyes.

To help in debugging and in programming the servo values needed to follow a detected face, graphics were added to the video display frame.
The team then decided to mount the camera on the servos themselves, as previous experience indicated that a stationary camera, not attached to the servos, requires more computation and more time to implement a functioning face tracking system.

The face centre values, which are the x and y coordinates of a detected face in the input frame (generally 640x480), were therefore converted into values to control the servos, keeping the position of the camera in mind.
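
With the camera riding on the servos, one way to think about this conversion is as an offset of the face from the frame centre. The sketch below is only an illustration under that assumption: the function name `centre_offset` and the normalisation to the range -1 to 1 are hypothetical and are not taken from the project code.

```python
def centre_offset(cx, cy, frame_w=640, frame_h=480):
    """Return the face offset from the frame centre, normalised to [-1, 1].

    Hypothetical helper: dx is positive when the face is right of centre,
    dy is positive when the face is above centre (image y grows downward).
    """
    half_w = frame_w / 2
    half_h = frame_h / 2
    dx = (cx - half_w) / half_w
    dy = (half_h - cy) / half_h
    return dx, dy


# A face centred at (480, 120) in a 640x480 frame is right of and above centre.
print(centre_offset(480, 120))
```

An offset of zero on both axes means the camera is already pointing at the face, so the servos have nothing to correct.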

This basic method proved successful in tracking both face and eyes, as shown in the image below. The face and eyes detected in the test image, which is displayed on a phone, are surrounded by rectangles. The detected face centre position is displayed in the upper left-hand corner of the video display frame, together with the values that will be output to the servos for pan and tilt action and basic face tracking.

Full code can be found at the link below.

Action discussion:

After implementing the basic image and video capture scripts, it was observed that the best method of capture for the RPI was continuous image capture, where the images are captured and stored in an RGB array for processing. A while loop then operates on each frame, extracting the relevant information before discarding the frame and processing the next one.

The basic vision system was implemented by first importing the necessary packages, such as the RPI camera functions and the OpenCV libraries.

The camera and variables such as the captured image resolution and frames per second (FPS) were then initialised. Here it was observed that a lower resolution capture frame resulted in an increase in FPS but a reduction in face detection accuracy.

An array is then created to hold the captured frames, and the OpenCV cascade classifiers for both face and eye detection are initialised.

A while loop is then executed to continuously capture still images and store them in arrays. Each image is converted to grayscale in preparation for the cascade classifiers, which then identify any faces or eyes in the image and store their coordinates.

From the detected face array, which holds the coordinates of a rectangle surrounding each identified face, the face width and height were extracted and the centre position of the face calculated. To maintain a reasonable FPS while displaying the captured images, the capture frame size was reduced to 100*100. The image was then scaled up by a factor of four, giving a larger, lower-resolution display image for debugging with minimal effect on performance.
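
The centre calculation can be sketched as follows. OpenCV cascade classifiers return each detection as an (x, y, w, h) rectangle, where (x, y) is the top-left corner; the helper names `face_centre` and `to_display` are hypothetical and the sample rectangle is made up for illustration.

```python
def face_centre(face):
    """Return the (cx, cy) centre of an (x, y, w, h) face rectangle."""
    x, y, w, h = face
    return x + w // 2, y + h // 2


def to_display(point, scale=4):
    """Map a point from the small capture frame to the enlarged display frame."""
    return point[0] * scale, point[1] * scale


# Made-up detection in a 100x100 capture frame.
faces = [(30, 20, 40, 40)]
centre = face_centre(faces[0])
print(centre)              # centre in capture-frame pixels
print(to_display(centre))  # same point in the 4x display frame
```

Keeping the calculation in capture-frame coordinates and scaling only for display means the servo logic is unaffected by the size of the debug window.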

A nested if statement within the main loop then checks whether the faces array is non-empty, i.e. a face has been detected, and generates values for the servos, which are determined by the centre position of the face.
If the face is above the centre of the image, the tilt (y) servo value is incremented from its central position on each iteration of the loop until either the face reaches the centre of the image or the face is lost, at which point the servo returns to its central position. Conversely, the tilt servo value is decremented when the face is below the centre of the image. The pan (x) servo is handled in the same fashion for faces to the left or right of the image centre.
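
A minimal sketch of this stepping logic is given below. The servo range of 0 to 180 with a central value of 90, the step size of 1, the pan direction (incrementing when the face is left of centre) and the name `update_servos` are all assumptions made for illustration; the project code may use different values and signs.

```python
SERVO_MIN, SERVO_MAX = 0, 180  # assumed servo range
SERVO_CENTRE = 90              # assumed central position
STEP = 1                       # assumed change per loop iteration


def clamp(value):
    """Keep a servo value inside the assumed valid range."""
    return max(SERVO_MIN, min(SERVO_MAX, value))


def update_servos(pan, tilt, face_centre, frame_size):
    """Return new (pan, tilt) values for one iteration of the main loop."""
    if face_centre is None:
        # Face lost: both servos return to the central position.
        return SERVO_CENTRE, SERVO_CENTRE
    cx, cy = face_centre
    mid_x, mid_y = frame_size[0] // 2, frame_size[1] // 2
    if cy < mid_y:      # face above centre (image y grows downward)
        tilt += STEP
    elif cy > mid_y:    # face below centre
        tilt -= STEP
    if cx < mid_x:      # face left of centre (direction is an assumption)
        pan += STEP
    elif cx > mid_x:    # face right of centre
        pan -= STEP
    return clamp(pan), clamp(tilt)


# Face in the upper-left of a 100x100 frame: both values step up by one.
print(update_servos(90, 90, (10, 10), (100, 100)))
```

Because the camera moves with the servos, repeated small steps bring the face toward the image centre without needing a calibrated mapping from pixels to angles.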

The face position is also checked against a small rectangle in the centre of the image. If the face centre lies anywhere within this rectangle, the servo values do not change and the servos hold their position. This prevents the oscillation that would otherwise occur as the servos chase a target position that changes slightly on every frame.
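
The central rectangle check can be sketched like this. The rectangle half-sizes of 10 pixels and the name `in_dead_zone` are hypothetical; the post does not give the dimensions actually used.

```python
def in_dead_zone(face_centre, frame_size, half_width=10, half_height=10):
    """Return True if the face centre lies inside the central rectangle."""
    cx, cy = face_centre
    mid_x, mid_y = frame_size[0] / 2, frame_size[1] / 2
    return abs(cx - mid_x) <= half_width and abs(cy - mid_y) <= half_height


# In a 100x100 frame the rectangle spans 40..60 on both axes.
print(in_dead_zone((52, 47), (100, 100)))  # inside: servos hold position
print(in_dead_zone((70, 50), (100, 100)))  # outside: servos keep stepping
```

A larger rectangle gives steadier servos at the cost of less precise centring, so its size is a tuning choice.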

Graphics are added to the display image to show the servo values produced by the if statements, the number of faces detected, and other variables. The display image is then shown on screen and the capture array is truncated, allowing the next image to be processed by the loop. Finally, a keypress at any time allows the user to exit the loop and the program.

The image below shows a resulting frame from the initial test. For the face displayed on the phone screen, both the face and eyes are detected and surrounded by rectangles, and the face position and servo values are printed on the display image. Note that the image is blurred because the low-resolution capture is scaled up to maintain performance.

Next action items list:

  • Extend the basic vision system interactions that control the pan and tilt servo mechanism, using threads and interprocess communication (IPC).
  • Remote programming of the Raspberry Pi from a laptop.

Useful Links:

  • Pixar lamp robot guide:
  • RPI camera capture methods:
  • Basic object detection:
  • Object Tracking BrickPi Robot:
