Developing an AI camera image recognition system in an hour

Cognitive Services from Microsoft offer pretrained networks that let citizen developers quickly build things like image recognition systems. This short post shows how it is done and how I created a real-time people recognition system. The goal is to recognize people in our office and possibly inform our beloved founders about it. What I used: a Python script, a Full HD webcam, and the Cognitive Services Custom Vision API.

image

The steps are pretty straightforward:

  • Get some training data.
  • Set up the cognitive service.
  • Train the model and set up an endpoint. We need a prediction endpoint that can be queried for predictions.
  • Write a Python script that 1. gets the image from the camera, 2. asks for a prediction and receives the answers, 3. draws rectangles around people, 4. shows the result.

Getting the data

As described by Microsoft, training data should contain around 15 images for each class that you want to recognize. A class can be e.g. a cat or a dog. In this case, we are going to look for the silhouette of a person. Because I train only one class, I took around 25 pictures of our Clouds On Mars office as a training sample. I left around 5 as test data.

Composite

Setting up a Cognitive Service

1000 prediction calls are free and you can set it up here. It is very intuitive.

After logging in, create a new project:

image

Select Object Detection (preview):

image

Training the algorithm

To train the algorithm, you need to provide it with pictures and mark your objects in them. First, add images and start tagging by drawing rectangles around the objects to identify:

image

After tagging the photos, click “Train” and review the performance of the algorithm:

image

Go to Quick Test to check the performance on a test photo:

image

To get the prediction endpoint, go to settings. It will be under “Prediction Key”:

01 (002)

Writing a Python script

To connect to the Cognitive Services Custom Vision API, you need to install a new Python module. In your command line run:
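A minimal sketch of this step, assuming the module is the current `azure-cognitiveservices-vision-customvision` package (the SDK has changed since this post was written, so the names may differ from what I originally used). The helper `make_predictor` is a hypothetical name, and the endpoint and key are the values from the settings page.

```python
# Assumed install command (package name taken from the current Azure SDK,
# not from the original post):
#   pip install azure-cognitiveservices-vision-customvision


def make_predictor(endpoint, prediction_key):
    """Return a prediction client for a Custom Vision resource."""
    # Imported here so this file still loads before the SDK is installed.
    from azure.cognitiveservices.vision.customvision.prediction import (
        CustomVisionPredictionClient,
    )
    from msrest.authentication import ApiKeyCredentials

    # The service expects the key in the "Prediction-key" request header.
    credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
    return CustomVisionPredictionClient(endpoint, credentials)
```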

Next, we need a module to handle images and the camera. Perfect for the job is OpenCV, an open source computer vision module that is super useful. I have used it before in my flappy bird project.

As for the script: getting the camera image is fairly easy. I did not manage to feed frames directly into the service, so I worked with files instead; this might have been a bottleneck.
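Roughly, the file-based approach could look like the sketch below. It assumes OpenCV is installed as `opencv-python` and that `predictor` is a client from the current Custom Vision SDK exposing `detect_image`; the function names, `camera_index` and `frame_path` are hypothetical and chosen only for illustration.

```python
def grab_frame(camera_index, frame_path):
    """Read one frame from the webcam and save it as an image file."""
    import cv2  # imported lazily; requires the opencv-python package

    camera = cv2.VideoCapture(camera_index)
    try:
        ok, frame = camera.read()
        if not ok:
            raise RuntimeError("could not read a frame from the camera")
        cv2.imwrite(frame_path, frame)
        return frame
    finally:
        camera.release()


def predict_from_file(predictor, project_id, published_name, frame_path):
    """Send a saved frame to the service and return its response."""
    with open(frame_path, "rb") as image_file:
        return predictor.detect_image(project_id, published_name, image_file)
```

The frame takes a detour through the disk here, which matches what I did; whether that detour or the network call dominates the delay is something I have not measured.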

Showing it all:
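A sketch of the drawing step, under the assumption that each prediction carries a `probability` and a `bounding_box` with `left`, `top`, `width` and `height` given as fractions of the image size (this is how the current SDK reports them). The function names, the 0.5 threshold and the window title are my own placeholders.

```python
def to_pixel_box(left, top, width, height, frame_width, frame_height):
    """Convert a normalized box (0..1) to pixel corners (x1, y1, x2, y2)."""
    x1 = int(left * frame_width)
    y1 = int(top * frame_height)
    x2 = int((left + width) * frame_width)
    y2 = int((top + height) * frame_height)
    return x1, y1, x2, y2


def show_predictions(frame, predictions, min_probability=0.5):
    """Draw a rectangle around each confident detection and display the frame."""
    import cv2  # imported lazily; requires the opencv-python package

    frame_height, frame_width = frame.shape[:2]
    for prediction in predictions:
        if prediction.probability < min_probability:
            continue  # skip weak detections
        box = prediction.bounding_box
        x1, y1, x2, y2 = to_pixel_box(
            box.left, box.top, box.width, box.height, frame_width, frame_height
        )
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("People detector", frame)
    cv2.waitKey(1)  # gives the window a chance to redraw
```

For a Full HD frame, `to_pixel_box(0.25, 0.5, 0.5, 0.25, 1920, 1080)` gives `(480, 540, 1440, 810)`.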

 

The whole script:
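What follows is a reconstruction sketch rather than my exact script. It assumes the current `azure-cognitiveservices-vision-customvision` SDK and `opencv-python`; the names `detect_people` and `run`, the `frame.jpg` path, the 0.5 threshold and the "q to quit" key are all illustrative assumptions.

```python
def detect_people(predictor, project_id, published_name, frame_path,
                  min_probability=0.5):
    """Send a saved frame to the service and keep the confident detections."""
    with open(frame_path, "rb") as image_file:
        results = predictor.detect_image(project_id, published_name, image_file)
    return [p for p in results.predictions if p.probability >= min_probability]


def run(endpoint, prediction_key, project_id, published_name,
        camera_index=0, frame_path="frame.jpg"):
    """Capture frames, ask the service for people, draw boxes, show the result."""
    # Third-party imports are kept local so the helper above has no dependencies.
    import cv2
    from azure.cognitiveservices.vision.customvision.prediction import (
        CustomVisionPredictionClient,
    )
    from msrest.authentication import ApiKeyCredentials

    credentials = ApiKeyCredentials(in_headers={"Prediction-key": prediction_key})
    predictor = CustomVisionPredictionClient(endpoint, credentials)
    camera = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = camera.read()
            if not ok:
                break
            cv2.imwrite(frame_path, frame)  # the service is fed from a file
            height, width = frame.shape[:2]
            for person in detect_people(predictor, project_id, published_name,
                                        frame_path):
                box = person.bounding_box  # normalized 0..1 coordinates
                top_left = (int(box.left * width), int(box.top * height))
                bottom_right = (int((box.left + box.width) * width),
                                int((box.top + box.height) * height))
                cv2.rectangle(frame, top_left, bottom_right, (0, 255, 0), 2)
            cv2.imshow("People detector", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        camera.release()
        cv2.destroyAllWindows()
```

Calling `run(...)` with the endpoint, prediction key, project id and published iteration name from the portal starts the loop. Note that every pass through the loop makes one prediction call, so the free quota goes quickly.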

 

Final effect:

image

Some conclusions:

  • The price seems low but turns out to be very expensive, especially when designing real-time solutions. One camera in e.g. a store would cost around $25 a day.
  • The performance is not satisfactory: we had to wait a couple of seconds for a prediction, which rules out any solution that emphasizes response speed.
  • It is a very easy and clean way to test solutions that might later be custom-made to fit a specific project and its requirements.
  • Overall – fun.
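To make the cost point concrete, here is the arithmetic as a small sketch. The function name and both inputs are illustrative assumptions; the call rate and the price per 1000 calls should be taken from your own setup and the current price list.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_cost(seconds_between_calls, price_per_1000_calls):
    """Estimate the cost of one camera calling the service around the clock."""
    calls_per_day = SECONDS_PER_DAY / seconds_between_calls
    return calls_per_day / 1000 * price_per_1000_calls
```

For example, one call every 7 seconds at a hypothetical price of 2.00 per 1000 calls gives `daily_cost(7, 2.0)` of about 24.69 per day, which is in the same range as the figure above.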
