Developing an AI camera image recognition system in an hour

Cognitive Services from Microsoft offer pretrained networks that let citizen developers quickly build things like image recognition algorithms. This short post shows how it is done and how I created a real-time people recognition system. The goal of the solution is to recognize people in our office and possibly inform our beloved founders about them. What I used: a Python script, a Full HD webcam and the Cognitive Services Custom Vision API.


The steps are pretty straightforward:

  • Get some training data.
  • Set up the cognitive service.
  • Train the model and set up an endpoint. We need a prediction endpoint that can be queried for predictions.
  • Write a Python script that 1. gets the image from the camera, 2. asks for a prediction and receives the answers, 3. draws rectangles around people, 4. shows the result.

Getting the data

As described by Microsoft, training data should contain around 15 images per class that you want to recognize. A class can be e.g. a cat or a dog. In this case, we are going to look for the silhouette of a person. Because I train only one class, I took around 25 pictures of our Clouds On Mars office as a training sample. I left around 5 as test data.
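The split into training and test photos can be done by hand, but a small helper keeps it reproducible. This is a minimal sketch, not the method used for this post: the function name `split_images`, the fixed seed and the file names are all hypothetical.

```python
import random


def split_images(paths, test_count=5, seed=42):
    """Shuffle the image paths and hold back test_count of them for testing."""
    shuffled = sorted(paths)               # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[test_count:], shuffled[:test_count]


# Hypothetical file names standing in for the office photos.
photos = [f"office_{i:02d}.jpg" for i in range(30)]
train_photos, test_photos = split_images(photos)
```

With 30 photos and the default `test_count`, this leaves 25 for training and 5 for the Quick Test step.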


Setting up a Cognitive Service

1000 prediction calls are free and you can set it up here. It is very intuitive.

After logging in, create a new project:


Select Object Detection (preview):


Training the algorithm

To train the algorithm you need to provide it with pictures and mark your objects in them. First, add images and start tagging by drawing rectangles around the objects to identify:


After tagging the photos, click “Train” and review the performance of the algorithm:


Go to Quick Test to check the performance on a test photo:


To get the prediction endpoint, go to settings. It will be under “Prediction Key”:
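With the prediction URL and key copied from the settings page, a prediction request could look roughly like the sketch below. It uses only the Python standard library rather than the module this post installs. The URL and key are placeholders, the function names are hypothetical, and the response layout (a `predictions` list with `probability`, `tagName` and `boundingBox`) is an assumption to check against the actual response.

```python
import json
import urllib.request

# Placeholders (assumptions): copy the real values from the project's settings page.
PREDICTION_URL = "https://<your-prediction-endpoint>/image"
PREDICTION_KEY = "<your-prediction-key>"


def build_request(image_bytes, url=PREDICTION_URL, key=PREDICTION_KEY):
    """Prepare a POST request that carries the raw image bytes."""
    headers = {
        "Prediction-Key": key,
        "Content-Type": "application/octet-stream",
    }
    # Passing data makes urllib send the request as a POST.
    return urllib.request.Request(url, data=image_bytes, headers=headers)


def parse_predictions(body, min_probability=0.5):
    """Return the predictions whose probability is at least min_probability.

    Assumes the JSON body holds a "predictions" list of objects
    that each carry a "probability" field.
    """
    payload = json.loads(body)
    return [
        p for p in payload.get("predictions", [])
        if p.get("probability", 0.0) >= min_probability
    ]


def predict(image_path):
    """Send one image file to the endpoint and return the filtered predictions."""
    with open(image_path, "rb") as image_file:
        request = build_request(image_file.read())
    with urllib.request.urlopen(request) as response:
        return parse_predictions(response.read())
```

Only `predict` touches the network, so the other two functions can be tried out without a key.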


Writing a Python script

To connect with the Cognitive Services Custom Vision API you need to install a new Python module. In your command line run:

Next, we need a module to handle images and the camera. OpenCV, an open source computer vision module, is perfect for the job and super useful. I have used it before in my flappy bird project.

As for the script: getting the camera image is fairly easy. I did not manage to feed frames directly into the service, so I worked on files instead, which might have been a bottleneck.
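Grabbing a frame and saving it to a file might look like the sketch below. It is a reconstruction under assumptions, not the original listing: the function names, the file name `frame.jpg` and the camera index `0` are hypothetical. The OpenCV calls used are `cv2.VideoCapture`, `set`, `read`, `cv2.imwrite` and `release`.

```python
def read_frame(capture):
    """Read one frame from any object with an OpenCV-style read() method."""
    ok, frame = capture.read()
    if not ok:
        raise RuntimeError("could not read a frame from the camera")
    return frame


def capture_to_file(path="frame.jpg", camera_index=0):
    """Grab a single frame from the webcam and save it as an image file."""
    import cv2  # imported here so read_frame stays usable without OpenCV installed

    capture = cv2.VideoCapture(camera_index)
    try:
        # Ask for Full HD; the driver may fall back to another resolution.
        capture.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
        capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
        frame = read_frame(capture)
        if not cv2.imwrite(path, frame):
            raise RuntimeError(f"could not write {path}")
    finally:
        capture.release()  # always free the camera, even after an error
    return path
```

The saved file can then be opened in binary mode and sent to the prediction endpoint. Encoding the frame in memory with `cv2.imencode` might avoid the round trip through the disk, but that variant was not tried here.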

Showing it all:
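The drawing and display step can be sketched as follows. The original listing is not reproduced here, so this is an illustration under assumptions: each prediction is taken to be a dict with a `probability` and a `boundingBox` whose `left`, `top`, `width` and `height` are fractions of the frame size. The function names, the 0.5 threshold and the window title are hypothetical.

```python
def to_pixel_box(bounding_box, frame_width, frame_height):
    """Convert a normalized box (fractions of the frame) to pixel corners."""
    left = round(bounding_box["left"] * frame_width)
    top = round(bounding_box["top"] * frame_height)
    right = round((bounding_box["left"] + bounding_box["width"]) * frame_width)
    bottom = round((bounding_box["top"] + bounding_box["height"]) * frame_height)
    return left, top, right, bottom


def draw_people(frame, predictions, min_probability=0.5):
    """Draw a green rectangle around every sufficiently confident prediction."""
    import cv2  # imported here so to_pixel_box works without OpenCV installed

    height, width = frame.shape[:2]  # OpenCV frames are (rows, columns, channels)
    for prediction in predictions:
        if prediction["probability"] < min_probability:
            continue
        x1, y1, x2, y2 = to_pixel_box(prediction["boundingBox"], width, height)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    return frame


def show(frame, window_name="person detector"):
    """Display the frame in a window and let OpenCV process GUI events."""
    import cv2

    cv2.imshow(window_name, frame)
    cv2.waitKey(1)  # a short wait is needed for the window to refresh
```

On a 1920x1080 frame, a box with `left=0.25`, `top=0.5`, `width=0.5`, `height=0.25` maps to the pixel corners (480, 540) and (1440, 810).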


The whole script:


Final effect:


Some conclusions:

  • The price seems low but turns out to be very expensive, especially when designing real-time solutions. One camera in e.g. a store would cost around $25 a day.
  • The performance is not satisfactory: we had to wait a couple of seconds for a prediction, and this rules out any solution that depends on fast answers.
  • Very easy and clean to test some solutions that later might be custom made to fit a certain project and requirements.
  • Overall – fun.
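How the daily cost adds up depends on how often the camera asks for a prediction and on the price per call. The sketch below makes that arithmetic explicit. The numbers in the example are illustrative assumptions, not the pricing or call rate behind the $25 figure, and the function name is hypothetical.

```python
def daily_cost(seconds_between_calls, hours_per_day, price_per_1000_calls):
    """Estimate the daily cost of one camera that polls the prediction endpoint."""
    calls_per_day = hours_per_day * 3600 / seconds_between_calls
    return calls_per_day / 1000 * price_per_1000_calls


# Illustrative assumptions: one call every 2 seconds, 8 opening hours,
# and a hypothetical price of $2 per 1000 calls.
example_cost = daily_cost(seconds_between_calls=2, hours_per_day=8, price_per_1000_calls=2.0)
```

Under these assumed numbers the camera makes 14,400 calls a day, which comes to $28.80. Plug in the current price list and your own call rate to get a real estimate.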
