{"id":544,"date":"2018-08-27T08:39:59","date_gmt":"2018-08-27T06:39:59","guid":{"rendered":"http:\/\/cwiok.pl\/?p=544"},"modified":"2018-08-27T08:40:12","modified_gmt":"2018-08-27T06:40:12","slug":"developing-ai-camera-image-recognition-system-in-an-hour","status":"publish","type":"post","link":"https:\/\/cwiok.pl\/index.php\/en\/2018\/08\/27\/developing-ai-camera-image-recognition-system-in-an-hour\/","title":{"rendered":"Developing AI camera image recognition system in an hour"},"content":{"rendered":"<p>Cognitive Services from Microsoft offer pretrained networks that let citizen developers quickly build, for example, image recognition systems. This short post shows how it is done and how I created a real-time people recognition system. The goal of the solution is to recognize people in our office and possibly inform our beloved founders of it. What I used: a Python script, a Full HD webcam and the Cognitive Services Custom Vision API.<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/artyku%C5%82_10_person-detector.png\"><img loading=\"lazy\" decoding=\"async\" style=\"border: 0px currentcolor; display: inline; background-image: none;\" title=\"artyku\u0142_10_person detector\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/artyku%C5%82_10_person-detector_thumb.png\" alt=\"artyku\u0142_10_person detector\" width=\"1200\" height=\"628\" border=\"0\" \/><\/a><\/p>\n<p>The steps are pretty straightforward:<\/p>\n<ul>\n<li>Get some training data.<\/li>\n<li>Set up the cognitive service.<\/li>\n<li>Train the model and set up a prediction endpoint that can be queried for predictions.<\/li>\n<li>Write a Python script that 1. grabs an image from the camera, 2. asks for a prediction and receives the answer, 3. draws rectangles around the detected people, 4. displays the result.<\/li>\n<\/ul>\n<h1>Getting the data<\/h1>\n<p>As described by Microsoft, 
training data should contain around 15 images per class that you want to recognize \u2013 e.g. a cat or a dog. In this case, we are going to look for the silhouette of a person. Because I am training only one class, I took around 25 pictures of our Clouds On Mars office as a training sample and left around 5 as test data.<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/Composite.png\"><img loading=\"lazy\" decoding=\"async\" style=\"display: inline; background-image: none;\" title=\"Composite\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/Composite_thumb.png\" alt=\"Composite\" width=\"1924\" height=\"1084\" border=\"0\" \/><\/a><\/p>\n<h1>Setting up a Cognitive Service<\/h1>\n<p>1000 prediction calls are free and you can set the service up <a href=\"http:\/\/www.customvision.ai\">here<\/a> \u2013 it is very intuitive.<\/p>\n<p>After logging in, create a new project:<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image.png\"><img loading=\"lazy\" decoding=\"async\" style=\"border: 0px currentcolor; display: inline; background-image: none;\" title=\"image\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image_thumb.png\" alt=\"image\" width=\"389\" height=\"466\" border=\"0\" \/><\/a><\/p>\n<p>Select Object Detection (preview):<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image-1.png\"><img loading=\"lazy\" decoding=\"async\" title=\"image\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image_thumb-1.png\" alt=\"image\" width=\"724\" height=\"464\" border=\"0\" \/><\/a><\/p>\n<h3>Training the algorithm<\/h3>\n<p>To train the algorithm you need to provide it with pictures of your objects. 
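<\/p>\n<p>A note before tagging: the rectangles are stored as normalized coordinates \u2013 left, top, width and height are fractions of the image size between 0 and 1 \u2013 so to draw a box later you have to scale it back to pixels. A minimal sketch of that conversion (the helper name and the Full HD frame size are my own assumptions, not anything the service prescribes):<\/p>\n<pre class=\"toolbar:2 wrap:true lang:python decode:true\">def to_pixels(left, top, width, height, frame_w=1920, frame_h=1080):\r\n    # Top-left corner in pixels\r\n    x1 = int(left * frame_w)\r\n    y1 = int(top * frame_h)\r\n    # Bottom-right corner: left + width and top + height are still fractions\r\n    x2 = int((left + width) * frame_w)\r\n    y2 = int((top + height) * frame_h)\r\n    return (x1, y1), (x2, y2)\r\n\r\nprint(to_pixels(0.5, 0.5, 0.25, 0.25))  # ((960, 540), (1440, 810))<\/pre>\n<p>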
First, add images and start tagging by drawing rectangles around the objects to identify:<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image-2.png\"><img loading=\"lazy\" decoding=\"async\" title=\"image\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image_thumb-2.png\" alt=\"image\" width=\"1494\" height=\"881\" border=\"0\" \/><\/a><\/p>\n<p>After tagging the photos, click \u201cTrain\u201d and review the performance of the algorithm:<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image-3.png\"><img loading=\"lazy\" decoding=\"async\" style=\"border: 0px currentcolor; display: inline; background-image: none;\" title=\"image\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image_thumb-4.png\" alt=\"image\" width=\"1489\" height=\"883\" border=\"0\" \/><\/a><\/p>\n<p>Go to Quick Test to check the performance on a test photo:<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image-4.png\"><img loading=\"lazy\" decoding=\"async\" style=\"border: 0px currentcolor; display: inline; background-image: none;\" title=\"image\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image_thumb-5.png\" alt=\"image\" width=\"1469\" height=\"833\" border=\"0\" \/><\/a><\/p>\n<p>To get the prediction endpoint, go to Settings. It is listed under \u201cPrediction Key\u201d:<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/01-002.jpg\"><img loading=\"lazy\" decoding=\"async\" style=\"border: 0px currentcolor; display: inline; background-image: none;\" title=\"01 (002)\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/01-002_thumb.jpg\" alt=\"01 (002)\" width=\"947\" height=\"225\" border=\"0\" \/><\/a><\/p>\n<h1>Writing a Python script<\/h1>\n<p>To connect to the Cognitive Services Custom Vision API you need to install a new Python module. 
In your command line run:<\/p>\n<pre class=\"toolbar:2 lang:default decode:true\">pip install azure-cognitiveservices-vision-customvision<\/pre>\n<p>Next, we need a module to handle images and the camera. OpenCV \u2013 the open-source computer vision library \u2013 is perfect for the job and super useful; I have used it before in my flappy bird project.<\/p>\n<pre class=\"toolbar:2 lang:default decode:true\">pip install opencv-python<\/pre>\n<p>Now for the script. Getting the camera image is fairly easy. I have not managed to feed frames directly into the service, so I worked with files on disk \u2013 this might have been a performance bottleneck.<\/p>\n<pre class=\"toolbar:2 wrap:true lang:python decode:true\"># Open the camera and set a Full HD resolution (property 3 = width, 4 = height)\r\ncam = cv2.VideoCapture(1)\r\ncam.set(3, 1920)\r\ncam.set(4, 1080)\r\n\r\n# Grab a frame and round-trip it through a file on disk\r\nret_val, img = cam.read()\r\ncv2.imwrite('cam.png', img)\r\ndraw = cv2.imread('cam.png')\r\n\r\n# Getting the prediction:\r\nwith open('cam.png', mode='rb') as test_data:\r\n    results = predictor.predict_image('&lt;PROJECT ID&gt;', test_data)\r\n\r\n# Reading the answer and drawing rectangles; the bounding box values\r\n# are fractions of the frame size, hence the scaling to 1920x1080\r\nfor prediction in results.predictions:\r\n    if prediction.probability &gt; 0.5:\r\n        print('\\t' + prediction.tag_name + ': {0:.2f}%'.format(prediction.probability * 100))\r\n        cv2.rectangle(draw, (int(prediction.bounding_box.left * 1920), int(prediction.bounding_box.top * 1080)), (int((prediction.bounding_box.left + prediction.bounding_box.width) * 1920), int((prediction.bounding_box.top + prediction.bounding_box.height) * 1080)), (0, 255, 255), 2)<\/pre>\n<p>Showing it all (cv2.rectangle draws in place, so we can show the frame directly):<\/p>\n<pre class=\"toolbar:2 lang:python decode:true\">cv2.imshow('AI', draw)<\/pre>\n<p>&nbsp;<\/p>\n<p>The whole script:<\/p>\n<pre class=\"toolbar:2 wrap:true lang:python decode:true \">from azure.cognitiveservices.vision.customvision.prediction import prediction_endpoint\r\nimport cv2\r\n\r\n# Authenticate against the Custom Vision prediction endpoint\r\npredictor = prediction_endpoint.PredictionEndpoint('&lt;YOUR KEY HERE&gt;')\r\n\r\n# Open the camera at Full HD (property 3 = width, 4 = height)\r\ncam = cv2.VideoCapture(1)\r\ncam.set(3, 1920)\r\ncam.set(4, 1080)\r\n\r\nwhile True:\r\n    # Grab a frame and round-trip it through a file on disk\r\n    ret_val, img = cam.read()\r\n    cv2.imwrite('cam.png', img)\r\n    draw = cv2.imread('cam.png')\r\n\r\n    # Send the file to the prediction endpoint\r\n    with open('cam.png', mode='rb') as test_data:\r\n        results = predictor.predict_image('&lt;PROJECT ID&gt;', test_data)\r\n\r\n    # Draw a rectangle for every confident detection; the bounding box\r\n    # values are fractions of the frame size, hence the scaling\r\n    for prediction in results.predictions:\r\n        if prediction.probability &gt; 0.5:\r\n            print('\\t' + prediction.tag_name + ': {0:.2f}%'.format(prediction.probability * 100))\r\n            cv2.rectangle(draw, (int(prediction.bounding_box.left * 1920), int(prediction.bounding_box.top * 1080)), (int((prediction.bounding_box.left + prediction.bounding_box.width) * 1920), int((prediction.bounding_box.top + prediction.bounding_box.height) * 1080)), (0, 255, 255), 2)\r\n\r\n    cv2.imshow('AI', draw)\r\n\r\n    if cv2.waitKey(1) == 27:\r\n        break  # Esc to quit\r\n\r\ncam.release()\r\ncv2.destroyAllWindows()<\/pre>\n<p>&nbsp;<\/p>\n<h1>Final effect:<\/h1>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image-5.png\"><img loading=\"lazy\" decoding=\"async\" style=\"border: 0px currentcolor; display: inline; background-image: none;\" title=\"image\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/image_thumb-6.png\" alt=\"image\" width=\"1920\" height=\"1080\" border=\"0\" \/><\/a><\/p>\n<h1>Some conclusions:<\/h1>\n<ul>\n<li>The price seems low at first but turns out to be very expensive, especially when designing real-time solutions. One camera in e.g. 
a store would cost around $25 a day.<\/li>\n<li>The performance is not satisfactory \u2013 we had to wait a couple of seconds for each prediction, which rules out any solution that emphasizes response speed.<\/li>\n<li>It is a very easy and clean way to test solutions that can later be custom-built to fit a specific project and its requirements.<\/li>\n<li>Overall &#8211; fun.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Cognitive Services from Microsoft offer pretrained networks that let citizen developers quickly build, for example, image recognition systems. This short post shows how it is done and how I created a real-time people recognition system. The goal of the solution is to recognize people in our office and possibly inform our beloved founders of it. What I used: a Python script, a Full HD webcam and the Cognitive Services Custom Vision API.<\/p>\n<p><a href=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/artyku%C5%82_10_person-detector.png\"><img loading=\"lazy\" decoding=\"async\" style=\"border: 0px currentcolor; display: inline; background-image: none;\" title=\"artyku\u0142_10_person detector\" src=\"http:\/\/cwiok.pl\/wp-content\/uploads\/2018\/08\/artyku%C5%82_10_person-detector_thumb.png\" alt=\"artyku\u0142_10_person detector\" width=\"1200\" height=\"628\" border=\"0\" \/><\/a><\/p>\n<div class=\"tech_read_more\"><a href=\"https:\/\/cwiok.pl\/index.php\/en\/2018\/08\/27\/developing-ai-camera-image-recognition-system-in-an-hour\/\">Read 
More<\/a><\/div>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55,28],"tags":[],"class_list":["post-544","post","type-post","status-publish","format-standard","hentry","category-ai-en","category-azure"],"_links":{"self":[{"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/posts\/544","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/comments?post=544"}],"version-history":[{"count":0,"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/posts\/544\/revisions"}],"wp:attachment":[{"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/media?parent=544"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/categories?post=544"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cwiok.pl\/index.php\/wp-json\/wp\/v2\/tags?post=544"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}