Diego1# Machine Vision: Object Recognition and Localization

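Google recently released an object detection API built on TensorFlow. This article uses the depth camera on Diego1# to call that API, recognizing objects while also computing the distance from each object to the robot's camera. It works as follows: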

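Object Detection subscribes to the Image messages published by Openni and recognizes the objects in each video frame.
Object Depth subscribes to the Depth Image messages published by Openni and, for each object in the list produced by Object Detection, looks up the corresponding region of the Depth Image to compute the object's depth.
Publish Image annotates the processed video frame with recognition labels and publishes it as a Compressed Image message so that other applications can subscribe to it.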

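1. Create the diego_tensorflow package
Since we use tensorflow, we first need to install it; see the official tensorflow installation guide: https://www.tensorflow.org/install/install_linux
For the dependencies of object_detection, see the official installation notes: https://github.com/tensorflow/models/blob/master/object_detection/g3doc/installation.md
Run the following command to create the diego_tensorflow package: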

catkin_create_pkg diego_tensorflow std_msgs rospy roscpp cv_bridge

Create two subdirectories under the diego_tensorflow directory:

scripts: holds the source code
launch: holds the launch files

Once created, the diego_tensorflow directory looks as shown in the figure below:

Download the object_detection package: https://github.com/tensorflow/models
After downloading, copy the object_detection package into diego_tensorflow/scripts. If you plan to train your own model, also copy the slim package into diego_tensorflow/scripts.

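All of the object recognition code lives in ObjectDetectionDemo.py. Most of the detection code follows the official tensorflow example; here it is wrapped as a ROS node and extended to compute depth data for each object.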

2. ROS node source code

#!/usr/bin/env python

import rospy
from sensor_msgs.msg import Image as ROSImage
from sensor_msgs.msg import CompressedImage as ROSImage_C
from cv_bridge import CvBridge
import cv2
import matplotlib
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import uuid
from collections import defaultdict
from io import StringIO
from PIL import Image
from math import isnan

# The object_detection package must sit next to this script (diego_tensorflow/scripts) to be importable.
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

class ObjectDetectionDemo():
    def __init__(self):
	rospy.init_node('object_detection_demo')
	
	# Set the shutdown function (stop the robot)
        rospy.on_shutdown(self.shutdown)
        
        self.depth_image =None
        
        self.depth_array = None
        
        model_path = rospy.get_param("~model_path", "")
        image_topic = rospy.get_param("~image_topic", "")
        depth_image_topic = rospy.get_param("~depth_image_topic", "")
        if_down=False
        self.vfc=0
        
        self._cv_bridge = CvBridge()
        
        # What model to download.
	#MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
	#MODEL_NAME='faster_rcnn_resnet101_coco_11_06_2017'
	MODEL_NAME ='ssd_inception_v2_coco_11_06_2017'
	#MODEL_NAME ='diego_object_detection_v1_07_2017'
	#MODEL_NAME ='faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017'
	MODEL_FILE = MODEL_NAME + '.tar.gz'
	DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

	# Path to frozen detection graph. This is the actual model that is used for the object detection.
	PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

	# List of the strings that is used to add correct label for each box.
	PATH_TO_LABELS = os.path.join(model_path+'/data', 'mscoco_label_map.pbtxt')
	#PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

	NUM_CLASSES = 90
	
	if if_down:
		opener = urllib.request.URLopener()
		opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
		tar_file = tarfile.open(MODEL_FILE)
		for file in tar_file.getmembers():
			file_name = os.path.basename(file.name)
			if 'frozen_inference_graph.pb' in file_name:
        			tar_file.extract(file, os.getcwd())


	rospy.loginfo("begin initializing the tf...")
	self.detection_graph = tf.Graph()
	with self.detection_graph.as_default():
		od_graph_def = tf.GraphDef()
		with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
			serialized_graph = fid.read()
			od_graph_def.ParseFromString(serialized_graph)
			tf.import_graph_def(od_graph_def, name='')

	label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
	categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
	self.category_index = label_map_util.create_category_index(categories)
	
	# Subscribe to the registered depth image
        rospy.Subscriber(depth_image_topic, ROSImage, self.convert_depth_image)
        
        # Wait for the depth image to become available
        #rospy.wait_for_message('depth_image', ROSImage)
	
	self._sub = rospy.Subscriber(image_topic, ROSImage, self.callback, queue_size=1)	
	self._pub = rospy.Publisher('object_detection', ROSImage_C, queue_size=1)
	
	rospy.loginfo("initialization has finished...")
	
	
    def convert_depth_image(self, ros_image):
        # Use cv_bridge() to convert the ROS image to OpenCV format
        # The depth image is a single-channel float32 image
        self.depth_image = self._cv_bridge.imgmsg_to_cv2(ros_image, "32FC1")

        # Convert the depth image to a Numpy array
        self.depth_array = np.array(self.depth_image, dtype=np.float32)
        #print(self.depth_array)
        
    def callback(self,image_msg):
	if self.vfc<12:
		self.vfc=self.vfc+1
	else:
		self.callbackfun(image_msg)
		self.vfc=0	
		    	
    def box_depth(self, boxes, im_width, im_height):
        # Compute the mean depth inside each detection box.
        # Each row is [ymin, xmin, ymax, xmax] in normalized coordinates.
        depth = []
        for row in boxes[0]:
            n_z = sum_z = 0
            min_y = int(row[0] * im_height)
            min_x = int(row[1] * im_width)
            max_y = int(row[2] * im_height)
            max_x = int(row[3] * im_width)
            # Accumulate the valid depth values over the ROI
            for x in range(min_x, max_x):
                for y in range(min_y, max_y):
                    try:
                        z = self.depth_array[y, x]
                    except (IndexError, TypeError):
                        # Outside the depth image, or no depth frame received yet
                        continue
                    # Depth values can be NaN, which should be ignored
                    if isnan(z):
                        continue
                    sum_z += z
                    n_z += 1
            # NaN marks a box with no valid depth reading
            mean_z = sum_z / n_z if n_z > 0 else float('nan')
            depth.append(mean_z)
        return depth
    def callbackfun(self, image_msg):
	with self.detection_graph.as_default():
		with tf.Session(graph=self.detection_graph) as sess:
			 cv_image = self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8")
			 #cv_image = (self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8"))[300:450, 150:380]
			 pil_img = Image.fromarray(cv_image)			 
			 (im_width, im_height) = pil_img.size			 
			 # the array based representation of the image will be used later in order to prepare the
			 # result image with boxes and labels on it.
			 image_np =np.array(pil_img.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
			 # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
			 image_np_expanded = np.expand_dims(image_np, axis=0)
			 image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
			 # Each box represents a part of the image where a particular object was detected.
			 boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
			 # Each score represent how level of confidence for each of the objects.
			 # Score is shown on the result image, together with the class label.
			 scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
			 classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
			 num_detections = self.detection_graph.get_tensor_by_name('num_detections:0')
			
			 # Actual detection.
			 (boxes, scores, classes, num_detections) = sess.run(
			 	[boxes, scores, classes, num_detections],
			 	feed_dict={image_tensor: image_np_expanded})
			 box_depths=self.box_depth(boxes,im_width,im_height)
			 print(box_depths)
			 # Visualization of the results of a detection.
			 vis_util.visualize_boxes_and_labels_on_image_array(
			 	image_np,
			 	np.squeeze(boxes),
			 	np.squeeze(classes).astype(np.int32),
			 	np.squeeze(scores),
			 	self.category_index,
			 	use_normalized_coordinates=True,
			 	line_thickness=8)
			 
			 ros_compressed_image=self._cv_bridge.cv2_to_compressed_imgmsg(image_np)
			 self._pub.publish(ros_compressed_image)
			
    
    def shutdown(self):
        rospy.loginfo("Stopping the tensorflow object detection...")
        rospy.sleep(1) 
        
if __name__ == '__main__':
    try:
        ObjectDetectionDemo()
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("RosTensorFlow_ObjectDetectionDemo has been interrupted.")

Let's walk through the main code logic.

    def __init__(self):
	rospy.init_node('object_detection_demo')
	
	# Set the shutdown function (stop the robot)
        rospy.on_shutdown(self.shutdown)
        
        self.depth_image =None
        
        self.depth_array = None
        
        model_path = rospy.get_param("~model_path", "")
        image_topic = rospy.get_param("~image_topic", "")
        depth_image_topic = rospy.get_param("~depth_image_topic", "")
        if_down=False
        self.vfc=0
        
        self._cv_bridge = CvBridge()

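The code above is standard ROS initialization: it initializes member variables and reads the parameters set in the launch file.

model_path: the path to the object_detection directory, used here to locate the label map
image_topic: the image topic to subscribe to
depth_image_topic: the depth image topic to subscribe to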

        # What model to download.
	#MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
	#MODEL_NAME='faster_rcnn_resnet101_coco_11_06_2017'
	MODEL_NAME ='ssd_inception_v2_coco_11_06_2017'
	#MODEL_NAME ='diego_object_detection_v1_07_2017'
	#MODEL_NAME ='faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017'
	MODEL_FILE = MODEL_NAME + '.tar.gz'
	DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

	# Path to frozen detection graph. This is the actual model that is used for the object detection.
	PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

	# List of the strings that is used to add correct label for each box.
	PATH_TO_LABELS = os.path.join(model_path+'/data', 'mscoco_label_map.pbtxt')
	#PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

	NUM_CLASSES = 90
	
	if if_down:
		opener = urllib.request.URLopener()
		opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
		tar_file = tarfile.open(MODEL_FILE)
		for file in tar_file.getmembers():
			file_name = os.path.basename(file.name)
			if 'frozen_inference_graph.pb' in file_name:
        			tar_file.extract(file, os.getcwd())

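The code above selects the model used by object_detection and downloads and unpacks the corresponding files. The if_down switch controls the download: turn it on for the first run, then turn it off, since the download takes a long time and only needs to happen once.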
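Instead of toggling if_down by hand, one option is to derive it from whether the frozen graph is already on disk. The following is only a sketch: the helper name needs_download and its base_dir parameter are hypothetical and not part of the original node.

```python
import os


def needs_download(model_name, base_dir="."):
    """Return True when the frozen graph for model_name is missing under base_dir."""
    frozen_graph = os.path.join(base_dir, model_name, "frozen_inference_graph.pb")
    return not os.path.isfile(frozen_graph)
```

In __init__ this could replace the hard-coded switch, for example if_down = needs_download(MODEL_NAME), assuming the node is started from the directory where the model was unpacked.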

	self.detection_graph = tf.Graph()
	with self.detection_graph.as_default():
		od_graph_def = tf.GraphDef()
		with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
			serialized_graph = fid.read()
			od_graph_def.ParseFromString(serialized_graph)
			tf.import_graph_def(od_graph_def, name='')

	label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
	categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
	self.category_index = label_map_util.create_category_index(categories)

The code above is the tensorflow initialization: it loads the frozen inference graph and builds the label index.

        # Subscribe to the registered depth image
        rospy.Subscriber(depth_image_topic, ROSImage, self.convert_depth_image)
        
        # Wait for the depth image to become available
        #rospy.wait_for_message('depth_image', ROSImage)
	
	self._sub = rospy.Subscriber(image_topic, ROSImage, self.callback, queue_size=1)	
	self._pub = rospy.Publisher('object_detection', ROSImage_C, queue_size=1)

In the code above, the node subscribes to two topics, depth_image and image, and publishes a Compressed Image topic named object_detection.

The callback for depth_image is convert_depth_image.

The callback for image is callback.

    def convert_depth_image(self, ros_image):
        # Use cv_bridge() to convert the ROS image to OpenCV format
        # The depth image is a single-channel float32 image
        self.depth_image = self._cv_bridge.imgmsg_to_cv2(ros_image, "32FC1")

        # Convert the depth image to a Numpy array
        self.depth_array = np.array(self.depth_image, dtype=np.float32)
        #print(self.depth_array)

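This is the callback for depth_image. It first converts the depth_image message to an OpenCV image, then converts that image to a numpy array and stores it in the depth_array member variable.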
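The callback assumes a float depth image. ROS depth images conventionally come in two encodings: 32FC1 in meters with NaN for invalid pixels, and 16UC1 in millimeters with 0 for invalid pixels. If the raw 16UC1 topic were subscribed instead, a conversion along these lines might be needed. This is a sketch only; depth_to_meters is a hypothetical helper, not part of the original node.

```python
import numpy as np


def depth_to_meters(depth):
    """Return a float32 depth array in meters with invalid pixels set to NaN."""
    depth = np.asarray(depth)
    if depth.dtype == np.uint16:
        # 16UC1 convention: millimeters, 0 marks an invalid reading.
        meters = depth.astype(np.float32) / 1000.0
        meters[depth == 0] = np.nan
        return meters
    # 32FC1 convention: already meters, invalid readings are NaN.
    return depth.astype(np.float32)
```

With this in place the rest of the node can treat NaN as "no reading" regardless of which depth topic is used.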

    def callback(self,image_msg):
	if self.vfc<12:
		self.vfc=self.vfc+1
	else:
		self.callbackfun(image_msg)
		self.vfc=0

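This is the callback for image. It throttles how often video frames are processed, mainly to reduce computation, and the interval can be adjusted freely. The actual frame processing happens in callbackfun.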
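The counter logic can be pulled into a small reusable object so the skip interval becomes a parameter. A minimal sketch, assuming a hypothetical FrameThrottle class that is not in the original code:

```python
class FrameThrottle(object):
    """Let one frame through after every `skip` dropped frames."""

    def __init__(self, skip=12):
        self.skip = skip
        self._count = 0

    def should_process(self):
        # Mirrors the vfc counter: drop `skip` frames, then process one.
        if self._count < self.skip:
            self._count += 1
            return False
        self._count = 0
        return True
```

With skip=12 this behaves like the vfc check above: twelve frames are dropped and the thirteenth is handed to callbackfun.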

    def box_depth(self, boxes, im_width, im_height):
        # Compute the mean depth inside each detection box.
        # Each row is [ymin, xmin, ymax, xmax] in normalized coordinates.
        depth = []
        for row in boxes[0]:
            n_z = sum_z = 0
            min_y = int(row[0] * im_height)
            min_x = int(row[1] * im_width)
            max_y = int(row[2] * im_height)
            max_x = int(row[3] * im_width)
            # Accumulate the valid depth values over the ROI
            for x in range(min_x, max_x):
                for y in range(min_y, max_y):
                    try:
                        z = self.depth_array[y, x]
                    except (IndexError, TypeError):
                        # Outside the depth image, or no depth frame received yet
                        continue
                    # Depth values can be NaN, which should be ignored
                    if isnan(z):
                        continue
                    sum_z += z
                    n_z += 1
            # NaN marks a box with no valid depth reading
            mean_z = sum_z / n_z if n_z > 0 else float('nan')
            depth.append(mean_z)
        return depth

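The code above computes the depth data. The boxes argument holds the bounding rectangles of the objects recognized by object_detection (the object_detection API reports each one as normalized [ymin, xmin, ymax, xmax]). For each rectangle we look up the matching region of the depth image and take the mean depth over that region as the object's depth. The function returns a one-dimensional list with one entry per box.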
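The per-pixel Python loop can be slow for large boxes. The same mean can be computed with a numpy slice, as in the sketch below. It assumes the depth array uses NaN for invalid pixels; the function name box_mean_depth is hypothetical and not part of the original node.

```python
import numpy as np


def box_mean_depth(depth_array, box, im_width, im_height):
    """Mean depth inside one normalized [ymin, xmin, ymax, xmax] box, NaNs ignored."""
    ymin, xmin, ymax, xmax = box
    rows, cols = depth_array.shape[:2]
    # Scale to pixels and clamp to the depth image bounds.
    y0 = max(int(ymin * im_height), 0)
    y1 = min(int(ymax * im_height), rows)
    x0 = max(int(xmin * im_width), 0)
    x1 = min(int(xmax * im_width), cols)
    roi = depth_array[y0:y1, x0:x1]
    valid = roi[~np.isnan(roi)]
    # An empty or all-NaN region has no usable depth.
    return float(valid.mean()) if valid.size else float("nan")
```

A list comparable to the one returned by box_depth could then be built with [box_mean_depth(self.depth_array, b, im_width, im_height) for b in boxes[0]], once a depth frame has been received.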

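Because the depth image and the color image are handled asynchronously, the two frames may not correspond to the same moment. The handling here is deliberately simple and ignores this problem. It could be solved by buffering depth images and matching the one with the nearest timestamp.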
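The buffering idea might look like the sketch below. It is not part of the original node: DepthBuffer, its maxlen default, and the max_gap tolerance are all hypothetical choices, and timestamps are assumed to be plain seconds (for a ROS message, something like msg.header.stamp.to_sec()).

```python
from collections import deque


class DepthBuffer(object):
    """Keep the most recent depth frames and look them up by timestamp."""

    def __init__(self, maxlen=30):
        # Old frames fall off the left end once maxlen is reached.
        self._frames = deque(maxlen=maxlen)

    def add(self, stamp, depth_array):
        # stamp is a time in seconds
        self._frames.append((stamp, depth_array))

    def nearest(self, stamp, max_gap=0.1):
        """Return the frame closest to stamp, or None if none is within max_gap seconds."""
        if not self._frames:
            return None
        best_stamp, best_frame = min(self._frames, key=lambda f: abs(f[0] - stamp))
        return best_frame if abs(best_stamp - stamp) <= max_gap else None
```

convert_depth_image would call add, and callbackfun would call nearest with the color frame's timestamp. ROS also ships message_filters.ApproximateTimeSynchronizer, which pairs messages from two topics by timestamp and may be a better fit than a hand-rolled buffer.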

    def callbackfun(self, image_msg):
	with self.detection_graph.as_default():
		with tf.Session(graph=self.detection_graph) as sess:
			 cv_image = self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8")
			 #cv_image = (self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8"))[300:450, 150:380]
			 pil_img = Image.fromarray(cv_image)			 
			 (im_width, im_height) = pil_img.size			 
			 # the array based representation of the image will be used later in order to prepare the
			 # result image with boxes and labels on it.
			 image_np =np.array(pil_img.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
			 # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
			 image_np_expanded = np.expand_dims(image_np, axis=0)
			 image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
			 # Each box represents a part of the image where a particular object was detected.
			 boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
			 # Each score represent how level of confidence for each of the objects.
			 # Score is shown on the result image, together with the class label.
			 scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
			 classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
			 num_detections = self.detection_graph.get_tensor_by_name('num_detections:0')
			
			 # Actual detection.
			 (boxes, scores, classes, num_detections) = sess.run(
			 	[boxes, scores, classes, num_detections],
			 	feed_dict={image_tensor: image_np_expanded})
			 box_depths=self.box_depth(boxes,im_width,im_height)
			 # Visualization of the results of a detection.
			 vis_util.visualize_boxes_and_labels_on_image_array(
			 	image_np,
			 	np.squeeze(boxes),
			 	np.squeeze(classes).astype(np.int32),
			 	np.squeeze(scores),
			 	self.category_index,
                                box_depths,
			 	use_normalized_coordinates=True,
			 	line_thickness=8)
			 
			 ros_compressed_image=self._cv_bridge.cv2_to_compressed_imgmsg(image_np)
			 self._pub.publish(ros_compressed_image)

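The code above is the image callback. It converts the image message to OpenCV format and then to a numpy array, calls object_detection to recognize the objects in the image, calls vis_util.visualize_boxes_and_labels_on_image_array to draw the recognized objects on the image, and finally publishes the processed image as a compressed_image message.
Now we only need to make a small change to visualization_utils.py in the Object_detection/utils directory to display the depth information.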

def visualize_boxes_and_labels_on_image_array(image,
                                              boxes,
                                              classes,
                                              scores,
                                              category_index,
                                              box_depths=None,
                                              instance_masks=None,
                                              keypoints=None,
                                              use_normalized_coordinates=False,
                                              max_boxes_to_draw=20,
                                              min_score_thresh=.5,
                                              agnostic_mode=False,
                                              line_thickness=4):

Add box_depths=None to the definition of visualize_boxes_and_labels_on_image_array; the default value is None.

  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
      box = tuple(boxes[i].tolist())
      if instance_masks is not None:
        box_to_instance_masks_map[box] = instance_masks[i]
      if keypoints is not None:
        box_to_keypoints_map[box].extend(keypoints[i])
      if scores is None:
        box_to_color_map[box] = 'black'
      else:
        if not agnostic_mode:
          if classes[i] in category_index.keys():
            class_name = category_index[classes[i]]['name']
          else:
            class_name = 'N/A'
          display_str = '{}: {}%'.format(
              class_name,
              int(100*scores[i]))
        else:
          display_str = 'score: {}%'.format(int(100 * scores[i]))
          
        # modified by diego robot
        global depth_info
        if box_depths is not None:
          display_str = display_str + "\ndepth: " + str(box_depths[i])
          depth_info = True
        else:
          depth_info = False
        #######################
        box_to_display_str_map[box].append(display_str)
        if agnostic_mode:
          box_to_color_map[box] = 'DarkOrange'
        else:
          box_to_color_map[box] = STANDARD_COLORS[
              classes[i] % len(STANDARD_COLORS)]

Inside the first for loop, add the diego robot modification shown above just before the line box_to_display_str_map[box].append(display_str).
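Because str(box_depths[i]) prints the full float, the label can get long. A small formatting helper could round the value and skip boxes without a valid reading. This is a sketch under two assumptions: the depth values are in meters (the usual convention for 32FC1 depth images), and the helper name depth_label is hypothetical.

```python
from math import isnan


def depth_label(display_str, depth):
    """Append a rounded depth line to a label, or leave the label unchanged."""
    if depth is None or isnan(depth):
        return display_str
    return "{}\ndepth: {:.2f} m".format(display_str, depth)
```

For example, depth_label("cup: 87%", 1.23456) yields a two-line label ending in "depth: 1.23 m". Note that a label left unchanged has only one line, so the text-height doubling in draw_bounding_box_on_image would need to take that into account.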

depth_info=False

Define this global variable near the top of the file; it indicates whether depth information was passed in.

  # Reverse list and print from bottom to top.
  for display_str in display_str_list[::-1]:
    text_width, text_height = font.getsize(display_str)
    # modified by diego robot
    # depth_info is only read here, so no global statement is needed
    if depth_info:
      text_height = text_height * 2
    ###################
    	
    margin = np.ceil(0.05 * text_height)
    draw.rectangle(
        [(left, text_bottom - text_height - 2 * margin), (left + text_width,
                                                          text_bottom)],
        fill=color)
    draw.text(
        (left + margin, text_bottom - text_height - margin),
        display_str,
        fill='black',
        font=font)
    text_bottom -= text_height - 2 * margin

In the draw_bounding_box_on_image function, add the diego robot modification shown above just before the line margin = np.ceil(0.05 * text_height).

3. The launch file

<launch>
   <node pkg="diego_tensorflow" name="ObjectDetectionDemo" type="ObjectDetectionDemo.py" output="screen">

       <param name="image_topic" value="/camera/rgb/image_raw" />

       <param name="depth_image_topic" value="/camera/depth_registered/image" />

       <param name="model_path" value="$(find diego_tensorflow)/scripts/object_detection" />

   </node>

</launch>

The launch file defines the parameters image_topic, depth_image_topic, and model_path; set them to match your own setup.

4. Start the nodes
Start openni:

roslaunch diego_vision openni_node.launch

Start object_detection:

roslaunch diego_tensorflow object_detection_demo.launch

5. Subscribe to object_detection from the mobile app
Just set image_topic to object_detection, and the recognition results will show up on the phone.
