Coming from traditional 2D object detection, I recently had the chance to work with 3D printing equipment and set out to run real-time object detection on 3D-printed parts. Below I share some of what I learned.
Hardware: IP camera
OS: Windows
Software: OpenCV
Data acquisition module
Frame capture requires the opencv-python package.
Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv
First, decide which camera you will use.
Set the camera's IP address so that it is on the same network segment as your PC.
URL (Uniform Resource Locator): the standard address of a resource on the Internet, used to locate and access it.
Access the camera through a browser: enter the camera's IP address in the address bar and log in to its web interface.
Then capture and save each frame from the camera. For a video stream, configure RTSP (Real Time Streaming Protocol); after changing this setting it is best to reboot the camera.
# Then run the following code
import cv2

url = 'rtsp://admin:[email protected]:554/11'
cap = cv2.VideoCapture(url)
while cap.isOpened():
    # Grab one frame
    ret, frame = cap.read()
    if not ret:
        break
    # Show the frame
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
# When done, release the capture
cap.release()
cv2.destroyAllWindows()
Dataset creation
Input data for object detection falls into three types: 2D RGB images, 2.5D RGB-D images, and 3D point clouds.
RGB images have high pixel resolution and capture fine detail, but lack 3D information; mature CNN algorithms handle them well.
RGB-D images carry 3D information and are relatively dense, but are strongly affected by the sensor. Combined with the camera intrinsics they can be converted into a 3D point cloud, so both CNNs and point-cloud DNNs can be applied to them.
Point clouds carry precise 3D information but are very sparse. Common point-cloud representations include voxelization (voxelize, used to train 3D CNNs), raw point clouds (raw, fed to point-cloud DNNs such as PointNet and PointCNN), the front view (Front View, which divides the vertical space into layers), and the bird's-eye view (Bird Eye View, BEV, processed with a conventional CNN).
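The RGB-D-to-point-cloud conversion mentioned above can be sketched with the standard pinhole back-projection X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy. The function name is illustrative, and the intrinsics fx, fy, cx, cy are assumed to be known for your camera:

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (in meters) to an N x 3 point cloud
    using the pinhole camera model and the intrinsics fx, fy, cx, cy."""
    h, w = depth.shape
    # Pixel coordinate grids: u runs over columns, v over rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels with no depth reading
    return pts[pts[:, 2] > 0]
```

The resulting N x 3 array can then be voxelized or fed directly to a point-cloud network such as PointNet.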
Model training
Detection demo
from distutils.version import StrictVersion

import cv2
import numpy as np
import tensorflow as tf

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
    raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')

# Open the camera
cap = cv2.VideoCapture(0)

# Paths to the frozen model and the label map file
PATH_TO_FROZEN_GRAPH = ''
PATH_TO_LABELS = ''

# Load the model
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            ret, image_np = cap.read()
            if not ret:
                break
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for one detected object.
            # Scores are shown on the result image together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            cv2.imshow('object detection', image_np)
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break
cap.release()
cv2.destroyAllWindows()
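The demo only draws the detections. If you want to act on them (for example, to flag a defective print), you can threshold the raw tensors by score and convert the normalized [ymin, xmin, ymax, xmax] boxes that the model returns back to pixel coordinates. The helper name and the 0.5 threshold below are my own choices, not part of the original demo:

```python
import numpy as np


def filter_boxes(boxes, scores, classes, img_h, img_w, min_score=0.5):
    """Keep detections whose score is at least `min_score` and convert
    their normalized [ymin, xmin, ymax, xmax] boxes to pixel coordinates.

    `boxes` is an N x 4 array, `scores` and `classes` are length-N arrays,
    as produced by np.squeeze() on the session outputs above.
    """
    keep = scores >= min_score
    # Scale [ymin, xmin, ymax, xmax] by [height, width, height, width]
    px = boxes[keep] * np.array([img_h, img_w, img_h, img_w])
    return px.astype(np.int32), classes[keep], scores[keep]
```

Inside the loop you would call it as `filter_boxes(np.squeeze(boxes), np.squeeze(scores), np.squeeze(classes), *image_np.shape[:2])`.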