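Today I tested sending a fairly long ROS message from the host PC, and the hardware did not respond.
If you walk by the river often enough, your shoes are bound to get wet.
Once again, a condition in the hardware code had been mistyped like this: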
if(a=0){
//...
}
= and == are hard to tell apart at a glance. Infuriating: one more chunk of my limited lifespan wasted!!
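For comparison, here is a minimal sketch of the slip and its fix. The function names are made up for illustration, and the variable a simply stands in for whatever the real condition tested.

```cpp
// Hypothetical helpers for illustration; not taken from the firmware.

// Buggy form: `a = 0` assigns 0 to a, and the value of that assignment (0)
// is what gets tested, so the branch body never runs.
inline bool branch_taken_buggy(int a) {
    bool taken = false;
    if ((a = 0)) {  // the extra parentheses only silence the compiler warning
        taken = true;
    }
    return taken;
}

// Intended form: `a == 0` compares and leaves a untouched.
inline bool branch_taken_fixed(int a) {
    bool taken = false;
    if (a == 0) {
        taken = true;
    }
    return taken;
}
```

Building with -Wall on GCC or Clang reports an assignment used as a condition, which is a cheap way to catch this before it costs an afternoon.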
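I thought I had found the real culprit, but to my surprise the problem was still there.
Keep testing, then; the root cause has to turn up eventually:
1.
Shortening the ROS message works,
lengthening it gets no response.
2.
Could it be interference, with errors creeping in during serial transmission?
I switched to a short shielded USB cable, and the problem was still there.
3.
I varied the data length to look for a threshold,
and found that the threshold is not fixed: it differs from one data set to another.
4.
With random data,
it sometimes works and sometimes fails.
...
Out of options, I had to dig into the library's source code.
My suspicion was that the checksum verification was failing.
The checksum algorithm is very simple: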
https://blog.csdn.net/qq_38288618/article/details/102931684
Message Length Checksum = 255 - ((Message Length High Byte + Message Length Low Byte) % 256)
Message Data Checksum = 255 - ((Topic ID Low Byte + Topic ID High Byte + Data byte values) % 256)
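In other words, each checksum sums the bytes it covers (the two length bytes, or the two topic ID bytes plus all data bytes) and takes the result modulo 256. So where does it go wrong?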
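The two formulas above can be sketched as a pair of small functions. This is only an illustration: the function names are my own and are not part of rosserial, and the same routine serves both checksums because they differ only in which bytes they cover.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: checksum byte = 255 - (sum of covered bytes % 256).
// The accumulator is unsigned, so the remainder is never negative.
inline std::uint8_t rosserial_checksum(const std::uint8_t* bytes, std::size_t count) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += bytes[i];
    }
    return static_cast<std::uint8_t>(255u - (sum % 256u));
}

// Hypothetical receiver-side check: the covered bytes plus the checksum byte
// must add up to 255 modulo 256.
inline bool rosserial_checksum_ok(const std::uint8_t* bytes, std::size_t count,
                                  std::uint8_t checksum) {
    std::uint32_t sum = checksum;
    for (std::size_t i = 0; i < count; ++i) {
        sum += bytes[i];
    }
    return (sum % 256u) == 255u;
}
```

For example, a message length of 8 is sent as the bytes 0x08 0x00, which sum to 8, so the length checksum byte is 255 - 8 = 247.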
node_handle.h
The source code that handles incoming packet data:
virtual int spinOnce()
{
  /* restart if timed out */
  uint32_t c_time = hardware_.time();
  if ((c_time - last_sync_receive_time) > (SYNC_SECONDS * 2200))
  {
    configured_ = false;
  }
  /* reset if message has timed out */
  if (mode_ != MODE_FIRST_FF)
  {
    if (c_time > last_msg_timeout_time)
    {
      mode_ = MODE_FIRST_FF;
    }
  }
  /* while available buffer, read data */
  while (true)
  {
    // If a timeout has been specified, check how long spinOnce has been running.
    if (spin_timeout_ > 0)
    {
      // If the maximum processing timeout has been exceeded, exit with error.
      // The next spinOnce can continue where it left off, or optionally
      // based on the application in use, the hardware buffer could be flushed
      // and start fresh.
      if ((hardware_.time() - c_time) > spin_timeout_)
      {
        // Exit the spin, processing timeout exceeded.
        return SPIN_TIMEOUT;
      }
    }
    int data = hardware_.read();
    if (data < 0)
      break;
    checksum_ += data;
    if (mode_ == MODE_MESSAGE) /* message data being recieved */
    {
      message_in[index_++] = data;
      bytes_--;
      if (bytes_ == 0) /* is message complete? if so, checksum */
        mode_ = MODE_MSG_CHECKSUM;
    }
    else if (mode_ == MODE_FIRST_FF)
    {
      if (data == 0xff)
      {
        mode_++;
        last_msg_timeout_time = c_time + SERIAL_MSG_TIMEOUT;
      }
      else if (hardware_.time() - c_time > (SYNC_SECONDS * 1000))
      {
        /* We have been stuck in spinOnce too long, return error */
        configured_ = false;
        return SPIN_TIMEOUT;
      }
    }
    else if (mode_ == MODE_PROTOCOL_VER)
    {
      if (data == PROTOCOL_VER)
      {
        mode_++;
      }
      else
      {
        mode_ = MODE_FIRST_FF;
        if (configured_ == false)
          requestSyncTime(); /* send a msg back showing our protocol version */
      }
    }
    else if (mode_ == MODE_SIZE_L) /* bottom half of message size */
    {
      bytes_ = data;
      index_ = 0;
      mode_++;
      checksum_ = data; /* first byte for calculating size checksum */
    }
    else if (mode_ == MODE_SIZE_H) /* top half of message size */
    {
      bytes_ += data << 8;
      mode_++;
      if(bytes_>=sizeof(message_in)){mode_ = MODE_FIRST_FF;}/*zzzzzzzzzzzzz-----------*/
    }
    else if (mode_ == MODE_SIZE_CHECKSUM)
    {
      if ((checksum_ % 256) == 255)
        mode_++;
      else
        mode_ = MODE_FIRST_FF; /* Abandon the frame if the msg len is wrong */
    }
    else if (mode_ == MODE_TOPIC_L) /* bottom half of topic id */
    {
      topic_ = data;
      mode_++;
      checksum_ = data; /* first byte included in checksum */
    }
    else if (mode_ == MODE_TOPIC_H) /* top half of topic id */
    {
      topic_ += data << 8;
      mode_ = MODE_MESSAGE;
      if (bytes_ == 0)
        mode_ = MODE_MSG_CHECKSUM;
    }
    else if (mode_ == MODE_MSG_CHECKSUM) /* do checksum */
    {
      mode_ = MODE_FIRST_FF;
      if ((checksum_ % 256) == 255)
      {
        if (topic_ == TopicInfo::ID_PUBLISHER)
        {
          requestSyncTime();
          negotiateTopics();
          last_sync_time = c_time;
          last_sync_receive_time = c_time;
          return SPIN_ERR;
        }
        else if (topic_ == TopicInfo::ID_TIME)
        {
          syncTime(message_in);
        }
        else if (topic_ == TopicInfo::ID_PARAMETER_REQUEST)
        {
          req_param_resp.deserialize(message_in);
          param_recieved = true;
        }
        else if (topic_ == TopicInfo::ID_TX_STOP)
        {
          configured_ = false;
        }
        else
        {
          if (subscribers[topic_ - 100])
            subscribers[topic_ - 100]->callback(message_in);
        }
      }
      else
      {
        //zzz test-------------------------------------------------------------- test hook: deliver the message without checksum verification
        //delay(1000);
        //if (subscribers[topic_ - 100])
        //  subscribers[topic_ - 100]->callback(message_in);
      }
    }
  }
  /* occasionally sync time */
  if (configured_ && ((c_time - last_sync_time) > (SYNC_SECONDS * 500)))
  {
    requestSyncTime();
    last_sync_time = c_time;
  }
  return SPIN_OK;
}
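I read through the code and saw nothing wrong.
Time for surgery: I changed the code to skip the checksum verification, and comparing the data echoed back showed no errors either.
So where on earth is the problem?
Looking more closely, I noticed that the variable holding the running sum is declared as int. So even the experts slip up.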
line 197
int checksum_;
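This is where the problem lies.
int is a signed type: its range covers both positive and negative values.
Used as a checksum accumulator, it keeps growing and, on a long enough message, is bound to overflow and turn negative. Assuming int is 16 bits on the target, that happens as soon as the sum passes 32767, which would also explain why the failing length depends on the data.
Taking the remainder of a negative number does not behave the same as for a positive one: in C++ the result of % is negative or zero when the left operand is negative, so (checksum_ % 256) == 255 can never be true.
See the Baidu Baike explanation of the modulo operation: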
https://baike.baidu.com/item/%E5%8F%96%E6%A8%A1%E8%BF%90%E7%AE%97/10739384?share_fr=pc_sina
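To make the difference concrete, here is a small sketch that imitates what the receiver computes when int is 16 bits wide. That width is my assumption about the target board, and the function names are invented for the illustration.

```cpp
#include <cstdint>

// Hypothetical model of a 16-bit int accumulator. The true sum is kept in
// 32 bits and then narrowed to int16_t to imitate the wraparound without
// relying on signed overflow. (The narrowing is implementation-defined
// before C++20, but wraps on mainstream two's complement compilers.)
inline int remainder_with_16bit_int(std::uint32_t true_sum) {
    std::int16_t wrapped =
        static_cast<std::int16_t>(static_cast<std::uint16_t>(true_sum));
    return wrapped % 256;  // takes the sign of `wrapped`
}

// The same remainder computed on an unsigned value: always in 0..255.
inline int remainder_with_unsigned(std::uint32_t true_sum) {
    return static_cast<int>(true_sum % 256u);
}
```

A valid frame whose bytes sum to 32767 passes either way, since both remainders are 255. A valid frame whose bytes sum to 33023 still gives 255 with the unsigned version, but the 16-bit signed model wraps to -32513 and yields -1, so the comparison against 255 fails and the frame is dropped.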
I reported this bug in the rosserial library; see GitHub for details:
communication problem #474
https://github.com/ros-drivers/rosserial/pull/474
checksum_ type error
line 197 int checksum_; //There must be some Errors caused by modular operation of negative number.zzz.
…
} else if (mode_ == MODE_MSG_CHECKSUM) /* do checksum */ {
mode_ = MODE_FIRST_FF;
if ((checksum_ % 256) == 255) /*--------------------------this place------------------------------*/
{
…
checksum_ keeps accumulating and becomes negative. Communication failure when this condition is not met!
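One way such an accumulator could be made safe is to keep it in an unsigned 8-bit value, so it wraps modulo 256 by construction. The class below is a hypothetical stand-in for the checksum_ member, written only to illustrate the idea; it is not the patch from the pull request.

```cpp
#include <cstdint>

// Hypothetical replacement for `int checksum_`: an 8-bit unsigned running
// sum, which cannot go negative and is always already reduced modulo 256.
class ChecksumAccumulator {
public:
    // Mirrors `checksum_ = data;` at the start of a checksummed section.
    void restart(std::uint8_t first_byte) { sum_ = first_byte; }

    // Mirrors `checksum_ += data;` for every later byte, checksum included.
    void add(std::uint8_t byte) { sum_ = static_cast<std::uint8_t>(sum_ + byte); }

    // Mirrors `(checksum_ % 256) == 255`.
    bool valid() const { return sum_ == 255u; }

private:
    std::uint8_t sum_ = 0;
};
```

With this, a long message (say topic ID bytes 0x65 0x00 followed by 1000 data bytes of value 200, total 200101) still validates once its checksum byte 90 arrives, however large the true sum has become.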
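Inconspicuous details in code are easy to overlook, and even the experts are not immune.
Another day of my life gone, but in the end I tracked it down to a bug in the official library.
To err is human.
Fine, I have to forgive both myself and my predecessors.
I hope the one day I spent saves a day each for 1000 fellow travelers!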