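1 Requirements
Crawl Baidu/Tencent street view maps for the whole of Hangzhou, including the time machine feature (both current and historical images), for image analysis.
2 Analysis
In Baidu Maps street view mode, clicking forward shows that street view images are loaded asynchronously. Open street view mode, press F12 to open the developer tools, clear all recorded responses, then click forward: a large number of image requests appear.
2.1 Brief analysis of the street view requests
This article uses the street view near Haichuang Park on Wenyi West Road, Yuhang District, Hangzhou (heading west to east) as an example and looks at what each of these requests does.
First request: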
https://mapsv0.bdimg.com/?qt=qsdata&x=13361258.73664768&y=3518572.1440338427&time=201709&mode=day&px=1336124156&py=351856780&pz=14.89&roadid=eeaa41-bd79-2cf9-3491-2ca35d&type=street&action=0&pc=1&auth=8Nv8G8DBLbO2F1vLvHNAgXOTz4UFPbxHuxHLxBLVHTRt1qo6DF%3D%3DCcvY1SGpuztAFwWv1GgvPUDZYOYIZuVt1cv3uHxtOmm0mEb1PWv3GuxNVt%3DErpTgZp1GHJMP6V8%40aDcEWe1GD8zv7u%40ZPuVteuEthjzgjyBKOBEEUWWOxtx77INHu%3D%3D8x35&udt=20190619&fn=jsonp.p30899897
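This request can also be shortened to the following (for brevity, none of the requests below carry the auth parameter, which does not affect the result):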
https://mapsv0.bdimg.com/?qt=qsdata&x=13361258.73664768&y=3518572.1440338427&time=201709&mode=day
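Purpose: the Baidu Maps coordinates x and y are generated from the position clicked on the map, and the server returns a JSON response. The id in that response is the Baidu street view ID of the position, i.e. the panoID used later.
Second request: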
https://mapsv0.bdimg.com/?qt=sdata&sid=09025200121709031616142855K&pc=1
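Purpose: this request carries the panoID (the sid parameter) and returns street view information for that position (nearby panoIDs, the historical images of the position, shooting dates, and so on). It is analyzed in detail below. The JSON response is shown in the figure below:
Third request: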
https://mapsv0.bdimg.com/?qt=guide&sid=09025200121709031616142855K&fn=jsonp29109114
Purpose: fetches the companies or schools near the position. The JSON response is shown in the figure below:
Fourth request:
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=0_0&z=1&udt=20190619
Purpose: this request returns the panoramic image of the street view at that position, as shown below:
Fifth request:
https://mapsv0.bdimg.com/?qt=pr3d&fovy=35&quality=80&panoid=09025200121709031616142855K&heading=72.801&pitch=0&width=198&height=108
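Purpose: this request returns a small image of the current viewing angle, as shown in the figure:
For an explanation of its parameters, I found a picture online; some of the parameters I have not verified myself.
Because the small image is blurry when enlarged, it is not fetched.
Sixth request (a group of large-image requests):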
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=1_4&z=4
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=2_4&z=4
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=1_5&z=4
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=2_5&z=4
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=1_6&z=4
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=2_6&z=4
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=1_7&z=4
https://mapsv0.bdimg.com/?qt=pdata&sid=09025200121709031616142855K&pos=2_7&z=4
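Purpose: fetches the large street view tiles from left to right. pos=1_* denotes the high-angle row (the one we need); its four tiles, left to right, together make up the full image seen in Baidu street view. pos=2_* denotes the low-angle row.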
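The stitching step just described can be sketched in plain Java. This is only an illustration and not code from the original project: the class name TileStitcher and its method are made up for this example, and it assumes the four pos=1_4 to pos=1_7 tiles have already been downloaded and decoded into BufferedImage objects in left-to-right order.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

// Hypothetical helper: joins already-decoded tiles into one wide image.
final class TileStitcher {

    /** Draws the tiles side by side, left to right, into a single image. */
    static BufferedImage stitchHorizontally(List<BufferedImage> tiles) {
        if (tiles.isEmpty()) {
            throw new IllegalArgumentException("no tiles to stitch");
        }
        int width = 0;
        int height = 0;
        for (BufferedImage tile : tiles) {
            width += tile.getWidth();
            height = Math.max(height, tile.getHeight());
        }
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = result.createGraphics();
        try {
            int x = 0;
            for (BufferedImage tile : tiles) {
                g.drawImage(tile, x, 0, null); // paste the tile at its horizontal offset
                x += tile.getWidth();
            }
        } finally {
            g.dispose();
        }
        return result;
    }
}
```

In practice the tiles could be decoded with javax.imageio.ImageIO.read and the result saved with ImageIO.write. Whether the tiles join seamlessly depends on how Baidu cuts them, which this sketch does not verify.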
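So to get the street view images of a city or a street, we first need the street view IDs of every position in that city or on that street, and can then simulate the image requests above. But how do we get the street view IDs of every street in Hangzhou?
Searching online, I found these suggestions:
- 1. Brute-force loop over panoIDs: ignore errors, return the result on a hit.
- 2. Find a seed panoID on a road, then crawl all images along that road.
- 3. Define an area in Baidu Maps coordinates and traverse every coordinate in it: return the panoID on a hit, do nothing on a miss.
At first glance these sound reasonable, but on closer analysis they are frustrating. Take the first: a brute-force loop is extremely inefficient. Baidu provides no interface for the coordinates along a whole street, nor one for all panoIDs of a street (at least I could not find one), so finding the next panoID by trial and error from a given starting panoID would take tens or even hundreds of thousands of requests, and Baidu would quite possibly ban the IP before the next ID is found. As for the third, I did want to define an area, but how do you traverse every coordinate in it, and how do you guarantee that the images belong to the same street and are consecutive? If any expert has a solution, please let me know! That leaves the second, which looks workable; the key is how to get the next panoID. The answer lies in "request two" above.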
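2.2 Detailed analysis of request two
As we can see, Roads[0].Panos contains several panoIDs. Clicking forward by about 10 m in street view mode makes the browser load the images of the next panoID (09025200121709031616155125K). There is a pattern here: moving forward about 20 m loads the images of the panoID after the next one, and so on.
Clicking forward again until the images of panoID 09025200121709031616211215K are loaded, the JSON response of its "request two" looks like the figure below:
As the figure shows, Roads[0] holds the node information of the current position (IsCurrent: 1). Roads is an array and usually contains several further elements as candidate nodes; here there is only one candidate, Roads[1], and Roads[1].ID corresponds to Links[0].RID. Clicking forward again loads exactly the panoID seen in the figure: 09025200121709031616224035K. Looking at the "request two" response for 09025200121709031616224035K, we again find a set of forward panoIDs for that position. A few more clicks confirm the pattern. The furrowed brow relaxes; time for a cup of coffee.
Back from coffee, let's go over it once more. Given a starting panoID, how do we crawl all street view images of the street it is on?
The business logic is roughly as follows:
- Choose a suitable starting panoID (it can be stored in a configuration file or a database);
- Build "request two" from the panoID, send it and get the response;
- Parse the JSON response to get the set of forward panoIDs near the current position;
- Iterate over the set, build the image download links for each position (4 images) and download them;
- On reaching the last element of the set, also parse the non-current elements of the Roads array and map them to the panoIDs in the Links array; we will call such a panoID an anchor node;
- Feed the anchor node's panoID back into the second step, so the whole logic becomes a recursive function.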
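The flow above can be sketched without any network access. This is a minimal illustration under stated assumptions, not the project's code: PanoInfo is a hypothetical, already-parsed view of one "request two" response, fetch stands in for sending the request and parsing the JSON, and a loop with a maxSegments limit plays the role of the recursion depth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PanoWalkSketch {

    /** Hypothetical, already-parsed view of one "request two" response. */
    static final class PanoInfo {
        final List<String> forwardPanoIds; // panoIDs after the current one in Roads[0].Panos
        final String anchorPanoId;         // panoID picked from Links, or null at a dead end

        PanoInfo(List<String> forwardPanoIds, String anchorPanoId) {
            this.forwardPanoIds = forwardPanoIds;
            this.anchorPanoId = anchorPanoId;
        }
    }

    /** Walks segment by segment and returns the panoIDs in crawl order. */
    static List<String> walk(String startPanoId, int maxSegments,
                             Function<String, PanoInfo> fetch) {
        List<String> crawled = new ArrayList<>();
        String current = startPanoId;
        for (int segment = 0; segment < maxSegments && current != null; segment++) {
            PanoInfo info = fetch.apply(current);
            if (info == null) {
                break; // request failed or unknown panoID
            }
            crawled.add(current); // a real crawler would download the four tiles here
            PanoInfo last = info;
            for (String id : info.forwardPanoIds) {
                PanoInfo next = fetch.apply(id);
                if (next == null) {
                    return crawled;
                }
                crawled.add(id);
                last = next;
            }
            // The anchor comes from the last node of the segment, as in the steps above.
            current = last.anchorPanoId;
        }
        return crawled;
    }
}
```

Using a loop instead of recursion keeps the call stack flat; the depth limit is still needed because following links can lead back to a node that was already crawled.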
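2.3 Image storage
Different databases can be used here depending on business needs, relational or non-relational. In some respects non-relational column-oriented storage may be better: when crawling the street views of every street in a city, some positions have historical data and others do not, so on large datasets the elasticity of column-oriented storage allows efficient reads and writes. On the other hand, later image analysis may need an image model over spatially continuous (or discrete) positions, for which a relational database is the better fit. Taking MySQL and HBase as examples, let's review the typical differences between a relational and a non-relational database:
2.3.1 Differences between HBase and MySQL
Attribute | HBase | MySQL |
---|---|---|
Storage | Column-oriented; columns can be added freely, and empty columns take no storage space | Row-oriented |
Scalability | Supported | Requires a third-party middleware layer |
Highly concurrent reads/writes | Supported | Limited by comparison |
Conditional queries | Mainly by rowkey (point lookups and range scans) | Supported |
Data types | Byte arrays, usually handled as strings | Multiple types |
Data operations | Only query, insert, delete, truncate, etc. | Also includes joins, etc. |
Data updates | Actually inserts new data (multi-version) | Replaces in place |
2.3.2 Storing binary streams
Download a few images to check their size, then decide which column type to use. After comparison, the key points are:
- Convert the image to a binary byte stream
- Set the image column in the table to a BLOB type
For storing BLOB data in MySQL, see here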
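As a rough sketch of the two points above, an image file can be read into a byte array and bound to a BLOB column with plain JDBC. The table pano_pic_demo and its columns are hypothetical names chosen for this example (the project's real logic lives in BlobInsertUtils, which is not shown in this post). Note that MySQL's BLOB type holds at most 65,535 bytes, while MEDIUMBLOB holds up to about 16 MB, so the measured image size decides which one to use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BlobInsertSketch {

    /** Reads the whole image file into memory as a byte array. */
    static byte[] readImageBytes(Path imagePath) {
        try {
            return Files.readAllBytes(imagePath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Inserts one image into a hypothetical table:
     * pano_pic_demo(pano_id VARCHAR(32), pic MEDIUMBLOB).
     * Returns the number of affected rows.
     */
    static int insertImage(Connection conn, String panoId, byte[] imageBytes)
            throws SQLException {
        String sql = "INSERT INTO pano_pic_demo (pano_id, pic) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, panoId);
            ps.setBytes(2, imageBytes); // the driver sends the bytes as binary data
            return ps.executeUpdate();
        }
    }
}
```

For large images the server's max_allowed_packet setting may also need to be raised, since the whole byte array travels in one statement.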
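3 Code
Write the code for each step of the business logic above.
3.1 Creating the table and reading records
Create the table and store the starting panoID in the database. The CREATE TABLE statement is: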
CREATE TABLE `test`.`baidu_pano` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`pano_id` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'street view ID',
`name` varchar(32) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL COMMENT 'street name',
`lati` double NULL DEFAULT NULL COMMENT 'latitude',
`lonti` double NULL DEFAULT NULL COMMENT 'longitude',
`direction` bit(1) NULL DEFAULT NULL COMMENT '0: west to east; 1: east to west',
PRIMARY KEY USING BTREE (`id`)
) ENGINE = InnoDB AUTO_INCREMENT = 8 CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'InnoDB free: 30720 kB' ROW_FORMAT = Compact;
The inserted panoIDs are shown in the figure:
Before coding, add all dependencies to build.gradle in one go:
ext {
commonsDbutilsVersion = '1.6'
druidVersion = '1.0.18'
mysqlConnectorVersion = '5.1.37'
}
dependencies {
compile "com.cetiti.ddc:ddc-core:0.1.11-alpha1"
compile "commons-dbutils:commons-dbutils:${commonsDbutilsVersion}"
compile 'org.apache.logging.log4j:log4j-core:2.8.2'
compile "com.alibaba:druid:${druidVersion}"
compile "mysql:mysql-connector-java:${mysqlConnectorVersion}"
compile 'org.postgresql:postgresql:42.2.5'
testCompile 'junit:junit:4.9'
compile 'org.apache.commons:commons-pool2:2.4.2'
compile 'redis.clients:jedis:2.9.0'
}
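// In addition, the snakeyaml-1.18.jar, fastjson and okhttp jars are required; look up suitable versions yourself. This project depends on an in-house company jar, so they are not declared explicitly here
We need a utility class, MySQLHelper, that reads records from MySQL and wraps the result set. Source: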
public class MySQLHelper {
private static final Logger logger = Logger.getLogger(MySQLHelper.class);
private static QueryRunner runner;
private static ResultSetHandler<HashMap<String,Object>> h = new ResultSetHandler<HashMap<String,Object>>() {
@Override
public HashMap<String,Object> handle(ResultSet rs) throws SQLException {
if (!rs.next()) {
return null;
}
ResultSetMetaData meta = rs.getMetaData();
int cols = meta.getColumnCount();
HashMap<String,Object> result = new HashMap<>(16);
for (int i = 0; i < cols; i++) {
result.put(meta.getColumnLabel(i + 1),rs.getObject(i + 1));
}
return result;
}
};
public MySQLHelper(){
runner = MySQLPool.getInstance().getRunner();
}
@SuppressWarnings("unchecked")
public List<Pano> getAllPanoFromDB(){
try {
String qSql = "select pano_id as panoId, name from baidu_pano limit 10";
@SuppressWarnings("rawtypes")
BeanListHandler blh = new BeanListHandler(Pano.class);
return (ArrayList<Pano>) runner.query(qSql,blh);
} catch (SQLException e) {
logger.error("getAllPanoFromDB", e);
return null;
}
}
}
The connection pool MySQLPool is as follows:
public class MySQLPool {
private static volatile MySQLPool instance = null;
private static final Logger logger = Logger.getLogger(MySQLPool.class);
private DruidDataSource dds;
private QueryRunner runner;
private Properties properties;
public QueryRunner getRunner() {
return this.runner;
}
private MySQLPool() {
ConfigParser parser = ConfigParser.getInstance();
String dbAlias = "mysql-data";
Map<String, Object> dbConfig = parser.getModuleConfig("database");
Map<String, Object> mysqlConfig = (Map)parser.assertKey(dbConfig, dbAlias, "database");
Properties properties = new Properties();
String url = (String)parser.assertKey(mysqlConfig, "url", "database." + dbAlias);
String username = (String)parser.assertKey(mysqlConfig, "username", "database." + dbAlias);
String password = (String)parser.assertKey(mysqlConfig, "password", "database." + dbAlias);
properties.setProperty("url", url);
properties.setProperty("username", username);
properties.setProperty("password", password);
properties.setProperty("maxActive", "20");
this.properties = properties;
try {
this.dds = (DruidDataSource)DruidDataSourceFactory.createDataSource(properties);
} catch (Exception var10) {
logger.error("Failed to connect data MySQL db,Exception:{}", var10);
}
this.runner = new QueryRunner(this.dds);
}
public static MySQLPool getInstance() {
if (instance == null) {
synchronized(MySQLPool.class) {
if (instance == null) {
instance = new MySQLPool();
}
}
}
return instance;
}
}
ConfigParser is a custom class that reads and parses the configuration file. Source:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Map;
import org.yaml.snakeyaml.Yaml;
public class ConfigParser {
private static final Logger logger = Logger.getLogger(ConfigParser.class);
private static ConfigParser instance = new ConfigParser();
private static final String CONFIG_FILENAME = "config.yml";
private Yaml yaml = null;
private Object config;
private ConfigParser() {
if (this.yaml == null) {
this.yaml = new Yaml();
}
File f = ResourceUtils.loadResouces("config.yml");
try {
this.config = this.yaml.load(new FileInputStream(f));
logger.info("file {} is loaded", f.getAbsoluteFile());
} catch (FileNotFoundException var3) {
var3.printStackTrace();
}
}
public static ConfigParser getInstance() {
return instance;
}
public Object getConfig() {
return this.config;
}
public Map<String, Object> getModuleConfig(String name) {
return this.getModuleConfig(name, this.config);
}
public Map<String, Object> getModuleConfig(String name, Object parent) {
Map<String, Object> rtn = (Map)((Map)parent).get(name);
return rtn;
}
public Object assertKey(Map<String, Object> config, String key, String parent) {
Object value = config.get(key);
if (value == null) {
logger.error("{}.{} is a mandatory configuration", new Object[]{parent, key});
System.exit(1);
}
return value;
}
public Object getValue(Map<String, Object> config, String key, Object def, String parent) {
Object value = config.get(key);
if (value == null) {
logger.warn("{}.{} isn't configured, default value {} is used", new Object[]{parent, key, def});
config.put(key, def);
return def;
} else {
return value;
}
}
public void dumpConfig() {
System.out.println(this.yaml.dump(this.config));
}
}
3.2 Configuration file
config.yml is configured as follows:
apps:
## Basic properties
spider-baidupano:
common:
group: ipproxy-xundaili-zhg
cron: "0 0 0 */1 * ?"
firstpage: 1
totalpages: 1
distribute: false
fixed: true
order: desc
## Data source
source:
baseurl: https://mapsv0.bdimg.com/?qt=pdata&sid=09024600011606211814253666L&pos=1_4&z=4
listpageregex: "https://mapsv0\\.bdimg\\.com/\\?qt\\=sdata"
storage:
## dbType: MySQL HBase Hive MongoDB Kafka PostgreSQL
dbtype: MySQL
dbalias: mysql-data
## Image storage location
piclocation: \baidupano
filter:
searchfilter: true
contentfilter: false
## Anti-crawler measures
antirobot:
ipproxy: false
listipproxy: false
sleeptime: 900000
analysis:
sentiment: false
distribute:
scheduler: com.demo.ddc.scheduler.MemberScheduler
# dbtype: redis
# dbalias: redis
database:
mysql-data:
url: "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf8"
username: root
password: '123456'
3.3 Image download and storage logic
First, write a utility class for sending requests:
/**
* @author [email protected]
* @since 0.1.0
**/
public class OkHttpUtils {
private static volatile OkHttpClient okHttpClient;
private OkHttpUtils(){
}
public static OkHttpClient getInstance(){
if (null==okHttpClient){
synchronized (OkHttpUtils.class){
if (okHttpClient==null){
okHttpClient = new OkHttpClient();
return okHttpClient;
}
}
}
return okHttpClient;
}
}
Utility class for downloading images and handling file paths:
/**
* @author Huigen Zhang
* @since 2018-10-19 18:53
**/
public class PicLoadUtils {
private static SpiderConfig spiderConfig;
private final static String WINDOWS_DISK_SYMBOL = ":";
private final static String WINDOWS_PATH_SYMBOL = "\\";
public PicLoadUtils(){
String spiderId = "spider-baidupano";
spiderConfig = new SpiderConfig(spiderId);
}
private String getFileLocation(String storeDirName){
String separator = "/";
ConfigParser parser = ConfigParser.getInstance();
String spiderId = "spider-baidupano";
SpiderConfig spiderConfig = new SpiderConfig(spiderId);
Map<String,Object> storageConfig = (Map<String, Object>) parser.assertKey(spiderConfig.getSpiderConfig(),"storage", spiderConfig.getConfigPath());
String fileLocation = (String) parser.getValue(storageConfig,"piclocation",null,spiderConfig.getConfigPath()+".storage");
String pathSeparator = getSeparator();
String location;
if(fileLocation!=null){
// Distinguish the operating system first, then check whether the path is absolute
if (separator.equals(pathSeparator)){
//linux
if(fileLocation.startsWith(separator)){
location = fileLocation + pathSeparator + "data";
}else {
location = System.getProperty("user.dir") + pathSeparator + fileLocation;
}
location = location.replace("//", pathSeparator);
return location;
}else {
//windows
if (fileLocation.contains(WINDOWS_DISK_SYMBOL)){
// Absolute path
location = fileLocation + pathSeparator + "data";
}else {
// Relative path
location = System.getProperty("user.dir") + pathSeparator + fileLocation;
}
location = location.replace("\\\\",pathSeparator);
}
}else{
// Default location
location = System.getProperty("user.dir") + pathSeparator + storeDirName;
}
return location;
}
public String dateToPath(long timestamp) {
String pathSeparator = getSeparator();
Calendar calendar = Calendar.getInstance();
calendar.setTimeInMillis(timestamp*1000);
String year = String.format("%04d",calendar.get(Calendar.YEAR));
String month = String.format("%02d",calendar.get(Calendar.MONTH)+1);
String date = String.format("%02d",calendar.get(Calendar.DATE));
return year + pathSeparator + month + pathSeparator + date;
}
private String getSeparator(){
String pathSeparator = File.separator;
if(!WINDOWS_PATH_SYMBOL.equals(File.separator)){
pathSeparator = "/";
}
return pathSeparator;
}
public void mkDir(File file){
String directory = file.getParent();
File myDirectory = new File(directory);
if (!myDirectory.exists()) {
myDirectory.mkdirs();
}
}
public String downloadPic(String url, String panoId){
okhttp3.Request request = new okhttp3.Request.Builder()
.url(url)
.build();
Response response = null;
InputStream inputStream = null;
FileOutputStream out = null;
String localLocation = null;
String relativePath = null;
try {
response = OkHttpUtils.getInstance().newCall(request).execute();
// Read the response body as an input stream
inputStream = response.body().byteStream();
byte[] buffer = new byte[2048];
localLocation = this.getFileLocation("baidupano");
Date nowTime = new Date(System.currentTimeMillis());
relativePath = this.dateToPath(nowTime.getTime()/1000) + File.separator + panoId + File.separator + nowTime.getTime()+".jpg";
File myPath = new File(localLocation + File.separator + relativePath);
this.mkDir(myPath);
out = new FileOutputStream(myPath);
int len;
while ((len = inputStream.read(buffer)) != -1){
out.write(buffer,0,len);
}
// Flush the file stream
out.flush();
} catch (IOException e) {
e.printStackTrace();
}finally {
if (inputStream!=null){
try {
inputStream.close();
}catch (IOException e){
e.printStackTrace();
}
}
if (null!=out){
try {
out.close();
}catch (IOException e){
e.printStackTrace();
}
}
if (null!=response){
response.body().close();
}
}
return localLocation + File.separator + relativePath;
}
public String downloadPic(String url, String curPanoId, String hisPanoId){
okhttp3.Request request = new okhttp3.Request.Builder()
.url(url)
.build();
Response response = null;
InputStream inputStream = null;
FileOutputStream out = null;
String localLocation = null;
String relativePath = null;
try {
response = OkHttpUtils.getInstance().newCall(request).execute();
// Read the response body as an input stream
inputStream = response.body().byteStream();
byte[] buffer = new byte[2048];
localLocation = this.getFileLocation("baidupano");
Date nowTime = new Date(System.currentTimeMillis());
relativePath = this.dateToPath(nowTime.getTime()/1000) + File.separator + curPanoId + File.separator + hisPanoId + File.separator + nowTime.getTime()+".jpg";
File myPath = new File(localLocation + File.separator + relativePath);
this.mkDir(myPath);
out = new FileOutputStream(myPath);
int len;
while ((len = inputStream.read(buffer)) != -1){
out.write(buffer,0,len);
}
// Flush the file stream
out.flush();
} catch (IOException e) {
e.printStackTrace();
}finally {
if (inputStream!=null){
try {
inputStream.close();
}catch (IOException e){
e.printStackTrace();
}
}
if (null!=out){
try {
out.close();
}catch (IOException e){
e.printStackTrace();
}
}
if (null!=response){
response.body().close();
}
}
return localLocation + File.separator + relativePath;
}
}
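Write the core business logic following the flow analyzed above. Because recursion is used, a level parameter limits the depth to prevent stack overflow: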
public void getBaiduPanoPics(String curPanoId, int level){
// Recursion depth control
if (level == 0){
logger.info("Finished crawling this street!");
return;
}
// Send the JSON request, crawl the street view images and store them
JSONObject jsonObject = sendPanoJsonRequest(curPanoId);
processStorePanoByID(curPanoId,jsonObject,tableName);
// Get the panoIds that follow on this stretch of road
List<String> forwardNodes = getForwardNodes(curPanoId, jsonObject);
if (forwardNodes!=null&&!forwardNodes.isEmpty()){
// Iterate and crawl
int forwardNodeSize = forwardNodes.size();
JSONObject tempJsonObject;
String id;
for(int i = 0; i < forwardNodeSize-1; i++){
id = forwardNodes.get(i);
tempJsonObject = sendPanoJsonRequest(id);
processStorePanoByID(id,tempJsonObject,tableName);
}
// Handle the last element separately, because the anchor node for the recursion is taken from it
id = forwardNodes.get(forwardNodeSize-1);
tempJsonObject = sendPanoJsonRequest(id);
processStorePanoByID(id, tempJsonObject, tableName);
String anchorNode = getEasyAnchorNode(tempJsonObject);
if (StringUtils.isBlank(anchorNode)){
return;
}
// Recursive call
getBaiduPanoPics(anchorNode,level-1);
}else {
// If nothing was returned, fall back to the links
String anchorNode = getEasyAnchorNode(jsonObject);
if (StringUtils.isBlank(anchorNode)){
return;
}
// Recursive call
getBaiduPanoPics(anchorNode,level-1);
}
}
The complete code, including the other methods called above, is as follows:
/**
* @author zhanghuigen
* @since 0.1.0
**/
public class BaiduPanoPics {
private final Logger logger = Logger.getLogger(BaiduPanoPics.class);
private static final String REQUEST_PREX_FORWARD_PANOS = "https://mapsv0.bdimg.com/?qt=sdata&sid=";
private static final String REQUEST_PREX_PICS = "https://mapsv1.bdimg.com/?qt=pdata&sid=";
private static final int PIC_NUM = 8;
private String tableName;
public BaiduPanoPics(String tableName){
this.tableName = tableName;
}
private List<String> getPicsUrl(String panoId){
// No direction-detection logic is added
final String param = "&z=4";
String picUrl;
List<String> picUrlList = new ArrayList<>(4);
for (int i = 4; i < PIC_NUM; i++){
picUrl = REQUEST_PREX_PICS + panoId + "&pos=1_"+ i + param;
picUrlList.add(picUrl);
}
return picUrlList;
}
public void getBaiduPanoPics(String curPanoId, int level){
// Recursion depth control
if (level == 0){
logger.info("Finished crawling this street!");
return;
}
// Send the JSON request, crawl the street view images and store them
JSONObject jsonObject = sendPanoJsonRequest(curPanoId);
processStorePanoByID(curPanoId,jsonObject,tableName);
// Get the panoIds that follow on this stretch of road
List<String> forwardNodes = getForwardNodes(curPanoId, jsonObject);
if (forwardNodes!=null&&!forwardNodes.isEmpty()){
// Iterate and crawl
int forwardNodeSize = forwardNodes.size();
JSONObject tempJsonObject;
String id;
for(int i = 0; i < forwardNodeSize-1; i++){
id = forwardNodes.get(i);
tempJsonObject = sendPanoJsonRequest(id);
processStorePanoByID(id,tempJsonObject,tableName);
}
// Handle the last element separately, because the anchor node for the recursion is taken from it
id = forwardNodes.get(forwardNodeSize-1);
tempJsonObject = sendPanoJsonRequest(id);
processStorePanoByID(id, tempJsonObject, tableName);
String anchorNode = getEasyAnchorNode(tempJsonObject);
if (StringUtils.isBlank(anchorNode)){
return;
}
// Recursive call
getBaiduPanoPics(anchorNode,level-1);
}else {
// If nothing was returned, fall back to the links
String anchorNode = getEasyAnchorNode(jsonObject);
if (StringUtils.isBlank(anchorNode)){
return;
}
// Recursive call
getBaiduPanoPics(anchorNode,level-1);
}
}
/**
* @param curPanoId id
* @param jsonObject json
* @return list
* Extracts the panoIds of the road ahead from the current panoID and the JSON response
* Returns null at a fork (which may also be a sidewalk)
*/
private List<String> getForwardNodes(String curPanoId, JSONObject jsonObject){
JSONArray roadJsonArray = jsonObject.getJSONArray("Roads");
JSONArray panoJsonArray = roadJsonArray.getJSONObject(0).getJSONArray("Panos");
// Total size of the array
int nearByPanoIdSize = panoJsonArray.size();
List<String> panoJsonIdList = new ArrayList<>();
String panoJsonId;
if (nearByPanoIdSize>1){
for (int i=0; i < nearByPanoIdSize; i++){
panoJsonId = panoJsonArray.getJSONObject(i).getString("PID");
// Store the panoIds of the nearest road segment in the list
panoJsonIdList.add(panoJsonId);
}
int currentPanoIdIndex = panoJsonIdList.indexOf(curPanoId);
if (currentPanoIdIndex >= 0){
return panoJsonIdList.subList(currentPanoIdIndex+1,nearByPanoIdSize);
}else{
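System.out.println("The current node is not in the nearby node set!");
// What should be returned here? As written, execution falls through and the whole list is returned
}
}else if(nearByPanoIdSize==1){
// A fork lies ahead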
panoJsonId = panoJsonArray.getJSONObject(0).getString("PID");
if (curPanoId.equals(panoJsonId)){
return null;
}else {
panoJsonIdList.add(panoJsonId);
}
}else {
logger.info("Unexpected response!");
}
return panoJsonIdList;
}
/**
* @param jsonObject json
* @return list
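* For the given panoId, gets the starting panoId of the next road segment in each direction
* Returns several if the road forks, otherwise one
*/
private List<String> getAnchorNode(JSONObject jsonObject){
// Direction is not checked for now
// Get the array in the Links field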
JSONArray anchorIdArray = jsonObject.getJSONArray("Links");
List<String> anchorLinks = new ArrayList<>();
for (Object linkJson:anchorIdArray){
JSONObject anchor = (JSONObject)linkJson;
String anchorId = anchor.getString("PID");
anchorLinks.add(anchorId);
}
return anchorLinks;
}
/**
* Prone to producing cycles
* @param jsonObject
* @return string
*/
private String getEasyAnchorNode(JSONObject jsonObject){
JSONArray anchorIdArray = jsonObject.getJSONArray("Links");
int size = anchorIdArray.size();
if (size==0){
return null;
}else if (size ==1){
return ((JSONObject)anchorIdArray.get(0)).getString("PID");
}else {
int index = new Random().nextInt(anchorIdArray.size());
return ((JSONObject)anchorIdArray.get(index)).getString("PID");
}
}
/**
* Can also produce cycles
* @param jsonObject
* @return panoId
*/
private String getAnchorNodeByDir(JSONObject jsonObject){
JSONArray anchorIdArray = jsonObject.getJSONArray("Links");
if (anchorIdArray.size()==1){
return ((JSONObject)anchorIdArray.get(0)).getString("PID");
}else{
// Pick the link with the smallest DIR value, which is likely the forward node
Map<Integer,String> map = new HashMap<>(4);
int dirTemp = 400;
for (Object linkJson:anchorIdArray){
JSONObject anchor = (JSONObject)linkJson;
Integer anchorDir = anchor.getInteger("DIR");
String anchorPid = anchor.getString("PID");
map.put(anchorDir,anchorPid);
if (dirTemp>anchorDir){
dirTemp = anchorDir;
}
}
return map.get(dirTemp);
}
}
/**
* Gets the anchor node, trying to stay on the same road, although road conditions do not always allow it
* Prone to producing cycles
* @param jsonObject
* @return string
*/
private String getAdvancedAnchorNode(JSONObject jsonObject){
JSONArray anchorIdArray = jsonObject.getJSONArray("Links");
if (anchorIdArray.isEmpty()){
return null;
}
// If there is only one link, return it directly
if(anchorIdArray.size()==1){
return ((JSONObject)anchorIdArray.get(0)).getString("PID");
}
// If there are several, store the anchor nodes in a map for easier lookup later
Map<String,String> anchorMap = new HashMap<>(8);
for (Object linkJson:anchorIdArray){
JSONObject anchor = (JSONObject)linkJson;
String anchorPid = anchor.getString("PID");
String anchorRid = anchor.getString("RID");
anchorMap.put(anchorRid, anchorPid);
}
// Store the RoadBean information in a list
JSONArray roadJsonArray = jsonObject.getJSONArray("Roads");
List<RoadBean> roadBeanList = new ArrayList<>(8);
for (Object roadJson:roadJsonArray){
JSONObject road = (JSONObject)roadJson;
String roadId = road.getString("ID");
boolean isCurrent = road.getBoolean("IsCurrent");
String roadName = road.getString("Name");
RoadBean roadBean = new RoadBean(roadId,isCurrent,roadName);
roadBeanList.add(roadBean);
}
if (roadBeanList.size()>1&&roadBeanList.get(0).isCurrent()){
// Name of the street the current node belongs to
String currentStreetName = roadBeanList.get(0).getRoadName();
for (int i = 1; i < roadBeanList.size(); i++){
RoadBean roadBean = roadBeanList.get(i);
// Try to keep going along the same road
if (currentStreetName.equals(roadBean.getRoadName())){
return anchorMap.get(roadBean.getRid());
}
}
// If no road with the same name was found, pick the first entry in Links
return ((JSONObject)anchorIdArray.get(0)).getString("PID");
}else {
logger.info("Unexpected road data or parsing error");
}
return null;
}
/**
*
* @param panoId id
* @return JSONObject
* Sends the JSON request and returns the server's JSON response
*/
private JSONObject sendPanoJsonRequest(String panoId){
String suffixParam = "&pc=1";
String url = REQUEST_PREX_FORWARD_PANOS + panoId + suffixParam;
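// Send the JSON request (the historical panoId and the panoIds of nearby road segments can be read from the response)
String jsonPanoResponse = sendGetRequest(url);
JSONObject jsonObject = JSON.parseObject(jsonPanoResponse);
// The content field is a JSON array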
JSONArray jsonArray = JSON.parseArray(jsonObject.get("content").toString());
return jsonArray.getJSONObject(0);
}
/**
* @param curPanoId
* Crawls the street view images for curPanoId, downloads them locally and stores them in the database
*/
private void processStorePanoByID(String curPanoId,JSONObject jsonObject,String tableName){
String curPanoDate = jsonObject.getString("Time");
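// Download the current images of this street view to local storage
List<String> curPanoPicPath = downloadCurPanoPics(curPanoId);
PanoPic panoPicBean = new PanoPic(curPanoId,curPanoDate,curPanoPicPath);
// Download the historical images of this street view
JSONArray timeLineJsonArray = jsonObject.getJSONArray("TimeLine");
if (timeLineJsonArray.size()>1) {
// For simplicity, if historical street views exist, take the entry at index 1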
JSONObject timeLineJson = timeLineJsonArray.getJSONObject(1);
if (!timeLineJson.getBooleanValue("IsCurrent")){
String historyPanoId = timeLineJson.getString("ID");
String timeLine = timeLineJson.getString("TimeLine");
List<String> hisPanoPicPath = downloadHisPanoPics(curPanoId,historyPanoId);
panoPicBean.setHisPanoId(historyPanoId);
panoPicBean.setHisShootDate(timeLine);
panoPicBean.setHisPicPath(hisPanoPicPath);
}
}
// Store the local images in the database
storePanoPicsToDb(panoPicBean,tableName);
}
private List<String> downloadCurPanoPics(String panoId){
String localPath;
List<String> picsRequestList = this.getPicsUrl(panoId);
PicLoadUtils picLoadUtils = new PicLoadUtils();
List<String> localPathList = new ArrayList<>(4);
for (String picRequest:picsRequestList){
localPath = picLoadUtils.downloadPic(picRequest, panoId);
localPathList.add(localPath);
}
return localPathList;
}
private List<String> downloadHisPanoPics(String panoId,String hisPanoId){
String localPath;
List<String> picsRequestList = this.getPicsUrl(hisPanoId);
PicLoadUtils picLoadUtils = new PicLoadUtils();
List<String> localPathList = new ArrayList<>(4);
for (String picRequest:picsRequestList){
localPath = picLoadUtils.downloadPic(picRequest, panoId, hisPanoId);
localPathList.add(localPath);
}
return localPathList;
}
private void storePanoPicsToDb(PanoPic panoPic,String tableName){
// Read the local images and store them in the database
BlobInsertUtils blobInsertUtils = new BlobInsertUtils(tableName);
blobInsertUtils.insertAllImage2DBWithNoCheck(panoPic);
}
public String sendGetRequest(String url){
okhttp3.Request request = new okhttp3.Request.Builder()
.url(url).build();
Response response;
String result = null;
try {
response = OkHttpUtils.getInstance().newCall(request).execute();
result = response.body().string();
response.body().close();
} catch (IOException e) {
logger.error("Failed to send request--"+url);
e.printStackTrace();
}
return result;
}
}
Then write the test code:
public class ImageStoreTest {
private static final Logger logger = Logger.getLogger(ImageStoreTest.class);
public static void main(String[] args) {
// Read the database records into a list
List<Pano> panoList = new MySQLHelper().getAllPanoFromDB();
BaiduPanoPics baiduPanoPics = new BaiduPanoPics("baidu_pano_pics");
for (Pano pano:panoList){
String panoID = pano.getPanoId();
logger.info("----------Start crawling from id: "+ panoID+"-----------");
baiduPanoPics.getBaiduPanoPics(panoID,50);
logger.info("-----------"+panoID+" finished crawling!"+"-----------");
}
}
}
The BlobInsertUtils logic used in the code above to store binary streams in MySQL is covered in another blog post, "Storing Baidu street view images in MySQL":
private void storePanoPicsToDb(PanoPic panoPic,String tableName){
// Read the local images and store them in the database
BlobInsertUtils blobInsertUtils = new BlobInsertUtils(tableName);
blobInsertUtils.insertAllImage2DBWithNoCheck(panoPic);
}
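The crawl results can now be seen in the database. If requests fail along the way, the cause is quite possibly a bug in sending HTTPS requests on an old JDK; upgrading to 1.8.0_211 or later should fix it. That concludes the Baidu street view image crawl; comments are welcome.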
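4 Open issues
- The recursive traversal can run into something like a circular linked list. The only remedy for now is trial and error: pick a starting id that allows deeper recursion, which still meets the basic needs of this project;
- Ideally, anchor node selection would let us follow a road to its end from a single starting panoID, but in practice it does not, so anchor selection needs further optimization;
- For most image download links the second part of the pos parameter ranges over 4-7, but for some it does not (for some west-to-east or south-to-north roads it is 0-3). This depends on the direction of travel, so ideally the link-building rule should take direction and road conditions into account, which requires a deeper analysis of the Baidu Maps street view interfaces. Judging by the crawl results, though, the 4-7 range still yields street view images from different angles.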
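One possible way to reduce the cycles mentioned in the first issue is to remember which panoIDs have already been crawled and to pick the first link that has not been visited yet. This is a sketch of an idea, not something the code above does; the class and method names are hypothetical, and linkPanoIds stands for the PID values read from the Links array.

```java
import java.util.List;
import java.util.Set;

final class AnchorChoiceSketch {

    /**
     * Returns the first candidate panoID that is not in the visited set,
     * or null when every candidate has already been crawled.
     */
    static String firstUnvisited(List<String> linkPanoIds, Set<String> visited) {
        for (String id : linkPanoIds) {
            if (!visited.contains(id)) {
                return id;
            }
        }
        return null;
    }
}
```

A crawler using this would add every processed panoID to the visited set and stop when the method returns null. It avoids going in circles but does not by itself guarantee staying on the same road.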
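The first two issues ultimately come from not having the street view IDs or Baidu coordinates of each street. After discussing with colleagues in the GIS development team, a solution was finally found: obtain the city's road network data from OpenStreetMap, get the road network's Google coordinates, and convert them to Baidu coordinates (a test site for coordinate conversion is provided here). With Baidu coordinates in hand, sending request one above returns the street view ID at that position in the Baidu coordinate system. A complete post on this method and the coordinate conversion will follow.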
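As a rough illustration of one conversion step, the sketch below uses the approximation that circulates widely in the developer community for converting GCJ-02 longitude/latitude to Baidu's BD-09 longitude/latitude and back. It is not taken from Baidu's documentation or from this project, so treat the constants as an assumption to check against a trusted converter. It also assumes the input is already in GCJ-02, and it does not produce the Mercator-style x and y values that request one expects; that is a further step not shown here.

```java
final class Bd09Sketch {

    private static final double X_PI = Math.PI * 3000.0 / 180.0;

    /** GCJ-02 (lng, lat) to BD-09 (lng, lat), community approximation. */
    static double[] gcj02ToBd09(double lng, double lat) {
        double z = Math.sqrt(lng * lng + lat * lat) + 0.00002 * Math.sin(lat * X_PI);
        double theta = Math.atan2(lat, lng) + 0.000003 * Math.cos(lng * X_PI);
        return new double[] {z * Math.cos(theta) + 0.0065, z * Math.sin(theta) + 0.006};
    }

    /** BD-09 (lng, lat) back to GCJ-02 (lng, lat), approximate inverse. */
    static double[] bd09ToGcj02(double bdLng, double bdLat) {
        double x = bdLng - 0.0065;
        double y = bdLat - 0.006;
        double z = Math.sqrt(x * x + y * y) - 0.00002 * Math.sin(y * X_PI);
        double theta = Math.atan2(y, x) - 0.000003 * Math.cos(x * X_PI);
        return new double[] {z * Math.cos(theta), z * Math.sin(theta)};
    }
}
```

The two methods are only approximate inverses of each other, which is usually adequate for locating the nearest street view point but not for survey-grade work.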
If anyone has a better way to solve the street coordinate problem, please leave a comment. Many thanks!
References
- http://www.siyuanblog.com/?p=632
- https://www.jianshu.com/p/3a0fa1e57ff6