Google Translate and JSON Parsing

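Sometimes we need to translate an article or other material, and Google's online translator is a very handy tool for that.

But opening the site, typing in the Chinese, and copying the returned English every single time gets tedious. As programmers, we should let the computer take over chores like this wherever possible.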


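Open Google Translate at http://translate.google.cn/?hl=en#zh-CN/en/

Type "水杯" (water cup) into the left text box, and the English translation appears in the right one: Cups. Watching the traffic with FireBug shows that the request sent is:

http://translate.google.cn/translate_a/t?client=t&hl=en&sl=zh-CN&tl=en&ie=UTF-8&oe=UTF-8&multires=1&oc=2&otf=1&ssel=0&tsel=0&sc=1&q=%E6%B0%B4%E6%9D%AF

With this request URL in hand, we can start solving the translation problem in code.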
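To see what each parameter of the captured request carries, the query string can be split and URL-decoded. The following is a minimal sketch, not part of the original program: the class name `QueryParams` and its `parse` method are hypothetical, and it assumes the values are percent-encoded UTF-8, as the `ie=UTF-8` parameter suggests.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits a query string into decoded name/value pairs,
// keeping the parameters in the order in which they appear.
class QueryParams {

    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    private static String decode(String value) {
        try {
            // Assumption: the request is percent-encoded UTF-8.
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Query string of the request captured above.
        String captured = "client=t&hl=en&sl=zh-CN&tl=en&ie=UTF-8&oe=UTF-8&multires=1"
                + "&oc=2&otf=1&ssel=0&tsel=0&sc=1&q=%E6%B0%B4%E6%9D%AF";
        for (Map.Entry<String, String> entry : parse(captured).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

The `q` entry decodes back to the two characters typed into the text box, which confirms that `q` carries the text to translate, while `sl` and `tl` carry the source and target languages.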


Look at the parameters of this link and drop the ones that are not needed:

"http://translate.google.com/translate_a/t?client=t&hl=en&sl=zh-CN&tl=en&q="
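The trimmed URL can also be assembled from its parts, which makes it easy to change the language pair or the `client` value later. This is only a sketch: `TranslateUrl` and `build` are hypothetical names, and the endpoint is the undocumented one observed above, so it may change or stop working.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the trimmed request URL from its parameters.
class TranslateUrl {

    // Endpoint as observed in the browser; undocumented and subject to change.
    static final String ENDPOINT = "http://translate.google.com/translate_a/t";

    static String build(String client, String source, String target, String text) {
        return ENDPOINT
                + "?client=" + encode(client)
                + "&hl=en"
                + "&sl=" + encode(source)
                + "&tl=" + encode(target)
                + "&q=" + encode(text);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // "\u6C34\u676F" is the two-character word typed into the text box earlier.
        System.out.println(build("t", "zh-CN", "en", "\u6C34\u676F"));
        // prints http://translate.google.com/translate_a/t?client=t&hl=en&sl=zh-CN&tl=en&q=%E6%B0%B4%E6%9D%AF
    }
}
```

Note that `URLEncoder` produces form encoding, so a space in the text becomes `+` rather than `%20`.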

Note: when your program uses this URL to talk to Google's server, the string that comes back is irregular:

         [[["Graduation project next week to reply , the system did not do a good job , I'm working papers ,","下週畢業設計答辯,系統沒做好,論文沒寫完,呵呵","","Xià zhōu bìyè shèjì dábiàn, xìtǒng méi zuò hǎo, lùnwén méi xiě wán, hēhē"]],,"zh-CN",,[["Graduation project",[4],0,0,1000,0,2,0],["next",[5],1,0,1000,2,3,0],["week",[6],1,0,1000,3,4,0],["to reply",[7],1,0,1000,4,6,0],[", the system",[8],0,0,1000,6,9,0],["did not do a good job",[9],1,0,1000,9,15,0],[",",[10],0,0,1000,15,16,0],["I'm working",[11],1,0,1000,16,19,0],["papers",[12],1,0,1000,19,20,0],[",",[13],0,0,1000,20,21,0]],[["畢業 設計",4,[["Graduation project",1000,0,0],["graduate design",0,0,0],["graduation design",0,0,0],["graduated from the design",0,0,0]],[[2,6]],"下週畢業設計答辯,系統沒做好,論文沒寫完,呵呵"],["下週",5,[["next",1000,1,0],["next week",0,1,0],["next week to",0,1,0]],[[0,2]],""],["",6,[["week",1000,1,0]],,""],["答辯",7,[["to reply",1000,1,0],["respondent",0,1,0],["the respondent",0,1,0],["of reply",0,1,0]],[[6,8]],""],[", 系統",8,[[", the system",1000,0,0],["system",0,0,0],["the system",0,0,0]],[[8,11]],""],["沒 做好",9,[["did not do a good job",1000,1,0],["not doing",0,1,0],["did not do",0,1,0],["not done well",0,1,0],["have not done well",0,1,0]],[[11,14]],""],[",",10,[[",",1000,0,0],["and",0,0,0]],[[14,15]],""],["沒 寫完",11,[["I'm working",1000,1,0],["did not finish",0,1,0],["not finished",0,1,0],["had not finished",0,1,0]],[[17,20]],""],["論文",12,[["papers",1000,1,0],["paper",0,1,0],["thesis",0,1,0],["dissertation",0,1,0]],[[15,17]],""],[",",13,[[",",1000,0,0],["and",0,0,0]],[[20,21]],""]],,,[["zh-CN"]],209]。


This is caused by the client parameter. Set client to any value other than "t" and the server returns a JSON string instead, which is easy to parse:

       {"sentences":[{"trans":"Graduation project next week to reply, the system did not do a good job, I'm working papers,","orig":"下週畢業設計答辯,系統沒做好,論文沒寫完,呵呵","translit":"","src_translit":"Xià zhōu bìyè shèjì dábiàn, xìtǒng méi zuò hǎo, lùnwén méi xiě wán, hēhē"}],"src":"zh-CN","server_time":343} 。
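One concrete reason the first response is awkward to parse is that it contains empty array slots (two commas in a row), which strict JSON does not allow. The sketch below shows one way a program could tell the two shapes apart before handing the text to a JSON parser. `ResponseShape` and `hasEmptyArraySlots` are hypothetical names, and the sample strings are shortened, illustrative stand-ins for the two responses above.

```java
// Hypothetical helper: reports whether a response body contains empty array
// slots such as [1,,2], which a strict JSON parser rejects.
class ResponseShape {

    static boolean hasEmptyArraySlots(String body) {
        boolean inString = false;
        char previous = 0; // last significant character seen outside string literals
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                    previous = '"';
                }
                continue;
            }
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (c == '"') {
                inString = true;
                continue;
            }
            // A comma directly after another comma or after '[' marks an empty slot.
            if (c == ',' && (previous == ',' || previous == '[')) {
                return true;
            }
            previous = c;
        }
        return false;
    }

    public static void main(String[] args) {
        // Shortened samples shaped like the client=t and non-"t" responses.
        String irregular = "[[[\"Cups\",\"x\"]],,\"zh-CN\",,[[\"zh-CN\"]],209]";
        String regular = "{\"sentences\":[{\"trans\":\"Cups\"}],\"src\":\"zh-CN\",\"server_time\":343}";
        System.out.println(hasEmptyArraySlots(irregular)); // prints true
        System.out.println(hasEmptyArraySlots(regular));   // prints false
    }
}
```

Commas inside string literals are ignored, so translated text that happens to contain ",," does not trigger a false positive.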


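There are many libraries for parsing JSON strings; I use Jackson. A close look at the returned JSON shows that two Java entity classes are needed as the targets of the conversion.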

public class Trans
{
	private String trans;
	private String orig;
	private String translit;
	private String src_translit;
	// getters and setters omitted
}

public class TransWrapper
{
	private Trans[] sentences;
	private String src;
	private String server_time;

	// getters and setters omitted
}


Now for the main class:

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Jackson 2.x package; with Jackson 1.x the class is org.codehaus.jackson.map.ObjectMapper.
import com.fasterxml.jackson.databind.ObjectMapper;

public class Translator
{
	//http://translate.google.com/translate_a/t?client=t&hl=en&sl=zh-CN&tl=en&ie=UTF-8&oe=UTF-8&multires=1&ssel=0&tsel=0&sc=1&q=%E6%80%9D%E6%83%B3%E5%BE%88%E6%B7%B7%E4%B9%B1
	
	//http://translate.google.com/#zh-CN/en/
	public void translate(String url, String term)
	{
		try
		{
			System.out.println(url + term);
			URL target = new URL(url + term);
			// Important: Google blocks programmatic requests to the translation service,
			// so we set a "User-Agent" header to imitate a browser.
			HttpURLConnection connection = (HttpURLConnection)target.openConnection();
			
			connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0");
			// A plain GET sends no request body, so setDoOutput(true) is not needed.
			connection.setDoInput(true);
		
			InputStream is = connection.getInputStream();
			BufferedInputStream bis = new BufferedInputStream(is);
			// Collect the raw bytes first and decode them once, so that a multi-byte
			// character split across two reads is not corrupted.
			// Assumption: the response is UTF-8 encoded.
			ByteArrayOutputStream buffer = new ByteArrayOutputStream();
			byte[] read = new byte[4096];
			int length;
			while(-1 != (length = bis.read(read, 0, read.length)))
				buffer.write(read, 0, length);
			bis.close();
			String json = buffer.toString("UTF-8");
			
			System.out.println(json);
			
			ObjectMapper om = new ObjectMapper();
			TransWrapper transWrapper = om.readValue(json, TransWrapper.class);
			// For long input Google translates segment by segment,
			// so the result here is an array (note: not a List).
			Trans[] trans = transWrapper.getSentences();
			StringBuilder translation = new StringBuilder();
			
			for(Trans tran : trans)
			{	
				translation.append(tran.getTrans());
			}
			System.out.println(translation);
			
		}
		catch (Exception e)
		{
			e.printStackTrace();
		} 
	}
	
	/*
	 * {"sentences":
	 * [
	 * 
	 * {"trans":"The backcourt Spear is restricted, the Warriors also be able to come forward, the state of the Spurs also fell, Section II to narrow the gap. ",
	 * "orig":"後場雙槍受到限制,勇士還有人能挺身而出,馬刺的狀態也下滑,第二節雙方差距縮小。",
	 * "translit":"",
	 * "src_translit":"Hòu chǎng shuāng qiāng shòudào xiànzhì, yǒngshì hái yǒurén néng tǐngshēn ér chū, mǎcì de zhuàngtài yě xiàhuá, dì èr jié shuāngfāng chājù suōxiǎo."},
	 * 
	 * 
	 * {"trans":"More than half of this section, Barnes layup, the Warriors only 44-47 behind. ",
	 * "orig":"本節過半時,巴恩斯上籃得手,勇士只以44-47落後。",
	 * "translit":"",
	 * "src_translit":"Běn jié guòbàn shí, ba ēn sī shàng lán déshǒu, yǒngshì zhǐ yǐ 44-47 luòhòu."},
	 * 
	 * 
	 * {"trans":"Despite nearly four minutes later only to hit a ball, the Warriors are still in hot pursuit. ",
	 * "orig":"儘管此後近4分鐘只投中一球,勇士仍緊追不捨。",
	 * "translit":"",
	 * "src_translit":"Jǐnguǎn cǐhòu jìn 4 fēnzhōng zhǐ tóu zhòng yī qiú, yǒngshì réng jǐn zhuī bù shě."},
	 * 
	 * 
	 * {"trans":"This section there are 1 minute 45 seconds, Curry finally hit the third, the Warriors 51-52 only 1 point behind. ",
	 * "orig":"本節還有1分45秒時,庫裏終於命中三分,勇士只以51-52落後1分。",
	 * "translit":"",
	 * "src_translit":"Běn jié hái yǒu 1 fēn 45 miǎo shí, kù lǐ zhōngyú mìngzhòng sān fēn, yǒngshì zhǐ yǐ 51-52 luòhòu 1 fēn."},
	 * 
	 * 
	 * 
	 * {"trans":"The end of the half, the Warriors 51-54 at a disadvantage.",
	 * "orig":"半場結束時,勇士以51-54處於劣勢。",
	 * "translit":"",
	 * "src_translit":"Bàn chǎng jiéshù shí, yǒngshì yǐ 51-54 chǔyú lièshì."}
	 * 
	 * ],
	 * 
	 * 
	 * "src":"zh-CN",
	 * "server_time":1
	 * 
	 * }
	 */
	
	public static void main(String[] args) throws UnsupportedEncodingException
	{
	//	String url = "http://translate.google.com/translate_a/t?client=t&hl=en&sl=zh-CN&tl=en&ie=UTF-8&oe=UTF-8&multires=1&ssel=0&tsel=0&sc=1&q=";
		String url = "http://translate.google.com/translate_a/t?client=p&hl=en&sl=zh-CN&tl=en&q=";
		//String term = URLEncoder.encode("後場雙槍受到限制,勇士還有人能挺身而出,馬刺的狀態也下滑,第二節雙方差距縮小。本節過半時,巴恩斯上籃得手,勇士只以44-47落後。儘管此後近4分鐘只投中一球,勇士仍緊追不捨。本節還有1分45秒時,庫裏終於命中三分,勇士只以51-52落後1分。半場結束時,勇士以51-54處於劣勢。", "utf-8");
		String term = URLEncoder.encode("下週畢業設計答辯,系統沒做好,論文沒寫完,呵呵", "utf-8");
		Translator translator = new Translator();
		translator.translate(url, term);
	}
}
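Two details of the request are worth isolating: the browser-like User-Agent header, and the step that turns the response bytes into text. The sketch below is not the article's code. The names `HttpFetch`, `prepare` and `readFully` are hypothetical, the timeouts are illustrative values, and it assumes the response is UTF-8 encoded. It collects all bytes before decoding, because decoding each chunk on its own can corrupt a multi-byte character that straddles two reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class illustrating request setup and response decoding.
class HttpFetch {

    // Configures (but does not yet send) a GET request with the given User-Agent.
    static HttpURLConnection prepare(String url, String userAgent) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("User-Agent", userAgent);
        connection.setConnectTimeout(5000); // illustrative value, in milliseconds
        connection.setReadTimeout(5000);    // illustrative value, in milliseconds
        return connection;
    }

    // Reads the whole stream, then decodes it once (assumption: UTF-8).
    static String readFully(InputStream in, int chunkSize) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[chunkSize];
        int length;
        while ((length = in.read(chunk, 0, chunk.length)) != -1) {
            collected.write(chunk, 0, length);
        }
        return collected.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection = prepare(
                "http://translate.google.com/translate_a/t?client=p&hl=en&sl=zh-CN&tl=en&q=test",
                "Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0");
        System.out.println(connection.getRequestMethod() + " " + connection.getURL());

        // Offline check: two 3-byte UTF-8 characters read in 2-byte chunks
        // still decode correctly because decoding happens only once.
        String text = "\u6C34\u676F";
        InputStream in = new ByteArrayInputStream(text.getBytes("UTF-8"));
        System.out.println(text.equals(readFully(in, 2)));
    }
}
```

In a real call, `readFully(connection.getInputStream(), 4096)` would perform the request and return the body as one string, ready to pass to the JSON parser.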

Run the program. After echoing the request URL and the raw JSON response, the console prints the translation:

Graduation project next week to reply, the system did not do a good job, I'm working papers,



See You, Readers.....

