Scraping data from web pages on Android



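1. Text is usually fetched as a string in one of the following three ways

(1) Fetching text with HttpURLConnection

Open the URL to obtain an HttpURLConnection object, httpURLConnection.

Check the response code to confirm that the request succeeded.

Wrap the byte stream returned by httpURLConnection.getInputStream() in an InputStreamReader, is, to turn it into a character stream.

Read the text with is.read().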

        /**
         * HttpURLConnection
         * Needs: java.net.URL, java.net.HttpURLConnection, java.io.InputStreamReader,
         * java.io.IOException, android.os.Message, android.os.Bundle, android.util.Log
         */
        new Thread(new Runnable() {
            @Override
            public void run() {
                HttpURLConnection httpURLConnection = null;
                try {
                    URL url = new URL("http://lol.qq.com/web201310/info-heros.shtml");
                    httpURLConnection = (HttpURLConnection) url.openConnection();
                    int code = httpURLConnection.getResponseCode();
                    if (code == HttpURLConnection.HTTP_OK) {
                        // Decode the bytes once, with an explicit charset.
                        // UTF-8 is an assumption; use the charset the page declares.
                        InputStreamReader is = new InputStreamReader(httpURLConnection.getInputStream(), "UTF-8");
                        StringBuilder sb = new StringBuilder();
                        int i;
                        while ((i = is.read()) != -1) {
                            sb.append((char) i);
                        }
                        is.close();
                        // Hand the text to the UI thread through the Handler.
                        Message msg = new Message();
                        Bundle bundle = new Bundle();
                        bundle.putString("stringUrl", sb.toString());
                        msg.setData(bundle);
                        msg.what = 0x123;
                        myHandler.sendMessage(msg);
                    } else {
                        Log.d("TAG", "httpUrlConnection : " + code);
                    }
                } catch (IOException e) {
                    // MalformedURLException is a subclass of IOException, so it lands here too.
                    e.printStackTrace();
                } finally {
                    if (httpURLConnection != null) {
                        httpURLConnection.disconnect();
                    }
                }
            }
        }).start();
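
The read loop turns bytes into characters, so the charset given to the reader decides whether non-ASCII text such as Chinese survives. The following is a minimal plain-Java sketch of that read step on its own, not the author's code: the class name StreamText and the method readAll are hypothetical, and UTF-8 is only an assumption about the page's encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original article.
final class StreamText {

    /** Reads the whole stream and decodes it with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        // try-with-resources closes the reader (and the wrapped stream) on exit.
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for httpURLConnection.getInputStream().
        byte[] body = "<li>hero list</li>".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

Reading into a char[] buffer avoids one read() call per character, and passing the charset explicitly removes the need to re-encode the string afterwards.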

[Screenshot of the result]


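  (2) Fetching data with HttpClient

Create an HttpClient object, client.

Build an HttpGet request object, get, from the URL.

Obtain a ResponseHandler (response handler) object that returns a String.

Call client.execute(get, responseHandler) to get the text as a string.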


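/**
 * HttpClient
 * Apache HttpClient (org.apache.http.*) was removed from the Android SDK in API 23;
 * newer SDKs need the org.apache.http.legacy library to compile this.
 */
new Thread(new Runnable() {
    @Override
    public void run() {
        try {
            HttpClient client = new DefaultHttpClient();
            HttpGet get = new HttpGet("http://lol.qq.com/web201310/info-heros.shtml");
            // BasicResponseHandler returns the response body as a String.
            ResponseHandler<String> responseHandler = new BasicResponseHandler();
            String content = client.execute(get, responseHandler);
            if (content == null || content.isEmpty()) {
                // A Toast cannot be shown from a worker thread without a Looper, so log instead.
                Log.d("TAG", "empty response");
            }
            // Hand the text to the UI thread through the Handler.
            Message msg = new Message();
            Bundle bundle = new Bundle();
            bundle.putString("stringUrl", content);
            msg.setData(bundle);
            msg.what = 0x123;
            myHandler.sendMessage(msg);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}).start();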

[Screenshot of the result]



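  (3) Fetching data with jsoup. The data is loaded asynchronously here. Besides fetching raw text, jsoup is often used to extract specific pieces of data. In the example below, the strings to extract are: All Heroes (所有英雄), Fighter (戰士), Mage (法師), Assassin (刺客), Tank (坦克), Marksman (射手), Support (輔助).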



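import android.os.AsyncTask;
import android.util.Log;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class LoadHtml extends AsyncTask<String,String,String> {
    Document doc;		// the parsed page
    String url;
    CallBack callBack;		// callback used to hand the result back

    private List<String> mListTitle = new ArrayList<>();
    public LoadHtml(CallBack callBack,String url) {
        this.url = url;
        this.callBack = callBack;
    }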


    @Override
    protected String doInBackground(String... params) {
        try {
            // Fetch and parse the page with a GET request; doc.toString() is the page's HTML text.
            doc = Jsoup.connect(url).timeout(5000).get();
            // The titles sit in the <ul> whose id is "seleteChecklist", so select by that id.
            Elements element = doc.select("#seleteChecklist");
            Elements elements = element.select("li");

            // select() never returns null; an empty result means nothing matched.
            if (elements.isEmpty()) {
                Log.d("TAG", "elements is empty");
            }
            for (Element links : elements) {
                String title = links.getElementsByTag("label").text();
                mListTitle.add(title);		// collect the titles (All Heroes, Fighter, ...)
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        Log.d("TAG", "onPostExecute");
        Log.d("TAG", "listSize : " + mListTitle.size());

        for (int i = 0; i < mListTitle.size(); i++) {
            String title = mListTitle.get(i);
            Log.d("TAG", "title : " + title);
        }
        if (mListTitle != null) {
            callBack.solve(mListTitle);	// once the list is available, invoke the callback
        }
    }
}


[Screenshot of the result]



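solve() is a method of the custom CallBack interface. A class that needs the data (class A) only has to implement that interface and override the method. The callBack passed to the constructor of LoadHtml (class B) is the instance of A, which implements CallBack. Once the data has been fetched, LoadHtml calls solve() on it, which delivers the data back to the class that needs it.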
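
The callback wiring can be sketched in plain Java without any Android classes. The original does not show the definition of CallBack, so the signature below is an assumption inferred from the call callBack.solve(mListTitle); FakeLoader and TitleScreen are hypothetical stand-ins for class B (LoadHtml) and class A, and the titles are placeholder values rather than scraped data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed shape of the interface; the original article does not show it.
interface CallBack {
    void solve(List<String> titles);
}

// Stand-in for LoadHtml (class B): produces data, then hands it to the callback.
final class FakeLoader {
    private final CallBack callBack;

    FakeLoader(CallBack callBack) {
        this.callBack = callBack;
    }

    void load() {
        // In the real task this list would come from the jsoup parsing step.
        List<String> titles = Arrays.asList("All Heroes", "Fighter", "Mage");
        callBack.solve(titles);
    }
}

// Stand-in for the class that needs the data (class A).
final class TitleScreen implements CallBack {
    final List<String> received = new ArrayList<>();

    @Override
    public void solve(List<String> titles) {
        // Called by the loader once the data is ready.
        received.addAll(titles);
    }

    public static void main(String[] args) {
        TitleScreen screen = new TitleScreen();
        new FakeLoader(screen).load();
        System.out.println(screen.received);
    }
}
```

The loader only depends on the interface, so any class that implements CallBack can receive the result without the loader knowing its concrete type.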























