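Android News Reader (Data Scraping)

This is my first technical blog post, so please bear with any rough edges. Thanks (^_^)
Some of my junior classmates have recently been learning Android, so I wrote this post to share with everyone.

The post is split into two parts.

Part 1 builds a news list screen (a ListView).
Part 2 scrapes the news data (using regular expressions).

Technologies involved: Java regular expressions and Java network programming (I/O streams).
IDE: Android Studio

The structure of the whole demo project is shown below.
[Image: project structure]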

1. Part 1: Building a news list screen

The code for MainActivity.java is as follows.

package per.edward.androidnewsreader;

import android.app.Activity;
import android.os.Bundle;
import android.view.View;
import android.widget.AdapterView;
import android.widget.ListView;
import android.widget.Toast;

import java.util.ArrayList;
import java.util.List;

import per.edward.androidnewsreader.adapter.NewsAdapter;
import per.edward.androidnewsreader.bean.NewsItemModel;


public class MainActivity extends Activity {
    private ListView mListView;
    // NewsAdapter expects a list of NewsItemModel, so the placeholder data uses it too
    private List<NewsItemModel> list;
    private NewsAdapter adapter;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        // Initialize the views
        initView();
        // Initialize the data
        initData();
    }

    public void initView() {
        list = new ArrayList<NewsItemModel>();
        mListView = (ListView) findViewById(R.id.list_view);
    }

    public void initData() {
        // Fill the list with placeholder items until real data is fetched in Part 2
        for (int i = 0; i < 15; i++) {
            NewsItemModel model = new NewsItemModel();
            model.setNewsTitle("Title " + i);
            model.setNewsSummary("Summary " + i);
            list.add(model);
        }

        // Adapter for the news list
        adapter = new NewsAdapter(this, list, R.layout.adapter_news_item);
        mListView.setAdapter(adapter);
        // Register the click listener
        mListView.setOnItemClickListener(new ItemClickListener());
    }

    /**
     * Click listener for the news list
     */
    public class ItemClickListener implements AdapterView.OnItemClickListener{
        @Override
        public void onItemClick(AdapterView<?> adapterView, View view, int i, long l) {
            Toast.makeText(getApplicationContext(),""+i,Toast.LENGTH_SHORT ).show();
        }
    }
}

The activity_main.xml file is as follows.

<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <ListView
        android:id="@+id/list_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent"/>

</RelativeLayout>

The adapter_news_item.xml file is as follows.

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="wrap_content"
    android:padding="10dp">

    <ImageView
        android:id="@+id/image_view"
        android:layout_width="80dp"
        android:layout_height="wrap_content"
        android:scaleType="centerCrop"
        android:layout_centerVertical="true"
        android:src="@mipmap/ic_launcher" />

    <LinearLayout
        android:layout_marginLeft="10dp"
        android:id="@+id/line"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_toRightOf="@+id/image_view"
        android:orientation="vertical">

        <TextView
            android:id="@+id/txt_title"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Edward"
            android:textSize="16sp" />

        <TextView
            android:layout_marginTop="5dp"
            android:id="@+id/txt_summary"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="嶺南學院"
            android:textSize="12sp" />
    </LinearLayout>

</RelativeLayout>

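The resulting screen is shown below. It is very simple.

[Image: the news list screen]

That completes the news list screen. Next, let's look at how to scrape data from a news website.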

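2. Part 2: Analyzing and scraping the data

Target URL: http://news.qq.com/china_index.shtml
Let's look at the content of this page: each entry has an image on the left and a news title and summary on the right.
The goal is clear: pull all of this data down and display it in the screen built in Part 1.

[Image: the news page]

Looking at the page source, I outlined the source of three news items in red in the screenshot. One of them looks like this: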

<a target="_blank" class="pic" href="/a/20150909/036168.htm"><img class="picto" src="http://img1.gtimg.com/news/pics/hv1/51/43/1920/124859016.jpg"></a><em class="f14 l24"><a target="_blank" class="linkto" href="/a/20150909/036168.htm">英航客機美國拉斯維加斯起火 14人輕傷送醫治療</a></em><p class="l22">美國聯邦航空管理局發佈聲明說,飛機左引擎起火,機組中斷起飛,指揮乘客緊急疏散。</p>

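We can see that, apart from the image URL, the title, the summary, and the detail-page URL, the surrounding tags are identical for every news item. Based on this rule, a regular expression is enough to pull out the data we want.
The core regular expression code is as follows.

Pattern pattern = Pattern
                .compile("<a target=\"_blank\" class=\"pic\" href=\"([^\"]*)\"><img class=\"picto\" src=\"([^\"]*)\"></a><em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"[^\"]*\">([^<]*)</a></em><p class=\"l22\">([^<]*)</p>");

The string passed to compile is almost identical to the source of a single news item; only the varying parts are replaced by capture groups. ([^\"]*) matches any run of characters up to the next double quote, which captures an attribute value. ([^<]*) matches any run of characters up to the next <, which captures the text in front of a closing tag. Note that a character class such as [^</a>] does not mean "everything up to </a>": it excludes the individual characters <, /, a and >, so a title containing a Latin letter a or a slash would fail to match. This simple pattern assumes that titles and summaries contain no nested tags.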
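To make the character-class point concrete, here is a small stand-alone sketch in plain Java (no Android). The class name CharClassDemo, the helper firstGroup, and the shortened sample HTML are made up for illustration; only java.util.regex is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // Returns capture group 1 of the first match, or null if nothing matches.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened news link with an ASCII title
        String html = "<a class=\"linkto\" href=\"/x.htm\">Java news</a>";

        // [^</a>] rejects the single characters '<', '/', 'a' and '>',
        // so the 'a' in "Java" stops the group and the literal </a> cannot follow.
        String broken = firstGroup("href=\"[^\"]*\">([^</a>]*)</a>", html);

        // [^<]* reads up to the next '<', which is where </a> begins.
        String fixed = firstGroup("href=\"[^\"]*\">([^<]*)</a>", html);

        System.out.println("broken = " + broken);
        System.out.println("fixed  = " + fixed);
    }
}
```

A title written only in Chinese characters happens to match with either class, which is why the problem is easy to miss on this particular page.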


The code for Function.java

package per.edward.androidnewsreader.function;

import android.util.Log;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import per.edward.androidnewsreader.bean.NewsItemModel;

/**
 * description: parses the news data
 * <p/>
 * author:Edward
 * <p/>
 * 2015/9/9
 */
public class Function {

    public static List<NewsItemModel> parseHtmlData(String result) {
        List<NewsItemModel> list = new ArrayList<>();

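        // Groups: 1 = detail-page URL, 2 = image URL, 3 = title, 4 = summary.
        // [^<]* reads up to the next '<'; the class [^</a>] would wrongly exclude the letter 'a'.
        Pattern pattern = Pattern
                .compile("<a target=\"_blank\" class=\"pic\" href=\"([^\"]*)\"><img class=\"picto\" src=\"([^\"]*)\"></a><em class=\"f14 l24\"><a target=\"_blank\" class=\"linkto\" href=\"[^\"]*\">([^<]*)</a></em><p class=\"l22\">([^<]*)</p>");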
        Matcher matcher = pattern.matcher(result);

        // Collects a readable dump of every match for debugging
        StringBuilder sb = new StringBuilder();
        while (matcher.find()) {
            NewsItemModel model = new NewsItemModel();
            model.setNewsDetailUrl(matcher.group(1).trim());
            model.setUrlImgAddress(matcher.group(2).trim());
            model.setNewsTitle(matcher.group(3).trim());
            model.setNewsSummary(matcher.group(4).trim());

            sb.append("Detail URL: " + matcher.group(1).trim() + "\n");
            sb.append("Image URL: " + matcher.group(2).trim() + "\n");
            sb.append("Title: " + matcher.group(3).trim() + "\n");
            sb.append("Summary: " + matcher.group(4).trim() + "\n\n");

            list.add(model);
        }

        Log.d("----------------->", sb.toString());

        return list;
    }

}
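
Note that the detail-page address captured from the sample item is relative (/a/20150909/036168.htm), so it has to be resolved against the page URL before it can be opened. A minimal sketch of one way to do that with java.net.URI follows; the class DetailUrlDemo and the method toAbsolute are hypothetical names and are not part of the demo project.

```java
import java.net.URI;

public class DetailUrlDemo {
    // Resolves a possibly relative href against the URL of the page it came from.
    // An href that is already absolute is returned unchanged.
    public static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://news.qq.com/china_index.shtml";
        // Relative detail-page link taken from the sample news item
        System.out.println(toAbsolute(page, "/a/20150909/036168.htm"));
        // Absolute image link taken from the same item
        System.out.println(toAbsolute(page, "http://img1.gtimg.com/news/pics/hv1/51/43/1920/124859016.jpg"));
    }
}
```

One possible place to call such a helper would be right before setNewsDetailUrl in parseHtmlData, though that is a design choice rather than something the original code does.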

NewsItemModel.java

package per.edward.androidnewsreader.bean;

import android.graphics.Bitmap;

/**
 * description: news model
 * <p/>
 * author:Edward
 * <p/>
 * 2015/9/9
 */
public class NewsItemModel {
    // Holds the image once it has been downloaded
    private Bitmap newsBitmap;
    // URL of the news detail page
    private String newsDetailUrl;
    // URL of the news image
    private String urlImgAddress;
    // News title
    private String newsTitle;
    // News summary
    private String newsSummary;

    public Bitmap getNewsBitmap() {
        return newsBitmap;
    }

    public void setNewsBitmap(Bitmap newsBitmap) {
        this.newsBitmap = newsBitmap;
    }

    public String getUrlImgAddress() {
        return urlImgAddress;
    }

    public void setUrlImgAddress(String urlImgAddress) {
        this.urlImgAddress = urlImgAddress;
    }

    public String getNewsDetailUrl() {
        return newsDetailUrl;
    }

    public void setNewsDetailUrl(String newsDetailUrl) {
        this.newsDetailUrl = newsDetailUrl;
    }


    public String getNewsTitle() {
        return newsTitle;
    }

    public void setNewsTitle(String newsTitle) {
        this.newsTitle = newsTitle;
    }

    public String getNewsSummary() {
        return newsSummary;
    }

    public void setNewsSummary(String newsSummary) {
        this.newsSummary = newsSummary;
    }
}

The code for CommonTool.java

package per.edward.androidnewsreader.tool;

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class CommonTool {
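    /**
     * Sends a GET request and returns the response body as a string.
     *
     * @param urlString  the address to fetch
     * @param codingType the charset used to decode the response, e.g. "gbk"
     * @return the decoded response body, or null if the request failed
     */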
    public static String getRequest(String urlString, String codingType) {
        BufferedInputStream bis = null;
        ByteArrayOutputStream bos = null;
        InputStream is = null;
        try {
            URL url = new URL(urlString);

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // Accept any content type (the page fetched here is HTML, not JSON)
            conn.setRequestProperty("Accept", "*/*");

            conn.connect();

            int responseCode = conn.getResponseCode();

            if (responseCode == 200) {
                is = conn.getInputStream();

                bis = new BufferedInputStream(is);
                bos = new ByteArrayOutputStream();

                int length = 0;
                byte[] by = new byte[1024];
                while ((length = bis.read(by)) != -1) {
                    bos.write(by, 0, length);
                }
                bos.flush();

                String result = new String(bos.toByteArray(), codingType);

                // System.out.println(result);
                return result;

            }

        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (bos != null) {
                    bos.close();
                }

                if (bis != null) {
                    bis.close();
                }

                if (is != null) {
                    is.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
                System.out.println("Failed to close the streams!");
            }

        }
        return null;
    }

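    /**
     * Opens an input stream for downloading an image over the network.
     *
     * @param urlString the address of the image
     * @return the response stream, or null if the request failed
     */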
    public static InputStream getImgInputStream(String urlString) {
        try {
            URL url = new URL(urlString);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");   // Use the GET method
            connection.setReadTimeout(10 * 1000);    // Read timeout of 10 seconds
            connection.connect();
            if (connection.getResponseCode() == 200) {
                return connection.getInputStream();
            } else {
                return null;
            }
        } catch (Exception e) {
            return null;
        }
    }


}
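
The reason getRequest takes a codingType is that the bytes read from the stream only become readable text when decoded with the charset the server used (the call in this post passes "gbk"). The following plain-Java sketch illustrates the effect with an in-memory stream instead of a network connection. The class StreamDecodeDemo and the method readAll are hypothetical, and UTF-8 and ISO-8859-1 are used only because every Java runtime is guaranteed to support them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamDecodeDemo {
    // Reads the whole stream into memory and decodes it with the given charset.
    // try-with-resources closes the stream even if reading fails.
    public static String readAll(InputStream in, String charsetName) throws IOException {
        try (InputStream source = in;
             ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[1024];
            int length;
            while ((length = source.read(chunk)) != -1) {
                buffer.write(chunk, 0, length);
            }
            return buffer.toString(charsetName);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two Chinese characters, encoded as UTF-8 bytes
        byte[] bytes = "\u65b0\u805e".getBytes("UTF-8");

        String right = readAll(new ByteArrayInputStream(bytes), "UTF-8");
        String wrong = readAll(new ByteArrayInputStream(bytes), "ISO-8859-1");

        // Decoding with the wrong charset yields a different, garbled string
        System.out.println(right.equals("\u65b0\u805e"));
        System.out.println(wrong.equals(right));
    }
}
```

If the fetched page shows up as garbled text in the log, a mismatch between the page's actual charset and the codingType argument is the first thing to check.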

The NewsAdapter.java file

package per.edward.androidnewsreader.adapter;

import android.content.Context;
import android.view.LayoutInflater;
import android.view.View;
import android.view.ViewGroup;
import android.widget.BaseAdapter;
import android.widget.ImageView;
import android.widget.TextView;

import java.util.List;

import per.edward.androidnewsreader.R;
import per.edward.androidnewsreader.bean.NewsItemModel;

/**
 * description: adapter for the news list
 * <p/>
 * author:Edward
 * <p/>
 * 2015/9/9
 */
public class NewsAdapter extends BaseAdapter {
    private Context mContext;
    private List<NewsItemModel> list;
    private int layoutId;
    private ViewHolder viewHolder = null;

    public NewsAdapter(Context mContext, List<NewsItemModel> list, int layoutId) {
        this.mContext = mContext;
        this.list = list;
        this.layoutId = layoutId;
    }

    @Override
    public int getCount() {
        return list.size();
    }

    @Override
    public Object getItem(int i) {
        return list.get(i);
    }

    @Override
    public long getItemId(int i) {
        return i;
    }

    @Override
    public View getView(final int position, View view, ViewGroup viewGroup) {
        if (view == null) {
            viewHolder = new ViewHolder();
            view = LayoutInflater.from(mContext).inflate(layoutId, null);

            viewHolder.imageView = (ImageView) view.findViewById(R.id.image_view);
            viewHolder.txtTitle = (TextView) view.findViewById(R.id.txt_title);
            viewHolder.txtSummary = (TextView) view.findViewById(R.id.txt_summary);

            view.setTag(viewHolder);
        } else {
            viewHolder = (ViewHolder) view.getTag();
        }

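        if (list.get(position).getNewsBitmap() != null) {
            // Recycled rows may have been hidden earlier, so make the image visible again
            viewHolder.imageView.setVisibility(View.VISIBLE);
            viewHolder.imageView.setImageBitmap(list.get(position).getNewsBitmap());
        } else {
            // If there is no image, hide the ImageView
            viewHolder.imageView.setVisibility(View.GONE);
        }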
        viewHolder.txtTitle.setText(list.get(position).getNewsTitle());
        viewHolder.txtSummary.setText(list.get(position).getNewsSummary());

        return view;
    }

    public class ViewHolder {
        ImageView imageView;
        TextView txtTitle, txtSummary;
    }

}

Finally, because the app performs network operations, don't forget to declare the network permission in AndroidManifest.xml.

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="per.edward.androidnewsreader">
    <!-- Grant network access -->
    <uses-permission android:name="android.permission.INTERNET" />
    <!-- Minimum and maximum SDK versions the app can be installed on -->
    <uses-sdk
        android:maxSdkVersion="22"
        android:minSdkVersion="9" />


    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:theme="@style/AppTheme">
        <activity
            android:name=".MainActivity"
            android:label="@string/app_name">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>

</manifest>

Last, update the MainActivity.java code shown in Part 1.

package per.edward.androidnewsreader;

import android.app.Activity;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.os.Handler;
import android.os.Message;
import android.util.Log;
import android.view.View;
import android.widget.AdapterView;
import android.widget.ListView;
import android.widget.Toast;

import java.util.ArrayList;
import java.util.List;

import per.edward.androidnewsreader.adapter.NewsAdapter;
import per.edward.androidnewsreader.bean.NewsItemModel;
import per.edward.androidnewsreader.function.Function;
import per.edward.androidnewsreader.tool.CommonTool;


public class MainActivity extends Activity {
    private ListView mListView;
    private List<NewsItemModel> list;
    private NewsAdapter adapter;
    // Message code: data fetched successfully
    private final static int GET_DATA_SUCCEED = 1;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        // Initialize the views
        initView();
        // Initialize the data
        initData();
    }

    public void initView() {
        list = new ArrayList<NewsItemModel>();
        mListView = (ListView) findViewById(R.id.list_view);
    }


    public void initData() {
        // Run the slow network work on a background thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                // Fetch the page
                String result = CommonTool.getRequest("http://news.qq.com/china_index.shtml", "gbk");
                if (result == null) {
                    // getRequest returns null on failure; neither Log nor the parser accepts null
                    Log.e("MainActivity", "Failed to fetch the news page");
                    return;
                }
                Log.d("MainActivity", result);
                // Parse the news data
                List<NewsItemModel> newsList = Function.parseHtmlData(result);

                for (NewsItemModel model : newsList) {
                    // Download the news image (decodeStream returns null if the download failed)
                    Bitmap bitmap = BitmapFactory.decodeStream(CommonTool.getImgInputStream(model.getUrlImgAddress()));

                    model.setNewsBitmap(bitmap);
                }
                // Hand the result over to the UI thread
                mHandler.sendMessage(mHandler.obtainMessage(GET_DATA_SUCCEED, newsList));
            }
        }).start();
    }


    public Handler mHandler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case GET_DATA_SUCCEED:
                    List<NewsItemModel> list = (List<NewsItemModel>) msg.obj;
                    // Adapter for the news list
                    adapter = new NewsAdapter(MainActivity.this, list, R.layout.adapter_news_item);
                    mListView.setAdapter(adapter);
                    // Register the click listener
                    mListView.setOnItemClickListener(new ItemClickListener());
                    Toast.makeText(getApplicationContext(), String.valueOf(list.size()), Toast.LENGTH_LONG).show();
                    break;
            }
        }
    };

    /**
     * Click listener for the news list
     */
    public class ItemClickListener implements AdapterView.OnItemClickListener {
        @Override
        public void onItemClick(AdapterView<?> adapterView, View view, int i, long l) {
            NewsItemModel temp =(NewsItemModel) adapter.getItem(i);
            Toast.makeText(getApplicationContext(), temp.getNewsTitle(), Toast.LENGTH_SHORT).show();
        }
    }
}

The final result of the demo
[Image: final result of the demo]

Click here for the program source code.
