Decompressing gzip pages after a raw-socket HTTP request: implementation code

Original

dieindark

2020-02-25 10:00

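Keep the following points in mind:

1. Do not convert the bytes received from the socket to a string before decompressing them. The conversion corrupts the compressed data, and decompression then fails with an invalid gzip magic number error.

Work on the raw bytes through a stream instead.

2. Parse the response correctly and split it into the header block and the page body.

3. Decompress only the page body.

4. Recombine the header with the decompressed page body.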
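To illustrate point 1, here is a minimal sketch of what happens to the gzip magic number (the bytes 0x1F 0x8B) when the data takes a detour through a string. The class and method names are hypothetical, and UTF-8 is used only because it is always available; the article's own code uses gb2312, where bytes that do not form valid characters are also replaced rather than preserved.

```csharp
using System;
using System.Text;

static class MagicNumberDemo
{
    // Returns true if the buffer starts with the gzip magic number 0x1F 0x8B.
    public static bool HasGzipMagic(byte[] data)
    {
        return data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Decodes the bytes to a string and encodes them back,
    // which is what a string-based receive loop effectively does.
    public static byte[] RoundTripThroughString(byte[] data, Encoding encoding)
    {
        return encoding.GetBytes(encoding.GetString(data));
    }

    static void Main()
    {
        // First bytes of a typical gzip stream: magic number, then the deflate method id.
        byte[] original = { 0x1F, 0x8B, 0x08, 0x00 };
        byte[] mangled = RoundTripThroughString(original, Encoding.UTF8);

        Console.WriteLine(HasGzipMagic(original)); // the untouched bytes still start with 0x1F 0x8B
        Console.WriteLine(HasGzipMagic(mangled));  // 0x8B is not valid UTF-8 on its own and gets replaced
    }
}
```

The second check fails because the decoder substitutes a replacement character for the byte it cannot interpret, so the original byte never comes back.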

The GZipStream class introduced in .NET 2.0 makes the decompression itself straightforward.

The code is as follows:

using System;
using System.Text;
using System.IO.Compression;
using System.IO;

namespace 貼吧江湖
{
    public class clsGzip
    {
        // Decompresses gzip data starting at the stream's current position
        // and returns it as text.
        public static string DecompressGzip(MemoryStream stm)
        {
            string strHTML = "";

            // Decompress, then decode the result as gb2312 (Chinese page encoding).
            using (GZipStream gzip = new GZipStream(stm, CompressionMode.Decompress))
            using (StreamReader reader = new StreamReader(gzip, Encoding.GetEncoding("gb2312")))
            {
                strHTML = reader.ReadToEnd();
            }

            return strHTML;
        }
    }
}
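As a quick way to check a decompression helper like the one above without a network connection, the sketch below compresses a string in memory and decompresses it again. The names GzipRoundTrip, Compress and Decompress are hypothetical, and the encoding is passed in as a parameter (UTF-8 in the demo) rather than fixed to gb2312, which on newer .NET runtimes may need an extra code-page provider to be registered.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipRoundTrip
{
    // Compresses text into a gzip byte array; stands in for a server's compressed body.
    public static byte[] Compress(string text, Encoding encoding)
    {
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = encoding.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the remaining compressed data

            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Decompresses from the stream's current position and decodes the result as text.
    public static string Decompress(Stream source, Encoding encoding)
    {
        using (GZipStream gzip = new GZipStream(source, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gzip, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        byte[] compressed = Compress("<html>test page</html>", Encoding.UTF8);
        string restored = Decompress(new MemoryStream(compressed), Encoding.UTF8);
        Console.WriteLine(restored);
    }
}
```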

Calling this method from the socket class:

            // s is the connected Socket and bytesReceived is the receive buffer;
            // both are set up elsewhere in the socket class.
            int bytes = 0;
            string page = "";
            MemoryStream ms = new MemoryStream();

            do
            {
                bytes = s.Receive(bytesReceived, bytesReceived.Length, 0);

                // Do NOT convert the received bytes to a string here, e.g.
                // page = page + gb2312.GetString(bytesReceived, 0, bytes);
                // Write the raw bytes to the stream instead.
                ms.Write(bytesReceived, 0, bytes);
            }
            while (bytes > 0);
            s.Close();
            Console.WriteLine(ms.Length);
            ms.Seek(0, SeekOrigin.Begin); // move the stream position back to the start
            Encoding gb2312 = Encoding.GetEncoding("gb2312"); // used to read the page size from the HTTP header

            page = new StreamReader(ms, gb2312).ReadToEnd(); // read the stream into a string, only to split off the header
            // Split the server response into the header block and the page body.
            string[] sArray = page.Split(new string[] { "\r\n\r\n" }, StringSplitOptions.RemoveEmptyEntries);
            page = page.Substring(page.IndexOf("Content-Length: ") + 16); // cut out the Content-Length value
            page = page.Substring(0, page.IndexOf("\r\n"));
            //Console.WriteLine(page); // size of the gzip-compressed page body
            // Stream length minus body size is the position where the body starts in the stream.
            long begin = ms.Length - Convert.ToInt64(page);
            ms.Seek(begin, SeekOrigin.Begin); // move to that position
            page = clsGzip.DecompressGzip(ms); // pass the stream to the decompression method
            page = sArray[0] + "\r\n\r\n" + page; // recombine the header with the decompressed page body
            return page;
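The code above locates the body through the Content-Length header. As an alternative sketch (not the author's method), the header block can also be found by scanning the raw bytes for the blank line that ends it, so that only the header is ever decoded as text. HttpResponseSplitter and its methods are hypothetical names, and the gzip check is a simple substring match that assumes the usual "Content-Encoding: gzip" spelling with a single space.

```csharp
using System;
using System.Text;

static class HttpResponseSplitter
{
    // Returns the index of the first body byte, i.e. the byte right after the
    // CR LF CR LF sequence that ends the header block, or -1 if it is not found.
    public static int FindBodyStart(byte[] response, int length)
    {
        for (int i = 0; i + 3 < length; i++)
        {
            if (response[i] == 13 && response[i + 1] == 10 &&
                response[i + 2] == 13 && response[i + 3] == 10)
            {
                return i + 4;
            }
        }
        return -1;
    }

    // Decodes only the header block (including the terminating blank line) as ASCII text.
    public static string ReadHeaders(byte[] response, int bodyStart)
    {
        return Encoding.ASCII.GetString(response, 0, bodyStart);
    }

    // Simplified check: true if the header text contains "Content-Encoding: gzip",
    // ignoring case. Other spacing variants are not handled in this sketch.
    public static bool IsGzip(string headers)
    {
        return headers.IndexOf("Content-Encoding: gzip", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

With the body offset known, the compressed part can be wrapped without copying, for example `new MemoryStream(buffer, bodyStart, length - bodyStart)`, and handed to the decompression method. Neither this sketch nor the Content-Length approach handles responses sent with chunked transfer encoding, where the body carries extra chunk framing.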

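When sending the page request, remember to include the header Accept-Encoding: gzip, deflate\r\n

It is finally solved. I hope this saves anyone who runs into the same problem a few detours.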
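A minimal sketch of building such a request as bytes for Socket.Send could look like this. RequestBuilder and BuildGetRequest are hypothetical names. Two choices here are assumptions rather than something the article states: HTTP/1.0 with Connection: close is used so that the server closes the connection when it is done, which is what the receive loop relies on, and only gzip is advertised because the decompression code shown here handles only gzip.

```csharp
using System;
using System.Text;

static class RequestBuilder
{
    // Builds a GET request that asks the server for a gzip-compressed response.
    public static byte[] BuildGetRequest(string host, string path)
    {
        string request =
            "GET " + path + " HTTP/1.0\r\n" +
            "Host: " + host + "\r\n" +
            "Accept-Encoding: gzip\r\n" +  // tell the server gzip is acceptable
            "Connection: close\r\n" +      // ask the server to close the connection afterwards
            "\r\n";                        // blank line ends the header block

        // Request lines and header fields are plain ASCII.
        return Encoding.ASCII.GetBytes(request);
    }

    static void Main()
    {
        byte[] bytes = BuildGetRequest("example.com", "/");
        Console.Write(Encoding.ASCII.GetString(bytes));
    }
}
```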
