How Crawlers Can Deal with Xiaohongshu's Risk Control

Original

2021-07-16 21:33

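News from July 16: according to the 21st Century Business Herald, Xiaohongshu will suspend its plan to go public in the United States. I don't care much about the listing itself; what matters to people who write crawlers is whether the data can still be fetched. Today I'd like to talk about how to crawl Xiaohongshu with a Python crawler. Xiaohongshu's risk control has become extremely strict lately, as some of you have probably already noticed.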

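Ordinary methods will not work for collecting data from Xiaohongshu. In my view, three basic strategies are required: 1. Random User-Agent (UA) strings, the more the better. 2. Cookies. 3. Proxy IPs. This one is very important: be sure to choose high-quality proxies, otherwise no number of proxies will help.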
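The three strategies above can be sketched with Python's standard library alone. This is a minimal sketch under stated assumptions, not a method confirmed to get past Xiaohongshu's risk control: the `USER_AGENTS` pool and the helper names `make_opener` and `make_request` are hypothetical, and the proxy URL passed in is a placeholder you would replace with your own provider's endpoint.

```python
import random
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical User-Agent pool; a real pool would be much larger.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.1.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
]


def make_opener(proxy_url):
    """Build an opener that routes requests through proxy_url and keeps cookies.

    Returns the opener together with its cookie jar so that cookies set by
    one response are sent back on later requests made with the same opener.
    """
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def make_request(url):
    """Build a GET request that carries a randomly chosen User-Agent."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)
```

Calling `opener.open(make_request(url), timeout=10)` would send the request; building a fresh request per call means each one picks a new User-Agent, while the shared opener keeps the same cookie jar and proxy.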

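Here we share a sample that fetches Xiaohongshu pages through a proxy so that the data can be analyzed afterwards. Note that the sample itself is written in PHP, not Python.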

<?php
    // Target pages to fetch (both variables point at the same HTTPS page in this sample)
    $url = "https://www.xiaohongshu.com/";
    $urls = "https://www.xiaohongshu.com/";

    // Proxy server (vendor site: www.16yun.cn)
    define("PROXY_SERVER", "tcp://t.16yun.cn:31111");

    // Proxy credentials
    define("PROXY_USER", "username");
    define("PROXY_PASS", "password");

    $proxyAuth = base64_encode(PROXY_USER . ":" . PROXY_PASS);

    // Pick a random Proxy-Tunnel value
    $tunnel = rand(1,10000);

    $headers = implode("\r\n", [
        "Proxy-Authorization: Basic {$proxyAuth}",
        "Proxy-Tunnel: {$tunnel}",
    ]);
    $sniServer = parse_url($urls, PHP_URL_HOST);
    $options = [
        "http" => [
            "proxy"  => PROXY_SERVER,
            "header" => $headers,
            "method" => "GET",
            'request_fulluri' => true,
        ],
        'ssl' => array(
                'SNI_enabled' => true, // Enable SNI so the TLS handshake names the target host
                'SNI_server_name' => $sniServer
        )
    ];
    print($url);
    $context = stream_context_create($options);
    $result = file_get_contents($url, false, $context);
    var_dump($result);

    // Fetch the HTTPS page
    print($urls);
    $context = stream_context_create($options);
    $result = file_get_contents($urls, false, $context);
    var_dump($result);
?>
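Since the article talks about Python while the sample is PHP, the header construction from the sample can be sketched in Python as follows. This is only an illustration under assumptions: the helper name `proxy_headers` is hypothetical, and the sketch just builds the same `Proxy-Authorization` and `Proxy-Tunnel` values as the PHP code. How those headers actually reach the proxy depends on the HTTP client you use, which is not covered here.

```python
import base64
import random


def proxy_headers(user, password, tunnel=None):
    """Build the proxy headers used in the PHP sample.

    user, password: proxy credentials (placeholders in the sample).
    tunnel: Proxy-Tunnel value; a random integer in 1..10000 when omitted,
            mirroring rand(1, 10000) in the PHP code.
    """
    if tunnel is None:
        tunnel = random.randint(1, 10000)
    # HTTP Basic credentials are base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {
        "Proxy-Authorization": f"Basic {token}",
        "Proxy-Tunnel": str(tunnel),
    }
```

Passing an explicit `tunnel` keeps several requests on the same value, while omitting it draws a new random value on each call, as the PHP sample does once per run.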

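After reading the above, do you have a better understanding of how to crawl Xiaohongshu with a crawler? If you know a better method, feel free to share it.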
