A Deep Dive into the HDFS (Hadoop Distributed File System) Data Read Flow

## 1. Introduction

HDFS (the Hadoop Distributed File System) is the foundational data storage layer of the Hadoop big-data ecosystem. Because it combines distributed storage of massive datasets with the capacity to serve high-throughput data to all kinds of batch-processing workloads, its overall complexity is far higher than that of most other storage systems.

Studying HDFS in depth, including its architecture, read/write flows, partitioning model, high-availability design, and data storage planning, is therefore of great value when learning big-data technology, and it pays off especially when you have to develop against a production environment.

This article approaches the subject from the angle of a client reading data from HDFS. By tracing the Hadoop source code, we will peel back the layers one at a time until the internals of the read flow become clear.

## 2. Overall Architecture of the HDFS Read Flow

![HDFS data access: overall architecture](https://static001.geekbang.org/infoq/a2/a29439d647f693a701769d73f0d1e98a.png)

The figure above shows a simplified overall flow of a client accessing HDFS data.

(1) The client sends the HDFS namenode a data-access request for a file path (Path).

(2) The namenode collects the location information of every block of that file and, ordered by each block's position within the file, assembles them into a set of located blocks, which it returns to the client.

(3) With the located blocks in hand, the client creates an HDFS input stream, locates the first block, and reads its data stream from a datanode. It then uses the current read offset to locate the next block on its datanode, creates a new block read stream, and so on until the entire HDFS file has been read.
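The offset-to-block mapping used in step (3) can be sketched with a small, self-contained model. This is not Hadoop's actual API; the `Block` record and `blockIndexFor` method below are invented for illustration. Given located blocks ordered by their start offset in the file, it finds the block that covers a given read offset:

```java
import java.util.List;

public class LocatedBlocksModel {
    // Simplified stand-in for a located block: the block's start offset
    // within the file and its length in bytes.
    record Block(long startOffset, long length) {
        boolean contains(long offset) {
            return offset >= startOffset && offset < startOffset + length;
        }
    }

    // Walk the ordered block list to find the block covering the offset,
    // the way the client advances block by block as the read progresses.
    static int blockIndexFor(List<Block> blocks, long offset) {
        for (int i = 0; i < blocks.size(); i++) {
            if (blocks.get(i).contains(offset)) return i;
        }
        throw new IllegalArgumentException("offset beyond end of file: " + offset);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // a common HDFS block size
        // Two full blocks followed by a short 10-byte tail block.
        List<Block> blocks = List.of(
            new Block(0, blockSize),
            new Block(blockSize, blockSize),
            new Block(2 * blockSize, 10));
        System.out.println(blockIndexFor(blocks, 0));                  // 0
        System.out.println(blockIndexFor(blocks, blockSize));          // 1
        System.out.println(blockIndexFor(blocks, 2 * blockSize + 5));  // 2
    }
}
```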
## 3. Hadoop Source Code Analysis

With the overall picture from the brief description above in place, this section digs into HDFS's internal data-access mechanism by tracing the source code.

**(I) How the namenode proxy is created**

Why start with the creation of the namenode proxy? Because once the back-and-forth between the client and the namenode is clear, the data-retrieval process that follows becomes much easier to trace.

(1) Let's start with a configuration item in hdfs-site.xml:

```xml
<property>
    <name>dfs.client.failover.proxy.provider.fszx</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

This configuration designates ConfiguredFailoverProxyProvider as the provider of the namenode proxy. What is a namenode proxy? In essence, it is the client-side network communication object for the namenode service, through which the client and the namenode talk to each other.

(2) Next, look at the inheritance hierarchy of ConfiguredFailoverProxyProvider:

![ConfiguredFailoverProxyProvider inheritance diagram](https://static001.geekbang.org/infoq/02/02946eeea933fbff9bc896ba1b7566c1.png)

At the top of the hierarchy sits the FailoverProxyProvider interface, which declares:

```java
/**
 * Get the proxy object which should be used until the next failover event
 * occurs.
 * @return the proxy object to invoke methods upon
 */
public ProxyInfo getProxy();
```

The ProxyInfo returned by this method is the namenode proxy object. The full process by which the client obtains the ProxyInfo is quite involved and even relies on dynamic proxies, but in essence this interface is how the namenode proxy is obtained.
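The failover idea behind a proxy provider can be illustrated with a small sketch built on java.lang.reflect.Proxy. This is not Hadoop's implementation; the `NameNodeProtocol` interface and `failoverProxy` method below are invented for illustration. The invocation handler retries a failed call against the next target, which is the essence of failing over from a standby namenode to the active one:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;

public class FailoverSketch {
    // Illustrative stand-in for the namenode's client-facing protocol.
    interface NameNodeProtocol {
        String getBlockLocations(String path);
    }

    // Wraps an ordered list of targets; when a call fails, it fails over
    // to the next target and retries, up to one full pass over the list.
    static NameNodeProtocol failoverProxy(List<NameNodeProtocol> targets) {
        InvocationHandler handler = new InvocationHandler() {
            private int current = 0;

            @Override
            public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
                for (int attempt = 0; attempt < targets.size(); attempt++) {
                    try {
                        return m.invoke(targets.get(current), args);
                    } catch (Exception e) {
                        current = (current + 1) % targets.size(); // fail over
                    }
                }
                throw new IllegalStateException("all namenodes failed");
            }
        };
        return (NameNodeProtocol) Proxy.newProxyInstance(
            NameNodeProtocol.class.getClassLoader(),
            new Class<?>[]{NameNodeProtocol.class}, handler);
    }

    public static void main(String[] args) {
        NameNodeProtocol standby = path -> { throw new RuntimeException("standby"); };
        NameNodeProtocol active = path -> "blocks-of:" + path;
        NameNodeProtocol nn = failoverProxy(List.of(standby, active));
        // The first target throws, so the call is transparently retried
        // against the second one.
        System.out.println(nn.getBlockLocations("/data/file")); // blocks-of:/data/file
    }
}
```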
(3) At this point the class relationships have evolved into the following:

![Class relationships during namenode proxy creation](https://static001.geekbang.org/infoq/1c/1cfd58ea29192a69ff092a46a1207c3f.png)

In the figure, ProxyInfo is the namenode proxy class, and its subclass NNProxyInfo is the specialization for high availability.

(4) So after all this effort to understand the namenode proxy, where does it actually come into play?

This is where an extremely important object, DFSClient, enters the picture: it is the starting point for every input/output stream a client opens against HDFS, as shown below:

![Class relationships during DFSClient initialization](https://static001.geekbang.org/infoq/6a/6aa0c2ef2c01321a146e7515707a443a.png)

In the figure, solid lines represent actual call paths and dashed lines indirect relationships between objects. DFSClient is the key player: it is initialized by the distributed file system object (DistributedFileSystem), and during initialization it invokes NameNodeProxiesClient and a series of related operations to create the high-availability NNProxyInfo object, i.e. the namenode proxy. That proxy ultimately becomes a member of the DFSClient object and is used later when data streams are created.

**(II) A deeper source dive into the file read stream**

(1) As before, we first need an entry point. Consider a simple scenario that downloads a file from HDFS to the local file system:

```java
……
// Open the HDFS file input stream
input = fileSystem.open(new Path(hdfs_file_path));
// Create the local file output stream
output = new FileOutputStream(local_file_path);
// Copy the stream byte by byte with the IOUtils helper
IOUtils.copyBytes(input, output, 4096, true);
……
```

Now look at the stream copy method inside IOUtils:

```java
/**
 * Copies from one stream to another.
 *
 * @param in InputStream to read from
 * @param out OutputStream to write to
 * @param buffSize the size of the buffer
 */
public static void copyBytes(InputStream in, OutputStream out, int buffSize)
    throws IOException {
  PrintStream ps = out instanceof PrintStream ? (PrintStream)out : null;
  byte buf[] = new byte[buffSize];
  int bytesRead = in.read(buf);
  while (bytesRead >= 0) {
    out.write(buf, 0, bytesRead);
    if ((ps != null) && ps.checkError()) {
      throw new IOException("Unable to write to output stream.");
    }
    bytesRead = in.read(buf);
  }
}
```

This is a standard loop that reads from the HDFS InputStream and writes the bytes out to the local file OutputStream. Our goal is to dig into how that HDFS InputStream is created and used.

(2) Next, let's analyze how the InputStream is produced, as shown below:

![Class relationships when opening the InputStream](https://static001.geekbang.org/infoq/ec/ec8a36797060fa31ba13f7fc985e9d40.png)

Solid lines represent actual call paths and dashed lines indirect relationships between objects. The code involved is extremely complex internally; this diagram simplifies it as much as possible so the principle can be grasped quickly.

Let me walk through the process briefly:
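The copy loop above is easy to reproduce with plain JDK streams. Here is a minimal, self-contained version of the same read-until-EOF pattern (omitting the PrintStream special case):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyLoop {
    // Same pattern as IOUtils.copyBytes: read() returns -1 at end of
    // stream, and each write emits exactly the bytes just read.
    static long copyBytes(InputStream in, OutputStream out, int buffSize) throws IOException {
        byte[] buf = new byte[buffSize];
        long total = 0;
        int bytesRead = in.read(buf);
        while (bytesRead >= 0) {
            out.write(buf, 0, bytesRead);
            total += bytesRead;
            bytesRead = in.read(buf);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A 4-byte buffer forces several loop iterations.
        long n = copyBytes(new ByteArrayInputStream(data), out, 4);
        System.out.println(n);              // 10
        System.out.println(out.toString()); // hello hdfs
    }
}
```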
The first step: DistributedFileSystem calls the open method of the DFSClient object, which creates a DFSInputStream object. DFSInputStream holds the real logic for reading data blocks (LocatedBlock) and interacting with datanodes; it is the true core class.

The second step: while creating the DFSInputStream, DFSClient must pass in the block collection (LocatedBlocks) obtained by calling through the namenode proxy.

The third step: DFSClient creates a decorator class, HdfsDataInputStream, which wraps the DFSInputStream. The decorator's parent class, FSDataInputStream, is what is ultimately returned to DistributedFileSystem and used by the client developer.
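The decorator relationship in the third step can be sketched in miniature. The `CountingStream` class below is invented for illustration; the point is that the outer stream simply delegates to the wrapped core stream while adding behavior on top, the way FSDataInputStream wraps DFSInputStream without reimplementing the block-reading logic:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DecoratorSketch {
    // Illustrative decorator: adds a byte counter on top of the wrapped
    // stream, leaving the actual reading to the inner stream.
    static class CountingStream extends FilterInputStream {
        long bytesRead = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read(); // delegate to the wrapped stream
            if (b >= 0) bytesRead++;
            return b;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream core = new ByteArrayInputStream(new byte[]{1, 2, 3});
        CountingStream decorated = new CountingStream(core);
        while (decorated.read() >= 0) { /* drain the stream */ }
        System.out.println(decorated.bytesRead); // 3
    }
}
```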
(3) Finally, let's go deeper into the block-reading mechanism itself, as shown below:

![Class relationships in the DFSInputStream read flow](https://static001.geekbang.org/infoq/c1/c1e86a454da4a48215a319d3523b4486.png)

Solid lines represent actual call paths and dashed lines indirect relationships between objects. The real code logic is fairly complex; the diagram is again simplified as much as possible for easier understanding.

As before, a brief walkthrough of the process:

The first step: the FSDataInputStream decorator receives the client's read call and invokes the read(...) method of the DFSInputStream object.

The second step: DFSInputStream calls its own blockSeekTo(long offset) method. Based on the data offset, it determines whether a new data block (LocatedBlock) must be read; once the new block has been found in the block collection (LocatedBlocks), it looks for the best datanode, following Hadoop's locality principle: first check whether the local node holds a replica, and otherwise pick the nearest replica by network distance.

The third step: a BlockReader object is built for the block on the chosen replica; this is the object that actually reads the block data. BlockReader has several implementations, and BlockReaderFactory.build selects the optimal one for the circumstances. BlockReaderLocal and BlockReaderLocalLegacy (based on HDFS-2246) are the preferred choices: these are the short-circuit block readers, which amount to reading straight from the local file system. If short-circuit reads are unavailable for security or other reasons, the client tries the UNIX domain socket optimization, and only then falls back to BlockReaderRemote, which opens a TCP socket connection. The inner workings of BlockReader are well worth a deep dive of their own; I plan to devote a dedicated article to them next time.

## 4. Closing

Thank you very much for reading to the end. In my next piece I will publish a deep-dive analysis of the "[HDFS data write flow](https://xie.infoq.cn/article/4438a7b40a9d832e9eb7c4e67)". I hope you will follow along.

**Author: Fang Shun, founder of Xi'an Shouhushi Information Technology, dedicated to helping IT engineers advance in the big-data field**

[Visit my Zhihu column for more big-data knowledge](https://www.zhihu.com/column/c_151487501)
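The replica selection in the second step can be sketched as follows. This is a simplified model, not Hadoop's actual code: each replica carries a network distance from the client (0 = same node, 2 = same rack, 4 = different rack, mirroring Hadoop's usual distance convention), and the reader picks the replica with the smallest distance, which subsumes the "local node first" rule:

```java
import java.util.Comparator;
import java.util.List;

public class ReplicaChooser {
    // A replica location with its network distance from the client.
    record Replica(String datanode, int distance) {}

    // Pick the nearest replica: a local replica (distance 0) always wins,
    // then a same-rack one, then a remote-rack one.
    static Replica choose(List<Replica> replicas) {
        return replicas.stream()
            .min(Comparator.comparingInt(Replica::distance))
            .orElseThrow(() -> new IllegalStateException("no replicas"));
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
            new Replica("dn-remote-rack", 4),
            new Replica("dn-same-rack", 2),
            new Replica("dn-local", 0));
        System.out.println(choose(replicas).datanode()); // dn-local
    }
}
```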