https://github.com/angel-star/ARTS/tree/master/2018_07_15

Algorithm

709. To Lower Case
Problem description:
Implement function ToLowerCase() that has a string parameter str, and returns the same string in lowercase.

Below is a C++ implementation of the most common approach.

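#include <string>
using std::string;

class Solution {
public:
    string toLowerCase(string str) {
        string result;
        result.reserve(str.size());
        for (char c : str) {
            if (c >= 'A' && c <= 'Z') {
                // Shift an ASCII uppercase letter to its lowercase counterpart
                result += static_cast<char>(c - 'A' + 'a');
            } else {
                result += c;
            }
        }
        return result;
    }
};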

I'm slacking off a little on the Algorithm part this week; I'll do two problems next week.

Review

Why Woodworking Will Make You a Better Coder

Summary of the article:

I’ve found that dissimilar hobbies can lead to unexpected transfer learning. In the following essay, I share an example of this lateral thinking; how skills in one area can inform learning and decision making in another.

The article explains, from five angles, why doing woodworking can make you better at writing code. The author gives five reasons:

It will teach you utility
It will teach you to troubleshoot
It will teach you bootstrapping
It will teach you progress
It will teach you work backwards

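The article describes how the author's everyday contact with woodworking set off a train of thought: woodworking is a lot like coding, and in some respects it can give you a fresh way of thinking.
As the article says at the beginning: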

Many adept technologists agree that being good at one thing is the same as being good at none; “two is one and one is none”. Those who can work across many disciplines — the polymaths — will dominate the future of business.

Keywords: Programming, Data Science, Life, Code, Computer Science

Technique

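What is a distributed file system?

The term "distributed file system" can be broken into three parts: distributed, file, and file system.

What does distributed mean? Put simply, a group of machines cooperates over a network to provide some service as a single unit. The internal details of the service are usually transparent to the outside: callers only see a system that provides a reliable service, and do not care about internal node state, replicas, partitioning, and so on.

What is a file? On Linux, everything is a file. So what is a file, then? Any file, whatever its type, can be viewed as one very large binary array.

What is a file system? A system is the product of several components working together, and a file system is the subsystem responsible for managing and storing file information.

So what is a distributed file system?

It is a system that uses a group of machines connected over a network to provide file storage and file management as a single, transparent service.

Now let's start leveling up. We'll learn as much as we can; today we only tackle Level 1.

The JDK is version 8 and dependencies are managed with Maven. Let's look at the Maven dependencies first: there is only one, the guava package, which makes working with collections more convenient.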


<dependency>
   <groupId>com.google.guava</groupId>
   <artifactId>guava</artifactId>
   <version>25.1-jre</version>
</dependency>

Level 1: build a simple in-memory distributed store

package com.bigbanana.lab.lab3.dajiao.step1;

import com.google.common.primitives.Bytes;

import java.util.*;

public class SimpleDistributeFSInMemory {

   // Each "server" is a map from file name to file content (a list of bytes)
   public static Map<String,Map<String,List<Byte>>> servers= new HashMap<>();
   public static Integer serverSize = 3;

   static {
      /**
       * Initialize three virtual machines
       */
      for(int i = 0 ; i < serverSize ;i++){
         Map<String,List<Byte>> server = new HashMap<>();
         servers.put(i+"",server);
      }

   }

   public static void main(String[] args){
      /**
       * Initialize the command line
       */
      Scanner scanner = new Scanner(System.in);

      while (scanner.hasNextLine()){
         String command  = scanner.nextLine();
         // A limit of 3 keeps any spaces inside the file content of a put command
         String[] commandArray = command.split(" ", 3);

         if(commandArray.length < 2){
            println("usage: get <fileName> | put <fileName> <content>");
            continue;
         }

         String targetCommand = commandArray[0];
         String fileName = commandArray[1];

         /**
          * Find the server that holds the file from its name, and get that server's file bucket
          */
         int serverIndex = getServerIndex(fileName);
         Map<String,List<Byte>> serverFile = servers.get(serverIndex+"");

         if("get".equals(targetCommand)){
            println("getting file from server "+serverIndex +"....");

            /**
             * For get: look up the file by name in the file bucket, convert it to a String and print it.
             */
            println(new String(Bytes.toArray(serverFile.getOrDefault(fileName,new ArrayList<>()))));

         }else if("put".equals(targetCommand)){
            if(commandArray.length < 3){
               println("usage: put <fileName> <content>");
               continue;
            }
            /**
             * For put: convert the file (a String here) to a byte array and save it in the file bucket
             */
            println("putting file to server "+serverIndex +"....");
            String file = commandArray[2];
            List<Byte> fileBytes = Bytes.asList(file.getBytes());

            serverFile.put(fileName,fileBytes);

            println("success");
         }
      }

   }

   /**
    * Printing helper
    */
   public static void println(Object o){
      System.out.println(o);
   }

   /**
    * Get the file server index from the hash of the file name.
    * Math.floorMod is used because hashCode() can be negative, and a negative
    * result of % would not match any server key.
    */
   public static Integer getServerIndex(String fileName){
      return Math.floorMod(fileName.hashCode(), serverSize);
   }
}

Try out the file system we just implemented on the command line. You can see that we have implemented the most basic functionality: get and put.

put banana.git banananisToooBig
putting file to server 1....
success
get banana.get
getting file from server 0....

get banana.git
getting file from server 1....
banananisToooBig

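What is the approach behind the implementation?
1. Initialize the whole server cluster. Three servers are used here; in Level 2 the servers will be abstracted into separate entities.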

public static Map<String,Map<String,List<Byte>>> servers= new HashMap<>();
public static Integer serverSize = 3;

static {
   /**
    * Initialize three virtual machines
    */
   for(int i = 0 ; i < serverSize ;i++){
      Map<String,List<Byte>> server = new HashMap<>();
      servers.put(i+"",server);
   }

}

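2. Define how files are bucketed across servers. Here a server is addressed by hashing the file name and taking the result modulo the number of servers.

/**
 * Get the file server index from the hash of the file name.
 * Math.floorMod is used because hashCode() can be negative, and a negative
 * result of % would not match any server key.
 */
public static Integer getServerIndex(String fileName){
   return Math.floorMod(fileName.hashCode(), serverSize);
}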
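
The bucketing rule can be tried on its own. The following is a minimal sketch, assuming a hypothetical class name `HashRoutingSketch` and a made-up negative hash value of -7. It shows why the sign of the hash matters: Java's `%` keeps the sign of the dividend, so a negative hash code would produce an index that matches none of the server keys "0", "1", "2", while `Math.floorMod` always lands in the valid range.

```java
// Hypothetical demo class, not part of the original project.
final class HashRoutingSketch {

    // Map a hash code to a server index in the range [0, serverCount).
    static int serverIndex(int hash, int serverCount) {
        return Math.floorMod(hash, serverCount);
    }

    // Convenience overload that routes by file name.
    static int serverIndex(String fileName, int serverCount) {
        return serverIndex(fileName.hashCode(), serverCount);
    }

    public static void main(String[] args) {
        int serverCount = 3;
        int negativeHash = -7; // stand-in for a negative String.hashCode()

        // The plain remainder is negative; floorMod is not.
        System.out.println("-7 % 3 = " + (negativeHash % serverCount));
        System.out.println("floorMod(-7, 3) = " + serverIndex(negativeHash, serverCount));
        System.out.println("banana.git -> server " + serverIndex("banana.git", serverCount));
    }
}
```

Running it prints -1 for the plain remainder and 2 for `floorMod`, so only the second can be used as a key into the server map.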

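3. Get the corresponding server. Given the layout defined above, we simply take it out of the Map.


/**
 * Find the server that holds the file from its name, and get that server's file bucket
 */
int serverIndex = getServerIndex(fileName);
Map<String,List<Byte>> serverFile = servers.get(serverIndex+"");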

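4. Define the put operation. Take the file (simplified to a String here), get its byte array, convert it to the common storage format (a List<Byte>), and write it into the server's file bucket.

else if("put".equals(targetCommand)){
   if(commandArray.length < 3){
      println("usage: put <fileName> <content>");
      continue;
   }
   /**
    * For put: convert the file (a String here) to a byte array and save it in the file bucket
    */
   println("putting file to server "+serverIndex +"....");
   String file = commandArray[2];
   List<Byte> fileBytes = Bytes.asList(file.getBytes());

   serverFile.put(fileName,fileBytes);

   println("success");
}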

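5. Define the get operation. As with put, first find the file server, then the file bucket, then fetch the byte List for the file name from the bucket, convert it to a byte array and then to a String, and print it.

if("get".equals(targetCommand)){
   println("getting file from server "+serverIndex +"....");

   /**
    * For get: look up the file by name in the file bucket, convert it to a String and print it.
    */
   println(new String(Bytes.toArray(serverFile.getOrDefault(fileName,new ArrayList<>()))));

}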
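
The put and get steps together form a round trip from String to bytes and back. Here is a minimal sketch of that round trip using only the JDK, so it runs without Guava. The class name `ByteRoundTripSketch` and its two helper methods are hypothetical. The explicit UTF-8 charset is also an assumption: the listing above relies on the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it uses only the JDK so it runs without Guava.
final class ByteRoundTripSketch {

    // Encode text as UTF-8 and box each byte (the role Bytes.asList plays in put).
    static List<Byte> toByteList(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        List<Byte> boxed = new ArrayList<>(raw.length);
        for (byte b : raw) {
            boxed.add(b);
        }
        return boxed;
    }

    // Unbox the list and decode it as UTF-8 (the role Bytes.toArray plays in get).
    static String fromByteList(List<Byte> boxed) {
        byte[] raw = new byte[boxed.size()];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = boxed.get(i);
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Byte> stored = toByteList("banananisToooBig");
        System.out.println(stored.size() + " bytes stored");
        System.out.println(fromByteList(stored));
        // A missing file is modeled as an empty list and decodes to an empty string.
        System.out.println("[" + fromByteList(new ArrayList<>()) + "]");
    }
}
```

Fixing the charset on both sides keeps the stored bytes independent of the machine the code runs on, which starts to matter once the servers are no longer in the same process.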

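6. Expose a client. Here a Scanner reads system input, and each line is organized as "command fileName fileContent".

Scanner scanner = new Scanner(System.in);

while (scanner.hasNextLine()){
   String command  = scanner.nextLine();
   // A limit of 3 keeps any spaces inside the file content of a put command
   String[] commandArray = command.split(" ", 3);

   if(commandArray.length < 2){
      println("usage: get <fileName> | put <fileName> <content>");
      continue;
   }

   String targetCommand = commandArray[0];
   String fileName = commandArray[1];
}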
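
The parsing step can also be pulled out into a small function that is easy to test on its own. The sketch below is one possible way to do it, not the author's code: the class `CommandParserSketch` and its `parse` method are hypothetical, and returning `null` for a malformed line is just a convention chosen for brevity.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original project.
final class CommandParserSketch {

    // Returns {"get", fileName} or {"put", fileName, content};
    // returns null when the line does not match either shape.
    static String[] parse(String line) {
        // Split on runs of whitespace, at most 3 parts, so content may contain spaces.
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length == 2 && "get".equals(parts[0])) {
            return parts;
        }
        if (parts.length == 3 && "put".equals(parts[0])) {
            return parts;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("put banana.git too big to fit")));
        System.out.println(Arrays.toString(parse("get banana.git")));
        System.out.println(Arrays.toString(parse("put banana.git"))); // malformed: no content
    }
}
```

Keeping validation in one place means the client loop never has to index into an array whose length it has not checked.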

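That wraps up Level 1 of our distributed file system. I hope it helps; let me know if you would like me to implement Level 2. This version is the simplest one. Level 2 will abstract each service so that the services can later be deployed separately and coordinated, and it will add one more API, append. It will still be in-memory, because diving straight into file streams and network operations would only leave you confused. I have also put the code on GitHub, in lab3 of the practice project below.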
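
To make the Level 2 direction a little more concrete, here is a rough sketch of what "abstracting a server and adding append" might look like. It is only a guess under stated assumptions: the names `Level2Sketch`, `FileServer` and `InMemoryFileServer` are hypothetical and do not come from the BananaLab project, and the real Level 2 may be designed quite differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the BananaLab project.
final class Level2Sketch {

    // One storage node behind an interface, so it could later be deployed separately.
    interface FileServer {
        byte[] get(String fileName);
        void put(String fileName, byte[] content);
        void append(String fileName, byte[] content);
    }

    // In-memory implementation, in the same spirit as Level 1.
    static final class InMemoryFileServer implements FileServer {
        private final Map<String, byte[]> files = new HashMap<>();

        @Override
        public byte[] get(String fileName) {
            // A missing file reads as empty, as in Level 1.
            return files.getOrDefault(fileName, new byte[0]).clone();
        }

        @Override
        public void put(String fileName, byte[] content) {
            files.put(fileName, content.clone());
        }

        @Override
        public void append(String fileName, byte[] content) {
            byte[] old = files.getOrDefault(fileName, new byte[0]);
            // Grow the old content and copy the new bytes onto its end.
            byte[] merged = Arrays.copyOf(old, old.length + content.length);
            System.arraycopy(content, 0, merged, old.length, content.length);
            files.put(fileName, merged);
        }
    }

    public static void main(String[] args) {
        FileServer server = new InMemoryFileServer();
        server.put("banana.git", "banana".getBytes(StandardCharsets.UTF_8));
        server.append("banana.git", "IsBig".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(server.get("banana.git"), StandardCharsets.UTF_8));
    }
}
```

Because callers only see the interface, the in-memory class could later be swapped for one that talks to a remote process without changing the client code.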

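I have set up the Java practice project: https://github.com/CallMeDJ/BananaLab.git

So how do you take part?
1. Create your own package and copy the demo class from the dajiao folder.
2. Implement the functionality that the class asks you to implement.
3. Run the tests; once they pass, you can submit.
4. Write an MD file describing your approach and anything worth noting.
5. Submit your PR. I will merge it as appropriate; eight people are already practicing together.

Reposted from the official account 《一名叫大蕉的程序員》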

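Today's main subject is the Wide & Deep Model, which is used in both recommender systems and CTR prediction. It is a long read of around ten thousand characters, and strongly recommended!

This article comes from the official account 《AI前線》

It opens by explaining several terms: Memorization and Generalization, Wide and Deep, and Cross-product transformation.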
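
As a small illustration of the last term: in the Wide & Deep paper, a cross-product transformation over binary features is 1 only when every one of its constituent features is 1, which is how the wide part memorizes feature co-occurrences. The sketch below assumes 0/1 features; the class name `CrossProductSketch` and the feature names are made up for illustration and do not come from the article's code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; feature names are made up for illustration.
final class CrossProductSketch {

    // Returns 1 only if every listed binary feature is 1, otherwise 0.
    // Features absent from the map are treated as 0.
    static int cross(Map<String, Integer> features, List<String> crossed) {
        int product = 1;
        for (String name : crossed) {
            product *= features.getOrDefault(name, 0);
        }
        return product;
    }

    public static void main(String[] args) {
        Map<String, Integer> user = new HashMap<>();
        user.put("gender=female", 1);
        user.put("language=en", 1);
        user.put("language=fr", 0);

        // Both features are active, so the crossed feature fires.
        System.out.println(cross(user, Arrays.asList("gender=female", "language=en")));
        // One feature is inactive, so the crossed feature stays at 0.
        System.out.println(cross(user, Arrays.asList("gender=female", "language=fr")));
    }
}
```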

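It then introduces two kinds of recommender systems: CF-based (collaborative filtering) and content-based.

After a brief overview of the system in practice, it discusses the scope of application and the pros and cons, and then walks through the code implementation at length: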

https://github.com/gutouyu/ML_CIA/tree/master/Wide%26Deep

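Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/adult

The code has two main parts: Wide Linear Model and Wide & Deep Model.

『CTR預估專欄 | 詳解Wide&Deep理論與實踐』 (CTR Prediction Column | Wide & Deep Theory and Practice Explained)

Note: this article was first published on the official account 《機器學習薦貨情報局》; if you are interested, look up the other articles in the column.

I would also like to recommend an article that 瓜哥 shared in the group chat:

How to Read a Paper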
