Deploying a Deep-Learning-Based Classical Chinese Poetry Generation API with Serverless

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Preface"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Classical poetry is a treasure of Chinese culture. When I was an exchange student in South Korea, I saw students there studying our classical poems, both in the original Chinese and in translation. I felt genuinely proud, and at times I would even recite a few familiar poems myself."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article uses deep learning to generate classical Chinese poems and deploys the model on a Serverless architecture, producing a Serverless-based poetry generation API."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Building the Project"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Classical poetry generation is essentially text generation. One of the most accessible introductions to deep-learning-based text generation is Andrej Karpathy's blog, which uses vivid examples to explain how a Char-RNN (character-based recurrent neural network) learns from a text dataset and then automatically generates plausible-looking text."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bd/bd8852bd9ad0901b7c1342bd5e1580de.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above illustrates how Char-RNN works, using the task of learning to write “hello” as an example. Both the input and output layers of Char-RNN operate on single characters: given “h” the model should output “e”, and given “e” it should output the following “l”. The input layer encodes each character as a vector with exactly one element set to 1 (one-hot encoding); for example, “h” is encoded as “1000”, “e” as “0100”, and “l” as “0010”. The training objective of the RNN is to make the generated next character match the target output in the training samples as closely as possible. In the figure, the state produced by the first two characters, combined with the third input “l”, yields the prediction vector <0.1, 0.5, 1.9, -1.1>. Its largest component is the third, which corresponds to “0010”, that is, “l”, so this prediction is correct. The output vector produced from the first “h”, however, is largest in its fourth dimension, which does not correspond to “e”, and this incurs a cost. Training is the process of repeatedly reducing that cost. A trained model can predict the next character well for any input character, and by doing so repeatedly it can generate sentences or paragraphs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The project in this article is based on an existing GitHub project: https://github.com/norybaby/poet"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Clone the code and install the dependencies:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"pip3 install tensorflow==1.14 word2vec numpy"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Then run training:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"python3 train.py"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The training output looks like this:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e1/e19e4707df908b48412851bb7f8f5bd8.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/98/9812c16895150ee9f1dfb3656ff0ce00.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Training writes several models to the output_poem folder, and only the best one needs to be kept. For example, this is the JSON file generated after my training run:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"json"},"content":[{"type":"text","text":"{\n  \"best_model\": \"output_poem/best_model/model-20390\",\n  \"best_valid_ppl\": 21.441762924194336,\n  \"latest_model\": \"output_poem/save_model/model-20390\",\n  \"params\": {\n    \"batch_size\": 16,\n    \"cell_type\": \"lstm\",\n    \"dropout\": 0.0,\n    \"embedding_size\": 128,\n    \"hidden_size\": 128,\n    \"input_dropout\": 0.0,\n    \"learning_rate\": 0.005,\n    \"max_grad_norm\": 5.0,\n    \"num_layers\": 2,\n    \"num_unrollings\": 64\n  },\n  \"test_ppl\": 25.83984375\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So I only need to keep the "},{"type":"codeinline","content":[{"type":"text","text":"output_poem/best_model/model-20390"}]},{"type":"text","text":" model."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Deployment"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the project directory, install the required dependencies:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"pip3 install word2vec numpy -t ./"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because tensorflow and some other packages are built into Tencent Cloud's cloud functions, they do not need to be installed here. Note that the numpy package must be built in a CentOS + Python 3.6 environment. It can also be packaged with a small tool I made earlier: https://www.serverlesschina.com/35.html"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After that, write the function entry file:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"import json\nimport uuid\n\nfrom write_poem import WritePoem, start_model\n\n# Load the model at import time, outside the handler.\nwriter = start_model()\n\n\ndef return_msg(error, msg):\n    return_data = {\n        \"uuid\": str(uuid.uuid1()),\n        \"error\": error,\n        \"message\": msg\n    }\n    print(return_data)\n    return return_data\n\n\ndef main_handler(event, context):\n    # style values:\n    # 1: free verse\n    # 2: rhymed\n    # 3: acrostic, rhymed\n    # 4: hidden words, rhymed\n\n    body = json.loads(event[\"body\"])\n    # Normalize to str so that both 3 and \"3\" are accepted.\n    style = str(body.get(\"style\", \"\"))\n    content = body.get(\"content\", None)\n\n    # Styles 3 and 4 require content.\n    if style in ('3', '4') and not content:\n        return return_msg(True, \"Please provide the content parameter\")\n\n    if style == '1':\n        return return_msg(False, writer.free_verse())\n    elif style == '2':\n        return return_msg(False, writer.rhyme_verse())\n    elif style == '3':\n        return return_msg(False, writer.cangtou(content))\n    elif style == '4':\n        return return_msg(False, writer.hide_words(content))\n    else:\n        return return_msg(True, \"Please provide a valid style parameter\")\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A YAML file also needs to be prepared:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"yaml"},"content":[{"type":"text","text":"getUserIp:\n  component: \"@serverless/tencent-scf\"\n  inputs:\n    name: autoPoem\n    codeUri: ./\n    exclude:\n      - .gitignore\n      - .git/**\n      - .serverless\n      - .env\n    handler: index.main_handler\n    runtime: Python3.6\n    region: ap-beijing\n    description: Automatic classical poetry writing\n    namespace: serverless_tools\n    memorySize: 512\n    timeout: 10\n    events:\n      - apigw:\n          name: serverless\n          parameters:\n            serviceId: service-8d3fi753\n            protocols:\n              - http\n              - https\n            environment: release\n            endpoints:\n              - path: /auto/poem\n                description: Automatic classical poetry writing\n                method: POST\n                enableCORS: true"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At this point the project can be deployed with the Serverless Framework CLI. Once deployment is complete, the endpoint can be tested with Postman:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/ebcb119cd7dc40d92e22d6d3732ba1cd.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article took an existing deep learning project, trained it locally, saved the model, and deployed the project to Tencent Cloud's cloud functions. Combined with API Gateway, this yields a deep-learning-based API for writing classical Chinese poems."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
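Before deploying, it can help to exercise the handler's style routing locally without loading TensorFlow. The following is only a minimal sketch under stated assumptions: `FakeWriter`, `make_event`, and `dispatch` are hypothetical names introduced here for illustration, and `FakeWriter` merely stands in for the object returned by `start_model()`. The one thing taken from the entry file is the event shape, a dict whose `"body"` holds a JSON string with `style` and an optional `content`.

```python
import json


class FakeWriter:
    """Hypothetical stand-in for the writer returned by start_model()."""

    def free_verse(self):
        return "free verse"

    def rhyme_verse(self):
        return "rhymed verse"

    def cangtou(self, content):
        return "acrostic for " + content

    def hide_words(self, content):
        return "hidden words for " + content


def make_event(style, content=None):
    """Build an event whose "body" is a JSON string, as the entry file reads it."""
    payload = {"style": style}
    if content is not None:
        payload["content"] = content
    return {"body": json.dumps(payload, ensure_ascii=False)}


def dispatch(event, writer):
    """Route on style like the entry file, without the uuid wrapper."""
    body = json.loads(event["body"])
    style = str(body.get("style", ""))
    content = body.get("content")

    # Styles 3 and 4 need the content parameter.
    if style in ("3", "4") and not content:
        return {"error": True, "message": "Please provide the content parameter"}

    routes = {
        "1": lambda: writer.free_verse(),
        "2": lambda: writer.rhyme_verse(),
        "3": lambda: writer.cangtou(content),
        "4": lambda: writer.hide_words(content),
    }
    if style not in routes:
        return {"error": True, "message": "Please provide a valid style parameter"}
    return {"error": False, "message": routes[style]()}


if __name__ == "__main__":
    print(dispatch(make_event("1"), FakeWriter()))
    print(dispatch(make_event("3", "春夏秋冬"), FakeWriter()))
    print(dispatch(make_event("4"), FakeWriter()))
```

The same JSON payload produced by `make_event` is what would go in the POST body when testing the deployed `/auto/poem` endpoint.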