用Serverlss部署一个基于深度学习的古诗词生成API

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"古诗词是中国文化殿堂的瑰宝,记得曾经在韩国做Exchange Student的时候,看到他们学习我们的古诗词,有中文的还有翻译版的,自己发自内心的骄傲,甚至也会在某些时候背起一些耳熟能详的诗词。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文将会通过深度学习为我们生成一些古诗词,并将模型部署到Serverless架构上,实现基于Serverless的古诗词生成API。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"项目构建"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"古诗词生成实际上是文本生成,或者说是生成式文本。关于基于深度学习的文本生成,最入门级的读物包括Andrej Karpathy的博客。他使用例子生动讲解了Char-RNN(Character based Recurrent Neural Network)如何用于从文本数据集里学习,然后自动生成像模像样的文本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/bd/bd8852bd9ad0901b7c1342bd5e1580de.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上图直观展示了Char-RNN的原理。以要让模型学习写出“hello”为例,Char-RNN的输入输出层都是以字符为单位。输入“h”,应该输出“e”;输入“e”,则应该输出后续的“l”。输入层我们可以用只有一个元素为1的向量来编码不同的字符,例如,h被编码为“1000”、“e”被编码为“0100”,而“l”被编码为“0010”。使用RNN的学习目标是,可以让生成的下一个字符尽量与训练样本里的目标输出一致。在图一的例子中,根据前两个字符产生的状态和第三个输入“l”预测出的下一个字符的向量为<0.1, 0.5, 1.9, -1.1>,最大的一维是第三维,对应的字符则为“0010”,正好是“l”。这就是一个正确的预测。但从第一个“h”得到的输出向量是第四维最大,对应的并不是“e”,这样就产生代价。学习的过程就是不断降低这个代价。学习到的模型,对任何输入字符可以很好地不断预测下一个字符,如此一来就能生成句子或段落。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文项目构建参考了Github已有项目:https://github.com/norybaby/poet"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通过Clone代码,并且安装相关依赖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"pip3 install tensorflow==1.14 word2vec numpy"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通过训练:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"python3 train.py"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"可以看到训练结果:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e1/e19e4707df908b48412851bb7f8f5bd8.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/98/9812c16895150ee9f1dfb3656ff0ce00.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此时会生成多个模型在output_poem文件夹下,我们只需要保留最好的即可,例如我的训练之后生成的json文件:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"json"},"content":[{"type":"text","text":"{\n \"best_model\": \"output_poem/best_model/model-20390\",\n \"best_valid_ppl\": 21.441762924194336,\n \"latest_model\": \"output_poem/save_model/model-20390\",\n \"params\": {\n \"batch_size\": 16,\n \"cell_type\": \"lstm\",\n \"dropout\": 0.0,\n \"embedding_size\": 128,\n \"hidden_size\": 128,\n \"input_dropout\": 0.0,\n \"learning_rate\": 0.005,\n \"max_grad_norm\": 5.0,\n \"num_layers\": 2,\n \"num_unrollings\": 64\n },\n \"test_ppl\": 25.83984375\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此时,我只需要保存"},{"type":"codeinline","content":[{"type":"text","text":"output_poem/best_model/model-20390"}]},{"type":"text","text":"模型即可。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"部署上线"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在项目目录下,安装必要依赖:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"pip3 install word2vec numpy -t ./"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由于tensorflow等是腾讯云云函数内置的package,所以这里无需安装,另外numpy这个package需要在CentOS+Python3.6环境下打包。也可以通过之前制作的小工具打包:https://www.serverlesschina.com/35.html"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"完成之后,编写函数入口文件:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"python"},"content":[{"type":"text","text":"import uuid, json\nfrom write_poem import WritePoem, start_model\n\nwriter = start_model()\n\n\ndef return_msg(error, msg):\n return_data = {\n \"uuid\": str(uuid.uuid1()),\n \"error\": error,\n \"message\": msg\n }\n print(return_data)\n return return_data\n\n\ndef main_handler(event, context):\n # 类型\n # 1: 自由\n # 2: 押韵\n # 3: 藏头押韵\n # 4: 藏字押韵\n\n style = json.loads(event[\"body\"])[\"style\"]\n content = json.loads(event[\"body\"]).get(\"content\", None)\n\n if style in '34' and not content:\n return return_msg(True, \"请输入content参数\")\n\n if style == '1':\n return return_msg(False, writer.free_verse())\n elif style == '2':\n return return_msg(False, writer.rhyme_verse())\n elif style == '3':\n return return_msg(False, writer.cangtou(content))\n elif style == '4':\n return return_msg(False, writer.hide_words(content))\n else:\n return return_msg(True, \"请输入正确的style参数\")\n"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同时需要准备好Yaml文件:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"yaml"},"content":[{"type":"text","text":"getUserIp:\n component: \"@serverless/tencent-scf\"\n inputs:\n name: autoPoem\n codeUri: ./\n exclude:\n - .gitignore\n - .git/**\n - .serverless\n - .env\n handler: index.main_handler\n runtime: Python3.6\n region: ap-beijing\n description: 自动古诗词撰写\n namespace: serverless_tools\n memorySize: 512\n timeout: 10\n events:\n - apigw:\n name: serverless\n parameters:\n serviceId: service-8d3fi753\n protocols:\n - http\n - https\n environment: release\n endpoints:\n - path: /auto/poem\n description: 自动古诗词撰写\n method: POST\n enableCORS: true"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此时,我们就可以通过Serverless Framework CLI部署项目。部署完成之后,我们可以通过PostMan测试我们的接口:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/ebcb119cd7dc40d92e22d6d3732ba1cd.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"总结"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文通过已有的深度学习项目,在本地进行训练,保存模型,然后将项目部署在腾讯云云函数上,通过与API网关的联动,实现了一个基于深度学习的古诗词撰写的API。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章