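This article is shared from the Huawei Cloud community post "Sora Bursts onto the Scene: Is the Era of AGI Really Coming? One-Click Run Lets You Experience the Appeal of Diffusion Models!", by 碼上開花_Lancer.
Sora has been explosive news for the past few days, exciting AI practitioners and anyone interested in its applications. Even CCTV has been discussing it, and the buzz rivals the wave of discussion around ChatGPT in early 2023. So why is it so popular?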
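I. What Is Sora?
Sora is OpenAI's newly released text-to-video model. It can generate videos up to one minute long while closely following the user's prompt and maintaining visual quality.
OpenAI has very large ambitions. It wants to build world simulators and pursue AGI, not just produce content in the text, image, or video domains, and it hopes to help people solve problems that require interaction with the real world. This is clear from the technical report OpenAI published for Sora:
Text of the image:
Video generation models as world simulators. We explore large-scale training of generative models on video data. Specifically, we jointly train text-conditional diffusion models on videos and images of variable durations, resolutions, and aspect ratios. We leverage a Transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high-fidelity video. Our results suggest that scaling video generation models is a promising path towards building general-purpose simulators of the physical world.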
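To make "spacetime patches" a little more concrete, here is a minimal sketch that counts how many patch tokens a video latent would produce. The latent shape and patch sizes are purely illustrative assumptions; OpenAI has not published the values Sora uses, and the names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LatentVideo:
    """Shape of a compressed video latent (all values here are hypothetical)."""
    frames: int   # number of latent frames along the time axis
    height: int   # latent height
    width: int    # latent width


def count_spacetime_patches(latent: LatentVideo, pt: int, ph: int, pw: int) -> int:
    """Number of non-overlapping pt x ph x pw patches that tile the latent."""
    if latent.frames % pt or latent.height % ph or latent.width % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    return (latent.frames // pt) * (latent.height // ph) * (latent.width // pw)


# Hypothetical example: 16 latent frames of 32 x 56, cut into 2 x 2 x 2 patches.
latent = LatentVideo(frames=16, height=32, width=56)
tokens = count_spacetime_patches(latent, pt=2, ph=2, pw=2)
print(tokens)  # 8 * 16 * 28 = 3584
```

Each patch becomes one token in the Transformer's input sequence, so longer or higher-resolution videos simply yield longer sequences.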
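In video creation, the stability of the picture is critical. Producing high-quality results normally requires strong video editing skills and the background knowledge that goes with them. Sora's performance, however, is remarkable: from a simple text description it can generate long videos with stable visuals and strong prompt understanding.
Sora's technical approach departs from conventional methods. Rather than focusing only on changes in two-dimensional pixels, it focuses on changes at the semantic level, moving from generating video frames to generating story logic. This shift in approach is striking and shows how far the technology can go.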
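II. Speculation on the Principles Behind Sora
According to OpenAI's latest technical report, the text-to-video model behind Sora is based on the Diffusion Transformer model. This kind of model combines the Transformer architecture with diffusion models and is used to generate images, video, and other data.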
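The diffusion half of this design can be illustrated with the standard DDPM forward (noising) step, x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps, where a_t is the cumulative product of (1 - beta). The sketch below uses a linear beta schedule with commonly used default values; these are assumptions for illustration and are not taken from the Sora report.

```python
import math
import random


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar: float, rng: random.Random):
    """Sample x_t from q(x_t | x_0) for a flat list of values x0."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = linear_alpha_bars(1000)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]                    # toy stand-in for a latent
x_early = add_noise(x0, alpha_bars[0], rng)   # first step: almost unchanged
x_late = add_noise(x0, alpha_bars[-1], rng)   # last step: close to pure noise
print(alpha_bars[0], alpha_bars[-1])
```

A denoising network (a Transformer in the DiT family) is then trained to undo this corruption, typically by predicting the noise that was added.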
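In essence, Sora is a Transformer-based diffusion model. Models of this kind are not only novel in theory but have also shown strong potential in practice. For example, DiT (believed to be the basis of Sora) and GenTron have both achieved great success in image and video generation, showing how much potential this line of work has. Sora's technical details have not been made public, so there are many different guesses about it. Saining Xie, one of the authors of DiT, speculates:
1) Sora is probably built on top of DiT, the diffusion Transformer.
2) Sora may have roughly 3 billion parameters (citing the 0.13B-parameter model in the paper and 32x compute).
3) Training data is the most critical factor in Sora's success.
4) The main challenge is how to handle error accumulation and maintain quality and consistency over time.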
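DiT: a diffusion model proposed by Meta that is built entirely on the Transformer architecture. It not only applies Transformers to diffusion models successfully but also explores how well the Transformer architecture scales in diffusion models.
GenTron: a Transformer-based diffusion model. In human evaluation against SDXL, GenTron achieved a 51.1% win rate in visual quality (with a 19.8% draw rate) and a 42.3% win rate in text alignment (with a 42.9% draw rate).
The DiT paper, "Scalable Diffusion Models with Transformers", presents Transformer-based diffusion models called Diffusion Transformers (DiTs) and studies the relationship among their design space, scaling behavior, network complexity, and sample quality. Its results show that simply scaling up DiT and using a high-capacity backbone achieves a state-of-the-art FID of 2.27 on the class-conditional 256x256 ImageNet generation benchmark. Compared with pixel-space diffusion models, DiTs use only a fraction of the Gflops and are therefore compute-efficient. DiTs operate in latent space, although they could also be applied to pixel space; this makes the image generation pipeline a hybrid approach that combines off-the-shelf convolutional VAEs with Transformer-based DDPMs.
It introduces a standard Transformer design into diffusion models to replace the conventional U-Net, offering a new architectural choice.
It builds on latent diffusion models (LDMs), which compress images into smaller spatial representations and train the diffusion model on those representations, avoiding the computational cost of training a diffusion model directly in high-resolution pixel space.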
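A rough calculation shows why working in latent space saves compute. It assumes the setup described in the DiT paper for 256x256 images: a VAE that downsamples each side by a factor of 8, and a patch size of 2. The helper name is hypothetical.

```python
def num_tokens(image_size: int, patch_size: int, vae_downsample: int = 1) -> int:
    """Tokens a Transformer sees for a square image split into square patches."""
    side = image_size // vae_downsample   # side length the patches are cut from
    if side % patch_size:
        raise ValueError("side length must be divisible by patch_size")
    return (side // patch_size) ** 2


latent_tokens = num_tokens(256, patch_size=2, vae_downsample=8)  # 32x32 latent
pixel_tokens = num_tokens(256, patch_size=2)                     # raw pixels
print(latent_tokens, pixel_tokens, pixel_tokens // latent_tokens)
# 256 16384 64
```

Because self-attention cost grows quadratically with sequence length, a 64x shorter sequence reduces the attention cost far more than 64x.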
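So where can developers who are eager to try AI video generation get hands-on experience? Today we introduce Stable Video Diffusion (SVD). Let's try it with one-click Run on Huawei Cloud:
III. Image-to-Video Generation with the Stable Video Diffusion (SVD) Model
1. Case Overview
Stable Video Diffusion (SVD) is a diffusion model that takes a still image as a conditioning frame and generates a video from it.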
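🔹 This case requires PyTorch-1.8 with a GPU-V100 or higher specification.
🔹 Clicking Run in ModelArts takes you to ModelArts CodeLab, where you need to log in with your Huawei Cloud account. If you do not have an account, register one and complete real-name verification; see "ModelArts準備工作_簡易版" (ModelArts Preparation, Simplified Edition) for the steps. After logging in, wait a moment and you will enter the CodeLab runtime environment.
🔹 If you see Out Of Memory, check whether your parameter settings are too high. To work around it, lower the parameters, restart the kernel, or switch to a higher-specification resource ❗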
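Because SVD uses a single still image as its conditioning frame, it can help to bring your own image to the aspect ratio the model expects before running the case. The sketch below computes a center-crop box for a 1024x576 target, the resolution the public SVD checkpoints are documented to use. Treat that target and the helper name as assumptions, and check the sampling script if your checkpoint differs.

```python
def center_crop_box(width: int, height: int, target_w: int = 1024, target_h: int = 576):
    """Return (left, top, right, bottom) of the largest centered crop
    with the same aspect ratio as target_w x target_h."""
    if width * target_h > height * target_w:
        # Image is too wide: keep the full height and trim the sides.
        crop_w, crop_h = height * target_w // target_h, height
    else:
        # Image is too tall (or already matches): keep the full width.
        crop_w, crop_h = width, width * target_h // target_w
    left, top = (width - crop_w) // 2, (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(center_crop_box(1920, 1080))  # (0, 0, 1920, 1080)
print(center_crop_box(1000, 1000))  # (0, 219, 1000, 781)
```

The resulting box could be passed to an image library's crop function (for example Pillow's `Image.crop`), followed by a resize to 1024x576.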
2. Download the Code and Models
!git clone https://github.com/Stability-AI/generative-models.git
Cloning into 'generative-models'...
remote: Enumerating objects: 860, done.
remote: Counting objects: 100% (489/489), done.
remote: Compressing objects: 100% (222/222), done.
remote: Total 860 (delta 368), reused 267 (delta 267), pack-reused 371
Receiving objects: 100% (860/860), 42.67 MiB | 462.00 KiB/s, done.
Resolving deltas: 100% (445/445), done.
import moxing as mox

# Copy the modified encoder modules, the models, and the checkpoints from OBS
mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/Stable_Video_Diffusion/file/modify_file/generative-models/sgm/modules/encoders', 'generative-models/sgm/modules/encoders')
mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/Stable_Video_Diffusion/file/models', 'generative-models/models')
mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/case_zoo/Stable_Video_Diffusion/file/checkpoints', 'generative-models/checkpoints')
INFO:root:Using MoXing-v2.1.0.5d9c87c8-5d9c87c8
INFO:root:Using OBS-Python-SDK-3.20.9.1
3. Configure the Runtime Environment
This case requires Python 3.10.10 or later, so we first create a virtual environment:
!/home/ma-user/anaconda3/bin/conda create -n python-3.10.10 python=3.10.10 -y --override-channels --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install ipykernel
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done
import json
import os

# Kernel spec that makes the new conda environment selectable in the notebook
data = {
    "display_name": "python-3.10.10",
    "env": {
        "PATH": "/home/ma-user/anaconda3/envs/python-3.10.10/bin:/home/ma-user/anaconda3/envs/python-3.7.10/bin:/modelarts/authoring/notebook-conda/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/anaconda3/envs/PyTorch-1.8/bin"
    },
    "language": "python",
    "argv": [
        "/home/ma-user/anaconda3/envs/python-3.10.10/bin/python",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}"
    ]
}

kernel_dir = "/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/"
# Create the kernel directory if it does not exist, then write kernel.json into it
os.makedirs(kernel_dir, exist_ok=True)
with open(os.path.join(kernel_dir, "kernel.json"), "w") as f:
    json.dump(data, f, indent=4)
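Once it has been created, wait a moment or refresh the page, then click the kernel selector in the upper-right corner and choose python-3.10.10.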
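After switching kernels, a quick check like the following can confirm that the notebook is really running on the new interpreter. This is an optional sanity check rather than part of the original case, and the helper name is hypothetical.

```python
import sys


def meets_minimum(version=None, minimum=(3, 10, 10)) -> bool:
    """True if the given (major, minor, micro) version is at least `minimum`."""
    current = tuple(version or sys.version_info[:3])
    return current >= tuple(minimum)


print(sys.executable)    # should point inside the python-3.10.10 environment
print(meets_minimum())   # True once the python-3.10.10 kernel is active
```

If the second line prints False, the notebook is still attached to the old kernel and the kernel selection needs to be repeated.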
!pip install torch==2.0.1 torchvision==0.15.2
# moviepy.editor, used later for the GIF conversion, is only available in MoviePy 1.x
!pip install moviepy==1.0.3
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting torch==2.0.1
  Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/8c/4d/17e07377c9c3d1a0c4eb3fde1c7c16b5a0ce6133ddbabc08ceef6b7f2645/torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
     619.9/619.9 MB 5.6 MB/s eta 0:00:00
......
    Uninstalling decorator-5.1.1:
      Successfully uninstalled decorator-5.1.1
Successfully installed MoviePy-1.0.3 decorator-4.4.2 imageio-2.34.0 imageio_ffmpeg-0.4.9 proglog-0.1.10 tqdm-4.66.2
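Before moving on, it can be worth confirming that the active kernel actually sees the versions installed above. The following is a minimal sketch using only the standard library; the expected versions are taken from the install commands and log in this case, and the helper name is hypothetical.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Versions this case installs, per the commands and log above.
expected = {"torch": "2.0.1", "torchvision": "0.15.2", "moviepy": "1.0.3"}
for name, wanted in expected.items():
    found = installed_version(name)
    status = "ok" if found is not None and found.startswith(wanted) else "check"
    print(f"{name}: expected {wanted}, found {found} [{status}]")
```

Any line marked "check" suggests the package went into a different environment than the one the kernel is using.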
%cd generative-models
/home/ma-user/work/stable-video-diffusion/generative-models
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: using dhist requires you to install the `pickleshare` library.
  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
!pip install -r requirements/pt2.txt
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting clip@ git+https://github.com/openai/CLIP.git (from -r requirements/pt2.txt (line 3))
  Cloning https://github.com/openai/CLIP.git to /tmp/pip-install-_vzv4vq_/clip_4273bc4d2cba4d6486a222a5093fbe4b
......
conda3/envs/python-3.10.10/lib/python3.10/site-packages (from -r requirements/pt2.txt (line 32)) (4.66.2)
Collecting transformers==4.19.1 (from -r requirements/pt2.txt (line 33))
......
      Successfully uninstalled urllib3-2.2.1
Successfully installed PyWavelets-1.5.0 aiohttp-3.9.3 aiosignal-1.3.1 altair-5.2.0 antlr4-python3-runtime-4.9.3 appdirs-1.4.4 async-timeout-4.0.3 attrs-23.2.0 black-23.7.0 blinker-1.7.0 braceexpand-0.1.7 cachetools-5.3.2 chardet-5.1.0 click-8.1.7 clip-1.0 contourpy-1.2.0 cycler-0.12.1 docker-pycreds-0.4.0 einops-0.7.0 fairscale-0.4.13 fire-0.5.0 fonttools-4.49.0 frozenlist-1.4.1 fsspec-2024.2.0 ftfy-6.1.3 gitdb-4.0.11 gitpython-3.1.42 huggingface-hub-0.20.3 importlib-metadata-7.0.1 invisible-watermark-0.2.0 jsonschema-4.21.1 jsonschema-specifications-2023.12.1 kiwisolver-1.4.5 kornia-0.6.9 lightning-utilities-0.10.1 markdown-it-py-3.0.0 matplotlib-3.8.3 mdurl-0.1.2 multidict-6.0.5 mypy-extensions-1.0.0 natsort-8.4.0 ninja-1.11.1.1 omegaconf-2.3.0 open-clip-torch-2.24.0 opencv-python-4.6.0.66 pandas-2.2.0 pathspec-0.12.1 protobuf-3.20.3 pudb-2024.1 pyarrow-15.0.0 pydeck-0.8.1b0 pyparsing-3.1.1 pytorch-lightning-2.0.1 pytz-2024.1 pyyaml-6.0.1 referencing-0.33.0 regex-2023.12.25 rich-13.7.0 rpds-py-0.18.0 safetensors-0.4.2 scipy-1.12.0 sentencepiece-0.2.0 sentry-sdk-1.40.5 setproctitle-1.3.3 smmap-5.0.1 streamlit-1.31.1 streamlit-keyup-0.2.0 tenacity-8.2.3 tensorboardx-2.6 termcolor-2.4.0 timm-0.9.16 tokenizers-0.12.1 toml-0.10.2 tomli-2.0.1 toolz-0.12.1 torchaudio-2.0.2 torchdata-0.6.1 torchmetrics-1.3.1 transformers-4.19.1 tzdata-2024.1 tzlocal-5.2 urllib3-1.26.18 urwid-2.6.4 urwid-readline-0.13 validators-0.22.0 wandb-0.16.3 watchdog-4.0.0 webdataset-0.2.86 xformers-0.0.22 yarl-1.9.4 zipp-3.17.0
!pip install .
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Processing /home/ma-user/work/stable-video-diffusion/generative-models
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: sgm
  Building wheel for sgm (pyproject.toml) ... done
  Created wheel for sgm: filename=sgm-0.1.0-py3-none-any.whl size=127368 sha256=0f9ff6913b03b2e0354cd1962ecb2fc03df36dea90d14b27dc46620e6eafc9a0
  Stored in directory: /home/ma-user/.cache/pip/wheels/a9/b8/f4/e84140beaf1762b37f5268788964d58d91394ee17de04b3f9a
Successfully built sgm
Installing collected packages: sgm
Successfully installed sgm-0.1.0
4. Generate the Video
By default, the video is written to the outputs folder.
!python scripts/sampling/simple_video_sample.py --decoding_t 1 --input_path 'assets/test_image.png'
/home/ma-user/work/stable-video-diffusion/generative-models
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
VideoTransformerBlock is using checkpointing
Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Initialized embedder #3: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #4: ConcatTimestepEmbedderND with 0 params. Trainable: False
Restored from checkpoints/svd.safetensors with 0 missing and 0 unexpected keys
100%|███████████████████████████████████████| 890M/890M [00:50<00:00, 18.5MiB/s]
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
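Since the script writes its results under the outputs folder, a small helper can locate the newest video instead of relying on a fixed file name. This is a minimal sketch using only the standard library; the helper name is hypothetical, and it simply searches the folder recursively for .mp4 files.

```python
from pathlib import Path


def latest_video(output_dir: str = "outputs", pattern: str = "*.mp4"):
    """Return the most recently modified video under output_dir, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    videos = sorted(root.rglob(pattern), key=lambda p: p.stat().st_mtime)
    return videos[-1] if videos else None


print(latest_video())  # None if nothing has been generated yet
```

The returned path can be used wherever a video path is needed, for example when converting the result to a GIF.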
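# Convert the video file to an animated GIF for display
from moviepy.editor import VideoFileClip

# Path of the input video
video_path = "outputs/simple_video_sample/svd/000000.mp4"

# Load the video
clip = VideoFileClip(video_path)

# Output file name and frame rate for the GIF
output_file = "output_animation.gif"
fps = 10  # frames per second shown in the GIF

# Generate and save the GIF
clip.write_gif(output_file, fps=fps)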
MoviePy - Building file output_animation.gif with imageio.
from IPython.display import Image

Image(open('output_animation.gif', 'rb').read())
Go ahead and try AI video generation for yourself!