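LargeVis is a layout algorithm for visualizing large-scale graphs, and it performs well on both high-dimensional data visualization and network visualization. I explored it for my studies, and these are my notes.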

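Project repository and description

Project repository

Project description

My operating system is 64-bit Windows 10, so only the Windows part of the Readme is quoted here: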

#### Windows

To compile the source files, use Microsoft Visual Studio, where you need to set the BOOST path.

To install the Python wrapper, modify setup.py to make sure that the BOOST path is correctly set and then run python setup.py install.

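The description is brief, but the key points are:

The build has to be done with Visual Studio, so download and set up VS in advance. I used VS2017 Community, which is free; link: https://docs.microsoft.com/en-us/visualstudio/productinfo/vs2017-system-requirements-vs. During installation, selecting only "Desktop development with C++" is enough. (If you leave out a component, you can open the Visual Studio Installer at any time afterwards to modify or add it.)

The Boost path must also be set correctly. If Boost has never been installed on the machine, it has to be downloaded and installed first; the steps are outlined below.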

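Setting up Boost

About Boost

Boost is a portable, open-source collection of C++ libraries that complements the standard library and has been one of the driving forces behind C++ standardization. The project relies heavily on Boost functions, so Boost must be configured before building.

Downloading and installing Boost

Download it from the official site. The version used here is 1.69.0, at the following link: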

https://dl.bintray.com/boostorg/release/1.69.0/source/

Pick the following file:

boost_1_69_0.zip

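Unzip the archive once the download finishes.

The detailed installation steps are covered in standard Boost setup guides. Remember to add the environment variables afterwards.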

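Changes to the source code

With Boost and VS both set up, the build should in principle work.

First, follow the Readme and fix the Boost paths. The file to edit is setup.py in the Windows directory of the unpacked project: replace the two Boost paths in it with the ones from your own Boost installation.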
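Before editing setup.py, it can help to confirm that the path you are about to paste really is a Boost root. The following is a minimal sketch, not part of the project: the helper name `boost_version` is made up for illustration, and it only relies on the fact that a Boost source tree contains `boost/version.hpp` with a `BOOST_LIB_VERSION` define.

```python
import re
from pathlib import Path
from typing import Optional


def boost_version(boost_root: str) -> Optional[str]:
    """Return the Boost library version (e.g. "1_69") found under boost_root.

    Returns None when boost_root does not look like a Boost root, i.e. when
    boost/version.hpp is missing or has no BOOST_LIB_VERSION define.
    This helper is hypothetical; it is not part of LargeVis or Boost.
    """
    header = Path(boost_root) / "boost" / "version.hpp"
    if not header.is_file():
        return None
    text = header.read_text(errors="ignore")
    match = re.search(r'#define\s+BOOST_LIB_VERSION\s+"([^"]+)"', text)
    return match.group(1) if match else None
```

For example, `boost_version(r"C:\boost_1_69_0")` (a hypothetical install location) should give `"1_69"` for the release used here, and `None` if the path points at the wrong directory.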

Then, as instructed, open a command prompt in the Windows directory and run

python setup.py install

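This produced a long list of errors. The cause is that the project targets Python 2.7 while my machine has Python 3.7, and some of the C API names it uses no longer exist. A few places need to be changed, as follows.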

In LargeVismodule.cpp, change

real x = atof(PyString_AsString(PyObject_Str(PyList_GetItem(vec, j))));

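to the following. In Python 3, PyObject_Str returns a str (Unicode) object, so it has to be read with PyUnicode_AsUTF8; PyBytes_AsString compiles but returns NULL for a str object, which would crash atof at run time.

real x = atof(PyUnicode_AsUTF8(PyObject_Str(PyList_GetItem(vec, j))));

Then change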

PyMODINIT_FUNC initLargeVis()
{
	printf("LargeVis successfully imported!\n");
	Py_InitModule("LargeVis", PyExtMethods);
}

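to

// Module definition: the Python 3 replacement for Py_InitModule
static struct PyModuleDef LargeVismodule =
{
	PyModuleDef_HEAD_INIT,
	"LargeVis",    // module name
	NULL,          // no module docstring
	-1,            // module keeps its state in global variables
	PyExtMethods   // method table
};

// Python 3 looks for PyInit_<name> instead of init<name>
PyMODINIT_FUNC
PyInit_LargeVis(void)
{
	return PyModule_Create(&LargeVismodule);
}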
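After rebuilding with `python setup.py install`, a quick way to see whether the extension ended up where the interpreter can find it is to ask the import system. This is a small sketch using only the standard library; the function name `extension_available` is made up for illustration, and `"LargeVis"` is the module name taken from the module definition above.

```python
import importlib.util


def extension_available(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "LargeVis"  # module name declared in LargeVismodule.cpp
    status = "found" if extension_available(name) else "not found"
    print(name, status)
```

This only locates the module without importing it, so it will not catch errors raised while the extension initializes; a plain `import LargeVis` in the interpreter is the follow-up check.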

Running the example

The datasets in the Example folder are enough for a test. The Readme gives the following usage notes:

for Python,
python LargeVis_run.py -input -output
-input: Input file of feature vectors or networks (see the Example folders for input format).

-output: Output file of low-dimensional representations.

Besides the two parameters, other optional parameters include:

-fea: specify whether the input file is high-dimensional feature vectors (1) or networks (0). Default is 1.

-threads: Number of threads. Default is 8.

-outdim: The lower dimensionality LargesVis learns for visualization (usually 2 or 3). Default is 2.

-samples: Number of edge samples for graph layout (in millions). Default is set to data size / 100 (million).

-prop: Number of times for neighbor propagations in the state of K-NNG construction, usually less than 3. Default is 3.

-alpha: Initial learning rate. Default is 1.0.

-trees: Number of random-projection trees used for constructing K-NNG. 50 is sufficient for most cases unless you are dealing with very large datasets (e.g. data size over 5 million), and less trees are suitable for smaller datasets. Default is set according to the data size.

-neg: Number of negative samples used for negative sampling. Default is 5.

-neigh: Number of neighbors (K) in K-NNG, which is usually set as three times of perplexity. Default is 150.

-gamma: The weights assigned to negative edges. Default is 7.

-perp: The perplexity used for deciding edge weights in K-NNG. Default is 50.

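In short, for Python the invocation is:
python LargeVis_run.py -input -output
-input: the input file
-output: the output file

Other options:
-fea: whether the input file holds high-dimensional feature vectors (1) or a network (0); default 1
-threads: number of threads; default 8
-outdim: number of dimensions of the low-dimensional output used for visualization; default 2
-samples: number of edge samples used for graph layout, in millions; default data size / 100 (million)
-prop: number of neighbor-propagation rounds during K-NNG construction, usually less than 3; default 3
-alpha: initial learning rate for gradient descent; default 1.0
-trees: number of random-projection trees used to build the K-NNG; 50 is enough in most cases unless the dataset exceeds 5 million points, and fewer trees suit smaller datasets; by default chosen from the data size
-neg: number of negative samples; default 5
-neigh: number of neighbors (K) in the K-NNG, usually three times the perplexity; default 150
-gamma: weight assigned to negative edges; default 7
-perp: perplexity used to determine edge weights in the K-NNG; default 50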
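Since the flag names are easy to mistype, one option is to assemble the command line in Python and reject unknown flags before launching anything. The sketch below is an illustration only: `build_largevis_command` is a made-up helper, and the set of flags is copied from the Readme options listed above.

```python
import sys
from typing import Dict, List, Optional, Union

# Optional flags documented in the Readme (without the leading dash).
KNOWN_FLAGS = {"fea", "threads", "outdim", "samples", "prop", "alpha",
               "trees", "neg", "neigh", "gamma", "perp"}


def build_largevis_command(input_path: str,
                           output_path: str,
                           options: Optional[Dict[str, Union[int, float]]] = None,
                           script: str = "LargeVis_run.py") -> List[str]:
    """Build the argument list for LargeVis_run.py, rejecting unknown flags."""
    cmd = [sys.executable, script, "-input", input_path, "-output", output_path]
    for flag, value in (options or {}).items():
        if flag not in KNOWN_FLAGS:
            raise ValueError("unknown LargeVis option: -" + flag)
        cmd += ["-" + flag, str(value)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains LargeVis_run.py. A misspelling such as `{"gama": 7}` raises `ValueError` instead of being handed to the script.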

For the MNIST dataset in the example, run

python LargeVis_run.py -input mnist_vec784D.txt -output mnist_vec2D.txt -threads 16
python plot.py -input mnist_vec2D.txt -label mnist_label.txt -output mnist_vec2D_plot

to produce the output image.
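If you would rather load the 2D coordinates into your own plotting code than use plot.py, a parser along the following lines may be enough. It is a sketch under an assumption that should be checked against your own output file: that the file starts with a header line holding the number of points and the dimensionality, followed by one line of space-separated coordinates per point. The function name `parse_embedding` is made up for illustration.

```python
from typing import Iterable, List, Tuple


def parse_embedding(lines: Iterable[str]) -> List[Tuple[float, ...]]:
    """Parse an embedding file given as an iterable of text lines.

    Assumed layout (verify against your own output file):
      line 1:       "<number of points> <dimensionality>"
      other lines:  one point per line, coordinates separated by spaces
    """
    rows = iter(lines)
    header = next(rows).split()
    count, dim = int(header[0]), int(header[1])
    points = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Keep the last `dim` columns as the coordinates of this point.
        points.append(tuple(float(value) for value in fields[-dim:]))
    if len(points) != count:
        raise ValueError("header announces %d points, found %d" % (count, len(points)))
    return points
```

Usage would look like `with open("mnist_vec2D.txt") as fh: points = parse_embedding(fh)`, after which the tuples can be handed to any plotting library.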

References

Learning the LargeVis visualization technique, https://blog.csdn.net/sparkexpert/article/details/70702344#t0
Using LargeVis, https://blog.csdn.net/u010658028/article/details/79038472
Pitfalls I hit with LargeVis, https://blog.csdn.net/hensonwells/article/details/78493492
Compiler can't find Py_InitModule() .. is it deprecated and if so what should I use?, https://stackoverflow.com/questions/28305731/compiler-cant-find-py-initmodule-is-it-deprecated-and-if-so-what-should-i
