1. DataFrame to libsvm
First, let's look at the data we are working with:
2.000000 1.000000 38.500000 54.000000 20.000000 0.000000 1.000000 2.000000 2.000000 3.000000 4.000000 1.000000 2.000000 2.000000 5.900000 0.000000 2.000000 42.000000 6.300000 0.000000 0.000000 1.000000
There are 22 columns in total, and the last column is the label.
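For orientation, each line of a libsvm file has the form `label index:value index:value ...`. The sketch below converts a single whitespace-separated row into that layout. The helper name `row_to_libsvm` and its `start_index` parameter are hypothetical, and it assumes the label is the last field and that feature indices start at 1.

```python
def row_to_libsvm(row, start_index=1):
    """Convert one whitespace-separated row (label last) to a libsvm line."""
    fields = row.split()
    label, features = fields[-1], fields[:-1]
    # Pair every feature value with its index, counting from start_index
    pairs = ["%d:%s" % (i, v) for i, v in enumerate(features, start=start_index)]
    return " ".join([label] + pairs)


# Shortened example row: three features followed by the label
print(row_to_libsvm("2.000000 1.000000 38.500000 1.000000"))
# prints: 1.000000 1:2.000000 2:1.000000 3:38.500000
```

The values are kept as strings, so the output preserves the formatting of the input file.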
We first read the data in and convert it to a DataFrame (you could, of course, also convert it to libsvm directly).
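import os
import pandas as pd

# Read the TXT file
file_name = "***Test.txt"
data = []
with open(file_name, 'r') as file_data:
    for line in file_data:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split() with no argument handles both tabs and spaces
        data.append(line.split())

# Store the rows in a DataFrame
df = pd.DataFrame(data)

cwd = os.getcwd()  # current working directory
libsvmtxt = os.path.join(cwd, 'libsvm.txt')  # output TXT file

num = df.shape[0]        # number of rows
columns = df.shape[1]    # number of columns
label = df[columns - 1]  # the last column is the label

with open(libsvmtxt, 'w') as f:
    for j in range(num):  # range(num), so the last row is not skipped
        libsvm = ''
        for i in range(columns - 1):
            # libsvm feature indices conventionally start at 1
            libsvm += " %d:%s" % (i + 1, df.iat[j, i])
        svm_format = "%s%s\n" % (label[j], libsvm)
        f.write(svm_format)  # write one line per row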
This gives us the libsvm format we need.
2. libsvm to DataFrame
We can use load_svmlight_file directly:
import os
import pandas as pd
from sklearn.datasets import load_svmlight_file

cwd = os.getcwd()  # current working directory
file_name = os.path.join(cwd, 'libsvm.txt')
X_train, y_train = load_svmlight_file(file_name)
The feature data returned this way is a sparse matrix,
so it needs to be converted to a dense one:
# toarray() returns a plain NumPy array (todense() returns a np.matrix)
mat = X_train.toarray()
# X
df1 = pd.DataFrame(mat)
# y
df2 = pd.DataFrame(y_train, columns=['target'])
# Combine them
df = pd.concat([df2, df1], axis=1)  # the first column is target
df.to_csv("df_data.txt", index=False)
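The loading steps above can be wrapped in one small function. This is only a sketch: the name `libsvm_to_dataframe` and the two-row demo file are made up for illustration, and `zero_based` is left at its default so scikit-learn infers the indexing from the file.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_svmlight_file


def libsvm_to_dataframe(path, n_features=None):
    """Load a libsvm file into a DataFrame with 'target' as the first column."""
    X, y = load_svmlight_file(path, n_features=n_features)
    features = pd.DataFrame(X.toarray())  # sparse -> dense
    target = pd.Series(y, name="target")
    return pd.concat([target, features], axis=1)


# Tiny illustrative file: two rows, three features, 1-based indices.
# The second row omits feature 2, which is read back as 0.0.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny_libsvm.txt")
    with open(path, "w") as f:
        f.write("1 1:2.0 2:1.0 3:38.5\n")
        f.write("0 1:1.0 3:39.2\n")
    tiny = libsvm_to_dataframe(path)

print(tiny)
```

Passing `n_features` explicitly can help when several files must end up with the same number of columns, since the inferred count otherwise depends on the largest index in each file.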
Python sklearn.datasets.dump_svmlight_file() Examples:
https://www.programcreek.com/python/example/104697/sklearn.datasets.dump_svmlight_file
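As an alternative to writing the libsvm lines by hand, `dump_svmlight_file` can do the export. The following is a minimal sketch on a made-up two-row frame (`demo_df` and the file name are illustrative only), assuming the label sits in the last column. Note that it omits zero-valued features, so its output is sparser than the hand-written file from section 1.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Tiny illustrative frame: three feature columns, label in the last column
demo_df = pd.DataFrame([[2.0, 1.0, 38.5, 1.0],
                        [1.0, 0.0, 39.2, 0.0]])
X = demo_df.iloc[:, :-1].to_numpy()
y = demo_df.iloc[:, -1].to_numpy()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "libsvm_dump.txt")
    # zero_based=False writes feature indices starting at 1
    dump_svmlight_file(X, y, path, zero_based=False)
    # Read the file back with the same indexing convention to check it
    X_back, y_back = load_svmlight_file(path, zero_based=False)

print(X_back.toarray())
print(y_back)
```

Using the same `zero_based` setting for writing and reading avoids an off-by-one shift in the feature columns.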