NLP Practice - Task 3

1. TF-IDF

TF-IDF reference: https://www.cnblogs.com/pinard/p/6693230.html

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["I come to China to travel",
          "This is a car popular in China",
          "I love tea and Apple",
          "The work is to write some papers in science"]

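# max_features: keep at most this many terms, chosen by term frequency across the corpus
# min_df: ignore terms whose document frequency is below this value; int (number of documents) or float (proportion of documents)
# max_df: ignore terms whose document frequency is above this value; int (number of documents) or float (proportion of documents)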
tfidf_model = TfidfVectorizer(max_features=5, min_df=2, max_df=5).fit_transform(corpus)
print(tfidf_model.todense())
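
As a rough check on what the call above keeps, the sketch below fits the same vectorizer, lists the surviving terms, and recomputes one row by hand. It assumes scikit-learn's default settings (smooth_idf=True, norm="l2", and a default tokenizer that drops one-letter words such as "I" and "a"); the names vectorizer, terms, and expected_row0 are illustrative and do not come from the original post.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["I come to China to travel",
          "This is a car popular in China",
          "I love tea and Apple",
          "The work is to write some papers in science"]

# Keep the vectorizer object so the learned vocabulary and idf values can be inspected.
vectorizer = TfidfVectorizer(max_features=5, min_df=2, max_df=5)
tfidf = vectorizer.fit_transform(corpus)

# Terms ordered by column index. Only words that occur in at least
# two documents survive min_df=2.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(terms)

# With smoothing (assumed default), idf = ln((1 + n_docs) / (1 + df)) + 1.
# Every surviving term here appears in exactly 2 of the 4 documents.
expected_idf = np.log((1 + 4) / (1 + 2)) + 1
print(vectorizer.idf_, expected_idf)

# First document: "china" once, "to" twice, equal idf, then L2 normalisation.
expected_row0 = np.array([1.0, 0.0, 0.0, 2.0]) / np.sqrt(5.0)
dense = tfidf.toarray()
print(dense[0], expected_row0)
```

The third document shares no word with any other document, so its row comes out as all zeros under these settings.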

2. Mutual Information

Mutual information reference: https://blog.csdn.net/u013710265/article/details/72848755
Feature selection reference 1: https://www.jianshu.com/p/b3056d10a20f
Feature selection reference 2: https://baijiahao.baidu.com/s?id=1604074325918456186&wfr=spider&for=pc

from sklearn import datasets
from sklearn import metrics as mr

iris = datasets.load_iris()
x = iris.data
y = iris.target

x0 = x[:, 0]
x1 = x[:, 1]
x2 = x[:, 2]
x3 = x[:, 3]

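# Compute the mutual information between each feature column and the label y.
# Note: mutual_info_score treats every distinct value as a discrete label,
# so the continuous iris measurements are matched by exact value, not binned.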
print(mr.mutual_info_score(x0, y))
print(mr.mutual_info_score(x1, y))
print(mr.mutual_info_score(x2, y))
print(mr.mutual_info_score(x3, y))
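
Because the iris features are continuous, one possible alternative for feature selection is scikit-learn's mutual_info_classif, which estimates mutual information with a nearest-neighbour method instead of matching exact values. The sketch below is only an illustration under that assumption; random_state=0 is an arbitrary choice that makes the estimate repeatable, and the variable name mi is not from the original post.

```python
from sklearn import datasets
from sklearn.feature_selection import mutual_info_classif

iris = datasets.load_iris()

# Estimate the mutual information between each continuous feature and the class label.
# The result is one non-negative score per feature column.
mi = mutual_info_classif(iris.data, iris.target, random_state=0)

# Print the features from most to least informative.
for name, score in sorted(zip(iris.feature_names, mi), key=lambda p: p[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Higher scores suggest a feature carries more information about the label, which is how these values are typically used to rank features.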

 
