fetch_california_housing fails with urllib.error.HTTPError: HTTP Error 403: Forbidden

Problem: loading the California housing dataset through sklearn:

from sklearn.datasets import fetch_california_housing, get_data_home
import numpy as np

print(get_data_home())
features, labels = fetch_california_housing(return_X_y=True)

print(features.shape, labels.shape)

The call raises the following error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

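Solution
Open the file ...\site-packages\sklearn\datasets\_california_housing.py. Around line 42 (the exact line depends on the scikit-learn version) a comment gives the link to the dataset: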

# The original data can be found at:
# https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz

Download the dataset manually and put it in the folder returned by get_data_home():

from sklearn.datasets import fetch_california_housing, get_data_home
print(get_data_home())
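The manual download can also be scripted. The sketch below is one way to do it with only the standard library. `download_archive` is a hypothetical helper, and sending a browser-like `User-Agent` header is an assumption about why a server might answer 403, not something confirmed for this host. The helper keeps the archive's original file name; which file name your scikit-learn version looks for in that folder is not covered here.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Link quoted in the comment inside _california_housing.py.
DATA_URL = "https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz"


def download_archive(url, dest_dir, user_agent="Mozilla/5.0"):
    """Hypothetical helper: save `url` into `dest_dir` and return the saved path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Keep the file name from the URL, e.g. cal_housing.tgz.
    target = dest_dir / Path(urlparse(url).path).name
    # Assumption: a browser-like User-Agent may avoid a 403 on some servers.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For example, `download_archive(DATA_URL, get_data_home())` would save cal_housing.tgz into the scikit-learn data folder. If the server still refuses the request, downloading through a browser and copying the file by hand remains the fallback.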

Finally, edit _california_housing.py around line 154: comment out the joblib.load call and read the downloaded archive directly:

        #cal_housing = joblib.load(filepath)
        with tarfile.open(mode="r:gz", name=filepath) as f:
            cal_housing = np.loadtxt(
                f.extractfile("CaliforniaHousing/cal_housing.data"), delimiter=","
            )
            # Columns are not in the same order compared to the previous
            # URL resource on lib.stat.cmu.edu
            columns_index = [8, 7, 2, 3, 4, 5, 6, 1, 0]
            cal_housing = cal_housing[:, columns_index]

Then run:

from sklearn.datasets import fetch_california_housing, get_data_home
import numpy as np

print(get_data_home())
features, labels = fetch_california_housing(return_X_y=True)

print(features.shape, labels.shape)
print(features[0])
print(labels[0])

The output is:

(20640, 8) (20640,)
[ 8.3252 41. 6.98412698 1.02380952 322.
2.55555556 37.88 -122.23 ]
4.526
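
If you would rather not edit the installed package, the parsing done in the patch above can also live outside scikit-learn. The following is a minimal sketch using only the standard library. `load_cal_housing` is a hypothetical helper and the archive path is whatever you saved the download as. The derived columns (rooms, bedrooms and occupancy per household, target in units of 100,000) are my understanding of what fetch_california_housing computes after loading, so treat them as an assumption to check against your scikit-learn version.

```python
import csv
import io
import tarfile

# Same reordering as the patch above: target first, then the features.
COLUMNS_INDEX = [8, 7, 2, 3, 4, 5, 6, 1, 0]
MEMBER = "CaliforniaHousing/cal_housing.data"


def load_cal_housing(archive_path):
    """Hypothetical helper: return (features, labels) as plain Python lists."""
    features, labels = [], []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        raw = archive.extractfile(MEMBER)
        text = io.TextIOWrapper(raw, encoding="ascii")
        for row in csv.reader(text):
            if not row:
                continue  # skip blank lines
            values = [float(cell) for cell in row]
            ordered = [values[i] for i in COLUMNS_INDEX]
            target, data = ordered[0], ordered[1:]
            households = data[5]
            # Assumption: per-household averages, as scikit-learn derives them.
            data[2] /= households             # average rooms
            data[3] /= households             # average bedrooms
            data[5] = data[4] / households    # average occupancy
            features.append(data)
            labels.append(target / 100000.0)  # target in units of 100,000
    return features, labels
```

If NumPy arrays are needed downstream, np.asarray(features) and np.asarray(labels) convert the lists; the shapes and first row can then be compared with the output above.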
