Remaining Useful Life Prediction on the C-MAPSS Dataset with a CNN (PyTorch)

1. Background

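In engineering, it is important to understand not only how a system or component performs today but also how its performance degrades over time. This is the domain of prognostics, which tries to predict the future state of a system or component from its past and present states. A common problem in this field is estimating remaining useful life (RUL): how long a system or component will keep functioning. Two well-known datasets for this problem are the PHM and C-MAPSS datasets. They contain simulated sensor data recorded over time from different turbofan engines and have been used to study RUL estimation.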

2. The C-MAPSS dataset

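Dataset: FD001
Training trajectories: 100
Test trajectories: 100
Conditions: one (sea level)
Fault modes: one (HPC degradation)

Dataset: FD002
Training trajectories: 260
Test trajectories: 259
Conditions: six
Fault modes: one (HPC degradation)

Dataset: FD003
Training trajectories: 100
Test trajectories: 100
Conditions: one (sea level)
Fault modes: two (HPC degradation, fan degradation)

Dataset: FD004
Training trajectories: 248
Test trajectories: 249
Conditions: six
Fault modes: two (HPC degradation, fan degradation)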

Experimental scenario

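Each dataset consists of multiple multivariate time series and is further divided into training and test subsets. Each time series comes from a different engine, so the data can be regarded as coming from a fleet of engines of the same type. Each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. This wear and variation is considered normal, i.e. it is not a fault condition. Three operational settings have a substantial effect on engine performance, and these settings are included in the data. The data are contaminated with sensor noise.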

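At the start of each time series the engine is operating normally, and a fault develops at some point during the series. In the training set, the fault grows in severity until the system fails. In the test set, the time series ends some time before system failure. The objective of the competition is to predict the number of operational cycles remaining before failure in the test set, i.e. how many more cycles the engine will run after the last recorded cycle. A vector of true remaining useful life (RUL) values for the test data is also provided.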
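
The labeling rule implied by this description can be sketched on toy data. The following is a minimal illustration, not the script from section 3: the two engines, their cycle counts, and the value 5 for the true remaining life are made up for the example. For a run-to-failure training trajectory, the RUL of a row is the engine's last cycle minus the current cycle; for a truncated test trajectory, the true remaining life at the last observed cycle is added first.

```python
import pandas as pd

# Toy training data: two hypothetical engines that fail after 4 and 3 cycles.
toy = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "cycle": [1, 2, 3, 4, 1, 2, 3],
})

# Training RUL = last recorded cycle of that engine - current cycle.
toy["RUL"] = toy.groupby("id")["cycle"].transform("max") - toy["cycle"]
print(toy["RUL"].tolist())  # [3, 2, 1, 0, 2, 1, 0]

# Toy test trajectory: recording stops at cycle 3, and the separate RUL
# vector says 5 cycles remain at that point (hypothetical value).
last_cycle, remaining = 3, 5
test_rul = [last_cycle + remaining - c for c in (1, 2, 3)]
print(test_rul)  # [7, 6, 5]
```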

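The data are provided as a zip-compressed text file with 26 columns of numbers, separated by spaces. Each row is a snapshot of data taken during a single operational cycle, and each column is a different variable. The columns correspond to:
1) unit number
2) time, in cycles
3) operational setting 1
4) operational setting 2
5) operational setting 3
6) sensor measurement 1
7) sensor measurement 2
...
26) sensor measurement 21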
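
As an alternative to the approach used in section 3 (reading with sep=" " and then dropping the two extra columns), the file can be parsed with a whitespace regex separator so that only the 26 real columns appear. The sketch below is a hedged illustration: the two rows are made-up values in the same layout, and the column names simply follow the list above.

```python
import io
import pandas as pd

# Column names following the list above: 2 index columns, 3 settings, 21 sensors.
COLUMNS = (["id", "cycle", "setting1", "setting2", "setting3"]
           + ["s" + str(i) for i in range(1, 22)])

# Two made-up rows in the same space-separated layout, with trailing spaces
# at the end of each line (values are illustrative only).
sample = "\n".join(
    " ".join(["1", str(cycle)] + ["0.5"] * 24) + "  "
    for cycle in (1, 2)
)

# sep=r"\s+" treats any run of whitespace as a single separator, so the
# trailing spaces do not turn into extra empty columns.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=COLUMNS)
print(df.shape)  # (2, 26)
```

With a real file, the io.StringIO(sample) argument would be replaced by the file path, for example 'train_FD001.txt'.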

Download from Baidu Netdisk:
Link: https://pan.baidu.com/s/1RXJhR3iiZGbxi4c1MbndhQ
Extraction code: nr9l

3. Code

import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
from sklearn import preprocessing
import numpy as np
import torch.utils.data as Data
import matplotlib.pyplot as plt


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.manual_seed(2020)

train_df = pd.read_csv('train_FD001.txt', sep=" ", header=None)  # train_df.shape=(20631, 28)
train_df.drop(train_df.columns[[26, 27]], axis=1, inplace=True)  # drop the two empty columns 26 and 27 (from trailing spaces) in place
train_df.columns = ['id', 'cycle', 'setting1', 'setting2', 'setting3', 's1', 's2', 's3', 's4', 's5',
                    's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17',
                    's18', 's19', 's20', 's21']
# Sort by 'id' first, then by 'cycle' within each id
train_df = train_df.sort_values(['id', 'cycle'])

test_df = pd.read_csv('test_FD001.txt', sep=" ", header=None)
test_df.drop(test_df.columns[[26, 27]], axis=1, inplace=True)
test_df.columns = ['id', 'cycle', 'setting1', 'setting2', 'setting3', 's1', 's2', 's3', 's4', 's5',
                   's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17',
                   's18', 's19', 's20', 's21']

truth_df = pd.read_csv('RUL_FD001.txt', sep=" ", header=None)
truth_df.drop(truth_df.columns[[1]], axis=1, inplace=True)

"""Data Labeling - generate column RUL"""
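# Group by 'id' and take the maximum 'cycle' of each group; the result is indexed by id,
# so reset_index() turns 'id' back into a regular column with a plain integer index
rul = pd.DataFrame(train_df.groupby('id')['cycle'].max()).reset_index()
rul.columns = ['id', 'max']
# Merge rul into train_df on 'id', so every row carries the max cycle of its engine as the last column
train_df = train_df.merge(rul, on=['id'], how='left')
# Add the 'RUL' column: cycles left until the last recorded cycle of that engine
train_df['RUL'] = train_df['max'] - train_df['cycle']
# Drop the helper column 'max' from train_df
train_df.drop('max', axis=1, inplace=True)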


"""MinMax normalization train"""
# Copy the 'cycle' column into a new column 'cycle_norm'
train_df['cycle_norm'] = train_df['cycle']
# Select every column except 'id', 'cycle' and 'RUL' for normalization
cols_normalize = train_df.columns.difference(['id', 'cycle', 'RUL'])
# Min-max scale each selected column; the scaler is fitted on the training set
min_max_scaler = preprocessing.MinMaxScaler()
norm_train_df = pd.DataFrame(min_max_scaler.fit_transform(train_df[cols_normalize]),
                             columns=cols_normalize,
                             index=train_df.index)
# Join the excluded columns back onto the scaled columns
join_df = train_df[train_df.columns.difference(cols_normalize)].join(norm_train_df)
# Restore the original column order
train_df = join_df.reindex(columns=train_df.columns)

"""MinMax normalization test"""
# Same steps as above, reusing the scaler fitted on the training set; the test set has no 'RUL' column yet
test_df['cycle_norm'] = test_df['cycle']
norm_test_df = pd.DataFrame(min_max_scaler.transform(test_df[cols_normalize]),
                            columns=cols_normalize,
                            index=test_df.index)
test_join_df = test_df[test_df.columns.difference(cols_normalize)].join(norm_test_df)
test_df = test_join_df.reindex(columns=test_df.columns)
test_df = test_df.reset_index(drop=True)


"""generate column max for test data"""
# First column is id, second column is the largest cycle recorded for that id
rul = pd.DataFrame(test_df.groupby('id')['cycle'].max()).reset_index()
# Rename the columns to id and max
rul.columns = ['id', 'max']
# Name the single data column of the RUL file 'more'
truth_df.columns = ['more']
# Add an id column to truth_df: its index plus one
truth_df['id'] = truth_df.index + 1
# Add a max column to truth_df: for each id, the last recorded cycle in the test set
# plus the true remaining life from the RUL file, i.e. the cycle at which the engine fails
truth_df['max'] = rul['max'] + truth_df['more']
# Drop the 'more' column from truth_df
truth_df.drop('more', axis=1, inplace=True)


"""generate RUL for test data"""
test_df = test_df.merge(truth_df, on=['id'], how='left')
test_df['RUL'] = test_df['max'] - test_df['cycle']
test_df.drop('max', axis=1, inplace=True)

"""
test_df(13096, 28)

   id  cycle  setting1  setting2  ...       s20       s21  cycle_norm  RUL
0   1      1  0.632184  0.750000  ...  0.558140  0.661834     0.00000  142
1   1      2  0.344828  0.250000  ...  0.682171  0.686827     0.00277  141
2   1      3  0.517241  0.583333  ...  0.728682  0.721348     0.00554  140
3   1      4  0.741379  0.500000  ...  0.666667  0.662110     0.00831  139
...
"""

"""pick a large window size of 50 cycles"""
sequence_length = 50


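def gen_sequence(id_df, seq_length, seq_cols):
    """Yield sliding windows of seq_length consecutive rows for one engine.

    For num_elements rows this yields num_elements - seq_length windows,
    each of shape (seq_length, len(seq_cols)).
    """
    data_array = id_df[seq_cols].values
    num_elements = data_array.shape[0]
    for start, stop in zip(range(0, num_elements - seq_length), range(seq_length, num_elements)):
        yield data_array[start:stop, :]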


"""pick the feature columns"""
sensor_cols = ['s' + str(i) for i in range(1, 22)]
sequence_cols = ['setting1', 'setting2', 'setting3', 'cycle_norm']
sequence_cols.extend(sensor_cols)
'''
sequence_cols=['setting1', 'setting2', 'setting3', 'cycle_norm', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 
's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']
'''
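# Example for engine 1. Arguments to gen_sequence(): the training rows with id == 1,
# the window length 50, and the feature columns listed above
val = list(gen_sequence(train_df[train_df['id'] == 1], sequence_length, sequence_cols))
val_array = np.array(val)  # val_array.shape=(142, 50, 25); engine 1 has 192 cycles, 142 = 192 - 50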

'''
sequence_length= 50
sequence_cols= ['setting1', 'setting2', 'setting3', 'cycle_norm', 's1', 's2', 's3', 's4', 's5', 's6', 
's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']

train_df[train_df['id'] == 1]=

id  cycle  setting1  setting2  ...       s20       s21  RUL  cycle_norm
0     1      1  0.459770  0.166667  ...  0.713178  0.724662  191    0.000000
1     1      2  0.609195  0.250000  ...  0.666667  0.731014  190    0.002770
2     1      3  0.252874  0.750000  ...  0.627907  0.621375  189    0.005540
3     1      4  0.540230  0.500000  ...  0.573643  0.662386  188    0.008310
4     1      5  0.390805  0.333333  ...  0.589147  0.704502  187    0.011080
..   ..    ...       ...       ...  ...       ...       ...  ...         ...
187   1    188  0.114943  0.750000  ...  0.286822  0.089202    4    0.518006
188   1    189  0.465517  0.666667  ...  0.263566  0.301712    3    0.520776
189   1    190  0.344828  0.583333  ...  0.271318  0.239299    2    0.523546
190   1    191  0.500000  0.166667  ...  0.240310  0.324910    1    0.526316
191   1    192  0.551724  0.500000  ...  0.263566  0.097625    0    0.529086

[192 rows x 28 columns]
'''


# Build the list of windows for every engine id in the training set
seq_gen = (list(gen_sequence(train_df[train_df['id'] == id], sequence_length, sequence_cols))
           for id in train_df['id'].unique())

# Generate the windows and convert them to a NumPy array.
# train_FD001.txt holds 100 engines; windowing removes sequence_length rows from each one:
# 20631 - 100*50 = 15631
seq_array = np.concatenate(list(seq_gen)).astype(np.float32)  # seq_array.shape=(15631, 50, 25)
seq_tensor = torch.tensor(seq_array)
# Add a channel dimension for Conv2d: (15631, 50, 25) -> (15631, 1, 50, 25)
seq_tensor = seq_tensor.unsqueeze(1).to(device)
print("seq_tensor_shape=", seq_tensor.shape)
print(seq_tensor[0].shape)


"""generate labels"""


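def gen_labels(id_df, seq_length, label):
    """Return one label per window: rows seq_length onward, so window k pairs with row k + seq_length."""
    data_array = id_df[label].values
    num_elements = data_array.shape[0]
    return data_array[seq_length:num_elements, :]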


label_gen = [gen_labels(train_df[train_df['id'] == id], sequence_length, ['RUL'])
             for id in train_df['id'].unique()]

label_array = np.concatenate(label_gen).astype(np.float32)  # label_array.shape=(15631, 1)
label_scale = (label_array-np.min(label_array))/(np.max(label_array)-np.min(label_array))

label_tensor = torch.tensor(label_scale)
label_tensor = label_tensor.view(-1)
label_tensor = label_tensor.to(device)
print("label=", label_tensor[:142])


num_sample = len(label_array)
print("num_sample=", num_sample)
# The next three values are not used by the CNN below
input_size = seq_array.shape[2]
hidden_size = 100
num_layers = 2


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            torch.nn.Conv2d(  # input (N, 1, 50, 25) --> convolution output (N, 20, 52, 27)
                in_channels=1,  # each window is treated as a single-channel image
                out_channels=20,  # number of output channels
                kernel_size=3,  # square 3x3 kernel
                stride=1,  # stride of 1
                padding=2  # two rings of zeros around the input; with a 3x3 kernel the map grows to 52 x 27
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)  # 2x2 max pooling --> (N, 20, 26, 13)
        )
        self.fc = nn.Linear(20*26*13, 1)  # flattened conv1 output (N, 20*26*13) --> fully connected --> (N, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(x.size(0), -1)  # flatten the conv1 output
        x = self.fc(x)
        return x


cnn = CNN().to(device)
print(cnn)

optimizer = torch.optim.Adam(cnn.parameters(), lr=0.01)   # optimize all cnn parameters
loss_func = nn.MSELoss()   # regression on the min-max-scaled RUL

for epoch in range(20):
    for i in range(0, 142):   # train on the 142 windows of engine 1 only, one window per step
        b_x = seq_tensor[i].view(1, 1, 50, 25)
        b_y = label_tensor[i].view(1, 1)  # match the (1, 1) output shape so MSELoss does not broadcast
        output = cnn(b_x)               # cnn output
        loss = loss_func(output, b_y)   # mean squared error
        optimizer.zero_grad()           # clear gradients for this training step
        loss.backward()                 # backpropagation, compute gradients
        optimizer.step()                # apply gradients

output = cnn(seq_tensor[0:192-50])   # predict on the 142 windows of engine 1
output = output.cpu().detach().numpy()
label_array = label_tensor[0:192-50].cpu().detach().numpy()
plt.plot(output)
plt.plot(label_array)
plt.show()


'''
seq_array_tensor = torch.tensor(seq_array, dtype=torch.float32)
print(seq_array_tensor.shape)
seq_array_tensor = seq_array_tensor.view(15631, 1, 50, 25)
print(seq_array_tensor.shape)
input_tensor = seq_array_tensor
out_put = cnn(seq_array_tensor)
print(out_put.shape)
'''
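
The size 20*26*13 passed to nn.Linear in the CNN class can be checked by hand with the standard output-size formula for convolution and pooling layers. The sketch below is plain Python arithmetic under the settings used in the class (kernel 3, stride 1, padding 2, then 2x2 pooling with its default stride of 2); conv_out is a helper name introduced here for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# Conv2d(kernel_size=3, stride=1, padding=2) applied to a 50 x 25 window
h = conv_out(50, kernel=3, stride=1, padding=2)
w = conv_out(25, kernel=3, stride=1, padding=2)
print(h, w)  # 52 27

# MaxPool2d(kernel_size=2): the stride defaults to the kernel size
h = conv_out(h, kernel=2, stride=2)
w = conv_out(w, kernel=2, stride=2)
print(h, w)  # 26 13

# Number of features after flattening 20 channels
flat_features = 20 * h * w
print(flat_features)  # 6760
```

If the window length, the number of feature columns, or the padding changes, the first argument of nn.Linear has to be recomputed the same way.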

4. Results

The orange curve is the ground truth and the blue curve is the prediction.
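
Both curves are on the min-max-scaled label axis, because the labels were scaled to [0, 1] before training. To read the prediction in engine cycles, the scaling can be inverted, and a single error figure such as RMSE can be computed. The sketch below is only an illustration: unscale and rmse are helper names introduced here, and all numbers are made up rather than taken from the model above.

```python
import numpy as np

def unscale(scaled, rul_min, rul_max):
    """Invert min-max scaling: map values in [0, 1] back to cycles."""
    return np.asarray(scaled, dtype=float) * (rul_max - rul_min) + rul_min

def rmse(pred, true):
    """Root-mean-square error between two equally shaped arrays."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Hypothetical values for illustration only (not results from the script).
rul_min, rul_max = 0.0, 200.0
true_scaled = np.array([0.50, 0.25, 0.00])
pred_scaled = np.array([0.55, 0.20, 0.05])

true_cycles = unscale(true_scaled, rul_min, rul_max)  # about [100, 50, 0]
pred_cycles = unscale(pred_scaled, rul_min, rul_max)  # about [110, 40, 10]
print(rmse(pred_cycles, true_cycles))
```

In the script, rul_min and rul_max would be the minimum and maximum of the unscaled training labels, which have to be saved before label_array is overwritten for plotting.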
