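Preface

In the previous article, I described how to write a crawler to scrape a large amount of MIDI data from the Free Midi Files Download website. This article covers how to use the pretty_midi library to convert MIDI files into matrices, and how to build a dataset with PyTorch's Dataset class, in preparation for feeding tensors into later training and testing.

Implementation

Converting MIDI files into sparse matrix data and saving it

The first step in building the dataset is to extract the musical information in each MIDI file as a (time, pitch) matrix and save it to an npz file in sparse form. With pretty_midi you can iterate over the notes (Note) of each instrument track and read each note's pitch (pitch), start time (start) and end time (end), where both times are in seconds. Dividing the start and end times by the length of a sixteenth note (60 s / 120 BPM / 4 = 0.125 s) gives the positions of the start and end along the time axis of the matrix. The code assumes a fixed tempo of 120 BPM.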
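As a rough illustration of the quantization described above, the sketch below maps a note's start and end times (in seconds) to cells on a sixteenth-note grid. It assumes a fixed 120 BPM and four-bar segments of 64 steps; the helper name `note_to_cells` is hypothetical and is not part of pretty_midi or the repository.

```python
SIXTEENTH_LENGTH = 60 / 120 / 4  # seconds per sixteenth note at 120 BPM (0.125 s)
STEPS_PER_SEGMENT = 64           # 4 bars * 16 sixteenth notes per bar


def note_to_cells(start_sec, end_sec):
    """Return the (segment, step) cells covered by a note.

    Hypothetical helper for illustration; assumes 120 BPM and 4/4 time.
    """
    start = int(start_sec / SIXTEENTH_LENGTH)
    end = int(end_sec / SIXTEENTH_LENGTH)
    # divmod splits the absolute step into (four-bar segment, step within segment)
    return [divmod(t, STEPS_PER_SEGMENT) for t in range(start, end)]


# A note held from 7.875 s to 8.25 s crosses the boundary between segments 0 and 1.
cells = note_to_cells(7.875, 8.25)
print(cells)
```

A note that spans a segment boundary is simply split across the two segments, which is also what the loop over `range(start, end)` in the function below does.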

For the code, see MusicCritique/util/data/create_database.py

import os

import numpy as np
import pretty_midi

# get_midi_collection() and get_genre_collection() are helpers defined elsewhere
# in the repository; they are used here as pymongo collections.


def generate_nonzeros_by_notes():
    root_dir = 'E:/merged_midi/'

    midi_collection = get_midi_collection()
    genre_collection = get_genre_collection()
    for genre in genre_collection.find():
        genre_name = genre['Name']
        print(genre_name)
        npy_file_root_dir = 'E:/midi_matrix/one_instr/' + genre_name + '/'
        # makedirs also creates missing parent directories
        os.makedirs(npy_file_root_dir, exist_ok=True)

        for midi in midi_collection.find({'Genre': genre_name, 'OneInstrNpyGenerated': False}, no_cursor_timeout=True):
            path = root_dir + genre_name + '/' + midi['md5'] + '.mid'
            save_path = npy_file_root_dir + midi['md5'] + '.npz'
            pm = pretty_midi.PrettyMIDI(path)
            note_range = (24, 108)  # keep MIDI pitches 24..107
            nonzeros = []
            sixteenth_length = 60 / 120 / 4  # seconds per sixteenth note at 120 BPM
            for instr in pm.instruments:
                if instr.is_drum:
                    continue  # drum tracks are ignored
                for note in instr.notes:
                    start = int(note.start / sixteenth_length)
                    end = int(note.end / sixteenth_length)
                    pitch = note.pitch
                    if pitch < note_range[0] or pitch >= note_range[1]:
                        continue
                    pitch -= note_range[0]  # shift to 0..83
                    for time_raw in range(start, end):
                        segment = time_raw // 64  # index of the four-bar segment
                        time = time_raw % 64      # step inside that segment
                        nonzeros.append([segment, time, pitch])

            nonzeros = np.array(nonzeros)
            np.savez_compressed(save_path, nonzeros)
            midi_collection.update_one({'_id': midi['_id']}, {'$set': {'OneInstrNpyGenerated': True}})
            # count_documents replaces Collection.count, which was removed in pymongo 4
            done = midi_collection.count_documents({'Genre': genre_name, 'OneInstrNpyGenerated': True})
            total = midi_collection.count_documents({'Genre': genre_name})
            print('Progress: {:.2%}'.format(done / total))

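To make storage easier, I split each MIDI file into units of four bars. The shortest duration considered is a sixteenth note, so the first dimension of each matrix has size 64 (4 × 16, assuming 4/4 time) and represents how the notes are distributed over time.
MIDI pitch numbers range from 0 to 127, covering more than 10 octaves from C-1 to G9; for the mapping, see MIDI NOTE NUMBERS AND CENTER FREQUENCIES. Many of these pitches almost never occur in real music. To make the matrix denser, values that are too high or too low are ignored, and only notes numbered 24 to 107 are kept, that is, the 84 pitches from C1 to B7, which is roughly the range of a piano.
Finally, again to make the matrix denser and to improve training, I merge all instrument tracks except the drum track and record their notes together, without distinguishing between instruments.

With these three points in mind, the matrix obtained from each MIDI file has the shape [number of four-bar segments × 1 × 64 × 84]. To reduce the space used, what is saved to file is the coordinates of each non-zero point in the matrix; the sparse matrix can be rebuilt from these coordinates later.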
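To make the storage idea concrete, here is a minimal sketch with toy data (not the author's code) showing how a dense piano roll can be reduced to the coordinates of its non-zero cells and then rebuilt, with the size-1 axis added afterwards. The values in it are made up for illustration.

```python
import numpy as np

# Toy piano roll: 2 four-bar segments, 64 time steps, 84 pitches.
dense = np.zeros((2, 64, 84), dtype=bool)
dense[0, 0:4, 36] = True   # pitch index 36 (MIDI note 60) held for four sixteenth notes
dense[1, 10, 40] = True

# Store only the coordinates of the non-zero cells.
nonzeros = np.argwhere(dense)          # shape: (number of active cells, 3)

# Rebuild the dense matrix from the coordinates, then add the axis of size 1.
restored = np.zeros(dense.shape, dtype=bool)
restored[tuple(nonzeros.T)] = True
batch = restored[:, np.newaxis, :, :]  # shape: (2, 1, 64, 84)

print(nonzeros.shape, batch.shape)
```

Here only 5 coordinate triples are stored instead of 2 × 64 × 84 cells, which is where the space saving comes from.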

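Merging all sparse matrices of a genre

In the previous step, the musical information in each MIDI file was stored as sparse matrix coordinates in its own npz file. To make the dataset easier to construct, I then stored all the sparse matrices of each genre together in a single file.
For the code, see MusicCritique/util/data/create_database.py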

def merge_all_sparse_matrices():
    midi_collection = get_midi_collection()
    genre_collection = get_genre_collection()
    root_dir = 'E:/midi_matrix/one_instr/'

    time_step = 64
    valid_range = (24, 108)

    for genre in genre_collection.find({'DatasetGenerated': False}):
        save_dir = 'd:/data/' + genre['Name']
        os.makedirs(save_dir, exist_ok=True)
        print(genre['Name'])
        whole_length = genre['ValidPiecesNum']

        # Shape of the merged dense matrix: (pieces, 64, 84)
        shape = np.array([whole_length, time_step, valid_range[1] - valid_range[0]])

        processed = 0
        last_piece_num = 0  # number of pieces contributed by the files merged so far
        # count_documents replaces Collection.count, which was removed in pymongo 4
        whole_num = midi_collection.count_documents({'Genre': genre['Name']})

        non_zeros = []
        for midi in midi_collection.find({'Genre': genre['Name']}, no_cursor_timeout=True):

            path = root_dir + genre['Name'] + '/' + midi['md5'] + '.npz'
            # Drop the last segment of each file
            valid_pieces_num = midi['PiecesNum'] - 1

            with np.load(path) as f:
                matrix = f['arr_0'].copy()
            print(valid_pieces_num, matrix.shape[0])
            for data in matrix:
                try:
                    data = data.tolist()

                    if data[0] < valid_pieces_num:
                        # Shift the per-file segment index to its position in the merged matrix
                        piece_order = last_piece_num + data[0]
                        non_zeros.append([piece_order, data[1], data[2]])
                except Exception:
                    print(path)

            last_piece_num += valid_pieces_num
            processed += 1

            print('Progress: {:.2%}\n'.format(processed / whole_num))

        non_zeros = np.array(non_zeros)
        print(non_zeros.shape)
        np.savez_compressed(save_dir + '/data_sparse' + '.npz', nonzeros=non_zeros, shape=shape)

        genre_collection.update_one({'_id': genre['_id']}, {'$set': {'DatasetGenerated': True}})

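The ValidPiecesNum field of genre used in this function was added earlier. It is the total number of four-bar segments across all MIDI files of a genre, excluding the incomplete part at the end of each file.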
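The index shifting in the merge step can be sketched on its own, without the database. The example below is a simplified illustration: `merge_coordinates` is a hypothetical helper, and the per-file coordinate lists and valid segment counts are made-up inputs standing in for the npz contents and the PiecesNum - 1 values.

```python
import numpy as np


def merge_coordinates(per_file_coords, per_file_valid_segments):
    """Concatenate per-file (segment, time, pitch) coordinates into one array.

    Hypothetical helper: segment indices are shifted by the number of valid
    segments contributed by earlier files, and cells that fall in a file's
    dropped trailing segment are skipped.
    """
    merged = []
    offset = 0
    for coords, valid in zip(per_file_coords, per_file_valid_segments):
        coords = np.asarray(coords, dtype=np.int64).reshape(-1, 3)
        kept = coords[coords[:, 0] < valid].copy()
        kept[:, 0] += offset
        merged.append(kept)
        offset += valid
    return np.concatenate(merged), offset


file_a = [[0, 0, 36], [1, 5, 40], [2, 0, 36]]  # 3 segments, the last one is dropped
file_b = [[0, 3, 50], [1, 0, 36]]              # 2 segments, the last one is dropped
merged, total_pieces = merge_coordinates([file_a, file_b], [2, 1])
print(merged.tolist(), total_pieces)
```

The returned total plays the role of ValidPiecesNum: it is the first dimension of the merged matrix for the genre.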

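Converting the sparse matrix into a dense matrix

Since the coordinates of all non-zero points are already saved in the npz file, the matrix can be obtained by iterating over these coordinates and setting the value at each one to 1.0.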

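import numpy as np


def generate_sparse_matrix_of_genre(genre):
    npy_path = 'D:/data/' + genre + '/data_sparse.npz'
    with np.load(npy_path) as f:
        shape = f['shape']
        # Dense matrix of shape (pieces, 64, 84), initially all zeros.
        # np.float64 is used because the np.float_ alias was removed in NumPy 2.0.
        data = np.zeros(shape, np.float64)
        nonzeros = f['nonzeros']
        # Mark every stored (piece, time, pitch) coordinate as an active note
        for x in nonzeros:
            data[(x[0], x[1], x[2])] = 1.
    return data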
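For large genres the Python loop over coordinates can be slow. One possible alternative, shown here only as a sketch, is to set all cells at once with NumPy integer-array indexing. The function `coords_to_dense` is a hypothetical name and takes the coordinates and shape directly instead of reading them from the npz file.

```python
import numpy as np


def coords_to_dense(nonzeros, shape):
    """Build a dense float matrix from an (N, 3) array of coordinates."""
    data = np.zeros(shape, dtype=np.float64)
    nonzeros = np.asarray(nonzeros, dtype=np.int64).reshape(-1, 3)
    # Integer-array indexing sets all listed cells in a single step.
    data[nonzeros[:, 0], nonzeros[:, 1], nonzeros[:, 2]] = 1.0
    return data


dense = coords_to_dense([[0, 0, 36], [1, 63, 83]], (2, 64, 84))
print(dense.sum())
```

The reshape to (-1, 3) keeps the function safe for an empty coordinate list, in which case the result is an all-zero matrix.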

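Subclassing Dataset to write a custom dataset

The custom dataset is built by subclassing PyTorch's Dataset class and overriding a few key methods; see the official documentation for reference.
For the code, see MusicCritique/util/data/dataset.py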

import numpy as np
from torch.utils import data

# get_genre_collection(), generate_sparse_matrix_of_genre() and
# generate_sparse_matrix_from_multiple_genres() are defined elsewhere in the repository.


class SteelyDataset(data.Dataset):
    def __init__(self, genreA, genreB, phase, use_mix):
        assert phase in ['train', 'test'], 'not valid dataset type'

        # Genres pooled into the optional "mixed" channel
        sources = ['metal', 'punk', 'folk', 'newage', 'country', 'bluegrass']

        genre_collection = get_genre_collection()

        self.data_path = 'D:/data/'

        numA = genre_collection.find_one({'Name': genreA})['ValidPiecesNum']
        numB = genre_collection.find_one({'Name': genreB})['ValidPiecesNum']

        # Size is set by the smaller genre; 90% for training, the rest for testing
        train_num = int(min(numA, numB) * 0.9)
        test_num = min(numA, numB) - train_num
        if phase == 'train':  # compare strings with ==, not "is"
            self.length = train_num
            # Add a channel axis: (N, 64, 84) -> (N, 1, 64, 84)
            dataA = np.expand_dims(generate_sparse_matrix_of_genre(genreA)[:self.length], 1)
            dataB = np.expand_dims(generate_sparse_matrix_of_genre(genreB)[:self.length], 1)

            if use_mix:
                mixed = generate_sparse_matrix_from_multiple_genres(sources)
                np.random.shuffle(mixed)
                data_mixed = np.expand_dims(mixed[:self.length], 1)

                self.data = np.concatenate((dataA, dataB, data_mixed), axis=1)
            else:
                self.data = np.concatenate((dataA, dataB), axis=1)
        else:
            self.length = test_num
            # Take the pieces after the training split so test data does not overlap training data
            test_slice = slice(train_num, train_num + test_num)
            dataA = np.expand_dims(generate_sparse_matrix_of_genre(genreA)[test_slice], 1)
            dataB = np.expand_dims(generate_sparse_matrix_of_genre(genreB)[test_slice], 1)

            self.data = np.concatenate((dataA, dataB), axis=1)

    def __getitem__(self, index):
        # One sample has shape (channels, 64, 84)
        return self.data[index, :, :, :]

    def __len__(self):
        return self.length

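The key to subclassing is overriding the initializer, __getitem__ and __len__. When building the dataset, to make the data easy to access, I combined dataA and dataB into a single array and used the size of the smaller genre as the overall dataset size, so that both genres contribute the same number of pieces. Along the way I used NumPy's expand_dims function to add a dimension and its concatenate function to join the two matrices along that new dimension.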
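The effect of expand_dims and concatenate on the array shapes is easiest to see with toy data. The sketch below uses small placeholder arrays instead of real genre matrices; the sizes 10 and 7 are arbitrary.

```python
import numpy as np

# Toy stand-ins for two genres with different numbers of four-bar pieces.
genre_a = np.zeros((10, 64, 84))
genre_b = np.ones((7, 64, 84))

length = min(len(genre_a), len(genre_b))           # 7, set by the smaller genre
data_a = np.expand_dims(genre_a[:length], 1)       # (7, 1, 64, 84)
data_b = np.expand_dims(genre_b[:length], 1)       # (7, 1, 64, 84)
paired = np.concatenate((data_a, data_b), axis=1)  # (7, 2, 64, 84)

# Indexing the first axis gives one sample, as __getitem__ does.
sample = paired[0]                                 # (2, 64, 84)
print(paired.shape, sample.shape)
```

Each sample therefore carries one piece from genre A in channel 0 and one piece from genre B in channel 1.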

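Sharing the dataset

If you need it, you can download this dataset from Baidu Cloud (extraction code: nsfi). If you run into problems while using it, please leave a comment below. Thanks for reading!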

