用DirectX Audio和DirectShow播放聲音和音樂(1)

音樂就是一系列的音符,這些音符在不同的時間用不同的幅度被播放或者停止。有非常多的指令被用來播放音樂,但是這些指令的操作基本相同,都在使用各種各樣不同的音符。在計算機上進行作曲,實際上是存儲了很多組音樂,回放時由音頻硬件將這些音符播放出來。

Midi格式(文件擴展名是.MID)是存儲數字音樂的標準格式。

DirectMusic 音樂片段(music segments)使用.SGT文件擴展名,其他的相關文件包括樂隊文件(band file .BND),這種文件裏面包含樂器信息;弦映射表文件(chordmaps file .CDM)包含在回放時修改音樂的和絃指令;樣式文件(styles file .STY)包含回放樣式信息;模板文件(templates file .TPL)包含創造音樂片段的模板。

Midi是一種非常強大的音樂格式,惟一的不利因素是音樂品質依賴於音樂合成器的性能,因爲Midi 僅僅記錄了音符,其播放的品質由播放音樂的軟硬件決定。MP3文件(文件後綴爲.MP3)是一種類似於波表文件的文件格式,但是MP3文件和WAV文件最大的區別在於MP3文件將聲音壓縮到了最小的程度,但是音質卻基本不變。可以用DirectShow組件播放MP3文件,DirectShow組件是一個非常強大的多媒體組件,用DirectShow幾乎可以播放任何媒體文件,包括聲音和音頻文件,部分聲音文件我們只能用DirectShow播放。

Direct Audio是一個複合組件,它由DirectSound和DirectMusic兩個組件組成,如下圖所示:

DirectMusic在DirectX8中得到了巨大的增強,但是DirectSound基本保持原有的狀態。DirectSound是主要的數字聲音回放組件。DirectMusic處理所有的樂曲格式,包括MIDI、DirectMusic本地格式文件和波表文件。DirectMusic處理完之後將它們送入DirectSound中做其他處理,這意味着回放MIDI的時候可以使用數字化的樂器。

使用DirectSound

使用時需要創建一個和聲卡通訊的COM對象,用這個COM對象再創造一些獨立的聲音數據緩衝區(被稱之爲輔助音頻緩衝區 secondary sound buffers)來存儲音頻數據。緩衝區中的這些數據在主混音緩存(稱之爲主音頻緩存 primary sound buffer)中被混合,然後可以用指定的任何格式播放出來。回放格式通過採樣頻率、聲道數、採樣精度排列,可能的採樣頻率有8000HZ, 11025HZ,22050HZ和44100HZ(CD音質)。

對於聲道數可以有兩個選擇:單通道的單聲道聲音和雙通道的立體聲聲音。採樣精度被限制在兩種選擇上:8位的低質量聲音和16位的高保真聲音。在沒有修改的情況下,DirectSound主緩衝區的默認設置是22025HZ採樣率、8位精度、立體聲。在DirectSound中可以調整聲音的播放速度(這同樣會改變聲音的音調),調整音量、循環播放等。甚至還可以在一個虛擬的 3D環境中播放,以模擬一個實際環繞在周圍的聲音。

需要做的是將聲音數據充滿緩衝區,如果聲音數據太大的話,必須創建流播放方法,加載聲音數據中的一小塊,當這一小塊播放完畢以後,再加載另外的小塊數據進緩衝區,一直持續這個過程,直到聲音被處理完畢。在緩衝區中調整播放位置可以實現流式音頻,當播放完成通知應用程序更新音頻數據。這個通知更新的過程我們稱之爲“通告”。在同一時間被播放的緩存數目雖然沒有限制,但是仍然需要保證緩衝區數目不要太多,因爲每增加一個緩衝區,就要消耗很多內存和CPU資源。

在項目中使用DirectSound和DirectMusic,需要添加頭文件dsound.h和dmsuic.h,並且需要鏈接DSound.lib到包含庫中,添加DXGuid.lib庫可以讓DirectSound更容易使用。

以下是DirectSound COM接口:

IDirectSound8:DirectSound接口。
IDirectSoundBuffer8:主緩衝區和輔助緩衝區接口,保存數據並控制回放。
IDirectSoundNotify8:通知對象,通知應用程序指定播放位置已經達到。

各個對象間的關係如下圖所示:



IDirectSound8是主接口,用它來創建緩衝區(IDirectSoundBuffer8),然後用緩衝區接口創建通告接口(IDirectSoundNotify8),通告接口告訴應用程序指定的位置已經到達,通告接口在流化音頻文件時非常有用。

初始化DirectSound

使用 DirectSound的第一步是創建IDirectSound8對象,IDirectSound8起到控制音頻硬件設備的作用,可以通過 DirectSoundCreate8函數來創建。

The DirectSoundCreate8 function creates and initializes an object that supports the IDirectSound8 interface.

HRESULT DirectSoundCreate8(
LPCGUID lpcGuidDevice,
LPDIRECTSOUND8 * ppDS8,
LPUNKNOWN pUnkOuter
);

Parameters

lpcGuidDevice
Address of the GUID that identifies the sound device. The value of this parameter must be one of the GUIDs returned by DirectSoundEnumerate, or NULL for the default device, or one of the following values.

 

ValueDescription
DSDEVID_DefaultPlayback System-wide default audio playback device. Equivalent to NULL.
DSDEVID_DefaultVoicePlayback Default voice playback device.

 

ppDS8
Address of a variable to receive an IDirectSound8 interface pointer.
pUnkOuter
Address of the controlling object's IUnknown interface for COM aggregation. Must be NULL, because aggregation is not supported.

Return Values

If the function succeeds, it returns DS_OK. If it fails, the return value may be one of the following.
 

Return Code
DSERR_ALLOCATED
DSERR_INVALIDPARAM
DSERR_NOAGGREGATION
DSERR_NODRIVER
DSERR_OUTOFMEMORY

Remarks

The application must call the IDirectSound8::SetCooperativeLevel method immediately after creating a device object.

 

創建主音頻緩衝區

用 IDirectSoundBuffer對象控制主音頻緩衝區,創建主緩衝區不需要DirectX8的接口,因爲這個接口從來沒有改變。用來創建音頻緩衝區的函數是IDirectSound8::CreateSoundBuffer。
 

The CreateSoundBuffer method creates a sound buffer object to manage audio samples.

HRESULT CreateSoundBuffer(
LPCDSBUFFERDESC pcDSBufferDesc,
LPDIRECTSOUNDBUFFER * ppDSBuffer,
LPUNKNOWN pUnkOuter
);

Parameters

pcDSBufferDesc
Address of a DSBUFFERDESC structure that describes the sound buffer to create.
ppDSBuffer
Address of a variable that receives the IDirectSoundBuffer interface of the new buffer object. Use QueryInterface to obtain IDirectSoundBuffer8. IDirectSoundBuffer8 is not available for the primary buffer.
pUnkOuter
Address of the controlling object's IUnknown interface for COM aggregation. Must be NULL.

Return Values

If the method succeeds, the return value is DS_OK, or DS_NO_VIRTUALIZATION if a requested 3D algorithm was not available and stereo panning was substituted. See the description of the guid3DAlgorithm member of DSBUFFERDESC. If the method fails, the return value may be one of the error values shown in the following table.
 

Return code
DSERR_ALLOCATED
DSERR_BADFORMAT
DSERR_BUFFERTOOSMALL
DSERR_CONTROLUNAVAIL
DSERR_DS8_REQUIRED
DSERR_INVALIDCALL
DSERR_INVALIDPARAM
DSERR_NOAGGREGATION
DSERR_OUTOFMEMORY
DSERR_UNINITIALIZED
DSERR_UNSUPPORTED

Remarks

DirectSound does not initialize the contents of the buffer, and the application cannot assume that it contains silence.

If an attempt is made to create a buffer with the DSBCAPS_LOCHARDWARE flag on a system where hardware acceleration is not available, the method fails with either DSERR_CONTROLUNAVAIL or DSERR_INVALIDCALL, depending on the operating system.


pcDSBufferDesc是一個指向DSBUFFERDESC結構的指針,保存所創建的緩衝區的信息。
 

The DSBUFFERDESC structure describes the characteristics of a new buffer object. It is used by the IDirectSound8::CreateSoundBuffer method and by the DirectSoundFullDuplexCreate8 function.

An earlier version of this structure, DSBUFFERDESC1, is maintained in Dsound.h for compatibility with DirectX 7 and earlier.

typedef struct DSBUFFERDESC {
DWORD dwSize;
DWORD dwFlags;
DWORD dwBufferBytes;
DWORD dwReserved;
LPWAVEFORMATEX lpwfxFormat;
GUID guid3DAlgorithm;
} DSBUFFERDESC;

Members

dwSize
Size of the structure, in bytes. This member must be initialized before the structure is used.
dwFlags
Flags specifying the capabilities of the buffer. See the dwFlags member of the DSBCAPS structure for a detailed listing of valid flags.
dwBufferBytes
Size of the new buffer, in bytes. This value must be 0 when creating a buffer with the DSBCAPS_PRIMARYBUFFER flag. For secondary buffers, the minimum and maximum sizes allowed are specified by DSBSIZE_MIN and DSBSIZE_MAX, defined in Dsound.h.
dwReserved
Reserved. Must be 0.
lpwfxFormat
Address of a WAVEFORMATEX or WAVEFORMATEXTENSIBLE structure specifying the waveform format for the buffer. This value must be NULL for primary buffers.
guid3DAlgorithm
Unique identifier of the two-speaker virtualization algorithm to be used by DirectSound3D hardware emulation. If DSBCAPS_CTRL3D is not set in dwFlags, this member must be GUID_NULL (DS3DALG_DEFAULT). The following algorithm identifiers are defined.

 

ValueDescriptionAvailability
DS3DALG_DEFAULT DirectSound uses the default algorithm. In most cases this is DS3DALG_NO_VIRTUALIZATION. On WDM drivers, if the user has selected a surround sound speaker configuration in Control Panel, the sound is panned among the available directional speakers. Applies to software mixing only. Available on WDM or Vxd Drivers.
DS3DALG_NO_VIRTUALIZATION 3D output is mapped onto normal left and right stereo panning. At 90 degrees to the left, the sound is coming out of only the left speaker; at 90 degrees to the right, sound is coming out of only the right speaker. The vertical axis is ignored except for scaling of volume due to distance. Doppler shift and volume scaling are still applied, but the 3D filtering is not performed on this buffer. This is the most efficient software implementation, but provides no virtual 3D audio effect. When the DS3DALG_NO_VIRTUALIZATION algorithm is specified, HRTF processing will not be done. Because DS3DALG_NO_VIRTUALIZATION uses only normal stereo panning, a buffer created with this algorithm may be accelerated by a 2D hardware voice if no free 3D hardware voices are available. Applies to software mixing only. Available on WDM or Vxd Drivers.
DS3DALG_HRTF_FULL The 3D API is processed with the high quality 3D audio algorithm. This algorithm gives the highest quality 3D audio effect, but uses more CPU cycles. See Remarks. Applies to software mixing only. Available on Microsoft Windows 98 Second Edition and later operating systems when using WDM drivers.
DS3DALG_HRTF_LIGHT The 3D API is processed with the efficient 3D audio algorithm. This algorithm gives a good 3D audio effect, but uses fewer CPU cycles than DS3DALG_HRTF_FULL. Applies to software mixing only. Available on Windows 98 Second Edition and later operating systems when using WDM drivers.


需要設置的惟一一個值是dwFlags,這是一系列標誌,用於決定緩衝區性能。

dwFlags
Flags that specify buffer-object capabilities. Use one or more of the values shown in the following table.

 

ValueDescription
DSBCAPS_CTRL3D The buffer has 3D control capability.
DSBCAPS_CTRLFREQUENCY The buffer has frequency control capability.
DSBCAPS_CTRLFX The buffer supports effects processing.
DSBCAPS_CTRLPAN The buffer has pan control capability.
DSBCAPS_CTRLVOLUME The buffer has volume control capability.
DSBCAPS_CTRLPOSITIONNOTIFY The buffer has position notification capability. See the Remarks for DSCBUFFERDESC.
DSBCAPS_GETCURRENTPOSITION2 The buffer uses the new behavior of the play cursor when IDirectSoundBuffer8::GetCurrentPosition is called. In the first version of DirectSound, the play cursor was significantly ahead of the actual playing sound on emulated sound cards; it was directly behind the write cursor. Now, if the DSBCAPS_GETCURRENTPOSITION2 flag is specified, the application can get a more accurate play cursor. If this flag is not specified, the old behavior is preserved for compatibility. This flag affects only emulated devices; if a DirectSound driver is present, the play cursor is accurate for DirectSound in all versions of DirectX.
DSBCAPS_GLOBALFOCUS The buffer is a global sound buffer. With this flag set, an application using DirectSound can continue to play its buffers if the user switches focus to another application, even if the new application uses DirectSound. The one exception is if you switch focus to a DirectSound application that uses the DSSCL_WRITEPRIMARY flag for its cooperative level. In this case, the global sounds from other applications will not be audible.
DSBCAPS_LOCDEFER The buffer can be assigned to a hardware or software resource at play time, or when IDirectSoundBuffer8::AcquireResources is called.
DSBCAPS_LOCHARDWARE The buffer uses hardware mixing.
DSBCAPS_LOCSOFTWARE The buffer is in software memory and uses software mixing.
DSBCAPS_MUTE3DATMAXDISTANCE The sound is reduced to silence at the maximum distance. The buffer will stop playing when the maximum distance is exceeded, so that processor time is not wasted. Applies only to software buffers.
DSBCAPS_PRIMARYBUFFER The buffer is a primary buffer.
DSBCAPS_STATIC The buffer is in on-board hardware memory.
DSBCAPS_STICKYFOCUS The buffer has sticky focus. If the user switches to another application not using DirectSound, the buffer is still audible. However, if the user switches to another DirectSound application, the buffer is muted.
DSBCAPS_TRUEPLAYPOSITION Force IDirectSoundBuffer8::GetCurrentPosition to return the buffer's true play position. This flag is only valid in Windows Vista.

以下是創建聲音緩衝區的代碼:
 

    // setup the DSBUFFERDESC structure
    DSBUFFERDESC ds_buffer_desc;

    
// zero out strcutre
    ZeroMemory(&ds_buffer_desc, sizeof(DSBUFFERDESC));

    ds_buffer_desc.dwSize        = 
sizeof(DSBUFFERDESC); 
    ds_buffer_desc.dwFlags       = DSBCAPS_CTRLVOLUME;
    ds_buffer_desc.dwBufferBytes = wave_format.nAvgBytesPerSec * 2;  
// 2 seconds
    ds_buffer_desc.lpwfxFormat   = &wave_format;

    
// create the fist version object
    if(FAILED(g_ds->CreateSoundBuffer(&ds_buffer_desc, &ds, NULL)))
    {
        
// error ocuurred
        MessageBox(NULL, "Unable to create sound buffer", "Error", MB_OK);
    }

設置格式


對於格式,有一系列的選擇,但是建議在11025HZ、16位、單通道;22050HZ、16位、單通道中選擇。選擇格式的時候,不要嘗試使用立體聲,立體聲浪費處理時間,而且效果很難評估。同樣也不要使用16位以外的採樣精度,因爲這會導致音質的大幅下降。對於採樣頻率來說,越高越好,但是也不要設置超過 22050HZ,在這個採樣頻率下,也能表現出CD音質的水準而沒有太多的損失。

設置回放格式需要通過調用 IDirectSoundBuffer::SetFormat。
 

The SetFormat method sets the format of the primary buffer. Whenever this application has the input focus, DirectSound will set the primary buffer to the specified format.

HRESULT SetFormat(
LPCWAVEFORMATEX pcfxFormat
);

Parameters

pcfxFormat
Address of a WAVEFORMATEX structure that describes the new format for the primary sound buffer.

Return Values

If the method succeeds, the return value is DS_OK. If the method fails, the return value may be one of the following error values:
 

Return code
DSERR_BADFORMAT
DSERR_INVALIDCALL
DSERR_INVALIDPARAM
DSERR_OUTOFMEMORY
DSERR_PRIOLEVELNEEDED
DSERR_UNSUPPORTED

Remarks

The format of the primary buffer should be set before secondary buffers are created.

The method fails if the application has the DSSCL_NORMAL cooperative level.

If the application is using DirectSound at the DSSCL_WRITEPRIMARY cooperative level, and the format is not supported, the method fails.

If the cooperative level is DSSCL_PRIORITY, DirectSound stops the primary buffer, changes the format, and restarts the buffer. The method succeeds even if the hardware does not support the requested format; DirectSound sets the buffer to the closest supported format. To determine whether this has happened, an application can call the GetFormat method for the primary buffer and compare the result with the format that was requested with the SetFormat method.

This method is not available for secondary sound buffers. If a new format is required, the application must create a new DirectSoundBuffer object.


這個函數惟一的參數是指向WAVEFORMATEX結構的指針,該結構保存已設置的格式信息。
 

The WAVEFORMATEX structure defines the format of waveform-audio data. Only format information common to all waveform-audio data formats is included in this structure. For formats that require additional information, this structure is included as the first member in another structure, along with the additional information.

This structure is part of the Platform SDK and is not declared in Dsound.h. It is documented here for convenience.

typedef struct WAVEFORMATEX {
WORD wFormatTag;
WORD nChannels;
DWORD nSamplesPerSec;
DWORD nAvgBytesPerSec;
WORD nBlockAlign;
WORD wBitsPerSample;
WORD cbSize;
} WAVEFORMATEX;

Members

wFormatTag
Waveform-audio format type. Format tags are registered with Microsoft Corporation for many compression algorithms. A complete list of format tags can be found in the Mmreg.h header file. For one- or two-channel PCM data, this value should be WAVE_FORMAT_PCM.
nChannels
Number of channels in the waveform-audio data. Monaural data uses one channel and stereo data uses two channels.
nSamplesPerSec
Sample rate, in samples per second (hertz). If wFormatTag is WAVE_FORMAT_PCM, then common values for nSamplesPerSec are 8.0 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
nAvgBytesPerSec
Required average data-transfer rate, in bytes per second, for the format tag. If wFormatTag is WAVE_FORMAT_PCM, nAvgBytesPerSec should be equal to the product of nSamplesPerSec and nBlockAlign. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.
nBlockAlign
Block alignment, in bytes. The block alignment is the minimum atomic unit of data for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM or WAVE_FORMAT_EXTENSIBLE, nBlockAlign must be equal to the product of nChannels and wBitsPerSample divided by 8 (bits per byte). For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.

Software must process a multiple of nBlockAlign bytes of data at a time. Data written to and read from a device must always start at the beginning of a block. For example, it is illegal to start playback of PCM data in the middle of a sample (that is, on a non-block-aligned boundary).

wBitsPerSample
Bits per sample for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM, then wBitsPerSample should be equal to 8 or 16. For non-PCM formats, this member must be set according to the manufacturer's specification of the format tag. If wFormatTag is WAVE_FORMAT_EXTENSIBLE, this value can be any integer multiple of 8. Some compression schemes cannot define a value for wBitsPerSample, so this member can be zero.
cbSize
Size, in bytes, of extra format information appended to the end of the WAVEFORMATEX structure. This information can be used by non-PCM formats to store extra attributes for the wFormatTag. If no extra information is required by the wFormatTag, this member must be set to zero. For WAVE_FORMAT_PCM formats (and only WAVE_FORMAT_PCM formats), this member is ignored.

以下設置音頻格式爲:11025HZ、單通道、16位。
 

    // setup the WAVEFORMATEX structure
    WAVEFORMATEX wave_format;

    ZeroMemory(&wave_format, 
sizeof(WAVEFORMATEX));

    wave_format.wFormatTag      = WAVE_FORMAT_PCM;
    wave_format.nChannels       = 1;        
// mono
    wave_format.nSamplesPerSec  = 11025;    
    wave_format.wBitsPerSample  = 16;
    wave_format.nBlockAlign     = (wave_format.wBitsPerSample / 8) * wave_format.nChannels;
    wave_format.nAvgBytesPerSec = wave_format.nSamplesPerSec * wave_format.nBlockAlign;
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章