[Translation] 080728 - Thermal Face Recognition Over Time

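1. Database:

Over ten weeks, visible and long-wave infrared (LWIR) images of 240 distinct subjects were acquired under controlled conditions. In each weekly session, each subject was photographed under two illumination conditions (FERET and mugshot) and with two expressions (neutral and other). The visible images are in color at 1200×1600 resolution. The thermal images have 320×240 resolution and 12-bit depth.

Finally, each image was adjusted by hand → registered to a standard geometry with fixed eye locations and an image size of 99×132. All necessary interpolation was bilinear, and images used for the principal component analysis (PCA) tests were additionally histogram-equalized.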

 

2. Algorithms used:

Two algorithms were tested in each of the two modalities: principal component analysis (PCA) and the (blinded for review) algorithm.

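3. Experiments:

   Data set: one neutral-expression face image of each subject was collected every week.

   Procedure: using the two algorithms above, each week's images in each modality (V/IR/F) were compared against the corresponding first-week images, and the top-rank recognition rate was measured.

   Results: the recognition rate shows no clear trend in any modality. The experiments also show that performance is nearly flat over the ten weeks → the weekly recognition performance of both algorithms and both modalities can be assumed to be drawn independently from a locally constant distribution → that distribution can be taken as Gaussian → estimate the standard deviation of each distribution and plot error bars.

(1)    Under time lapse, thermal face recognition with PCA performs significantly worse than its visible counterpart.

(2)    With the (blinded for review) algorithm, overall recognition performance in both modalities is markedly better than with PCA. More importantly, the performance curves of the two modalities cross each other several times while staying within each other's error bars → with this algorithm the performance difference between modalities is not statistically significant.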

 

 

Thermal Face Recognition Over Time

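This paper presents a comparative study of face recognition performance with visible and thermal infrared imagery, with emphasis on the effect of the time lapse between the enrollment and test images. Most earlier work in this area focused on recognition in which enrollment and test images were acquired in the same session. The results show that, under time lapse, the performance difference between visible and thermal imagery is smaller than expected; on the available data it is in fact not statistically significant.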

1 Introduction

Face recognition with thermal infrared imagery has recently enjoyed renewed interest. While the volume of literature on the subject is notably smaller than that related to visible face recognition, there is nonetheless a steady stream of research [1, 2, 3, 4, 5, 6]. These papers have established that thermal imagery of human faces constitutes a valid biometric signature, though mostly relying on databases limited both in size and variability, due to the expense and complexity of extensive data collection. Early results were based on gallery and probe sets collected indoors during a single session. In that respect, they resemble the fa/fb tests in the FERET program [7].

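Face recognition from thermal infrared imagery has recently attracted renewed interest. Although far less has been published on it than on visible face recognition, research on the subject continues steadily. These papers establish that thermal images of the human face form a valid biometric signature, although most of them rely on databases of limited size and variability because extensive data collection is expensive and complex. Early results were based on gallery and probe sets collected indoors in a single session, and in that respect resemble the fa/fb tests of the FERET program.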

More recently, a study involving imagery collected indoors in a laboratory setting over multiple weeks was presented in [4, 8]. In that study, the authors note that when using a PCA-based recognition system, visible face recognition of time-lapse images yields better results than its thermal counterpart. They go on to conjecture, based on their visual analysis of the thermal imagery, that large variations of the thermal emission patterns of the face over time were responsible for the degraded performance. The current paper seeks to reproduce and extend some of the results in [4, 8]. In particular, we show that while those results are reproducible, it may be premature to attribute the performance difference to a modality-specific phenomenon. The results below demonstrate that a statistically significant performance difference between modalities can be measured when recognition is performed using PCA. However, when a more sophisticated algorithm is used, no such difference is measurable. This indicates that the authors of [4, 8] may have observed a measurement effect, and that the “inherent” value of visible and thermal imagery for time-lapse face recognition under controlled conditions is equivalent.

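Recently, [4, 8] presented a study based on imagery collected indoors in a laboratory setting over several weeks. The authors of that study note that, with a PCA-based recognition system, visible face recognition on time-lapse images gives better results than thermal face recognition. From a visual analysis of the thermal imagery they further conjecture that the degraded performance is caused by large changes in facial thermal emission patterns over time. The present paper seeks to reproduce and extend some of the results in [4, 8]. In particular, it shows that although those results can be reproduced, it may be premature to attribute the performance difference to a modality-specific phenomenon. The results below show that a statistically significant performance difference between modalities can be measured when PCA is used for recognition, but that no such difference is measurable with a more sophisticated algorithm. This suggests that the authors of [4, 8] may have observed a measurement effect, and that the "inherent" value of visible and thermal imagery for time-lapse face recognition under controlled conditions is equivalent.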

2 Data Collection and Normalization

The data used in this study was generously provided by the authors of [4, 8]. A complete description of the data collection procedure can be found in the references, and we include a brief summary here. Visible and longwave IR (LWIR) images of 240 distinct subjects were acquired under controlled conditions, over a period of ten weeks. During each weekly session, each subject was imaged under two different illumination conditions (FERET and mugshot), and with two different expressions (“neutral” and “other”). Visible images were acquired in color at 1200 × 1600 resolution. Thermal images were acquired at 320 × 240 resolution and 12-bit depth.

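The data used in this study were generously provided by the authors of [4, 8]. The references describe the collection procedure in full; only a brief summary is given here. Over a period of ten weeks, visible and long-wave infrared (LWIR) images of 240 distinct subjects were acquired under controlled conditions. In each weekly session, each subject was imaged under two illumination conditions (FERET and mugshot) and with two expressions (neutral and other). The visible images are in color at 1200×1600 resolution. The thermal images have 320×240 resolution and 12-bit depth.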

 Eye coordinates for all images, both visible and thermal, were manually located by the authors of [4, 8]. These coordinates were used to affinely register the images to a standard geometry with fixed eye locations and image size of 99×132 pixels. All necessary interpolation was performed bilinearly. The visible and thermal cameras were boresighted during data collection, therefore eye coordinates on corresponding images may not match exactly, as they had to be manually located in each modality separately. After alignment, all images were masked to remove all but the inner face, excluding ears and hair. Images used for the PCA experiments were further histogram-equalized, in order to match the processing in [4, 8]. Since the other algorithm does its own internal image processing, no equalization was performed on images before recognition.

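The eye coordinates in all images, visible and thermal, were located manually by the authors of [4, 8]. These coordinates were used to register the images affinely to a standard geometry with fixed eye locations and an image size of 99×132 pixels. All necessary interpolation was bilinear. The visible and thermal cameras were boresighted during data collection; because the eye coordinates had to be located manually in each modality separately, they may not match exactly on corresponding images. After alignment, all images were masked so that only the inner face remained, excluding ears and hair. To match the processing in [4, 8], images used for the PCA experiments were additionally histogram-equalized. Because the other algorithm performs its own internal image processing, no equalization was applied to its input images before recognition.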
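The registration and equalization steps can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the target eye positions are hypothetical (the text gives only the 99×132 output size), and two eye points determine a similarity transform, which is a special case of the affine registration mentioned above. The function names are invented for this example.

```python
import numpy as np

# Hypothetical target geometry: the source gives only the 99x132 output size,
# not the fixed eye positions, so these coordinates are illustrative.
OUT_W, OUT_H = 99, 132
LEFT_EYE_TARGET = (30.0, 45.0)
RIGHT_EYE_TARGET = (68.0, 45.0)


def eye_alignment_matrix(left_eye, right_eye):
    """Return a 2x3 matrix mapping source (x, y) onto the target geometry.

    Two point pairs determine a similarity transform (rotation, uniform
    scale, translation), a special case of an affine transform.
    """
    src = complex(*right_eye) - complex(*left_eye)
    dst = complex(*RIGHT_EYE_TARGET) - complex(*LEFT_EYE_TARGET)
    z = dst / src  # rotation and scale expressed as one complex factor
    a, b = z.real, z.imag
    # x' = a*x - b*y + tx ; y' = b*x + a*y + ty
    tx = LEFT_EYE_TARGET[0] - (a * left_eye[0] - b * left_eye[1])
    ty = LEFT_EYE_TARGET[1] - (b * left_eye[0] + a * left_eye[1])
    return np.array([[a, -b, tx], [b, a, ty]])


def equalize_histogram(image, levels=256):
    """Histogram-equalize an unsigned integer image with values in [0, levels)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant image: nothing to equalize
        return image.copy()
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(image.dtype)
    return lut[image]
```

For 12-bit thermal data stored in a 16-bit array, `levels=4096` would be the natural choice.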

 

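[Interpolation (resampling) is an image processing method that increases or decreases the number of pixels in a digital image. Some digital cameras use interpolation to produce images with more pixels than the sensor actually captures, or to produce digitally zoomed images. Nearly all image processing software supports one or more interpolation methods. How pronounced the jagged edges are after an image is enlarged directly reflects how well the interpolation is done.]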
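As an illustration of the bilinear interpolation used throughout the normalization step, here is a small Python sketch that samples an image at a non-integer position. The function name and the border-clamping behavior are choices made for this example, not details from the paper.

```python
import numpy as np


def bilinear_sample(image, x, y):
    """Sample a 2-D image at real-valued (x, y), clamping at the border."""
    h, w = image.shape
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally along the two neighboring rows, then vertically.
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom
```

Each output value is a weighted average of the four surrounding pixels, with weights given by the fractional distances to them.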

 

3 Thermal Infrared Phenomenology

 While the nature of face imagery in the visible domain is well-studied, particularly with respect to illumination dependence [9], its thermal counterpart has received less attention. In [4], the authors show some variability in thermal emission patterns during time-lapse experiments, and properly blame it for decreased recognition performance. Figure 1 shows comparable variability within data collected with our own LWIR sensor. Each column shows images acquired in different sessions. It is clear that thermal emission patterns around the eyes, nose and mouth are rather different in different sessions. Such variations can be induced by changing environmental conditions. For example, exposed to cold or wind, capillary vessels at the surface of the skin contract, reducing the effective blood flow and thereby the surface temperature of the face. Also, when a subject transitions from a cold outdoor environment to a warm indoor one, a reverse process occurs, whereby capillaries dilate, suddenly flushing the skin with warm blood in the body’s effort to regain normal temperature. We have no knowledge of the environmental conditions during the data collection by the authors of [4], although we presume that they were fairly constant throughout all sessions.

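Although the nature of face imagery in the visible domain has been studied thoroughly, especially its dependence on illumination, its thermal counterpart has received much less attention. In [4], the authors show some variability in thermal emission patterns during time-lapse experiments and hold it responsible for the drop in recognition performance. Figure 1 shows comparable variability in data collected with our own LWIR sensor. Each column shows images acquired in different sessions. The thermal emission patterns around the eyes, nose and mouth clearly differ between sessions. Such variations can be caused by changes in environmental conditions. For example, in cold or wind the capillaries at the skin surface contract, which reduces the effective blood flow and lowers the surface temperature of the face. Conversely, when a subject moves from a cold outdoor environment into a warm indoor one, the capillaries dilate and warm blood flushes the skin as the body works to return to its normal temperature. The environmental conditions during the data collection in [4] are not known to us, but we presume they were fairly constant across all sessions.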

 Additional fluctuations in thermal appearance are unrelated to ambient conditions, but are rather related to the subject’s metabolism. Vigorous physical activity, consumption of food, alcohol or caffeine may all affect the thermal appearance of a subject’s face. Also, high temporal frequency thermal variation is associated with breathing. The nose or mouth will appear cooler as the subject is inhaling and warmer as he or she exhales, since exhaled air is at core body temperature, which is several degrees warmer than skin temperature.

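Other fluctuations in thermal appearance are unrelated to ambient conditions and stem instead from the subject's metabolism. Vigorous physical activity and the consumption of food, alcohol or caffeine can all affect the thermal appearance of a subject's face. In addition, thermal variation at high temporal frequency is associated with breathing. The nose or mouth appears cooler while the subject inhales and warmer while he or she exhales, because exhaled air is at core body temperature, which is several degrees above skin temperature.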

Much like recognition from visible imagery is affected by illumination, recognition with thermal imagery is affected by a number of exogenous and endogenous factors. And while the appearance of some features may change, their underlying shape remains the same and continues to hold useful information for recognition. Thus, much like in the case of visible imagery, different algorithms are more or less sensitive to image variations. Proper compensation for those variations is a critical step of any successful face (or generally object) recognition algorithm, regardless of modality. Clearly, the better algorithms for thermal face recognition will perform equivalent compensation on the infrared imagery prior to comparing probe and gallery samples.

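Just as recognition from visible imagery is affected by illumination, recognition from thermal imagery is affected by a number of external and internal factors. Although the appearance of some features may change, their underlying shape stays the same and still carries useful information for recognition. As with visible imagery, different algorithms are therefore more or less sensitive to image variations. Compensating properly for those variations is a critical step in any successful face (or, more generally, object) recognition algorithm, whatever the modality. The better thermal face recognition algorithms will clearly apply equivalent compensation to the infrared imagery before comparing probe and gallery samples.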

  

Figure 1: Variation in facial thermal emission from two subjects in different sessions. Left column is the enrollment image and right column is the test image.

 

4 Algorithms Tested

We performed experiments with two different algorithms in each of the two modalities: PCA with Mahalanobis angle distance and the (blinded for review) algorithm. The first is a standard algorithm with performance evaluations widely available in the literature, including [2], in which the authors present a comprehensive analysis of its performance on visible and thermal infrared imagery in a same-session recognition scenario. The second one is a commercial algorithm made available for testing in binary form.

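We ran experiments with two algorithms in each of the two modalities: PCA with Mahalanobis angle distance, and the (blinded for review) algorithm. The first is a standard algorithm whose performance evaluations are widely available in the literature, including [2], whose authors give a comprehensive analysis of its performance on visible and thermal infrared imagery in a same-session recognition scenario. The second is a commercial algorithm that was made available for testing in binary form.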
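For readers unfamiliar with the first algorithm, the sketch below shows one common formulation of PCA matching with the Mahalanobis angle: images are projected onto the leading principal components, each coefficient is scaled by the inverse square root of its eigenvalue, and two faces are compared by the negative cosine of the angle between the scaled vectors. The paper does not spell out its exact implementation, so the function names, the use of SVD, and the choice of the number of components are assumptions made for illustration.

```python
import numpy as np


def fit_pca(train, n_components):
    """Fit PCA on row-vector images; return mean, basis and eigenvalues."""
    mean = train.mean(axis=0)
    centered = train - mean
    # SVD of the centered data gives the principal axes in the rows of vt.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = (s ** 2) / (len(train) - 1)
    return mean, vt[:n_components], eigvals[:n_components]


def whiten(x, mean, basis, eigvals):
    """Project onto the PCA basis and scale each axis to unit variance."""
    return ((x - mean) @ basis.T) / np.sqrt(eigvals)


def mahalanobis_angle(u, v):
    """Negative cosine between whitened vectors; smaller means more similar."""
    return -float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
```

A probe would then be assigned the identity of the gallery image with the smallest `mahalanobis_angle` value.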

 

The training set for both algorithms was completely disjoint from gallery and probe images, provided by the authors of [4], in time, space and subjects. That is, the training set was collected at an earlier time, in a different location and used a disjoint set of subjects. This ensures that the results reported below are indicative of real-world performance. We should also note that the training set was different from that used in [4], since their complete training set was not available to us. We chose to use a larger set of images collected over the last several years with our own visible and thermal cameras. This further increases the realism of the results, since one cannot usually expect to have training imagery from the same camera as the testing imagery. As a result of these divergences from [4], our PCA results are somewhat different. However, the qualitative nature of the results, as seen below, agrees strongly with those of [4].

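The training set for both algorithms was completely disjoint, in time, space and subjects, from the gallery and probe images provided by the authors of [4]. That is, the training set was collected earlier, at a different location, and from a disjoint set of subjects. This ensures that the results reported below are indicative of real-world performance. Note also that the training set differs from the one used in [4], because their complete training set was not available to us. We chose a larger set of images collected over the last several years with our own visible and thermal cameras. This makes the results more realistic still, since one cannot usually expect training imagery to come from the same camera as the test imagery. Because of these differences from [4], our PCA results are somewhat different. Qualitatively, however, the results agree strongly with those of [4], as shown below.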

5 Experimental Results and Discussion

In order to evaluate recognition performance with time-lapse data, we performed the following experiments. The first-week frontal illumination images of each subject with neutral expression were used as the gallery. Thus the gallery contains a single image of each subject. (Composition of the probe set:) For all weeks, the probe set contains neutral expression images of each subject, with mugshot lighting. The number of subjects in each week ranges from 44 to 68, while the number of overlapping subjects with respect to the first week ranges from 31 to 56. We computed top-rank recognition rates for each of the weekly probe sets with both modalities and algorithms. The results are shown in Figures 2 and 3. Note that the first data point in each graph corresponds to same-session recognition performance.

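To evaluate recognition performance on time-lapse data, we ran the following experiments. The first-week frontal-illumination images of each subject with neutral expression served as the gallery, so the gallery contains a single image per subject. For every week, the probe set contains the neutral-expression images of each subject taken with mugshot lighting. The number of subjects per week ranges from 44 to 68, and the number of subjects overlapping with the first week ranges from 31 to 56. We computed top-rank recognition rates for each weekly probe set with both modalities and both algorithms; the results are shown in Figures 2 and 3. Note that the first data point in each graph corresponds to same-session recognition performance.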
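The top-rank recognition rate for one weekly probe set can be computed from a probe-by-gallery distance matrix, as in the Python sketch below. Skipping probes whose subject is missing from the first-week gallery is an assumption suggested by the "overlapping subjects" counts; the text does not state exactly how such probes were scored. The function name and argument layout are illustrative.

```python
import numpy as np


def top_rank_rate(distances, gallery_ids, probe_ids):
    """Fraction of probes whose nearest gallery image has the same identity.

    distances[i, j] is the distance between probe i and gallery image j.
    Probes whose subject is absent from the gallery are skipped (assumption).
    """
    distances = np.asarray(distances, dtype=float)
    gallery_ids = np.asarray(gallery_ids)
    probe_ids = np.asarray(probe_ids)
    enrolled = np.isin(probe_ids, gallery_ids)
    if not enrolled.any():
        raise ValueError("no probe subject is enrolled in the gallery")
    best = np.argmin(distances[enrolled], axis=1)
    return float(np.mean(gallery_ids[best] == probe_ids[enrolled]))
```

Calling this once per week and per modality would give the points of curves like those in Figures 2 and 3.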

 

Figure 2: Top-rank recognition results for visible, LWIR and fusion as a function of weeks elapsed between enrollment and testing, using PCA. Note that the x-coordinate of each curve is slightly offset in order to better present the error bars.

 

 

Figure 3: Top-rank recognition results for visible, LWIR and fusion as a function of weeks elapsed between enrollment and testing, using (blinded for review) algorithm. Note that the x-coordinate of each curve is slightly offset in order to better present the error bars.

 

Focusing for a moment on the performance curves, we notice that there is no clear trend for either visible or thermal modalities, encompassing weeks two through ten. That is, we do not see a clearly decreasing performance trend for either modality. This appears to indicate that whatever time-lapse effects are responsible for performance degradation versus same-session results are roughly constant over the ten-week trial period. Other studies have shown that over a period of years face recognition performance degrades linearly with time [10]. Our observation here is simply that the slope of the degradation line is small enough as to be nearly flat over a ten-week period (except for the same-session result, of course). Following that observation, we assume that weekly recognition performances for both algorithms and modalities are drawn independently and distributed according to a (locally) constant distribution, which we may assume to be Gaussian. Using this assumption, we estimate the standard deviation of that distribution, and plot error bars at two standard deviations.

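Looking at the performance curves, we see no clear trend for either the visible or the thermal modality from week two through week ten; that is, neither modality shows a clearly decreasing performance trend. This suggests that whatever time-lapse effects degrade performance relative to the same-session results stay roughly constant over the ten-week trial period. Other studies have shown that, over a period of years, face recognition performance degrades linearly with time [10]. Our observation is simply that the slope of the degradation line is small enough to be nearly flat over ten weeks (the same-session result excepted). On that basis we assume that the weekly recognition rates for both algorithms and both modalities are drawn independently from a locally constant distribution, which we take to be Gaussian. Under this assumption we estimate the standard deviation of that distribution and plot error bars at two standard deviations.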
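Under the stated assumption that the time-lapse weekly rates are independent draws from one Gaussian, the error bars reduce to a sample standard deviation. The Python sketch below shows that calculation; the function name is illustrative, and the use of the unbiased (n - 1) estimator is an assumption, since the paper does not say which estimator was used.

```python
import numpy as np


def two_sigma_error_bar(weekly_rates):
    """Return (mean, half_width), where half_width is 2 sample std deviations.

    weekly_rates should hold only the time-lapse weeks; the same-session
    point is excluded because it is not drawn from the same distribution.
    """
    rates = np.asarray(weekly_rates, dtype=float)
    if rates.size < 2:
        raise ValueError("need at least two weekly rates")
    sigma = rates.std(ddof=1)  # unbiased estimator (assumption)
    return float(rates.mean()), 2.0 * float(sigma)
```

Each weekly point for a given algorithm and modality would then be drawn with a bar of plus or minus `half_width`.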

Figure 2 shows the week-by-week recognition rates using PCA-based recognition. We see that, consistent with the results in [4, 8], thermal performance is lower than visible performance. In fact, for at least six out of nine time-lapse weeks that difference is statistically significant. Table 1 shows mean recognition rates over weeks two through nine for each algorithm and modality. As shown in the last column, we see that mean visible performance is higher than the mean thermal performance by 2.17 standard deviations. This clearly indicates that thermal face recognition with PCA under a time-lapse scenario is significantly less reliable than its visible counterpart.

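Figure 2 shows the week-by-week recognition rates for PCA-based recognition. Consistent with the results in [4, 8], thermal performance is lower than visible performance. In fact, the difference is statistically significant in at least six of the nine time-lapse weeks. Table 1 shows the mean recognition rates over weeks two through nine for each algorithm and modality. As its last column shows, mean visible performance exceeds mean thermal performance by 2.17 standard deviations. This clearly indicates that thermal face recognition with PCA in a time-lapse scenario is significantly less reliable than its visible counterpart.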
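The text reports the gap between mean visible and mean thermal performance in units of standard deviations but does not say which standard deviation is used. One plausible reading, shown here purely as an assumption, divides the difference of the means by a pooled standard deviation of the two weekly series. The function name is hypothetical and the formula is not claimed to reproduce the 2.17 or 0.21 figures.

```python
import numpy as np


def separation_in_sigmas(rates_a, rates_b):
    """Difference of means divided by the pooled sample standard deviation.

    Assumption: both series are weekly time-lapse rates with similar spread.
    """
    a = np.asarray(rates_a, dtype=float)
    b = np.asarray(rates_b, dtype=float)
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    if pooled == 0.0:
        raise ValueError("both series are constant; separation is undefined")
    return float((a.mean() - b.mean()) / pooled)
```

With this reading, a value near or above 2 would correspond to curves that sit outside each other's two-sigma error bars.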

 

Turning to Figure 3, we see the results of running the same experiments with the (blinded for review) algorithm. Firstly, we note that overall recognition performance is markedly improved in both modalities. More importantly, we see that weekly performance curves for both modalities cross each other multiple times, while remaining within each other’s error bars. This indicates that the performance difference between modalities using this algorithm is not statistically significant. In fact, looking at Table 1, we see that the difference between mean performances for the modalities is only 0.21 standard deviations, hardly a significant result. We should also note that the mean visible time-lapse performance with this algorithm is 88.65%, compared to approximately 86.5% for the FaceIt algorithm, as reported in [4]. This shows that the (blinded for review) algorithm is competitive with the commercial state-of-the-art on this data set, and therefore provides a fair means of evaluating thermal recognition performance, as using a poor visible algorithm for comparison would make thermal recognition appear better.

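Figure 3 shows the results of the same experiments run with the (blinded for review) algorithm. First, overall recognition performance improves markedly in both modalities. More importantly, the weekly performance curves of the two modalities cross each other several times while staying within each other's error bars. This indicates that the performance difference between modalities with this algorithm is not statistically significant. Indeed, Table 1 shows that the mean performances of the two modalities differ by only 0.21 standard deviations, hardly a significant result. Note also that the mean visible time-lapse performance of this algorithm is 88.65%, compared with approximately 86.5% for the FaceIt algorithm as reported in [4]. The (blinded for review) algorithm is therefore competitive with the commercial state of the art on this data set and provides a fair basis for evaluating thermal recognition performance, since comparing against a poor visible algorithm would make thermal recognition look better than it is.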

Figures 2 and 3, as well as Table 1, also show the result of fusing both imaging modalities for recognition. Following [2] and [4] we simply add the scores from each modality to create a combined score. Recognition is performed by a nearest neighbor classifier with respect to the combined score. As many previous studies have shown [1, 2, 4], fusion greatly increases performance.

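Figures 2 and 3 and Table 1 also show the results of fusing the two imaging modalities for recognition. Following [2] and [4], we simply add the scores from each modality to obtain a combined score. Recognition is then performed by a nearest-neighbor classifier on the combined score. As many previous studies have shown [1, 2, 4], fusion greatly increases performance.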
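The fusion rule described here is short enough to sketch directly. The Python example below assumes that both modalities produce distance-like scores (smaller is better) on comparable scales, which the text does not state explicitly; the function name and argument layout are illustrative.

```python
import numpy as np


def fuse_and_identify(visible_dist, thermal_dist, gallery_ids):
    """Sum per-modality distances and return the nearest gallery identity.

    Both inputs have shape (n_probes, n_gallery). Comparable score scales
    across modalities are assumed.
    """
    fused = np.asarray(visible_dist, dtype=float) + np.asarray(thermal_dist, dtype=float)
    nearest = np.argmin(fused, axis=1)
    return np.asarray(gallery_ids)[nearest]
```

Because the scores are simply added, a confident match in one modality can outweigh a marginal mismatch in the other.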

 

 

Table 1: Mean top-match recognition performance for time-lapse experiments with both algorithms.

 

6 Conclusions

The main conclusion of this paper is that one must be cautious when evaluating the value of an imaging modality for a specific recognition task. Ideally, this question should be framed as that of estimating the Bayes optimal error for a classification problem. Inevitably, that estimate is based on an empirical measure of performance which is inextricably tied to a particular classifier. While such an estimate can provide us with a valuable upper bound on the Bayes error, it cannot separate classifier effects from data-specific behavior. In this case, we show that while the results in [4] are reproducible, they do not imply that time-lapse face recognition with thermal infrared imagery is inferior to that performed with visible imagery. We have shown by example that, at least on this data set, the Bayes errors for each modality are comparable. A more detailed analysis will surely require a much larger pool of subjects.

 

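The main conclusion of this paper is that the value of an imaging modality for a specific recognition task must be evaluated with caution. Ideally, the question should be framed as one of estimating the Bayes optimal error of a classification problem. In practice, that estimate inevitably rests on an empirical performance measure that is inextricably tied to a particular classifier. Such an estimate gives a valuable upper bound on the Bayes error, but it cannot separate classifier effects from data-specific behavior. Here we show that although the results in [4] are reproducible, they do not imply that time-lapse face recognition with thermal infrared imagery is inferior to recognition with visible imagery. We have shown by example that, at least on this data set, the Bayes errors of the two modalities are comparable. A more detailed analysis will certainly require a much larger pool of subjects.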
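The upper-bound remark can be written out as a short inequality. The notation is introduced here for illustration and does not appear in the paper: $E^{*}$ is the Bayes error, $P(k \mid x)$ the posterior probability of identity $k$ given image $x$, and $E(f)$ the true error rate of a particular classifier $f$.

```latex
E^{*} \;=\; 1 - \mathbb{E}_{x}\!\left[\max_{k} P(k \mid x)\right]
\;\le\; E(f) \qquad \text{for every classifier } f .
```

Since no classifier can beat the Bayes error, the measured error of any one algorithm, up to sampling noise, bounds $E^{*}$ from above. A poor result with PCA on thermal imagery therefore says little about $E^{*}$ for that modality, whereas a good result with a stronger algorithm tightens the bound.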

 

Based on the preceding analysis, and recent results by the authors on time-lapse recognition with a more challenging, larger and diverse data set [11], we firmly believe that the use of thermal imagery of faces for biometric authentication is not only viable, but in certain circumstances even preferable over the use of visible images. Without a doubt, the use of fused visible and thermal imagery provides a level of performance not attainable by either alone.

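Based on the preceding analysis, and on the authors' recent time-lapse recognition results on a larger, more diverse and more challenging data set [11], we firmly believe that thermal imagery of faces is not only viable for biometric authentication but in certain circumstances even preferable to visible imagery. Without doubt, fusing visible and thermal imagery provides a level of performance that neither modality attains alone.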
