HDFS: Analysis of an Infinite Loop Caused by await() and interrupt()

1. Background

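HDFS short-circuit reads use a UNIX domain socket to let a client communicate with a DataNode when both run on the same machine. A socket path must be configured on both the DataNode and the client, and the short-circuit read feature must be enabled. See the official documentation for details.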

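In the HDFS 2.7 code, the DomainSocketWatcher class used by short-circuit reads (it watches the domain sockets used for inter-process communication on Linux) contains a bug that needs fixing. The analysis follows: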

// Relevant code from DomainSocketWatcher#add
public void add(DomainSocket sock, Handler handler) {
  lock.lock();
  try {
    Entry entry = new Entry(sock, handler);
    toAdd.add(entry);
    while (true) {
      try {
        processedCond.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      if (!toAdd.contains(entry)) {
        break;
      }
    }
  } finally {
    lock.unlock();
  }
}
// Relevant code from DomainSocketWatcher#remove
public void remove(DomainSocket sock) {
  lock.lock();
  try {
    toRemove.put(sock.fd, sock);
    while (true) {
      try {
        processedCond.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      if (!toRemove.containsKey(sock.fd)) {
        break;
      }
    }
  } finally {
    lock.unlock();
  }
}
final Thread watcherThread = new Thread(new Runnable() {
  @Override
  public void run() {  
    final TreeMap<Integer, Entry> entries = new TreeMap<Integer, Entry>();
    try {
      while (true) {
        lock.lock();
        try {
            // Handle pending removals
            while (true) {
              Map.Entry<Integer, DomainSocket> entry = toRemove.firstEntry();
              if (entry == null) break;
              sendCallbackAndRemove("handlePendingRemovals", entries, fdSet, entry.getValue().fd);
            }
            processedCond.signalAll();
          }
   ...
}

The add / remove / watcherThread code above is a typical example of inter-thread communication. In add / remove, note the following code:

   while (true) {
      try {
        processedCond.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
  ...

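If the thread catches an interrupt, it interrupts itself again via Thread.currentThread().interrupt();. Because of while (true), and as long as the entry is still pending, it enters the next iteration and calls processedCond.await(); again. Looking into the await code: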

public final void await() throws InterruptedException {
    if (Thread.interrupted())
        throw new InterruptedException();
    Node node = addConditionWaiter();
...

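If the thread has already been interrupted, await throws InterruptedException, which is caught again by the catch block, so the thread is stuck in an infinite loop. Note that the exception is thrown before await releases the lock, so watcherThread can never acquire the lock to process the pending entry and call signalAll(), and the loop's exit condition never becomes true.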
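The immediate-throw behavior described above can be reproduced in isolation. The following is a minimal sketch, not Hadoop code: the class name InterruptLoopDemo and the method countImmediateThrows are hypothetical, and the loop is capped at a fixed number of iterations so that the program terminates instead of spinning forever.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; it only mirrors the catch-and-reinterrupt pattern.
final class InterruptLoopDemo {

    /**
     * Runs the pattern for at most maxIterations and returns how many
     * times await() threw InterruptedException.
     */
    static int countImmediateThrows(int maxIterations) {
        ReentrantLock lock = new ReentrantLock();
        Condition cond = lock.newCondition();
        int thrown = 0;

        Thread.currentThread().interrupt(); // simulate an interrupt from outside
        lock.lock();
        try {
            for (int i = 0; i < maxIterations; i++) {
                try {
                    // The interrupt flag is set, so await() throws before it
                    // releases the lock or parks the thread.
                    cond.await();
                } catch (InterruptedException e) {
                    thrown++;
                    // Same step as in add()/remove(): restore the flag,
                    // which makes the next await() throw as well.
                    Thread.currentThread().interrupt();
                }
            }
        } finally {
            lock.unlock();
            Thread.interrupted(); // clear the flag so the caller is unaffected
        }
        return thrown;
    }

    public static void main(String[] args) {
        System.out.println(countImmediateThrows(5)); // prints 5
    }
}
```

Every iteration throws without ever blocking, which is why the uncapped loop in add() and remove() turns into a busy loop that keeps holding the lock.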

2. Testing

First, define a thread-communication class, Service, with three methods that exercise await, awaitUninterruptibly, and signalAll respectively:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class Service {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition condition = lock.newCondition();

    // Mirrors the DomainSocketWatcher pattern: await() in a loop,
    // re-interrupting the current thread in the catch block.
    public void testAwaitMethod() {
        lock.lock(); // acquire once, outside the loop, to match the single unlock()
        try {
            while (true) {
                try {
                    System.out.println("await begin");
                    condition.await();
                    System.out.println("await end");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                    System.out.println("catch await");
                    try {
                        Thread.sleep(2000); // slow the loop down so the output stays readable
                    } catch (InterruptedException e1) {
                        e1.printStackTrace();
                    }
                    Thread.currentThread().interrupt(); // restore the flag; the next await() throws at once
                }
            }
        } finally {
            lock.unlock();
        }
    }

    // Same loop, but waiting with awaitUninterruptibly(), which does not throw on interrupt.
    public void testAwaitUninterMethod() {
        lock.lock();
        try {
            while (true) {
                try {
                    System.out.println("awaitUninterruptibly begin");
                    condition.awaitUninterruptibly();
                    System.out.println("awaitUninterruptibly end");
                } catch (Exception e) {
                    e.printStackTrace();
                    System.out.println("catch awaitUninterruptibly");
                    try {
                        Thread.sleep(2000);
                    } catch (InterruptedException e1) {
                        e1.printStackTrace();
                    }
                    Thread.currentThread().interrupt();
                }
            }
        } finally {
            lock.unlock();
        }
    }

    // Wakes up every thread waiting on the condition.
    public void testSignalAll() {
        System.out.println("signalAll init");
        lock.lock();
        try {
            System.out.println("signalAll begin");
            condition.signalAll();
            System.out.println("signalAll end");
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("catch signalAll");
        } finally {
            lock.unlock();
        }
    }
}

Next, define two thread classes that call the await() method and the awaitUninterruptibly() method respectively:

public class MyThread0 extends Thread {
    private Service service;

    public MyThread0(Service service) {
        this.service = service;
    }
    
    @Override
    public void run() {
        service.testAwaitMethod();
    }
}
public class MyThread1 extends Thread {
    private Service service;

    public MyThread1(Service service) {
        this.service = service;
    }

    @Override
    public void run() {
        service.testAwaitUninterMethod();
    }
}

Test 1: both threads in the waiting state

public class Run {
    public static void main(String[] args) {
        try {
            Service service = new Service();
            
            MyThread0 myThread0 = new MyThread0(service);
            myThread0.start();

            MyThread1 myThread1 = new MyThread1(service);
            myThread1.start();
            
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Result:

await begin
awaitUninterruptibly begin

Test 2: after signalAll, both threads go back to waiting

public class Run {
    public static void main(String[] args) {
        try {
            Service service = new Service();
            
            MyThread0 myThread0 = new MyThread0(service);
            myThread0.start();

            MyThread1 myThread1 = new MyThread1(service);
            myThread1.start();
            
            Thread.sleep(100);
            service.testSignalAll();
            
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Result:

await begin
awaitUninterruptibly begin
signalAll init
signalAll begin
signalAll end
await end
await begin
awaitUninterruptibly end
awaitUninterruptibly begin

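Test 3: calling interrupt() explicitly

When interrupt() is called explicitly, thread0 enters its catch block and then falls into the infinite loop. thread1 never enters its catch block, because awaitUninterruptibly() does not throw on interrupt. Once thread0 is in the loop it never releases the lock, so testSignalAll blocks in lock.lock() and never reaches signalAll(). In the run below thread1 does not even print "awaitUninterruptibly begin", presumably because it is blocked on the same lock.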

public class Run {
    public static void main(String[] args) {
        try {
            Service service = new Service();
            
            MyThread0 myThread0 = new MyThread0(service);
            myThread0.start();
            myThread0.interrupt();

            MyThread1 myThread1 = new MyThread1(service);
            myThread1.start();
            myThread1.interrupt();
            
            Thread.sleep(100);
            service.testSignalAll();
            
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Result:

await begin
java.lang.InterruptedException
catch await
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2034)
	at thread.lock.awaituninterruptibly.Service.testAwaitMethod(Service.java:22)
	at thread.lock.awaituninterruptibly.MyThread0.run(MyThread0.java:15)
signalAll init
await begin
catch await
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2034)
	at thread.lock.awaituninterruptibly.Service.testAwaitMethod(Service.java:22)
	at thread.lock.awaituninterruptibly.MyThread0.run(MyThread0.java:15)
java.lang.InterruptedException
await begin
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2034)
catch await
	at thread.lock.awaituninterruptibly.Service.testAwaitMethod(Service.java:22)
	at thread.lock.awaituninterruptibly.MyThread0.run(MyThread0.java:15)
await begin
java.lang.InterruptedException
catch await
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2034)
	at thread.lock.awaituninterruptibly.Service.testAwaitMethod(Service.java:22)
	at thread.lock.awaituninterruptibly.MyThread0.run(MyThread0.java:15)

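Summary

Test 3 shows that replacing condition.await(); with condition.awaitUninterruptibly(); makes the wait ignore interrupts. In addition, combining await with Thread.currentThread().interrupt(); inside a while loop is unsound. The community fix in HADOOP-14214 switched to awaitUninterruptibly and removed Thread.currentThread().interrupt();.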
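For illustration, the shape of a wait loop built on awaitUninterruptibly() might look as follows. This is a minimal sketch under stated assumptions, not the actual HADOOP-14214 patch: the class UninterruptibleAddSketch, the processPending method that stands in for the watcher thread, the demo method, and the use of String entries are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names and types are stand-ins, not Hadoop code.
final class UninterruptibleAddSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition processedCond = lock.newCondition();
    private final List<String> toAdd = new ArrayList<>();

    /** Blocks until the worker has consumed the entry, even if interrupted. */
    void add(String entry) {
        lock.lock();
        try {
            toAdd.add(entry);
            while (toAdd.contains(entry)) {
                // Keeps waiting across interrupts; the interrupt status is
                // restored by the JDK when this call finally returns.
                processedCond.awaitUninterruptibly();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Stand-in for the watcher thread: drains pending entries and signals. */
    void processPending() {
        lock.lock();
        try {
            toAdd.clear();
            processedCond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if add() completed and the interrupt flag survived. */
    static boolean demo() {
        UninterruptibleAddSketch sketch = new UninterruptibleAddSketch();
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                sketch.processPending();
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return; // asked to stop
                }
            }
        });
        worker.setDaemon(true);
        worker.start();

        Thread.currentThread().interrupt();  // caller is interrupted before add()
        sketch.add("entry");                 // still returns once the worker signals
        boolean stillInterrupted = Thread.interrupted(); // read and clear the flag

        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return stillInterrupted;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this shape the caller never needs to re-interrupt itself inside the loop, so the wait cannot degenerate into a busy loop, and code further up the stack can still observe the interrupt after add() returns.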
