Production service hangs on startup: thread dump analysis

Background

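On startup the service loads data from MySQL into Elasticsearch. This works in the test environment, but in production the service hangs and makes no progress.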

Inspecting the thread dump

Key points


"elasticsearch[_client_][generic][T#5]" #843 daemon prio=5 os_prio=0 tid=0x00007fb3ec007000 nid=0x601b waiting on condition [0x00007fb1b5596000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006ef4aee60> (a org.elasticsearch.common.util.concurrent.EsExecutors$ExecutorScalingQueue)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:734)
        at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
        at java.util.concurrent.LinkedTransferQueue.poll(LinkedTransferQueue.java:1273)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None

"MySQL Statement Cancellation Timer" #839 daemon prio=5 os_prio=0 tid=0x00007fb698005000 nid=0x5c16 in Object.wait() [0x00007fb1a266c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00000006f728f0a0> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)

   Locked ownable synchronizers:
        - None

"MySQL Statement Cancellation Timer" #838 daemon prio=5 os_prio=0 tid=0x00007fb688008000 nid=0x5c15 in Object.wait() [0x00007fb1a276d000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00000006f729d658> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)

   Locked ownable synchronizers:
        - None

"TotalParallelLoad-pool-parallelLoad-thread-200" #837 prio=5 os_prio=0 tid=0x00007fb47d15b800 nid=0x5c14 waiting on condition [0x00007fb1a286e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006f937a600> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
        at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None

Analysis

At this step a thread only proceeds once its queue becomes non-empty. Here the queues stay empty, so the threads all sit waiting.
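To make that waiting state easier to recognise in a dump, here is a minimal sketch that creates a java.util.Timer with nothing scheduled and reports the state its thread settles into. The class name IdleTimerDemo, the helper names and the thread name "demo-timer" are made up for illustration and are not taken from the service.

```java
import java.util.Timer;

/** Hypothetical demo class, not part of the original service. */
final class IdleTimerDemo {

    /** Returns the live thread with the given name, or null if none exists. */
    static Thread findThread(String name) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().equals(name)) {
                return t;
            }
        }
        return null;
    }

    /**
     * Creates a daemon Timer with no tasks and polls (for up to two seconds)
     * until its thread reports WAITING. Returns the last state observed,
     * or null if the thread was never found.
     */
    static Thread.State idleTimerState(String name) {
        Timer timer = new Timer(name, true); // nothing is ever scheduled on it
        Thread.State last = null;
        try {
            long deadline = System.currentTimeMillis() + 2000;
            while (System.currentTimeMillis() < deadline) {
                Thread t = findThread(name);
                if (t != null) {
                    last = t.getState();
                    if (last == Thread.State.WAITING) {
                        break;
                    }
                }
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            timer.cancel(); // lets the timer thread exit
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println("idle timer thread state: " + idleTimerState("demo-timer"));
    }
}
```

A thread in this state shows up in jstack the same way as the "MySQL Statement Cancellation Timer" entries above: WAITING (on object monitor), with Object.wait at the top of the stack.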

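I load the data from MySQL with multiple threads, and the number of threads is computed as the table's maximum id / 1000 (each thread loads 1000 rows).
On this run that produced more than 10,000 threads, yet the table holds only about 400,000 rows in total. 10,000 threads * 1000 rows would mean 10 million rows, so I suspected the ids in the table were off. That turned out to be the case: the ids start at a little over 10 million rather than at 0, so the tasks covering the earlier id ranges never had any data in their queues, and that is what caused the waiting. After fixing this, the program ran normally again.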
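The arithmetic above can be sketched as follows. This is an illustration only: the class IdRangePlanner, its method names and the concrete id values are assumptions chosen to match the rough figures in the post (about 400,000 rows, ids starting just above 10 million). The post does not say how the fix was actually implemented; splitting the range between the minimum and maximum id is just one possible approach.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not taken from the original service. */
final class IdRangePlanner {

    /** Rows loaded per task, as described in the post. */
    static final long BATCH_SIZE = 1000;

    /** Number of batches when the id space is assumed to start at 0. */
    static long batchesFromZero(long maxId) {
        return (maxId + BATCH_SIZE - 1) / BATCH_SIZE;
    }

    /** Half-open [start, end) id ranges that cover only minId..maxId. */
    static List<long[]> rangesFromMin(long minId, long maxId) {
        List<long[]> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += BATCH_SIZE) {
            long end = Math.min(start + BATCH_SIZE, maxId + 1);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        long minId = 10_000_001L; // assumed: ids start just above 10 million
        long maxId = 10_400_000L; // assumed: about 400,000 consecutive rows

        long naive = batchesFromZero(maxId);
        int bounded = rangesFromMin(minId, maxId).size();

        System.out.println("batches when counting from 0: " + naive);
        System.out.println("batches when counting from minId: " + bounded);
        System.out.println("batches with no rows at all: " + (naive - bounded));
    }
}
```

With these assumed values, counting from 0 yields 10,400 batches, of which only 400 contain rows. In a real loader the minimum and maximum would come from something like SELECT MIN(id), MAX(id) on the source table, and gaps in the id sequence would still leave some batches short.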

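For reference, the two "MySQL Statement Cancellation Timer" threads in the dump are parked in java.util.TimerThread.mainLoop (Timer.java:526), at the no-argument queue.wait() call in the JDK source below:

 private void mainLoop() {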
        while (true) {
            try {
                TimerTask task;
                boolean taskFired;
                synchronized(queue) {
                    // Wait for queue to become non-empty
                    while (queue.isEmpty() && newTasksMayBeScheduled)
                        queue.wait();
                    if (queue.isEmpty())
                        break; // Queue is empty and will forever remain; die

                    // Queue nonempty; look at first evt and do the right thing
                    long currentTime, executionTime;
                    task = queue.getMin();
                    synchronized(task.lock) {
                        if (task.state == TimerTask.CANCELLED) {
                            queue.removeMin();
                            continue;  // No action required, poll queue again
                        }
                        currentTime = System.currentTimeMillis();
                        executionTime = task.nextExecutionTime;
                        if (taskFired = (executionTime<=currentTime)) {
                            if (task.period == 0) { // Non-repeating, remove
                                queue.removeMin();
                                task.state = TimerTask.EXECUTED;
                            } else { // Repeating task, reschedule
                                queue.rescheduleMin(
                                  task.period<0 ? currentTime   - task.period
                                                : executionTime + task.period);
                            }
                        }
                    }
                    if (!taskFired) // Task hasn't yet fired; wait
                        queue.wait(executionTime - currentTime);
                }
                if (taskFired)  // Task fired; run it, holding no locks
                    task.run();
            } catch(InterruptedException e) {
            }
        }
    }
}