0. Preparation
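The restTemplate used below is a RestTemplate backed by an HttpClient connection pool; my previous article covers how to set that up. The stock RestTemplate works just as well.
I also use the VisualVM Launcher plugin for IDEA to watch memory and threads at runtime.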
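1. Simple file download
Here restTemplate calls getForEntity to fetch the response as a byte array, and the array is written straight to the target file with the write method of java.nio.file.Files (available since Java 7).
Points to watch:
- The whole file is held in memory as a byte array, which is extremely expensive
- The target directory has to be created by hand if it does not exist, and the path separator (\ on Windows) needs care when the target path is built
The code: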
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Component;
import org.springframework.util.CollectionUtils;
import org.springframework.web.client.RequestCallback;
import org.springframework.web.client.ResponseExtractor;
import org.springframework.web.client.RestTemplate;
import org.springframework.web.util.UriComponentsBuilder;
import javax.annotation.Resource;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;
/**
* @Author: zgd
* @Date: 2019/3/29 10:49
* @Description:
*/
@Component
@Slf4j
public class WebFileUtils {
/**
* The custom RestTemplate backed by HttpClient
*/
@Resource(name = "httpClientTemplate")
private RestTemplate httpClientTemplate;
/**
* Downloads a small file as a byte array; the whole response is held in memory, which can easily cause an OutOfMemoryError
*
* @param url
* @param targetDir
*/
public void downloadLittleFileToPath(String url, String targetDir) {
downloadLittleFileToPath(url, targetDir, null);
}
/**
* Downloads a small file; the whole response is held in memory, which can easily cause an OutOfMemoryError
*
* @param url
* @param targetDir
*/
public void downloadLittleFileToPath(String url, String targetDir, Map<String, String> params) {
Instant now = Instant.now();
String completeUrl = addGetQueryParam(url, params);
ResponseEntity<byte[]> rsp = httpClientTemplate.getForEntity(completeUrl, byte[].class);
log.info("[下載文件] [狀態碼] code:{}", rsp.getStatusCode());
try {
String path = getAndCreateDownloadDir(url, targetDir);
Files.write(Paths.get(path), Objects.requireNonNull(rsp.getBody(), "未獲取到下載文件"));
} catch (IOException e) {
log.error("[下載文件] 寫入失敗:", e);
}
log.info("[下載文件] 完成,耗時:{}", ChronoUnit.MILLIS.between(now, Instant.now()));
}
/**
* Appends query parameters to a GET URL
*
* @param url
* @param params
* @return
*/
private String addGetQueryParam(String url, Map<String, String> params) {
UriComponentsBuilder uriComponentsBuilder = UriComponentsBuilder.fromHttpUrl(url);
if (!CollectionUtils.isEmpty(params)) {
for (Map.Entry<String, ?> varEntry : params.entrySet()) {
uriComponentsBuilder.queryParam(varEntry.getKey(), varEntry.getValue());
}
}
return uriComponentsBuilder.build().encode().toString();
}
/**
* Creates the download directory if needed and returns the full path of the target file
*
* @param url
* @param targetDir
* @return
*/
public String getAndCreateDownloadDir(String url, String targetDir) throws IOException {
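// Strip the query string first, then take the last path segment as the file name
int queryIndex = url.indexOf("?");
String urlWithoutQuery = queryIndex == -1 ? url : url.substring(0, queryIndex);
String filename = urlWithoutQuery.substring(urlWithoutQuery.lastIndexOf("/") + 1);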
if (!Files.exists(Paths.get(targetDir))) {
Files.createDirectories(Paths.get(targetDir));
}
return targetDir.endsWith("/") ? targetDir + filename : targetDir + "/" + filename;
}
}
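The directory and separator handling mentioned in the notes above can also be left to java.nio.file.Path. The following is a minimal sketch using only the JDK, not the author's code: the class name DownloadPathSketch, its method names, and the example.com URL are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DownloadPathSketch {

    /** Returns the last path segment of the URL, ignoring any query string. */
    static String fileNameOf(String url) {
        int queryIndex = url.indexOf('?');
        String withoutQuery = queryIndex == -1 ? url : url.substring(0, queryIndex);
        return withoutQuery.substring(withoutQuery.lastIndexOf('/') + 1);
    }

    /** Creates targetDir if it is missing and returns targetDir/fileName. */
    static Path resolveTarget(String url, String targetDir) throws IOException {
        Path dir = Paths.get(targetDir);
        // createDirectories does nothing when the directory already exists
        Files.createDirectories(dir);
        // resolve inserts the platform separator, so no manual "/" handling is needed
        return dir.resolve(fileNameOf(url));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URL and a temporary directory, used only for this demo
        Path dir = Files.createTempDirectory("down");
        System.out.println(resolveTarget("http://example.com/files/setup.exe?token=abc", dir.toString()));
    }
}
```

Because resolve works on Path objects, it does not matter whether targetDir ends with a separator.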
I picked a download URL for Sogou Browser and ran the download through the test below.
package com.zgd.springboot.demo.http.test;
import com.alibaba.fastjson.JSON;
import com.google.common.collect.Maps;
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import com.zgd.springboot.demo.http.HttpApplication;
import com.zgd.springboot.demo.http.IO.utils.DownloadTool;
import com.zgd.springboot.demo.http.IO.utils.QiniuUtil;
import com.zgd.springboot.demo.http.IO.utils.WebFileUtils;
import com.zgd.springboot.demo.http.service.IHttpService;
import lombok.extern.slf4j.Slf4j;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.stereotype.Component;
import org.springframework.test.context.TestPropertySource;
import org.springframework.test.context.junit4.SpringRunner;
import javax.annotation.Resource;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.concurrent.*;
/**
* @Author: zgd
* @Date: 2019/3/25 15:56
* @Description:
*/
@Component
@RunWith(SpringRunner.class)
@SpringBootTest(classes = HttpApplication.class)
@TestPropertySource("classpath:application.yml")
@Slf4j
public class SpringTest {
@Resource
private WebFileUtils webFileUtils;
@Test
public void testDownloadQiniu(){
String path = "D:/down/file/";
String url = "http://cdn4.mydown.com/5c9df131/6dcdc2f2ff1aba454f90d8581eab1820/newsoft/sogou_explorer_fast_8.5.7.29587_7685.exe";
webFileUtils.downloadLittleFileToPath(url,path);
}
}
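In VisualVM, memory usage starts at a little over 100 MB and then shoots up past 300 MB. Total time: 8533 ms.
To show more clearly how much memory this method uses, let's download IDEA, which is about 500 MB: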
String url = "https://download.jetbrains.8686c.com/idea/ideaIU-2019.1.exe";
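Memory usage peaked at over 900 MB, and that is for a 500 MB download. If the server had to download files of several GB, it would certainly run out of memory.
As for download time, the speed was about 300 KB/s, and I did not have the patience to wait for the full 500 MB.
2. Single-threaded large file download
Since the method above only suits small files, what about large ones? The answer is to stream the response. Add two methods to the class above; this time the stream is handled with the copy method of Files.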
/**
* Downloads a large file by streaming the response
*
* @param url
* @param targetDir
*/
public void downloadBigFileToPath(String url, String targetDir){
downloadBigFileToPath(url,targetDir,null);
}
/**
* Downloads a large file by streaming the response
*
* @param url
* @param targetDir
*/
public void downloadBigFileToPath(String url, String targetDir, Map<String, String> params) {
Instant now = Instant.now();
String completeUrl = addGetQueryParam(url, params);
try {
String path = getAndCreateDownloadDir(url, targetDir);
// Set the Accept header of the request
RequestCallback requestCallback = request -> request.getHeaders()
.setAccept(Arrays.asList(MediaType.APPLICATION_OCTET_STREAM, MediaType.ALL));
// getForObject would load the whole response into memory; stream it to disk instead
ResponseExtractor<Void> responseExtractor = response -> {
// Copy the response stream to the target file, overwriting any earlier download
Files.copy(response.getBody(), Paths.get(path), java.nio.file.StandardCopyOption.REPLACE_EXISTING);
return null;
};
httpClientTemplate.execute(completeUrl, HttpMethod.GET, requestCallback, responseExtractor);
} catch (IOException e) {
log.error("[下載文件] 寫入失敗:", e);
}
log.info("[下載文件] 完成,耗時:{}", ChronoUnit.MILLIS.between(now, Instant.now()));
}
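First, the 50 MB Sogou Browser:
Memory stays at around 100 MB; total time: 5514 ms
Then the 500 MB IDEA: memory stays within 150 MB, and the download speed is again about 300 KB/s
So streaming keeps memory usage under control even for large files.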
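To see the streaming idea without Spring, the sketch below copies an InputStream to a file with Files.copy. It is an illustration under assumptions, not the author's code: StreamCopySketch and saveTo are hypothetical names, and a ByteArrayInputStream stands in for the HTTP response body.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamCopySketch {

    /** Streams the body to the target file and returns the number of bytes written. */
    static long saveTo(InputStream body, Path target) throws IOException {
        // Files.copy moves the data through a small buffer, so the file is never
        // held in memory as a whole. REPLACE_EXISTING allows re-running a download.
        return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("download", ".bin");
        // Stand-in for a response body; a real one would come from the HTTP client
        try (InputStream in = new ByteArrayInputStream(new byte[1024 * 1024])) {
            System.out.println(saveTo(in, target) + " bytes written");
        }
        Files.delete(target);
    }
}
```

Without REPLACE_EXISTING, Files.copy throws FileAlreadyExistsException when the target file is already there.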
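3. Multi-threaded download
The large-file problem is solved, but 300 KB/s is far too slow. A small file is still fine (50 MB in about 5 s), but large files need a faster download. The speed also depends on your ISP bandwidth and on the source server.
The idea: first send a HEAD request to get the file size. I start 10 threads by default and assign each thread its share of the data; each thread sets the Range request header and downloads only its own part, and the parts are finally merged into one file.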
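The range arithmetic described above can be checked on its own. Here is a minimal sketch, assuming the file is at least as many bytes long as there are parts; RangeSplitSketch and its methods are hypothetical names. Offsets in a Range header are inclusive, so the last byte of a file of length n is n - 1.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSplitSketch {

    /** Splits bytes 0..contentLength-1 into `parts` inclusive ranges {start, end}. */
    static List<long[]> split(long contentLength, int parts) {
        if (parts <= 0 || contentLength < parts) {
            throw new IllegalArgumentException("need parts > 0 and contentLength >= parts");
        }
        List<long[]> ranges = new ArrayList<>(parts);
        long chunk = contentLength / parts;
        long start = 0;
        for (int i = 0; i < parts; i++) {
            // The last part absorbs the remainder of the division
            long end = (i == parts - 1) ? contentLength - 1 : start + chunk - 1;
            ranges.add(new long[] {start, end});
            start = end + 1;
        }
        return ranges;
    }

    /** Formats a range as the value of the HTTP Range request header. */
    static String header(long[] range) {
        return "bytes=" + range[0] + "-" + range[1];
    }

    public static void main(String[] args) {
        // Prints bytes=0-332, bytes=333-665 and bytes=666-999, one per line
        for (long[] range : split(1000, 3)) {
            System.out.println(header(range));
        }
    }
}
```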
Here is the full code:
package com.zgd.springboot.demo.http.IO.utils;
import com.google.common.collect.Lists;
import com.google.common.util.concurrent.ThreadFactoryBuilder;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.*;
import org.springframework.stereotype.Component;
import org.springframework.util.Assert;
import org.springframework.web.client.RequestCallback;
import org.springframework.web.client.ResponseExtractor;
import org.springframework.web.client.RestTemplate;
import javax.annotation.Resource;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Objects;
import java.util.concurrent.*;
/**
* Created by yangzheng03 on 2018/1/16.
* https://www.dubby.cn/
*/
@Component
@Slf4j
public class DownloadTool {
/**
* The custom RestTemplate backed by HttpClient
*/
@Resource(name = "httpClientTemplate")
private RestTemplate httpClientTemplate;
@Resource
private WebFileUtils webFileUtils;
/**
* Minimum number of threads, also used as the default
*/
private static final int MIN_POOL_SIZE = 10;
/**
* Maximum number of threads
*/
private static final int MAX_POOL_SIZE = 100;
/**
* Capacity of the wait queue
*/
private static final int WAIT_QUEUE_SIZE = 1000;
/**
* Thread pool
*/
private static ExecutorService threadPool;
private static final int ONE_KB_SIZE = 1024;
/**
* Files of 20 MB or more are treated as large and downloaded by streaming
*/
private static final int BIG_FILE_SIZE = 20 * 1024 * 1024;
private static String prefix = String.valueOf(System.currentTimeMillis());
public void downloadByMultithread(String url, String targetPath, Integer threadNum) {
long startTimestamp = System.currentTimeMillis();
// Set up the thread pool
threadNum = threadNum == null ? MIN_POOL_SIZE : threadNum;
Assert.isTrue(threadNum > 0, "線程數不能爲負數");
ThreadFactory factory = new ThreadFactoryBuilder().setNameFormat("http-demo-%d").build();
threadPool = new ThreadPoolExecutor(
threadNum, MAX_POOL_SIZE, 0, TimeUnit.MINUTES,
new LinkedBlockingDeque<>(WAIT_QUEUE_SIZE), factory);
boolean isBigFile;
// Send a HEAD request to fetch only the headers and read the file size
long contentLength = httpClientTemplate.headForHeaders(url).getContentLength();
isBigFile = contentLength >= BIG_FILE_SIZE;
if (contentLength > 1024 * ONE_KB_SIZE) {
log.info("[多線程下載] Content-Length\t" + (contentLength / 1024 / 1024) + "MB");
} else if (contentLength > ONE_KB_SIZE) {
log.info("[多線程下載] Content-Length\t" + (contentLength / 1024) + "KB");
} else {
log.info("[多線程下載] Content-Length\t" + (contentLength) + "B");
}
long tempLength = contentLength / threadNum;
long start, end = -1;
ArrayList<CompletableFuture<DownloadTemp>> futures = Lists.newArrayListWithCapacity(threadNum);
String fileFullPath;
RandomAccessFile resultFile;
try {
fileFullPath = webFileUtils.getAndCreateDownloadDir(url, targetPath);
// Create the target file
resultFile = new RandomAccessFile(fileFullPath, "rw");
log.info("[多線程下載] Download started, url:{}\tfileFullPath:{}", url, fileFullPath);
for (int i = 0; i < threadNum; ++i) {
start = end + 1;
end = end + tempLength;
if (i == threadNum - 1) {
// Range offsets are inclusive, so the last byte sits at contentLength - 1
end = contentLength - 1;
}
log.info("[多線程下載] start:{}\tend:{}",start, end);
DownloadThread thread = new DownloadThread(httpClientTemplate, i, start, end, url, fileFullPath, isBigFile);
CompletableFuture<DownloadTemp> future = CompletableFuture.supplyAsync(thread::call, threadPool);
futures.add(future);
}
} catch (Exception e) {
log.error("[多線程下載] 下載出錯", e);
return;
}finally {
threadPool.shutdown();
}
// Merge the parts in order; get() blocks until each part has finished downloading
futures.forEach(f -> {
try {
f.thenAccept(o -> {
try {
log.info("[多線程下載] {} 開始合併,文件:{}", o.threadName, o.filename);
RandomAccessFile tempFile = new RandomAccessFile(o.filename, "rw");
tempFile.getChannel().transferTo(0, tempFile.length(), resultFile.getChannel());
tempFile.close();
File file = new File(o.filename);
boolean b = file.delete();
log.info("[多線程下載] {} 刪除臨時文件:{}\t結果:{}", o.threadName, o.filename, b);
} catch (IOException e) {
log.error("[多線程下載] {} 合併出錯", o.threadName, e);
}
}).get();
} catch (Exception e) {
log.error("[多線程下載] 合併出錯", e);
}
});
// Close the target file once every part has been merged
try {
resultFile.close();
} catch (IOException e) {
log.error("[多線程下載] 關閉目標文件出錯", e);
}
long completedTimestamp = System.currentTimeMillis();
log.info("=======下載完成======,耗時{}",
isBigFile ? (completedTimestamp - startTimestamp) / 1000 + "s" : (completedTimestamp - startTimestamp) + "ms");
}
public class DownloadThread implements Callable<DownloadTemp> {
private int index;
private String filePath;
private long start, end;
private String urlString;
private RestTemplate httpClientTemplate;
private boolean isBigFile;
DownloadThread(RestTemplate restTemplate, int index, long start, long end, String url, String fileFullPath, boolean isBigFile) {
this.httpClientTemplate = restTemplate;
this.urlString = url;
this.index = index;
this.start = start;
this.end = end;
this.isBigFile = isBigFile;
Assert.hasLength(fileFullPath, "文件下載路徑不能爲空");
this.filePath = String.format("%s-%s-%d", fileFullPath, prefix, index);
}
@Override
public DownloadTemp call() {
// Choose the download strategy by file size
try {
if (isBigFile) {
downloadBigFIle();
} else {
downloadLittleFIle();
}
} catch (Exception e) {
log.error("[線程下載] 下載失敗:", e);
}
DownloadTemp downloadTemp = new DownloadTemp();
downloadTemp.index = index;
downloadTemp.filename = filePath;
downloadTemp.threadName = Thread.currentThread().getName();
log.info("[線程下載] \tcompleted.");
return downloadTemp;
}
/**
* Downloads one part of a small file into memory, then writes it to the temp file
*
* @throws IOException
*/
private void downloadLittleFIle() throws IOException {
HttpHeaders headers = new HttpHeaders();
headers.set(HttpHeaders.RANGE, "bytes=" + start + "-" + end);
headers.setAccept(Collections.singletonList(MediaType.ALL));
ResponseEntity<byte[]> rsp = httpClientTemplate.exchange(urlString, HttpMethod.GET, new HttpEntity<>(headers), byte[].class);
log.info("[線程下載] 返回狀態碼:{}", rsp.getStatusCode());
Files.write(Paths.get(filePath), Objects.requireNonNull(rsp.getBody(), "未獲取到下載文件"));
}
/**
* Downloads one part of a large file by streaming it to the temp file
*
* @throws IOException
*/
private void downloadBigFIle() {
RequestCallback requestCallback = request -> {
HttpHeaders headers = request.getHeaders();
headers.set(HttpHeaders.RANGE, "bytes=" + start + "-" + end);
headers.setAccept(Arrays.asList(MediaType.APPLICATION_OCTET_STREAM, MediaType.ALL));
};
// getForObject would load the whole response into memory; stream it to disk instead
ResponseExtractor<Void> responseExtractor = response -> {
// Here I write the response to a file but do what you like
Files.copy(response.getBody(), Paths.get(filePath));
log.info("[線程下載] 返回狀態碼:{}", response.getStatusCode());
return null;
};
httpClientTemplate.execute(urlString, HttpMethod.GET, requestCallback, responseExtractor);
}
}
private static class DownloadTemp {
private int index;
private String filename;
private String threadName;
}
}
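One detail of the merge step is worth isolating: FileChannel.transferTo may move fewer bytes than requested, so a robust merge loops until each part is fully copied. The sketch below shows this with the JDK only; MergePartsSketch and the file names are made up for illustration, and it is not a drop-in replacement for the class above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

final class MergePartsSketch {

    /** Appends each part to target in list order, deleting the part afterwards. */
    static void merge(List<Path> parts, Path target) throws IOException {
        try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                try (FileChannel in = FileChannel.open(part, StandardOpenOption.READ)) {
                    long position = 0;
                    long size = in.size();
                    // transferTo returns the bytes actually moved, so loop until done;
                    // the bytes land at out's current position, which then advances
                    while (position < size) {
                        position += in.transferTo(position, size - position, out);
                    }
                }
                Files.delete(part);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("merge");
        Path first = dir.resolve("file-0");
        Path second = dir.resolve("file-1");
        Files.write(first, "hello ".getBytes(StandardCharsets.UTF_8));
        Files.write(second, "world".getBytes(StandardCharsets.UTF_8));
        Path target = dir.resolve("file");
        merge(Arrays.asList(first, second), target);
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

The try-with-resources blocks close every channel even when a copy fails, and TRUNCATE_EXISTING keeps leftovers from an earlier, longer file out of the result.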
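Downloading the 50 MB Sogou Browser took 5 s. Because this class treats anything of 20 MB or more as a large file and streams it as in section 2, memory stays roughly between 100 and 200 MB.
Downloading the 500 MB IDEA, memory again stays within 200 MB and the speed reaches 3 MB/s; the whole download took a little over 200 seconds, or roughly 4 minutes.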