1. Problem description
Suppose we have a set of IP entries such as the following:
192.168.1.100-192.168.1.120,AAA,id1
192.168.1.50-192.168.1.150,BBB,id2
10.67.1.1/24,CCC,id3
10.67.1.1,DDD,id4
10.67.1.0,EEE,id5
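The first column is a single IP or an IP range, the second column is the device hash, and the third column is the unique identifier of that IP or range. An IP and a range, or two ranges, may overlap; for example, 192.168.1.100-192.168.1.120 and 192.168.1.50-192.168.1.150 overlap.
Requirement: when a log record arrives, we need the unique ID for the IP it contains. For example, when 10.67.1.125 arrives, we know it belongs to the 10.67.1.1/24 range, so it is tagged with id3. When 10.67.1.1 arrives, it matches both 10.67.1.1 (id4) and 10.67.1.1/24 (id3), so we take the hash recorded in the log; if that hash is CCC, we know its unique ID should be id3.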
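Before any comparison can happen, each entry in the first column has to become a numeric interval. The following is a minimal sketch of that conversion using only the standard library; the complete code in this article uses commons-ip-math for this step instead, and the class and method names below (`IpIntervalSketch`, `toLong`, `toInterval`) are made up for illustration.

```java
import java.util.Arrays;

/** Sketch: turn "a.b.c.d", "a.b.c.d/len" or "a.b.c.d-e.f.g.h" into an inclusive [start, end] pair. */
final class IpIntervalSketch {

    /** Converts a dotted-quad IPv4 address to its unsigned 32-bit value, held in a long. */
    static long toLong(String ip) {
        String[] parts = ip.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns {start, end} (both inclusive) for the three input forms used in the article. */
    static long[] toInterval(String spec) {
        if (spec.contains("/")) {
            String[] p = spec.split("/");
            int prefix = Integer.parseInt(p[1]);
            if (prefix < 0 || prefix > 32) {
                throw new IllegalArgumentException("bad prefix length: " + spec);
            }
            long hostMask = (1L << (32 - prefix)) - 1;   // bits that vary inside the block
            long start = toLong(p[0]) & ~hostMask;       // clear host bits: 10.67.1.1/24 starts at 10.67.1.0
            return new long[] {start, start | hostMask};
        }
        if (spec.contains("-")) {
            String[] p = spec.split("-");
            return new long[] {toLong(p[0]), toLong(p[1])};
        }
        long single = toLong(spec);
        return new long[] {single, single};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toInterval("10.67.1.1/24")));
        System.out.println(Arrays.toString(toInterval("192.168.1.100-192.168.1.120")));
        System.out.println(Arrays.toString(toInterval("10.67.1.1")));
    }
}
```

A single IP becomes an interval whose start and end are equal, so all three input forms can be handled uniformly afterwards.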
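2. Approach
1. First split the ranges into non-overlapping segments, recording the associated information for each segment.
2. When a log record arrives, extract its IP and binary-search the segments. If the segment found maps to a single entry, return it directly; if it maps to several entries, compare hash values to pick the match.
2.1 Splitting ranges into non-overlapping segments
Consider a set of data such as: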
s1,e1,id1
s2,e2,id2
s3,e3,id3
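Here s and e are the start and end values, and id is the unique identifier. Their extents are shown in the figure below:
We sort the values by magnitude. Note that when two values are equal, a start value sorts before an end value. After sorting we get: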
s1,s2,s3,e1,e2,e3
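The tie-break matters when one range ends on the same value where another begins. As a small illustration (not the article's own classes; `Endpoint` and `sortedLabels` are hypothetical names), the ordering rule can be written as a two-level comparison:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of the endpoint ordering: by value, and on ties a start before an end. */
final class EndpointOrderSketch {

    /** Hypothetical endpoint record: its value, whether it is an end point, and a label. */
    static final class Endpoint {
        final long value;
        final boolean end;
        final String label;

        Endpoint(long value, boolean end, String label) {
            this.value = value;
            this.end = end;
            this.label = label;
        }
    }

    /** Boolean.compare puts false (a start) before true (an end) when the values are equal. */
    static final Comparator<Endpoint> ORDER = (a, b) -> {
        int byValue = Long.compare(a.value, b.value);
        return byValue != 0 ? byValue : Boolean.compare(a.end, b.end);
    };

    /** Sorts a copy of the endpoints and returns their labels in order. */
    static List<String> sortedLabels(List<Endpoint> endpoints) {
        List<Endpoint> copy = new ArrayList<>(endpoints);
        copy.sort(ORDER);
        List<String> labels = new ArrayList<>();
        for (Endpoint p : copy) {
            labels.add(p.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        // Two ranges that touch at 5: [1,5] and [5,10]
        List<Endpoint> endpoints = new ArrayList<>();
        endpoints.add(new Endpoint(1, false, "s1"));
        endpoints.add(new Endpoint(5, true, "e1"));
        endpoints.add(new Endpoint(5, false, "s2"));
        endpoints.add(new Endpoint(10, true, "e2"));
        System.out.println(sortedLabels(endpoints));
    }
}
```

With the start s2 placed before the end e1, the value 5 is processed while both ranges are still active, so it is attributed to both.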
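The analysis proceeds as shown in the figure below:
Step 1 yields: [s1,s2-1] id1
Step 2 yields: [s2,s3-1] id1,id2
Step 3 yields: [s3,e1] id1,id2,id3
Step 4 yields: [e1+1,e2] id2,id3
Step 5 yields: [e2+1,e3] id3
The procedure can be summarized as follows:
Take two adjacent elements n1 and n2 from the sorted list, and use a global set S to record the currently active ids:
- If n1 is a start (s), then a=n1 and the id of n1 is added to S; if n1 is an end (e), then a=n1+1 and the id of n1 is removed from S
- If n2 is a start (s), then b=n2-1; if n2 is an end (e), then b=n2
- If a<=b and S is not empty, output [a,b] together with S
Repeat this until the whole list has been processed.
The core code is as follows: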
public List<OutputNode> init(List<IpRangeAsset> list){
    // Turn every range into two endpoints: a start (end=false) and an end (end=true)
    List<GapRangeNode> gapRangeNodeList = new ArrayList<>();
    for(int i=0;i<list.size();i++){
        gapRangeNodeList.add(new GapRangeNode(list.get(i).getStartIpInt(),i,false));
        gapRangeNodeList.add(new GapRangeNode(list.get(i).getEndIpInt(),i,true));
    }
    // Sort by value; on ties a start sorts before an end.
    // Returning 0 for truly equal endpoints keeps the comparator consistent.
    Collections.sort(gapRangeNodeList, new Comparator<GapRangeNode>() {
        @Override
        public int compare(GapRangeNode o1, GapRangeNode o2) {
            int byNumber = Long.compare(o1.getNumber(), o2.getNumber());
            return byNumber != 0 ? byNumber : Boolean.compare(o1.isEnd(), o2.isEnd());
        }
    });
    // outputNode plays the role of S: the set of assets active at the current position
    OutputNode outputNode = new OutputNode(-1,-1);
    List<OutputNode> resultList = new ArrayList<OutputNode>();
    for(int i=0;i<gapRangeNodeList.size()-1;i++){
        GapRangeNode n1 = gapRangeNodeList.get(i);
        GapRangeNode n2 = gapRangeNodeList.get(i+1);
        long n = -1;
        long m = -1;
        if(n1.isEnd()){
            // An end closes its asset; the next segment starts one past it
            n = n1.getNumber()+1;
            outputNode.removeAssetIndex(list.get(n1.getIndex()));
        }else{
            n = n1.getNumber();
            outputNode.addAssetIndex(list.get(n1.getIndex()));
        }
        if(n2.isEnd()){
            m = n2.getNumber();
        }else{
            // A start opens a new segment, so the current one stops just before it
            m = n2.getNumber() - 1;
        }
        if(n <= m && outputNode.getAssetIndexSet().size() > 0){
            // Copy the active set, since outputNode keeps changing
            OutputNode copyNode = new OutputNode(n,m);
            Iterator<IpRangeAsset> iterator = outputNode.getAssetIndexSet().iterator();
            while(iterator.hasNext()){
                copyNode.addAssetIndex(iterator.next());
            }
            resultList.add(copyNode);
        }
    }
    return resultList;
}
The result of splitting the ranges looks like this:
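To try the sweep without the model classes, here is a self-contained sketch of the same procedure on plain long values. It is an illustrative rewrite, not the article's implementation: `SplitSketch`, `Range`, `Event` and `Segment` are hypothetical names, and the input uses only the last octet of the two overlapping ranges from the problem description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of the section 2.1 sweep on plain longs; all names here are illustrative. */
final class SplitSketch {

    /** An input range [start, end] (inclusive) with its id. */
    static final class Range {
        final long start, end;
        final String id;
        Range(long start, long end, String id) { this.start = start; this.end = end; this.id = id; }
    }

    /** One endpoint of a range; end == false marks a start. */
    static final class Event {
        final long value;
        final boolean end;
        final String id;
        Event(long value, boolean end, String id) { this.value = value; this.end = end; this.id = id; }
    }

    /** An output segment [start, end] and the ids of all ranges covering it. */
    static final class Segment {
        final long start, end;
        final Set<String> ids;
        Segment(long start, long end, Set<String> ids) { this.start = start; this.end = end; this.ids = ids; }
        @Override public String toString() { return "[" + start + "," + end + "] " + ids; }
    }

    static List<Segment> split(List<Range> ranges) {
        List<Event> events = new ArrayList<>();
        for (Range r : ranges) {
            events.add(new Event(r.start, false, r.id));
            events.add(new Event(r.end, true, r.id));
        }
        // Sort by value; on ties a start comes before an end
        events.sort((x, y) -> {
            int byValue = Long.compare(x.value, y.value);
            return byValue != 0 ? byValue : Boolean.compare(x.end, y.end);
        });
        Set<String> active = new TreeSet<>();   // the set S from the text
        List<Segment> out = new ArrayList<>();
        for (int i = 0; i + 1 < events.size(); i++) {
            Event n1 = events.get(i);
            Event n2 = events.get(i + 1);
            long a;
            if (n1.end) {
                a = n1.value + 1;
                active.remove(n1.id);
            } else {
                a = n1.value;
                active.add(n1.id);
            }
            long b = n2.end ? n2.value : n2.value - 1;
            if (a <= b && !active.isEmpty()) {
                out.add(new Segment(a, b, new TreeSet<>(active)));   // snapshot of S
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Range> ranges = new ArrayList<>();
        ranges.add(new Range(100, 120, "id1"));
        ranges.add(new Range(50, 150, "id2"));
        for (Segment s : split(ranges)) {
            System.out.println(s);
        }
    }
}
```

For this input the sweep produces three segments: [50,99] covered by id2 only, [100,120] covered by id1 and id2, and [121,150] covered by id2 only.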
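2.2 Binary search
Because the ranges have been split into disjoint segments, they can simply be ordered from smallest to largest to form a sorted array. Binary search then finds the segment an IP belongs to; if that segment maps to several entries, the hash is used to pick the right one. With 1,000,000 IP ranges, binary search needs at most about log2(1,000,000), roughly 20, comparisons in the worst case, whereas a sequential scan needs up to 1,000,000.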
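The figure of about 20 can be checked by counting how often a search space of n elements can be halved. This is a small sketch with a hypothetical helper name, `worstCaseProbes`:

```java
/** Sketch: worst-case number of probes a classic binary search makes over n sorted elements. */
final class ComparisonCountSketch {

    static int worstCaseProbes(int n) {
        int probes = 0;
        while (n > 0) {
            probes++;
            n /= 2;   // after a miss, at most floor(n / 2) candidates remain
        }
        return probes;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseProbes(1_000_000));   // prints 20
    }
}
```

Each probe may involve two comparisons (against the segment's end and its start), but the number of probes still grows only logarithmically with the number of segments.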
The core binary search code is as follows:
public IpRangeAsset search(long ipint,String devHash,List<OutputNode> assetList){
    int low = 0;
    int high = assetList.size()-1;
    IpRangeAsset ipRangeAsset = null;
    while(low <= high){
        int index = low + (high - low) / 2;
        OutputNode node = assetList.get(index);
        if(ipint > node.getEnd()){
            // The IP lies to the right of this segment
            low = index + 1;
        }else if(ipint < node.getStart()){
            // The IP lies to the left of this segment
            high = index - 1;
        }else{
            // The IP falls inside this segment
            Set<IpRangeAsset> candidates = node.getAssetIndexSet();
            if(candidates.size() == 1){
                ipRangeAsset = candidates.iterator().next();
            }else{
                // Several assets cover this segment: pick the one whose hash matches
                for(IpRangeAsset entry : candidates){
                    if(entry.getDevHash().equals(devHash)){
                        ipRangeAsset = entry;
                        break;
                    }
                }
            }
            break;
        }
    }
    return ipRangeAsset;
}
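As an alternative to a hand-written loop, the same lookup over disjoint segments can be expressed with `java.util.TreeMap.floorEntry`, which returns the entry with the greatest key less than or equal to the query. This swaps the sorted array for a tree, so it is a different technique from the one the article uses; `FloorLookupSketch` and its members are hypothetical names, and the ids are kept as a plain string for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: lookup over disjoint segments keyed by their start value. */
final class FloorLookupSketch {

    /** A disjoint segment [start, end] with the ids that cover it. */
    static final class Segment {
        final long start, end;
        final String ids;
        Segment(long start, long end, String ids) { this.start = start; this.end = end; this.ids = ids; }
    }

    private final TreeMap<Long, Segment> byStart = new TreeMap<>();

    void add(long start, long end, String ids) {
        byStart.put(start, new Segment(start, end, ids));
    }

    /** Returns the ids of the segment containing ip, or null when no segment covers it. */
    String find(long ip) {
        // Greatest segment start that is <= ip; null if ip is below every segment
        Map.Entry<Long, Segment> entry = byStart.floorEntry(ip);
        if (entry == null || ip > entry.getValue().end) {
            return null;   // ip falls before the first segment or into a gap
        }
        return entry.getValue().ids;
    }

    public static void main(String[] args) {
        FloorLookupSketch lookup = new FloorLookupSketch();
        lookup.add(50, 99, "id2");
        lookup.add(100, 120, "id1,id2");
        lookup.add(121, 150, "id2");
        System.out.println(lookup.find(110));
        System.out.println(lookup.find(200));
    }
}
```

The lookup cost is logarithmic in both cases; the array version is more compact in memory, while the tree version is convenient if segments are added after the initial build.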
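3. Performance comparison
With 1,000,000 IPs and IP ranges loaded, I compared the time taken to look up 50,000 IPs. The results are as follows:
Binary search took 27 ms, while the plain sequential search took 35974 ms. That is a difference of roughly three orders of magnitude, which is quite significant.
The complete code is attached below: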
IpRangeAsset.java
package com.formatengine.asset.v4;
import com.alibaba.fastjson.JSONObject;
import net.ripe.commons.ip.Ipv4;
import net.ripe.commons.ip.Ipv4Range;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
/**
* Created by dell on 2019/5/17.
* Class representing host-type assets and range-type assets
*/
public class IpRangeAsset implements Serializable{
private String startIp;
private String endIp;
private String devHash;
private Long startIpInt;
private Long endIpInt;
// Holds enrichment data such as geolocation and asset id
private Map additional = new HashMap();
public IpRangeAsset(String ip,String devHash,Map additional){
if(ip.contains("/")){
String[] tempIps = ip.split("/");
this.startIp = Ipv4.of(tempIps[0]).lowerBoundForPrefix(Integer.valueOf(tempIps[1])).toString();
Ipv4Range ipv4Range = Ipv4Range.parse(startIp+"/"+tempIps[1]);
this.endIp = ipv4Range.end().toString();
this.startIpInt = ipv4Range.start().asBigInteger().longValue();
this.endIpInt = ipv4Range.end().asBigInteger().longValue();
}else if(ip.contains("-")){
Ipv4Range ipv4Range = Ipv4Range.parse(ip);
this.startIp = ipv4Range.start().toString();
this.startIpInt = ipv4Range.start().asBigInteger().longValue();
this.endIp = ipv4Range.end().toString();
this.endIpInt = ipv4Range.end().asBigInteger().longValue();
}else{
// Single IP: start and end are the same address
this.startIp = ip;
this.endIp = ip;
startIpInt = Ipv4.of(ip).asBigInteger().longValue();
endIpInt = startIpInt;
}
this.devHash = devHash;
this.additional = additional;
}
public String getStartIp() {
return startIp;
}
public void setStartIp(String startIp) {
this.startIp = startIp;
}
public String getDevHash() {
return devHash;
}
public void setDevHash(String devHash) {
this.devHash = devHash;
}
public String getEndIp() {
return endIp;
}
public void setEndIp(String endIp) {
this.endIp = endIp;
}
public Long getStartIpInt() {
return startIpInt;
}
public void setStartIpInt(Long startIpInt) {
this.startIpInt = startIpInt;
}
public Long getEndIpInt() {
return endIpInt;
}
public void setEndIpInt(Long endIpInt) {
this.endIpInt = endIpInt;
}
public Map getAdditional() {
return additional;
}
public void setAdditional(Map additional) {
this.additional = additional;
}
public String toString(){
return JSONObject.toJSONString(this);
}
}
OutputNode.java
package com.formatengine.asset.v4;
import com.alibaba.fastjson.JSONObject;
import net.ripe.commons.ip.Ipv4;
import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;
/**
* Created by dell on 2019/5/20.
* Model used as the output of asset range splitting
*/
public class OutputNode implements Serializable{
private long start;
private long end;
private String startIp;
private String endIp;
private Set<IpRangeAsset> indexAssetSet = null;
public OutputNode(long start, long end) {
this.start = start;
this.end = end;
this.indexAssetSet = new HashSet<IpRangeAsset>();
if(this.start != -1){
this.startIp = Ipv4.of(start).toString();
}
if(this.end != -1){
this.endIp = Ipv4.of(end).toString();
}
}
public long getStart() {
return start;
}
public void setStart(long start) {
this.start = start;
this.startIp = Ipv4.of(start).toString();
}
public long getEnd() {
return end;
}
public void setEnd(long end) {
this.end = end;
this.endIp = Ipv4.of(end).toString();
}
public Set<IpRangeAsset> getAssetIndexSet() {
return indexAssetSet;
}
public void setAssetIndexSet(Set<IpRangeAsset> indexSet) {
this.indexAssetSet = indexSet;
}
public void removeAssetIndex(IpRangeAsset val){
this.indexAssetSet.remove(val);
}
public void addAssetIndex(IpRangeAsset val){
this.indexAssetSet.add(val);
}
public String getStartIp() {
return startIp;
}
public String getEndIp() {
return endIp;
}
public String toString(){
return JSONObject.toJSONString(this);
}
}
GapRangeNode.java
package com.formatengine.asset.v4;
import java.io.Serializable;
/**
* Created by dell on 2019/5/20.
* Model used while splitting asset ranges (one endpoint of a range)
*/
public class GapRangeNode implements Serializable{
private long number;
private int index;
private boolean end;
public GapRangeNode(long number, int index, boolean end) {
this.number = number;
this.index = index;
this.end = end;
}
public long getNumber() {
return number;
}
public void setNumber(long number) {
this.number = number;
}
public int getIndex() {
return index;
}
public void setIndex(int index) {
this.index = index;
}
public boolean isEnd() {
return end;
}
public void setEnd(boolean end) {
this.end = end;
}
}
BinarySearch.java
package com.formatengine.asset.v4;
import com.formatengine.asset.util.AssetCategory;
import com.formatengine.asset.util.AssetInfo;
import net.ripe.commons.ip.Ipv4;
import java.io.Serializable;
import java.util.*;
/**
* Created by dell on 2019/5/17.
* Binary search class
*/
public class BinarySearch implements Serializable{
public List<OutputNode> init(List<IpRangeAsset> list){
if(list == null){
return new ArrayList<OutputNode>();
}
List<GapRangeNode> gapRangeNodeList = new ArrayList<>();
for(int i=0;i<list.size();i++){
gapRangeNodeList.add(new GapRangeNode(list.get(i).getStartIpInt(),i,false));
gapRangeNodeList.add(new GapRangeNode(list.get(i).getEndIpInt(),i,true));
}
Collections.sort(gapRangeNodeList, new Comparator<GapRangeNode>() {
@Override
public int compare(GapRangeNode o1, GapRangeNode o2) {
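// Sort by value; on ties a start sorts before an end. Returning 0 for equal endpoints keeps the comparator consistent.
int byNumber = Long.compare(o1.getNumber(), o2.getNumber());
return byNumber != 0 ? byNumber : Boolean.compare(o1.isEnd(), o2.isEnd());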
}
});
OutputNode outputNode = new OutputNode(-1,-1);
List<OutputNode> resultList = new ArrayList<OutputNode>();
for(int i=0;i<gapRangeNodeList.size()-1;i++){
GapRangeNode n1 = gapRangeNodeList.get(i);
GapRangeNode n2 = gapRangeNodeList.get(i+1);
long n = -1;
long m = -1;
if(n1.isEnd()){
n = n1.getNumber()+1;
outputNode.removeAssetIndex(list.get(n1.getIndex()));
}else{
n = n1.getNumber();
outputNode.addAssetIndex(list.get(n1.getIndex()));
}
if(n2.isEnd()){
m = n2.getNumber();
}else{
m = n2.getNumber() - 1;
}
if(n <= m && outputNode.getAssetIndexSet().size() > 0){
OutputNode copyNode = new OutputNode(n,m);
Iterator<IpRangeAsset> iterator = outputNode.getAssetIndexSet().iterator();
while(iterator.hasNext()){
copyNode.addAssetIndex(iterator.next());
}
resultList.add(copyNode);
}
}
return resultList;
}
public IpRangeAsset search(long ipint, List<OutputNode> assetList){
int low = 0;
int high = assetList.size()-1;
IpRangeAsset ipRangeAsset = null;
while(low <= high){
int index = (low+high)/2;
if(ipint > assetList.get(index).getEnd()){
low = index + 1;
}else if(ipint < assetList.get(index).getStart()){
high = index - 1;
}else{
if(assetList.get(index).getAssetIndexSet().size() == 1){
ipRangeAsset = assetList.get(index).getAssetIndexSet().iterator().next();
}else{
ipRangeAsset = assetsUUIDMin(new ArrayList<>(assetList.get(index).getAssetIndexSet()));
}
break;
}
}
return ipRangeAsset;
}
public IpRangeAsset search(long ipint,String devHash,List<OutputNode> assetList){
int low = 0;
int high = assetList.size()-1;
IpRangeAsset ipRangeAsset = null;
while(low <= high){
int index = (low+high)/2;
if(ipint > assetList.get(index).getEnd()){
low = index + 1;
}else if(ipint < assetList.get(index).getStart()){
high = index - 1;
}else{
if(assetList.get(index).getAssetIndexSet().size() == 1){
ipRangeAsset = assetList.get(index).getAssetIndexSet().iterator().next();
}else{
Iterator iterator = assetList.get(index).getAssetIndexSet().iterator();
List<IpRangeAsset> matchAssets = new ArrayList<>();
while(iterator.hasNext()){
IpRangeAsset entry = (IpRangeAsset) iterator.next();
// When the asset has several DeviceHash values (separated by "|"): dev_hash must be non-empty and contained in DeviceHash
if (entry.getDevHash().contains("|") && !devHash.equals("") && entry.getDevHash().contains(devHash)){
matchAssets.add(entry);
}else if (!entry.getDevHash().contains("|") && entry.getDevHash().equals(devHash)){
matchAssets.add(entry);
}
}
if(matchAssets.size() == 0){
// Several assets matched the IP, but none matched dev_hash
return null;
}else if(matchAssets.size() == 1){
// Take the single asset that matched both IP and dev_hash
ipRangeAsset = matchAssets.get(0);
}else{
// Among the assets that matched both IP and dev_hash, take the one with min(uuid)
ipRangeAsset = assetsUUIDMin(matchAssets);
}
}
break;
}
}
return ipRangeAsset;
}
public IpRangeAsset assetsUUIDMin(List<IpRangeAsset> assets){
IpRangeAsset ipRangeAsset = null;
try{
if(assets == null || assets.isEmpty()){
return null;
}
// Pick an initial value; an asset's uuid can never be empty
String uuid_min = assets.get(0).getAdditional().getAssetUUID();
int min_index = 0;
for(int i=0;i<assets.size();i++){
if(assets.get(i).getAdditional() != null && assets.get(i).getAdditional().getAssetUUID() != null){
int ret = assets.get(i).getAdditional().getAssetUUID().compareTo(uuid_min);
if(ret < 0){
uuid_min = assets.get(i).getAdditional().getAssetUUID();
min_index = i;
}
}
}
return assets.get(min_index);
}catch (Exception e){
e.printStackTrace();
}
return ipRangeAsset;
}
public static void main(String[] args){
List<IpRangeAsset> list = new ArrayList<IpRangeAsset>();
AssetInfo assetInfo1 = new AssetInfo(
"4df585d68cc011e998b4001999db5b24", AssetCategory.IPv4.toString(),"192.168.1.100","asset name 1",
"FFF-FFF-FFF-FFE",null,"",1);
AssetInfo assetInfo2 = new AssetInfo(
"4df585d68cc011e998b4001999db5b23", AssetCategory.IPV4RANGE.toString(),"192.168.1.100-192.168.1.120","asset name 2",
"FFF-FFF-FFF-FFF",null,"",1);
list.add(new IpRangeAsset(assetInfo1.getAssetLabel(),assetInfo1.getDeviceHash(),assetInfo1));
list.add(new IpRangeAsset(assetInfo2.getAssetLabel(),assetInfo2.getDeviceHash(),assetInfo2));
BinarySearch binarySearch = new BinarySearch();
List<OutputNode> assetList = binarySearch.init(list);
String ip1 = "192.168.1.100";
IpRangeAsset ipRangeAsset1 = binarySearch.search(Ipv4.of(ip1).asBigInteger().longValue(),"FFF-FFF-FFF-FFE",assetList);
if(ipRangeAsset1 == null){
System.out.println(" no asset matched ");
}else{
System.out.println(ipRangeAsset1.toString());
}
String ip2 = "192.168.1.101";
IpRangeAsset ipRangeAsset2 = binarySearch.search(Ipv4.of(ip2).asBigInteger().longValue(),assetList);
if(ipRangeAsset2 == null){
System.out.println(" no asset matched ");
}else{
System.out.println(ipRangeAsset2.toString());
}
}
}
Dependency used for converting between IPs and big integers:
<dependency>
    <groupId>net.ripe.commons</groupId>
    <artifactId>commons-ip-math</artifactId>
    <version>1.23</version>
</dependency>