How to implement image search: text-to-image and image-to-image search with CLIP and a Faiss vector index
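This is the era of AIGC: GPT-style large models generate text, multimodal models work with text and images together,
and new image and video generation models such as Stable Diffusion and Stable Video Diffusion keep appearing.
So how do you produce an article that combines text and images well, decide which paragraphs should get an image, and obtain those images?
Images can be retrieved by search, or generated with Stable Diffusion.
Today I'll cover retrieval by search, which is more efficient, uses less compute, and is easier to put into production.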
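The CLIP model
CLIP maps images and text into a shared embedding space, which is what makes it possible to search image vectors with a text query. For details, see this Zhihu article:
https://zhuanlan.zhihu.com/p/511460120
or the paper: https://arxiv.org/pdf/2103.00020.pdf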
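What is Faiss?
Faiss stands for Facebook AI Similarity Search. It is a library developed by Facebook's AI team for large-scale similarity search, written in C++ with a Python interface, and it can reach millisecond-level search on indexes at the billion-vector scale.
Put simply, Faiss wraps our own set of candidate vectors into an index that speeds up retrieval of the top-K most similar vectors. Some index types can also be built on the GPU, which makes it faster still.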
https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/
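To make the top-K idea concrete, here is a minimal NumPy sketch of what an exact (flat) L2 index computes: the squared Euclidean distance from each query to every stored vector, followed by picking the K smallest. It is a stand-in for illustration, not Faiss itself; the function name brute_force_topk and the toy data are made up for this example.

```python
import numpy as np

def brute_force_topk(database, queries, k):
    """Return (distances, indices) of the k nearest database rows per query.

    Distances are squared L2, sorted in ascending order.
    """
    # Pairwise squared distances, shape (num_queries, num_database)
    diff = queries[:, None, :] - database[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Indices of the k smallest distances for each query, nearest first
    idx = np.argsort(dist, axis=1)[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx

# Toy data: four 2-D vectors standing in for image embeddings
database = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]], dtype=np.float32)
queries = np.array([[0.9, 0.1]], dtype=np.float32)

D, I = brute_force_topk(database, queries, k=2)
print(I)  # row ids of the nearest vectors, nearest first
print(D)  # their squared L2 distances
```

A flat Faiss index returns the same kind of (distances, ids) pair, computed by optimized C++ code; Faiss also offers approximate index types for collections too large to scan exhaustively.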
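1. Download a CLIP model from Hugging Face. The default model is English; a Chinese version is also available. The English version tends to give somewhat better results.
English version (the commented-out import in the later scripts suggests this was saved as clip_model.py)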
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# url = "http://images.cocodataset.org/val2017/000000039769.jpg"
# image = Image.open(requests.get(url, stream=True).raw)
# inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
# image_features = model.get_image_features(inputs["pixel_values"])
# text_features = model.get_text_features(inputs["input_ids"],inputs["attention_mask"])
# outputs = model(**inputs)
# logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
# probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
# print(probs)
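Chinese version (the later scripts import model and processor from a module named chinese_clip, so this was presumably saved as chinese_clip.py)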
from PIL import Image
import requests
from transformers import ChineseCLIPProcessor, ChineseCLIPModel
import torch

device = torch.device("mps")
model = ChineseCLIPModel.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")
processor = ChineseCLIPProcessor.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")

# url = "https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/pokemon.jpeg"
# image = Image.open(requests.get(url, stream=True).raw)
# Squirtle, Bulbasaur, Charmander, Pikachu in English
# texts = ["傑尼龜", "妙蛙種子", "小火龍", "皮卡丘"]

# # compute image feature
# inputs = processor(images=image, return_tensors="pt")
# image_features = model.get_image_features(**inputs)
# image_features = image_features / image_features.norm(p=2, dim=-1, keepdim=True)  # normalize

# # compute text features
# inputs = processor(text=texts, padding=True, return_tensors="pt")
# text_features = model.get_text_features(**inputs)
# text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)  # normalize

# # compute image-text similarity scores
# inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
# outputs = model(**inputs)
# logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
# probs = logits_per_image.softmax(dim=1)  # probs: [[1.2686e-03, 5.4499e-02, 6.7968e-04, 9.4355e-01]]
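2. Crawl some images to build an image library; every search runs against this library. What you crawl should match your use case:
if you want animal pictures, for example, crawl mostly animal images. These are some images I downloaded at random.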
3. Encode each image as a vector and store the vectors in a Faiss index
# from clip_model import model, processor
import faiss
from PIL import Image
import os
import json
from chinese_clip import model, processor
from tqdm import tqdm

d = 512  # embedding dimension of the CLIP model
index = faiss.IndexFlatL2(d)  # use L2 distance

# Folder path
# folder_path = '/Users/smzdm/Downloads/Animals_with_Attributes2 2/JPEGImages'
folder_path = "image"

# Walk the folder and collect image paths
file_paths = []
for root, dirs, files in os.walk(folder_path):
    for file in files:
        # Check whether the file is an image (a simple check of the file extension)
        if file.lower().endswith(('.png', '.jpg', '.jpeg', '.gif')):
            file_path = os.path.join(root, file)
            file_paths.append(file_path)

# Row i of the index corresponds to file_paths[i]
id2filename = {idx: x for idx, x in enumerate(file_paths)}
# Save the mapping as a JSON file
with open('id2filename.json', 'w') as json_file:
    json.dump(id2filename, json_file)

for file_path in tqdm(file_paths, total=len(file_paths)):
    # Open the image with PIL
    image = Image.open(file_path)
    inputs = processor(images=image, return_tensors="pt", padding=True)
    image_features = model.get_image_features(inputs["pixel_values"])
    image_features = image_features / image_features.norm(p=2, dim=-1, keepdim=True)  # normalize
    image_features = image_features.detach().numpy()
    index.add(image_features)
    # Close the image to release resources
    image.close()

faiss.write_index(index, "image.faiss")
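One detail in the script above is easy to miss: JSON object keys are always strings, so the integer ids written by json.dump come back as strings when the file is loaded again. The short sketch below shows the round trip with made-up file names (placeholders, not files from the actual image folder); this is why a lookup after loading has to convert the id with str().

```python
import json

# Hypothetical file names, for illustration only
id2filename = {idx: name for idx, name in enumerate(["image/a.jpg", "image/b.jpg"])}

# Serialize and parse again, as happens when the mapping is saved and reloaded
restored = json.loads(json.dumps(id2filename))

print(list(restored.keys()))    # the keys are now strings
faiss_id = 1                    # a search returns integer row ids
print(restored[str(faiss_id)])  # so convert the id before the lookup
```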
4. Load the saved index and the id-to-filename mapping, then search for images by text or by image
# from clip_model import model, processor
import faiss
from PIL import Image
import json
from chinese_clip import model, processor

# Load the id -> filename mapping from the JSON file
with open('id2filename.json', 'r') as json_file:
    id2filename = json.load(json_file)
# Load the index built in the previous step
index = faiss.read_index("image.faiss")

def text_search(text, k=1):
    inputs = processor(text=text, images=None, return_tensors="pt", padding=True)
    text_features = model.get_text_features(inputs["input_ids"], inputs["attention_mask"])
    text_features = text_features / text_features.norm(p=2, dim=-1, keepdim=True)  # normalize
    text_features = text_features.detach().numpy()
    D, I = index.search(text_features, k)  # the actual query
    # JSON keys are strings, so convert each id before the lookup
    filenames = [[id2filename[str(j)] for j in i] for i in I]
    return text, D, filenames

def image_search(img_path, k=1):
    image = Image.open(img_path)
    inputs = processor(images=image, return_tensors="pt")
    image_features = model.get_image_features(**inputs)
    image_features = image_features / image_features.norm(p=2, dim=-1, keepdim=True)  # normalize
    image_features = image_features.detach().numpy()
    D, I = index.search(image_features, k)  # the actual query
    filenames = [[id2filename[str(j)] for j in i] for i in I]
    return img_path, D, filenames

if __name__ == "__main__":
    text = ["雪山", "熊貓", "長城", "蘋果"]
    text, D, filenames = text_search(text)
    print(text, D, filenames)

    # img_path = "image/apple2.jpeg"
    # img_path, D, filenames = image_search(img_path, k=2)
    # print(img_path, D, filenames)
For example, a text search for "snow mountain", "panda", "Great Wall", and "apple":
["雪山","熊貓","長城","蘋果"]
Result:
['雪山', '熊貓', '長城', '蘋果'] [[1.2182312]
[1.1529984]
[1.1177421]
[1.1656866]] [['image/OIP (10).jpeg'], ['image/OIP.jpeg'], ['image/OIP (8).jpeg'], ['image/apple2.jpeg']]
You can also search by image: uncomment the image_search lines at the end of the script above.
Result:
image/apple2.jpeg [[0. 0.11877532]] [['image/apple2.jpeg', 'image/OIP (14).jpeg']]
The first result is the query image itself, an exact match with distance 0; the second is, as you can see, also an apple.
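The distances printed above can also be read as cosine similarities. IndexFlatL2 reports squared L2 distances, and every vector was normalized to unit length before being added or queried, so ||a - b||^2 = 2 - 2*cos(a, b), which gives cos = 1 - D/2. A minimal sketch applying that to the printed distances (the helper name l2sq_to_cosine is made up for this example):

```python
import numpy as np

def l2sq_to_cosine(distances):
    """Convert squared L2 distances between unit vectors to cosine similarity."""
    return 1.0 - np.asarray(distances, dtype=np.float64) / 2.0

# Distances copied from the text search and image search results above
text_D = [[1.2182312], [1.1529984], [1.1177421], [1.1656866]]
image_D = [[0.0, 0.11877532]]

text_cos = l2sq_to_cosine(text_D)
image_cos = l2sq_to_cosine(image_D)
print(text_cos.round(3))
print(image_cos.round(3))

# Sanity check of the identity on two random unit vectors
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 512))
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
assert np.isclose(((a - b) ** 2).sum(), 2 - 2 * a @ b)
```

Read this way, the best match for each text query has a cosine similarity of roughly 0.39 to 0.44, while the second apple image matches the query image at about 0.94.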