Ret-XKnow endows a text retriever with an understanding of multimodal queries, in the context of efficient information retrieval.
Web demo
- Download the pre-trained ColBERTv2 checkpoint to `ckpts` and unzip the downloaded file to `ckpts/colbertv2.0`. We employ ColBERTv2 as the text retriever baseline. (A setup sketch is given after this list.)
- Install Python packages: `pip3 install -r requirements.txt`
- Download downstream task datasets. We use two retrieval datasets curated from OK-VQA, as well as ReMuQ and A-OKVQA. You can download the datasets from the following links:
  - OK-VQA (Wiki-11M): download the annotation files to `data/okvqa`.
  - OK-VQA (Google Search): in this dataset, the questions in the annotation files include captions for the images, so we edit the questions to remove the captions. See `dataset/vqa_ret.py` for details.
- Download the instruction data and image datasets from the following pages:
  - Visual instruction dataset (here, download the images together with the dialogue dataset)
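A minimal setup sketch for the checkpoint and package steps above. The checkpoint URL is the one published by the ColBERT project and may change, so verify it against the official ColBERT repository; the archive format is assumed to be `.tar.gz`.

```bash
# Setup sketch: fetch the ColBERTv2 checkpoint (URL assumed from the ColBERT project)
# and install the Python dependencies.
mkdir -p ckpts
wget -P ckpts https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz
tar -xvzf ckpts/colbertv2.0.tar.gz -C ckpts   # yields ckpts/colbertv2.0
pip3 install -r requirements.txt
```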
- Pre-processing and neural filtering using a text retriever (you can skip the neural filtering step by modifying the code if you want to build the dataset quickly):

  ```bash
  python3 -m runs.neural_filtering --data_paths path_to_data1 path_to_data2 --colbert_ckpt [directory_with_colbert_checkpoint] --save_path [path_to_save]
  ```
- Converting responses to passages. We require a knowledge base (KB) and a text retriever to convert dialogues into retrieval tasks. We adopt 6M Wikipedia passages as the KB; you can download the passages from this link.

  ```bash
  python3 -m runs.convert_tasks --data_path [path to pre-processed data] --colbert_ckpt [directory with colbert checkpoint] --db_pool [path to KB] --save_path data/vid2r/ViD2R.json
  ```

  Please check `scripts/make_pretraining_data` as an example.
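As a concrete illustration of how the two steps chain together, the `--save_path` of the filtering step becomes the `--data_path` of the conversion step. The invocation below is hypothetical: every path is a placeholder, and `scripts/make_pretraining_data` remains the authoritative example.

```bash
# Hypothetical pipeline; all paths are placeholders, not files shipped with the repo.
FILTERED=data/vid2r/filtered_dialogues.json   # assumed intermediate file name

python3 -m runs.neural_filtering \
    --data_paths data/instructions/dialogues.json \
    --colbert_ckpt ckpts/colbertv2.0 \
    --save_path $FILTERED

python3 -m runs.convert_tasks \
    --data_path $FILTERED \
    --colbert_ckpt ckpts/colbertv2.0 \
    --db_pool data/kb/wiki_6m_passages.tsv \
    --save_path data/vid2r/ViD2R.json
```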
First, set up the configuration files!
### Pre-training Ret-XKnow on the ViD2R dataset
```bash
export WANDB_API_KEY=[Your_WANDB_KEY]
CONFIG_PATH=cfgs/xknow_train_vid2r.yaml
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC=8

# Cache visual embeddings
python3 -m runs.run_visual_embedder --model_name openai/clip-vit-base-patch32 --data_path data/vid2r/ViD2R.json --batch_size 512 --image_dir data/vid2r/images

# Run the training command
python3 -m torch.distributed.run --nproc_per_node=$NPROC train_retrieval.py --config_path "$CONFIG_PATH"
```
Alternatively, after modifying the shell script `scripts/pretrain_xknow_inbatch.sh` to match your paths, execute:

```bash
bash scripts/pretrain_xknow_inbatch.sh
```

This shell script evaluates zero-shot performance after training. If indexing has already been done, comment out the execution of `run_indexer`.
### Fine-tuning Ret-XKnow on downstream tasks
If you want to cache visual embeddings, set `image_cached` to `True` in the config file and execute `runs/run_visual_embedder.py` after slightly modifying the code; an example invocation is sketched below.
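A sketch of the caching command for a downstream dataset, mirroring the flags used in the pre-training step above; the annotation file and image directory below are placeholders for your own data.

```bash
# Placeholder paths; flags mirror the pre-training caching command above.
python3 -m runs.run_visual_embedder \
    --model_name openai/clip-vit-base-patch32 \
    --data_path data/okvqa/train_annotations.json \
    --batch_size 512 \
    --image_dir data/okvqa/images
```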
```bash
export WANDB_API_KEY=[Your_WANDB_KEY]
CONFIG_PATH=[Path_to_config_file]
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC=8

# Run the training command
python3 -m torch.distributed.run --nproc_per_node=$NPROC train_retrieval.py --config_path "$CONFIG_PATH"
```
### Indexing

If you do not pass `--xknow_ckpt`, the following command loads the ColBERTv2 checkpoint instead.

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m runs.run_indexer --exp_name [experiment_name] --n_bits [2|4|8] --dataset_name [okvqa|okvqa_gs|infoseek] --all_blocks_file [path to knowledge base] --xknow_ckpt $CHECKPOINT
```
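For example, to build a 2-bit index over the OK-VQA knowledge base with a fine-tuned checkpoint, the call might look as follows; the experiment name, checkpoint path, and knowledge-base file are placeholders.

```bash
# Placeholder values; adjust to your own experiment layout.
CHECKPOINT=ckpts/xknow_okvqa/best_model.pt   # assumed checkpoint location
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m runs.run_indexer \
    --exp_name okvqa_xknow \
    --n_bits 2 \
    --dataset_name okvqa \
    --all_blocks_file data/okvqa/all_blocks.json \
    --xknow_ckpt $CHECKPOINT
```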
### Evaluation

If you do not provide `--xknow_ckpt` and `--image_dir`, the following command uses the text retriever (ColBERTv2).

```bash
python3 -m runs.evaluate_retrieval \
    --dataset_name [okvqa|okvqa_gs|infoseek] \
    --index_name [experiment_name].nbits=[n_bits] \
    --save_path [path to save result file] \
    --all_blocks_file [path to knowledge base] \
    --anno_file [path to test file] \
    --xknow_ckpt $CHECKPOINT \
    --image_dir [directory to images]
```
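Continuing the hypothetical indexing example above, an evaluation call could look like the following; note that `--index_name` concatenates the experiment name with the `nbits` setting, and all paths are placeholders.

```bash
# Placeholder paths matching the hypothetical index built above.
python3 -m runs.evaluate_retrieval \
    --dataset_name okvqa \
    --index_name okvqa_xknow.nbits=2 \
    --save_path results/okvqa_eval.json \
    --all_blocks_file data/okvqa/all_blocks.json \
    --anno_file data/okvqa/test_annotations.json \
    --xknow_ckpt $CHECKPOINT \
    --image_dir data/okvqa/images
```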
### Settings

```bash
pip3 install flask flask_cors
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
nvm install --lts
```
### Web Server Start

```bash
cd searcher
npm install
npm start
```
### Search Engine Start

Before you start this engine, check the checkpoint path in `search_api.py`.

```bash
python3 search_api.py
```