
🦮 Ret-XKnow: End-to-End Retriever to eXpand visual Knowledge

Ret-XKnow endows a text retriever with the ability to understand multimodal queries in the context of efficient information retrieval.

Web demo

[demo GIF]

Settings

  1. Download the pre-trained ColBERTv2 checkpoint to ckpts and unzip the downloaded file to ckpts/colbertv2.0.

    We employ ColBERTv2 as the text retriever baseline; example download commands are shown after this list.

  2. Install python packages.

    pip3 install -r requirements.txt
  3. Download downstream task datasets.

    We use two retrieval datasets curated from OK-VQA (Wiki-11M and Google Search), as well as ReMuQ and A-OKVQA. You can download the datasets from the following links:

    • OK-VQA (Wiki-11M): Download the annotation files to data/okvqa.

    • OK-VQA (Google Search): In this dataset, the questions in the annotation files include image captions, so we edit the questions to remove the captions. See dataset/vqa_ret.py for details.

    • ReMuQ

    • A-OKVQA
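
As an example for step 1, the following commands download and extract the ColBERTv2 checkpoint (the URL below points to the checkpoint distributed with the official ColBERT repository; verify it there if it has moved):

mkdir -p ckpts
wget https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz -P ckpts
tar -xzf ckpts/colbertv2.0.tar.gz -C ckpts   # extracts to ckpts/colbertv2.0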

Visual Dialogue-to-Retrieval (ViD2R) Dataset Construction

  1. Download instruction data and image datasets from the following pages:

  2. Pre-processing and neural filtering using a text retriever:

    If you want to build the dataset quickly, you can skip the neural filtering step by modifying the code.

    python3 -m runs.neural_filtering --data_paths path_to_data1 path_to_data2 --colbert_ckpt [directory_with_colbert_checkpoint] --save_path [path_to_save]
  3. Converting responses to passages:

    We require a knowledge base (KB) and a text retriever to convert dialogues into retrieval tasks. We adopt 6M Wikipedia passages as the KB; you can download the passages from this link.

    python3 -m runs.convert_tasks --data_path [path to pre-processed data] --colbert_ckpt [directory with colbert checkpoint] --db_pool [path to KB] --save_path data/vid2r/ViD2R.json

Please see scripts/make_pretraining_data for an example; a rough sketch of the full pipeline follows below.
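
A rough sketch of how the two construction steps chain together (the script in the repository is authoritative; the instruction-data paths and KB file name below are illustrative):

COLBERT_CKPT=ckpts/colbertv2.0
# Step 2: pre-process and neurally filter the downloaded dialogue data
python3 -m runs.neural_filtering --data_paths data/instruct_data1.json data/instruct_data2.json --colbert_ckpt $COLBERT_CKPT --save_path data/vid2r/filtered.json
# Step 3: convert responses to passages using the 6M-passage Wikipedia KB
python3 -m runs.convert_tasks --data_path data/vid2r/filtered.json --colbert_ckpt $COLBERT_CKPT --db_pool data/vid2r/wiki_passages.tsv --save_path data/vid2r/ViD2R.json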

Training Ret-XKnow

First, set up the configuration files!

Pre-training Ret-XKnow on the ViD2R dataset

export WANDB_API_KEY=[Your_WANDB_KEY]
CONFIG_PATH=cfgs/xknow_train_vid2r.yaml

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC=8

# Caching visual embeddings
python3 -m runs.run_visual_embedder --model_name openai/clip-vit-base-patch32 --data_path data/vid2r/ViD2R.json --batch_size 512 --image_dir data/vid2r/images

# Run training command
python3 -m torch.distributed.run --nproc_per_node=$NPROC train_retrieval.py --config_path "$CONFIG_PATH"

Alternatively, after modifying the shell script scripts/pretrain_xknow_inbatch.sh to use your paths, execute the following command:

bash scripts/pretrain_xknow_inbatch.sh

This shell script also evaluates zero-shot performance after training. If indexing has already been done, comment out the run_indexer invocation.

Fine-tuning Ret-XKnow on the downstream task

If you want to cache visual embeddings, set image_cached to True in the config file and run runs/run_visual_embedder.py after slightly modifying the code (see the example command below).
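
A minimal caching command, mirroring the pre-training step (the annotation path and image directory below are illustrative, and runs/run_visual_embedder.py may need small edits for your downstream annotation format):

python3 -m runs.run_visual_embedder --model_name openai/clip-vit-base-patch32 --data_path data/okvqa/train.json --batch_size 512 --image_dir data/okvqa/images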

export WANDB_API_KEY=[Your_WANDB_KEY]
CONFIG_PATH=[Path_to_config_file]

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC=8

# Run training command
python3 -m torch.distributed.run --nproc_per_node=$NPROC train_retrieval.py --config_path "$CONFIG_PATH"

Pre-Indexing

If you do not pass --xknow_ckpt, the following command loads the ColBERTv2 checkpoint instead.

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m runs.run_indexer --exp_name [experiment_name] --n_bits [2|4|8] --dataset_name [okvqa|okvqa_gs|infoseek] --all_blocks_file [path to knowledge base] --xknow_ckpt $CHECKPOINT
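
For example, to build a 2-bit index over the OK-VQA knowledge base (the experiment name, KB path, and checkpoint path below are illustrative):

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m runs.run_indexer --exp_name xknow_okvqa --n_bits 2 --dataset_name okvqa --all_blocks_file data/okvqa/all_blocks.tsv --xknow_ckpt ckpts/xknow_vid2r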

Evaluation

If you do not provide --xknow_ckpt and --image_dir, the following command uses the text retriever (ColBERTv2) instead.

python3 -m runs.evaluate_retrieval \
    --dataset_name [okvqa|okvqa_gs|infoseek] \
    --index_name [experiment_name].nbits=[n_bits] \
    --save_path [path to save result file] \
    --all_blocks_file [path to knowledge base] \
    --anno_file [path to test file] \
    --xknow_ckpt $CHECKPOINT \
    --image_dir [directory to images]
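
A filled-in example that evaluates against the index built in the pre-indexing step (the concrete paths are illustrative):

python3 -m runs.evaluate_retrieval \
    --dataset_name okvqa \
    --index_name xknow_okvqa.nbits=2 \
    --save_path results/okvqa_xknow.json \
    --all_blocks_file data/okvqa/all_blocks.tsv \
    --anno_file data/okvqa/test_anno.json \
    --xknow_ckpt ckpts/xknow_vid2r \
    --image_dir data/okvqa/images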

Web Demo

Settings

pip3 install flask flask_cors
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
nvm install --lts

Web Server Start

cd searcher
npm install
npm start

Search Engine Start

Before starting the engine, check the checkpoint path in search_api.py.

python3 search_api.py

About

The code for 'Enhancing Multimodal Query Representation via Visual Dialogues for End-to-End Knowledge Retrieval'
