Ret-XKnow endows a text retriever with an understanding of multimodal queries, in the context of efficient information retrieval.
Web demo
- Download the pre-trained ColBERTv2 checkpoint to `ckpts` and unzip the downloaded file to `ckpts/colbertv2.0`. We employ ColBERTv2 as the text retriever baseline. (A setup sketch is given after this list.)
- Install Python packages: `pip3 install -r requirements.txt`
- Download downstream task datasets. We use two retrieval datasets curated from OK-VQA, as well as ReMuQ and A-OKVQA. You can download the datasets from the following links:
  - OK-VQA (Wiki-11M): download the annotation files to `data/okvqa`.
  - OK-VQA (Google Search): in this dataset, the questions in the annotation files include captions for the images, so we edit the questions to remove the captions. See `dataset/vqa_ret.py` for details.
- Download the instruction data and image datasets from the following pages:
  - Visual instruction dataset (here, download the images together with the dialogue dataset)
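A minimal setup sketch for the checkpoint and package steps above. The checkpoint URL is the one published by the ColBERT project and may change, so verify it against the official ColBERT repository; the archive format is assumed to be `.tar.gz`.

```bash
# Setup sketch: fetch the ColBERTv2 checkpoint (URL assumed from the ColBERT project)
# and install the Python dependencies.
mkdir -p ckpts
wget -P ckpts https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz
tar -xvzf ckpts/colbertv2.0.tar.gz -C ckpts   # yields ckpts/colbertv2.0
pip3 install -r requirements.txt
```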
- Pre-processing and neural filtering using a text retriever (you can skip the neural filtering step by modifying the code if you want to build the dataset quickly):

  ```bash
  python3 -m runs.neural_filtering --data_paths path_to_data1 path_to_data2 --colbert_ckpt [directory_with_colbert_checkpoint] --save_path [path_to_save]
  ```
- Converting responses to passages. We require a knowledge base (KB) and a text retriever to convert dialogues into retrieval tasks. We adopt 6M Wikipedia passages as the KB; you can download the passages from this link.

  ```bash
  python3 -m runs.convert_tasks --data_path [path to pre-processed data] --colbert_ckpt [directory with colbert checkpoint] --db_pool [path to KB] --save_path data/vid2r/ViD2R.json
  ```

  Please check `scripts/make_pretraining_data` as an example.
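As a concrete illustration of how the two steps chain together, the `--save_path` of the filtering step becomes the `--data_path` of the conversion step. The invocation below is hypothetical: every path is a placeholder, and `scripts/make_pretraining_data` remains the authoritative example.

```bash
# Hypothetical pipeline; all paths are placeholders, not files shipped with the repo.
FILTERED=data/vid2r/filtered_dialogues.json   # assumed intermediate file name

python3 -m runs.neural_filtering \
    --data_paths data/instructions/dialogues.json \
    --colbert_ckpt ckpts/colbertv2.0 \
    --save_path $FILTERED

python3 -m runs.convert_tasks \
    --data_path $FILTERED \
    --colbert_ckpt ckpts/colbertv2.0 \
    --db_pool data/kb/wiki_6m_passages.tsv \
    --save_path data/vid2r/ViD2R.json
```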
First, set up the configuration files!
### Pre-training Ret-XKnow on the ViD2R dataset
```bash
export WANDB_API_KEY=[Your_WANDB_KEY]
CONFIG_PATH=cfgs/xknow_train_vid2r.yaml
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC=8

# Cache visual embeddings
python3 -m runs.run_visual_embedder --model_name openai/clip-vit-base-patch32 --data_path data/vid2r/ViD2R.json --batch_size 512 --image_dir data/vid2r/images

# Run the training command
python3 -m torch.distributed.run --nproc_per_node=$NPROC train_retrieval.py --config_path "$CONFIG_PATH"
```
Alternatively, after modifying the shell script `scripts/pretrain_xknow_inbatch.sh` to match your paths, execute:

```bash
bash scripts/pretrain_xknow_inbatch.sh
```

This shell script evaluates zero-shot performance after training. If indexing has already been done, comment out the execution of `run_indexer`.
### Fine-tuning Ret-XKnow on downstream tasks
If you want to cache visual embeddings, set `image_cached` to `True` in the config file and execute `runs/run_visual_embedder.py` after slightly modifying the code; an example invocation is sketched below.
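A sketch of the caching command for a downstream dataset, mirroring the flags used in the pre-training step above; the annotation file and image directory below are placeholders for your own data.

```bash
# Placeholder paths; flags mirror the pre-training caching command above.
python3 -m runs.run_visual_embedder \
    --model_name openai/clip-vit-base-patch32 \
    --data_path data/okvqa/train_annotations.json \
    --batch_size 512 \
    --image_dir data/okvqa/images
```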
```bash
export WANDB_API_KEY=[Your_WANDB_KEY]
CONFIG_PATH=[Path_to_config_file]
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
NPROC=8

# Run the training command
python3 -m torch.distributed.run --nproc_per_node=$NPROC train_retrieval.py --config_path "$CONFIG_PATH"
```
### Indexing

If you do not pass `--xknow_ckpt`, the following command loads the ColBERTv2 checkpoint instead.

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m runs.run_indexer --exp_name [experiment_name] --n_bits [2|4|8] --dataset_name [okvqa|okvqa_gs|infoseek] --all_blocks_file [path to knowledge base] --xknow_ckpt $CHECKPOINT
```
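For example, to build a 2-bit index over the OK-VQA knowledge base with a fine-tuned checkpoint, the call might look as follows; the experiment name, checkpoint path, and knowledge-base file are placeholders.

```bash
# Placeholder values; adjust to your own experiment layout.
CHECKPOINT=ckpts/xknow_okvqa/best_model.pt   # assumed checkpoint location
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m runs.run_indexer \
    --exp_name okvqa_xknow \
    --n_bits 2 \
    --dataset_name okvqa \
    --all_blocks_file data/okvqa/all_blocks.json \
    --xknow_ckpt $CHECKPOINT
```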
### Evaluation

If you do not provide `--xknow_ckpt` and `--image_dir`, the following command uses the text retriever (ColBERTv2).

```bash
python3 -m runs.evaluate_retrieval \
    --dataset_name [okvqa|okvqa_gs|infoseek] \
    --index_name [experiment_name].nbits=[n_bits] \
    --save_path [path to save result file] \
    --all_blocks_file [path to knowledge base] \
    --anno_file [path to test file] \
    --xknow_ckpt $CHECKPOINT \
    --image_dir [directory to images]
```
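Continuing the hypothetical indexing example above, an evaluation call could look like the following; note that `--index_name` concatenates the experiment name with the `nbits` setting, and all paths are placeholders.

```bash
# Placeholder paths matching the hypothetical index built above.
python3 -m runs.evaluate_retrieval \
    --dataset_name okvqa \
    --index_name okvqa_xknow.nbits=2 \
    --save_path results/okvqa_eval.json \
    --all_blocks_file data/okvqa/all_blocks.json \
    --anno_file data/okvqa/test_annotations.json \
    --xknow_ckpt $CHECKPOINT \
    --image_dir data/okvqa/images
```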
### Settings

```bash
pip3 install flask flask_cors
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
nvm install --lts
```
### Web Server Start

```bash
cd searcher
npm install
npm start
```
### Search Engine Start

Before you start this engine, check the checkpoint path in `search_api.py`.

```bash
python3 search_api.py
```