Reproducing the VCT_AVS Paper

Reproduction Environment

Hardware

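This project is based on COMBO-AVS. Following its instructions, the reproduction essentially only works on macOS or Linux ("Linux or macOS with Python ≥ 3.6"). My attempt on my local Windows machine failed: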

# After this step (run inside detectron2/)
pip install -e .
# the following error appears:
OSError: [WinError 182] The operating system cannot run %1. Error loading "D:\Cache\PythonLib\envs\vct_avs\lib\site-packages\torch\lib\shm.dll" or one of its dependencies.


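So I switched to the lab server for the reproduction, connecting to it over SSH from Trae CN (a VSCode-like editor). I have already configured the Jupyter Notebook environment on the lab server (75), so you can connect and use it directly.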

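See the lab's server-connection notes for details. This method only gives you notebooks: the terminal is unavailable, so you cannot run .py files on it directly. On the upside, you are effectively operating the remote machine directly with no manual syncing, which also makes it a handy remote code editor.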

OS: Ubuntu 18.04
CPU: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz
GPU: RTX 3090 × 8
Memory: 791224272 kB ≈ 754.6 GiB

Environment Setup

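Following the official project's README caused problems. The PyTorch that got installed was the CPU build. After switching to the CUDA build of PyTorch 1.13.1, it was incompatible with the numpy version that had been installed. Since numba 0.58.1 also requires 1.22 <= numpy < 1.27, I ended up using numpy 1.22.4.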

The adjusted setup is as follows:

git clone https://github.com/spyflying/VCT_AVS.git
cd VCT_AVS
conda create -n vct_avs python==3.8 -y
conda activate vct_avs

# Use pip to speed things up: the server's network is poor, and conda can hang for ages and then still has to resolve environment conflicts
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 -i https://pypi.tuna.tsinghua.edu.cn/simple/ --extra-index-url https://download.pytorch.org/whl/cu117 

git clone https://github.com/facebookresearch/detectron2
cd detectron2
pip install -e .

cd ..
# Change the numpy version in requirements.txt to 1.22.4 first
pip install -r requirements.txt

cd models/modeling/pixel_decoder/ops
# Wait for compilation to finish; it took about two minutes, if I remember correctly
bash make.sh

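Once the installation finishes, it is worth confirming the two version issues described above before moving on. The sketch below is an illustration, not part of VCT_AVS. `version_lt` and `numpy_ok_for_numba` are hypothetical helper names, the bounds are the numba 0.58.1 requirement quoted above, and the comparison relies on `sort -V` (GNU coreutils, present on Ubuntu).

```bash
#!/bin/sh
# Hypothetical helpers (not part of VCT_AVS or COMBO-AVS) for checking package versions.

# version_lt A B: succeeds if version A sorts strictly before version B under `sort -V`.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# numpy_ok_for_numba V: succeeds if 1.22 <= V < 1.27, the numpy range numba 0.58.1 accepts.
numpy_ok_for_numba() {
    ! version_lt "$1" 1.22 && version_lt "$1" 1.27
}
```

Usage in the activated vct_avs environment:

- Run `numpy_ok_for_numba "$(python -c 'import numpy; print(numpy.__version__)')" && echo "numpy OK"` to check the numpy pin.
- Run `python -c 'import torch; print(torch.version.cuda, torch.cuda.is_available())'` to check the PyTorch build. It should print a CUDA version and `True`. A CPU-only build prints `None False`.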

Dataset Download

Here is the general wording I use when requesting dataset access:


I am a graduate student from Beijing Institute of Technology, currently working on replicating the VCT_AVS paper to gain inspiration for my recent research. This paper uses the dataset you provide, which is why I am reaching out. I hereby agree to use the provided data strictly for academic research purposes.

Thank you for your time and consideration.

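The dataset used in this paper is AVSBench. ~~Downloading it requires applying by email (requests are processed slowly; I found it on HuggingFace, but it is a gated dataset that requires logging in)~~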

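I couldn't be bothered to configure a proxy on the server, and I had little proxy traffic left this month, so I downloaded from the domestic mirror https://hf-mirror.com instead.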

First, create a HuggingFace Access Token. It has the form hf_****.

# You can activate a conda environment that already has huggingface_hub installed and install from there
pip install huggingface_hub
pip install -U hf-transfer

# Set the environment variables
export HF_ENDPOINT="https://hf-mirror.com"
export HF_HUB_ENABLE_HF_TRANSFER=1

# Alternatively, edit huggingface_hub/constants.py directly and replace the huggingface.co endpoint in it with https://hf-mirror.com
huggingface-cli download --token hf_**** --resume-download --local-dir-use-symlinks False Exgc/AVSBench --local-dir your_dir --repo-type dataset


Replace your_dir with the directory, relative to the current working directory, where you want to put the dataset.
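Because the server's network is unreliable, you can wrap the download command in a retry loop. This is a minimal sketch with a few assumptions:

- `retry` is a hypothetical helper.
- The default delay of 10 seconds is arbitrary.
- Rerunning the command with `--resume-download` is assumed to continue from the files already on disk rather than starting over.

```bash
#!/bin/sh
# Hypothetical helper: retry MAX CMD [ARGS...]
# Runs CMD up to MAX times, sleeping RETRY_DELAY seconds (default 10) between failed attempts.
retry() {
    max=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "retry: giving up after $attempt attempts" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, retrying in ${RETRY_DELAY:-10}s" >&2
        sleep "${RETRY_DELAY:-10}"
        attempt=$((attempt + 1))
    done
}
```

For example: `retry 5 huggingface-cli download --token hf_**** --resume-download --local-dir-use-symlinks False Exgc/AVSBench --local-dir your_dir --repo-type dataset`. The helper uses plain variables instead of `local` to stay POSIX sh compatible. As a result, `max` and `attempt` remain set in the calling shell.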

If the download is still too slow, you can use hfd (see its documentation).

alias hfd="$PWD/hfd.sh"
hfd Exgc/AVSBench --hf_username username --hf_token hf_*** --dataset --local-dir your_dir


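It was still slow, though, and would probably have taken a long time to finish. Then I found that the AVSS data is also available on Baidu Netdisk. After I bought an off-peak download pass, it finished in no time.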

After downloading, organize the dataset as follows:

|--AVS_dataset
    |--AVSBench_semantic/
    |--AVSBench_object/Multi-sources/
    |--AVSBench_object/Single-source/

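You can create the skeleton of this layout before moving the downloaded files in. A minimal sketch: `make_avs_layout` is a hypothetical helper. Its default location `./AVS_dataset` (relative to the repository root) is an assumption, so adjust it to whatever path the configs expect.

```bash
#!/bin/sh
# Hypothetical helper: create the AVS_dataset skeleton shown above.
# Root defaults to $AVS_ROOT, then ./AVS_dataset (an assumed location).
make_avs_layout() {
    root=${1:-${AVS_ROOT:-./AVS_dataset}}
    mkdir -p "$root/AVSBench_semantic" \
             "$root/AVSBench_object/Multi-sources" \
             "$root/AVSBench_object/Single-source"
}
```

For example, run `make_avs_layout ./AVS_dataset` from the VCT_AVS root. Since `mkdir -p` leaves existing directories untouched, rerunning it is harmless.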

To save time, I recommend downloading the pre-generated Maskiges for the S4/MS3/AVSS subsets directly from annQi/COMBO-AVS-checkpoints · Hugging Face and placing them as follows:

|--AVS_dataset
    |--AVSBench_semantic/pre_SAM_mask/
    |--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
    |--AVSBench_object/Single-source/s4_data/pre_SAM_mask/

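After placing the masks, a quick check can catch a missing or empty directory before preprocessing fails on it. This sketch makes the same assumption about the dataset location. `check_masks` is a hypothetical helper, and it only checks that each directory exists and contains something, not that the masks are complete.

```bash
#!/bin/sh
# Hypothetical helper: check_masks ROOT
# Reports every expected pre_SAM_mask directory under ROOT that is missing or empty,
# and returns non-zero if any problem was found.
check_masks() {
    root=${1:-./AVS_dataset}
    rc=0
    for d in AVSBench_semantic/pre_SAM_mask \
             AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask \
             AVSBench_object/Single-source/s4_data/pre_SAM_mask; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d" >&2
            rc=1
        elif [ -z "$(ls -A "$root/$d")" ]; then
            echo "empty:   $root/$d" >&2
            rc=1
        fi
    done
    return "$rc"
}
```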

Run the following commands to process the dataset at 384x384 resolution. First, comment out the breakpoint on line 39 of ms3_process.py.

python avs_tools/preprocess_avss_audio.py
python avs_tools/generate_data_384/ms3_process.py
python avs_tools/generate_data_384/s4_process.py
python avs_tools/generate_data_384/ss_process.py

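You can run the four commands as a single step that stops at the first failure. This is a minimal sketch:

- `run_preprocess` is a hypothetical helper that must be run from the VCT_AVS root.
- `PYTHON` (default `python`) is an assumed override.
- The breakpoint in ms3_process.py still has to be commented out beforehand.

```bash
#!/bin/sh
# Hypothetical helper: run the preprocessing scripts above, in order, from the VCT_AVS root.
# Checks that every script exists first, then stops at the first one that fails.
run_preprocess() {
    set -- avs_tools/preprocess_avss_audio.py \
           avs_tools/generate_data_384/ms3_process.py \
           avs_tools/generate_data_384/s4_process.py \
           avs_tools/generate_data_384/ss_process.py
    for script in "$@"; do
        [ -f "$script" ] || { echo "not found: $script (run from the VCT_AVS root)" >&2; return 1; }
    done
    for script in "$@"; do
        echo ">>> $script"
        "${PYTHON:-python}" "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```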

You may hit the following error at this step:

ImportError: cannot import name 'Image' from 'PIL' (unknown location)


Reinstalling Pillow fixes it:

pip uninstall pillow
pip install pillow==9.2.0


If rerunning still reports that the detectron2 module is missing, repeat the corresponding installation step above.

You may also be told that the einops library is missing. Install it:

pip install einops


Pretrained Model Download

Download the Swin-Base-384 model pretrained on ImageNet-22K.

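# Convert the model. The original command omits the converter script; the name below is an assumption
# (Mask2Former ships tools/convert-pretrained-swin-model-to-d2.py), so check avs_tools/ for the actual file
cd avs_tools
python convert-pretrained-swin-model-to-d2.py swin_base_patch4_window12_384_22k.pth swin_base_patch4_window12_384_22k.pkl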


Following COMBO-AVS, download the other pretrained models (hosted on HuggingFace) and arrange them as follows:

|--pretrained
    |--detectron2/R-50.pkl
    |--detectron2/d2_pvt_v2_b5.pkl
    |--vggish-10086976.pth
    |--vggish_pca_params-970ea276.pth

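A similar check can confirm that the pretrained weights above are in place. `check_pretrained` is a hypothetical helper. It assumes `pretrained/` sits in the repository root, and it only tests that each file exists and is non-empty, not that it is intact.

```bash
#!/bin/sh
# Hypothetical helper: check_pretrained DIR
# Reports any expected pretrained file under DIR that is missing or zero-sized.
check_pretrained() {
    dir=${1:-./pretrained}
    missing=0
    for f in detectron2/R-50.pkl detectron2/d2_pvt_v2_b5.pkl \
             vggish-10086976.pth vggish_pca_params-970ea276.pth; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```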

Download the provided checkpoints.

Testing and Evaluation

subset=s4  # assumed subset names: s4, ms3 or avss; check scripts/ for the exact file names
sh "scripts/${subset}_swinb_384_test.sh"

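To evaluate several subsets in one go, you can run the test scripts in sequence. A minimal sketch: `run_tests` is a hypothetical helper run from the VCT_AVS root. The default subset names `s4`, `ms3` and `avss` are assumptions based on the S4/MS3/AVSS subsets, so check `scripts/` for the real file names.

```bash
#!/bin/sh
# Hypothetical helper: run_tests [SUBSET...]
# Runs scripts/<subset>_swinb_384_test.sh for each subset (default: s4 ms3 avss),
# stopping at the first missing or failing script.
run_tests() {
    [ "$#" -gt 0 ] || set -- s4 ms3 avss
    for subset in "$@"; do
        script="scripts/${subset}_swinb_384_test.sh"
        if [ ! -f "$script" ]; then
            echo "not found: $script" >&2
            return 1
        fi
        echo ">>> $script"
        sh "$script" || { echo "failed: $script" >&2; return 1; }
    done
}
```

For example, `run_tests` evaluates all three subsets, while `run_tests ms3` evaluates only MS3.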