Cosmo's Blog

Reproducing the LongVALE Paper

Open-source project code

Reproduction Environment

Hardware

  • OS: Ubuntu 18.04
  • CPU: Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz
  • GPU: RTX 3090 × 8
  • Memory: 791224272 kB ≈ 754.6 GiB
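The memory figure above can be checked with a small conversion. This is a minimal sketch that assumes the kB value is counted in units of 1024 bytes (which is how Linux reports memory totals); `kib_to_gib` is a helper name made up for this post.

```python
def kib_to_gib(kib: int) -> float:
    """Convert a size in KiB (1024 bytes) to GiB (1024**3 bytes)."""
    return kib / 1024**2


# Memory total listed in the hardware section, in kB.
mem_total_kib = 791224272

print(f"{kib_to_gib(mem_total_kib):.1f} GiB")  # prints "754.6 GiB"
```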

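Environment Setup

This project builds on the VTimeLLM codebase.

Following the official README as written led to problems during reproduction, and the same problems show up in VTimeLLM. The adjusted steps are: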

git clone https://github.com/ttgeng233/LongVALE.git
cd LongVALE
conda create --name=longvale python=3.10
conda activate longvale
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
# Explicitly pin flash-attn to an older version here; otherwise the build fails
pip install flash-attn==1.0.9 --no-build-isolation

pip install "numpy<2.0"
pip install moviepy==1.0.3
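After running the install steps it can help to confirm that the pins actually took effect, since a later `pip install` can silently move a package. The following is a minimal sketch, not part of the LongVALE repository: the helper names (`check_pins`, `installed_versions`) are made up for this post, and the pins are copied from the commands above.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Exact pins taken from the install commands in this post.
EXACT_PINS = {
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "torchaudio": "2.1.2",
    "flash-attn": "1.0.9",
    "moviepy": "1.0.3",
}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu118'."""
    return version.split("+", 1)[0]


def check_pins(installed: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of human-readable problems; empty means all pins match."""
    problems = []
    for name, wanted in EXACT_PINS.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        elif base_version(found) != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    # numpy is only constrained to stay below 2.0.
    numpy_version = installed.get("numpy")
    if numpy_version is None:
        problems.append("numpy: not installed (expected <2.0)")
    elif int(numpy_version.split(".")[0]) >= 2:
        problems.append(f"numpy: found {numpy_version}, expected <2.0")
    return problems


def installed_versions(names) -> Dict[str, Optional[str]]:
    """Look up installed versions; None marks a missing package."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    found = installed_versions(list(EXACT_PINS) + ["numpy"])
    for line in check_pins(found) or ["all pins match"]:
        print(line)
```

Run it inside the `longvale` conda environment; any line other than "all pins match" points at a package worth reinstalling.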

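Downloading the Dataset

The paper comes with its own dataset, named LongVALE, which can be downloaded from Baidu Netdisk or Hugging Face. (The Hugging Face copy is split into multiple archives, so Baidu Netdisk is a bit more convenient: buy an off-peak download pass, leave the lab machine running overnight, and it is done.)

The compressed raw video data comes to 190.2 GB for the training set and 40.55 GB for the test set.

Since the campus network does not seem to be up to the full download, I have only downloaded the test set so far and use it for inference.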

Running the Evaluation Code

nohup python longvalellm/eval/eval.py --video_feat_folder features_eval/visual_features_1171 --audio_feat_folder features_eval/audio_features_1171 --asr_feat_folder features_eval/speech_features_1171 --task all --log_path log > output.log0 2>&1 &
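The evaluation commands in this post all use the same three feature folders and differ only in `--task`, so they can be assembled from one place. Below is a minimal sketch of that idea: the script paths and flags are the ones used in this post, while the helper names (`build_eval_command`, `build_metric_command`) are made up here and not part of the repository.

```python
import shlex

FEATURE_ROOT = "features_eval"
EVAL_TASKS = ("all", "grounding", "captioning", "seg_captioning")
METRIC_TASKS = ("grounding", "captioning", "seg_captioning")


def build_eval_command(task: str, log_path: str = "log") -> list:
    """Argument list for longvalellm/eval/eval.py with the shared folders."""
    if task not in EVAL_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        "python", "longvalellm/eval/eval.py",
        "--video_feat_folder", f"{FEATURE_ROOT}/visual_features_1171",
        "--audio_feat_folder", f"{FEATURE_ROOT}/audio_features_1171",
        "--asr_feat_folder", f"{FEATURE_ROOT}/speech_features_1171",
        "--task", task,
        "--log_path", log_path,
    ]


def build_metric_command(task: str, log_path: str = "log") -> list:
    """Argument list for longvalellm/eval/metric.py for a single task."""
    if task not in METRIC_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return ["python", "longvalellm/eval/metric.py",
            "--task", task, "--log_path", log_path]


# Print the commands instead of running them, so they can be inspected first.
print(shlex.join(build_eval_command("grounding")))
print(shlex.join(build_metric_command("grounding")))
```

Passing each list to `subprocess.run(cmd, check=True)` would run inference in the foreground and then score it, which avoids starting the metric script before the logs are complete.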

Temporal Video Grounding

nohup python longvalellm/eval/eval.py --video_feat_folder features_eval/visual_features_1171 --audio_feat_folder features_eval/audio_features_1171 --asr_feat_folder features_eval/speech_features_1171 --task grounding --log_path log > output.log1 2>&1 &    
# Once the background job above has finished, compute the metrics from the logs
python longvalellm/eval/metric.py --task grounding --log_path log
  • Evaluation results

    ====================== Grounding ======================
    Found 13867 logs
    mIoU: 10.88
    R1@0.3: 15.68
    R1@0.5: 8.62
    R1@0.7: 3.87
  • Comparison with the paper's results

                       mIoU    R@0.3   R@0.5   R@0.7
    LongVALE (paper)   11.0    15.7    8.6     3.9
    Test (this run)    10.88   15.68   8.62    3.87
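For readers unfamiliar with the grounding metrics, here is a minimal sketch of how mIoU and R1@k are commonly defined for temporal grounding: the temporal IoU between one predicted and one ground-truth segment, averaged over samples, plus the share of samples whose IoU reaches a threshold. I have not checked that the repository's `metric.py` does exactly this (for example, whether it uses `>=` or `>`), so treat the function names and details as assumptions.

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) segments on the time axis."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def grounding_metrics(pairs, thresholds=(0.3, 0.5, 0.7)):
    """mIoU and R1@k, both as percentages, over (prediction, ground truth) pairs."""
    ious = [temporal_iou(pred, gt) for pred, gt in pairs]
    count = len(ious)
    result = {"mIoU": 100.0 * sum(ious) / count}
    for threshold in thresholds:
        hits = sum(iou >= threshold for iou in ious)
        result[f"R1@{threshold}"] = 100.0 * hits / count
    return result


# Toy example with made-up segments (in seconds), not LongVALE data.
toy_pairs = [
    ((0.0, 10.0), (5.0, 15.0)),   # partial overlap, IoU = 1/3
    ((0.0, 10.0), (0.0, 10.0)),   # exact match, IoU = 1
    ((0.0, 5.0), (10.0, 20.0)),   # no overlap, IoU = 0
]
for name, value in grounding_metrics(toy_pairs).items():
    print(f"{name}: {value:.2f}")
```

Read this way, an R1@0.5 of 8.62 means that under 9% of the test queries get a predicted segment overlapping the ground truth by at least half.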

Dense Video Captioning

nohup python longvalellm/eval/eval.py --video_feat_folder features_eval/visual_features_1171 --audio_feat_folder features_eval/audio_features_1171 --asr_feat_folder features_eval/speech_features_1171 --task captioning --log_path log > output2.log 2>&1 &
# Once the background job above has finished, compute the metrics from the logs
python longvalellm/eval/metric.py --task captioning --log_path log
  • Evaluation results

    ====================== Captioning =====================
    Found 1171 logs
    soda_c: 2.80
    METEOR: 4.68
    CIDEr: 7.84
  • Comparison with the paper's results

                       SODA_c (S)   CIDEr (C)   METEOR (M)
    LongVALE (paper)   2.8          7.9         4.7
    Test (this run)    2.80         7.84        4.68

Segment Captioning

nohup python longvalellm/eval/eval.py --video_feat_folder features_eval/visual_features_1171 --audio_feat_folder features_eval/audio_features_1171 --asr_feat_folder features_eval/speech_features_1171 --task seg_captioning --log_path log > output3.log 2>&1 &
# Once the background job above has finished, compute the metrics from the logs
python longvalellm/eval/metric.py --task seg_captioning --log_path log
  • Evaluation results

    ======================Segemnt Captioning =====================
    BLEU4: 5.58%
    METEOR: 10.94%
    Rouge: 22.40%
    CIDEr: 20.05%
  • Comparison with the paper's results

                       BLEU4 (B)   ROUGE (R)   CIDEr (C)   METEOR (M)
    LongVALE (paper)   5.6         22.4        20.3        10.9
    Test (this run)    5.58        22.40       20.05       10.94
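To summarize how close the reproduction is, the three comparison tables in this post can be reduced to per-metric absolute gaps. The sketch below copies the numbers from those tables; the dictionary layout and the `absolute_gaps` helper are my own and not part of the repository.

```python
# Values copied from the comparison tables in this post.
PAPER = {
    "grounding": {"mIoU": 11.0, "R1@0.3": 15.7, "R1@0.5": 8.6, "R1@0.7": 3.9},
    "captioning": {"SODA_c": 2.8, "CIDEr": 7.9, "METEOR": 4.7},
    "seg_captioning": {"BLEU4": 5.6, "ROUGE": 22.4, "CIDEr": 20.3, "METEOR": 10.9},
}
REPRODUCED = {
    "grounding": {"mIoU": 10.88, "R1@0.3": 15.68, "R1@0.5": 8.62, "R1@0.7": 3.87},
    "captioning": {"SODA_c": 2.80, "CIDEr": 7.84, "METEOR": 4.68},
    "seg_captioning": {"BLEU4": 5.58, "ROUGE": 22.40, "CIDEr": 20.05, "METEOR": 10.94},
}


def absolute_gaps(paper, reproduced):
    """Absolute difference per (task, metric), rounded to two decimals."""
    gaps = {}
    for task, metrics in paper.items():
        for name, reported in metrics.items():
            gaps[(task, name)] = round(abs(reported - reproduced[task][name]), 2)
    return gaps


gaps = absolute_gaps(PAPER, REPRODUCED)
worst = max(gaps, key=gaps.get)
print(f"largest gap: {worst[0]} {worst[1]} = {gaps[worst]:.2f}")
# prints "largest gap: seg_captioning CIDEr = 0.25"
```

Every reproduced number is within 0.25 points of the paper's value, with segment-captioning CIDEr (20.05 vs 20.3) being the largest difference.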
https://astro-pure.js.org/blog/longvale_rep
Author Cosmo
Published at November 3, 2025