
Evaluation

Inference results from the closed-loop and open-loop frameworks share the same format, so both can be evaluated by the evaluation module.

Specify your LLM

To use your LLM API for evaluation, create a mytoken.py under ./B2DVL-Adapter. Take DeepSeek as an example:

B2DVL-Adapter/mytoken.py
DEEPSEEK_TOKEN = [
    "your-token-1",  # you can set multiple tokens; they will be used in a round-robin way
    "your-token-2",
    # ...
]
DEEPSEEK_URL = "https://api.deepseek.com/v1"

Our script will then call this API through the OpenAI-compatible client interface.
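As a rough illustration, the call pattern looks like the minimal sketch below. The token list and URL come from mytoken.py; the model name and the helper function are assumptions for illustration, not the adapter's actual code.

# minimal sketch, not the adapter's actual evaluation code
from itertools import cycle
from openai import OpenAI

from mytoken import DEEPSEEK_TOKEN, DEEPSEEK_URL

_token_pool = cycle(DEEPSEEK_TOKEN)  # rotate through the configured tokens

def ask_llm(prompt: str) -> str:
    """Send one evaluation prompt using the next token in round-robin order."""
    client = OpenAI(api_key=next(_token_pool), base_url=DEEPSEEK_URL)
    response = client.chat.completions.create(
        model="deepseek-chat",  # assumed model name; adjust for your provider
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content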

Write a config file

Specify configurations

eval_config.json
{
    "EVAL_SUBSET": true, // evaluate only a subset of the given inference-result folder
    "USE_CHECKPOINT": false, // use a file to record evaluation progress
    "SUBSET_FILE": "./eval_configs/subset.txt", // subset file
    "CHECKPOINT_FILE": "./eval_configs/finished_scenarios.txt", // checkpoint file
    "INFERENCE_RESULT_DIR": "./infer_results", // path to inference results;
    // for closed-loop inference, this dir is ./output/infer_results/model_name+input_mode
    "B2D_DIR": "/path/to/Bench2Drive/dataset", // the evaluation script uses Bench2Drive annotations;
    // for closed-loop inference, this dir is ./eval_v1 (the SAVE_PATH you specified)/model_name+input_mode
    "ORIGINAL_VQA_DIR": "../Carla_Chain_QA/carla_vqa_gen/vqa_dataset/outgraph",
    // for closed-loop inference, this dir is ./output/vqagen/model_name+input_mode
    "FRAME_PER_SEC": 10, // sensor fps
    "LOOK_FUTURE": false // not used for now, to be removed
}
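Note that the // comments above are for explanation only; standard JSON does not allow comments, so your actual eval_config.json should omit them. As a quick sanity check before running evaluation, you can load the config and confirm the referenced directories exist. The sketch below is a minimal example; the config path is hypothetical (use whatever you pass to --config_dir), and the field names are taken from the example above.

import json
import os

CONFIG_PATH = "./eval_configs/eval_config.json"  # hypothetical path

with open(CONFIG_PATH) as f:
    cfg = json.load(f)

for key in ("INFERENCE_RESULT_DIR", "B2D_DIR", "ORIGINAL_VQA_DIR"):
    if not os.path.isdir(cfg[key]):
        print(f"warning: {key} is not an existing directory: {cfg[key]}")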

Run evaluation

Run the evaluation script:

python eval.py --config_dir ./path/to/eval_config.json --num_workers 4 --out_dir ./eval_outputs
Output directory

Evaluation results will be saved under ${out_dir}/model_name+input_mode
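For example, the result directory is formed by joining out_dir with the model_name+input_mode string from your inference run, as in the short sketch below (the model name and input mode values are placeholders):

import os

out_dir = "./eval_outputs"
model_name = "your_model"        # as used during inference
input_mode = "your_input_mode"   # as used during inference

result_dir = os.path.join(out_dir, f"{model_name}+{input_mode}")
print(result_dir)  # e.g. ./eval_outputs/your_model+your_input_mode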