
Evaluation

Inference results from the closed-loop and open-loop frameworks share the same format, so both can be evaluated by the evaluation module.

Specify your LLM

To use your LLM API for evaluation, create a mytoken.py under ./B2DVL-Adapter. Take DeepSeek as an example:

B2DVL-Adapter/mytoken.py
DEEPSEEK_TOKEN = [
    "your-token-1",  # you can set multiple tokens; they will be used in a round-robin way
    "your-token-2",
    # ...
]
DEEPSEEK_URL = "https://api.deepseek.com/v1"

Our script will then call this API through the OpenAI-compatible client interface.
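As a rough illustration, the call pattern looks like the minimal sketch below. The token list and URL come from mytoken.py; the model name and the helper function are assumptions for illustration, not the adapter's actual code.

# minimal sketch, not the adapter's actual evaluation code
from itertools import cycle
from openai import OpenAI

from mytoken import DEEPSEEK_TOKEN, DEEPSEEK_URL

_token_pool = cycle(DEEPSEEK_TOKEN)  # rotate through the configured tokens

def ask_llm(prompt: str) -> str:
    """Send one evaluation prompt using the next token in round-robin order."""
    client = OpenAI(api_key=next(_token_pool), base_url=DEEPSEEK_URL)
    response = client.chat.completions.create(
        model="deepseek-chat",  # assumed model name; adjust for your provider
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content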

Write a config file

Specify configurations

eval_config.json
{
    "EVAL_SUBSET": true, // evaluate only a subset of the given inference-result folder
    "USE_CHECKPOINT": false, // use a file to record evaluation progress
    "SUBSET_FILE": "./eval_configs/subset.txt", // subset file
    "CHECKPOINT_FILE": "./eval_configs/finished_scenarios.txt", // checkpoint file
    "INFERENCE_RESULT_DIR": "./infer_results", // path to inference results;
    // for closed-loop inference, this dir is ./output/infer_results/model_name+input_mode
    "B2D_DIR": "/path/to/Bench2Drive/dataset", // the evaluation script uses Bench2Drive annotations;
    // for closed-loop inference, this dir is ./eval_v1 (the SAVE_PATH you specified)/model_name+input_mode
    "ORIGINAL_VQA_DIR": "../Carla_Chain_QA/carla_vqa_gen/vqa_dataset/outgraph",
    // for closed-loop inference, this dir is ./output/vqagen/model_name+input_mode
    "FRAME_PER_SEC": 10, // sensor fps
    "LOOK_FUTURE": false // not used for now, to be removed
}
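Note that the // comments above are for explanation only; standard JSON does not allow comments, so your actual eval_config.json should omit them. As a quick sanity check before running evaluation, you can load the config and confirm the referenced directories exist. The sketch below is a minimal example; the config path is hypothetical (use whatever you pass to --config_dir), and the field names are taken from the example above.

import json
import os

CONFIG_PATH = "./eval_configs/eval_config.json"  # hypothetical path

with open(CONFIG_PATH) as f:
    cfg = json.load(f)

for key in ("INFERENCE_RESULT_DIR", "B2D_DIR", "ORIGINAL_VQA_DIR"):
    if not os.path.isdir(cfg[key]):
        print(f"warning: {key} is not an existing directory: {cfg[key]}")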

Run evaluation

Run the evaluation script:

python eval.py --config_dir ./path/to/eval_config.json --num_workers 4 --out_dir ./eval_outputs
Output directory

Evaluation results will be saved under ${out_dir}/model_name+input_mode
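For example, the result directory is formed by joining out_dir with the model_name+input_mode string from your inference run, as in the short sketch below (the model name and input mode values are placeholders):

import os

out_dir = "./eval_outputs"
model_name = "your_model"        # as used during inference
input_mode = "your_input_mode"   # as used during inference

result_dir = os.path.join(out_dir, f"{model_name}+{input_mode}")
print(result_dir)  # e.g. ./eval_outputs/your_model+your_input_mode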