Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)

*Equal Contribution Corresponding Author
Shanghai Jiao Tong University
ECCV 2024

Think2Drive is the first method to successfully handle all 39 complex scenarios in CARLA V2. Before it, no method had been able to handle these scenarios since CARLA V2 was released in October 2022.

Abstract

Real-world autonomous driving (AD), especially urban driving, involves many corner cases. The recently released AD simulator CARLA v2 adds 39 common events to the driving scene and provides a more quasi-realistic testbed than CARLA v1. It poses a new challenge to the community, and so far no literature has reported any success on the new scenarios in V2, as existing works mostly rely on specific rules for planning, which cannot cover the more complex cases in CARLA v2. In this work, we take the initiative of directly training a planner, in the hope of handling the corner cases flexibly and effectively, which we believe is also the future of AD. To the best of our knowledge, we develop the first model-based RL method for AD, named Think2Drive, with a world model that learns the transitions of the environment and then acts as a neural simulator to train the planner. This paradigm significantly boosts training efficiency thanks to the low-dimensional state space and the parallel computation of tensors in the world model. As a result, Think2Drive reaches expert-level proficiency in CARLA v2 within 3 days of training on a single A6000 GPU, and to the best of our knowledge, no other success (100% route completion) on CARLA v2 has been reported so far. We also propose CornerCase-Repository, a benchmark that supports the evaluation of driving models by scenario. Additionally, we propose a new and balanced metric that evaluates performance by route completion, infraction number, and scenario density, so that the driving score gives more information about the actual driving performance.

Task Overview

CARLA V2 introduces 39 complex scenarios that mirror real-world traffic situations. For instance, in one scenario the ego vehicle is on a two-way single-lane road and encounters a construction zone ahead. The ego agent must move into the opposite lane when it is sufficiently clear, pass the construction area, and promptly merge back into its original lane. Even a proficient human driver has to carefully identify the right moment to change lanes in this scenario.


Two-Way Construction Scenario

CARLA V2 aims to evaluate the capability of autonomous driving models for urban driving. However, there has been no effective solution for this task because of the large difficulty gap between CARLA V2 and other benchmarks (such as CARLA V1). It is nearly impossible to hand-write rules that cover all these scenarios. Other popular approaches, such as model-free reinforcement learning, also fail because of their low training efficiency.

Some Scenarios in CARLA V2

Model-based Reinforcement Learning

Think2Drive is the first to use a model-based reinforcement learning (MBRL) approach to solve such an urban driving task, and it proposes purpose-built components to handle the challenges that arise when applying MBRL to AD. For the model structure, we use DreamerV3 as our base model. We train a world model to learn the transition model, reward model, and termination model of the environment, and a planner model to maximize the reward predicted by the world model. Because the world model can "think" in a low-dimensional latent space, Think2Drive achieves very high training efficiency.
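
To make the idea of "thinking" in latent space concrete, the following is a minimal sketch of a batched imagination rollout. It is not the authors' implementation and not DreamerV3: ToyWorldModel, its random untrained weights, the tanh policy, and imagine_returns are all hypothetical stand-ins, and the critic and return estimators used in practice are omitted. It only illustrates how transition, reward, and termination predictions can be combined into a return estimate for many latent states in parallel.

```python
import numpy as np


class ToyWorldModel:
    """Hypothetical stand-in for a learned latent world model (untrained, random weights)."""

    def __init__(self, latent_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(scale=0.1, size=(latent_dim, latent_dim))
        self.B = rng.normal(scale=0.1, size=(action_dim, latent_dim))
        self.w_reward = rng.normal(size=latent_dim)
        self.w_done = rng.normal(size=latent_dim)

    def step(self, z, a):
        # Transition model: next latent state from current latent state and action.
        z_next = np.tanh(z @ self.A + a @ self.B)
        # Reward model: scalar reward predicted for each latent state in the batch.
        reward = z_next @ self.w_reward
        # Termination model: probability that the episode ends at this state.
        done_prob = 1.0 / (1.0 + np.exp(-(z_next @ self.w_done)))
        return z_next, reward, done_prob


def imagine_returns(model, policy, z0, horizon, gamma=0.99):
    """Roll the policy forward inside the world model and sum discounted rewards."""
    z = z0
    returns = np.zeros(z0.shape[0])
    # Running weight: discount factor times the probability of not having terminated.
    weight = np.ones(z0.shape[0])
    for _ in range(horizon):
        a = policy(z)
        z, reward, done_prob = model.step(z, a)
        returns += weight * reward
        weight *= gamma * (1.0 - done_prob)
    return returns


# Usage: 1024 imagined trajectories advance together as one batch of tensors.
model = ToyWorldModel(latent_dim=8, action_dim=2)
policy = lambda z: np.tanh(z[:, :2])  # hypothetical placeholder policy
z0 = np.random.default_rng(1).normal(size=(1024, 8))
returns = imagine_returns(model, policy, z0, horizon=15)
```

In a setup of this kind, the planner would be updated to increase these imagined returns, so no simulator step is needed during the rollout itself.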


World Model Learning and Planner Learning in Think2Drive

Result

We evaluate Think2Drive on CARLA V2 and on our proposed benchmark, CornerCaseRepo. CARLA V2 provides 90 training routes, 2 test routes, and 20 validation routes; on average, a route is longer than 6 km and contains more than 50 scenarios. It is hard to evaluate a driving model's capability to handle individual scenarios on these routes, because there is no official API support for scenario placement. CornerCaseRepo contains 4000 training routes and 390 test routes. Each route in CornerCaseRepo contains only one type of scenario and is typically less than 200 meters long. CornerCaseRepo makes debugging and scenario-wise training and evaluation convenient.
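
Because each CornerCaseRepo route contains a single scenario type, results can be grouped by scenario. The sketch below shows one way to do such a scenario-wise aggregation. The record format, the scenario labels, and the notion of "success" are hypothetical illustrations; this is not the benchmark's API and not the metric proposed in the paper.

```python
from collections import defaultdict


def success_rate_by_scenario(results):
    """Group (scenario_name, succeeded) records and return the success rate per scenario."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for scenario, succeeded in results:
        totals[scenario] += 1
        successes[scenario] += int(succeeded)
    return {name: successes[name] / totals[name] for name in totals}


# Hypothetical per-route outcomes, one scenario type per route.
results = [
    ("construction_two_way", True),
    ("construction_two_way", False),
    ("scenario_b", True),
]
rates = success_rate_by_scenario(results)
```

A per-scenario breakdown like this makes it easier to see which scenario types a planner still fails on, which long mixed-scenario routes tend to hide.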

Think2Drive in CARLA V2 Test Route.

BibTeX

@article{li2024think2drive,
  title={Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving (in CARLA-v2)},
  author={Li, Qifeng and Jia, Xiaosong and Wang, Shaobo and Yan, Junchi},
  journal={arXiv preprint arXiv:2402.16720},
  year={2024}
}