MagicDrive is a PyTorch-based framework for generating high-fidelity street-view images and videos with precise 3D geometry control. It conditions generation on camera poses, road maps, 3D bounding boxes, and textual descriptions to produce realistic multi-view scenes. Tailored encoding strategies for each input type and a cross-view attention module keep the generated views coherent with one another, which makes the output useful as synthetic data for perception tasks such as 3D object detection and BEV segmentation.
Key Features
Diverse 3D Geometry Control: Incorporates camera poses, road maps, 3D bounding boxes, and textual prompts for detailed scene generation.
Multi-View Consistency: Utilizes a cross-view attention module to maintain consistency across multiple camera perspectives.
High-Resolution Output: Supports image resolutions up to 424×800 pixels and video generation with up to 60 frames.
Plug-and-Play Design: Compatible with existing diffusion models without requiring additional training.
Applications: Generated scenes can augment training data for tasks such as BEV segmentation and 3D object detection.
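
To make the multi-view consistency idea concrete, here is a minimal pure-Python sketch of cross-view attention. It is illustrative only and is not MagicDrive's implementation. It assumes the cameras form a ring and that each view attends only to its immediate left and right neighbors. It also omits the learned query/key/value projections a real module would have. The names neighbor_views and cross_view_attention are hypothetical.

```python
import math


def neighbor_views(view, num_views):
    """Return (left, right) neighbor indices, assuming cameras form a ring."""
    return ((view - 1) % num_views, (view + 1) % num_views)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(features, view):
    """Update the tokens of one view using tokens from its two neighbors.

    features: list over views; each view is a list of token vectors
              (lists of floats with the same dimension).
    Returns the tokens of `view` plus an attention-weighted sum of the
    neighbor tokens (a residual connection). Assumption: identity
    projections stand in for learned Q/K/V weights.
    """
    left, right = neighbor_views(view, len(features))
    context = features[left] + features[right]  # keys and values
    dim = len(features[view][0])
    updated = []
    for query in features[view]:
        # Scaled dot-product score between this token and each neighbor token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in context
        ]
        weights = softmax(scores)
        attended = [
            sum(w * value[i] for w, value in zip(weights, context))
            for i in range(dim)
        ]
        updated.append([q + a for q, a in zip(query, attended)])
    return updated


if __name__ == "__main__":
    # Three views, one 2-dimensional token each.
    feats = [[[1.0, 0.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
    print(neighbor_views(0, 6))
    print(cross_view_attention(feats, 0))
```

Restricting attention to adjacent views keeps the cost linear in the number of cameras, and overlapping fields of view between neighbors are where consistency matters most. Whether a given model uses this restriction or attends across all views is a design choice this sketch does not settle.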