Heterogeneous Crowd Simulation Using Parametric Reinforcement Learning


Bibliographic details
Published in:	IEEE transactions on visualization and computer graphics. - 1996. - 29(2023), 4, 29 Apr., pp. 2036-2052
Main author:	Hu, Kaidong (Author)
Other authors:	Haworth, Brandon; Berseth, Glen; Pavlovic, Vladimir; Faloutsos, Petros; Kapadia, Mubbasir
Format:	Online article
Language:	English
Published:	2023
Collection access:	IEEE transactions on visualization and computer graphics
Subjects:	Journal Article

Description
Abstract:	Agent-based synthetic crowd simulation affords the cost-effective large-scale simulation and animation of interacting digital humans. Model-based approaches have successfully generated a plethora of simulators with a variety of foundations. However, prior approaches have been based on statically defined models predicated on simplifying assumptions, limited video-based datasets, or homogeneous policies. Recent works have applied reinforcement learning to learn policies for navigation. However, these approaches may learn static homogeneous rules, are typically limited in their generalization to trained scenarios, and limited in their usability in synthetic crowd domains. In this article, we present a multi-agent reinforcement learning-based approach that learns a parametric predictive collision avoidance and steering policy. We show that training over a parameter space produces a flexible model across crowd configurations. That is, our goal-conditioned approach learns a parametric policy that affords heterogeneous synthetic crowds. We propose a model-free approach without centralization of internal agent information, control signals, or agent communication. The model is extensively evaluated. The results show policy generalization across unseen scenarios, agent parameters, and out-of-distribution parameterizations. The learned model has comparable computational performance to traditional methods. Qualitatively, the model produces both expected (laminar flow, shuffling, bottleneck) and unexpected (side-stepping) emergent qualitative behaviours, and quantitatively, the approach is performant across measures of movement quality.
Description:	Date Completed: 10.04.2023; Date Revised: 11.04.2023; Published: Print-Electronic; Citation Status: PubMed-not-MEDLINE
ISSN:	1941-0506
DOI:	10.1109/TVCG.2021.3139031