Frequently, when a critical public safety event occurs, various aspects of the event are captured in video from multiple perspectives and shared or streamed, an evolution enabled by the rapid expansion of communications bandwidth. First responders would benefit from real-time analysis and availability of the mission-critical information recorded in these videos: surveillance cameras, television crews, body-worn cameras, and dashboard cameras reflect the viewpoints of the first responders, while videos recorded by bystanders reflect their individual vantage points. All of these can now be streamed in real time, providing previously unavailable opportunities to enhance situation awareness for public safety. The goal of Real-Time Video Analytics for Situation Awareness is to construct a comprehensive temporal and spatial visualization of an event as it unfolds, allowing the first responders' command center, as well as post-event investigators, to gain detailed information about what is happening at present and what happened in the past at every place where the event was observed.
To move beyond current limitations, in which every video stream requires human analysis and location information is limited to placing GPS signals on a map, we propose to explore extracting space, time, and event information from multiple video streams and presenting this information in an easily digestible fashion. One proposed method, described below, is to depict people or groups of people on a two-dimensional map and dynamically classify their interactions with other people or groups. For instance, in a clash between protesters and police, this mapping makes it possible to quickly understand how the two groups engaged with one another. The map can be linked to the video collection so that the viewer sees an overview and can then view videos reflecting what is happening on the ground at a particular time or in a specific area. Selected scenes can be flagged by the system (gunshots, a person running, fighting). This type of event reconstruction allows investigators to confirm or refute details of a crime scene report of an attack that was caught on video. For instance, a witness report may state that the victims were shot from behind, while the 3D reconstruction and complete 3D models built from multiple videos show that they were shot from the front. We will explore computer vision and 3D modeling approaches to develop digital reconstructions of public-safety-relevant events. In some cases a static picture or display best conveys the positions of various actors at a critical time; in other instances, animation of these reconstructions may help fact finders reach conclusions about what happened at a specific place and how it unfolded.
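To illustrate the 2D mapping idea, one common technique (our assumption; the proposal does not fix an implementation) is to project per-frame person detections from each camera onto a shared ground-plane map using a planar homography estimated during camera calibration. A minimal sketch in Python with NumPy, where the 3x3 homography `H` and the helper name `project_to_map` are hypothetical:

```python
import numpy as np

def project_to_map(points_px, H):
    """Project pixel coordinates onto 2D map coordinates via a
    planar homography H (3x3). Illustrative helper showing how
    detections from one camera could be placed on a shared map."""
    pts = np.asarray(points_px, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones])         # (N, 3) homogeneous coords
    mapped = homog @ H.T                   # apply the homography
    return mapped[:, :2] / mapped[:, 2:3]  # de-homogenize

# Sanity check: the identity homography leaves points unchanged
H = np.eye(3)
print(project_to_map([[100.0, 200.0]], H))  # → [[100. 200.]]
```

With one such homography per camera, detections from many streams land in a common map frame, which is the prerequisite for classifying interactions between people or groups over time.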
We propose to develop an analytics processing and visualization platform, and the associated analytic methodology, that makes it possible for investigators and first responders to harness the informational potential of large event-based video collections in a way that is not possible with strictly manual analysis. The proposed work will create both a complete system and independent tools for analyzing multiple videos of an event, supporting 3D reconstruction, activity detection, and real-time visualization. We will create and publicize datasets from 1) the 2013 Boston Marathon Bombings, 2) the 2014 Maidan Square protests in Kiev, Ukraine, and 3) the 2016 Dallas police officer shootings; in addition, subject to IRB approval, we will collect simulated events from multi-person body-worn camera streams and test our system on them.
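The 3D reconstruction tools mentioned above rest on multi-view geometry. As a sketch of the core primitive, assuming the 3x4 camera projection matrices have already been recovered (e.g., by a structure-from-motion pipeline, which this proposal does not specify), linear (DLT) triangulation recovers a 3D point from its pixel coordinates in two synchronized videos:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation: recover one 3D point from its pixel
    coordinates x1, x2 in two views with 3x4 projection matrices
    P1, P2. (Illustrative; names and toy setup are assumptions.)"""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],  # each observed coordinate yields one
        x1[1] * P1[2] - P1[1],  # linear constraint on the homogeneous
        x2[0] * P2[2] - P2[0],  # 3D point X
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                  # null vector of A (least-squares solution)
    return X[:3] / X[3]         # de-homogenize

# Toy example: identity intrinsics, second camera shifted 1 unit along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
print(triangulate(P1, P2, (0.0, 0.0), (-0.2, 0.0)))  # ≈ [0. 0. 5.]
```

Triangulating many such correspondences across the videos of an event yields the 3D positions of actors and scene geometry, which is what would let a reconstruction support or refute a claim such as the direction from which shots were fired.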