KEY TECHNOLOGY AREA(S): Chemical/Biological Defense; Battlespace Environments; Sensors

OBJECTIVE: Develop a Deep Learning (DL) threat detection solution that fuses imaging sensors with chemical/biological sensors to detect and locate concealed chemical threats (with potential extension to biological and explosive threats). The developed architecture must be deployable on edge computing platforms for real-time (better than 60-second detection; 60 frames per second for video sensors), standoff (50 m) threat detection platforms such as Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs).

DESCRIPTION: Sensor technology for detecting concealed chemical threats has improved over the last decade, but automated threat identification and localization remain a challenge in operational environments. Multi-modal sensing promises greater threat detection capability, and better identification of threat location and type, than any individual sensing modality can provide. Imaging solutions such as infrared sensors, LIDAR, and RADAR provide threat detection and location capabilities but cannot adequately differentiate chemical threat types. Chemical sensors provide strong threat identification and general-vicinity detection but cannot adequately localize chemical threats in space. A multi-modal approach to threat detection will therefore improve situational awareness and inform an appropriate threat response.

DL algorithms have significantly improved the ability to autonomously detect a wide range of threats. Recent DL algorithms combine audio and visual representations of the same underlying state to increase the correlation of input data with a specific target. Extrapolating from visual and auditory inputs to other sensing modalities will drive automated detection algorithms for chemical/biological threats and environmental awareness of operational environments. A DL architecture that jointly exploits the signature from a chemical sensor and the threat signature from an imaging sensor has multiple benefits over single-mode detection, increasing both threat detection confidence and threat identification/classification accuracy (a minimal fusion sketch follows the Phase I summary below). Additionally, multi-modal DL architectures have shown promise of being less vulnerable to adversarial examples, inputs modified with small perturbations that deliberately fool a target model into producing incorrect results (illustrated below), bringing an additional layer of security to Department of Defense (DoD) threat detection technologies.

Phase I proposals should advance the state of the art in automated chemical threat detection, location, and identification by incorporating multi-modal sensor inputs into a DL detection/identification process. To advance to a Phase II project, performers must demonstrate detection/identification rates higher than those of single-mode detection techniques. It is expected that in Phase I and Phase II, performers will utilize Commercial-Off-the-Shelf (COTS) or novel sensor technologies.

PHASE I: Develop and test a DL architecture that jointly exploits multi-modal sensor inputs for chemical threat detection and identification. Demonstrate the improved detection performance of the multi-modal DL architecture over single-mode DL automated detection algorithms.
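For concreteness, the following is a minimal sketch of the kind of fusion architecture described above, assuming PyTorch; the late-fusion design, the 64 x 64 frame size, the 32-channel chemical-sensor vector, the layer widths, and the class count are all illustrative assumptions, not requirements of this topic.

```python
# Minimal late-fusion sketch (assumes PyTorch): an image branch and a
# chemical-sensor branch are encoded separately, then their embeddings are
# concatenated and classified jointly. All sizes are placeholders.
import torch
import torch.nn as nn

class MultiModalDetector(nn.Module):
    def __init__(self, chem_channels=32, num_threats=3):
        super().__init__()
        # Image branch: small CNN over a single video frame (3 x 64 x 64 assumed)
        self.img_branch = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),    # -> 32-dim embedding
        )
        # Chemical branch: MLP over a sensor response vector
        self.chem_branch = nn.Sequential(
            nn.Linear(chem_channels, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),             # -> 32-dim embedding
        )
        # Fusion head: joint embedding -> threat logits (class 0 = "no threat")
        self.head = nn.Linear(32 + 32, num_threats + 1)

    def forward(self, frame, chem):
        z = torch.cat([self.img_branch(frame), self.chem_branch(chem)], dim=1)
        return self.head(z)

# Forward pass on dummy data: a batch of 4 frames plus 4 sensor vectors
model = MultiModalDetector()
logits = model(torch.randn(4, 3, 64, 64), torch.randn(4, 32))
print(logits.shape)  # torch.Size([4, 4])
```

Late fusion keeps each branch small and independently trainable, which is relevant to the 60 frames per second and low-SWaP targets; intermediate (feature-level) fusion or cross-modal attention are equally valid design choices.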
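To make the adversarial-example concern concrete, the following is a minimal Fast Gradient Sign Method (FGSM) sketch that perturbs only the image input of the hypothetical detector above; the epsilon budget is a placeholder.

```python
# FGSM sketch: nudge the input image in the direction that most increases the
# loss, producing a small perturbation crafted to flip the model's output.
# Reuses the MultiModalDetector sketched above.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, frame, chem, label, epsilon=0.01):
    frame = frame.clone().requires_grad_(True)
    loss = F.cross_entropy(model(frame, chem), label)
    loss.backward()
    # One signed-gradient step of size epsilon, applied to the image only
    return (frame + epsilon * frame.grad.sign()).detach()
```

Reporting detection rates on such perturbed inputs, for both the fused model and single-mode baselines, would substantiate the claimed robustness benefit of multi-modal architectures.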
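Since advancing to Phase II requires detection/identification rates higher than single-mode techniques, a side-by-side evaluation is implied. The following is a minimal sketch, assuming a labeled test loader that yields (frame, chem, label) batches and a single-mode baseline that takes frames only; plain accuracy is shown for brevity, though probability of detection at a fixed false-alarm rate is the more operationally meaningful metric.

```python
# Evaluation sketch: compare a fused model against a single-mode baseline on
# the same held-out data. The loader and baseline model are assumed to exist.
import torch

@torch.no_grad()
def detection_rate(model, loader, fused=True):
    hits, total = 0, 0
    for frame, chem, label in loader:
        logits = model(frame, chem) if fused else model(frame)
        hits += (logits.argmax(dim=1) == label).sum().item()
        total += label.numel()
    return hits / total

# Hypothetical usage:
#   detection_rate(fused_model, test_loader)                    # multi-modal
#   detection_rate(image_only_model, test_loader, fused=False)  # single-mode
```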
During the Phase I project, the proof-of-concept demonstration should focus on discriminating at least two chemical threats, with a clear path forward for implementing the DL architecture on low size, weight, and power (SWaP) computing hardware for edge computing. Chemical warfare threats of all classes are of interest for threat detection and localization (better than 5-meter accuracy). Examples of chemical threats of interest include, but are not limited to, chlorine; tear gas/pepper spray; and nitrogen oxides. The Phase I deliverable should describe the algorithms tested, software concepts, hardware requirements, results of single-mode and multi-modal detection tests, and potential use cases and limitations identified within the Chemical and Biological Defense program.

PHASE II: Phase II will focus on developing and testing an embedded multi-modal DL algorithm on a field-portable computational platform that can accept multiple sensor inputs. The DL architecture will be designed to meet the low-SWaP requirements of the computational platform, enabling deployment to operational environments on small unmanned vehicles (UxVs); a minimal export sketch appears after the references below. Evaluation of the multi-modal DL chemical detection algorithms will be extended to multiple threat vectors and must demonstrate improved threat detection and identification over single-mode DL architectures. Laboratory-based characterization and validation of the embedded DL algorithm will be required for successful completion of Phase II. Technical demonstration and validation of the developed technology in operationally relevant environments will take place with government personnel.

PHASE III: The expected Phase II end-product is a well-designed, deployable edge computing device with an embedded DL algorithm trained to detect chemical threats, suitable for use on ground and aerial vehicles. The offeror is expected to pursue follow-on government and civilian activities. Transition of the developed technology will require refined algorithm training and testing to optimize the threat detection capability for the chemical threats of concern, as well as extending the number of trained threats and modalities that can be identified. Edge-case testing will also be explored and defined.

PHASE III DUAL USE APPLICATIONS: Multi-modal target-discrimination DL algorithms can support and enhance medical diagnostic applications for increased prognostic assurance. Additionally, leveraging multiple sensor inputs improves a vehicle's perception of its immediate environment.

REFERENCES:
1. Zhang, Shiqing, et al. "Multimodal deep convolutional neural network for audio-visual emotion recognition." Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. 2016.
2. Alshemali, Basemah, and Jugal Kalita. "Improving the reliability of deep neural networks in NLP: A review." Knowledge-Based Systems 191 (2020): 105210.
3. Ortega, Juan D. S., et al. "Multimodal fusion with deep neural networks for audio-video emotion recognition." arXiv preprint arXiv:1907.03196 (2019).
4. Le, Minh Hung, et al. "Automated diagnosis of prostate cancer in multi-parametric MRI based on multimodal convolutional neural networks." Physics in Medicine & Biology 62.16 (2017): 6497.
5. Giering, Michael, Vivek Venugopalan, and Kishore Reddy. "Multi-modal sensor registration for vehicle perception via deep neural networks." 2015 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2015.
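As referenced under PHASE II, the following is a minimal sketch of one export path toward an embedded, low-SWaP target, assuming TorchScript tracing of the fusion sketch above; INT8 quantization or ONNX export for a specific edge accelerator are equally plausible routes, and the file name is illustrative.

```python
# Export sketch: trace the fused detector to TorchScript so it can run on an
# embedded runtime without the Python training stack.
import torch

model = MultiModalDetector()  # defined in the fusion sketch above
model.eval()
scripted = torch.jit.trace(model, (torch.randn(1, 3, 64, 64),
                                   torch.randn(1, 32)))
scripted.save("multimodal_detector_edge.pt")  # illustrative file name
```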