- Date & Time: Tuesday, March 14, 2023; 1:00 PM
Speaker: Suraj Srinivas, Harvard University
MERL Host: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract
In this talk, I will discuss our recent research on understanding post-hoc interpretability. I will begin by introducing a characterization of post-hoc interpretability methods as local function approximators, and the implications of this viewpoint, including a no-free-lunch theorem for explanations. Next, we shall challenge the assumption that post-hoc explanations provide information about a model's discriminative capabilities p(y|x) and demonstrate that many common methods instead rely on a conditional generative model p(x|y). This observation underscores the importance of being cautious when using such methods in practice. Finally, I will propose to resolve this via regularization of model structure, specifically by training low-curvature neural networks, resulting in improved model robustness and stable gradients.
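As a rough illustration of the kind of regularization mentioned at the end of the abstract, the sketch below penalizes the norm of the input gradient during training, a common proxy for encouraging smoother models with more stable gradients. This is a hedged PyTorch sketch only; the model, data, and penalty weight are placeholders, and the speaker's actual low-curvature regularizer is not reproduced here.

```python
# Illustrative sketch only (PyTorch): an input-gradient-norm penalty as a rough
# proxy for training smoother, more stable models. The specific low-curvature
# regularizer from the talk is not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

def penalized_loss(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)                 # track gradients w.r.t. the input
    task_loss = F.cross_entropy(model(x), y)
    # Differentiable penalty on the input gradient (create_graph keeps the graph).
    (grad_x,) = torch.autograd.grad(task_loss, x, create_graph=True)
    penalty = grad_x.pow(2).flatten(1).sum(dim=1).mean()
    return task_loss + lam * penalty

# Placeholder model and data, for illustration only.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
loss = penalized_loss(model, x, y)
opt.zero_grad(); loss.backward(); opt.step()
```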
-
- Date: December 9, 2022
Where: Pittsburgh, PA
MERL Contact: Jonathan Le Roux
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief - MERL Senior Principal Research Scientist and Speech and Audio Senior Team Leader Jonathan Le Roux was invited by Carnegie Mellon University's Language Technologies Institute (LTI) to give a talk as part of the LTI Colloquium Series, a prestigious series of talks by experts from across the country on different areas of language technologies. Jonathan's talk, entitled "Towards general and flexible audio source separation", presented an overview of techniques developed at MERL towards the goal of robustly and flexibly decomposing and analyzing an acoustic scene. In particular, it described the Speech and Audio Team's efforts to extend MERL's early speech separation and enhancement methods to more challenging environments, and to more general and less supervised scenarios.
-
- Date: December 15, 2022 - December 17, 2022
MERL Contacts: Jianlin Guo; Philip V. Orlik; Kieran Parsons
Research Areas: Artificial Intelligence, Data Analytics, Machine Learning
Brief - The performance of manufacturing systems is heavily affected by downtime – the period during which the system halts production due to system failure, anomalous operation, or intrusion. It is therefore crucial to detect and diagnose anomalies so that predictive maintenance or intrusion detection can reduce downtime. This talk, titled "Anomaly detection and diagnosis in manufacturing systems using autoencoder", focuses on tackling the challenges arising from predictive maintenance in manufacturing systems. It presents a structured autoencoder and a pre-processed autoencoder for accurate anomaly detection, as well as a statistics-based algorithm and an autoencoder-based algorithm for anomaly diagnosis.
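For readers unfamiliar with the general recipe, the sketch below shows reconstruction-error-based anomaly detection with a plain autoencoder: train on normal operating data only, then flag samples whose reconstruction error exceeds a threshold. It is a minimal PyTorch illustration with placeholder data and architecture; the structured and pre-processed autoencoders and the diagnosis algorithms from the talk are not reproduced.

```python
# Minimal sketch: reconstruction-error anomaly detection with an autoencoder.
# Placeholder data and architecture; not the structured/pre-processed variants
# described in the talk.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoder(nn.Module):
    def __init__(self, n_features=16, n_latent=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_features, 8), nn.ReLU(), nn.Linear(8, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, 8), nn.ReLU(), nn.Linear(8, n_features))

    def forward(self, x):
        return self.dec(self.enc(x))

normal_data = torch.randn(1000, 16)               # placeholder for fault-free sensor readings
model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                              # train to reconstruct normal data only
    loss = F.mse_loss(model(normal_data), normal_data)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                             # threshold set from errors on normal data
    err = ((model(normal_data) - normal_data) ** 2).mean(dim=1)
    threshold = err.mean() + 3 * err.std()        # e.g., a 3-sigma rule
    test_sample = 5 * torch.randn(1, 16)          # placeholder test sample
    is_anomaly = ((model(test_sample) - test_sample) ** 2).mean(dim=1) > threshold
```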
-
- Date: December 8, 2022
MERL Contacts: Toshiaki Koike-Akino; Pu (Perry) Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Machine Learning, Signal Processing
Brief - On December 8, 2022, MERL researchers Toshiaki Koike-Akino and Pu (Perry) Wang gave a 3.5-hour tutorial presentation at the IEEE Global Communications Conference (GLOBECOM). The talk, titled "Post-Deep Learning Era: Emerging Quantum Machine Learning for Sensing and Communications," addressed recent trends, challenges, and advances in sensing and communications. P. Wang presented on use cases, industry trends, signal processing, and deep learning for Wi-Fi integrated sensing and communications (ISAC), while T. Koike-Akino discussed the future of deep learning, giving a comprehensive overview of artificial intelligence (AI) technologies, natural computing, emerging quantum AI, and their diverse applications. The tutorial was conducted remotely. MERL's quantum AI technology was partly reported in the recent press release (https://us.mitsubishielectric.com/en/news/releases/global/2022/1202-a/index.html).
The IEEE GLOBECOM is a highly anticipated event for researchers and industry professionals in the field of communications. Organized by the IEEE Communications Society, this flagship conference is known for its focus on driving innovation in all aspects of the field. Each year, over 3,000 scientific researchers submit proposals for program sessions at the annual conference. The theme of this year's conference was "Accelerating the Digital Transformation through Smart Communications," and the event featured a comprehensive technical program with 13 symposia and various tutorials and workshops.
-
- Date: December 2, 2022 - December 8, 2022
MERL Contacts: Matthew Brand; Toshiaki Koike-Akino; Jing Liu; Saviz Mowlavi; Kieran Parsons; Ye Wang
Research Areas: Artificial Intelligence, Control, Dynamical Systems, Machine Learning, Signal Processing
Brief - In addition to 5 papers in recent news (https://www.merl.com/news/news-20221129-1450), MERL researchers presented 2 papers at NeurIPS Conference workshops, which were held Dec. 2-8. NeurIPS is one of the most prestigious and competitive international conferences in machine learning.
- “Optimal control of PDEs using physics-informed neural networks” by Saviz Mowlavi and Saleh Nabi
Physics-informed neural networks (PINNs) have recently become a popular method for solving forward and inverse problems governed by partial differential equations (PDEs). By incorporating the residual of the PDE into the loss function of a neural network-based surrogate model for the unknown state, PINNs can seamlessly blend measurement data with physical constraints. Here, we extend this framework to PDE-constrained optimal control problems, for which the governing PDE is fully known and the goal is to find a control variable that minimizes a desired cost objective. We validate the performance of the PINN framework by comparing it to state-of-the-art adjoint-based optimization, which performs gradient descent on the discretized control variable while satisfying the discretized PDE. A brief illustrative sketch of this idea is given below, after the paper list.
- “Learning with noisy labels using low-dimensional model trajectory” by Vasu Singla, Shuchin Aeron, Toshiaki Koike-Akino, Matthew E. Brand, Kieran Parsons, Ye Wang
Noisy annotations in real-world datasets pose a challenge for training deep neural networks (DNNs), detrimentally impacting generalization performance as incorrect labels may be memorized. In this work, we examine the observations that early stopping and low-dimensional subspace learning can help address this issue. First, we show that a prior method is sensitive to the early-stopping hyper-parameter. Second, we investigate the effectiveness of PCA for approximating the optimization trajectory under noisy label information. We propose to estimate the low-rank subspace through robust and structured variants of PCA, namely Robust PCA and Sparse PCA. We find that the subspace estimated through these variants can be less sensitive to early stopping and can outperform PCA, achieving better test error when trained on noisy labels.
- In addition, new MERL researcher Jing Liu presented a paper entitled “CoPur: Certifiably Robust Collaborative Inference via Feature Purification”, based on work he completed before joining MERL. The paper was selected as a spotlight paper, highlighted in the lightning talks and the featured paper panel.
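Returning to the first paper above on PINN-based optimal control, the following is a minimal, self-contained sketch of the idea: one network approximates the PDE state, another the control, and both are trained jointly by penalizing the PDE residual, the boundary conditions, and a tracking cost. The 1D Poisson problem, target state, and penalty weights below are toy choices for illustration, not the setup used in the paper.

```python
# Toy PINN sketch for PDE-constrained optimal control (not the paper's code):
# find a control c(x) and state u(x) with u'' = c on (0, 1), u(0) = u(1) = 0,
# minimizing a tracking cost against a desired state u_d(x) = sin(pi * x).
import torch
import torch.nn as nn

def mlp():
    return nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))

u_net, c_net = mlp(), mlp()                       # state and control networks
opt = torch.optim.Adam(list(u_net.parameters()) + list(c_net.parameters()), lr=1e-3)
u_d = lambda x: torch.sin(torch.pi * x)           # desired state (toy choice)
xb = torch.tensor([[0.0], [1.0]])                 # boundary points

for step in range(2000):
    x = torch.rand(128, 1, requires_grad=True)    # collocation points in (0, 1)
    u = u_net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = d2u - c_net(x)                     # residual of the PDE u'' = c
    loss = (
        ((u - u_d(x)) ** 2).mean()                # tracking cost
        + 10.0 * (residual ** 2).mean()           # PDE residual penalty
        + 10.0 * (u_net(xb) ** 2).mean()          # boundary conditions u(0) = u(1) = 0
    )
    opt.zero_grad(); loss.backward(); opt.step()
```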
-
- Date: December 2, 2022
MERL Contacts: Toshiaki Koike-Akino; Kieran Parsons; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Computational Sensing, Machine Learning, Signal Processing, Human-Computer Interaction
Brief - Mitsubishi Electric Corporation announced its development of a quantum artificial intelligence (AI) technology that automatically optimizes inference models to downsize the scale of computation with quantum neural networks. The new quantum AI technology can be integrated with classical machine learning frameworks for diverse solutions.
Mitsubishi Electric has confirmed that the technology can be incorporated in the world's first applications for terahertz (THz) imaging, Wi-Fi indoor monitoring, compressed sensing, and brain-computer interfaces. The technology is based on recent research by MERL's Connectivity & Information Processing team and Computational Sensing team.
Mitsubishi Electric's new quantum machine learning (QML) technology realizes compact inference models by exploiting the capacity of quantum computers to represent a state space that grows exponentially with the number of quantum bits (qubits). By combining quantum and classical AI in a hybrid manner, the technology can compensate for the limitations of classical AI, achieving superior performance while significantly downsizing AI models, even when training data are limited.
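To make the hybrid quantum-classical idea concrete, the sketch below embeds a small variational quantum circuit between classical layers. It assumes PennyLane with its PyTorch interface as the toolchain, and the circuit, sizes, and data are placeholders; this is only a generic illustration of hybrid QML, not the technology announced in this release.

```python
# Generic hybrid quantum-classical sketch (PennyLane + PyTorch), illustrating a
# small quantum neural network embedded in a classical model. This is not the
# announced Mitsubishi Electric technology; sizes and data are placeholders.
import pennylane as qml
import torch
import torch.nn as nn

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))         # encode classical features
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))  # trainable variational layers
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

quantum_layer = qml.qnn.TorchLayer(circuit, weight_shapes={"weights": (n_layers, n_qubits)})

# Classical layers before and after the quantum layer form the hybrid model.
model = nn.Sequential(nn.Linear(8, n_qubits), nn.Tanh(), quantum_layer, nn.Linear(n_qubits, 2))
x = torch.randn(16, 8)                                        # placeholder input batch
logits = model(x)                                             # standard forward/backward training applies
```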
-
- Date & Time: Monday, December 12, 2022; 1:00pm-5:30pm ET
Location: Mitsubishi Electric Research Laboratories (MERL)/Virtual
Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio, Digital Video
Brief - Join MERL's virtual open house on December 12th, 2022! The event features a keynote, live sessions, research area booths, and opportunities to interact with our research team. Discover who we are and what we do, and learn about internship and employment opportunities.
-
- Date: November 29, 2022 - December 9, 2022
Where: NeurIPS 2022
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - MERL researchers are presenting 5 papers at the NeurIPS Conference, which will be held in New Orleans from Nov. 29 to Dec. 1, with virtual presentations in the following week. NeurIPS is one of the most prestigious and competitive international conferences in machine learning.
MERL papers in NeurIPS 2022:
1. “AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments” by Sudipta Paul, Amit Roy-Chowdhury, and Anoop Cherian
This work proposes a unified multimodal task for audio-visual embodied navigation where the navigating agent can also interact and seek help from a human/oracle in natural language when it is uncertain of its navigation actions. We propose a multimodal deep hierarchical reinforcement learning framework for solving this challenging task that allows the agent to learn when to seek help and how to use the language instructions. AVLEN agents can interact anywhere in the 3D navigation space and demonstrate state-of-the-art performance when the audio-goal is sporadic or when distractor sounds are present.
2. “Learning Partial Equivariances From Data” by David W. Romero and Suhas Lohit
Group equivariance serves as a good prior improving data efficiency and generalization for deep neural networks, especially in settings with data or memory constraints. However, if the symmetry groups are misspecified, equivariance can be overly restrictive and lead to bad performance. This paper shows how to build partial group convolutional neural networks that learn to adapt the equivariance levels at each layer that are suitable for the task at hand directly from data. This improves performance while retaining equivariance properties approximately.
3. “Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation” by Moitreya Chatterjee, Narendra Ahuja, and Anoop Cherian
There often exist strong correlations between the 3D motion dynamics of a sounding source and the sound it produces, especially when the source is moving towards or away from the microphone. In this paper, we propose an audio-visual scene graph that learns and leverages such correlations for improved visually-guided audio separation from an audio mixture, while also allowing prediction of the sound source's direction of motion.
4. “What Makes a "Good" Data Augmentation in Knowledge Distillation - A Statistical Perspective” by Huan Wang, Suhas Lohit, Michael Jones, and Yun Fu
This paper presents theoretical and practical results for understanding what makes a particular data augmentation technique (DA) suitable for knowledge distillation (KD). We design a simple metric that works very well in practice to predict the effectiveness of DA for KD. Based on this metric, we also propose a new data augmentation technique that outperforms other methods for knowledge distillation in image recognition networks.
5. “FeLMi : Few shot Learning with hard Mixup” by Aniket Roy, Anshul Shah, Ketul Shah, Prithviraj Dhar, Anoop Cherian, and Rama Chellappa
Learning from only a few examples is a fundamental challenge in machine learning. Recent approaches show benefits by learning a feature extractor on the abundant and labeled base examples and transferring it to the fewer novel examples. However, the latter stage is often prone to overfitting due to the small size of few-shot datasets. In this paper, we propose a novel uncertainty-based criterion to synthetically produce “hard” and useful data by mixing up real data samples. Our approach leads to state-of-the-art results on various computer vision few-shot benchmarks.
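For the last paper above, the sketch below gives a loose illustration of the "hard mixup" idea: generate mixed samples and keep those the model is most uncertain about. The uncertainty measure (prediction entropy), the keep ratio, and the label mixing are illustrative assumptions and may differ from the criterion actually proposed in FeLMi.

```python
# Loose illustration of "hard" mixup (PyTorch): mix pairs of samples, then keep
# the mixes the model is most uncertain about. The actual FeLMi criterion and
# few-shot pipeline may differ; entropy and keep_ratio are illustrative choices.
import torch
import torch.nn.functional as F

def hard_mixup_batch(model, x, y, num_classes, alpha=2.0, keep_ratio=0.5):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]                            # mixed inputs
    y_mix = (lam * F.one_hot(y, num_classes).float()
             + (1 - lam) * F.one_hot(y[perm], num_classes).float())  # mixed soft labels
    with torch.no_grad():
        probs = F.softmax(model(x_mix), dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)  # prediction uncertainty
    k = max(1, int(keep_ratio * x.size(0)))
    hard_idx = entropy.topk(k).indices                               # keep the most uncertain mixes
    return x_mix[hard_idx], y_mix[hard_idx]
```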
-
- Date & Time: Tuesday, November 1, 2022; 1:00 PM
Speaker: Jiajun Wu, Stanford University
MERL Host: Anoop Cherian
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract
The visual world has its inherent structure: scenes are made of multiple identical objects; different objects may have the same color or material, with a regular layout; each object can be symmetric and have repetitive parts. How can we infer, represent, and use such structure from raw data, without hampering the expressiveness of neural networks? In this talk, I will demonstrate that such structure, or code, can be learned from natural supervision. Here, natural supervision can be from pixels, where neuro-symbolic methods automatically discover repetitive parts and objects for scene synthesis. It can also be from objects, where humans during fabrication introduce priors that can be leveraged by machines to infer regular intrinsics such as texture and material. When solving these problems, structured representations and neural nets play complementary roles: it is more data-efficient to learn with structured representations, and they generalize better to new scenarios with robustly captured high-level information; neural nets effectively extract complex, low-level features from cluttered and noisy visual data.
-
- Date: October 20, 2022
Where: University Park, PA
MERL Contact: Devesh K. Jha
Research Areas: Artificial Intelligence, Control, Robotics
Brief - Devesh Jha, a Principal Research Scientist in the Data Analytics Group at MERL, delivered an invited talk at the Penn State Seminar Series on Systems, Control and Robotics. The talk presented some of the recent work done at MERL in the areas of optimization and control for robotic manipulation in unstructured environments.
-
- Date: May 28, 2023 - June 1, 2023
Where: Rome, Italy
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Machine Learning, Signal Processing
Brief - Kyeong Jin Kim, a Senior Principal Research Scientist in the Connectivity & Information Processing Team, is organizing the second international workshop at the 2023 IEEE International Conference on Communications (ICC). The workshop, titled "Industrial Private 5G-and-beyond Wireless Networks," aims to bring together researchers for technical discussions of fundamental and practically relevant questions surrounding the many emerging challenges in industrial private wireless networks. The workshop is co-organized with researchers from industry and academia, including Huawei Technologies, the University of South Florida, Aalborg University, Jinan University, and the South China University of Technology. IEEE ICC is one of the IEEE Communications Society's two flagship conferences.
-
- Date: Thursday, October 6, 2022
Location: Kendall Square, Cambridge, MA
MERL Contacts: Anoop Cherian; Jonathan Le Roux
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - SANE 2022, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, was held on Thursday October 6, 2022 in Kendall Square, Cambridge, MA.
It was the 9th edition in the SANE series of workshops, which started in 2012 and was held every year alternately in Boston and New York until 2019. Since the first edition, the audience has steadily grown, reaching a record 200 participants and 45 posters in 2019. After a 2-year hiatus due to the pandemic, SANE returned with an in-person gathering of 140 students and researchers.
SANE 2022 featured invited talks by seven leading researchers from the Northeast: Rupal Patel (Northeastern/VocaliD), Wei-Ning Hsu (Meta FAIR), Scott Wisdom (Google), Tara Sainath (Google), Shinji Watanabe (CMU), Anoop Cherian (MERL), and Chuang Gan (UMass Amherst/MIT-IBM Watson AI Lab). It also featured a lively poster session with 29 posters.
SANE 2022 was co-organized by Jonathan Le Roux (MERL), Arnab Ghoshal (Apple), John Hershey (Google), and Shinji Watanabe (CMU). SANE remained a free event thanks to generous sponsorship by Bose, Google, MERL, and Microsoft.
Slides and videos of the talks will be released on the SANE workshop website.
-
- Date: September 21, 2022
MERL Contacts: Philip V. Orlik; Anthony Vetro
Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio
Brief - Mitsubishi Electric Research Laboratories (MERL) invites qualified postdoctoral candidates to apply for the position of Postdoctoral Research Fellow. This position provides early-career scientists the opportunity to work at a unique, academically oriented industrial research laboratory. Successful candidates will be expected to define and pursue their own original research agenda, explore connections to established laboratory initiatives, and publish high-impact articles in leading venues. Please refer to our web page for further details.
-
- Date & Time: Tuesday, September 6, 2022; 12:00 PM EDT
Speaker: Chuang Gan, UMass Amherst & MIT-IBM Watson AI Lab
MERL Host: Jonathan Le Roux
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Abstract
Human sensory perception of the physical world is rich and multimodal and can flexibly integrate input from all five sensory modalities -- vision, touch, smell, hearing, and taste. However, in AI, attention has primarily focused on visual perception. In this talk, I will introduce my efforts in connecting vision with sound, which will allow machine perception systems to see objects and infer physics from multi-sensory data. In the first part of my talk, I will introduce a self-supervised approach that can learn to parse images and separate sound sources by watching and listening to unlabeled videos, without requiring additional manual supervision. In the second part of my talk, I will show how we may further infer the underlying causal structure in 3D environments through visual and auditory observations. This enables agents to seek the source of a repeating environmental sound (e.g., an alarm) or identify what object has fallen, and where, from an intermittent impact sound.
-
- Date: August 22, 2022
MERL Contacts: Chiori Hori; Jonathan Le Roux; Anthony Vetro
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief - IEEE has announced that the recipient of the 2023 IEEE James L. Flanagan Speech and Audio Processing Award will be Prof. Alex Waibel (CMU/Karlsruhe Institute of Technology), “For pioneering contributions to spoken language translation and supporting technologies.” Mitsubishi Electric Research Laboratories (MERL), which became the new sponsor of this prestigious award in 2022, extends its warmest congratulations to Prof. Waibel.
MERL Senior Principal Research Scientist Dr. Chiori Hori, who worked with Dr. Waibel at Carnegie Mellon University and collaborated with him as part of national projects on speech summarization and translation, comments on his invaluable contributions to the field: “He has contributed not only to the invention of groundbreaking technology in speech and spoken language processing but also to the promotion of an abundance of research projects through international research consortiums by linking American, European, and Asian research communities. Many of his former laboratory members and collaborators are now leading R&D in the AI field.”
The IEEE Board of Directors established the IEEE James L. Flanagan Speech and Audio Processing Award in 2002 for outstanding contributions to the advancement of speech and/or audio signal processing. This award has recognized the contributions of some of the most renowned pioneers and leaders in their respective fields. MERL is proud to support the recognition of outstanding contributions to the field of speech and audio processing through its sponsorship of this award.
-
- Date: July 14, 2022
Awarded to: Weidong Cao, Mouhacine Benosman, Xuan Zhang, and Rui Ma
Research Area: Artificial Intelligence
Brief - The committee of the 59th Design Automation Conference (DAC) has chosen MERL's paper entitled 'Domain Knowledge-Infused Deep Learning for Automated Analog/RF Circuit Parameter Optimization' as a DAC Best Paper Award nominee. The committee evaluated both the manuscript and the submitted presentation recording, and selected MERL's paper as one of six nominees for this prestigious award. Decisions were based on the submissions' innovation, impact, and exposition.
-
- Date: June 15, 2022
Awarded to: Yuxiang Sun, Mouhacine Benosman, and Rui Ma
Research Area: Artificial Intelligence
Brief - The committee of the International Conference on Artificial Intelligence Circuits and Systems (AICAS) 2022 has selected MERL's paper entitled 'GaN Distributed RF Power Amplifier Automation Design with Deep Reinforcement Learning' as a winner of the AICAS 2022 Openedges Award.
In this paper, MERL researchers propose a novel design automation methodology based on deep reinforcement learning (RL) for wide-band non-uniform distributed RF power amplifiers, which are known for their high-dimensional design challenges.
-
- Date: May 23, 2022 - May 27, 2022
Where: International Conference on Robotics and Automation (ICRA)
MERL Contacts: Ankush Chakrabarty; Stefano Di Cairano; Siddarth Jain; Devesh K. Jha; Pedro Miraldo; Daniel N. Nikovski; Arvind Raghunathan; Diego Romeres; Abraham P. Vinod; Yebin Wang
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief - MERL researchers presented 5 papers at the IEEE International Conference on Robotics and Automation (ICRA), held in Philadelphia from May 23-27, 2022. The papers covered a broad range of topics, including manipulation, tactile sensing, planning, and multi-agent control. An invited talk was presented in the "Workshop on Collaborative Robots and Work of the Future," covering some of the work done by MERL researchers on collaborative robotic assembly. The workshop was co-organized by MERL, Mitsubishi Electric Automation's North America Development Center (NADC), and MIT.
-
- Date: May 22, 2022 - May 27, 2022
Where: Singapore
MERL Contacts: Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Tim K. Marks; Philip V. Orlik; Kuan-Chuan Peng; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
Brief - MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.
Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.
ICASSP is the flagship conference of the IEEE Signal Processing Society and the world's largest and most comprehensive technical conference focused on research advances and the latest technological developments in signal and information processing. The event attracts more than 2,000 participants each year.
-
- Date: May 16, 2022 - May 20, 2022
Where: Seoul, Korea
MERL Contacts: Jianlin Guo; Toshiaki Koike-Akino; Philip V. Orlik; Kieran Parsons; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Machine Learning, Signal Processing
Brief - MERL Connectivity & Information Processing Team scientists remotely presented 5 papers at the IEEE International Conference on Communications (ICC) 2022, held in Seoul, Korea, on May 16-20, 2022. Topics presented included recent advancements in communications technologies, deep learning methods, and quantum machine learning (QML). Presentation videos can also be found on our YouTube channel. In addition, K. J. Kim organized the "Industrial Private 5G-and-beyond Wireless Networks Workshop" at the conference.
IEEE ICC is one of the IEEE Communications Society's two flagship conferences (ICC and GLOBECOM). Each year, close to 2,000 attendees from over 70 countries attend IEEE ICC to take advantage of a program that consists of exciting keynote sessions, robust technical paper sessions, innovative tutorials and workshops, and engaging industry sessions. This 5-day event is known for bringing together audiences from industry and academia to learn about the latest research and innovations in communications and networking technology, share ideas and best practices, and collaborate on future projects.
-
- Date: April 1, 2022
Where: INFORMS Journal on Computing (https://pubsonline.informs.org/journal/ijoc)
MERL Contact: Arvind Raghunathan
Research Areas: Artificial Intelligence, Machine Learning, Optimization
Brief - Arvind Raghunathan co-authored a publication titled "JANOS: An Integrated Predictive and Prescriptive Modeling Framework," which has been chosen as a Featured Article in the current issue of the INFORMS Journal on Computing. The article was co-authored with Prof. David Bergman, a MERL collaborator, and Teng Huang, a former MERL intern, among others.
The paper describes a new software tool, JANOS, that integrates predictive modeling and discrete optimization to assist decision making. Specifically, the solver takes user-specified pretrained predictive models as input and embeds them directly within an optimization model through linear transformations, so that the optimization is formulated over the models' predictions.
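The snippet below is a minimal sketch of the underlying idea only (it does not use JANOS or reproduce its API): a linear predictive model is trained with scikit-learn and then embedded, via its coefficients, as a linear expression inside a PuLP optimization model. The price/promotion/demand setup is a made-up example.

```python
# Sketch of the predictive-plus-prescriptive idea (not JANOS's API): embed a
# pretrained linear model inside an optimization model via its coefficients.
# The price/promotion/demand setup below is synthetic and purely illustrative.
import numpy as np
import pulp
from sklearn.linear_model import LinearRegression

# 1) Pretrained predictive model: demand as a function of the decision variables.
X = np.random.rand(200, 2)                                   # columns: price, promotion budget
demand = 5 - 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * np.random.randn(200)
reg = LinearRegression().fit(X, demand)

# 2) Optimization model with the prediction embedded as a linear expression.
prob = pulp.LpProblem("prescriptive_demo", pulp.LpMaximize)
price = pulp.LpVariable("price", lowBound=0, upBound=1)
promo = pulp.LpVariable("promo", lowBound=0, upBound=1)
pred_demand = (float(reg.intercept_)
               + float(reg.coef_[0]) * price
               + float(reg.coef_[1]) * promo)                # prediction as a linear expression
prob += pred_demand - 0.5 * promo                            # objective: demand minus promotion cost
prob += price >= 0.3                                         # example business constraint
prob.solve()
print(price.value(), promo.value())
```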
-
- Date: May 4, 2022
MERL Contact: Toshiaki Koike-Akino
Research Areas: Artificial Intelligence, Communications, Electronic and Photonic Devices, Machine Learning, Optimization, Signal Processing
Brief - Toshiaki Koike-Akino gave an invited lecture on advanced photonic devices at the United States Patent and Trademark Office (USPTO) Technology Fair on May 4, 2022. Topics of the lecture included recent progress in applied artificial intelligence (AI) technologies for optical systems, nano-photonic devices, and quantum technology. During the 2-hour interactive online presentation, he lectured to more than 200 patent examiner participants.
USPTO Tech Fair Organizer mentioned:
"Thank you very much for representing Advanced Photonic Devices at this year’s Technology Center 2800 Virtual Tech Fair held May 4th, 2022. Tech Fair is an important part of the United States Patent and Trademark Office’s Patent Examiner Technical Training Program (PETTP). Having a scientifically well-trained examiner workforce and ensuring the quality, consistency, and reliability of issued patents are top priorities at the USPTO. The PETTP is designed to achieve those priorities by giving examiners direct access to technical experts who are willing to share their knowledge about prior art and industry standards for both emerging and established technologies. Experts like yourself help to maintain our high quality of patent examination by keeping examiners updated on technologies and innovations pertinent to their field of examination.
We very much appreciate your efforts, time, and contributions."
-
- Date & Time: Wednesday, March 30, 2022; 11:00 AM EDT
Speaker: Vincent Sitzmann, MIT
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract
Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day parallel people’s ability to understand, navigate, and interact with their surroundings. This poses a unique set of challenges that sets neural scene representations apart from conventional representations of 3D scenes: Rendering and processing operations need to be differentiable, and the type of information they encode is unknown a priori, requiring them to be extraordinarily flexible. At the same time, training them without ground-truth 3D supervision is an underdetermined problem, highlighting the need for structure and inductive biases without which models converge to spurious explanations.
I will demonstrate how we can equip neural networks with inductive biases that enable them to learn 3D geometry, appearance, and even semantic information, self-supervised only from posed images. I will show how this approach unlocks the learning of priors, enabling 3D reconstruction from only a single posed 2D image, and how we may extend these representations to other modalities such as sound. I will then discuss recent work on learning the neural rendering operator to make rendering and training fast, and how this speed-up enables us to learn object-centric neural scene representations, learning to decompose 3D scenes into objects, given only images. Finally, I will talk about a recent application of self-supervised scene representation learning in robotic manipulation, where it enables us to learn to manipulate classes of objects in unseen poses from only a handful of human demonstrations.
-
- Date: March 1, 2022
Where: Online/Zoom
MERL Contact: Devesh K. Jha
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief - Devesh Jha, a Principal Research Scientist in MERL's Data Analytics group, gave an invited talk at the Mechanical and Aerospace Engineering Department, NYU. The title of the talk was "Robotic Manipulation in the Wild: Planning, Learning and Control through Contacts". The talk presented some of the recent work done at MERL for robotic manipulation in unstructured environments in the presence of significant uncertainty.
-
- Date: March 1, 2022
MERL Contacts: Anoop Cherian; Chiori Hori; Jonathan Le Roux; Tim K. Marks; Anthony Vetro
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief - MERL's research on scene-aware interaction was recently featured in an IEEE Spectrum article. The article, titled "At Last, A Self-Driving Car That Can Explain Itself" and authored by MERL Senior Principal Research Scientist Chiori Hori and MERL Director Anthony Vetro, gives an overview of MERL's efforts towards developing a system that can analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.
Scene-Aware Interaction for car navigation, one target application that the article focuses on, will provide drivers with intuitive route guidance. Scene-Aware Interaction technology is expected to have wide applicability, including human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. MERL's Scene-Aware Interaction Technology had previously been featured in a Mitsubishi Electric Corporation Press Release.
IEEE Spectrum is the flagship magazine and website of the IEEE, the world’s largest professional organization devoted to engineering and the applied sciences. IEEE Spectrum has a circulation of over 400,000 engineers worldwide, making it one of the leading science and engineering magazines.
-