Distinguished Lecture

Prior, Context and Interactive Computer Vision

March 05, 2007

Established in November 1998, Microsoft Research Asia (MSR Asia) was named "the world's hottest computer lab" by MIT Technology Review (June 2004). Today, MSR Asia employs 300 computer scientists and software engineers, plus 350 interns from China and around the world. In a short period of eight years, MSR Asia has published 1500 high-quality papers in top international conferences and journals. More than 200 technologies from the lab have been incorporated into Microsoft products. MSR Asia is now the most desirable workplace for the best computer science and engineering students in China.

In this talk, I will give an overview of our activities in basic research, including user interfaces, digital media, digital entertainment, systems and networking, web search and data mining, and theoretical computer science. I will introduce our best practices for successfully transferring technologies into Microsoft products despite MSR Asia being thousands of miles away from the product teams in Redmond. I will also discuss how we build trusted relationships with universities and governments in China and across Asia. Finally, I will share some of our secrets to success and the lessons we have learned.

In the second part of my talk, I will present "Prior, Context and Interactive Computer Vision." For many years, computer vision researchers have worked hard chasing elusive goals such as "Can the robot find a boy in the scene?" or "Can your vision system automatically segment the cat from the background?" These tasks require a great deal of prior knowledge and contextual information. Incorporating that prior knowledge and contextual information into vision systems, however, is very challenging. I propose that many difficult vision tasks can only be solved with interactive vision systems that combine powerful, real-time vision techniques with intuitive and clever user interfaces. I will show two interactive vision systems we developed recently, Lazy Snapping (SIGGRAPH 2004) and Image Completion (SIGGRAPH 2005). Lazy Snapping cuts out an object with a solid boundary using graph cut, while Image Completion recovers unknown regions with belief propagation. A key element in designing such interactive systems is how we model the user's intention using the conditional probability (context) and likelihood associated with user interactions. Given how ill-posed most image understanding problems are, I am convinced that interactive computer vision is the paradigm on which today's vision research should focus.

Presenter Bio

Harry Shum, Microsoft Research Asia

Harry Shum brings his extensive research skills, excellent management capabilities and outstanding academic background to Microsoft Research Asia, Microsoft Corporation's basic research arm in Asia. As the Managing Director of Microsoft Research Asia, Shum oversees research activities and collaborations with universities in the Asia Pacific region. Shum is also a Distinguished Engineer at Microsoft Corporation. A Fellow of the Association for Computing Machinery (ACM) and a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), Shum is on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and the International Journal of Computer Vision (IJCV). He served as a general co-chairman of the Tenth IEEE International Conference on Computer Vision (ICCV 2005 Beijing). Shum has published more than 100 papers on computer vision, computer graphics, pattern recognition, statistical learning and robotics, and has received more than 20 U.S. patents. Shum is a co-author of the book Image-Based Rendering, published by Springer in 2006. Shum received a doctorate in robotics from the School of Computer Science at Carnegie Mellon University.
