Microsoft Research

Microsoft challenge: Build a collaborative AI in Minecraft

Tue, 14 Mar 2017 14:00:05 +0000

By John Roach, Writer, Microsoft Research

A Microsoft Research team has created a competition using Project Malmo, a platform that uses Minecraft as a testing ground for advanced artificial intelligence research. The team is challenging PhD students to develop an AI that learns to collaborate with other randomly assigned players to achieve a high score in a mini-game within the virtual world.

The field of collaborative AI research involves the development of technologies that work with, empower and augment human capabilities. The ability to collaborate is key to the development of a general AI that can mimic the nuanced and complex way humans learn and make decisions. Such an AI would represent an evolution from task-specific AI technologies that recognize speech, translate languages and caption images.

That would allow researchers to develop technology that can comprehend the intent of others, develop a shared problem-solving strategy and coordinate activity to efficiently accomplish a common task. While these problems remain unsolved, recent progress in AI research provides a foundation to begin tackling them, notes the Microsoft team hosting the Malmo Collaborative AI Challenge.

The challenge is open to PhD students worldwide. After registration, teams of one to three members are supplied a task that consists of one or more mini-games. The goal is to develop an AI solution that learns how to work with other, randomly assigned players to achieve a high score in the game.

Participants submit their solutions to GitHub. Each submission includes a one-minute video that shows off the AI agent and summarizes what is interesting about the approach, all of the team’s code, and a README file that explains the selected approach, the design decisions and how to run the code.

The solutions will be evaluated on originality, performance, code quality and GitHub stars, which are a measure of popularity. Three winning teams will receive Azure research grants of US $20,000. Members of three winning teams from the European Economic Area and Switzerland are also eligible to win spots at the Microsoft Research AI Summer School.

John Roach writes about Microsoft research and innovation. Follow him on Twitter.

The post Microsoft challenge: Build a collaborative AI in Minecraft appeared first on Microsoft Research.

Democratizing AI to improve citizen health

Mon, 13 Mar 2017 16:00:35 +0000

By Kenji Takeda, Director, Azure for Research

Doctors make life-saving — and life-changing — decisions every day. But how do they know that they are making the best decisions? Can artificial intelligence (AI) help?

“Before evidence-based medicine, decision-making in health care was heavily reliant on the expertise and knowledge of the health professional, usually a doctor. What has happened in the last 20, 30 years is that the health care literature has exploded way beyond what any individual doctor could hope to keep up with,” explained David Tovey, editor-in-chief at Cochrane. Cochrane is a not-for-profit organization that creates, publishes and maintains systematic reviews of health care interventions, with more than 37,000 contributors working in 130 countries.

The goal is to analyze the latest medical research to find the best treatments and interventions for patients and citizens. Systematic reviews bring together the best available research evidence from individual clinical trials and study data from around the world to inform the development of guidelines, individual practice decisions and national-scale health policymaking. These are rigorous and robust studies needed to provide the best advice, but they can take more than a year or two to complete. Tovey is concerned that systematic reviews, which are a painstaking process, are not used as widely as necessary. Said Tovey, “This is one of the main limitations for using systematic reviews to guide decision-making, particularly in policy areas. Policymakers increasingly want their decisions to be made in a period of weeks or months.”

Professor James Thomas at University College London (UCL) first started exploring text mining a decade ago because it promised to automate the laborious process of identifying and classifying medical research reports. In 2006, the state of the art in text mining was to develop complex rules to try to analyze these reports in an intelligent way.

But these rules-based approaches failed.

The Cochrane Transform Project is now applying AI and machine learning to analyze thousands of reports and automatically select the ones to include in systematic reviews. This new approach is successfully saving weeks of monotonous work, freeing up the expert health care reviewers to spend their time and energy on high-level analysis. Thomas is using Cortana Intelligence to quickly develop and deploy AI solutions in the cloud. He explained, “What makes this particularly efficient is the fact that I can build a classifier using the studio and then just deploy it as a web service with the click of a button, without deploying a server.” The flexibility and open nature of Azure Machine Learning also mean his team can adapt its approach. Said Thomas, “Initially I developed these models in R, which didn’t work as well as Python. So I was able to keep the top and the bottom of the process and just swap out the machine learning section in the middle.”

The team is continuously improving the performance of the machine learning system, and engaging with more than 5,000 citizen scientists in 117 countries through the Cochrane Crowd platform. “Cochrane Crowd has really enabled anybody who wants to get involved with producing health evidence to do so,” said Anna Noel-Storr of Cochrane’s Dementia Group. The Cochrane Crowd participants have screened more than 300,000 reports and 30,000 clinical trials to create a dataset to help train Thomas’s AI. The machine learning system has improved the efficiency of the crowd by more than 60 percent. Thomas is excited by how this benefits everyone. “So what we end up with is a virtuous circle whereby the AI makes the crowd more efficient, and then the crowd, by supplying more data, makes the AI more accurate.”
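The division of labor described above, in which a classifier handles the clear-cut reports and the crowd handles the rest, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny Naive Bayes model, the `NaiveBayesScreener` and `triage` names, the thresholds and the sample report titles are hypothetical, and none of it is the Cochrane team’s actual model or data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real system would do far more."""
    return text.lower().split()


class NaiveBayesScreener:
    """Toy bag-of-words Naive Bayes model (hypothetical stand-in for a real classifier)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocabulary = set()

    def add_example(self, text, is_trial):
        """Learn from one labeled report, e.g. a label supplied by the crowd."""
        words = tokenize(text)
        self.word_counts[is_trial].update(words)
        self.doc_counts[is_trial] += 1
        self.vocabulary.update(words)

    def probability_trial(self, text):
        """Estimated probability that a report describes a clinical trial."""
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in (True, False):
            # Add-one smoothing keeps unseen words from producing log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + len(self.vocabulary) + 1
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denominator)
            log_score[label] = score
        # Normalize the two log scores into a probability.
        top = max(log_score.values())
        trial = math.exp(log_score[True] - top)
        other = math.exp(log_score[False] - top)
        return trial / (trial + other)


def triage(model, reports, low=0.2, high=0.8):
    """Decide confident cases automatically; send uncertain ones to the crowd."""
    auto_include, auto_exclude, needs_crowd = [], [], []
    for report in reports:
        p = model.probability_trial(report)
        if p >= high:
            auto_include.append(report)
        elif p <= low:
            auto_exclude.append(report)
        else:
            needs_crowd.append(report)
    return auto_include, auto_exclude, needs_crowd


# Hypothetical seed labels, standing in for reports the crowd has already screened.
model = NaiveBayesScreener()
for title, is_trial in [
    ("randomized controlled trial of aspirin", True),
    ("double blind randomized trial of statins", True),
    ("editorial on hospital funding", False),
    ("letter about hospital staffing policy", False),
]:
    model.add_example(title, is_trial)

included, excluded, for_crowd = triage(
    model,
    ["randomized trial of insulin", "editorial on hospital policy", "trial policy"],
)
print("auto-include:", included)
print("auto-exclude:", excluded)
print("send to crowd:", for_crowd)
```

Labels that come back from the crowd can be fed in through `add_example`, which is the virtuous circle Thomas describes: more labels should make the scores more reliable, so fewer reports need human screening.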

These services are deployed through cloud APIs to help clinical assessment groups look at the latest medical research in different specialties, from cardiology and dementia, to public health issues such as obesity, healthy eating and physical exercise. The flexibility of the Microsoft Cloud means that the researchers can quickly create customized systems tailored for each clinical assessment group. This is then made usable by researchers worldwide through the EPPI-Reviewer cloud service, which is now running on Microsoft Azure. The AI system augments reviewers’ capabilities, allowing them to focus on understanding data instead of the monotonous work of sifting through thousands of research studies by hand.

The system is now being used by the National Institute for Health and Care Excellence in the U.K., which provides health care delivery guidance across the U.K. National Health Service (NHS) for more than 65 million people. It will directly speed up adoption of new health care interventions at national and international scale, for example, by facilitating the clearing of new drugs for widespread use across the NHS. The result is that decisions about which interventions are best will be faster and more reliable, thanks to the intelligent cloud.

Noel-Storr is excited by the prospect of democratizing AI for health, and how it can augment what people can achieve. “I think what excites me most about this work is that it is about exploring where those boundaries lie between what the machine can do and what the human can do…. so that we can then better direct human effort where it’s most needed.”

A new understanding of the world through grassroots Data Science education at UC Berkeley

Thu, 09 Mar 2017 17:00:03 +0000

By Vani Mandava, Director, Data Science, Microsoft Research

Students at UC Berkeley Foundations of Data Science Program

While some may regard data science as an easy passport to a job for the tech savvy, Luis Macias has different ideas. The fourth-year undergraduate student, who is majoring in American Studies at University of California, Berkeley (UC Berkeley), wants to turn the hype of data science into hope for low-income communities like the one he grew up in.

Luis was among the first students to take UC Berkeley’s innovative Foundations of Data Science, an introductory data science course designed for freshman and sophomore students of all majors. The course is a key component of a multi-year university effort to forge a broader, more diverse, and inclusive scope for the emerging discipline of data science.

Recalling an assignment to gauge how water consumption data might relate to socioeconomic conditions, Luis explained how having the power to get and analyze data about his own ZIP Code ignited an understanding that data science can yield new insights capable of solving some of society’s most complex problems.

“The income level was around $25K, a number that was powerful for me in particular, because I think it explained a lot of the social issues and problems my community had,” he said.

Berkeley’s Data Science Education Program aims to make data science an integral feature of a liberal education and a core interdisciplinary capacity available to all Berkeley undergraduates. This is a bold experiment that will equip thousands of Berkeley students across campus with a fundamental education in data-driven thinking empowered by advanced statistical and computational techniques.

“The team of educators see their role as making it possible for students to bring to bear data science in all the ways they wish to use it in the world,” notes History professor Cathryn Carson. Carson is one of the faculty members leading the effort to build a diverse curriculum that includes advanced classes as well as connector courses that provide a bridge between familiar academic subjects and newly available data science techniques. “The energy and enthusiasm of students in the courses clearly demonstrate that the initiative will put data science to work in a breadth of domains that serve society, and UC Berkeley will play a particularly powerful role as a public university in this new data rich era.”

This year, the program is enabling more than a thousand students across 56 different undergraduate majors to learn critical computational and analytical skills demanded by the projected half million jobs in data science by 2018. That’s a lot of potentially unfilled jobs, an opportunity highlighted in numerous media accounts over the past couple of years. In 2015, Forbes wrote about the urgent need for qualified data science workers who bring different skills, expertise, and experiences to the discipline. Some of them will no doubt emerge from Berkeley’s unique program of connector courses, which bring together students with a diverse range of skills and disciplines.

To succeed, the program had to be accessible to students beyond the realm of computer science. One way the program does this is through a flexible and scalable technology infrastructure that enables students to quickly set up labs for hands-on practice—they don’t have to spend time installing programs or learning nuances of complicated applications.

“By hosting it in Azure, we can control the environment,” said Ryan Lovett, Systems Manager for the Department of Statistics at UC Berkeley. “Students just log in and they’re ready to go.”

David Culler, professor of Electrical Engineering and Computer Sciences at UC Berkeley, believes the program can extend computational thinking to benefit more disciplines. He anticipates the program will equip students with the ability to extract their own insights from the world’s information and build tools that benefit people in society. He likens the ability to understand an increasingly complex world to a new form of perception — combining mathematical thinking and the arts with computational tools for new forms of expression.

Such data science projects include classic and new problems like music genre classification, text analysis of famous literary works, identifying insights from bike sharing data in San Francisco, or analyzing jury selection in Alameda County.

Berkeley prides itself as a place where the world’s brightest minds explore, ask questions and improve the world. Thanks to the Data Science Education Program, thousands of Berkeley students are better critical thinkers.

Note: Microsoft partners closely with UC Berkeley in support of its Data Science Education Program. Since 2015, Microsoft Research, through its Azure for Research program, has provided $235K in Azure credits to enable the Foundations of Data Science course, along with $260K in research credits and $5K in training credits. Microsoft also provided $75K in unrestricted gift funding towards UC Berkeley’s Data Program.

New Microsoft Research Dissertation Grant provides support to under-represented groups in computing

Tue, 07 Mar 2017 17:00:34 +0000

By Dr. Meredith Ringel Morris, Principal Researcher, Microsoft Research

I am pleased to announce that Microsoft Research is funding a new academic program, the Microsoft Research Dissertation Grant. This grant program offers selected doctoral students doing computing research at U.S. and Canadian universities up to US $20,000 to fund their dissertation work. This program is open to students currently under-represented in the technology sector, including women, people with disabilities, and people who are African-American/Black, Hispanic/Latino, American Indian/Alaskan Native, or Native Hawaiian/Pacific Islander, reflecting Microsoft’s commitment to growing the number of diverse students obtaining computing degrees. A key goal of this grant is to broaden participation and diversify the high-tech workforce.

This grant program targets students in their fourth year or beyond of doctoral studies. Students at this later stage of their doctoral work have a sufficiently concrete research plan that they should be able to articulate specific funding needs. Microsoft Research also offers funding programs for earlier stages of students’ doctoral studies, such as the PhD Fellowship Program, which is open to second- and third-year doctoral students.

Interested students should submit a description of their dissertation research, as well as a budget that details the amount and purpose of requested funds. Since every research project is different, Microsoft is not prescribing how the funds should be spent, but expects to see requests for equipment or data set purchases, compensation for experimental participants, travel for collecting or presenting research results, or student stipends. Proposals will be reviewed by the scientists who work for Microsoft’s global network of research labs, and evaluated based on the technical merit and potential for impact of the dissertation research.

In addition to receiving their grant, awardees will receive additional travel support to attend a two-day mentoring workshop in the autumn at the Microsoft Research Redmond Lab. During this event, grant recipients will meet one-on-one with Microsoft researchers who are doing work in related areas, receive information and advice regarding internship and post-doctoral career opportunities with Microsoft, and hear from the lab’s senior leadership about current scientific efforts at the company. Recipients will also present a talk describing their dissertation research and will receive feedback on their work from a panel of Microsoft researchers. Attending the Dissertation Grant Workshop is an incredible learning and networking opportunity for the winners.

This year’s grant applications are due by April 7, 2017, with supporting reference letters due by April 24. Winners will be notified by June 30. Detailed information about the award, as well as the application form, can be found on the official website for the Microsoft Research Dissertation Grant.

I am excited about this opportunity to recognize and support technical innovation by students from under-represented groups; increasing the pipeline of diverse student talent is an important step toward growing a strong and diverse computing workforce. I look forward to receiving grant applications for this program’s inaugural year.

Learn more and apply

Microsoft Research Dissertation Grant

From improving a golf swing to reducing energy in datacenters

Tue, 21 Feb 2017 17:00:14 +0000

2017 Swiss Joint Research Center kick off

By Scarlet Schwiderski-Grosche, Senior Research Program Manager

Attendees of the 2017 Swiss Joint Research Center Workshop in Cambridge, UK.

(left to right) Aurelien Lucchi and Sebastian Stich, Postdoctoral Researchers at ETH Zurich and EPFL, and Martin Jaggi, Assistant Professor at EPFL, at the workshop.

Recently, we celebrated an important milestone for our Swiss Joint Research Center (Swiss JRC). We welcomed top researchers from all partners to a workshop at the Microsoft Research Cambridge Lab, to kick off a new phase in our collaboration. This workshop represented the end of a busy 10-month period for the Swiss JRC during which we ramped down projects from the first phase, and conducted a Call for Proposals for the selection of projects for the second phase. At the workshop, researchers from the Swiss JRC presented their selected proposals to kick off the collaborations in the new funding cycle.

First, a little background. The Swiss JRC is a collaborative research engagement between Microsoft Research and the two universities that make up the Swiss Federal Institutes of Technology: ETH Zurich (Eidgenössische Technische Hochschule Zürich, which serves German-speaking students) and EPFL (École Polytechnique Fédérale de Lausanne, which serves French-speaking students). The Swiss JRC is a continuation of a collaborative engagement that began back in 2009, when the same three partners embarked on ICES (Innovation Cluster for Embedded Software) and was renewed for another five years in 2014. Basically, university researchers collaborate with Microsoft researchers to solve problems in computer science.

Pamela Delgado, PhD student on the EPFL project with Florin Dinu, “Towards Resource-Efficient Data Centers”

With this workshop, the Swiss JRC kicked off 10 projects, four between ETH Zurich and Microsoft and six between EPFL and Microsoft. These projects were chosen from 20 proposals assessed by the Swiss JRC steering committee for their intellectual merit, potential scientific and societal impact and evidence of strong collaborative interest between the project partners.

One compelling project uses drones that follow you around while you ski or play golf and then give you feedback on your form. Think of it as a personal trainer/GoPro/drone combo that can figure out how to video you while you do an activity, analyze your performance and recommend improvements. Another project makes drones (or, as we like to call them, micro-aerial vehicles, or MAVs) easier to control through a solution-based approach rather than movement-based controls. This project asks, “What is the goal of the MAV flight?” and solves for that, rather than making the operator think about both “Where should this MAV go?” and “What should it do while it’s flying?”

Babak Falsafi, Professor of Computer and Communication Sciences, EPFL

Other projects address new requirements in the data center, aiming at making data processing more efficient and essentially helping to reduce energy usage. One set of projects assesses data-intensive applications as are common in, for example, machine learning, graph processing, and bioinformatics. These projects explore near-memory processing, better server utilization, improved data clustering, and new approaches to transactional processing. Another set of projects leverages new hardware architectures based on, for example, FPGAs (Field Programmable Gate Arrays) and DRAM (Dynamic Random-Access Memory). Some projects address mechanisms to off-load expensive computations to achieve massive parallelism or to co-locate different stages of deep learning on the same platform. All of these projects propel the leading edge of artificial intelligence.

“Emerging silicon technologies provide an opportunity to offload data management services to near-memory accelerators for better performance. Through several Microsoft Research collaborations, including this funding round’s NeMeSys project, we are rapidly propelling the state-of-the-art in near-memory processing.” – Babak Falsafi, Professor of Computer and Communication Sciences, EPFL

Here’s the list of projects and their principal investigators:

Data Science with FPGAs in the Data Center
Gustavo Alonso, ETH Zurich
Ken Eguro, Microsoft Research, Redmond lab

Human-Centric Flight II: End-user Design of High-level Robotic Behavior
Otmar Hilliges, ETH Zurich
Marc Pollefeys, Microsoft Analog Research & Development

Tractable by Design
Thomas Hofmann and Aurelien Lucchi, ETH Zurich
Sebastian Nowozin, Microsoft Research, Cambridge lab

Enabling Practical, Efficient and Large-Scale Computation Near Data to Improve the Performance and Efficiency of Data Center and Consumer Systems
Onur Mutlu and Luca Benini, ETH Zurich
Derek Chiou, Microsoft Relevance and Intent, Research & Development

Towards Resource-Efficient Data Centers
Florin Dinu, EPFL
Christos Gkantsidis and Sergey Legtchenko, Microsoft Research, Cambridge lab

Near-Memory System Services
Babak Falsafi, EPFL
Stavros Volos, Microsoft Research, Redmond lab

Coltrain: Co-located Deep Learning Training and Inference
Babak Falsafi and Martin Jaggi, EPFL
Eric Chung, Microsoft Research, Redmond lab

From Companion Drones to Personal Trainers
Pascal Fua and Mathieu Salzmann, EPFL
Debadeepta Dey, Ashish Kapoor, and Sudipta Sinha, Microsoft Research, Redmond lab

Revisiting Transactional Computing on Modern Hardware
Rachid Guerraoui and Georgios Chatzopoulos, EPFL
Aleksandar Dragojevic, Microsoft Research, Cambridge lab

Fast and Accurate Algorithms for Clustering
Michael Kapralov and Ola Svensson, EPFL
Yuval Peres, Nikhil Devanur and Sebastien Bubeck, Microsoft Research, Redmond lab

Partnership yields key breakthroughs in VR’s “grand challenge”

Mon, 06 Feb 2017 17:00:56 +0000

By Noboru Sean Kuno, Research Program Manager, Microsoft Research Asia

The potential for virtual reality (VR) to upend industrial design, medicine, and other specialized fields has now vaulted the emerging field into the ranks of what the National Academy of Engineering calls its 14 grand challenges of the 21st century, an eclectic list of endeavors from preventing nuclear terror to securing cyberspace.

The importance of improving VR and 3D immersive communication has been a cornerstone of Microsoft’s long-term investment in this technology space, resulting in multiple innovations, from Microsoft’s Kinect for Xbox 360 Sensor, Surface Hub and HoloLens to the Windows Creators Update.

Dr. Gene Cheung, associate professor, National Institute of Informatics, Japan

Collaborating with partners

Realizing more immersive communication via 3D applications requires a quantum leap in the capture and exchange of 3D geometry that can only be achieved with an ongoing commitment to signal processing research. At the heart of this effort is our collaborative research (CORE) project with academic partners including Dr. Gene Cheung, associate professor at Japan’s National Institute of Informatics, who has been tackling this problem for years.

Breakthrough

Using depth-sensing devices such as the Kinect Sensor, researchers developed an algorithm to enable better noise reduction and restore missing details across images. Crucially, they discovered a method that uses a graph-signal smoothness prior to enhance both natural images (see Fig. 1) and depth images.

Figure 1: example of original 4-bit image (left) and bit-depth enhanced image to 8 bits using our approach (right)
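To give a flavor of what a graph-signal smoothness prior does, here is a minimal sketch. It is not the researchers’ algorithm. It assumes the simplest possible graph, a path in which neighboring samples are joined by unit-weight edges, and restores a noisy signal y by minimizing ||x - y||^2 + mu * x^T L x, where L is the graph Laplacian and mu is a smoothing weight. The function names, the value of `mu` and the sample data are illustrative assumptions.

```python
def laplacian_quadratic(x):
    """x^T L x on a unit-weight path graph: the sum of squared neighbor differences."""
    return sum((a - b) ** 2 for a, b in zip(x, x[1:]))


def smooth_on_path_graph(noisy, mu=2.0, sweeps=200):
    """Minimize ||x - y||^2 + mu * x^T L x by solving (I + mu * L) x = y.

    The linear system is solved with Gauss-Seidel sweeps, which converge here
    because the matrix I + mu * L is strictly diagonally dominant.
    """
    n = len(noisy)
    x = list(noisy)
    for _ in range(sweeps):
        for i in range(n):
            neighbors = [x[j] for j in (i - 1, i + 1) if 0 <= j < n]
            # Row i of the system: (1 + mu * degree) * x_i - mu * sum(neighbors) = y_i
            x[i] = (noisy[i] + mu * sum(neighbors)) / (1 + mu * len(neighbors))
    return x


# Hypothetical noisy samples: a flat signal corrupted by alternating errors.
noisy = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0]
restored = smooth_on_path_graph(noisy)
print("roughness before:", laplacian_quadratic(noisy))
print("roughness after: ", laplacian_quadratic(restored))
```

A larger `mu` trusts the prior more and flattens the signal further, while `mu = 0` returns the input unchanged. Methods used on real images typically choose edge weights from the image content so that genuine edges are not smoothed away, which this sketch does not attempt.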

Collaboration with Microsoft Research

Dinei Florencio, senior researcher at Microsoft Research, has been working alongside Professor Cheung on research into “rate-constrained 3D surface estimation” and “precision enhancement of multiple 3D depth maps.”

“These two research lines are the most active in our recent collaboration,” Florencio said. “As we make the needed progress toward immersive communication, I believe Gene’s research is bringing some fundamental contributions.”

Other key members of the project include Cha Zhang of Microsoft Research as well as Pengfei Wan, a former graduate student at Hong Kong University of Science and Technology.

Moving forward

Florencio and Cheung are now leading research into whether active light sensing can accurately detect informative bio-signals, such as pulse/respiratory rate and temperature changes on a face, to reveal stress and mood or indicate if subjects are lying. A key question of the research is whether active light sensing can be extended to reveal the same details for shaded or remote human subjects.

“The project is very interesting in that it tries to estimate bio-signals for more efficient face-to-face communications,” said Tao Mei, senior researcher at Microsoft Research. “The Principal Investigator (PI) proposed to use active imaging, which is entirely non-contact and noninvasive, to solve this problem with a novel idea by analyzing the constructed thermal and depth images in an indoor active image sensing system.”

Upon completion, Professor Cheung will make the research tool publicly available. I am looking forward to seeing continuous progress and achievements from this collaboration. We hope more researchers explore this area to expand the frontier of virtual reality technologies and realize Princess Leia’s holographic messaging in the future.

AI is getting smarter; Microsoft researchers want to ensure it’s also getting more accurate

Fri, 03 Feb 2017 17:00:58 +0000

By Allison Linn, Senior Writer, Microsoft

Marine Carpuat, an assistant professor of computer science, works with colleague Philip Resnik, professor of linguistics, in the Computational Linguistics and Information Processing Laboratory at the University of Maryland.

Just a decade ago, using technology to automatically translate conversations, identify objects in pictures or even write a sentence describing those pictures seemed like an interesting research project, but not practical for real-world use.

The recent improvements in artificial intelligence have changed that. These days more and more people are starting to rely on systems built with technologies such as machine learning. That’s raising new questions among artificial intelligence researchers about how to ensure that the foundations of many of these systems (the algorithms, the training data and even the systems for testing the tools) are accurate and as unbiased as possible.

Ece Kamar, researcher

Ece Kamar, a researcher in Microsoft’s adaptive systems and interaction group, said the push comes as researchers and developers realize that, despite the fact that the systems are imperfect, many people are already trusting them for important tasks.

“This is why it is so important for us to know where our systems are making mistakes,” Kamar said.

At the AAAI Conference on Artificial Intelligence, which begins this weekend in San Francisco, Kamar and other Microsoft researchers will present two research papers that aim to use a combination of algorithms and human expertise to weed out data and system imperfections. Separately, another team of Microsoft researchers is releasing a corpus that can help speech translation researchers test the accuracy and effectiveness of their bilingual conversational systems.

The data underpinning artificial intelligence

When a developer creates a tool using machine learning, she generally relies on what’s called training data to teach the system to do a particular task. For example, to teach a system to recognize various types of animals, developers would likely show the system many pictures of animals so it could be trained to tell the difference between, say, a cat and a dog.

Theoretically, the system could then be shown pictures of dogs and cats it’s never seen before and still categorize them accurately.

But, Kamar said, systems built from training data can sometimes have so-called blind spots that lead to false results. For example, let’s say the system is only trained with pictures of cats that are white and dogs that are black. Show it a picture of a white dog, and it may make a false correlation and mislabel the dog as a cat.
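The white-dog example can be reproduced with a toy learner. The sketch below is an illustration only, not the method from the research paper: it assumes a hypothetical data set with two features (fur color and ear shape) and a very simple model that keeps whichever single feature best predicts the labels it was trained on.

```python
from collections import Counter

# Hypothetical training set: (fur color, ear shape) -> label.
# Every cat is white and every dog is black, so color looks like a perfect clue,
# while ear shape is the more meaningful feature but is slightly noisy.
TRAINING = [
    (("white", "pointed"), "cat"),
    (("white", "pointed"), "cat"),
    (("white", "pointed"), "cat"),
    (("white", "floppy"), "cat"),
    (("black", "floppy"), "dog"),
    (("black", "floppy"), "dog"),
    (("black", "floppy"), "dog"),
    (("black", "pointed"), "dog"),
]


def train_stump(examples):
    """Keep the one feature whose values best predict the training labels."""
    best = None
    for index in range(len(examples[0][0])):
        votes = {}
        for features, label in examples:
            votes.setdefault(features[index], Counter())[label] += 1
        # For each feature value, predict the label seen most often with it.
        rule = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
        correct = sum(rule[features[index]] == label for features, label in examples)
        if best is None or correct > best[0]:
            best = (correct, index, rule)
    _, index, rule = best
    return index, rule


def predict(stump, features, default="unknown"):
    index, rule = stump
    return rule.get(features[index], default)


stump = train_stump(TRAINING)
training_accuracy = sum(predict(stump, f) == label for f, label in TRAINING) / len(TRAINING)
white_dog = ("white", "floppy")

print("feature chosen:", ["fur color", "ear shape"][stump[0]])
print("training accuracy:", training_accuracy)
print("white dog labeled as:", predict(stump, white_dog))
```

The model scores perfectly on its own training data, which is what makes the blind spot hard to see: nothing in the data it was given reveals that fur color is the wrong clue.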

These problems arise in part because many researchers and developers are using training sets that weren’t specifically designed for learning the task at hand. That makes sense: using a set of data that already exists, such as an archive of animal pictures, is cheaper and faster than building one yourself. But it makes it all the more important to add these kinds of safety checks.

“Without these, we are not going to understand what kind of biases there are,” Kamar said.

In one of the research papers, Kamar and her colleagues show an algorithm that they think could be used to identify those blind spots in predictive models, allowing developers and researchers to fix the problem. It’s a research project for now, but they hope that it will eventually grow into a tool that developers and researchers can use.

“Any kind of company or academic that’s doing machine learning needs these tools,” Kamar said.

Another research paper Kamar and her colleagues are presenting at the AAAI conference aims to help researchers figure out how different types of mistakes in a complex artificial intelligence system lead to incorrect results. That can be surprisingly difficult to parse out as artificial intelligence systems are doing more and more complex tasks, relying on multiple components that can become entangled.

For example, let’s say an automated photo captioning tool is describing a picture of a teddy bear as a blender. You might think the problem is with the component trained to recognize the pictures, only to find that it really lies in the element designed to write descriptions.

Kamar and her colleagues designed a methodology that provides guidance to researchers about how they can best troubleshoot these problems by simulating various fixes to root out where the trouble lies.
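The idea of simulating fixes can be sketched with a two-stage captioning pipeline. This is a simplified illustration under stated assumptions, not the methodology from the paper: the components, the image ids and the ground-truth tags are all hypothetical, and a "fix" is simulated by swapping one component’s output for the correct answer and measuring how much the end-to-end accuracy improves.

```python
# Hypothetical ground truth: the object that really appears in each image.
TRUE_TAGS = {"img1": "teddy bear", "img2": "cat", "img3": "dog", "img4": "teddy bear"}


def ideal_describer(tag):
    """What a flawless caption writer would produce for a given tag."""
    return f"a picture of a {tag}"


def recognizer(image_id):
    """Imperfect recognizer: it mistakes the dog in img3 for a cat."""
    outputs = {"img1": "teddy bear", "img2": "cat", "img3": "cat", "img4": "teddy bear"}
    return outputs[image_id]


def describer(tag):
    """Imperfect caption writer: it describes every teddy bear as a blender."""
    if tag == "teddy bear":
        return "a picture of a blender"
    return ideal_describer(tag)


def run_pipeline(image_id, fix_recognizer=False, fix_describer=False):
    """Caption an image, optionally replacing a component's output with the correct one."""
    tag = TRUE_TAGS[image_id] if fix_recognizer else recognizer(image_id)
    return ideal_describer(tag) if fix_describer else describer(tag)


def accuracy(**fixes):
    """Fraction of images whose final caption matches the ideal caption."""
    correct = sum(
        run_pipeline(image_id, **fixes) == ideal_describer(true_tag)
        for image_id, true_tag in TRUE_TAGS.items()
    )
    return correct / len(TRUE_TAGS)


baseline = accuracy()
gains = {
    "recognizer": accuracy(fix_recognizer=True) - baseline,
    "describer": accuracy(fix_describer=True) - baseline,
}
print("baseline accuracy:", baseline)
print("gain from each simulated fix:", gains)
```

In this toy setup the simulated fix to the describer yields the larger gain, mirroring the teddy bear example: the visible error looks like a recognition failure, but most of the trouble lies in the component that writes the descriptions.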

A ‘human in the loop’

For this and other research she has been conducting, Kamar said she was strongly influenced by the work she did on AI 100, a Stanford University-based study on how artificial intelligence will affect people over the next century.

Kamar said one takeaway from that work was the importance of making sure that people are deeply involved in developing, verifying and troubleshooting systems – what researchers call a “human in the loop.” That will ensure that the artificial intelligences we are creating augment human capabilities and reflect how we want them to perform.

Testing the accuracy of conversational translation

When developers and academic researchers create systems for recognizing the words in a conversation, they have well-regarded ways of testing the accuracy of their work: sets of conversational data such as Switchboard and CALLHOME.

Christian Federmann, senior program manager

Christian Federmann, a senior program manager working with the Microsoft Translator team, said there aren’t as many standardized data sets for testing bilingual conversational speech translation systems such as the Microsoft Translator live feature and Skype Translator.

So he and his colleagues decided to make one.

The Microsoft Speech Language Translation corpus, which is being released publicly Friday for anyone to use, allows researchers to measure the quality and effectiveness of their conversational translation systems against a data set that includes multiple conversations between bilingual speakers who are speaking French, German and English.

The corpus, which was produced by Microsoft using bilingual speakers, aims to create a standard by which people can measure how well their conversational speech translation systems work.

“You need high-quality data in order to have high-quality testing,” Federmann said.

A data set that hits on the combination of both conversational speech and bilingual translation has been lacking until now.

Marine Carpuat, an assistant professor of computer science at the University of Maryland, who does research in natural language processing, said that when she wants to test how well her algorithms for conversational translation are working, she often has to rely on data that is freely available, such as official translations of European Union documents.

Those kinds of translations weren’t created to test conversational translation systems and they don’t necessarily reflect the more casual, spontaneous way in which people actually talk to each other, she said. That makes it difficult to know if the techniques she has will work when people want to translate a regular conversation, with all the attendant pauses, “ums” and other quirks of spoken language.

Carpuat, who was given early access to the corpus, said it was immediately helpful to her.

“It was a way of taking a system that I know does great on formal data and seeing what happens if we try to handle conversations,” she said.

Will Lewis, principal technical program manager

The Microsoft team hopes the corpus, which will be freely available, will benefit the entire field of conversational translation and help to create more standardized benchmarks that researchers can use to measure their work against others.

“This helps propel the field forward,” said Will Lewis, a principal technical program manager with the Microsoft Translator team who also worked on the project.

Related:

Find out more about the Microsoft Speech Language Translation corpus
See the full list of papers at AAAI
Try the Microsoft Translator live feature
Get Skype Translator

Allison Linn is a senior writer at Microsoft.

The post AI is getting smarter; Microsoft researchers want to ensure it’s also getting more accurate appeared first on Microsoft Research.

Microsoft Research PhD fellowships provide financial support to promising researchers

Thu, 02 Feb 2017 17:00:32 +0000

By Jim Pinkelman, Senior Director, Microsoft Research

Since 2008, Microsoft Research has been awarding two-year PhD fellowships to researchers in computer science and related fields at leading universities in the United States and Canada. These awards are designed to help promising young researchers focus on their studies, not their finances!

This year’s program provides fellowships to 10 second- or third-year PhD students who are studying computer science, electrical engineering or mathematics. Recipients receive full tuition for their program, along with a generous living expense stipend and a conference attendance stipend. In addition, the fellows are offered the opportunity to intern with a Microsoft researcher.

To identify these promising researchers, Microsoft Research goes through a rigorous process. First, department chairs nominate their best candidates, up to 9 from each university. Each application is vetted, and Microsoft researchers interview finalists to identify the awardees.

This year’s awardees for the 2017-19 academic years are:

Michael B. Cohen
Massachusetts Institute of Technology
Mathematics, complexity and cryptography

Bita Darvish Rouhani
University of California, San Diego
Computer architecture and hardware

Michaelanne Dye
Georgia Institute of Technology
Human-computer interaction, social computing and collaboration

Kira Goldner
University of Washington
Algorithms and economic systems

Aditya Grover
Stanford University
Machine learning and intelligence

Silu Huang
University of Illinois at Urbana-Champaign
Data management and mining

Ethan J. Jackson
University of California, Berkeley
Mobility and networking

Saswat Padhi
University of California, Los Angeles
Software engineering and programming languages

Andrew Quinn
University of Michigan
Operating systems and distributed computing

Mengting Wan
University of California, San Diego
Data management and mining

Congratulations, 2017’s awardees! Interested in applying for next year? Applications from department chairs (sorry, no self-referrals) are due in October; ask your department chair if they are planning to participate.

Learn more

PhD Fellowship Program


2017 Microsoft Research PhD scholarships support break-through projects in six countries

Tue, 31 Jan 2017 17:00:51 +0000

By Jim Pinkelman, Senior Director, Microsoft Research

Since 2004, the Microsoft Research PhD Scholarship Programme in Europe, the Middle East, and Africa (EMEA) has supported groundbreaking PhD projects. This year we have 17 projects that span six countries, and include research areas such as computational biology, machine learning, and health science.

The winning PhD projects for the 2017-2018 academic year were selected from 33 PhD supervisor-led proposals. These PhD supervisors will collaborate with an assigned Microsoft Research co-supervisor to support a PhD student for up to three years as he or she carries out the proposed research. Supervisors are actively recruiting graduate students for these PhD projects, with final candidates to be identified by March 2018.

PhD scholarship recipients conduct collaborative research with Microsoft researchers, and many receive internships at our labs. Since its founding, the Microsoft Research PhD Scholarship Programme has supported 200 students from 51 institutions in 18 countries.

The selected PhD projects and their PhD supervisors for 2017 are:

Decoding the Network Logic Governing Resetting of Pluripotency
Austin Smith, University of Cambridge, UK

Deep Reinforcement Learning for Collaborative Game AI to Enhance Player Experience
Sam Devlin, University of York, UK

Designing Specialised Processors for DB Workloads
Anastasia Ailamaki, EPFL, Switzerland

Efficient DNA Storage Using Composite Letters
Zohar Yakhini, Technion, Israel Institute of Technology

Human-Centred Machine Learning for Adaptive Agents with Vision
Rebecca Fiebrink, Goldsmiths, University of London, UK

Learning Computing with Torino: a Physical Programming Language Inclusive of Children with Visual Disabilities
Sue Sentance, King’s College London, UK

Logical Approach to Code Generation and Optimization
Greta Yorsh, Queen Mary University of London, UK

Modelling Infective Exacerbations in Cystic Fibrosis
Andres Floto, University of Cambridge, UK

OutSider: Assessing and Mitigating Side-Channel Leaks on Commodity Platforms
Herbert Bos, Vrije Universiteit Amsterdam, Netherlands

Power Efficient Rack-Scale Fabrics
Noa Zilberman, University of Cambridge, UK

Programmable Single-Cell Biocomputers with Scalable Signal Processing Capacity
Baojun Wang, University of Edinburgh, UK

Providing and Verifying Security on Compromised Platforms
François Dupressoir, University of Surrey, UK

Reinforcement Learning for Adaptive User Interaction
Shimon Whiteson, University of Oxford, UK

Shareable Dynamic Media in Hybrid Meetings
Clemens Klokmose, Aarhus University, Denmark

SMVRF: Secure Messaging Verifiably Realized in F*
Chris Brzuska, Technische Universität Hamburg-Harburg (TUHH), Germany

STARCH: SmarT ARchitectures for Data Center Switching
Wayne Luk, Imperial College London, UK

Towards Ethical Development of Symbiotic Human-Machine Systems; Creating Ethical Frameworks and Solutions
Ewa Luger, University of Edinburgh, UK

Training and Tuning Deep Neural Networks: Faster, Stronger, Better
Volkan Cevher, EPFL, Switzerland

Joint Initiative in Informatics with the University of Edinburgh:

Improving the Usability of TLS APIs
Kami Vaniea, University of Edinburgh, UK

Project selection process

These projects were assessed via a two-stage review process. During stage one, a panel of Microsoft researchers determined whether the proposed project met the basic selection criteria, including relevance to topics that are being researched at Microsoft Research Cambridge. Those proposals that advanced to stage two were then evaluated by a board of 80 researchers from Microsoft Research Laboratories.

For those interested in applying for scholarships for next year, online applications open September 1, 2017.


Microsoft Research and the industrial research cycle

Mon, 30 Jan 2017 18:24:21 +0000

A personal view

By Thomas Ball, Research Manager, Research in Software Engineering (RiSE) group, Microsoft Research

The industrial research cycle

Here is what I have told new hires of Microsoft Research (MSR) since I became a manager some 14 years ago:

MSR gives you the freedom to explore and expand the bounds of scientific knowledge, as in academia, but with the added challenge to align your scientific pursuits with company problems and to drive for impact on Microsoft, especially as you grow in seniority at the company.

This statement is still as true today as it was when I joined MSR 17 years ago and reflects MSR’s associated goals of advancing scientific frontiers and positively impacting the company.

I use the model of “The Industrial Research Cycle” to explain how MSR works. Researchers have the freedom to select problems and to explore in their discipline (the left side of the cycle) to advance science. They also have the responsibility and opportunity, once sufficient exploration has taken place, to focus their attention on an area that they believe can produce impact for the company (the right side). Ideally, the problems/solutions that one explores on the left side of the cycle eventually drive impact on the right side. And the experience one gains from the right side not only validates the science at scale, it also pushes exploration in new directions in the next phase. A researcher will go around the cycle many times during their career.

Impact over time

It is difficult to simultaneously explore and focus, and to do both well! Instead, one needs to engage in phases of exploration and focus over years.

I use the “Impact” diagram to explain the different forms/shapes of impact. The x-axis measures the level of scientific impact. The y-axis measures the level of Microsoft impact (see box). One’s impact is measured by the area under the curve. The shape of one’s impact curve changes over time, both as one goes around the industrial research cycle and as one grows in seniority at the company.

During an exploration phase, the shape of one’s impact curve generally is horizontal because the primary audience is the scientific community. During a focus phase, the shape of one’s curve is generally vertical, building on the foundation.

As one grows in seniority in the company, the expectations for focusing on Microsoft impact increase. On the other hand, junior researchers enjoy more freedom to explore. Fresh Ph.D. hires at MSR still have much work to do to establish themselves as recognized experts in their fields. While some may indeed engage with product teams early in their career, we do not expect junior researchers to jump right in to address problems of the company.

While we encourage our researchers to actively publish, MSR does not emphasize quantity of publications. Quality is our top priority.

Pipelines and partners

MSR invests in scientific efforts that may not have immediate impact on Microsoft but that will build a new muscle or capability for the company in the long run. I use “The long term play” diagram to show that a coordinated and long-term effort often is needed to turn scientific results into company impact.

Below are three examples showing the path to impact, which requires working closely with partners over the long term, building relationships and trust, and changing company culture through new ways of approaching a problem.

Automated defect detection and driver quality

In late 1999, Sriram Rajamani and I started the SLAM project at MSR to investigate new approaches for automatically finding code defects in device drivers. When the Windows Driver Quality team was formed in 2002, Byron Cook, Jakob Lichtenberg and Vladimir Levin joined the team to deliver a tool called Static Driver Verifier (SDV), based on the SLAM engine. The first version of SDV was delivered with Windows in 2004. During the last decade, SDV’s underlying analysis engine has been improved or replaced by MSR three times (see papers on SLAM2, YOGI and Corral) by different sets of researchers working closely with the Driver Quality team, including Ella Bounimova, Aditya Nori, Rahul Kumar, Shaz Qadeer, Akash Lal and Shuvendu Lahiri.

From empirical software engineering to tools for software engineers

In 2004, I hired Nachi Nagappan into MSR to spearhead Empirical Software Engineering research at Redmond. For five years, Nachi and colleagues Brendan Murphy, Jacek Czerwonka, Christian Bird and Thomas Zimmermann studied key issues affecting software quality and developer productivity, through analysis of product version histories, bug databases and other data sources.

To scale such analyses across the company, Wolfram Schulte joined with Nachi, Brendan and Jacek to create CODEMINE, a data analytics platform for collecting and analyzing Microsoft software engineering process data. This project started around 2009 (codenamed SWEPT) and culminated around 2013, giving insight into software engineering problems across Microsoft product groups. CODEMINE was essential to making a case for the formation of a new team called Tools for Software Engineers, which is moving the company to a cloud-based software engineering infrastructure.

Computer science education

More recently, the Touch Develop project (www.touchdevelop.com) started in MSR in 2011 to make it possible to program scripts for smartphones on smartphones. An unexpected use of Touch Develop was in K-12 computer science education: teachers found that children were engaged by scripting their smartphones to react to environmental stimuli.

This turned into a project with the BBC to create a small physical computing device with an easy-to-use coding platform (built on Touch Develop). One million of these devices, called micro:bits, were delivered in 2016, enough for every fifth-grade student in the UK to receive one. Because of the BBC micro:bit, Microsoft is now investing in a new programming platform for CS education.

Organizing for big impact on big problems

Today, we find a handful of companies developing planetary-scale distributed systems. Amazon, Facebook, Google and Microsoft all have built such systems, and are engaged in optimizing them for performance, reliability, availability, security and privacy. Microsoft Azure is one such system, which provides compute, storage and networking services, and interacts with an ever-growing number of mobile devices and IoT endpoints.

Optimizing every level of the stack, from the hardware assets, to the low-level operating system code, to the user-facing services, is key to its success, and affords opportunities for researchers across a wide range of disciplines, including those in systems, formal methods, software engineering and programming languages.

Here are four new, larger-scale projects related to the cloud that the RiSE group is deeply involved in:

The P programming language is transforming the way Microsoft programmers undertake the task of building large asynchronous systems. P has been used to develop USB 3.0 drivers in Windows, as well as services in Microsoft Azure.

Project Everest is constructing a high-performance, standards-compliant, verified implementation of the full HTTPS ecosystem, from the HTTPS API down to and including cryptographic algorithms such as RSA and AES.

Project Parade is parallelizing a large class of seemingly sequential applications by treating runtime dependencies as symbolic values. The results of this project are leading to substantial performance gains in popular algorithms for machine learning and big data.

Project Premonition aims to detect pathogens before they cause outbreaks, by creating new technologies to autonomously locate, collect and computationally analyze the blood-borne pathogens carried by mosquitoes.

Want to be part of the industrial research cycle?

Whether you’re exploring or focusing, the ride at Microsoft Research is an exciting one. If you are interested in joining us on this journey, please visit our careers page.
