How bio-inspired deep learning keeps winning competitions
November 28, 2012 by Amara D. Angelica, Jürgen Schmidhuber
Dr. Jürgen Schmidhuber is Director of the Swiss Artificial Intelligence Lab, IDSIA. His research team’s artificial neural networks (NNs) have won many international awards, and recently were the first to achieve human-competitive performance on various benchmark data sets. I asked him about their secrets of success.
AA: In several contests and machine-learning benchmarks, your team’s NNs are now outperforming all other known methods. As The New York Times noted on Friday, last year a program your team created won a pattern recognition contest by outperforming both competing software systems and a human expert in identifying images in a database of German traffic signs.

The German Traffic Sign Benchmark (credit: Institut für Neuroinformatik, Bochum)
“The winning program accurately identified 99.46 percent of the images in a set of 50,000; the top score in a group of 32 human participants was 99.22 percent, and the average for the humans was 98.84 percent,” the Times pointed out. Impressive. What is the importance of traffic sign recognition in this field?

Don’t try this on the 101: In 1995, Ernst Dickmanns’ famous S-class car autonomously drove 1678 km on public Autobahns from Munich to Denmark and back, up to 158 km without human intervention, at up to 180 km/h (112.5 mph), automatically passing other cars. (Credit: Ernst Dickmanns)
JS: That was from the IJCNN 2011 Traffic Sign Recognition Competition. This is highly relevant for self-driving cars as well as modern systems for driver’s assistance.
BTW, if you don’t obey a traffic sign in Switzerland, you go to jail. However, across the border there is Italy. There you’ll also find traffic signs in the street, but only for decoration. :)
AA: What’s your team’s secret?
JS: Remarkably, we do not need the traditional sophisticated computer vision techniques developed over the past six decades or so. Instead, our deep, biologically rather plausible artificial neural networks (NNs) are inspired by human brains, and they learn to recognize objects from numerous training examples.
I discuss this in detail in a talk at AGI-2011, “Fast Deep/Recurrent Nets for AGI Vision” (only voice and slides though):
We often use supervised, artificial, feedforward, or recurrent (deep by nature) NNs with many nonlinear processing stages. When we started this type of research over 20 years ago, it quickly became clear that such deep NNs are hard to train. This is due to the so-called “vanishing gradient problem” identified in the 1991 thesis of my former student Sepp Hochreiter, who is now a professor in Linz. But over time we found several ways around this problem. Committees of NNs improve the results even further.
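The vanishing gradient problem Hochreiter identified can be seen in a few lines: backpropagation multiplies the error signal by a derivative factor at every layer, and for sigmoid units that factor is at most 0.25, so the signal shrinks geometrically with depth. A minimal illustration (my own toy sketch, not code from the team):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Backpropagation through a chain of sigmoid units multiplies the error
# signal by w * sigmoid'(pre_activation) at each layer. Even at the
# derivative's maximum, sigmoid'(0) = 0.25, a 20-layer chain collapses.
w = 1.0  # illustrative weight; large weights instead cause explosion
grad = 1.0
for depth in range(20):
    grad *= w * sigmoid_prime(0.0)  # best case: 0.25 per layer

print(grad)  # 0.25**20, on the order of 1e-12 -- the gradient has vanished
```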

LSTM-controlled multi-arm robot learns how to tie a knot. The recurrent neural network memory is necessary to deal with ambiguous sensory inputs from repetitively visited states. (Credit: H. Mayer, F. Gomez, D. Wierstra, I. Nagy, A. Knoll, J. Schmidhuber/TU Munich & IDSIA)
In addition, we use GPUs (graphics cards), which accelerate learning by a factor of 50. This is sufficient to clearly outperform numerous previous, more complex machine learning methods.
One of the reviewers called this a “wake-up call to the machine learning community.”
For sequential data, such as videos or connected handwriting, feedforward NNs do not suffice. Here, we use our bidirectional or multi-dimensional Long Short-Term Memory recurrent NNs, which learn to maximize the probabilities of label sequences, given raw training sequences.
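To give a flavor of why LSTM handles long sequences, here is a minimal sketch of a single LSTM step in plain NumPy (toy dimensions, illustrative only, not the team’s implementation): the forget gate and the near-linear cell-state update are what let error signals survive across many time steps.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (input, forget, output gates
    plus candidate cell update). W maps [x; h_prev] to the four gate
    pre-activations stacked into one vector; b is the matching bias."""
    z = W @ np.concatenate([x, h_prev]) + b
    n = len(h_prev)
    i = 1 / (1 + np.exp(-z[:n]))        # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))     # forget gate: preserves error flow
    o = 1 / (1 + np.exp(-z[2*n:3*n]))   # output gate
    g = np.tanh(z[3*n:])                # candidate cell values
    c = f * c_prev + i * g              # near-linear cell-state "carousel"
    h = o * np.tanh(c)                  # hidden state / output
    return h, c

# Toy dimensions, random weights (illustrative only)
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):  # run over a short input sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)  # (4,)
```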
AA: You said that the field is currently experiencing a Neural Network “ReNNaissance.” What are the key awards your team has won?
JS: In the past three years, in addition to the IJCNN 2011 Traffic Sign Recognition Competition mentioned above, we won seven other highly competitive international visual pattern recognition contests:
- ICPR 2012 Contest on “Mitosis Detection in Breast Cancer Histological Images.” This is important for breast cancer prognosis. Humans tend to find it very difficult to distinguish mitosis from other tissue. 129 companies, research institutes, and universities in 40 countries registered; 14 sent their results. Our NN won by a comfortable margin.

Telophase stage of mitosis in a eukaryotic cell (credit: Catherine Genestie Hopital de la Pitie-Salpetriere, Paris)
- ISBI 2012 challenge on segmentation of neuronal structures. Given electron microscopy images of stacks of thin slices of animal brains, the goal is to build a detailed 3D model of the brain’s neurons and dendrites. But human experts need many hours to annotate the images: Which parts depict neuronal membranes? Which parts are irrelevant background? Our NNs learn to solve this task through experience with millions of training images. In March 2012, they won the contest on all three evaluation metrics by a large margin, with superhuman performance in terms of pixel error. (Ranks 2–6 went to researchers at ETHZ, MIT, CMU, and Harvard.) A NIPS 2012 paper on this is coming up.

Example of ssTEM neuronal image and its corresponding segmentation (credit: IEEE International Symposium on Biomedical Imaging and Albert Cardona et al./PLoS Biology)
- ICDAR 2011 Offline Chinese Handwriting Competition. Our team won the competition although none of its members speaks a word of Chinese. In the not-so-distant future, you should be able to point your cell phone camera at text in a foreign language and get a translation. That’s why we also developed low-power implementations of our NNs for cell phone chips.

Our team also just won the ICDAR Offline Chinese Handwriting Competition (1st & 2nd place), without speaking a word of Chinese (credit: D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber)
- Online German Traffic Sign Recognition Contest (2011, first and second rank). Until the last day of the competition, we thought we had a comfortable lead, but then our toughest competitor from NYU surged ahead, and our team (with Dan Ciresan, Ueli Meier, Jonathan Masci) had to work late into the night to re-establish the correct order. :)
- ICDAR 2009 Arabic Connected Handwriting Competition (although none of us speaks a word of Arabic).
- ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition (idem).
- ICDAR 2009 French Connected Handwriting Competition. Our French also isn’t that good. :)
AA: Is that a record for international contests?
JS: Yes. Our NNs also set records in important machine-learning benchmarks:
- The NORB Object Recognition Benchmark.
- The CIFAR Image Classification Benchmark.
- The MNIST Handwritten Digits Benchmark, perhaps the most famous benchmark. Our team achieved the first human-competitive result.
AA: Were the algorithms of your group the first deep learning methods to win such international contests?
JS: I think so.
AA: Did any other group win that many?
JS: No. All of this would not have been possible without the hard work of team members including Alex Graves, Dan Ciresan, Ueli Meier, Jonathan Masci, Alessandro Giusti, and others. And of course, our work builds on earlier work by great pioneers including Fukushima, Amari, Werbos, von der Malsburg, LeCun, Poggio, Hinton, Williams, Rumelhart, and many others.
AA: What are some of the practical applications of these techniques?
JS: These NNs are of great practical relevance, because computer vision and pattern recognition are becoming essential for thousands of commercial applications. For example, the future of search engines lies in image and video recognition, as opposed to traditional text search. The most important applications may be in medical imaging, e.g., for automated melanoma detection, cancer prognosis, plaque detection in CT heart scans (to prevent strokes), and hundreds of other health-related areas.
Autonomous robots depend on vision, too — see the AGI 2011 keynote (at Google HQ) by the robot car pioneer Ernst Dickmanns:
Our successes have also attracted the interest of major industrial companies. So we started several industry collaborations. Among other things, we developed:
- State-of-the-art handwriting recognition for a software company.
- State-of-the-art steel defect detection for the world’s largest steel maker.
- State-of-the-art low-power, low-cost pattern recognition for a leading automotive supplier.
- Efficient variants of our neural net pattern recognizers for apps running on cell phone chips.
More information on this can be found here and here.
Generally speaking, there is no end in sight for applications of these new-millennium neural networks.
AA: How do your team’s techniques differ from Google, Microsoft, and Nuance approaches to automated speech and image recognition?
JS: Of course I cannot officially speak for any particular firm. Let me just say that in recent years leading IT companies (whose names are known by everybody) have shown a lot of interest in our work. Many companies (and academic researchers) are now using the deep learning neural networks we developed and published over the years.
AA: What are your team’s future research plans?
JS: While the methods above work fine in many applications, they are passive learners — they do not learn to actively search for the most informative image parts. Humans, in contrast, use sequential gaze shifts for pattern recognition. This can be much more efficient than the fully parallel one-shot approach.
That’s why we want to combine the algorithms above with variants of our old methods of the 1990s. Back then, we built what to our knowledge was the first artificial fovea sequentially steered by a learning neural controller.
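To give a flavor of the fovea idea, here is a toy sketch (with hypothetical details, not the actual 1990s system): a controller repeatedly picks a small “foveal” patch to inspect, so the system processes a few informative glimpses instead of the whole image in one parallel pass.

```python
import numpy as np

def glimpse(image, cx, cy, size=8):
    """Extract a small 'foveal' patch centered at (cx, cy),
    clipped so it stays inside the image bounds."""
    h, w = image.shape
    x0 = max(0, min(w - size, cx - size // 2))
    y0 = max(0, min(h - size, cy - size // 2))
    return image[y0:y0 + size, x0:x0 + size]

# The controller here is a stand-in random policy; in the real system a
# learning recurrent network would choose where to look next based on
# what it has seen so far.
rng = np.random.default_rng(1)
image = rng.random((64, 64))
features = []
for _ in range(4):
    cx, cy = rng.integers(0, 64, size=2)
    patch = glimpse(image, int(cx), int(cy))
    features.append(patch.mean())  # toy per-glimpse feature

print(len(features))  # 4 small glimpses instead of all 64*64 pixels at once
```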
We also intend to combine this with our Formal Theory of Fun (FTF) and Curiosity & Creativity (see here and here).
An artificial explorer driven by the FTF interacts with its environment and is rewarded not only for solving external, user-defined problems, but also for inventing its own novel problems (e.g., better prediction of aspects of the environment, or speeding up or simplifying previous solutions), thus becoming a more and more general problem solver over time.
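A toy sketch of the intrinsic-reward idea (my own simplification, not the formal theory itself): reward the agent for the improvement of its predictor, not for raw prediction error, since raw error alone would also reward staring at unlearnable noise.

```python
def learning_progress(stream, lr=0.5):
    """Train a one-parameter predictor (exponential moving average) on a
    scalar stream; return the drop in mean absolute prediction error from
    the first half to the second half. Positive = something was learned,
    which in this sketch is what generates intrinsic reward."""
    pred, errs = 0.0, []
    for x in stream:
        errs.append(abs(x - pred))   # predict, then observe the truth
        pred += lr * (x - pred)      # train on the observation
    half = len(errs) // 2
    return sum(errs[:half]) / half - sum(errs[half:]) / (len(errs) - half)

learnable = [1.0] * 20                            # structure this predictor can capture
oscillating = [(-1.0) ** k for k in range(20)]    # pattern it cannot track

# The learnable stream yields clear progress (error drops toward zero);
# the unlearnable one yields none, so it is "boring" to a curious agent.
print(learning_progress(learnable) > learning_progress(oscillating))  # True
```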
Here are some of our video lectures on artificial creativity:
15 comments
by Jake_Witmer
Now, do these guys have intentions of making neural nets that plug into the human neocortex, or building entire machine brains, …or will it just be purely synthetic machines that dominate the surface of the Earth at some later date? If these guys can build a neocortex as good as Grok, they can build a cerebellum, spinal cord, and nervous system. Superhuman robots and cyborgs sooner, rather than later! Don’t let the luddites and other mad bombers get to our sensitive wetware first!
by MrQuincle
I still don’t get why LSTM works so much better. As also explained at http://en.wikipedia.org/wiki/Long_short_term_memory , it is about a smart way to distribute error back over a network. If this is kept independent from the signal, it is logical that it can outperform neural networks that use simpler building blocks. The error signal can be maintained over long times, unlike, for example, reservoir computing methods.
The problem resides a bit in how ad-hoc the building blocks are. It’s a bit like Adaptive Resonance Theory (from Grossberg) or Hierarchical Temporal Memory (from Hawkins). I hope it will be feasible to bridge the gap to more “realistic mechanisms”. For example, polychronization (Izhikevich) might be a mechanism that is used in neural networks. How are “forgetting gates” implemented in the brain? Or why is LSTM a good abstraction that brings us further!? Or can something be said about the glass ceiling that is reached by the other methods and how LSTM breaks through it?
by John Goodrich
I first saw Schmidhuber in a TED talk he did on when machine intelligence surpasses human intelligence, and he was superb.
A brilliant, inspired, and inspiring man.
by Mark Hidden
Thanks that was a good ted talk…
by Don
Very good article. Extremely content rich. Enjoyed it immensely! Thanks!
by Xavier
From what I could find on his website, the secret behind Schmidhuber’s neural networks are “forget gates” which achieve some rudimentary form of creativity.
This does bear resemblance to the work of Robert Kozma and Stephen Thaler, who are also exploring creativity and neural networks. Especially Thaler’s systems are unprecedented and I’m sure that if he would invest more time in public demonstrations or competitions, he could gain a solid reputation.
Further comparing the accomplishments between Schmidhuber’s and Thaler’s research reveals that Thaler’s AI is much more efficient in both computer vision and music generation.
No offense to Schmidhuber though, I’m interested to see how his work will progress in the future!
by Ponder Bear
Xavier, please do some more research. Those “forget gates” are orthogonal to the principle of creativity. Forget gates are just a technical improvement of recurrent neural networks. To learn more about creativity, follow the links in the article. Or google artificial curiosity, or the formal theory of creativity. You write: “Thaler’s AI is much more efficient in computer vision.” Are you really saying that it can outperform methods of Schmidhuber’s team on the benchmarks mentioned in the article? Then prove it! Good luck.
by Xavier
Ponder Bear, I would love to prove it! But I guess I can’t, because I have no statistical data on this. I just know what I have seen and that’s why most of the things on this site don’t really impress me personally (but are interesting nonetheless). Dr. Thaler would have to come forth himself. In fact, it would be great if KurzweilAI.net invited him for some more information if it’s possible.
Thanks for the correction by the way, I was just beginning to delve into his website. I think the formal theory of creativity & artificial curiosity itself is sound, he rightly identifies “novel patterns” as a result of creativity, but the used techniques are very time-consuming and only novel to a certain extent.
For example: http://www.idsia.ch/~juergen/blues/index.html
“This work marks, we believe, a first step towards a neural network music composer that can learn and use global musical structure.”
That is not the first neural network composer of this kind: http://imagination-engines.com/iei_musical_composition.php
Thaler’s DAGUI/DABUI (Device for the Autonomous Generation/Bootstrapping of Useful Information) or “Creativity Machine”, as it is appropriately called, can also generate pleasing music from training by example (absorbing the essence), but doesn’t rely on it. It can be configured to accept only the user’s input and be “rewarded” for good melodies so that it produces more of them and minimizes the bad ones – it learns and generates without any workarounds.
The same astounding results in machine vision: http://imagination-engines.com/iei_airport_security.php & http://imagination-engines.com/iei_machine_vision.php
DAGUI/DABUI systems (built from Group-Membership-Filters/Imagination-Engines/Self-Training-Artificial-Neural-Network-Objects) can not only recognize and interpret static symbols but also 3-dimensional or moving objects in dynamic scenes even under difficult circumstances when obscured by rain or dirt. This would be perfect for self-driving cars amongst other things.
They can also generate art, for example human faces (or anything else they’re exposed to) = novel patterns! Look for his talk “Thalamocortical Algorithms in Space!” on Slide 4 for more on this. His theories on creativity are very similar to Schmidhuber’s but in my view more practical, detailed and closer to neurobiology than anything else I’ve come across so far. They also convincingly explain cognition and near-death experiences.
The big advantage is that his neural architectures are self-assembling. If LSTM Recurrent Neural Networks were a dynamo, the Creativity Machine is a powerhouse ;)
by Ponder Bear
Xavier, you are posting a lot of stuff which seems quite unrelated to the article. Note, however, that the first artificial neural network music composers date back at least to the early 1980s. For example, check out the work of Peter Todd and Mike Mozer.

The 1996 patent by Thaler seems hard to defend due to prior art. Apparently it rephrases the 2 network approach to system identification published by Werbos in the 1980s. One supervised net is trained (on musical pieces) to produce output data (in this case music). The other net (the critic) is trained to model a consumer evaluating the data. Then the first net is randomly perturbed until it produces music that gets a high evaluation when fed into the critic. So that’s essentially traditional neural system identification and control in the style of Werbos (and Widrow). I think Schmidhuber (1990) described the first variant thereof where both creator and critic are more powerful recurrent networks.

But all of this is orthogonal to his general theory of creativity and curiosity, which also dates back to the early 1990s. The theory does not care whether you implement it through neural nets or something else. It says that a reward-maximizing creator gets intrinsic reward for the learning progress (the wow-effects) of a separate predictor or encoder of the observation stream which is created through the actions of the creator.

Anyway, my general advice would be to do what all the other international teams did: Apply your system to data from the competitions mentioned in the interview. If you can beat the state of the art, you won’t have to complain about a lack of attention.
by Xavier
I completely agree that the inattention is self-imposed, which I find rather sad. As you can tell, I’m very enthusiastic about the potential of Thaler’s technology, and seeing it gather dust behind closed doors, so to speak, doesn’t do it justice in my opinion (though I know this is partially due to the fact that he is bound by governmental and industrial contracts not to reveal certain details about his projects). But I can’t force anyone to do what is good for them, which leaves me only the option of encouraging people to imagine what the principles behind it can do compared to what’s out there.
Well, the patent you speak of is just the simplest, smallest foundation of this AI and doesn’t represent the totality of the systems he works with. The systems of Werbos et al. seem to need back-propagation. Thaler found ways around this via non-algorithmic, self-interconnecting STANNOs that assemble into arbitrarily large cascades with billions of neurons, which don’t need prior musical training (DABUI).
Sorry for going off-topic again. Schmidhuber certainly is a brilliant man and I mean no disrespect to his work; he’s on the right track with neural networks and it’s nice to see more people appreciating them (I never thought that highly of algorithmic, symbolic AI).
by Gorden Russell
But will deep learning equip a self-driving auto to deal with traffic signs in redneck country that have been blasted by buckshot? Or ones in the ‘Hood covered with spray paint? Or ones anywhere else that have bumper stickers pasted on them?
by John Goodrich
A machine intelligence could possess all the visual properties of a human and probably recognize signs for what they are even when they are too badly damaged or obscured for a human to make out.
A machine would see a hexagonal stop sign fragment containing just the one corner with the letter “P” on it and KNOW what it is at the speed of light while a human would either not see it, not know what it was or in far fewer cases than a machine, would know what that fragment is and stop his car.
While some human thinking processes like writing poetry or creating art may (or may not) be possible for a machine intelligence, others that strictly involve visual or audio recognition will always be better done by machines.
by MrFriendly
Really nice interview.
by Les Elkind
Demonstrating intrinsic motivation that arises out of a system’s basic operation seems confirmatory to me that creativity, and even personhood, can be understood as emerging naturally from the ordering of the world. I am now officially a fan of Dr. Schmidhuber!
by star0
Wow! Sounds very impressive. And a very high-quality article, I might add. I can’t wait to hear what these guys come up with regarding their “artificial fovea” + deep NNs + FTF + Curiosity & Creativity.