
Deep Learning in the Spotlight

For some time now, the zeitgeist in artificial intelligence has revolved around deep learning. New application areas are conquered almost weekly, most recently with the published success of Google's AlphaGo program, which won all five rounds of a match against a European Go professional. The combination of deep learning methods and Monte Carlo tree search was the cornerstone of this milestone in Go, almost 20 years after the first victory over a chess professional. Because of the very complex decision patterns and the difficulty of evaluating moves in Go, the game was considered the last bastion of human players against artificial intelligence, and many experts had classified it as out of reach for another ten years. For more information on this self-training program, see the Google blog post.

Deep Learning - Training Deep Artificial Neural Networks on Very Large Data Sets to Evaluate New Data

Everyday contact points with deep learning now include, for example, face recognition in Facebook photos, speech recognition and processing by Siri, Google Now or Cortana, and the video processing of self-driving cars. A highly interactive application that connects people is the real-time translation of spoken language by the Skype Translator. In the very heterogeneous field of person-to-person communication, deep learning methods have improved machine speech recognition and machine translation so much that the achieved reduction in errors, together with a simultaneous increase in robustness, is what makes the application possible in the first place.

Deep Learning - Inferring Information That Is Not Explicitly Described in the Data

The training phases of the Skype Translator models used up to 3 billion very complex and diverse data points, such as translated websites, annotated videos and previously translated individual conversations. The models capture the relations they discover on their own and continue to develop themselves as new data arrives. Data volumes of this size, and the models that result from them, can now only be processed on a distributed, GPU-based infrastructure. By parallelizing the computation across any number of GPUs, the very time-consuming training phases for the connected layers of the deep network were reduced from several days to a few hours, which provides the interactivity required for testing the models.
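
To make the parallelization idea concrete: in data-parallel training, each GPU computes the gradient on its own shard of the mini-batch, and the shard gradients are averaged into a single update. The following purely illustrative numpy sketch simulates that pattern with array splits instead of real GPUs; the toy model and all names are our own, not the Skype Translator setup.

```python
import numpy as np

# Data-parallel SGD in miniature: split each step's data across simulated
# workers, compute partial gradients independently, then average them into
# one update. A real setup would place each shard on its own GPU and
# all-reduce the gradients; everything here is an illustrative stand-in.

rng = np.random.default_rng(1)
X = rng.normal(size=(4096, 16))
y = X @ rng.normal(size=16)          # noiseless linear target, for simplicity
w = np.zeros(16)
n_workers = 4                        # e.g. one shard per GPU

for step in range(50):
    shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    # Each "worker" computes the squared-error gradient on its own shard.
    grads = [2.0 * Xs.T @ (Xs @ w - ys) / len(ys) for Xs, ys in shards]
    w -= 0.05 * np.mean(grads, axis=0)   # averaged update, as if all-reduced

print("training loss:", float(np.mean((X @ w - y) ** 2)))
```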

A very vivid and simple introduction to deep learning - overview, applications, processes, models and tools - is available on the DeepLearning.TV YouTube channel.

Deep Learning - Which Toolkit Comes in Handy

The Computational Network Toolkit (CNTK) by Microsoft, released as open source a few days ago, serves as the basis of the Skype Translator and the Windows 10 voice assistant. CNTK enables distributed GPU computation across several machines and thus offers a unique capability among the other well-known deep learning libraries such as Caffe, Torch (Facebook), Theano, DL4J or TensorFlow (Google). At the same time, CNTK integrates into the Azure ecosystem, whose GPU instances become all the more attractive once scaling the model matters in the learning phase. CNTK thus competes directly with TensorFlow when it comes to linking standard tools with subsequent productive use in a cloud ecosystem.

According to Microsoft's internal benchmark, CNTK is currently the fastest deep learning tool for the methods it provides. A detailed introduction can be found in the NIPS tutorial from December 2015. The main factors here are the already integrated NVIDIA CUDA Deep Neural Network library (cuDNN) in version 4 and the optimized 1-bit stochastic gradient descent method (Frank Seide et al.), which, however, is under a Microsoft license. Most open source tools currently still use the older and slower cuDNN version 3 (e.g. Torch) or even version 2 (TensorFlow). Torch and TensorFlow are currently integrating cuDNN version 4 and should catch up with CNTK soon. Facebook AI Research (FAIR) further closed the gap in terms of speed by releasing several internal packages at the end of January 2016 and now offers additional parallelization options for models and data as well as the 1-bit stochastic gradient method (see this blog post). At the same time, iTorch, an interface between Torch and iPython, makes getting started with the tool easy. The most exciting question, however, is how quickly, and above all when, Google will bring distribution across several GPU machines into TensorFlow. Various blog comments suggest that this is scheduled for mid-2016, which would further strengthen what is already the strongest ecosystem in terms of documentation, tutorials and open source support.
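
The core idea of the 1-bit method can be sketched in a few lines: each gradient element is compressed to a single bit (its sign, rescaled to a representative magnitude), and the quantization error is kept locally and added to the next gradient so that, on average, nothing is lost. The single-worker numpy sketch below is a strong simplification of Seide et al.; the helper name and toy objective are ours, not CNTK's implementation.

```python
import numpy as np

def quantize_1bit(grad, residual):
    """Compress a gradient to 1 bit per element, carrying the error forward."""
    corrected = grad + residual          # add the error left over from last step
    scale = np.mean(np.abs(corrected))   # one shared magnitude for all elements
    quantized = np.where(corrected >= 0.0, scale, -scale)
    residual = corrected - quantized     # remember what the encoding lost
    return quantized, residual

# Toy use: plain SGD on f(w) = ||w||^2 with quantized gradients.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
residual = np.zeros_like(w)
for step in range(100):
    grad = 2.0 * w                       # exact gradient of ||w||^2
    q, residual = quantize_1bit(grad, residual)
    w -= 0.1 * q                         # a worker would transmit only q

print("final loss:", float(w @ w))
```

In the distributed setting only the quantized vector crosses the network, which cuts the gradient-exchange volume by roughly a factor of 32 compared with 32-bit floats; the error feedback is what keeps the compression from derailing convergence.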

A further exciting aspect of the CNTK benchmark is the chosen mini-batch size of the training data. The number of data points used per step of the SGD method was set to 8,192. This size is realistic for the typically small number of features in language processing; a test with smaller batch sizes such as 512 or 128 would nevertheless be very informative. In the SGD method, the gradient is averaged over the mini-batch, so its size determines how smoothed the gradient is and which path the minimization follows through the network. This, in turn, affects the speed, the accuracy and potential problems (e.g. overfitting) of the learning phases.
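
The smoothing effect is easy to make visible: since the mini-batch gradient is an average of per-example gradients, its deviation from the full-batch gradient shrinks roughly with the square root of the batch size. Below is a small numpy sketch with synthetic regression data; the sizes 128, 512 and 8,192 echo the ones discussed above.

```python
import numpy as np

# How the mini-batch size smooths the SGD gradient: estimate the gradient of
# a least-squares loss from mini-batches of different sizes and measure their
# spread around the full-batch gradient. Data and sizes are illustrative.

rng = np.random.default_rng(42)
n, d = 100_000, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
w = np.zeros(d)

full_grad = 2.0 * X.T @ (X @ w - y) / n      # gradient over the whole data set

for batch in (128, 512, 8192):
    deviations = []
    for _ in range(200):
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        g = 2.0 * Xb.T @ (Xb @ w - yb) / batch   # mini-batch gradient
        deviations.append(np.linalg.norm(g - full_grad))
    print(f"batch {batch:5d}: mean deviation from full gradient "
          f"{np.mean(deviations):.3f}")
```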

Deep Learning - Open Source Evaluation

A very clear comparison of the common open source tools in the deep learning environment, organized by models, interfaces, deployment, architecture and ecosystem, supports the choice of tool. The introductions to individual tools on the DeepLearning.TV YouTube channel also help enormously with understanding this area.

For the field of data science, the adoption of these tools and their simple integration with Apache Spark also remains exciting. As the article by Databricks shows, the combination of TensorFlow and Spark makes training and evaluating models of moderate size very easy and fast.
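
The pattern from the Databricks article can be sketched as follows: a grid of hyperparameters is fanned out to the Spark cluster, each task trains one small model, and the scores are collected on the driver. In the sketch below a plain numpy model stands in for the TensorFlow network; all names, parameters and data are illustrative assumptions.

```python
import numpy as np
from pyspark.sql import SparkSession

def train_and_score(lr, steps=200, seed=0):
    """Train a tiny linear model with gradient descent; return held-out loss."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(512, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=512)
    X_tr, y_tr, X_val, y_val = X[:400], y[:400], X[400:], y[400:]
    w = np.zeros(4)
    for _ in range(steps):
        g = 2.0 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        w -= lr * g
    return lr, float(np.mean((X_val @ w - y_val) ** 2))

spark = SparkSession.builder.appName("grid-search-sketch").getOrCreate()
candidates = [0.001, 0.01, 0.05, 0.1]        # learning rates to try
results = (spark.sparkContext
           .parallelize(candidates)          # one Spark task per candidate
           .map(train_and_score)
           .collect())
for lr, loss in sorted(results, key=lambda r: r[1]):
    print(f"lr={lr:<6} held-out loss={loss:.4f}")
spark.stop()
```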

If one of the following application areas is of interest, it is worth digging deeper into deep learning and reading the practical deep learning series in our upcoming blog posts:

Text

  • Sentiment Analysis - CRM or Social Media
  • Fraud Detection - Finance or Insurance
  • Threat Detection - Social Media or Management

Sound

  • Voice Recognition - Automotive or Internet of Things
  • Voice Search - Telecommunications and Telephone Manufacturers
  • Sentiment Analysis - CRM

Images

  • Image Search - Social Media
  • Machine Vision - Automotive or Aviation
  • Photo Clustering - Telephone Manufacturers

Video

  • Motion Detection - Gaming or UI/UX
  • Real Time Threat Detection - Safety

Time Series

  • Enterprise Resource Planning - Production or Supply Chain
  • Predictive Analytics - Internet of Things or Smart Home
  • Recommendation Engine - Electronic Commerce or Media
Your Contact
Dr. Michael Allgöwer
Management Consultant
Machine learning has been Michael's field of expertise for some time. He is convinced that high-quality machine learning requires an in-depth knowledge of the subject area and enjoys updating this knowledge on a regular basis. His most recent topic of interest is reinforcement learning.
#MachineLearning #ReinforcementLearning #HumanIntelligence