In the broad sense, Artificial Intelligence (AI) refers to the capability of machines to imitate intelligent human behavior. To do so, AI systems process large amounts of observations in order to learn how to tackle complex problems. Over the last few years there has been a surge of interest in AI problems and applications in sectors such as transport, healthcare and industry. This interest is largely due to the development and deployment of advanced AI systems that can beat grandmasters at the game of Go or reason over complex driving contexts to enable autonomous vehicles.
The development of such advanced systems has been enabled by the evolution of computing and storage, which makes it possible to train them on very large datasets and to execute their algorithms very quickly. Most of these systems take advantage of deep learning, a segment of machine learning that leverages deep neural networks, i.e. neural networks with many hidden layers. Deep neural networks can identify complex and unusual patterns in data that are beyond the reach of classical machine learning algorithms. For example, convolutional neural networks (CNNs) can detect and analyze very complex patterns in visual imagery. CNNs are inspired by biological processes, since they connect their neurons in a way that resembles the organization of the animal visual cortex. As another example of deep learning techniques, recurrent neural networks (RNNs) are appropriate for analyzing temporal dynamics, since they use their internal state (i.e. their memory) to process entire sequences of inputs. This makes them well suited to artificial intelligence problems like handwriting recognition and speech processing. Overall, deep neural networks tackle some very challenging artificial intelligence problems, i.e. problems that typically require human-like intelligence. The following paragraphs present some of the most popular of these problems.
1. Colorization of Black and White Images
Image colorization is the challenging artificial intelligence problem of adding colors to black and white (B/W) photographs. This is typically a task undertaken by humans and is the foundation for many interesting projects, like the revival and relaunch of classic B/W films. Deep learning techniques based on convolutional neural networks are nowadays able to deal with the colorization problem. CNNs are efficient at identifying the boundaries of objects in an image. However, in the colorization case this is not enough: the image has to be recreated with color added to the various objects. This is achieved with very large convolutional neural networks trained in a supervised fashion. These networks leverage images and libraries built around the popular ImageNet database for the problem of automatic colorization.
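To give a feel for how a convolutional layer picks up object boundaries, here is a minimal sketch in plain Python. The toy image, the hand-picked kernel and the function name `convolve2d` are assumptions made for illustration only; a real CNN learns its kernels from data and stacks many such layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and return the response map.

    As in most CNN libraries, the kernel is applied without flipping
    (strictly speaking, this is a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x6 grayscale image: dark region (0) on the left, bright region (9) on the right.
IMAGE = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-picked vertical-edge kernel (an assumption; a CNN would learn its own kernels).
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(IMAGE, VERTICAL_EDGE)
print(response)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The response is zero over the flat regions and large only where the window straddles the dark-to-bright boundary. Finding boundaries this way is only the first step; colorization additionally has to predict a color for every pixel inside those boundaries.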
2. Adding Sounds to Silent Movies
Another challenging task that deep learning can solve is sound synthesis for silent videos. The process is based on a deep learning model that associates silent video frames with pre-recorded sounds that match the scenes displayed in them. The model takes advantage of a large number of example videos with sound, containing scenes similar to the ones in the silent video. For example, when the silent video clip depicts an object being hit, the deep learning algorithm produces a sound that is both relevant and realistic enough to fool human viewers. Indeed, similar systems have been evaluated with human participants, who were unable to determine which video clips contained real sounds and which ones synthesized sounds. The system employs a combination of convolutional neural networks and recurrent neural networks as a means of alleviating the limitations of early-stage neural networks.
In addition to “giving life” to silent movies, such deep learning models are expected to open new horizons in media, where they can be used to automatically produce sound effects for movies and TV shows. They will also advance the perceptive capabilities of robots, which will be able to use sound similarity metrics in order to understand an object’s context and properties.
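The step of matching a silent frame to a pre-recorded sound can be pictured as a similarity lookup. The sketch below assumes that some model has already turned both the frame and each library sound into a short feature vector; the vectors, the sound names and the choice of cosine similarity are all hypothetical and not taken from any specific system.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_matching_sound(frame_features, sound_library):
    """Return the name of the library sound most similar to the frame features."""
    return max(
        sound_library,
        key=lambda name: cosine_similarity(frame_features, sound_library[name]),
    )


# Hypothetical 3-number feature vectors; a real model would produce
# much longer, learned representations.
SOUND_LIBRARY = {
    "wood_knock": [0.9, 0.1, 0.0],
    "water_splash": [0.1, 0.8, 0.3],
    "leaf_rustle": [0.0, 0.2, 0.9],
}

frame = [0.8, 0.2, 0.1]  # made-up features for a frame showing an object being hit
match = best_matching_sound(frame, SOUND_LIBRARY)
print(match)  # wood_knock
```

In this picture the CNN and RNN would be responsible for producing good feature vectors, while the lookup itself stays simple.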
3. Object Classification in Photographs
There are also many applications that take advantage of multimedia content retrieval (e.g., object search in images) instead of conventional textual search. One of the main tasks in such artificial intelligence problems is the classification of objects within a photograph or image against a set of previously known objects. Very large CNNs have recently proven very effective at object classification. They are also used for a variation of this problem, namely object detection, which is even closer to content-based retrieval. Object detection is foundational for a great number of applications, from security to autonomous cars. It is what we usually see in images where boxes are drawn around a wide array of detected objects.
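Classification against a set of known objects usually ends with the network producing one score per class and picking the most probable one. The following sketch shows that last step only, using a softmax over made-up scores; the labels, the scores and the function names are assumptions for illustration, and the CNN that would normally compute the scores is left out.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Hypothetical known classes and made-up scores standing in for
# the output of a CNN's final layer.
LABELS = ["cat", "dog", "car"]
SCORES = [1.0, 3.0, 0.5]

label, confidence = classify(SCORES, LABELS)
print(label, round(confidence, 2))
```

An object detector applies the same idea to each candidate region of the image, which is how every drawn box ends up with a label and a confidence value.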
4. Automatic Handwriting Generation
How about computers and devices that can write like humans? This is no longer science fiction, as there are deep learning solutions that automatically generate handwriting based on a corpus of handwriting samples used for training. In particular, the deep learning solution uses the corpus to learn the relationship between the pen movement and the letters that this movement produces. Following this learning, the deep learning system is able to generate new handwriting data. Note that available systems can be trained to produce different writing styles.
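To make "pen movement" a little more concrete, one common way to store handwriting is as a sequence of pen offsets plus a flag that says when the pen leaves the paper. The sketch below assumes that representation (the source does not specify one) and simply converts such a sequence back into strokes with absolute coordinates; the function name and the sample data are hypothetical.

```python
def offsets_to_strokes(offsets, start=(0, 0)):
    """Convert (dx, dy, pen_up) offsets into strokes made of absolute (x, y) points.

    The pen is assumed to touch the paper at `start`. `pen_up` set to 1 means
    the pen is lifted after reaching the point, so the next point begins a new stroke.
    """
    x, y = start
    strokes = []
    current = [(x, y)]
    for dx, dy, pen_up in offsets:
        x, y = x + dx, y + dy
        current.append((x, y))
        if pen_up:
            # The pen leaves the paper here, so this stroke is complete.
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


# Made-up sample: an "L"-like stroke, then a separate vertical bar.
SAMPLE = [(2, 0, 0), (0, 3, 1), (4, -3, 0), (0, 3, 1)]

strokes = offsets_to_strokes(SAMPLE)
print(strokes)  # [[(0, 0), (2, 0), (2, 3)], [(6, 0), (6, 3)]]
```

Under this assumption, a generative model would be trained to predict the next offset and pen flag from the ones seen so far, and a routine like the one above would turn its output into something that can be drawn.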
5. Automatic Machine Translation
Nowadays most of us have used an automatic machine translation system at least once. Such systems translate words, phrases and sentences from one language to another without any human intervention. Automatic machine translation systems have been around for nearly two decades. However, recent advances in deep learning have significantly improved the effectiveness of automatic translation of text and of automatic translation of images. In particular, modern deep learning models based on very large RNNs can perform text translation without any preprocessing of the word sequence, as they can learn the dependencies between the words and their mapping to a new language. Likewise, CNNs can be used to process images that contain letters. In that case, the letters are converted to text, the text is translated, and new images containing the translated text are recreated. The entire process is much more efficient and accurate than in the past, thanks to the availability of large corpora of training data in the target languages. Training data availability is key to success: in most cases it is even more important than having an effective deep learning model or algorithm. Hence, a lack of training data is usually one of the main limitations of deep learning systems.
The above use cases are probably only the starting point for applying deep learning to artificial intelligence problems through systems that exhibit human-like intelligence. Nevertheless, they are indicative of the advanced capabilities offered by deep neural networks. Combining several of these capabilities will make more complex and more interesting use cases possible, such as autonomous driving, which involves the continuous detection of objects in video in order to understand and anticipate the driving context. Such use cases combine multiple neural networks in order to alleviate the limitations of simple deep learning systems. By and large, when it comes to the possibilities opened up by deep learning, the sky is the limit.
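The way a recurrent network tracks dependencies between words comes down to an internal state that is updated once per input. The following is a deliberately tiny sketch of that mechanism with a single number as the state; the fixed weights, the numeric word codes and the function names are assumptions for illustration, whereas a real translation model learns large weight matrices and word embeddings from data.

```python
import math


def rnn_step(state, x, w_state=0.5, w_input=1.0):
    """One recurrent step: mix the previous state with the current input."""
    return math.tanh(w_state * state + w_input * x)


def run_rnn(sequence, initial_state=0.0):
    """Feed the sequence one element at a time, carrying the state forward."""
    state = initial_state
    for x in sequence:
        state = rnn_step(state, x)
    return state


# Hypothetical numeric codes for three words; a real system would use
# learned embedding vectors instead of single numbers.
forward = run_rnn([0.1, 0.5, 0.9])
backward = run_rnn([0.9, 0.5, 0.1])

# Same words, different order, different final state.
print(forward != backward)  # True
```

Because each step sees the state left behind by the previous words, the final state depends on word order, which is the property a translation model relies on.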