Recently, we have seen a huge boom in deep learning; it is being applied in a wide variety of fields, from driverless cars to product recommendation. In their most primitive form, deep learning algorithms originated in the 1960s. If the concept has been around for decades, why has widespread use begun only recently? The answer is simple: scalability and big data.
In the early days of big data, traditional machine learning algorithms were highly efficient and effective. This is because traditional algorithms such as random forests, support vector machines, and even logistic regression improve very quickly as you add data. However, their performance plateaus after a few million examples, and the marginal improvement from additional data becomes minimal. In contrast, as is shown in the following figure, deep neural networks continue to improve as larger amounts of data are added.
The explosion of big data and the availability of cheap computer power opened the door for increased use of deep neural networks. However, it is not always easy to get started using these new models. For example, one practical issue that comes with the creation of deep learning models is that there exists a huge number of network architecture categories from which to choose. In particular, there are four main categories of deep learning models:
- General dense neural networks
  - Multilayer perceptron
- Sequence models (1D)
  - Recurrent neural networks
  - Gated recurrent units
  - Long short-term memory (LSTM) networks
  - Attention models, etc.
- Image models (2D and 3D)
  - Convolutional neural networks
- Advanced/future tech
  - Unsupervised learning: sparse coding, ICA, SFA, etc.
  - Reinforcement learning
Selecting the correct category for a specific application is not straightforward; it is normally left entirely to the analyst to decide which path to take. Interestingly, today almost all industry value is driven by the first three categories, as they currently produce the best-performing models and products. However, if we look at the papers presented at the 2016 Neural Information Processing Systems Conference in Barcelona, most of the research was focused on the last group. This may suggest that the next big developments will center on unsupervised and reinforcement learning models.
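As a concrete illustration of the first category, a general dense network is simply a stack of fully connected layers: matrix multiplications followed by nonlinearities. The following is a minimal sketch in NumPy; the layer sizes and random weights are purely illustrative, not taken from any real model:

```python
import numpy as np

def relu(x):
    # Standard rectified-linear nonlinearity applied elementwise
    return np.maximum(0.0, x)

def dense_forward(x, layers):
    """Forward pass through a stack of fully connected layers."""
    for w, b in layers:
        x = relu(x @ w + b)
    return x

rng = np.random.default_rng(0)
# Illustrative architecture: 4 inputs -> 8 hidden units -> 3 outputs
layers = [(rng.normal(size=(4, 8)), np.zeros(8)),
          (rng.normal(size=(8, 3)), np.zeros(3))]
out = dense_forward(rng.normal(size=(2, 4)), layers)
print(out.shape)  # (2, 3): one 3-dimensional output per input row
```

In practice these weights would be learned by gradient descent rather than drawn at random; the sketch only shows the structure that distinguishes this category from the sequence and image models below.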
Apart from the use of huge datasets, another important trend that has appeared recently is the increased use of end-to-end learning models. Traditional modeling depends on feature engineering, in which the data scientist must rely on expert and external knowledge to create features relevant to a given problem.
This is true in many applications: for example, with speech recognition, the traditional model relies on hand-designed features, but modern approaches allow the model to learn all of the necessary internal interactions itself.
Similarly, modern self-driving car models are increasingly relying on end-to-end approaches.
We tested this idea on our own phishing URL classifier. Our previous approach to designing the program, as described in this blog post, was based on creating several handmade features that allowed us to describe phishing URL patterns. Creating the appropriate features was time consuming and required the help of several experts. More recently, we created a model using an LSTM deep neural network that did not require the manual creation of any features. This model outperformed the previous one by several percentage points.
AI Product Management
Finally, the last trend that is quickly gaining ground is the change in typical workflows with respect to how teams collaborate to build applications that use deep learning. In traditional software design, the product manager (PM) communicates their needs to the developer, who uses their skills and knowledge to fulfill the software requirements. In the case of AI product management, however, there is additional input that the PM must give to the AI engineer. In particular, it is up to the PM to provide dev/test datasets that they are certain will produce useful results. The PM must also provide the evaluation metric for the learning algorithm.
| Product Manager Responsibilities | AI Scientist/Engineer Responsibilities |
| --- | --- |
| Provide dev/test sets, ideally drawn from the same distribution | Acquire training data |
| Provide the evaluation metric for the learning algorithm | Develop a system that does well according to the provided metric on the dev/test sets |
PMs have been an integral part of product and software development in recent years. To keep delivering good results in this new era of AI-based products, PMs must evolve and take into account the particularities of these new products and technologies. It should be up to the PM to define and evaluate the performance of the algorithm on a realistic dataset.
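Concretely, the PM's deliverables can be as simple as a frozen dev/test split drawn from one distribution plus an agreed metric function handed to the engineer. The sketch below illustrates both; the choice of F1 as the metric is just an example, not a recommendation from the tutorial:

```python
import random

def split_dev_test(examples, dev_fraction=0.5, seed=42):
    """Shuffle once, then freeze a dev/test split drawn from the same distribution."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]

def f1_score(y_true, y_pred):
    """Evaluation metric the PM hands to the AI engineer (F1 is illustrative)."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

dev, test = split_dev_test(list(range(10)))
print(len(dev), len(test))  # 5 5
print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
```

Fixing the seed matters: the engineer should never be able to move examples between dev and test, and both sides should agree on the single number that defines success before any model is trained.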
This blog post is based on the NIPS 2016 tutorial by Andrew Ng, “Nuts and bolts of building AI applications using Deep Learning.” For further information, take a look at his book.