An accessible introduction to the artificial intelligence technology that enables computer vision, speech recognition, machine translation, and driverless cars.
Deep learning is an artificial intelligence technology that enables computer vision, speech recognition in mobile phones, machine translation, AI games, driverless cars, and other applications. When we use consumer products from Google, Microsoft, Facebook, Apple, or Baidu, we are often interacting with a deep learning system. In this volume in the MIT Press Essential Knowledge series, computer scientist John Kelleher offers an accessible and concise but comprehensive introduction to the fundamental technology at the heart of the artificial intelligence revolution.
This book starts with setting up a Python virtual environment with the deep learning framework TensorFlow and then introduces the fundamental concepts of TensorFlow. Before moving on to Computer Vision, you will learn about neural networks and related aspects such as loss functions, gradient descent optimization, activation functions and how backpropagation works for training multi-layer perceptrons.
To understand how the Convolutional Neural Network (CNN) is used for computer vision problems, you need to learn about the basic convolution operation. You will learn how CNN is different from a multi-layer perceptron along with a thorough discussion on the different building blocks of the CNN architecture such as kernel size, stride, padding, and pooling and finally learn how to build a small CNN model.
Next, you will learn about different popular CNN architectures such as AlexNet, VGGNet, Inception, and ResNets along with different object detection algorithms such as RCNN, SSD, and YOLO. The book concludes with a chapter on sequential models where you will learn about RNN, GRU, and LSTMs and their architectures and understand their applications in machine translation, image/video captioning and video classification