Mastering Java Machine Learning Copyright © 2017 Packt Publishing All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Credits
Authors: Dr. Uday Kamath, Krishna Choppella
Reviewers: Samir Sahli, Prashant Verma
Commissioning Editor: Veena Pagare
Acquisition Editor: Divya Poojari
Content Development Editor: Mayur Pawanikar
Technical Editor: Vivek Arora
Copy Editor: Safis Editing
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Francy Puthiry
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta
Cover Work: Arvindkumar Gupta
Foreword Dr. Uday Kamath is a volcano of ideas. Every time he walked into my office, we had fruitful and animated discussions. I have been a professor of computer science at George Mason University (GMU) for 15 years, specializing in machine learning and data mining. I have known Uday for five years, first as a student in my data mining class, then as a colleague and co-author of papers and projects on large-scale machine learning. While a chief data scientist at BAE Systems Applied Intelligence, Uday earned his PhD in evolutionary computation and machine learning. As if having two high-demand jobs was not enough, Uday was unusually prolific, publishing extensively with four different people in the computer science faculty during his tenure at GMU, something you don't see very often. Given this pedigree, I am not surprised that less than four years since Uday's graduation with a PhD, I am writing the foreword for his book on mastering advanced machine learning techniques with Java. Uday's thirst for new stimulating challenges has struck again, resulting in this terrific book you now have in your hands.
About the Authors Dr. Uday Kamath is the chief data scientist at BAE Systems Applied Intelligence. He specializes in scalable machine learning and has spent 20 years in the domain of AML, fraud detection in financial crime, cyber security, and bioinformatics, to name a few. Dr. Kamath is responsible for key products in areas focusing on the behavioral, social networking and big data machine learning aspects of analytics at BAE AI. He received his PhD at George Mason University, under the able guidance of Dr. Kenneth De Jong, where his dissertation research focused on machine learning for big data and automated sequence mining.
About the Reviewers Samir Sahli was awarded a BSc degree in applied mathematics and information sciences from the University of Nice Sophia-Antipolis, France, in 2004. He received MSc and PhD degrees in physics (specializing in optics/photonics/image science) from University Laval, Quebec, Canada, in 2008 and 2013, respectively. During his graduate studies, he worked with Defence Research and Development Canada (DRDC) on the automatic detection and recognition of targets in aerial imagery, especially in uncontrolled environments and under sub-optimal acquisition conditions. Since 2009, he has worked as a consultant for several companies based in Europe and North America that specialize in Intelligence, Surveillance, and Reconnaissance (ISR) and in remote sensing.
Customer Feedback Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1785880519.
Table of Contents
Preface
Chapter 1: Machine Learning Review
  Machine learning – history and definition
  What is not machine learning?
  Machine learning – concepts and terminology
  Machine learning – types and subtypes
  Datasets used in machine learning
  Machine learning applications
  Practical issues in machine learning
  Machine learning – roles and process
  Roles
  Process
  Machine learning – tools and datasets
  Datasets
  Summary
Chapter 2: Practical Approach to Real-World Supervised Learning
  Formal description and notation
  Data quality analysis
  Descriptive data analysis
  Basic label analysis
  Basic feature analysis
  Visualization analysis
  Univariate feature analysis
  Multivariate feature analysis
  Data transformation and preprocessing
  Feature construction
  Handling missing values
Preface There are many notable books on machine learning, from pedagogical tracts on the theory of learning from data; to standard references on specializations in the field, such as clustering and outlier detection or probabilistic graph modeling; to cookbooks that offer practical advice on the use of tools and libraries in a particular language. Books that are broad in coverage are often short on theoretical detail, while those that focus on one topic or tool may have little to say about, for example, how the approach differs between a streaming and a batch environment. Moreover, for non-novices who prefer tools in Java and want a single volume that extends their knowledge across all of these essential aspects at once, there are precious few options.