{"product_id":"machine-learning-with-spark-and-python-isbn-9781119561934","title":"Machine Learning with Spark and Python","description":"\u003cp\u003e\u003ci\u003eMachine Learning with Spark and Python: Essential Techniques for Predictive Analytics, Second Edition\u003c\/i\u003e simplifies machine learning for practical use by focusing on two key algorithm families. This second edition adds Spark, the Apache Software Foundation's framework for distributed data processing, which includes a machine learning library. With Spark, machine learning students can process much larger data sets and call Spark's algorithms from ordinary Python code.\u003cbr\u003e  \u003cbr\u003e \u003ci\u003eMachine Learning with Spark and Python\u003c\/i\u003e focuses on two algorithm families (linear methods and ensemble methods) that effectively predict outcomes. This type of problem covers many use cases, such as choosing which ad to place on a web page, predicting prices in securities markets, or detecting credit card fraud. Focusing on two families leaves enough room for full descriptions of the mechanisms at work in the algorithms, and the code examples then illustrate that machinery with specific, hackable code.\u003c\/p\u003e \u003cp\u003eIntroduction xxi\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 1 The Two Essential Algorithms for Making Predictions 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eWhy are These Two Algorithms So Useful? 2\u003c\/p\u003e \u003cp\u003eWhat are Penalized Regression Methods? 7\u003c\/p\u003e \u003cp\u003eWhat are Ensemble Methods? 
9\u003c\/p\u003e \u003cp\u003eHow to Decide Which Algorithm to Use 11\u003c\/p\u003e \u003cp\u003eThe Process Steps for Building a Predictive Model 13\u003c\/p\u003e \u003cp\u003eFraming a Machine Learning Problem 15\u003c\/p\u003e \u003cp\u003eFeature Extraction and Feature Engineering 17\u003c\/p\u003e \u003cp\u003eDetermining Performance of a Trained Model 18\u003c\/p\u003e \u003cp\u003eChapter Contents and Dependencies 18\u003c\/p\u003e \u003cp\u003eSummary 20\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 2 Understand the Problem by Understanding the Data 23\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eThe Anatomy of a New Problem 24\u003c\/p\u003e \u003cp\u003eDifferent Types of Attributes and Labels Drive Modeling Choices 26\u003c\/p\u003e \u003cp\u003eThings to Notice about Your New Data Set 27\u003c\/p\u003e \u003cp\u003eClassification Problems: Detecting Unexploded Mines Using Sonar 28\u003c\/p\u003e \u003cp\u003ePhysical Characteristics of the Rocks Versus Mines Data Set 29\u003c\/p\u003e \u003cp\u003eStatistical Summaries of the Rocks Versus Mines Data Set 32\u003c\/p\u003e \u003cp\u003eVisualization of Outliers Using a Quantile-Quantile Plot 34\u003c\/p\u003e \u003cp\u003eStatistical Characterization of Categorical Attributes 35\u003c\/p\u003e \u003cp\u003eHow to Use Python Pandas to Summarize the Rocks Versus Mines Data Set 36\u003c\/p\u003e \u003cp\u003eVisualizing Properties of the Rocks Versus Mines Data Set 39\u003c\/p\u003e \u003cp\u003eVisualizing with Parallel Coordinates Plots 39\u003c\/p\u003e \u003cp\u003eVisualizing Interrelationships between Attributes and Labels 41\u003c\/p\u003e \u003cp\u003eVisualizing Attribute and Label Correlations Using a Heat Map 48\u003c\/p\u003e \u003cp\u003eSummarizing the Process for Understanding the Rocks Versus Mines Data Set 50\u003c\/p\u003e \u003cp\u003eReal-Valued Predictions with Factor Variables: How Old is Your Abalone? 
50\u003c\/p\u003e \u003cp\u003eParallel Coordinates for Regression Problems—Visualize Variable Relationships for the Abalone Problem 55\u003c\/p\u003e \u003cp\u003eHow to Use a Correlation Heat Map for Regression—Visualize Pair-Wise Correlations for the Abalone Problem 59\u003c\/p\u003e \u003cp\u003eReal-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes 61\u003c\/p\u003e \u003cp\u003eMulticlass Classification Problem: What Type of Glass is That? 67\u003c\/p\u003e \u003cp\u003eUsing PySpark to Understand Large Data Sets 72\u003c\/p\u003e \u003cp\u003eSummary 75\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data 77\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eThe Basic Problem: Understanding Function Approximation 78\u003c\/p\u003e \u003cp\u003eWorking with Training Data 79\u003c\/p\u003e \u003cp\u003eAssessing Performance of Predictive Models 81\u003c\/p\u003e \u003cp\u003eFactors Driving Algorithm Choices and Performance—Complexity and Data 82\u003c\/p\u003e \u003cp\u003eContrast between a Simple Problem and a Complex Problem 82\u003c\/p\u003e \u003cp\u003eContrast between a Simple Model and a Complex Model 85\u003c\/p\u003e \u003cp\u003eFactors Driving Predictive Algorithm Performance 89\u003c\/p\u003e \u003cp\u003eChoosing an Algorithm: Linear or Nonlinear? 
90\u003c\/p\u003e \u003cp\u003eMeasuring the Performance of Predictive Models 91\u003c\/p\u003e \u003cp\u003ePerformance Measures for Different Types of Problems 91\u003c\/p\u003e \u003cp\u003eSimulating Performance of Deployed Models 105\u003c\/p\u003e \u003cp\u003eAchieving Harmony between Model and Data 107\u003c\/p\u003e \u003cp\u003eChoosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size 107\u003c\/p\u003e \u003cp\u003eUsing Forward Stepwise Regression to Control Overfitting 109\u003c\/p\u003e \u003cp\u003eEvaluating and Understanding Your Predictive Model 114\u003c\/p\u003e \u003cp\u003eControl Overfitting by Penalizing Regression Coefficients—Ridge Regression 116\u003c\/p\u003e \u003cp\u003eUsing PySpark for Training Penalized Regression Models on Extremely Large Data Sets 124\u003c\/p\u003e \u003cp\u003eSummary 127\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 4 Penalized Linear Regression 129\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eWhy Penalized Linear Regression Methods are So Useful 130\u003c\/p\u003e \u003cp\u003eExtremely Fast Coefficient Estimation 130\u003c\/p\u003e \u003cp\u003eVariable Importance Information 131\u003c\/p\u003e \u003cp\u003eExtremely Fast Evaluation When Deployed 131\u003c\/p\u003e \u003cp\u003eReliable Performance 131\u003c\/p\u003e \u003cp\u003eSparse Solutions 132\u003c\/p\u003e \u003cp\u003eProblem May Require Linear Model 132\u003c\/p\u003e \u003cp\u003eWhen to Use Ensemble Methods 132\u003c\/p\u003e \u003cp\u003ePenalized Linear Regression: Regulating Linear Regression for Optimum Performance 132\u003c\/p\u003e \u003cp\u003eTraining Linear Models: Minimizing Errors and More 135\u003c\/p\u003e \u003cp\u003eAdding a Coefficient Penalty to the OLS Formulation 136\u003c\/p\u003e \u003cp\u003eOther Useful Coefficient Penalties—Manhattan and ElasticNet 137\u003c\/p\u003e \u003cp\u003eWhy Lasso Penalty Leads to Sparse Coefficient Vectors 138\u003c\/p\u003e \u003cp\u003eElasticNet Penalty Includes 
Both Lasso and Ridge 140\u003c\/p\u003e \u003cp\u003eSolving the Penalized Linear Regression Problem 141\u003c\/p\u003e \u003cp\u003eUnderstanding Least Angle Regression and Its Relationship to Forward Stepwise Regression 141\u003c\/p\u003e \u003cp\u003eHow LARS Generates Hundreds of Models of Varying Complexity 145\u003c\/p\u003e \u003cp\u003eChoosing the Best Model from the Hundreds LARS Generates 147\u003c\/p\u003e \u003cp\u003eUsing Glmnet: Very Fast and Very General 152\u003c\/p\u003e \u003cp\u003eComparison of the Mechanics of Glmnet and LARS Algorithms 153\u003c\/p\u003e \u003cp\u003eInitializing and Iterating the Glmnet Algorithm 153\u003c\/p\u003e \u003cp\u003eExtension of Linear Regression to Classification Problems 157\u003c\/p\u003e \u003cp\u003eSolving Classification Problems with Penalized Regression 157\u003c\/p\u003e \u003cp\u003eWorking with Classification Problems Having More Than Two Outcomes 161\u003c\/p\u003e \u003cp\u003eUnderstanding Basis Expansion: Using Linear Methods on Nonlinear Problems 161\u003c\/p\u003e \u003cp\u003eIncorporating Non-Numeric Attributes into Linear Methods 163\u003c\/p\u003e \u003cp\u003eSummary 166\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 5 Building Predictive Models Using Penalized Linear Methods 169\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003ePython Packages for Penalized Linear Regression 170\u003c\/p\u003e \u003cp\u003eMultivariable Regression: Predicting Wine Taste 171\u003c\/p\u003e \u003cp\u003eBuilding and Testing a Model to Predict Wine Taste 172\u003c\/p\u003e \u003cp\u003eTraining on the Whole Data Set before Deployment 175\u003c\/p\u003e \u003cp\u003eBasis Expansion: Improving Performance by Creating New Variables from Old Ones 179\u003c\/p\u003e \u003cp\u003eBinary Classification: Using Penalized Linear Regression to Detect Unexploded Mines 182\u003c\/p\u003e \u003cp\u003eBuild a Rocks Versus Mines Classifier for Deployment 191\u003c\/p\u003e \u003cp\u003eMulticlass Classification: Classifying 
Crime Scene Glass Samples 200\u003c\/p\u003e \u003cp\u003eLinear Regression and Classification Using PySpark 203\u003c\/p\u003e \u003cp\u003eUsing PySpark to Predict Wine Taste 204\u003c\/p\u003e \u003cp\u003eLogistic Regression with PySpark: Rocks Versus Mines 208\u003c\/p\u003e \u003cp\u003eIncorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings 213\u003c\/p\u003e \u003cp\u003eMulticlass Logistic Regression with Meta Parameter Optimization 217\u003c\/p\u003e \u003cp\u003eSummary 219\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 6 Ensemble Methods 221\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eBinary Decision Trees 222\u003c\/p\u003e \u003cp\u003eHow a Binary Decision Tree Generates Predictions 224\u003c\/p\u003e \u003cp\u003eHow to Train a Binary Decision Tree 225\u003c\/p\u003e \u003cp\u003eTree Training Equals Split Point Selection 227\u003c\/p\u003e \u003cp\u003eHow Split Point Selection Affects Predictions 228\u003c\/p\u003e \u003cp\u003eAlgorithm for Selecting Split Points 229\u003c\/p\u003e \u003cp\u003eMultivariable Tree Training—Which Attribute to Split? 229\u003c\/p\u003e \u003cp\u003eRecursive Splitting for More Tree Depth 230\u003c\/p\u003e \u003cp\u003eOverfitting Binary Trees 231\u003c\/p\u003e \u003cp\u003eMeasuring Overfit with Binary Trees 231\u003c\/p\u003e \u003cp\u003eBalancing Binary Tree Complexity for Best Performance 232\u003c\/p\u003e \u003cp\u003eModifications for Classification and Categorical Features 235\u003c\/p\u003e \u003cp\u003eBootstrap Aggregation: “Bagging” 235\u003c\/p\u003e \u003cp\u003eHow Does the Bagging Algorithm Work? 
236\u003c\/p\u003e \u003cp\u003eBagging Performance—Bias Versus Variance 239\u003c\/p\u003e \u003cp\u003eHow Bagging Behaves on Multivariable Problem 241\u003c\/p\u003e \u003cp\u003eBagging Needs Tree Depth for Performance 245\u003c\/p\u003e \u003cp\u003eSummary of Bagging 246\u003c\/p\u003e \u003cp\u003eGradient Boosting 246\u003c\/p\u003e \u003cp\u003eBasic Principle of Gradient Boosting Algorithm 246\u003c\/p\u003e \u003cp\u003eParameter Settings for Gradient Boosting 249\u003c\/p\u003e \u003cp\u003eHow Gradient Boosting Iterates toward a Predictive Model 249\u003c\/p\u003e \u003cp\u003eGetting the Best Performance from Gradient Boosting 250\u003c\/p\u003e \u003cp\u003eGradient Boosting on a Multivariable Problem 253\u003c\/p\u003e \u003cp\u003eSummary for Gradient Boosting 256\u003c\/p\u003e \u003cp\u003eRandom Forests 256\u003c\/p\u003e \u003cp\u003eRandom Forests: Bagging Plus Random Attribute Subsets 259\u003c\/p\u003e \u003cp\u003eRandom Forests Performance Drivers 260\u003c\/p\u003e \u003cp\u003eRandom Forests Summary 261\u003c\/p\u003e \u003cp\u003eSummary 262\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 7 Building Ensemble Models with Python 265\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eSolving Regression Problems with Python Ensemble Packages 265\u003c\/p\u003e \u003cp\u003eUsing Gradient Boosting to Predict Wine Taste 266\u003c\/p\u003e \u003cp\u003eUsing the Class Constructor for GradientBoostingRegressor 266\u003c\/p\u003e \u003cp\u003eUsing GradientBoostingRegressor to Implement a Regression Model 268\u003c\/p\u003e \u003cp\u003eAssessing the Performance of a Gradient Boosting Model 271\u003c\/p\u003e \u003cp\u003eBuilding a Random Forest Model to Predict Wine Taste 272\u003c\/p\u003e \u003cp\u003eConstructing a RandomForestRegressor Object 273\u003c\/p\u003e \u003cp\u003eModeling Wine Taste with RandomForestRegressor 275\u003c\/p\u003e \u003cp\u003eVisualizing the Performance of a Random Forest Regression Model 279\u003c\/p\u003e 
\u003cp\u003eIncorporating Non-Numeric Attributes in Python Ensemble Models 279\u003c\/p\u003e \u003cp\u003eCoding the Sex of Abalone for Gradient Boosting Regression in Python 280\u003c\/p\u003e \u003cp\u003eAssessing Performance and the Importance of Coded Variables with Gradient Boosting 282\u003c\/p\u003e \u003cp\u003eCoding the Sex of Abalone for Input to Random Forest Regression in Python 284\u003c\/p\u003e \u003cp\u003eAssessing Performance and the Importance of Coded Variables 287\u003c\/p\u003e \u003cp\u003eSolving Binary Classification Problems with Python Ensemble Methods 288\u003c\/p\u003e \u003cp\u003eDetecting Unexploded Mines with Python Gradient Boosting 288\u003c\/p\u003e \u003cp\u003eDetermining the Performance of a Gradient Boosting Classifier 291\u003c\/p\u003e \u003cp\u003eDetecting Unexploded Mines with Python Random Forest 292\u003c\/p\u003e \u003cp\u003eConstructing a Random Forest Model to Detect Unexploded Mines 294\u003c\/p\u003e \u003cp\u003eDetermining the Performance of a Random Forest Classifier 298\u003c\/p\u003e \u003cp\u003eSolving Multiclass Classification Problems with Python Ensemble Methods 300\u003c\/p\u003e \u003cp\u003eDealing with Class Imbalances 301\u003c\/p\u003e \u003cp\u003eClassifying Glass Using Gradient Boosting 301\u003c\/p\u003e \u003cp\u003eDetermining the Performance of the Gradient Boosting Model on Glass Classification 306\u003c\/p\u003e \u003cp\u003eClassifying Glass with Random Forests 307\u003c\/p\u003e \u003cp\u003eDetermining the Performance of the Random Forest Model on Glass Classification 310\u003c\/p\u003e \u003cp\u003eSolving Regression Problems with PySpark Ensemble Packages 311\u003c\/p\u003e \u003cp\u003ePredicting Wine Taste with PySpark Ensemble Methods 312\u003c\/p\u003e \u003cp\u003ePredicting Abalone Age with PySpark Ensemble Methods 317\u003c\/p\u003e \u003cp\u003eDistinguishing Mines from Rocks with PySpark Ensemble Methods 321\u003c\/p\u003e 
\u003cp\u003eIdentifying Glass Types with PySpark Ensemble Methods 325\u003c\/p\u003e \u003cp\u003eSummary 327\u003c\/p\u003e \u003cp\u003eIndex 329\u003c\/p\u003e  \u003cp\u003e\u003cb\u003eMICHAEL BOWLES\u003c\/b\u003e teaches machine learning at UC Berkeley, the University of New Haven, and Hacker Dojo in Silicon Valley, consults on machine learning projects, and is involved in a number of startups in areas such as semiconductor inspection, drug design and optimization, and trading in the financial markets. Following an assistant professorship at MIT, Michael went on to found and run two Silicon Valley startups, both of which went public. His courses are always popular and receive great feedback from participants.   \u003c\/p\u003e\u003cp\u003e\u003cb\u003eA SIMPLE, EFFECTIVE WAY TO ANALYZE DATA AND PREDICT OUTCOMES WITH PYTHON\u003c\/b\u003e \u003c\/p\u003e\u003cp\u003eMachine learning focuses on prediction: using what you know to predict what you would like to know, based on historical relationships between the two. At its core, it's a mathematical\/algorithm-based technology that, until recently, required a deep understanding of math and statistical concepts, and fluency in R and other specialized languages. \u003ci\u003eMachine Learning with Spark™ and Python\u003csup\u003e®\u003c\/sup\u003e\u003c\/i\u003e simplifies machine learning for a broader audience and wider application by focusing on two algorithm families that effectively predict outcomes, and by showing you how to apply them using the popular and accessible Python programming language. This edition shows how PySpark extends these two algorithm families to extremely large data sets that require many distributed processors. The same basic concepts apply. \u003c\/p\u003e\u003cp\u003eAuthor Michael Bowles draws from years of machine learning expertise to walk you through the design, construction, and implementation of your own machine learning solutions. 
The algorithms are explained in simple terms with no complex math, and sample code is provided to help you get started right away. You'll delve deep into the mechanisms behind the constructs and learn how to select and apply the algorithm that will best solve the problem at hand, whether simple or complex. Detailed examples illustrate the machinery with specific, hackable code, and descriptive coverage of penalized linear regression and ensemble methods helps you understand the fundamental processes at work in machine learning. The methods are effective and well tested, and the results speak for themselves. \u003c\/p\u003e\u003cp\u003eDesigned specifically for those without a specialized math or statistics background, \u003ci\u003eMachine Learning with Spark and Python\u003c\/i\u003e shows you how to: \u003c\/p\u003e\u003cul\u003e \u003cli\u003eSelect the right algorithm for the job\u003c\/li\u003e \u003cli\u003eLearn the mechanisms and prepare the data\u003c\/li\u003e \u003cli\u003eScale PySpark implementations to big data using hundreds of processors\u003c\/li\u003e \u003cli\u003eMaster core Python machine learning packages\u003c\/li\u003e \u003cli\u003eBuild versatile predictive models that work\u003c\/li\u003e \u003cli\u003eApply trained models in practice for various uses\u003c\/li\u003e \u003cli\u003eMeasure model performance for better quality control and application\u003c\/li\u003e \u003cli\u003eUse the provided sample code in Jupyter Notebook format to design and build your own model\u003c\/li\u003e \u003c\/ul\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47989549433061,"sku":"NP9781119561934","price":50.0,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9781119561934.jpg?v=1761784554","url":"https:\/\/k12savings.com\/es\/products\/machine-learning-with-spark-and-python-isbn-9781119561934","provider":"K12savings","version":"1.0","type":"link"}