{"product_id":"data-mining-techniques-isbn-9780470650936","title":"Data Mining Techniques","description":"The leading introductory book on data mining, fully updated and revised! \u003cp\u003eWhen Berry and Linoff wrote the first edition of \u003ci\u003eData Mining Techniques\u003c\/i\u003e in the late 1990s, data mining was just starting to move out of the lab and into the office. It has since grown to become an indispensable tool of modern business. This new edition, more than 50% new and revised, is a significant update from the previous one and shows you how to harness the newest data mining methods and techniques to solve common business problems. The duo of unparalleled authors share invaluable advice for improving response rates to direct marketing campaigns, identifying new customer segments, and estimating credit risk. In addition, they cover more advanced topics such as preparing data for analysis and creating the necessary infrastructure for data mining at your company. \u003c\/p\u003e \u003cul\u003e \u003cli\u003eFeatures significant updates since the previous edition and updates you on best practices for using data mining methods and techniques for solving common business problems\u003c\/li\u003e \u003cli\u003eCovers a new data mining technique in every chapter along with clear, concise explanations on how to apply each technique immediately\u003c\/li\u003e \u003cli\u003eTouches on core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, survival analysis, and more\u003c\/li\u003e \u003cli\u003eProvides best practices for performing data mining using simple tools such as Excel\u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003e\u003ci\u003eData Mining Techniques, Third Edition\u003c\/i\u003e covers a new data mining technique with each successive chapter and then demonstrates how you can apply that technique for improved marketing, sales, and customer support to get immediate 
results.\u003c\/p\u003e \u003cp\u003eIntroduction xxxvii\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 1 What Is Data Mining and Why Do It? 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eWhat Is Data Mining? 2\u003c\/p\u003e \u003cp\u003eData Mining Is a Business Process 2\u003c\/p\u003e \u003cp\u003eLarge Amounts of Data 3\u003c\/p\u003e \u003cp\u003eMeaningful Patterns and Rules 3\u003c\/p\u003e \u003cp\u003eData Mining and Customer Relationship Management 4\u003c\/p\u003e \u003cp\u003eWhy Now? 6\u003c\/p\u003e \u003cp\u003eData Is Being Produced 6\u003c\/p\u003e \u003cp\u003eData Is Being Warehoused 6\u003c\/p\u003e \u003cp\u003eComputing Power Is Affordable 7\u003c\/p\u003e \u003cp\u003eInterest in Customer Relationship Management Is Strong 7\u003c\/p\u003e \u003cp\u003eCommercial Data Mining Software Products Have Become Available 8\u003c\/p\u003e \u003cp\u003eSkills for the Data Miner 9\u003c\/p\u003e \u003cp\u003eThe Virtuous Cycle of Data Mining 9\u003c\/p\u003e \u003cp\u003eA Case Study in Business Data Mining 11\u003c\/p\u003e \u003cp\u003eIdentifying BofA’s Business Challenge 12\u003c\/p\u003e \u003cp\u003eApplying Data Mining 12\u003c\/p\u003e \u003cp\u003eActing on the Results 13\u003c\/p\u003e \u003cp\u003eMeasuring the Effects of Data Mining 14\u003c\/p\u003e \u003cp\u003eSteps of the Virtuous Cycle 15\u003c\/p\u003e \u003cp\u003eIdentify Business Opportunities 16\u003c\/p\u003e \u003cp\u003eTransform Data into Information 17\u003c\/p\u003e \u003cp\u003eAct on the Information 19\u003c\/p\u003e \u003cp\u003eMeasure the Results 20\u003c\/p\u003e \u003cp\u003eData Mining in the Context of the Virtuous Cycle 23\u003c\/p\u003e \u003cp\u003eLessons Learned 26\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 2 Data Mining Applications in Marketing and Customer Relationship Management 27\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eTwo Customer Lifecycles 27\u003c\/p\u003e \u003cp\u003eThe Customer’s Lifecycle 28\u003c\/p\u003e \u003cp\u003eThe Customer 
Lifecycle 28\u003c\/p\u003e \u003cp\u003eSubscription Relationships versus Event-Based Relationships 30\u003c\/p\u003e \u003cp\u003eOrganize Business Processes Around the Customer Lifecycle 32\u003c\/p\u003e \u003cp\u003eCustomer Acquisition 33\u003c\/p\u003e \u003cp\u003eCustomer Activation 36\u003c\/p\u003e \u003cp\u003eCustomer Relationship Management 37\u003c\/p\u003e \u003cp\u003eWinback 38\u003c\/p\u003e \u003cp\u003eData Mining Applications for Customer Acquisition 38\u003c\/p\u003e \u003cp\u003eIdentifying Good Prospects 39\u003c\/p\u003e \u003cp\u003eChoosing a Communication Channel 39\u003c\/p\u003e \u003cp\u003ePicking Appropriate Messages 40\u003c\/p\u003e \u003cp\u003eA Data Mining Example: Choosing the Right Place to Advertise 40\u003c\/p\u003e \u003cp\u003eWho Fits the Profile? 41\u003c\/p\u003e \u003cp\u003eMeasuring Fitness for Groups of Readers 44\u003c\/p\u003e \u003cp\u003eData Mining to Improve Direct Marketing Campaigns 45\u003c\/p\u003e \u003cp\u003eResponse Modeling 46\u003c\/p\u003e \u003cp\u003eOptimizing Response for a Fixed Budget 47\u003c\/p\u003e \u003cp\u003eOptimizing Campaign Profitability 49\u003c\/p\u003e \u003cp\u003eReaching the People Most Influenced by the Message 53\u003c\/p\u003e \u003cp\u003eUsing Current Customers to Learn About Prospects 54\u003c\/p\u003e \u003cp\u003eStart Tracking Customers Before They Become “Customers” 55\u003c\/p\u003e \u003cp\u003eGather Information from New Customers 55\u003c\/p\u003e \u003cp\u003eAcquisition-Time Variables Can Predict Future Outcomes 56\u003c\/p\u003e \u003cp\u003eData Mining Applications for Customer Relationship Management 56\u003c\/p\u003e \u003cp\u003eMatching Campaigns to Customers 56\u003c\/p\u003e \u003cp\u003eReducing Exposure to Credit Risk 58\u003c\/p\u003e \u003cp\u003eDetermining Customer Value 59\u003c\/p\u003e \u003cp\u003eCross-selling, Up-selling, and Making Recommendations 60\u003c\/p\u003e \u003cp\u003eRetention 60\u003c\/p\u003e \u003cp\u003eRecognizing 
Attrition 60\u003c\/p\u003e \u003cp\u003eWhy Attrition Matters 61\u003c\/p\u003e \u003cp\u003eDifferent Kinds of Attrition 62\u003c\/p\u003e \u003cp\u003eDifferent Kinds of Attrition Model 63\u003c\/p\u003e \u003cp\u003eBeyond the Customer Lifecycle 64\u003c\/p\u003e \u003cp\u003eLessons Learned 65\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 3 The Data Mining Process 67\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eWhat Can Go Wrong? 68\u003c\/p\u003e \u003cp\u003eLearning Things That Aren’t True 68\u003c\/p\u003e \u003cp\u003eLearning Things That Are True, but Not Useful 73\u003c\/p\u003e \u003cp\u003eData Mining Styles 74\u003c\/p\u003e \u003cp\u003eHypothesis Testing 75\u003c\/p\u003e \u003cp\u003eDirected Data Mining 81\u003c\/p\u003e \u003cp\u003eUndirected Data Mining 81\u003c\/p\u003e \u003cp\u003eGoals, Tasks, and Techniques 82\u003c\/p\u003e \u003cp\u003eData Mining Business Goals 82\u003c\/p\u003e \u003cp\u003eData Mining Tasks 83\u003c\/p\u003e \u003cp\u003eData Mining Techniques 88\u003c\/p\u003e \u003cp\u003eFormulating Data Mining Problems: From Goals to Tasks to Techniques 88\u003c\/p\u003e \u003cp\u003eWhat Techniques for Which Tasks? 95\u003c\/p\u003e \u003cp\u003eIs There a Target or Targets? 96\u003c\/p\u003e \u003cp\u003eWhat Is the Target Data Like? 96\u003c\/p\u003e \u003cp\u003eWhat Is the Input Data Like? 96\u003c\/p\u003e \u003cp\u003eHow Important Is Ease of Use? 97\u003c\/p\u003e \u003cp\u003eHow Important Is Model Explicability? 
97\u003c\/p\u003e \u003cp\u003eLessons Learned 98\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 4 Statistics 101: What You Should Know About Data 101\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eOccam’s Razor 103\u003c\/p\u003e \u003cp\u003eSkepticism and Simpson’s Paradox 103\u003c\/p\u003e \u003cp\u003eThe Null Hypothesis 104\u003c\/p\u003e \u003cp\u003eP-Values 105\u003c\/p\u003e \u003cp\u003eLooking At and Measuring Data 106\u003c\/p\u003e \u003cp\u003eCategorical Values 106\u003c\/p\u003e \u003cp\u003eNumeric Variables 117\u003c\/p\u003e \u003cp\u003eA Couple More Statistical Ideas 120\u003c\/p\u003e \u003cp\u003eMeasuring Response 120\u003c\/p\u003e \u003cp\u003eStandard Error of a Proportion 121\u003c\/p\u003e \u003cp\u003eComparing Results Using Confidence Bounds 123\u003c\/p\u003e \u003cp\u003eComparing Results Using Difference of Proportions 124\u003c\/p\u003e \u003cp\u003eSize of Sample 125\u003c\/p\u003e \u003cp\u003eWhat the Confidence Interval Really Means 126\u003c\/p\u003e \u003cp\u003eSize of Test and Control for an Experiment 127\u003c\/p\u003e \u003cp\u003eMultiple Comparisons 129\u003c\/p\u003e \u003cp\u003eThe Confidence Level with Multiple Comparisons 129\u003c\/p\u003e \u003cp\u003eBonferroni’s Correction 129\u003c\/p\u003e \u003cp\u003eChi-Square Test 130\u003c\/p\u003e \u003cp\u003eExpected Values 130\u003c\/p\u003e \u003cp\u003eChi-Square Value 132\u003c\/p\u003e \u003cp\u003eComparison of Chi-Square to Difference of Proportions 134\u003c\/p\u003e \u003cp\u003eAn Example: Chi-Square for Regions and Starts 134\u003c\/p\u003e \u003cp\u003eCase Study: Comparing Two Recommendation Systems with an A\/B Test 138\u003c\/p\u003e \u003cp\u003eFirst Metric: Participating Sessions 140\u003c\/p\u003e \u003cp\u003eData Mining and Statistics 144\u003c\/p\u003e \u003cp\u003eLessons Learned 148\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 5 Descriptions and Prediction: Profiling and Predictive Modeling 151\u003c\/b\u003e\u003c\/p\u003e 
\u003cp\u003eDirected Data Mining Models 152\u003c\/p\u003e \u003cp\u003eDefining the Model Structure and Target 152\u003c\/p\u003e \u003cp\u003eIncremental Response Modeling 154\u003c\/p\u003e \u003cp\u003eModel Stability 156\u003c\/p\u003e \u003cp\u003eTime-Frames in the Model Set 157\u003c\/p\u003e \u003cp\u003eDirected Data Mining Methodology 159\u003c\/p\u003e \u003cp\u003eStep 1: Translate the Business Problem into a Data Mining Problem 161\u003c\/p\u003e \u003cp\u003eHow Will Results Be Used? 163\u003c\/p\u003e \u003cp\u003eHow Will Results Be Delivered? 163\u003c\/p\u003e \u003cp\u003eThe Role of Domain Experts and Information Technology 164\u003c\/p\u003e \u003cp\u003eStep 2: Select Appropriate Data 165\u003c\/p\u003e \u003cp\u003eWhat Data Is Available? 166\u003c\/p\u003e \u003cp\u003eHow Much Data Is Enough? 167\u003c\/p\u003e \u003cp\u003eHow Much History Is Required? 167\u003c\/p\u003e \u003cp\u003eHow Many Variables? 168\u003c\/p\u003e \u003cp\u003eWhat Must the Data Contain? 
168\u003c\/p\u003e \u003cp\u003eStep 3: Get to Know the Data 169\u003c\/p\u003e \u003cp\u003eExamine Distributions 169\u003c\/p\u003e \u003cp\u003eCompare Values with Descriptions 170\u003c\/p\u003e \u003cp\u003eValidate Assumptions 170\u003c\/p\u003e \u003cp\u003eAsk Lots of Questions 171\u003c\/p\u003e \u003cp\u003eStep 4: Create a Model Set 172\u003c\/p\u003e \u003cp\u003eAssembling Customer Signatures 172\u003c\/p\u003e \u003cp\u003eCreating a Balanced Sample 172\u003c\/p\u003e \u003cp\u003eIncluding Multiple Timeframes 174\u003c\/p\u003e \u003cp\u003eCreating a Model Set for Prediction 174\u003c\/p\u003e \u003cp\u003eCreating a Model Set for Profiling 176\u003c\/p\u003e \u003cp\u003ePartitioning the Model Set 176\u003c\/p\u003e \u003cp\u003eStep 5: Fix Problems with the Data 177\u003c\/p\u003e \u003cp\u003eCategorical Variables with Too Many Values 177\u003c\/p\u003e \u003cp\u003eNumeric Variables with Skewed Distributions and Outliers 178\u003c\/p\u003e \u003cp\u003eMissing Values 178\u003c\/p\u003e \u003cp\u003eValues with Meanings That Change over Time 179\u003c\/p\u003e \u003cp\u003eInconsistent Data Encoding 179\u003c\/p\u003e \u003cp\u003eStep 6: Transform Data to Bring Information to the Surface 180\u003c\/p\u003e \u003cp\u003eStep 7: Build Models 180\u003c\/p\u003e \u003cp\u003eStep 8: Assess Models 180\u003c\/p\u003e \u003cp\u003eAssessing Binary Response Models and Classifiers 181\u003c\/p\u003e \u003cp\u003eAssessing Binary Response Models Using Lift 182\u003c\/p\u003e \u003cp\u003eAssessing Binary Response Model Scores Using Lift Charts 184\u003c\/p\u003e \u003cp\u003eAssessing Binary Response Model Scores Using Profitability Models 185\u003c\/p\u003e \u003cp\u003eAssessing Binary Response Models Using ROC Charts 186\u003c\/p\u003e \u003cp\u003eAssessing Estimators 188\u003c\/p\u003e \u003cp\u003eAssessing Estimators Using Score Rankings 189\u003c\/p\u003e \u003cp\u003eStep 9: Deploy Models 190\u003c\/p\u003e \u003cp\u003ePractical Issues in 
Deploying Models 190\u003c\/p\u003e \u003cp\u003eOptimizing Models for Deployment 191\u003c\/p\u003e \u003cp\u003eStep 10: Assess Results 191\u003c\/p\u003e \u003cp\u003eStep 11: Begin Again 193\u003c\/p\u003e \u003cp\u003eLessons Learned 193\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 6 Data Mining Using Classic Statistical Techniques 195\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eSimilarity Models 196\u003c\/p\u003e \u003cp\u003eSimilarity and Distance 196\u003c\/p\u003e \u003cp\u003eExample: A Similarity Model for Product Penetration 197\u003c\/p\u003e \u003cp\u003eTable Lookup Models 203\u003c\/p\u003e \u003cp\u003eChoosing Dimensions 204\u003c\/p\u003e \u003cp\u003ePartitioning the Dimensions 205\u003c\/p\u003e \u003cp\u003eFrom Training Data to Scores 205\u003c\/p\u003e \u003cp\u003eHandling Sparse and Missing Data by Removing Dimensions 205\u003c\/p\u003e \u003cp\u003eRFM: A Widely Used Lookup Model 206\u003c\/p\u003e \u003cp\u003eRFM Cell Migration 207\u003c\/p\u003e \u003cp\u003eRFM and the Test-and-Measure Methodology 208\u003c\/p\u003e \u003cp\u003eRFM and Incremental Response Modeling 209\u003c\/p\u003e \u003cp\u003eNaïve Bayesian Models 210\u003c\/p\u003e \u003cp\u003eSome Ideas from Probability 210\u003c\/p\u003e \u003cp\u003eThe Naïve Bayesian Calculation 212\u003c\/p\u003e \u003cp\u003eComparison with Table Lookup Models 213\u003c\/p\u003e \u003cp\u003eLinear Regression 213\u003c\/p\u003e \u003cp\u003eThe Best-fit Line 215\u003c\/p\u003e \u003cp\u003eGoodness of Fit 217\u003c\/p\u003e \u003cp\u003eMultiple Regression 220\u003c\/p\u003e \u003cp\u003eThe Equation 220\u003c\/p\u003e \u003cp\u003eThe Range of the Target Variable 221\u003c\/p\u003e \u003cp\u003eInterpreting Coefficients of Linear Regression Equations 221\u003c\/p\u003e \u003cp\u003eCapturing Local Effects with Linear Regression 223\u003c\/p\u003e \u003cp\u003eAdditional Considerations with Multiple Regression 224\u003c\/p\u003e \u003cp\u003eVariable Selection for Multiple Regression 
225\u003c\/p\u003e \u003cp\u003eLogistic Regression 227\u003c\/p\u003e \u003cp\u003eModeling Binary Outcomes 227\u003c\/p\u003e \u003cp\u003eThe Logistic Function 229\u003c\/p\u003e \u003cp\u003eFixed Effects and Hierarchical Effects 231\u003c\/p\u003e \u003cp\u003eHierarchical Effects 232\u003c\/p\u003e \u003cp\u003eWithin and Between Effects 232\u003c\/p\u003e \u003cp\u003eFixed Effects 233\u003c\/p\u003e \u003cp\u003eLessons Learned 234\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 7 Decision Trees 237\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eWhat Is a Decision Tree and How Is It Used? 238\u003c\/p\u003e \u003cp\u003eA Typical Decision Tree 238\u003c\/p\u003e \u003cp\u003eUsing the Tree to Learn About Churn 240\u003c\/p\u003e \u003cp\u003eUsing the Tree to Learn About Data and Select Variables 241\u003c\/p\u003e \u003cp\u003eUsing the Tree to Produce Rankings 243\u003c\/p\u003e \u003cp\u003eUsing the Tree to Estimate Class Probabilities 243\u003c\/p\u003e \u003cp\u003eUsing the Tree to Classify Records 244\u003c\/p\u003e \u003cp\u003eUsing the Tree to Estimate Numeric Values 244\u003c\/p\u003e \u003cp\u003eDecision Trees Are Local Models 245\u003c\/p\u003e \u003cp\u003eGrowing Decision Trees 247\u003c\/p\u003e \u003cp\u003eFinding the Initial Split 248\u003c\/p\u003e \u003cp\u003eGrowing the Full Tree 251\u003c\/p\u003e \u003cp\u003eFinding the Best Split 252\u003c\/p\u003e \u003cp\u003eGini (Population Diversity) as a Splitting Criterion 253\u003c\/p\u003e \u003cp\u003eEntropy Reduction or Information Gain as a Splitting Criterion 254\u003c\/p\u003e \u003cp\u003eInformation Gain Ratio 256\u003c\/p\u003e \u003cp\u003eChi-Square Test as a Splitting Criterion 256\u003c\/p\u003e \u003cp\u003eIncremental Response as a Splitting Criterion 258\u003c\/p\u003e \u003cp\u003eReduction in Variance as a Splitting Criterion for Numeric Targets 259\u003c\/p\u003e \u003cp\u003eF Test 262\u003c\/p\u003e \u003cp\u003ePruning 262\u003c\/p\u003e \u003cp\u003eThe CART Pruning 
Algorithm 263\u003c\/p\u003e \u003cp\u003ePessimistic Pruning: The C5.0 Pruning Algorithm 267\u003c\/p\u003e \u003cp\u003eStability-Based Pruning 268\u003c\/p\u003e \u003cp\u003eExtracting Rules from Trees 269\u003c\/p\u003e \u003cp\u003eDecision Tree Variations 270\u003c\/p\u003e \u003cp\u003eMultiway Splits 270\u003c\/p\u003e \u003cp\u003eSplitting on More Than One Field at a Time 271\u003c\/p\u003e \u003cp\u003eCreating Nonrectangular Boxes 271\u003c\/p\u003e \u003cp\u003eAssessing the Quality of a Decision Tree 275\u003c\/p\u003e \u003cp\u003eWhen Are Decision Trees Appropriate? 276\u003c\/p\u003e \u003cp\u003eCase Study: Process Control in a Coffee Roasting Plant 277\u003c\/p\u003e \u003cp\u003eGoals for the Simulator 277\u003c\/p\u003e \u003cp\u003eBuilding a Roaster Simulation 278\u003c\/p\u003e \u003cp\u003eEvaluation of the Roaster Simulation 278\u003c\/p\u003e \u003cp\u003eLessons Learned 279\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 8 Artificial Neural Networks 281\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eA Bit of History 282\u003c\/p\u003e \u003cp\u003eThe Biological Model 283\u003c\/p\u003e \u003cp\u003eThe Biological Neuron 285\u003c\/p\u003e \u003cp\u003eThe Biological Input Layer 286\u003c\/p\u003e \u003cp\u003eThe Biological Output Layer 287\u003c\/p\u003e \u003cp\u003eNeural Networks and Artificial Intelligence 287\u003c\/p\u003e \u003cp\u003eArtificial Neural Networks 288\u003c\/p\u003e \u003cp\u003eThe Artificial Neuron 288\u003c\/p\u003e \u003cp\u003eThe Multi-Layer Perceptron 291\u003c\/p\u003e \u003cp\u003eA Network Example 292\u003c\/p\u003e \u003cp\u003eNetwork Topologies 293\u003c\/p\u003e \u003cp\u003eA Sample Application: Real Estate Appraisal 295\u003c\/p\u003e \u003cp\u003eTraining Neural Networks 299\u003c\/p\u003e \u003cp\u003eHow Does a Neural Network Learn Using Back Propagation? 
299\u003c\/p\u003e \u003cp\u003ePruning a Neural Network 300\u003c\/p\u003e \u003cp\u003eRadial Basis Function Networks 303\u003c\/p\u003e \u003cp\u003eOverview of RBF Networks 303\u003c\/p\u003e \u003cp\u003eChoosing the Locations of the Radial Basis Functions 305\u003c\/p\u003e \u003cp\u003eUniversal Approximators 305\u003c\/p\u003e \u003cp\u003eNeural Networks in Practice 308\u003c\/p\u003e \u003cp\u003eChoosing the Training Set 309\u003c\/p\u003e \u003cp\u003eCoverage of Values for All Features 309\u003c\/p\u003e \u003cp\u003eNumber of Features 310\u003c\/p\u003e \u003cp\u003eSize of Training Set 310\u003c\/p\u003e \u003cp\u003eNumber and Range of Outputs 310\u003c\/p\u003e \u003cp\u003eRules of Thumb for Using MLPs 310\u003c\/p\u003e \u003cp\u003ePreparing the Data 311\u003c\/p\u003e \u003cp\u003eInterpreting the Output from a Neural Network 313\u003c\/p\u003e \u003cp\u003eNeural Networks for Time Series 315\u003c\/p\u003e \u003cp\u003eTime Series Modeling 315\u003c\/p\u003e \u003cp\u003eA Neural Network Time Series Example 316\u003c\/p\u003e \u003cp\u003eCan Neural Network Models Be Explained? 
317\u003c\/p\u003e \u003cp\u003eSensitivity Analysis 318\u003c\/p\u003e \u003cp\u003eUsing Rules to Describe the Scores 318\u003c\/p\u003e \u003cp\u003eLessons Learned 319\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering 321\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eMemory-Based Reasoning 322\u003c\/p\u003e \u003cp\u003eLook-Alike Models 323\u003c\/p\u003e \u003cp\u003eExample: Using MBR to Estimate Rents in Tuxedo, New York 324\u003c\/p\u003e \u003cp\u003eChallenges of MBR 327\u003c\/p\u003e \u003cp\u003eChoosing a Balanced Set of Historical Records 328\u003c\/p\u003e \u003cp\u003eRepresenting the Training Data 328\u003c\/p\u003e \u003cp\u003eDetermining the Distance Function, Combination Function, and Number of Neighbors 331\u003c\/p\u003e \u003cp\u003eCase Study: Using MBR for Classifying Anomalies in Mammograms 331\u003c\/p\u003e \u003cp\u003eThe Business Problem: Identifying Abnormal Mammograms 332\u003c\/p\u003e \u003cp\u003eApplying MBR to the Problem 332\u003c\/p\u003e \u003cp\u003eThe Total Solution 334\u003c\/p\u003e \u003cp\u003eMeasuring Distance and Similarity 335\u003c\/p\u003e \u003cp\u003eWhat Is a Distance Function? 
335\u003c\/p\u003e \u003cp\u003eBuilding a Distance Function One Field at a Time 337\u003c\/p\u003e \u003cp\u003eDistance Functions for Other Data Types 340\u003c\/p\u003e \u003cp\u003eWhen a Distance Metric Already Exists 341\u003c\/p\u003e \u003cp\u003eThe Combination Function: Asking the Neighbors for Advice 342\u003c\/p\u003e \u003cp\u003eThe Simplest Approach: One Neighbor 342\u003c\/p\u003e \u003cp\u003eThe Basic Approach for Categorical Targets: Democracy 342\u003c\/p\u003e \u003cp\u003eWeighted Voting for Categorical Targets 344\u003c\/p\u003e \u003cp\u003eNumeric Targets 344\u003c\/p\u003e \u003cp\u003eCase Study: Shazam — Finding Nearest Neighbors for Audio Files 345\u003c\/p\u003e \u003cp\u003eWhy This Feat Is Challenging 346\u003c\/p\u003e \u003cp\u003eThe Audio Signature 347\u003c\/p\u003e \u003cp\u003eMeasuring Similarity 348\u003c\/p\u003e \u003cp\u003eCollaborative Filtering: A Nearest-Neighbor Approach to Making Recommendations 351\u003c\/p\u003e \u003cp\u003eBuilding Profiles 352\u003c\/p\u003e \u003cp\u003eComparing Profiles 352\u003c\/p\u003e \u003cp\u003eMaking Predictions 353\u003c\/p\u003e \u003cp\u003eLessons Learned 354\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers 357\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eCustomer Survival 360\u003c\/p\u003e \u003cp\u003eWhat Survival Curves Reveal 360\u003c\/p\u003e \u003cp\u003eFinding the Average Tenure from a Survival Curve 362\u003c\/p\u003e \u003cp\u003eCustomer Retention Using Survival 364\u003c\/p\u003e \u003cp\u003eLooking at Survival as Decay 365\u003c\/p\u003e \u003cp\u003eHazard Probabilities 367\u003c\/p\u003e \u003cp\u003eThe Basic Idea 368\u003c\/p\u003e \u003cp\u003eExamples of Hazard Functions 369\u003c\/p\u003e \u003cp\u003eCensoring 371\u003c\/p\u003e \u003cp\u003eThe Hazard Calculation 372\u003c\/p\u003e \u003cp\u003eOther Types of Censoring 375\u003c\/p\u003e \u003cp\u003eFrom Hazards to Survival 376\u003c\/p\u003e 
\u003cp\u003eRetention 376\u003c\/p\u003e \u003cp\u003eSurvival 378\u003c\/p\u003e \u003cp\u003eComparison of Retention and Survival 378\u003c\/p\u003e \u003cp\u003eProportional Hazards 380\u003c\/p\u003e \u003cp\u003eExamples of Proportional Hazards 381\u003c\/p\u003e \u003cp\u003eStratification: Measuring Initial Effects on Survival 382\u003c\/p\u003e \u003cp\u003eCox Proportional Hazards 382\u003c\/p\u003e \u003cp\u003eSurvival Analysis in Practice 385\u003c\/p\u003e \u003cp\u003eHandling Different Types of Attrition 385\u003c\/p\u003e \u003cp\u003eWhen Will a Customer Come Back? 387\u003c\/p\u003e \u003cp\u003eUnderstanding Customer Value 389\u003c\/p\u003e \u003cp\u003eForecasting 392\u003c\/p\u003e \u003cp\u003eHazards Changing over Time 393\u003c\/p\u003e \u003cp\u003eLessons Learned 394\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 11 Genetic Algorithms and Swarm Intelligence 397\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eOptimization 398\u003c\/p\u003e \u003cp\u003eWhat Is an Optimization Problem? 
398\u003c\/p\u003e \u003cp\u003eAn Optimization Problem in Ant World 399\u003c\/p\u003e \u003cp\u003eE Pluribus Unum 400\u003c\/p\u003e \u003cp\u003eA Smarter Ant 401\u003c\/p\u003e \u003cp\u003eGenetic Algorithms 403\u003c\/p\u003e \u003cp\u003eA Bit of History 404\u003c\/p\u003e \u003cp\u003eGenetics on Computers 404\u003c\/p\u003e \u003cp\u003eRepresenting the Genome 413\u003c\/p\u003e \u003cp\u003eSchemata: The Building Blocks of Genetic Algorithms 414\u003c\/p\u003e \u003cp\u003eBeyond the Simple Algorithm 417\u003c\/p\u003e \u003cp\u003eThe Traveling Salesman Problem 418\u003c\/p\u003e \u003cp\u003eExhaustive Search 419\u003c\/p\u003e \u003cp\u003eA Simple Greedy Algorithm 419\u003c\/p\u003e \u003cp\u003eThe Genetic Algorithms Approach 419\u003c\/p\u003e \u003cp\u003eThe Swarm Intelligence Approach 420\u003c\/p\u003e \u003cp\u003eCase Study: Using Genetic Algorithms for Resource Optimization 421\u003c\/p\u003e \u003cp\u003eCase Study: Evolving a Solution for Classifying Complaints 423\u003c\/p\u003e \u003cp\u003eBusiness Context 424\u003c\/p\u003e \u003cp\u003eData 425\u003c\/p\u003e \u003cp\u003eThe Comment Signature 425\u003c\/p\u003e \u003cp\u003eThe Genomes 426\u003c\/p\u003e \u003cp\u003eThe Fitness Function 427\u003c\/p\u003e \u003cp\u003eThe Results 427\u003c\/p\u003e \u003cp\u003eLessons Learned 427\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 12 Tell Me Something New: Pattern Discovery and Data Mining 429\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eUndirected Techniques, Undirected Data Mining 431\u003c\/p\u003e \u003cp\u003eUndirected versus Directed Techniques 431\u003c\/p\u003e \u003cp\u003eUndirected versus Directed Data Mining 431\u003c\/p\u003e \u003cp\u003eCase Study: Undirected Data Mining Using Directed Techniques 432\u003c\/p\u003e \u003cp\u003eWhat is Undirected Data Mining? 
435\u003c\/p\u003e \u003cp\u003eData Exploration 435\u003c\/p\u003e \u003cp\u003eSegmentation and Clustering 436\u003c\/p\u003e \u003cp\u003eTarget Variable Definition, When the Target Is Not Explicit 438\u003c\/p\u003e \u003cp\u003eSimulation, Forecasting, and Agent-Based Modeling 443\u003c\/p\u003e \u003cp\u003eMethodology for Undirected Data Mining 455\u003c\/p\u003e \u003cp\u003eThere Is No Methodology 456\u003c\/p\u003e \u003cp\u003eThings to Keep in Mind 456\u003c\/p\u003e \u003cp\u003eLessons Learned 457\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 13 Finding Islands of Similarity: Automatic Cluster Detection 459\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eSearching for Islands of Simplicity 461\u003c\/p\u003e \u003cp\u003eCustomer Segmentation and Clustering 461\u003c\/p\u003e \u003cp\u003eSimilarity Clusters 463\u003c\/p\u003e \u003cp\u003eTracking Campaigns by Cluster-Based Segments 464\u003c\/p\u003e \u003cp\u003eClustering Reveals an Overlooked Market Segment 466\u003c\/p\u003e \u003cp\u003eFitting the Troops 467\u003c\/p\u003e \u003cp\u003eThe K-Means Clustering Algorithm 468\u003c\/p\u003e \u003cp\u003eTwo Steps of the K-Means Algorithm 468\u003c\/p\u003e \u003cp\u003eVoronoi Diagrams and K-Means Clusters 471\u003c\/p\u003e \u003cp\u003eChoosing the Cluster Seeds 473\u003c\/p\u003e \u003cp\u003eChoosing K 473\u003c\/p\u003e \u003cp\u003eUsing K-Means to Detect Outliers 474\u003c\/p\u003e \u003cp\u003eSemi-Directed Clustering 475\u003c\/p\u003e \u003cp\u003eInterpreting Clusters 475\u003c\/p\u003e \u003cp\u003eCharacterizing Clusters by Their Centroids 476\u003c\/p\u003e \u003cp\u003eCharacterizing Clusters by What Differentiates Them 477\u003c\/p\u003e \u003cp\u003eUsing Decision Trees to Describe Clusters 478\u003c\/p\u003e \u003cp\u003eEvaluating Clusters 479\u003c\/p\u003e \u003cp\u003eCluster Measurements and Terminology 480\u003c\/p\u003e \u003cp\u003eCluster Silhouettes 480\u003c\/p\u003e \u003cp\u003eLimiting Cluster Diameter for Scoring 
483\u003c\/p\u003e \u003cp\u003eCase Study: Clustering Towns 484\u003c\/p\u003e \u003cp\u003eCreating Town Signatures 484\u003c\/p\u003e \u003cp\u003eCreating Clusters 486\u003c\/p\u003e \u003cp\u003eDetermining the Right Number of Clusters 486\u003c\/p\u003e \u003cp\u003eEvaluating the Clusters 487\u003c\/p\u003e \u003cp\u003eUsing Demographic Clusters to Adjust Zone Boundaries 488\u003c\/p\u003e \u003cp\u003eBusiness Success 490\u003c\/p\u003e \u003cp\u003eVariations on K-Means 490\u003c\/p\u003e \u003cp\u003eK-Medians, K-Medoids, and K-Modes 490\u003c\/p\u003e \u003cp\u003eThe Soft Side of K-Means 494\u003c\/p\u003e \u003cp\u003eData Preparation for Clustering 495\u003c\/p\u003e \u003cp\u003eScaling for Consistency 496\u003c\/p\u003e \u003cp\u003eUse Weights to Encode Outside Information 496\u003c\/p\u003e \u003cp\u003eSelecting Variables for Clustering 497\u003c\/p\u003e \u003cp\u003eLessons Learned 497\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 14 Alternative Approaches to Cluster Detection 499\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eShortcomings of K-Means 500\u003c\/p\u003e \u003cp\u003eReasonableness 500\u003c\/p\u003e \u003cp\u003eAn Intuitive Example 501\u003c\/p\u003e \u003cp\u003eFixing the Problem by Changing the Scales 503\u003c\/p\u003e \u003cp\u003eWhat This Means in Practice 504\u003c\/p\u003e \u003cp\u003eGaussian Mixture Models 505\u003c\/p\u003e \u003cp\u003eAdding “Gaussians” to K-Means 505\u003c\/p\u003e \u003cp\u003eBack to Gaussian Mixture Models 508\u003c\/p\u003e \u003cp\u003eScoring GMMs 510\u003c\/p\u003e \u003cp\u003eApplying GMMs 511\u003c\/p\u003e \u003cp\u003eDivisive Clustering 513\u003c\/p\u003e \u003cp\u003eA Decision Tree–Like Method for Clustering 513\u003c\/p\u003e \u003cp\u003eScoring Divisive Clusters 515\u003c\/p\u003e \u003cp\u003eClusters and Trees 515\u003c\/p\u003e \u003cp\u003eAgglomerative (Hierarchical) Clustering 516\u003c\/p\u003e \u003cp\u003eOverview of Agglomerative Clustering Methods 516\u003c\/p\u003e 
\u003cp\u003eClustering People by Age: An Example of An Agglomerative Clustering Algorithm 520\u003c\/p\u003e \u003cp\u003eScoring Agglomerative Clusters 522\u003c\/p\u003e \u003cp\u003eLimitations of Agglomerative Clustering 523\u003c\/p\u003e \u003cp\u003eAgglomerative Clustering in Practice 525\u003c\/p\u003e \u003cp\u003eCombining Agglomerative Clustering and K-Means 526\u003c\/p\u003e \u003cp\u003eSelf-Organizing Maps 527\u003c\/p\u003e \u003cp\u003eWhat Is a Self-Organizing Map? 527\u003c\/p\u003e \u003cp\u003eTraining an SOM 530\u003c\/p\u003e \u003cp\u003eScoring an SOM 531\u003c\/p\u003e \u003cp\u003eThe Search Continues for Islands of Simplicity 532\u003c\/p\u003e \u003cp\u003eLessons Learned 533\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 15 Market Basket Analysis and Association Rules 535\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eDefining Market Basket Analysis 536\u003c\/p\u003e \u003cp\u003eFour Levels of Market Basket Data 537\u003c\/p\u003e \u003cp\u003eThe Foundation of Market Basket Analysis: Basic Measures 539\u003c\/p\u003e \u003cp\u003eOrder Characteristics 540\u003c\/p\u003e \u003cp\u003eItem (Product) Popularity 541\u003c\/p\u003e \u003cp\u003eTracking Marketing Interventions 542\u003c\/p\u003e \u003cp\u003eCase Study: Spanish or English 543\u003c\/p\u003e \u003cp\u003eThe Business Problem 543\u003c\/p\u003e \u003cp\u003eThe Data 544\u003c\/p\u003e \u003cp\u003eDefining “Hispanicity” Preference 545\u003c\/p\u003e \u003cp\u003eThe Solution 546\u003c\/p\u003e \u003cp\u003eAssociation Analysis 547\u003c\/p\u003e \u003cp\u003eRules Are Not Always Useful 548\u003c\/p\u003e \u003cp\u003eItem Sets to Association Rules 551\u003c\/p\u003e \u003cp\u003eHow Good Is an Association Rule? 
553\u003c\/p\u003e \u003cp\u003eBuilding Association Rules 555\u003c\/p\u003e \u003cp\u003eChoosing the Right Set of Items 556\u003c\/p\u003e \u003cp\u003eAnonymous Versus Identified 561\u003c\/p\u003e \u003cp\u003eGenerating Rules from All This Data 561\u003c\/p\u003e \u003cp\u003eOvercoming Practical Limits 565\u003c\/p\u003e \u003cp\u003eThe Problem of Big Data 567\u003c\/p\u003e \u003cp\u003eExtending the Ideas 569\u003c\/p\u003e \u003cp\u003eDifferent Items on the Right- and Left-Hand Sides 569\u003c\/p\u003e \u003cp\u003eUsing Association Rules to Compare Stores 570\u003c\/p\u003e \u003cp\u003eAssociation Rules and Cross-Selling 572\u003c\/p\u003e \u003cp\u003eA Typical Cross-Sell Model 572\u003c\/p\u003e \u003cp\u003eA More Confident Approach to Product Propensities 573\u003c\/p\u003e \u003cp\u003eResults from Using Confidence 574\u003c\/p\u003e \u003cp\u003eSequential Pattern Analysis 574\u003c\/p\u003e \u003cp\u003eFinding the Sequences 575\u003c\/p\u003e \u003cp\u003eSequential Association Rules 578\u003c\/p\u003e \u003cp\u003eSequential Analysis Using Other Data Mining Techniques 579\u003c\/p\u003e \u003cp\u003eLessons Learned 579\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 16 Link Analysis 581\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eBasic Graph Theory 582\u003c\/p\u003e \u003cp\u003eWhat Is a Graph? 
582\u003c\/p\u003e \u003cp\u003eDirected Graphs 584\u003c\/p\u003e \u003cp\u003eWeighted Graphs 585\u003c\/p\u003e \u003cp\u003eSeven Bridges of Königsberg 585\u003c\/p\u003e \u003cp\u003eDetecting Cycles in a Graph 588\u003c\/p\u003e \u003cp\u003eThe Traveling Salesman Problem Revisited 589\u003c\/p\u003e \u003cp\u003eSocial Network Analysis 593\u003c\/p\u003e \u003cp\u003eSix Degrees of Separation 593\u003c\/p\u003e \u003cp\u003eWhat Your Friends Say About You 595\u003c\/p\u003e \u003cp\u003eFinding Childcare Benefits Fraud 596\u003c\/p\u003e \u003cp\u003eWho Responds to Whom on Dating Sites 597\u003c\/p\u003e \u003cp\u003eSocial Marketing 598\u003c\/p\u003e \u003cp\u003eMining Call Graphs 598\u003c\/p\u003e \u003cp\u003eCase Study: Tracking Down the Leader of the Pack 601\u003c\/p\u003e \u003cp\u003eThe Business Goal 601\u003c\/p\u003e \u003cp\u003eThe Data Processing Challenge 601\u003c\/p\u003e \u003cp\u003eFinding Social Networks in Call Data 602\u003c\/p\u003e \u003cp\u003eHow the Results Are Used for Marketing 602\u003c\/p\u003e \u003cp\u003eEstimating Customer Age 603\u003c\/p\u003e \u003cp\u003eCase Study: Who Is Using Fax Machines from Home? 604\u003c\/p\u003e \u003cp\u003eWhy Finding Fax Machines Is Useful 604\u003c\/p\u003e \u003cp\u003eHow Do Fax Machines Behave? 
604\u003c\/p\u003e \u003cp\u003eA Graph Coloring Algorithm 605\u003c\/p\u003e \u003cp\u003e“Coloring” the Graph to Identify Fax Machines 606\u003c\/p\u003e \u003cp\u003eHow Google Came to Rule the World 607\u003c\/p\u003e \u003cp\u003eHubs and Authorities 608\u003c\/p\u003e \u003cp\u003eThe Details 609\u003c\/p\u003e \u003cp\u003eHubs and Authorities in Practice 611\u003c\/p\u003e \u003cp\u003eLessons Learned 612\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 17 Data Warehousing, OLAP, Analytic Sandboxes, and Data Mining 613\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eThe Architecture of Data 615\u003c\/p\u003e \u003cp\u003eTransaction Data, the Base Level 616\u003c\/p\u003e \u003cp\u003eOperational Summary Data 617\u003c\/p\u003e \u003cp\u003eDecision-Support Summary Data 617\u003c\/p\u003e \u003cp\u003eDatabase Schema\/Data Models 618\u003c\/p\u003e \u003cp\u003eMetadata 623\u003c\/p\u003e \u003cp\u003eBusiness Rules 623\u003c\/p\u003e \u003cp\u003eA General Architecture for Data Warehousing 624\u003c\/p\u003e \u003cp\u003eSource Systems 624\u003c\/p\u003e \u003cp\u003eExtraction, Transformation, and Load 626\u003c\/p\u003e \u003cp\u003eCentral Repository 627\u003c\/p\u003e \u003cp\u003eMetadata Repository 630\u003c\/p\u003e \u003cp\u003eData Marts 630\u003c\/p\u003e \u003cp\u003eOperational Feedback 631\u003c\/p\u003e \u003cp\u003eUsers and Desktop Tools 631\u003c\/p\u003e \u003cp\u003eAnalytic Sandboxes 633\u003c\/p\u003e \u003cp\u003eWhy Are Analytic Sandboxes Needed? 634\u003c\/p\u003e \u003cp\u003eTechnology to Support Analytic Sandboxes 636\u003c\/p\u003e \u003cp\u003eWhere Does OLAP Fit In? 639\u003c\/p\u003e \u003cp\u003eWhat’s in a Cube? 
641\u003c\/p\u003e \u003cp\u003eStar Schema 646\u003c\/p\u003e \u003cp\u003eOLAP and Data Mining 648\u003c\/p\u003e \u003cp\u003eWhere Data Mining Fits in with Data Warehousing 650\u003c\/p\u003e \u003cp\u003eLots of Data 651\u003c\/p\u003e \u003cp\u003eConsistent, Clean Data 651\u003c\/p\u003e \u003cp\u003eHypothesis Testing and Measurement 652\u003c\/p\u003e \u003cp\u003eScalable Hardware and RDBMS Support 653\u003c\/p\u003e \u003cp\u003eLessons Learned 653\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 18 Building Customer Signatures 655\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eFinding Customers in Data 656\u003c\/p\u003e \u003cp\u003eWhat Is a Customer? 657\u003c\/p\u003e \u003cp\u003eAccounts? Customers? Households? 658\u003c\/p\u003e \u003cp\u003eAnonymous Transactions 658\u003c\/p\u003e \u003cp\u003eTransactions Linked to a Card 659\u003c\/p\u003e \u003cp\u003eTransactions Linked to a Cookie 659\u003c\/p\u003e \u003cp\u003eTransactions Linked to an Account 660\u003c\/p\u003e \u003cp\u003eTransactions Linked to a Customer 661\u003c\/p\u003e \u003cp\u003eDesigning Signatures 661\u003c\/p\u003e \u003cp\u003eIs a Customer Signature Necessary? 666\u003c\/p\u003e \u003cp\u003eWhat Does a Row Represent? 666\u003c\/p\u003e \u003cp\u003eWill the Signature Be Used for Predictive Modeling? 671\u003c\/p\u003e \u003cp\u003eHas a Target Been Defined? 672\u003c\/p\u003e \u003cp\u003eAre There Constraints Imposed by the Particular Data Mining Techniques to be Employed? 672\u003c\/p\u003e \u003cp\u003eWhich Customers Will Be Included? 673\u003c\/p\u003e \u003cp\u003eWhat Might Be Interesting to Know About Customers? 
673\u003c\/p\u003e \u003cp\u003eWhat a Signature Looks Like 674\u003c\/p\u003e \u003cp\u003eProcess for Creating Signatures 677\u003c\/p\u003e \u003cp\u003eSome Data Is Already at the Right Level of Granularity 678\u003c\/p\u003e \u003cp\u003ePivoting a Regular Time Series 679\u003c\/p\u003e \u003cp\u003eAggregating Time-Stamped Transactions 680\u003c\/p\u003e \u003cp\u003eDealing with Missing Values 685\u003c\/p\u003e \u003cp\u003eMissing Values in Source Data 685\u003c\/p\u003e \u003cp\u003eUnknown or Non-Existent? 687\u003c\/p\u003e \u003cp\u003eWhat Not to Do 687\u003c\/p\u003e \u003cp\u003eThings to Consider 689\u003c\/p\u003e \u003cp\u003eLessons Learned 691\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 19 Derived Variables: Making the Data Mean More 693\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eHandset Churn Rate as a Predictor of Churn 694\u003c\/p\u003e \u003cp\u003eSingle-Variable Transformations 696\u003c\/p\u003e \u003cp\u003eStandardizing Numeric Variables 696\u003c\/p\u003e \u003cp\u003eTurning Numeric Values into Percentiles 697\u003c\/p\u003e \u003cp\u003eTurning Counts into Rates 698\u003c\/p\u003e \u003cp\u003eRelative Measures 699\u003c\/p\u003e \u003cp\u003eReplacing Categorical Variables with Numeric Ones 700\u003c\/p\u003e \u003cp\u003eCombining Variables 707\u003c\/p\u003e \u003cp\u003eClassic Combinations 707\u003c\/p\u003e \u003cp\u003eCombining Highly Correlated Variables 710\u003c\/p\u003e \u003cp\u003eRent to Home Value 712\u003c\/p\u003e \u003cp\u003eExtracting Features from Time Series 718\u003c\/p\u003e \u003cp\u003eTrend 719\u003c\/p\u003e \u003cp\u003eSeasonality 721\u003c\/p\u003e \u003cp\u003eExtracting Features from Geography 722\u003c\/p\u003e \u003cp\u003eGeocoding 722\u003c\/p\u003e \u003cp\u003eMapping 723\u003c\/p\u003e \u003cp\u003eUsing Geography to Create Relative Measures 724\u003c\/p\u003e \u003cp\u003eUsing Past Values of the Target Variable 725\u003c\/p\u003e \u003cp\u003eUsing Model Scores as Inputs 
725\u003c\/p\u003e \u003cp\u003eHandling Sparse Data 726\u003c\/p\u003e \u003cp\u003eAccount Set Patterns 726\u003c\/p\u003e \u003cp\u003eBinning Sparse Values 727\u003c\/p\u003e \u003cp\u003eCapturing Customer Behavior from Transactions 727\u003c\/p\u003e \u003cp\u003eWidening Narrow Data 728\u003c\/p\u003e \u003cp\u003eSphere of Influence as a Predictor of Good Customers 728\u003c\/p\u003e \u003cp\u003eAn Example: Ratings to Rater Profile 730\u003c\/p\u003e \u003cp\u003eSample Fields from the Rater Signature 730\u003c\/p\u003e \u003cp\u003eThe Rating Signature and Derived Variables 732\u003c\/p\u003e \u003cp\u003eLessons Learned 733\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 20 Too Much of a Good Thing? Techniques for Reducing the Number of Variables 735\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eProblems with Too Many Variables 736\u003c\/p\u003e \u003cp\u003eRisk of Correlation Among Input Variables 736\u003c\/p\u003e \u003cp\u003eRisk of Overfitting 738\u003c\/p\u003e \u003cp\u003eThe Sparse Data Problem 738\u003c\/p\u003e \u003cp\u003eVisualizing Sparseness 739\u003c\/p\u003e \u003cp\u003eIndependence 740\u003c\/p\u003e \u003cp\u003eExhaustive Feature Selection 743\u003c\/p\u003e \u003cp\u003eFlavors of Variable Reduction Techniques 744\u003c\/p\u003e \u003cp\u003eUsing the Target 744\u003c\/p\u003e \u003cp\u003eOriginal versus New Variables 744\u003c\/p\u003e \u003cp\u003eSequential Selection of Features 745\u003c\/p\u003e \u003cp\u003eThe Traditional Forward Selection Methodology 745\u003c\/p\u003e \u003cp\u003eForward Selection Using a Validation Set 747\u003c\/p\u003e \u003cp\u003eStepwise Selection 748\u003c\/p\u003e \u003cp\u003eForward Selection Using Non-Regression Techniques 748\u003c\/p\u003e \u003cp\u003eBackward Selection 748\u003c\/p\u003e \u003cp\u003eUndirected Forward Selection 749\u003c\/p\u003e \u003cp\u003eOther Directed Variable Selection Methods 749\u003c\/p\u003e \u003cp\u003eUsing Decision Trees to Select Variables 
750\u003c\/p\u003e \u003cp\u003eVariable Reduction Using Neural Networks 752\u003c\/p\u003e \u003cp\u003ePrincipal Components 753\u003c\/p\u003e \u003cp\u003eWhat Are Principal Components? 753\u003c\/p\u003e \u003cp\u003ePrincipal Components Example 758\u003c\/p\u003e \u003cp\u003ePrincipal Component Analysis 763\u003c\/p\u003e \u003cp\u003eFactor Analysis 767\u003c\/p\u003e \u003cp\u003eVariable Clustering 768\u003c\/p\u003e \u003cp\u003eExample of Variable Clusters 768\u003c\/p\u003e \u003cp\u003eUsing Variable Clusters 770\u003c\/p\u003e \u003cp\u003eHierarchical Variable Clustering 770\u003c\/p\u003e \u003cp\u003eDivisive Variable Clustering 773\u003c\/p\u003e \u003cp\u003eLessons Learned 774\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 21 Listen Carefully to What Your Customers Say: Text Mining 775\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eWhat Is Text Mining? 776\u003c\/p\u003e \u003cp\u003eText Mining for Derived Columns 776\u003c\/p\u003e \u003cp\u003eBeyond Derived Features 777\u003c\/p\u003e \u003cp\u003eText Analysis Applications 778\u003c\/p\u003e \u003cp\u003eWorking with Text Data 781\u003c\/p\u003e \u003cp\u003eSources of Text 781\u003c\/p\u003e \u003cp\u003eLanguage Effects 782\u003c\/p\u003e \u003cp\u003eBasic Approaches to Representing Documents 783\u003c\/p\u003e \u003cp\u003eRepresenting Documents in Practice 784\u003c\/p\u003e \u003cp\u003eDocuments and the Corpus 786\u003c\/p\u003e \u003cp\u003eCase Study: Ad Hoc Text Mining 786\u003c\/p\u003e \u003cp\u003eThe Boycott 787\u003c\/p\u003e \u003cp\u003eBusiness as Usual 787\u003c\/p\u003e \u003cp\u003eCombining Text Mining and Hypothesis Testing 787\u003c\/p\u003e \u003cp\u003eThe Results 788\u003c\/p\u003e \u003cp\u003eClassifying News Stories Using MBR 789\u003c\/p\u003e \u003cp\u003eWhat Are the Codes? 
789\u003c\/p\u003e \u003cp\u003eApplying MBR 790\u003c\/p\u003e \u003cp\u003eThe Results 793\u003c\/p\u003e \u003cp\u003eFrom Text to Numbers 794\u003c\/p\u003e \u003cp\u003eStarting with a “Bag of Words” 794\u003c\/p\u003e \u003cp\u003eTerm-Document Matrix 796\u003c\/p\u003e \u003cp\u003eCorpus Effects 797\u003c\/p\u003e \u003cp\u003eSingular Value Decomposition (SVD) 798\u003c\/p\u003e \u003cp\u003eText Mining and Naïve Bayesian Models 800\u003c\/p\u003e \u003cp\u003eNaïve Bayesian in the Text World 801\u003c\/p\u003e \u003cp\u003eIdentifying Spam Using Naïve Bayesian 801\u003c\/p\u003e \u003cp\u003eSentiment Analysis 806\u003c\/p\u003e \u003cp\u003eDIRECTV: A Case Study in Customer Service 809\u003c\/p\u003e \u003cp\u003eBackground 809\u003c\/p\u003e \u003cp\u003eApplying Text Mining 811\u003c\/p\u003e \u003cp\u003eTaking the Technical Approach 814\u003c\/p\u003e \u003cp\u003eNot an Iterative Process 818\u003c\/p\u003e \u003cp\u003eContinuing to Benefit 818\u003c\/p\u003e \u003cp\u003eLessons Learned 819\u003c\/p\u003e \u003cp\u003eIndex 821\u003c\/p\u003e \u003cp\u003eGORDON S. LINOFF and MICHAEL J. A. BERRY are the founders of Data Miners, Inc., a consultancy specializing in data mining. They have jointly authored two of the leading data mining titles in the field, Data Mining Techniques and Mastering Data Mining (both from Wiley). They each have decades of experience applying data mining techniques to business problems in marketing and customer relationship management.\u003c\/p\u003e   \u003cp\u003eThe newest edition of the leading introductory book on data mining, fully updated and revised\u003c\/p\u003e \u003cp\u003eWho will remain a loyal customer and who won't? Which messages are most effective with which segments? How can customer value be maximized? This book supplies powerful tools for extracting the answers to these and other crucial business questions from the corporate databases where they lie buried. 
In the years since the first edition of this book, data mining has grown to become an indispensable tool of modern business. In this latest edition, Linoff and Berry have made extensive updates and revisions to every chapter and added several new ones. The book retains the focus of earlier editions: showing marketing analysts, business managers, and data mining specialists how to harness data mining methods and techniques to solve important business problems. While never sacrificing accuracy for the sake of simplicity, Linoff and Berry present even complex topics in clear, concise English with minimal use of technical jargon or mathematical formulas. Technical topics are illustrated with case studies and practical real-world examples drawn from the authors' experiences, and every chapter contains valuable tips for practitioners. Among the techniques newly covered, or covered in greater depth, are linear and logistic regression models, incremental response (uplift) modeling, naïve Bayesian models, table lookup models, similarity models, radial basis function networks, expectation maximization (EM) clustering, and swarm intelligence. 
New chapters are devoted to data preparation, derived variables, principal components and other variable reduction techniques, and text mining.\u003c\/p\u003e \u003cp\u003eAfter establishing the business context with an overview of data mining applications, and introducing aspects of data mining methodology common to all data mining projects, the book covers each important data mining technique in detail.\u003c\/p\u003e \u003cp\u003eThis third edition of Data Mining Techniques covers such topics as:\u003c\/p\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eHow to create stable, long-lasting predictive models\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eData preparation and variable selection\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eModeling specific targets with directed techniques such as regression, decision trees, neural networks, and memory-based reasoning\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eFinding patterns with undirected techniques such as clustering, association rules, and link analysis\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eModeling business time-to-event problems such as time to next purchase and expected remaining lifetime\u003c\/p\u003e \u003c\/li\u003e \u003cli\u003e \u003cp\u003eMining unstructured text\u003c\/p\u003e \u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003eThe companion website provides data that can be used to test out the various data mining techniques in the book.\u003c\/p\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47989025210597,"sku":"NP9780470650936","price":50.0,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9780470650936.jpg?v=1761782486","url":"https:\/\/k12savings.com\/products\/data-mining-techniques-isbn-9780470650936","provider":"K12savings","version":"1.0","type":"link"}