{"product_id":"data-science-and-big-data-analytics-isbn-9781118876138","title":"Data Science and Big Data Analytics","description":"\u003ci\u003eData Science and Big Data Analytics\u003c\/i\u003e is about harnessing the power of data for new insights. The book covers the breadth of activities, methods, and tools that Data Scientists use. The content focuses on concepts, principles and practical applications that apply to any industry and technology environment, and the learning is supported and explained with examples that you can replicate using open-source software. \u003cp\u003eThis book will help you:\u003c\/p\u003e \u003cul\u003e \u003cli\u003eBecome a contributor on a data science team\u003c\/li\u003e \u003cli\u003eDeploy a structured lifecycle approach to data analytics problems\u003c\/li\u003e \u003cli\u003eApply appropriate analytic techniques and tools to analyze big data\u003c\/li\u003e \u003cli\u003eLearn how to tell a compelling story with data to drive business action\u003c\/li\u003e \u003cli\u003ePrepare for EMC Proven Professional Data Science Certification\u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003eGet started discovering, analyzing, visualizing, and presenting data in a meaningful way today!\u003c\/p\u003e \u003cp\u003eIntroduction xvii\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 1 Introduction to Big Data Analytics 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e1.1 Big Data Overview 2\u003c\/p\u003e \u003cp\u003e1.1.1 Data Structures 5\u003c\/p\u003e \u003cp\u003e1.1.2 Analyst Perspective on Data Repositories 9\u003c\/p\u003e \u003cp\u003e1.2 State of the Practice in Analytics 11\u003c\/p\u003e \u003cp\u003e1.2.1 BI Versus Data Science 12\u003c\/p\u003e \u003cp\u003e1.2.2 Current Analytical Architecture 13\u003c\/p\u003e \u003cp\u003e1.2.3 Drivers of Big Data 15\u003c\/p\u003e \u003cp\u003e1.2.4 Emerging Big Data Ecosystem and a New Approach to Analytics 16\u003c\/p\u003e \u003cp\u003e1.3 Key Roles for the New Big Data 
Ecosystem 19\u003c\/p\u003e \u003cp\u003e1.4 Examples of Big Data Analytics 22\u003c\/p\u003e \u003cp\u003eSummary 23\u003c\/p\u003e \u003cp\u003eExercises 23\u003c\/p\u003e \u003cp\u003eBibliography 24\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 2 Data Analytics Lifecycle 25\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e2.1 Data Analytics Lifecycle Overview 26\u003c\/p\u003e \u003cp\u003e2.1.1 Key Roles for a Successful Analytics Project 26\u003c\/p\u003e \u003cp\u003e2.1.2 Background and Overview of Data Analytics Lifecycle 28\u003c\/p\u003e \u003cp\u003e2.2 Phase 1: Discovery 30\u003c\/p\u003e \u003cp\u003e2.2.1 Learning the Business Domain 30\u003c\/p\u003e \u003cp\u003e2.2.2 Resources 31\u003c\/p\u003e \u003cp\u003e2.2.3 Framing the Problem 32\u003c\/p\u003e \u003cp\u003e2.2.4 Identifying Key Stakeholders 33\u003c\/p\u003e \u003cp\u003e2.2.5 Interviewing the Analytics Sponsor 33\u003c\/p\u003e \u003cp\u003e2.2.6 Developing Initial Hypotheses 35\u003c\/p\u003e \u003cp\u003e2.2.7 Identifying Potential Data Sources 35\u003c\/p\u003e \u003cp\u003e2.3 Phase 2: Data Preparation 36\u003c\/p\u003e \u003cp\u003e2.3.1 Preparing the Analytic Sandbox 37\u003c\/p\u003e \u003cp\u003e2.3.2 Performing ETLT 38\u003c\/p\u003e \u003cp\u003e2.3.3 Learning About the Data 39\u003c\/p\u003e \u003cp\u003e2.3.4 Data Conditioning 40\u003c\/p\u003e \u003cp\u003e2.3.5 Survey and Visualize 41\u003c\/p\u003e \u003cp\u003e2.3.6 Common Tools for the Data Preparation Phase 42\u003c\/p\u003e \u003cp\u003e2.4 Phase 3: Model Planning 42\u003c\/p\u003e \u003cp\u003e2.4.1 Data Exploration and Variable Selection 44\u003c\/p\u003e \u003cp\u003e2.4.2 Model Selection 45\u003c\/p\u003e \u003cp\u003e2.4.3 Common Tools for the Model Planning Phase 45\u003c\/p\u003e \u003cp\u003e2.5 Phase 4: Model Building 46\u003c\/p\u003e \u003cp\u003e2.5.1 Common Tools for the Model Building Phase 48\u003c\/p\u003e \u003cp\u003e2.6 Phase 5: Communicate Results 49\u003c\/p\u003e \u003cp\u003e2.7 Phase 6: 
Operationalize 50\u003c\/p\u003e \u003cp\u003e2.8 Case Study: Global Innovation Network and Analysis (GINA) 53\u003c\/p\u003e \u003cp\u003e2.8.1 Phase 1: Discovery 54\u003c\/p\u003e \u003cp\u003e2.8.2 Phase 2: Data Preparation 55\u003c\/p\u003e \u003cp\u003e2.8.3 Phase 3: Model Planning 56\u003c\/p\u003e \u003cp\u003e2.8.4 Phase 4: Model Building 56\u003c\/p\u003e \u003cp\u003e2.8.5 Phase 5: Communicate Results 58\u003c\/p\u003e \u003cp\u003e2.8.6 Phase 6: Operationalize 59\u003c\/p\u003e \u003cp\u003eSummary 60\u003c\/p\u003e \u003cp\u003eExercises 61\u003c\/p\u003e \u003cp\u003eBibliography 61\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 3 Review of Basic Data Analytic Methods Using R 63\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e3.1 Introduction to R 64\u003c\/p\u003e \u003cp\u003e3.1.1 R Graphical User Interfaces 67\u003c\/p\u003e \u003cp\u003e3.1.2 Data Import and Export 69\u003c\/p\u003e \u003cp\u003e3.1.3 Attribute and Data Types 71\u003c\/p\u003e \u003cp\u003e3.1.4 Descriptive Statistics 79\u003c\/p\u003e \u003cp\u003e3.2 Exploratory Data Analysis 80\u003c\/p\u003e \u003cp\u003e3.2.1 Visualization Before Analysis 82\u003c\/p\u003e \u003cp\u003e3.2.2 Dirty Data 85\u003c\/p\u003e \u003cp\u003e3.2.3 Visualizing a Single Variable 88\u003c\/p\u003e \u003cp\u003e3.2.4 Examining Multiple Variables 91\u003c\/p\u003e \u003cp\u003e3.2.5 Data Exploration Versus Presentation 99\u003c\/p\u003e \u003cp\u003e3.3 Statistical Methods for Evaluation 101\u003c\/p\u003e \u003cp\u003e3.3.1 Hypothesis Testing 102\u003c\/p\u003e \u003cp\u003e3.3.2 Difference of Means 104\u003c\/p\u003e \u003cp\u003e3.3.3 Wilcoxon Rank-Sum Test 108\u003c\/p\u003e \u003cp\u003e3.3.4 Type I and Type II Errors 109\u003c\/p\u003e \u003cp\u003e3.3.5 Power and Sample Size 110\u003c\/p\u003e \u003cp\u003e3.3.6 ANOVA 110\u003c\/p\u003e \u003cp\u003eSummary 114\u003c\/p\u003e \u003cp\u003eExercises 114\u003c\/p\u003e \u003cp\u003eBibliography 115\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 4 
Advanced Analytical Theory and Methods: Clustering 117\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e4.1 Overview of Clustering 118\u003c\/p\u003e \u003cp\u003e4.2 K-means 118\u003c\/p\u003e \u003cp\u003e4.2.1 Use Cases 119\u003c\/p\u003e \u003cp\u003e4.2.2 Overview of the Method 120\u003c\/p\u003e \u003cp\u003e4.2.3 Determining the Number of Clusters 123\u003c\/p\u003e \u003cp\u003e4.2.4 Diagnostics 128\u003c\/p\u003e \u003cp\u003e4.2.5 Reasons to Choose and Cautions 130\u003c\/p\u003e \u003cp\u003e4.3 Additional Algorithms 134\u003c\/p\u003e \u003cp\u003eSummary 135\u003c\/p\u003e \u003cp\u003eExercises 135\u003c\/p\u003e \u003cp\u003eBibliography 136\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 5 Advanced Analytical Theory and Methods: Association Rules 137\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e5.1 Overview 138\u003c\/p\u003e \u003cp\u003e5.2 Apriori Algorithm 140\u003c\/p\u003e \u003cp\u003e5.3 Evaluation of Candidate Rules 141\u003c\/p\u003e \u003cp\u003e5.4 Applications of Association Rules 143\u003c\/p\u003e \u003cp\u003e5.5 An Example: Transactions in a Grocery Store 143\u003c\/p\u003e \u003cp\u003e5.5.1 The Groceries Dataset 144\u003c\/p\u003e \u003cp\u003e5.5.2 Frequent Itemset Generation 146\u003c\/p\u003e \u003cp\u003e5.5.3 Rule Generation and Visualization 152\u003c\/p\u003e \u003cp\u003e5.6 Validation and Testing 157\u003c\/p\u003e \u003cp\u003e5.7 Diagnostics 158\u003c\/p\u003e \u003cp\u003eSummary 158\u003c\/p\u003e \u003cp\u003eExercises 159\u003c\/p\u003e \u003cp\u003eBibliography 160\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 6 Advanced Analytical Theory and Methods: Regression 161\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e6.1 Linear Regression 162\u003c\/p\u003e \u003cp\u003e6.1.1 Use Cases 162\u003c\/p\u003e \u003cp\u003e6.1.2 Model Description 163\u003c\/p\u003e \u003cp\u003e6.1.3 Diagnostics 173\u003c\/p\u003e \u003cp\u003e6.2 Logistic Regression 178\u003c\/p\u003e \u003cp\u003e6.2.1 Use Cases 179\u003c\/p\u003e 
\u003cp\u003e6.2.2 Model Description 179\u003c\/p\u003e \u003cp\u003e6.2.3 Diagnostics 181\u003c\/p\u003e \u003cp\u003e6.3 Reasons to Choose and Cautions 188\u003c\/p\u003e \u003cp\u003e6.4 Additional Regression Models 189\u003c\/p\u003e \u003cp\u003eSummary 190\u003c\/p\u003e \u003cp\u003eExercises 190\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 7 Advanced Analytical Theory and Methods: Classification 191\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e7.1 Decision Trees 192\u003c\/p\u003e \u003cp\u003e7.1.1 Overview of a Decision Tree 193\u003c\/p\u003e \u003cp\u003e7.1.2 The General Algorithm 197\u003c\/p\u003e \u003cp\u003e7.1.3 Decision Tree Algorithms 203\u003c\/p\u003e \u003cp\u003e7.1.4 Evaluating a Decision Tree 204\u003c\/p\u003e \u003cp\u003e7.1.5 Decision Trees in R 206\u003c\/p\u003e \u003cp\u003e7.2 Naïve Bayes 211\u003c\/p\u003e \u003cp\u003e7.2.1 Bayes’ Theorem 212\u003c\/p\u003e \u003cp\u003e7.2.2 Naïve Bayes Classifier 214\u003c\/p\u003e \u003cp\u003e7.2.3 Smoothing 217\u003c\/p\u003e \u003cp\u003e7.2.4 Diagnostics 217\u003c\/p\u003e \u003cp\u003e7.2.5 Naïve Bayes in R 218\u003c\/p\u003e \u003cp\u003e7.3 Diagnostics of Classifiers 224\u003c\/p\u003e \u003cp\u003e7.4 Additional Classification Methods 228\u003c\/p\u003e \u003cp\u003eSummary 229\u003c\/p\u003e \u003cp\u003eExercises 230\u003c\/p\u003e \u003cp\u003eBibliography 231\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 8 Advanced Analytical Theory and Methods: Time Series Analysis 233\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e8.1 Overview of Time Series Analysis 234\u003c\/p\u003e \u003cp\u003e8.1.1 Box-Jenkins Methodology 235\u003c\/p\u003e \u003cp\u003e8.2 ARIMA Model 236\u003c\/p\u003e \u003cp\u003e8.2.1 Autocorrelation Function (ACF) 236\u003c\/p\u003e \u003cp\u003e8.2.2 Autoregressive Models 238\u003c\/p\u003e \u003cp\u003e8.2.3 Moving Average Models 239\u003c\/p\u003e \u003cp\u003e8.2.4 ARMA and ARIMA Models 241\u003c\/p\u003e \u003cp\u003e8.2.5 Building and Evaluating an ARIMA Model 
244\u003c\/p\u003e \u003cp\u003e8.2.6 Reasons to Choose and Cautions 252\u003c\/p\u003e \u003cp\u003e8.3 Additional Methods 253\u003c\/p\u003e \u003cp\u003eSummary 254\u003c\/p\u003e \u003cp\u003eExercises 254\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 9 Advanced Analytical Theory and Methods: Text Analysis 255\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e9.1 Text Analysis Steps 257\u003c\/p\u003e \u003cp\u003e9.2 A Text Analysis Example 259\u003c\/p\u003e \u003cp\u003e9.3 Collecting Raw Text 260\u003c\/p\u003e \u003cp\u003e9.4 Representing Text 264\u003c\/p\u003e \u003cp\u003e9.5 Term Frequency—Inverse Document Frequency (TFIDF) 269\u003c\/p\u003e \u003cp\u003e9.6 Categorizing Documents by Topics 274\u003c\/p\u003e \u003cp\u003e9.7 Determining Sentiments 277\u003c\/p\u003e \u003cp\u003e9.8 Gaining Insights 283\u003c\/p\u003e \u003cp\u003eSummary 290\u003c\/p\u003e \u003cp\u003eExercises 290\u003c\/p\u003e \u003cp\u003eBibliography 291\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 10 Advanced Analytics—Technology and Tools: MapReduce and Hadoop 295\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e10.1 Analytics for Unstructured Data 296\u003c\/p\u003e \u003cp\u003e10.1.1 Use Cases 296\u003c\/p\u003e \u003cp\u003e10.1.2 MapReduce 298\u003c\/p\u003e \u003cp\u003e10.1.3 Apache Hadoop 300\u003c\/p\u003e \u003cp\u003e10.2 The Hadoop Ecosystem 306\u003c\/p\u003e \u003cp\u003e10.2.1 Pig 306\u003c\/p\u003e \u003cp\u003e10.2.2 Hive 308\u003c\/p\u003e \u003cp\u003e10.2.3 HBase 311\u003c\/p\u003e \u003cp\u003e10.2.4 Mahout 319\u003c\/p\u003e \u003cp\u003e10.3 NoSQL 322\u003c\/p\u003e \u003cp\u003eSummary 323\u003c\/p\u003e \u003cp\u003eExercises 324\u003c\/p\u003e \u003cp\u003eBibliography 324\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 11 Advanced Analytics—Technology and Tools: In-Database Analytics 327\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e11.1 SQL Essentials 328\u003c\/p\u003e \u003cp\u003e11.1.1 Joins 330\u003c\/p\u003e \u003cp\u003e11.1.2 Set Operations 
332\u003c\/p\u003e \u003cp\u003e11.1.3 Grouping Extensions 334\u003c\/p\u003e \u003cp\u003e11.2 In-Database Text Analysis 338\u003c\/p\u003e \u003cp\u003e11.3 Advanced SQL 343\u003c\/p\u003e \u003cp\u003e11.3.1 Window Functions 343\u003c\/p\u003e \u003cp\u003e11.3.2 User-Defined Functions and Aggregates 347\u003c\/p\u003e \u003cp\u003e11.3.3 Ordered Aggregates 351\u003c\/p\u003e \u003cp\u003e11.3.4 MADlib 352\u003c\/p\u003e \u003cp\u003eSummary 356\u003c\/p\u003e \u003cp\u003eExercises 356\u003c\/p\u003e \u003cp\u003eBibliography 357\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 12 The Endgame, or Putting It All Together 359\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e12.1 Communicating and Operationalizing an Analytics Project 360\u003c\/p\u003e \u003cp\u003e12.2 Creating the Final Deliverables 362\u003c\/p\u003e \u003cp\u003e12.2.1 Developing Core Material for Multiple Audiences 364\u003c\/p\u003e \u003cp\u003e12.2.2 Project Goals 365\u003c\/p\u003e \u003cp\u003e12.2.3 Main Findings 367\u003c\/p\u003e \u003cp\u003e12.2.4 Approach 369\u003c\/p\u003e \u003cp\u003e12.2.5 Model Description 371\u003c\/p\u003e \u003cp\u003e12.2.6 Key Points Supported with Data 372\u003c\/p\u003e \u003cp\u003e12.2.7 Model Details 372\u003c\/p\u003e \u003cp\u003e12.2.8 Recommendations 374\u003c\/p\u003e \u003cp\u003e12.2.9 Additional Tips on Final Presentation 375\u003c\/p\u003e \u003cp\u003e12.2.10 Providing Technical Specifications and Code 376\u003c\/p\u003e \u003cp\u003e12.3 Data Visualization Basics 377\u003c\/p\u003e \u003cp\u003e12.3.1 Key Points Supported with Data 378\u003c\/p\u003e \u003cp\u003e12.3.2 Evolution of a Graph 380\u003c\/p\u003e \u003cp\u003e12.3.3 Common Representation Methods 386\u003c\/p\u003e \u003cp\u003e12.3.4 How to Clean Up a Graphic 387\u003c\/p\u003e \u003cp\u003e12.3.5 Additional Considerations 392\u003c\/p\u003e \u003cp\u003eSummary 393\u003c\/p\u003e \u003cp\u003eExercises 394\u003c\/p\u003e \u003cp\u003eReferences and Further Reading 
394\u003c\/p\u003e \u003cp\u003eBibliography 394\u003c\/p\u003e \u003cp\u003eIndex 397\u003c\/p\u003e \u003cp\u003eEMC is a global leader in enabling businesses and service providers to transform their operations and deliver IT as a service. Fundamental to this transformation is cloud computing. Through innovative products and services, EMC accelerates the journey to cloud computing, helping IT departments to store, manage, protect and analyze their most valuable asset, information, in a more agile, trusted and cost-efficient way. Additional information about EMC can be found at www.EMC.com.\u003c\/p\u003e\u003cp\u003e\u003cb\u003eData Science and Big Data Analytics\u003c\/b\u003e \u003c\/p\u003e\u003cp\u003eDiscovering, Analyzing, Visualizing and Presenting Data \u003c\/p\u003e\u003cp\u003eData Science and Big Data Analytics is about harnessing the power of data for new insights. The book covers the breadth of activities, methods and tools that Data Scientists use. The content focuses on concepts, principles and practical applications that are relevant to any industry and technology environment, and the learning is supported and explained with illustrative examples using open-source software. 
\u003c\/p\u003e\u003cp\u003eThis book will help you: \u003c\/p\u003e\u003cul\u003e \u003cli\u003eBecome a contributor on a data science team\u003c\/li\u003e \u003cli\u003eDeploy a structured lifecycle approach to data analytics problems\u003c\/li\u003e \u003cli\u003eApply appropriate analytic techniques and tools to analyze big data\u003c\/li\u003e \u003cli\u003eLearn how to tell a compelling story with data to drive business action\u003c\/li\u003e \u003cli\u003ePrepare for EMC Proven\u003csup\u003eTM\u003c\/sup\u003e Professional Data Scientist certification\u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003e\u003cb\u003eEMC Proven™ Professional\u003c\/b\u003e is a leading education and certification program in the IT industry, providing comprehensive coverage of information storage technologies, virtualization, cloud computing, data science\/big data analytics, and more... \u003c\/p\u003e\u003cp\u003eBeing Proven means investing in yourself and formally validating your expertise! \u003c\/p\u003e\u003cp\u003eThis book prepares you for the Data Science Associate exam E20-007 leading to EMC Proven Professional Data Science Associate (EMCDSA) certification. \u003c\/p\u003e\u003cp\u003eVisit http:\/\/education.EMC.com for details.\u003c\/p\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47989025538277,"sku":"NP9781118876138","price":63.0,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9781118876138.jpg?v=1761782488","url":"https:\/\/k12savings.com\/es\/products\/data-science-and-big-data-analytics-isbn-9781118876138","provider":"K12savings","version":"1.0","type":"link"}