{"product_id":"big-data-isbn-9781119701828","title":"Big Data","description":"\u003cp\u003e\u003cb\u003eLearn Big Data from the ground up with this complete and up-to-date resource from leaders in the field \u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003ci\u003eBig Data: Concepts, Technology, and Architecture\u003c\/i\u003e delivers a comprehensive treatment of Big Data tools, terminology, and technology perfectly suited to a wide range of business professionals, academic researchers, and students. Beginning with a fulsome overview of what we mean when we say, “Big Data,” the book moves on to discuss every stage of the lifecycle of Big Data. \u003c\/p\u003e \u003cp\u003eYou’ll learn about the creation of structured, unstructured, and semi-structured data, data storage solutions, traditional database solutions like SQL, data processing, data analytics, machine learning, and data mining. You’ll also discover how specific technologies like Apache Hadoop, SQOOP, and Flume work. \u003c\/p\u003e \u003cp\u003e\u003ci\u003eBig Data\u003c\/i\u003e also covers the central topic of big data visualization with Tableau, and you’ll learn how to create scatter plots, histograms, bar, line, and pie charts with that software. \u003c\/p\u003e \u003cp\u003eAccessibly organized, \u003ci\u003eBig Data\u003c\/i\u003e includes illuminating case studies throughout the material, showing you how the included concepts have been applied in real-world settings. 
Some of those concepts include: \u003c\/p\u003e \u003cul\u003e \u003cli\u003eThe common challenges facing big data technology and technologists, like data heterogeneity and incompleteness, data volume and velocity, storage limitations, and privacy concerns \u003c\/li\u003e \u003cli\u003eRelational and non-relational databases, like RDBMS, NoSQL, and NewSQL databases \u003c\/li\u003e \u003cli\u003eVirtualizing Big Data through encapsulation, partitioning, and isolating, as well as big data server virtualization \u003c\/li\u003e \u003cli\u003eApache software, including Hadoop, Cassandra, Avro, Pig, Mahout, Oozie, and Hive \u003c\/li\u003e \u003cli\u003eThe Big Data analytics lifecycle, including business case evaluation, data preparation, extraction, transformation, analysis, and visualization \u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003ePerfect for data scientists, data engineers, and database managers, \u003ci\u003eBig Data\u003c\/i\u003e also belongs on the bookshelves of business intelligence analysts who are required to make decisions based on large volumes of information. Executives and managers who lead teams responsible for keeping or understanding large datasets will also benefit from this book. 
\u003c\/p\u003e \u003cp\u003e \u003c\/p\u003e \u003cp\u003eAcknowledgments xi\u003c\/p\u003e \u003cp\u003eAbout the Author xii\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1 Introduction to the World of Big Data \u003c\/b\u003e\u003cb\u003e1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e1.1 Understanding Big Data 1\u003c\/p\u003e \u003cp\u003e1.2 Evolution of Big Data 2\u003c\/p\u003e \u003cp\u003e1.3 Failure of Traditional Database in Handling Big Data 3\u003c\/p\u003e \u003cp\u003e1.4 3 Vs of Big Data 4\u003c\/p\u003e \u003cp\u003e1.5 Sources of Big Data 7\u003c\/p\u003e \u003cp\u003e1.6 Different Types of Data 8\u003c\/p\u003e \u003cp\u003e1.7 Big Data Infrastructure 11\u003c\/p\u003e \u003cp\u003e1.8 Big Data Life Cycle 12\u003c\/p\u003e \u003cp\u003e1.9 Big Data Technology 18\u003c\/p\u003e \u003cp\u003e1.10 Big Data Applications 21\u003c\/p\u003e \u003cp\u003e1.11 Big Data Use Cases 21\u003c\/p\u003e \u003cp\u003eChapter 1 Refresher 24\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2 Big Data Storage Concepts \u003c\/b\u003e\u003cb\u003e31\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e2.1 Cluster Computing 32\u003c\/p\u003e \u003cp\u003e2.2 Distribution Models 37\u003c\/p\u003e \u003cp\u003e2.3 Distributed File System 43\u003c\/p\u003e \u003cp\u003e2.4 Relational and Non-Relational Databases 43\u003c\/p\u003e \u003cp\u003e2.5 Scaling Up and Scaling Out Storage 47\u003c\/p\u003e \u003cp\u003eChapter 2 Refresher 48\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3 NoSQL Database \u003c\/b\u003e\u003cb\u003e53\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e3.1 Introduction to NoSQL 53\u003c\/p\u003e \u003cp\u003e3.2 Why NoSQL 54\u003c\/p\u003e \u003cp\u003e3.3 CAP Theorem 54\u003c\/p\u003e \u003cp\u003e3.4 ACID 56\u003c\/p\u003e \u003cp\u003e3.5 BASE 56\u003c\/p\u003e \u003cp\u003e3.6 Schemaless Databases 57\u003c\/p\u003e \u003cp\u003e3.7 NoSQL (Not Only SQL) 57\u003c\/p\u003e \u003cp\u003e3.8 Migrating from RDBMS to NoSQL 76\u003c\/p\u003e \u003cp\u003eChapter 3 Refresher 
77\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4 Processing, Management Concepts, and Cloud Computing \u003c\/b\u003e\u003cb\u003e83\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003ePart I: Big Data Processing and Management Concepts 83\u003c\/p\u003e \u003cp\u003e4.1 Data Processing 83\u003c\/p\u003e \u003cp\u003e4.2 Shared Everything Architecture 85\u003c\/p\u003e \u003cp\u003e4.3 Shared-Nothing Architecture 86\u003c\/p\u003e \u003cp\u003e4.4 Batch Processing 88\u003c\/p\u003e \u003cp\u003e4.5 Real-Time Data Processing 88\u003c\/p\u003e \u003cp\u003e4.6 Parallel Computing 89\u003c\/p\u003e \u003cp\u003e4.7 Distributed Computing 90\u003c\/p\u003e \u003cp\u003e4.8 Big Data Virtualization 90\u003c\/p\u003e \u003cp\u003ePart II: Managing and Processing Big Data in Cloud Computing 93\u003c\/p\u003e \u003cp\u003e4.9 Introduction 93\u003c\/p\u003e \u003cp\u003e4.10 Cloud Computing Types 94\u003c\/p\u003e \u003cp\u003e4.11 Cloud Services 95\u003c\/p\u003e \u003cp\u003e4.12 Cloud Storage 96\u003c\/p\u003e \u003cp\u003e4.13 Cloud Architecture 101\u003c\/p\u003e \u003cp\u003eChapter 4 Refresher 103\u003c\/p\u003e \u003cp\u003e\u003cb\u003e5 Driving Big Data with Hadoop Tools and Technologies \u003c\/b\u003e\u003cb\u003e111\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e5.1 Apache Hadoop 111\u003c\/p\u003e \u003cp\u003e5.2 Hadoop Storage 114\u003c\/p\u003e \u003cp\u003e5.3 Hadoop Computation 119\u003c\/p\u003e \u003cp\u003e5.4 Hadoop 2.0 129\u003c\/p\u003e \u003cp\u003e5.5 HBASE 138\u003c\/p\u003e \u003cp\u003e5.6 Apache Cassandra 141\u003c\/p\u003e \u003cp\u003e5.7 SQOOP 141\u003c\/p\u003e \u003cp\u003e5.8 Flume 143\u003c\/p\u003e \u003cp\u003e5.9 Apache Avro 144\u003c\/p\u003e \u003cp\u003e5.10 Apache Pig 145\u003c\/p\u003e \u003cp\u003e5.11 Apache Mahout 146\u003c\/p\u003e \u003cp\u003e5.12 Apache Oozie 146\u003c\/p\u003e \u003cp\u003e5.13 Apache Hive 149\u003c\/p\u003e \u003cp\u003e5.14 Hive Architecture 151\u003c\/p\u003e \u003cp\u003e5.15 Hadoop Distributions 152\u003c\/p\u003e 
\u003cp\u003eChapter 5 Refresher 153\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6 Big Data Analytics \u003c\/b\u003e\u003cb\u003e161\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e6.1 Terminology of Big Data Analytics 161\u003c\/p\u003e \u003cp\u003e6.2 Big Data Analytics 162\u003c\/p\u003e \u003cp\u003e6.3 Data Analytics Life Cycle 166\u003c\/p\u003e \u003cp\u003e6.4 Big Data Analytics Techniques 170\u003c\/p\u003e \u003cp\u003e6.5 Semantic Analysis 175\u003c\/p\u003e \u003cp\u003e6.6 Visual analysis 178\u003c\/p\u003e \u003cp\u003e6.7 Big Data Business Intelligence 178\u003c\/p\u003e \u003cp\u003e6.8 Big Data Real-Time Analytics Processing 180\u003c\/p\u003e \u003cp\u003e6.9 Enterprise Data Warehouse 181\u003c\/p\u003e \u003cp\u003eChapter 6 Refresher 182\u003c\/p\u003e \u003cp\u003e\u003cb\u003e7 Big Data Analytics with Machine Learning \u003c\/b\u003e\u003cb\u003e187\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e7.1 Introduction to Machine Learning 187\u003c\/p\u003e \u003cp\u003e7.2 Machine Learning Use Cases 188\u003c\/p\u003e \u003cp\u003e7.3 Types of Machine Learning 189\u003c\/p\u003e \u003cp\u003eChapter 7 Refresher 196\u003c\/p\u003e \u003cp\u003e\u003cb\u003e8 Mining Data Streams and Frequent Itemset \u003c\/b\u003e\u003cb\u003e201\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e8.1 Itemset Mining 201\u003c\/p\u003e \u003cp\u003e8.2 Association Rules 206\u003c\/p\u003e \u003cp\u003e8.3 Frequent Itemset Generation 210\u003c\/p\u003e \u003cp\u003e8.4 Itemset Mining Algorithms 211\u003c\/p\u003e \u003cp\u003e8.5 Maximal and Closed Frequent Itemset 229\u003c\/p\u003e \u003cp\u003e8.6 Mining Maximal Frequent Itemsets: the GenMax Algorithm 233\u003c\/p\u003e \u003cp\u003e8.7 Mining Closed Frequent Itemsets: the Charm Algorithm 236\u003c\/p\u003e \u003cp\u003e8.8 CHARM Algorithm Implementation 236\u003c\/p\u003e \u003cp\u003e8.9 Data Mining Methods 239\u003c\/p\u003e \u003cp\u003e8.10 Prediction 240\u003c\/p\u003e \u003cp\u003e8.11 Important Terms Used in Bayesian Network 
241\u003c\/p\u003e \u003cp\u003e8.12 Density Based Clustering Algorithm 249\u003c\/p\u003e \u003cp\u003e8.13 DBSCAN 249\u003c\/p\u003e \u003cp\u003e8.14 Kernel Density Estimation 250\u003c\/p\u003e \u003cp\u003e8.15 Mining Data Streams 254\u003c\/p\u003e \u003cp\u003e8.16 Time Series Forecasting 255\u003c\/p\u003e \u003cp\u003e\u003cb\u003e9 Cluster Analysis \u003c\/b\u003e\u003cb\u003e259\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e9.1 Clustering 259\u003c\/p\u003e \u003cp\u003e9.2 Distance Measurement Techniques 261\u003c\/p\u003e \u003cp\u003e9.3 Hierarchical Clustering 263\u003c\/p\u003e \u003cp\u003e9.4 Analysis of Protein Patterns in the Human Cancer-Associated Liver 266\u003c\/p\u003e \u003cp\u003e9.5 Recognition Using Biometrics of Hands 267\u003c\/p\u003e \u003cp\u003e9.6 Expectation Maximization Clustering Algorithm 274\u003c\/p\u003e \u003cp\u003e9.7 Representative-Based Clustering 277\u003c\/p\u003e \u003cp\u003e9.8 Methods of Determining the Number of Clusters 277\u003c\/p\u003e \u003cp\u003e9.9 Optimization Algorithm 284\u003c\/p\u003e \u003cp\u003e9.10 Choosing the Number of Clusters 288\u003c\/p\u003e \u003cp\u003e9.11 Bayesian Analysis of Mixtures 290\u003c\/p\u003e \u003cp\u003e9.12 Fuzzy Clustering 290\u003c\/p\u003e \u003cp\u003e9.13 Fuzzy C-Means Clustering 291\u003c\/p\u003e \u003cp\u003e\u003cb\u003e10 Big Data Visualization \u003c\/b\u003e\u003cb\u003e293\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e10.1 Big Data Visualization 293\u003c\/p\u003e \u003cp\u003e10.2 Conventional Data Visualization Techniques 294\u003c\/p\u003e \u003cp\u003e10.3 Tableau 297\u003c\/p\u003e \u003cp\u003e10.4 Bar Chart in Tableau 309\u003c\/p\u003e \u003cp\u003e10.5 Line Chart 310\u003c\/p\u003e \u003cp\u003e10.6 Pie Chart 311\u003c\/p\u003e \u003cp\u003e10.7 Bubble Chart 312\u003c\/p\u003e \u003cp\u003e10.8 Box Plot 313\u003c\/p\u003e \u003cp\u003e10.9 Tableau Use Cases 313\u003c\/p\u003e \u003cp\u003e10.10 Installing R and Getting Ready 318\u003c\/p\u003e 
\u003cp\u003e10.11 Data Structures in R 321\u003c\/p\u003e \u003cp\u003e10.12 Importing Data from a File 335\u003c\/p\u003e \u003cp\u003e10.13 Importing Data from a Delimited Text File 336\u003c\/p\u003e \u003cp\u003e10.14 Control Structures in R 337\u003c\/p\u003e \u003cp\u003e10.15 Basic Graphs in R 341\u003c\/p\u003e \u003cp\u003eIndex 347\u003c\/p\u003e \u003cp\u003e\u003cb\u003eBALAMURUGAN BALUSAMY, P\u003csmall\u003eH\u003c\/small\u003eD,\u003c\/b\u003e is a Professor with the School of Computing Science and Engineering at Galgotias University, Greater Noida, India\u003c\/p\u003e \u003cp\u003e\u003cb\u003eNANDHINI ABIRAMI. R\u003c\/b\u003e is an IT Consultant and Research Scholar at VIT University in Vellore. \u003c\/p\u003e\u003cp\u003e\u003cb\u003eSEIFEDINE KADRY, PhD,\u003c\/b\u003e is a Professor of Data Science at the Faculty of Applied Computing and Technology at Noroff University College, Kristiansand, Norway. \u003c\/p\u003e\u003cp\u003e\u003cb\u003eAMIR H. GANDOMI, P\u003csmall\u003eH\u003c\/small\u003eD,\u003c\/b\u003e is a Professor of Data Science at the Faculty of Engineering \u0026amp; Information Technology, University of Technology Sydney, Australia.  \u003c\/p\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47988810514661,"sku":"NP9781119701828","price":139.95,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9781119701828.jpg?v=1761781679","url":"https:\/\/k12savings.com\/products\/big-data-isbn-9781119701828","provider":"K12savings","version":"1.0","type":"link"}