{"product_id":"the-data-science-handbook-isbn-9781394234493","title":"The Data Science Handbook","description":"\u003cp\u003e\u003cb\u003ePractical, accessible guide to becoming a data scientist, updated to include the latest advances in data science and related fields.\u003c\/b\u003e \u003c\/p\u003e\u003cp\u003eBecoming a data scientist is hard. The job focuses on mathematical tools, but also demands fluency with software engineering, understanding of a business situation, and deep understanding of the data itself. This book provides a crash course in data science, combining all the necessary skills into a unified discipline. \u003c\/p\u003e\u003cp\u003eThe focus of \u003ci\u003eThe Data Science Handbook\u003c\/i\u003e is on practical applications and the ability to solve real problems, rather than theoretical formalisms that are rarely needed in practice. Among its key points are: \u003c\/p\u003e\u003cul\u003e\n\u003cli\u003eAn emphasis on software engineering and coding skills, which play a significant role in most real data science problems.\u003c\/li\u003e\n\u003cli\u003eExtensive sample code, detailed discussions of important libraries, and a solid grounding in core concepts from computer science (computer architecture, runtime complexity, and programming paradigms).\u003c\/li\u003e\n\u003cli\u003eA broad overview of important mathematical tools, including classical techniques in statistics, stochastic modeling, regression, numerical optimization, and more.\u003c\/li\u003e\n\u003cli\u003eExtensive tips about the practical realities of working as a data scientist, including understanding related job functions, project life cycles, and the varying roles of data science in an organization.\u003c\/li\u003e\n\u003cli\u003eExactly the right amount of theory. 
A solid conceptual foundation is required for fitting the right model to a business problem, understanding a tool’s limitations, and reasoning about discoveries.\u003c\/li\u003e\n\u003c\/ul\u003e \u003cp\u003eData science is a quickly evolving field, and this 2nd edition has been updated to reflect the latest developments, including the revolution in AI that has come from Large Language Models and the growth of ML Engineering as its own discipline. Much of data science has become a skillset that anybody can have, making this book not only for aspiring data scientists, but also for professionals in other fields who want to use analytics as a force multiplier in their organization. \u003c\/p\u003e\u003cp\u003ePreface to the First Edition xvii\u003c\/p\u003e \u003cp\u003ePreface to the Second Edition xix\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1 Introduction 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e1.1 What Data Science Is and Isn’t 2\u003c\/p\u003e \u003cp\u003e1.2 This Book’s Slogan: Simple Models Are Easier to Work With 3\u003c\/p\u003e \u003cp\u003e1.3 How Is This Book Organized? 4\u003c\/p\u003e \u003cp\u003e1.4 How to Use This Book? 4\u003c\/p\u003e \u003cp\u003e1.5 Why Is It All in Python, Anyway? 
4\u003c\/p\u003e \u003cp\u003e1.6 Example Code and Datasets 5\u003c\/p\u003e \u003cp\u003e1.7 Parting Words 5\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart I The Stuff You’ll Always Use 7\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2 The Data Science Road Map 9\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e2.1 Frame the Problem 10\u003c\/p\u003e \u003cp\u003e2.2 Understand the Data: Basic Questions 11\u003c\/p\u003e \u003cp\u003e2.3 Understand the Data: Data Wrangling 12\u003c\/p\u003e \u003cp\u003e2.4 Understand the Data: Exploratory Analysis 12\u003c\/p\u003e \u003cp\u003e2.5 Extract Features 13\u003c\/p\u003e \u003cp\u003e2.6 Model 14\u003c\/p\u003e \u003cp\u003e2.7 Present Results 14\u003c\/p\u003e \u003cp\u003e2.8 Deploy Code 14\u003c\/p\u003e \u003cp\u003e2.9 Iterating 15\u003c\/p\u003e \u003cp\u003e2.10 Glossary 15\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3 Programming Languages 17\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e3.1 Why Use a Programming Language? What Are the Other Options? 
17\u003c\/p\u003e \u003cp\u003e3.2 A Survey of Programming Languages for Data Science 18\u003c\/p\u003e \u003cp\u003e3.3 Where to Write Code 20\u003c\/p\u003e \u003cp\u003e3.4 Python Overview and Example Scripts 21\u003c\/p\u003e \u003cp\u003e3.5 Python Data Types 25\u003c\/p\u003e \u003cp\u003e3.6 GOTCHA: Hashable and Unhashable Types 30\u003c\/p\u003e \u003cp\u003e3.7 Functions and Control Structures 31\u003c\/p\u003e \u003cp\u003e3.8 Other Parts of Python 33\u003c\/p\u003e \u003cp\u003e3.9 Python’s Technical Libraries 35\u003c\/p\u003e \u003cp\u003e3.10 Other Python Resources 39\u003c\/p\u003e \u003cp\u003e3.11 Further Reading 39\u003c\/p\u003e \u003cp\u003e3.12 Glossary 40\u003c\/p\u003e \u003cp\u003e3a Interlude: My Personal Toolkit 41\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4 Data Munging: String Manipulation, Regular Expressions, and Data Cleaning 43\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e4.1 The Worst Dataset in the World 43\u003c\/p\u003e \u003cp\u003e4.2 How to Identify Pathologies 44\u003c\/p\u003e \u003cp\u003e4.3 Problems with Data Content 44\u003c\/p\u003e \u003cp\u003e4.4 Formatting Issues 46\u003c\/p\u003e \u003cp\u003e4.5 Example Formatting Script 49\u003c\/p\u003e \u003cp\u003e4.6 Regular Expressions 50\u003c\/p\u003e \u003cp\u003e4.7 Life in the Trenches 53\u003c\/p\u003e \u003cp\u003e4.8 Glossary 54\u003c\/p\u003e \u003cp\u003e\u003cb\u003e5 Visualizations and Simple Metrics 55\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e5.1 A Note on Python’s Visualization Tools 56\u003c\/p\u003e \u003cp\u003e5.2 Example Code 56\u003c\/p\u003e \u003cp\u003e5.3 Pie Charts 56\u003c\/p\u003e \u003cp\u003e5.4 Bar Charts 58\u003c\/p\u003e \u003cp\u003e5.5 Histograms 59\u003c\/p\u003e \u003cp\u003e5.6 Means, Standard Deviations, Medians, and Quantiles 61\u003c\/p\u003e \u003cp\u003e5.7 Boxplots 62\u003c\/p\u003e \u003cp\u003e5.8 Scatterplots 64\u003c\/p\u003e \u003cp\u003e5.9 Scatterplots with Logarithmic Axes 65\u003c\/p\u003e \u003cp\u003e5.10 Scatter Matrices 
67\u003c\/p\u003e \u003cp\u003e5.11 Heatmaps 68\u003c\/p\u003e \u003cp\u003e5.12 Correlations 69\u003c\/p\u003e \u003cp\u003e5.13 Anscombe’s Quartet and the Limits of Numbers 71\u003c\/p\u003e \u003cp\u003e5.14 Time Series 72\u003c\/p\u003e \u003cp\u003e5.15 Further Reading 75\u003c\/p\u003e \u003cp\u003e5.16 Glossary 75\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6 Overview: Machine Learning and Artificial Intelligence 77\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e6.1 Historical Context 77\u003c\/p\u003e \u003cp\u003e6.2 The Central Paradigm: Learning a Function from Example 78\u003c\/p\u003e \u003cp\u003e6.3 Machine Learning Data: Vectors and Feature Extraction 79\u003c\/p\u003e \u003cp\u003e6.4 Supervised, Unsupervised, and In-Between 79\u003c\/p\u003e \u003cp\u003e6.5 Training Data, Testing Data, and the Great Boogeyman of Overfitting 80\u003c\/p\u003e \u003cp\u003e6.6 Reinforcement Learning 81\u003c\/p\u003e \u003cp\u003e6.7 ML Models as Building Blocks for AI Systems 82\u003c\/p\u003e \u003cp\u003e6.8 ML Engineering as a New Job Role 82\u003c\/p\u003e \u003cp\u003e6.9 Further Reading 83\u003c\/p\u003e \u003cp\u003e6.10 Glossary 83\u003c\/p\u003e \u003cp\u003e\u003cb\u003e7 Interlude: Feature Extraction Ideas 85\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e7.1 Standard Features 85\u003c\/p\u003e \u003cp\u003e7.2 Features that Involve Grouping 86\u003c\/p\u003e \u003cp\u003e7.3 Preview of More Sophisticated Features 86\u003c\/p\u003e \u003cp\u003e7.4 You Get What You Measure: Defining the Target Variable 87\u003c\/p\u003e \u003cp\u003e\u003cb\u003e8 Machine-Learning Classification 89\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e8.1 What Is a Classifier, and What Can You Do with It? 
89\u003c\/p\u003e \u003cp\u003e8.2 A Few Practical Concerns 90\u003c\/p\u003e \u003cp\u003e8.3 Binary Versus Multiclass 90\u003c\/p\u003e \u003cp\u003e8.4 Example Script 91\u003c\/p\u003e \u003cp\u003e8.5 Specific Classifiers 92\u003c\/p\u003e \u003cp\u003e8.6 Evaluating Classifiers 102\u003c\/p\u003e \u003cp\u003e8.7 Selecting Classification Cutoffs 105\u003c\/p\u003e \u003cp\u003e8.8 Further Reading 106\u003c\/p\u003e \u003cp\u003e8.9 Glossary 106\u003c\/p\u003e \u003cp\u003e\u003cb\u003e9 Technical Communication and Documentation 109\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e9.1 Several Guiding Principles 109\u003c\/p\u003e \u003cp\u003e9.2 Slide Decks 112\u003c\/p\u003e \u003cp\u003e9.3 Written Reports 114\u003c\/p\u003e \u003cp\u003e9.4 Speaking: What Has Worked for Me 115\u003c\/p\u003e \u003cp\u003e9.5 Code Documentation 117\u003c\/p\u003e \u003cp\u003e9.6 Further Reading 117\u003c\/p\u003e \u003cp\u003e9.7 Glossary 117\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart II Stuff You Still Need to Know 119\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e10 Unsupervised Learning: Clustering and Dimensionality Reduction 121\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e10.1 The Curse of Dimensionality 121\u003c\/p\u003e \u003cp\u003e10.2 Example: Eigenfaces for Dimensionality Reduction 123\u003c\/p\u003e \u003cp\u003e10.3 Principal Component Analysis and Factor Analysis 125\u003c\/p\u003e \u003cp\u003e10.4 Scree Plots and Understanding Dimensionality 127\u003c\/p\u003e \u003cp\u003e10.5 Factor Analysis 127\u003c\/p\u003e \u003cp\u003e10.6 Limitations of PCA 128\u003c\/p\u003e \u003cp\u003e10.7 Clustering 128\u003c\/p\u003e \u003cp\u003e10.8 Further Reading 133\u003c\/p\u003e \u003cp\u003e10.9 Glossary 134\u003c\/p\u003e \u003cp\u003e\u003cb\u003e11 Regression 135\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e11.1 Example: Predicting Diabetes Progression 136\u003c\/p\u003e \u003cp\u003e11.2 Fitting a Line with Least Squares 137\u003c\/p\u003e \u003cp\u003e11.3 
Alternatives to Least Squares 139\u003c\/p\u003e \u003cp\u003e11.4 Fitting Nonlinear Curves 139\u003c\/p\u003e \u003cp\u003e11.5 Goodness of Fit: R² and Correlation 141\u003c\/p\u003e \u003cp\u003e11.6 Correlation of Residuals 142\u003c\/p\u003e \u003cp\u003e11.7 Linear Regression 142\u003c\/p\u003e \u003cp\u003e11.8 LASSO Regression and Feature Selection 144\u003c\/p\u003e \u003cp\u003e11.9 Further Reading 145\u003c\/p\u003e \u003cp\u003e11.10 Glossary 145\u003c\/p\u003e \u003cp\u003e\u003cb\u003e12 Data Encodings and File Formats 147\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e12.1 Typical File Format Categories 147\u003c\/p\u003e \u003cp\u003e12.2 CSV Files 149\u003c\/p\u003e \u003cp\u003e12.3 JSON Files 150\u003c\/p\u003e \u003cp\u003e12.4 XML Files 151\u003c\/p\u003e \u003cp\u003e12.5 HTML Files 153\u003c\/p\u003e \u003cp\u003e12.6 Tar Files 154\u003c\/p\u003e \u003cp\u003e12.7 GZip Files 155\u003c\/p\u003e \u003cp\u003e12.8 Zip Files 155\u003c\/p\u003e \u003cp\u003e12.9 Image Files: Rasterized, Vectorized, and\/or Compressed 156\u003c\/p\u003e \u003cp\u003e12.10 It’s All Bytes at the End of the Day 157\u003c\/p\u003e \u003cp\u003e12.11 Integers 158\u003c\/p\u003e \u003cp\u003e12.12 Floats 158\u003c\/p\u003e \u003cp\u003e12.13 Text Data 159\u003c\/p\u003e \u003cp\u003e12.14 Further Reading 161\u003c\/p\u003e \u003cp\u003e12.15 Glossary 161\u003c\/p\u003e \u003cp\u003e\u003cb\u003e13 Big Data 163\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e13.1 What Is Big Data? 
163\u003c\/p\u003e \u003cp\u003e13.2 When to Use – And not Use – Big Data 164\u003c\/p\u003e \u003cp\u003e13.3 Hadoop: The File System and the Processor 165\u003c\/p\u003e \u003cp\u003e13.4 Example PySpark Script 165\u003c\/p\u003e \u003cp\u003e13.5 Spark Overview 166\u003c\/p\u003e \u003cp\u003e13.6 Spark Operations 168\u003c\/p\u003e \u003cp\u003e13.7 PySpark Data Frames 169\u003c\/p\u003e \u003cp\u003e13.8 Two Ways to Run PySpark 170\u003c\/p\u003e \u003cp\u003e13.9 Configuring Spark 170\u003c\/p\u003e \u003cp\u003e13.10 Under the Hood 172\u003c\/p\u003e \u003cp\u003e13.11 Spark Tips and Gotchas 172\u003c\/p\u003e \u003cp\u003e13.12 The MapReduce Paradigm 173\u003c\/p\u003e \u003cp\u003e13.13 Performance Considerations 174\u003c\/p\u003e \u003cp\u003e13.14 Further Reading 175\u003c\/p\u003e \u003cp\u003e13.15 Glossary 176\u003c\/p\u003e \u003cp\u003e\u003cb\u003e14 Databases 177\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e14.1 Relational Databases and MySQL® 178\u003c\/p\u003e \u003cp\u003e14.2 Key–Value Stores 183\u003c\/p\u003e \u003cp\u003e14.3 Wide-Column Stores 183\u003c\/p\u003e \u003cp\u003e14.4 Document Stores 184\u003c\/p\u003e \u003cp\u003e14.5 Further Reading 186\u003c\/p\u003e \u003cp\u003e14.6 Glossary 186\u003c\/p\u003e \u003cp\u003e\u003cb\u003e15 Software Engineering Best Practices 187\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e15.1 Coding Style 187\u003c\/p\u003e \u003cp\u003e15.2 Version Control and Git for Data Scientists 189\u003c\/p\u003e \u003cp\u003e15.3 Testing Code 191\u003c\/p\u003e \u003cp\u003e15.4 Test-Driven Development 193\u003c\/p\u003e \u003cp\u003e15.5 AGILE Methodology 194\u003c\/p\u003e \u003cp\u003e15.6 Further Reading 194\u003c\/p\u003e \u003cp\u003e15.7 Glossary 194\u003c\/p\u003e \u003cp\u003e\u003cb\u003e16 Traditional Natural Language Processing 197\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e16.1 Do I Even Need NLP? 
197\u003c\/p\u003e \u003cp\u003e16.2 The Great Divide: Language Versus Statistics 198\u003c\/p\u003e \u003cp\u003e16.3 Example: Sentiment Analysis on Stock Market Articles 198\u003c\/p\u003e \u003cp\u003e16.4 Software and Datasets 200\u003c\/p\u003e \u003cp\u003e16.5 Tokenization 201\u003c\/p\u003e \u003cp\u003e16.6 Central Concept: Bag-of-Words 201\u003c\/p\u003e \u003cp\u003e16.7 Word Weighting: TF-IDF 202\u003c\/p\u003e \u003cp\u003e16.8 n-Grams 202\u003c\/p\u003e \u003cp\u003e16.9 Stop Words 203\u003c\/p\u003e \u003cp\u003e16.10 Lemmatization and Stemming 203\u003c\/p\u003e \u003cp\u003e16.11 Synonyms 204\u003c\/p\u003e \u003cp\u003e16.12 Part of Speech Tagging 204\u003c\/p\u003e \u003cp\u003e16.13 Common Problems 204\u003c\/p\u003e \u003cp\u003e16.14 Advanced Linguistic NLP: Syntax Trees, Knowledge, and Understanding 206\u003c\/p\u003e \u003cp\u003e16.15 Further Reading 207\u003c\/p\u003e \u003cp\u003e16.16 Glossary 207\u003c\/p\u003e \u003cp\u003e\u003cb\u003e17 Time Series Analysis 209\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e17.1 Example: Predicting Wikipedia Page Views 210\u003c\/p\u003e \u003cp\u003e17.2 A Typical Workflow 213\u003c\/p\u003e \u003cp\u003e17.3 Time Series Versus Time-Stamped Events 213\u003c\/p\u003e \u003cp\u003e17.4 Resampling and Interpolation 214\u003c\/p\u003e \u003cp\u003e17.5 Smoothing Signals 216\u003c\/p\u003e \u003cp\u003e17.6 Logarithms and Other Transformations 217\u003c\/p\u003e \u003cp\u003e17.7 Trends and Periodicity 217\u003c\/p\u003e \u003cp\u003e17.8 Windowing 217\u003c\/p\u003e \u003cp\u003e17.9 Brainstorming Simple Features 218\u003c\/p\u003e \u003cp\u003e17.10 Better Features: Time Series as Vectors 219\u003c\/p\u003e \u003cp\u003e17.11 Fourier Analysis: Sometimes a Magic Bullet 220\u003c\/p\u003e \u003cp\u003e17.12 Time Series in Context: The Whole Suite of Features 222\u003c\/p\u003e \u003cp\u003e17.13 Further Reading 222\u003c\/p\u003e \u003cp\u003e17.14 Glossary 222\u003c\/p\u003e \u003cp\u003e\u003cb\u003e18 
Probability 225\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e18.1 Flipping Coins: Bernoulli Random Variables 225\u003c\/p\u003e \u003cp\u003e18.2 Throwing Darts: Uniform Random Variables 226\u003c\/p\u003e \u003cp\u003e18.3 The Uniform Distribution and Pseudorandom Numbers 227\u003c\/p\u003e \u003cp\u003e18.4 Nondiscrete, Noncontinuous Random Variables 228\u003c\/p\u003e \u003cp\u003e18.5 Notation, Expectations, and Standard Deviation 230\u003c\/p\u003e \u003cp\u003e18.6 Dependence, Marginal, and Conditional Probability 231\u003c\/p\u003e \u003cp\u003e18.7 Understanding the Tails 232\u003c\/p\u003e \u003cp\u003e18.8 Binomial Distribution 234\u003c\/p\u003e \u003cp\u003e18.9 Poisson Distribution 234\u003c\/p\u003e \u003cp\u003e18.10 Normal Distribution 235\u003c\/p\u003e \u003cp\u003e18.11 Multivariate Gaussian 236\u003c\/p\u003e \u003cp\u003e18.12 Exponential Distribution 237\u003c\/p\u003e \u003cp\u003e18.13 Log-Normal Distribution 238\u003c\/p\u003e \u003cp\u003e18.14 Entropy 238\u003c\/p\u003e \u003cp\u003e18.15 Further Reading 240\u003c\/p\u003e \u003cp\u003e18.16 Glossary 240\u003c\/p\u003e \u003cp\u003e\u003cb\u003e19 Statistics 243\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e19.1 Statistics in Perspective 243\u003c\/p\u003e \u003cp\u003e19.2 Bayesian Versus Frequentist: Practical Tradeoffs and Differing Philosophies 244\u003c\/p\u003e \u003cp\u003e19.3 Hypothesis Testing: Key Idea and Example 245\u003c\/p\u003e \u003cp\u003e19.4 Multiple Hypothesis Testing 246\u003c\/p\u003e \u003cp\u003e19.5 Parameter Estimation 247\u003c\/p\u003e \u003cp\u003e19.6 Hypothesis Testing: t-Test 248\u003c\/p\u003e \u003cp\u003e19.7 Confidence Intervals 250\u003c\/p\u003e \u003cp\u003e19.8 Bayesian Statistics 252\u003c\/p\u003e \u003cp\u003e19.9 Naive Bayesian Statistics 253\u003c\/p\u003e \u003cp\u003e19.10 Bayesian Networks 253\u003c\/p\u003e \u003cp\u003e19.11 Choosing Priors: Maximum Entropy or Domain Knowledge 254\u003c\/p\u003e \u003cp\u003e19.12 Further Reading 
255\u003c\/p\u003e \u003cp\u003e19.13 Glossary 255\u003c\/p\u003e \u003cp\u003e\u003cb\u003e20 Programming Language Concepts 257\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e20.1 Programming Paradigms 257\u003c\/p\u003e \u003cp\u003e20.2 Compilation and Interpretation 264\u003c\/p\u003e \u003cp\u003e20.3 Type Systems 266\u003c\/p\u003e \u003cp\u003e20.4 Further Reading 267\u003c\/p\u003e \u003cp\u003e20.5 Glossary 267\u003c\/p\u003e \u003cp\u003e\u003cb\u003e21 Performance and Computer Memory 269\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e21.1 A Word of Caution 269\u003c\/p\u003e \u003cp\u003e21.2 Example Script 270\u003c\/p\u003e \u003cp\u003e21.3 Algorithm Performance and Big-O Notation 272\u003c\/p\u003e \u003cp\u003e21.4 Some Classic Problems: Sorting a List and Binary Search 273\u003c\/p\u003e \u003cp\u003e21.5 Amortized Performance and Average Performance 276\u003c\/p\u003e \u003cp\u003e21.6 Two Principles: Reducing Overhead and Managing Memory 277\u003c\/p\u003e \u003cp\u003e21.7 Performance Tip: Use Numerical Libraries When Applicable 278\u003c\/p\u003e \u003cp\u003e21.8 Performance Tip: Delete Large Structures You Don’t Need 280\u003c\/p\u003e \u003cp\u003e21.9 Performance Tip: Use Built-In Functions When Possible 280\u003c\/p\u003e \u003cp\u003e21.10 Performance Tip: Avoid Superfluous Function Calls 280\u003c\/p\u003e \u003cp\u003e21.11 Performance Tip: Avoid Creating Large New Objects 281\u003c\/p\u003e \u003cp\u003e21.12 Further Reading 281\u003c\/p\u003e \u003cp\u003e21.13 Glossary 281\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart III Specialized or Advanced Topics 283\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e22 Computer Memory and Data Structures 285\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e22.1 Virtual Memory, the Stack, and the Heap 285\u003c\/p\u003e \u003cp\u003e22.2 Example C Program 286\u003c\/p\u003e \u003cp\u003e22.3 Data Types and Arrays in Memory 286\u003c\/p\u003e \u003cp\u003e22.4 Structs 287\u003c\/p\u003e \u003cp\u003e22.5 
Pointers, the Stack, and the Heap 288\u003c\/p\u003e \u003cp\u003e22.6 Key Data Structures 292\u003c\/p\u003e \u003cp\u003e22.7 Further Reading 297\u003c\/p\u003e \u003cp\u003e22.8 Glossary 297\u003c\/p\u003e \u003cp\u003e\u003cb\u003e23 Maximum-Likelihood Estimation and Optimization 299\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e23.1 Maximum-Likelihood Estimation 299\u003c\/p\u003e \u003cp\u003e23.2 A Simple Example: Fitting a Line 300\u003c\/p\u003e \u003cp\u003e23.3 Another Example: Logistic Regression 301\u003c\/p\u003e \u003cp\u003e23.4 Optimization 302\u003c\/p\u003e \u003cp\u003e23.5 Gradient Descent 303\u003c\/p\u003e \u003cp\u003e23.6 Convex Optimization 306\u003c\/p\u003e \u003cp\u003e23.7 Stochastic Gradient Descent 307\u003c\/p\u003e \u003cp\u003e23.8 Further Reading 308\u003c\/p\u003e \u003cp\u003e23.9 Glossary 308\u003c\/p\u003e \u003cp\u003e\u003cb\u003e24 Deep Learning and AI 309\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e24.1 A Note on Libraries and Hardware 310\u003c\/p\u003e \u003cp\u003e24.2 A Note on Training Data 310\u003c\/p\u003e \u003cp\u003e24.3 Simple Deep Learning: Perceptrons 311\u003c\/p\u003e \u003cp\u003e24.4 What Is a Tensor? 
314\u003c\/p\u003e \u003cp\u003e24.5 Convolutional Neural Networks 315\u003c\/p\u003e \u003cp\u003e24.6 Example: The MNIST Handwriting Dataset 317\u003c\/p\u003e \u003cp\u003e24.7 Autoencoders and Latent Vectors 318\u003c\/p\u003e \u003cp\u003e24.8 Generative AI and GANs 321\u003c\/p\u003e \u003cp\u003e24.9 Diffusion Models 323\u003c\/p\u003e \u003cp\u003e24.10 RNNs, Hidden State, and the Encoder–Decoder 324\u003c\/p\u003e \u003cp\u003e24.11 Attention and Transformers 325\u003c\/p\u003e \u003cp\u003e24.12 Stable Diffusion: Bringing the Parts Together 326\u003c\/p\u003e \u003cp\u003e24.13 Large Language Models and Prompt Engineering 327\u003c\/p\u003e \u003cp\u003e24.14 Further Reading 328\u003c\/p\u003e \u003cp\u003e24.15 Glossary 329\u003c\/p\u003e \u003cp\u003e\u003cb\u003e25 Stochastic Modeling 331\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e25.1 Markov Chains 331\u003c\/p\u003e \u003cp\u003e25.2 Two Kinds of Markov Chain, Two Kinds of Questions 333\u003c\/p\u003e \u003cp\u003e25.3 Hidden Markov Models and the Viterbi Algorithm 334\u003c\/p\u003e \u003cp\u003e25.4 The Viterbi Algorithm 336\u003c\/p\u003e \u003cp\u003e25.5 Random Walks 337\u003c\/p\u003e \u003cp\u003e25.6 Brownian Motion 338\u003c\/p\u003e \u003cp\u003e25.7 ARIMA Models 339\u003c\/p\u003e \u003cp\u003e25.8 Continuous-Time Markov Processes 339\u003c\/p\u003e \u003cp\u003e25.9 Poisson Processes 340\u003c\/p\u003e \u003cp\u003e25.10 Further Reading 341\u003c\/p\u003e \u003cp\u003e25.11 Glossary 341\u003c\/p\u003e \u003cp\u003e\u003cb\u003e26 Parting Words: Your Future as a Data Scientist 343\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eIndex 345\u003c\/p\u003e  \u003cp\u003e\u003cb\u003eField Cady\u003c\/b\u003e is a data scientist, researcher and author based in Seattle, WA, USA. He has worked for a range of companies including Google, the Allen Institute for Artificial Intelligence, and several startups. 
He received a BS in physics and math from Stanford and did graduate work in computer science at Carnegie Mellon. He is the author of \u003ci\u003eThe Data Science Handbook\u003c\/i\u003e (Wiley 2017).\u003c\/p\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47990203875557,"sku":"NP9781394234493","price":75.0,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9781394234493.jpg?v=1761786894","url":"https:\/\/k12savings.com\/es\/products\/the-data-science-handbook-isbn-9781394234493","provider":"K12savings","version":"1.0","type":"link"}