{"product_id":"automated-data-collection-with-r-isbn-9781118834817","title":"Automated Data Collection with R","description":"\u003cp\u003e\u003cb\u003eA hands-on guide to web scraping and text mining for both beginners and experienced users of R\u003c\/b\u003e\u003c\/p\u003e \u003cul\u003e \u003cli\u003eIntroduces fundamental concepts of the main architecture of the web and databases and covers HTTP, HTML, XML, JSON, and SQL.\u003c\/li\u003e \u003cli\u003eProvides basic techniques to query web documents and data sets (XPath and regular expressions).\u003c\/li\u003e \u003cli\u003eAn extensive set of exercises is presented to guide the reader through each technique.\u003c\/li\u003e \u003cli\u003eExplores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.\u003c\/li\u003e \u003cli\u003eCase studies are featured throughout, along with examples for each technique presented.\u003c\/li\u003e \u003cli\u003eR code and solutions to exercises featured in the book are provided on a supporting website.\u003c\/li\u003e \u003c\/ul\u003e \u003cp\u003ePreface xv\u003c\/p\u003e \u003cp\u003e\u003cb\u003e1 Introduction 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e1.1 Case study: World Heritage Sites in Danger 1\u003c\/p\u003e \u003cp\u003e1.2 Some remarks on web data quality 7\u003c\/p\u003e \u003cp\u003e1.3 Technologies for disseminating, extracting, and storing web data 9\u003c\/p\u003e \u003cp\u003e1.4 Structure of the book 13\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart One A Primer on Web and Data Technologies 15\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e2 HTML 17\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e2.1 Browser presentation and source code 18\u003c\/p\u003e \u003cp\u003e2.2 Syntax rules 19\u003c\/p\u003e \u003cp\u003e2.3 Tags and attributes 24\u003c\/p\u003e \u003cp\u003e2.4 Parsing 32\u003c\/p\u003e \u003cp\u003e\u003cb\u003e3 XML and JSON 41\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e3.1 A 
short example XML document 42\u003c\/p\u003e \u003cp\u003e3.2 XML syntax rules 43\u003c\/p\u003e \u003cp\u003e3.3 When is an XML document well formed or valid? 51\u003c\/p\u003e \u003cp\u003e3.4 XML extensions and technologies 53\u003c\/p\u003e \u003cp\u003e3.5 XML and R in practice 60\u003c\/p\u003e \u003cp\u003e3.6 A short example JSON document 68\u003c\/p\u003e \u003cp\u003e3.7 JSON syntax rules 69\u003c\/p\u003e \u003cp\u003e3.8 JSON and R in practice 71\u003c\/p\u003e \u003cp\u003e\u003cb\u003e4 XPath 79\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e4.1 XPath--a query language for web documents 80\u003c\/p\u003e \u003cp\u003e4.2 Identifying node sets with XPath 81\u003c\/p\u003e \u003cp\u003e4.3 Extracting node elements 93\u003c\/p\u003e \u003cp\u003e\u003cb\u003e5 HTTP 101\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e5.1 HTTP fundamentals 102\u003c\/p\u003e \u003cp\u003e5.2 Advanced features of HTTP 116\u003c\/p\u003e \u003cp\u003e5.3 Protocols beyond HTTP 124\u003c\/p\u003e \u003cp\u003e5.4 HTTP in action 126\u003c\/p\u003e \u003cp\u003e\u003cb\u003e6 AJAX 149\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e6.1 JavaScript 150\u003c\/p\u003e \u003cp\u003e6.2 XHR 154\u003c\/p\u003e \u003cp\u003e6.3 Exploring AJAX with Web Developer Tools 158\u003c\/p\u003e \u003cp\u003e\u003cb\u003e7 SQL and relational databases 164\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e7.1 Overview and terminology 165\u003c\/p\u003e \u003cp\u003e7.2 Relational Databases 167\u003c\/p\u003e \u003cp\u003e7.3 SQL: a language to communicate with Databases 175\u003c\/p\u003e \u003cp\u003e7.4 Databases in action 188\u003c\/p\u003e \u003cp\u003e\u003cb\u003e8 Regular expressions and essential string functions 196\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e8.1 Regular expressions 198\u003c\/p\u003e \u003cp\u003e8.2 String processing 207\u003c\/p\u003e \u003cp\u003e8.3 A word on character encodings 214\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart Two A Practical Toolbox for Web Scraping and Text Mining 
219\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e9 Scraping the Web 221\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e9.1 Retrieval scenarios 222\u003c\/p\u003e \u003cp\u003e9.2 Extraction strategies 270\u003c\/p\u003e \u003cp\u003e9.3 Web scraping: Good practice 278\u003c\/p\u003e \u003cp\u003e9.4 Valuable sources of inspiration 290\u003c\/p\u003e \u003cp\u003e\u003cb\u003e10 Statistical text processing 295\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e10.1 The running example: Classifying press releases of the British government 296\u003c\/p\u003e \u003cp\u003e10.2 Processing textual data 298\u003c\/p\u003e \u003cp\u003e10.3 Supervised learning techniques 307\u003c\/p\u003e \u003cp\u003e10.4 Unsupervised learning techniques 313\u003c\/p\u003e \u003cp\u003e\u003cb\u003e11 Managing data projects 322\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e11.1 Interacting with the file system 322\u003c\/p\u003e \u003cp\u003e11.2 Processing multiple documents\/links 323\u003c\/p\u003e \u003cp\u003e11.3 Organizing scraping procedures 328\u003c\/p\u003e \u003cp\u003e11.4 Executing R scripts on a regular basis 334\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePart Three A Bag of Case Studies 341\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e12 Collaboration networks in the US Senate 343\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e12.1 Information on the bills 344\u003c\/p\u003e \u003cp\u003e12.2 Information on the senators 350\u003c\/p\u003e \u003cp\u003e12.3 Analyzing the network structure 353\u003c\/p\u003e \u003cp\u003e12.4 Conclusion 358\u003c\/p\u003e \u003cp\u003e\u003cb\u003e13 Parsing information from semistructured documents 359\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e13.1 Downloading data from the FTP server 360\u003c\/p\u003e \u003cp\u003e13.2 Parsing semistructured text data 361\u003c\/p\u003e \u003cp\u003e13.3 Visualizing station and temperature data 368\u003c\/p\u003e \u003cp\u003e\u003cb\u003e14 Predicting the 2014 Academy Awards using Twitter 
371\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e\u003cb\u003e15 Mapping the geographic distribution of names 380\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e15.1 Developing a data collection strategy 381\u003c\/p\u003e \u003cp\u003e15.2 Website inspection 382\u003c\/p\u003e \u003cp\u003e15.3 Data retrieval and information extraction 384\u003c\/p\u003e \u003cp\u003e15.4 Mapping names 387\u003c\/p\u003e \u003cp\u003e15.5 Automating the process 389\u003c\/p\u003e \u003cp\u003e\u003cb\u003e16 Gathering data on mobile phones 396\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e16.1 Page exploration 396\u003c\/p\u003e \u003cp\u003e16.2 Scraping procedure 404\u003c\/p\u003e \u003cp\u003e16.3 Graphical analysis 406\u003c\/p\u003e \u003cp\u003e16.4 Data storage 408\u003c\/p\u003e \u003cp\u003e\u003cb\u003e17 Analyzing sentiments of product reviews 416\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003e17.1 Introduction 416\u003c\/p\u003e \u003cp\u003e17.2 Collecting the data 417\u003c\/p\u003e \u003cp\u003e17.3 Analyzing the data 426\u003c\/p\u003e \u003cp\u003e17.4 Conclusion 434\u003c\/p\u003e \u003cp\u003eReferences 435\u003c\/p\u003e \u003cp\u003eGeneral index 442\u003c\/p\u003e \u003cp\u003ePackage index 448\u003c\/p\u003e \u003cp\u003eFunction index 449\u003c\/p\u003e \u003cp\u003e\u003cb\u003eSimon Munzert\u003c\/b\u003e is the author of \u003ci\u003eAutomated Data Collection with R: A Practical Guide to Web Scraping and Text Mining\u003c\/i\u003e, published by Wiley.\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChristian Rubba\u003c\/b\u003e is the author of \u003ci\u003eAutomated Data Collection with R: A Practical Guide to Web Scraping and Text Mining\u003c\/i\u003e, published by Wiley.\u003c\/p\u003e \u003cp\u003e\u003cb\u003ePeter Meißner\u003c\/b\u003e is the author of \u003ci\u003eAutomated Data Collection with R: A Practical Guide to Web Scraping and Text Mining\u003c\/i\u003e, published by Wiley.\u003c\/p\u003e \u003cp\u003e\u003cb\u003eDominic Nyhuis\u003c\/b\u003e is the 
author of \u003ci\u003eAutomated Data Collection with R: A Practical Guide to Web Scraping and Text Mining\u003c\/i\u003e, published by Wiley.\u003c\/p\u003e","brand":"Wiley","offers":[{"title":"Default Title","offer_id":47988780368101,"sku":"NP9781118834817","price":87.95,"currency_code":"USD","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/1842\/7735\/files\/9781118834817.jpg?v=1761781562","url":"https:\/\/k12savings.com\/es\/products\/automated-data-collection-with-r-isbn-9781118834817","provider":"K12savings","version":"1.0","type":"link"}