Web Scraping with Python: Collecting Data from the Modern Web

Front Cover
"O'Reilly Media, Inc.", Jun 15, 2015 - Computers - 256 pages

Learn web scraping and crawling techniques to access unlimited data from any web source in any format. With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once.

Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice.

  • Learn how to parse complicated HTML pages
  • Traverse multiple pages and sites
  • Get a general overview of APIs and how they work
  • Learn several methods for storing the data you scrape
  • Download, read, and extract data from documents
  • Use tools and techniques to clean badly formatted data
  • Read and write natural languages
  • Crawl through forms and logins
  • Understand how to scrape JavaScript
  • Learn image processing and text recognition
 

What people are saying - Write a review

User Review - Flag as inappropriate

as we move from a graphical user interface to voice command and control, it's important to learn. Ryan Mitchell is a leading teacher in the area of web scraping. I'll continue as I read through it.

User Review - Flag as inappropriate

This is a hot book, by a hot author.

Selected pages

Contents

Section 1
Section 2
Section 3
Section 4
Section 5
Section 6
Section 7
Section 8
Section 9
Section 10
Section 11
Section 12
Section 13
Section 14
Section 15
Section 16

Other editions - View all

Common terms and phrases

About the author (2015)

Ryan Mitchell is a Software Engineer at LinkeDrive in Boston, where she develops their API and data analysis tools. She is a graduate of Olin College of Engineering, and is a Masters degree student at Harvard University School of Extension Studies. Prior to joining LinkeDrive, she was a Software Engineer working on web scraping and data analysis at Abine.

Bibliographic information