
Web Scraping with Python: Collecting Data from the Modern Web 1st Edition

4.4 out of 5 stars 127 ratings

Learn web scraping and crawling techniques to access unlimited data from any web source in any format. With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once.

Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice.

  • Learn how to parse complicated HTML pages
  • Traverse multiple pages and sites
  • Get a general overview of APIs and how they work
  • Learn several methods for storing the data you scrape
  • Download, read, and extract data from documents
  • Use tools and techniques to clean badly formatted data
  • Read and write natural languages
  • Crawl through forms and logins
  • Understand how to scrape JavaScript
  • Learn image processing and text recognition
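
As a rough illustration of the first two bullets (parsing HTML and traversing multiple pages), here is a minimal sketch that uses only Python's standard-library `html.parser` and `urllib.parse` rather than a third-party parser such as BeautifulSoup. The `LinkCollector` class, the `internal_links` helper, and the sample HTML are hypothetical. A real crawler would fetch each page (for example with `urllib.request.urlopen`), queue the returned links, and respect the target site's terms and robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def internal_links(html, base_url):
    """Return unique links on the same host as base_url, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    host = urlparse(base_url).netloc
    unique = []
    for link in parser.links:
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique


# Hypothetical page content standing in for a fetched response body.
sample_html = """
<html><body>
  <a href="/wiki/Python">Python</a>
  <a href="https://example.org/about">About</a>
  <a href="/wiki/Python">Python again</a>
  <a href="https://other.example.com/">Elsewhere</a>
</body></html>
"""

print(internal_links(sample_html, "https://example.org/wiki/Web_scraping"))
```

Filtering to the starting host keeps a crawl from wandering across the whole web; dropping that check would turn the same code into a cross-site crawler.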

A newer edition of this book is available.

From the Publisher

Q&A with author Ryan Mitchell

What got you interested in web scraping?

In 2011, I started working for a company called Abine that offered a service to remove customers’ personal information from various sites on the Internet. In the early days of the company, the process of looking for someone’s personal information on all of these sites, filling out all these opt-out forms, faxing, emailing, and compiling reports to send back to the customers -- it all took a lot of time! I started looking into ways to streamline these processes and add additional features. I built bots that could search for profiles, store information in our database, fill out web forms, create documents, and send the emails and faxes automatically. Some of these sites were fairly bot-resistant, so I had to learn, and even invent, some interesting techniques to deal with them. I really fell in love with building bots and scraping the web, and continued to do it even after I left the company!

Why is Python such a good fit for web scraping and building web crawlers?

I’ll be honest: As far as high-performance programming languages go, Python does not win many speed contests. But with web scraping, you’re not looking for speed -- sending and receiving data across the Internet will be thousands of times slower than any relatively tiny differences in language performance, so you can throw that metric out the window! What you need is something that’s lightweight, easy to deploy to remote machines, can be installed and run anywhere, is easy to write and modify, and, perhaps most importantly, has a plethora of well-documented tools for just about any situation. Python has all of these in spades.

What’s the most interesting way you’ve used web scraping, for professional or side projects?

One of my favorite scraping projects, and something I introduce in Web Scraping with Python, is scraping Wikipedia for historical edits by IP address, time of the edit, and language. You can resolve the IP address to a geographic location, and explore when and where speakers of different languages are making edits. Lots of interesting sociological research potential there!
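
On Wikipedia, edits made without logging in have historically been attributed to the editor's IP address instead of a username, which is what makes this kind of analysis possible. The sketch below assumes the revision records have already been collected: the `revisions` list is hypothetical sample data (using documentation-reserved IP ranges), not real Wikipedia output, and the helper names are illustrative. It uses Python's standard-library `ipaddress` module to separate IP-attributed edits from named accounts. Resolving the IPs to locations would need a separate geolocation service, which is not shown.

```python
import ipaddress
from collections import Counter


def is_ip_address(user):
    """Return True if a revision's user field is an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


def anonymous_edits_by_language(revisions):
    """Count IP-attributed (logged-out) edits per language edition."""
    counts = Counter()
    for rev in revisions:
        if is_ip_address(rev["user"]):
            counts[rev["lang"]] += 1
    return counts


# Hypothetical revision records; a real run would gather these by scraping
# or querying each language edition's revision history.
revisions = [
    {"user": "203.0.113.7", "lang": "en", "timestamp": "2015-03-01T12:00:00Z"},
    {"user": "ExampleEditor", "lang": "en", "timestamp": "2015-03-01T12:05:00Z"},
    {"user": "2001:db8::1", "lang": "de", "timestamp": "2015-03-02T08:30:00Z"},
    {"user": "198.51.100.23", "lang": "de", "timestamp": "2015-03-02T09:10:00Z"},
]

print(anonymous_edits_by_language(revisions))
```

Parsing with `ipaddress.ip_address` instead of a hand-written regular expression handles both IPv4 and IPv6 and rejects usernames that merely look numeric.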

A recent hobby of mine has also been automated CAPTCHA solving. I really enjoy analyzing new types of CAPTCHAs for vulnerabilities, writing scripts to pre-process the images, creating data sets for machine learning algorithms, and seeing how high I can get the success percentage of my bots! No real practical applications these days, but you never know when it will come in handy.

What information do you hope that readers of your book will walk away with?

I try to stress a couple of things throughout the book:

First, no website is bot-proof. Attempts to make websites more bot-proof generally also result in a loss of usability for human users. That loss of usability may be in the form of slower loading times, poor browser compatibility, lack of accessibility for users with mobility or visual impairments, or users on mobile devices. And many of these measures have no real deterring effect on web scrapers. If you can view the data in a browser, you can capture it with a scraper.

Second, writing web scrapers that capture the data you want often involves combining multiple techniques, some creative thinking, and a dash of laziness. I can’t count the number of times people have asked me to build a bot, or to help them build a bot, to collect data that could be easily obtained through an API! So sometimes your data collection problem can be solved using the information from only a single chapter in the book. On the other hand, I also provide an example of a web scraper that uses JavaScript execution, HTML parsing, DOM interaction, and optical character recognition, all in one piece of code, in order to extract the text from book previews on Amazon! (Sorry, Amazon!) When faced with a web scraping problem, you should always 'work the steps' to try to formulate a data extraction and processing plan -- it’s not just about learning a single library or command!
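
One way to make "work the steps" concrete is to plan a scraping job as separate stages (extract, clean, store) so each can be tested and replaced independently. This is a minimal sketch under stated assumptions: `raw_rows` is hypothetical data standing in for a parser's output, the `clean_row` and `store` helpers and their cleaning rules are illustrative, and an in-memory SQLite database stands in for whatever storage a real project would use. Before building any of this, it is worth checking whether the site already offers an API for the same data.

```python
import re
import sqlite3

# Stage 1 (extract): hypothetical raw values, as an HTML parser might return them.
raw_rows = [
    ("  Widget A ", "$1,234.50"),
    ("Widget B\n", " $19.99 "),
    ("Widget C", "price unavailable"),
]


def clean_row(name, price_text):
    """Stage 2 (clean): normalize whitespace and parse a dollar price, or return None."""
    name = " ".join(name.split())
    match = re.search(r"\$([\d,]+(?:\.\d+)?)", price_text)
    if not match:
        return None  # Skip rows whose price can't be parsed.
    price = float(match.group(1).replace(",", ""))
    return name, price


def store(rows):
    """Stage 3 (store): write cleaned rows to an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    conn.commit()
    return conn


cleaned = [row for row in (clean_row(n, p) for n, p in raw_rows) if row is not None]
conn = store(cleaned)
for name, price in conn.execute("SELECT name, price FROM products ORDER BY price"):
    print(name, price)
```

Keeping the stages separate means the extraction step can later be swapped for an API call, or the SQLite table for another database, without touching the cleaning logic.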

What’s the most exciting or important thing happening in your space right now?

Like many fields, especially computer science fields, there’s a lot being done with machine learning and big data. Page requests are currently split about 50/50 between humans and bots, and as more humans get on the Internet, more bots do too -- and they are outpacing the humans! There’s just so much data, and so many machines collecting that data, and so many connections we haven’t been able to make before, waiting to be made. And it isn’t just data scientists and server farm owners making them, either! The kind of research that once might have required months or years of surveys and data collection is now just a Python script, a database, and a weekend of coding away!

Editorial Reviews

About the Author

Ryan Mitchell is a Software Engineer at LinkeDrive in Boston, where she develops their API and data analysis tools. She is a graduate of Olin College of Engineering and is a master's degree student at Harvard University School of Extension Studies. Prior to joining LinkeDrive, she was a Software Engineer working on web scraping and data analysis at Abine.

Product details

  • Publisher: O'Reilly Media; 1st edition (July 24, 2015)
  • Language: English
  • Paperback: 256 pages
  • ISBN-10: 1491910291
  • ISBN-13: 978-1491910290
  • Item Weight: 14.6 ounces
  • Dimensions: 7.01 x 0.54 x 9.17 inches

About the author

Ryan Mitchell

Ryan Mitchell is the author of Unlocking Python (Wiley) and Web Scraping with Python (O’Reilly). She has six LinkedIn Learning courses, including Python Essential Training, the leading Python course on the platform. An expert in web scraping, application security, and data science, Ryan has hosted workshops and spoken at many events, including Data Day Texas and DEF CON.

Ryan holds a master’s degree in software engineering from Harvard University Extension School and a bachelor’s in engineering from Olin College of Engineering. She is currently a principal software engineer at the Gerson Lehrman Group where she does back end development and data science on the search team.

Customer reviews

4.4 out of 5 stars
127 global ratings


Customers say

Customers find that the book's lessons translate easily into practical projects, and they appreciate its readability. They describe the writing style as well structured and clear, with great examples throughout. They find the content interesting, and one customer notes it's particularly useful for those with some knowledge of Python. The information content receives mixed feedback, with one customer mentioning it covers too much basic information.

AI-generated from the text of customer reviews

22 customers mention "Use" (20 positive, 2 negative)

Customers find the book provides good basic information and can be easily translated into practical projects, with one customer noting it serves as a great investment in data acquisition skills.

"...I have found it useful in my scraping at work and at home on multiple occasions. Easy read and a joy to have read. Thank you Ryan for this book!"

"...coder (python or otherwise) this book is a great investment in your data acquisition skills...."

"...seen in a book to the experience of sitting with a friendly, approachable expert who is ready to answer your questions intelligently and in a..."

"...I have been able to easily translate the lessons into practical projects...."

17 customers mention "Readability" (17 positive, 0 negative)

Customers find the book readable and enjoyable to read, with one customer noting it is to the point.

"...THIS BOOK IS PACKED FULL OF INFORMATION. It is a joy to read and always has answers when I am looking...."

"...However, if you are new to web scraping, this is a great introductory book to the tools available in Python and their uses...."

"This is a truly excellent book...."

"It's a great book. I got follow all the examples shown by the author, although I'm not fluent in Python. The matter isn't so easy as it looks...."

13 customers mention "Writing style" (13 positive, 0 negative)

Customers appreciate the writing style of the book, finding it very well written and concise, with one customer noting the clear chapter titles.

"...Easy read and a joy to have read. Thank you Ryan for this book!"

"...the issues related to machine learning, Selenium/webdrivers and text processing. Only one topic is not disclosed - the CAPTCHAs recognition...."

"Clear, concise, and engaging. I have been able to easily translate the lessons into practical projects...."

"Well-written book on a slightly obscure subject that does have some real uses...."

4 customers mention "Encyclopedia content" (4 positive, 0 negative)

Customers appreciate the encyclopedia content of the book, with great examples and code samples, and one customer specifically mentions interesting examples with Wikipedia.

"...There were a few interesting examples with Wikipedia and how to crawl it, but there needed to be much more...."

"Great code samples! Easy to read and follow."

"A very well written book with great examples."

"Great examples and exercises to try out."

4 customers mention "Interest" (4 positive, 0 negative)

Customers find the book interesting.

"Clear, concise, and engaging. I have been able to easily translate the lessons into practical projects...."

"informative and interesting hot topic recommended."

"Interesting and useful..."

"Clear, intriguing and effective...."

4 customers mention "Information content" (2 positive, 2 negative)

Customers have mixed opinions about the information content of the book, with one customer appreciating how it sets up the basics, while another finds it goes over too much basic information.

"...Then...I got Ryan Mitchell's book. This book sets you up with not only the basics, but also more advanced techniques that you'll need to really..."

"I really wanted to like this book but for 200 pages it goes over way too much basic information...."

"that Python can be useful as a simple automation tool..."

"Good at describing how web scraping works but does not go into a lot more details on most areas...."

Top reviews from the United States

  • Reviewed in the United States on August 19, 2015
    This book is excellent. I love the focus on Python 3 and all the techniques presented. I felt like it was Christmas day just reading the Table of Contents. THIS BOOK IS PACKED FULL OF INFORMATION. It is a joy to read and always has answers when I am looking. I have found it useful in my scraping at work and at home on multiple occasions. Easy read and a joy to have read. Thank you Ryan for this book!
    9 people found this helpful
  • Reviewed in the United States on September 26, 2015
    Over the last year, my view of Python's suitability for real-life projects has changed significantly. Earlier, I thought that Python can be useful as a simple automation tool, but the language is powerful and flexible, and it gave me a sense of control over my code. When I finished the book, I found that I had written 40 pages of remarks, links, and ideas. It is a great result: the book inspired me to dig deeper into the issues related to machine learning, Selenium/webdrivers and text processing. Only one topic is not disclosed - the CAPTCHAs recognition. It is still unclear how to do that in reality.
    Anyway, I like to say THANKS to Ryan Mitchell – your book is awesome!
    6 people found this helpful
  • Reviewed in the United States on May 4, 2018
    90% of the time this book has exactly what I needed to solve a real-world problem. 10% of the time, it went over my head and I spent hours on YouTube shoring up the material I couldn't gain from the book. For example, the section on storing data in MySQL via PyMySQL was a bit too short and unclear for me to get a real handle on how to automate db queries with Python.

    Nonetheless, as an entry level python programmer, I found the book mostly readily accessible. If you're an experienced coder (python or otherwise) this book is a great investment in your data acquisition skills.

    I'll end on a positive note - my boss likes weather updates for our offices in four different cities (we do logistics.) He wants this report at 6:15am daily. I was able to write a .py script that scrapes the webpage, compiles results into a string, logs into my email account and sends the report to him daily, on time. Now I never have to worry about this early morning task again!

    If you need to automate the retrieval, processing and delivery of online information, this book is for you!
    One person found this helpful
  • Reviewed in the United States on August 21, 2016
    This is mostly a beginners' manual, so don't expect extremely complicated programs or tips. However, if you are new to web scraping, this is a great introductory book to the tools available in Python and their uses. In my case, I had learned most of what was in the book using trial and error (and lots of time going through Stack Exchange questions!). If I had had this book before, I would have saved a lot of time learning the basics.
    8 people found this helpful
  • Reviewed in the United States on December 27, 2015
    This is a truly excellent book. It is the closest I have seen in a book to the experience of sitting with a friendly, approachable expert who is ready to answer your questions intelligently and in a supportive way. You need the very basics of Python as can be learned from the PyCharm educational version, but everything else is provided.
    8 people found this helpful
  • Reviewed in the United States on February 27, 2016
    I really wanted to like this book but for 200 pages it goes over way too much basic information. For example, the author introduces the Python set data structure, but describes it as if the reader is totally unfamiliar with sets. Later, GET/POST is also discussed as if the reader has never heard of it. There are tons of topics where the description sounds as if the reader has never programmed at all. At one point file extensions were introduced... it was these elementary descriptions that were incredibly annoying to me.

    Even the appendix was poorly constructed. There was an entire paragraph about how Python does not use semi-colons. Then there were reminders that languages such as Java and C++ need semi-colons, in case you switch back... was this written for a first time programmer? The last appendix was 10 pages about legal ramifications of scraping; a lot of rambling here and wasted space.

    Speaking of wasted space, sometimes the author shows an example which outputs junk data for half a page. There was no need for these parts to be in print.

    On the content and examples themselves, you would be better served just by going to the documentation for BeautifulSoup, Selenium, and the other libraries introduced. Another negative was the lack of coverage on how to crawl JavaScript; it was mentioned, but only to say your code may break if there is too much JavaScript. There were a few interesting examples with Wikipedia and how to crawl it, but there needed to be much more.

    The chapters never seemed to link together for me. A lot of chapters cover something totally random from the last, and at the end I felt like I had a bunch of random techniques from different libraries. I can at least say I got a better idea of how to design a web crawler though.

    This book is incredibly short if you factor in the filler and elementary info. The author should have spent a lot more time giving useful examples rather than describing why Python sets are different from lists.
    13 people found this helpful
  • Reviewed in the United States on February 18, 2017
    Clear, concise, and engaging. I have been able to easily translate the lessons into practical projects. The author spells out the considerations for using each particular technique and the situations in which some might be more effective than others. The chapters are mutually reinforcing but you may find yourself skipping around if you already know what you want to get out of web scraping. This should not be your first python book.
    One person found this helpful
  • Reviewed in the United States on July 23, 2015
    Last year I searched the internet for information about web scraping, and I was surprised that I could not find a book about it. Recently, I needed to do some web scraping tasks, so I searched again, found this book on Amazon, and bought it.
    I found this book very useful. I copied some examples from this book and then modified them for my work.
    It was really helpful.
    The author explained every detail of the web scraping domain. I could feel that she wrote this book with heart.
    Throughout, the book uses the Python library BeautifulSoup to do web scraping.
    3 people found this helpful

Top reviews from other countries

  • Amazon Customer
    5.0 out of 5 stars Must read.
    Reviewed in India on June 4, 2017
    Author has an excellent knowledge of the subject. Every chapter is well presented.
  • Yehonatan
    5.0 out of 5 stars Amazing Book. You'll never regret buying it.
    Reviewed in Germany on December 21, 2015
    Really amazing book. This book helps you get started with web scraping super fast, and it gives you a lot of web scraping skills that give you the power to explore new ways and methods for scraping. Because web scraping is such a big area, one book can't cover it all, and that makes a book on the subject super hard to write, but this book does it. It teaches you the things you need to get started with web scraping and to improve far better than you would on your own.
  • Amazon Customer
    5.0 out of 5 stars A great book for beginners in webscraping
    Reviewed in the United Kingdom on April 25, 2016
    A great book for beginners in webscraping!
    This book is a great first stepping stone into getting started with web scraping.
    The book does require you to know Python at a moderate level in order to learn these concepts efficiently. I would recommend this book to anyone who wants to learn the core concepts of web scraping.
  • Herve
    5.0 out of 5 stars Web Scraping with Python by Ryan Mitchell
    Reviewed in France on December 1, 2016
    Excellent introduction to web scraping with Python.
    The essential tools are all explored at a surface level (beautifulsoup, mySQL, selenium, pil, ...), but very useful links are cited so you can go further.
    The examples are clear, well documented, in Python 3.x but very easily adaptable to 2.7.
    So a book to recommend +++ to anyone looking for a solid introduction to the subject.
  • Amazon Customer
    3.0 out of 5 stars Three Stars
    Reviewed in Canada on November 29, 2016
    was reading through this - real insightful