Required Software
  • Software to web scrape in JavaScript
  • (optional) Note about deprecation of Request/Request-Promise
What you should ALWAYS check before even writing a web scraper!
  • This could save you A LOT of time and effort!
Intro to CSS selectors and tools we use for scraping
  • Intro to section
  • Using Chrome Developer Tools
  • Selecting our element
  • Building our first scraper!
  • Selecting multiple elements
  • Selecting using CSS ID
  • Selecting using CSS classes
  • Selecting using HTML attributes
  • You're on your way to becoming a scraping ninja!
Scraping HTML tables with Request/Cheerio
  • Intro to section
  • Structure of an HTML table
  • Data Structure in JavaScript
  • Creating selector in Chrome Tools
  • Scraping all table cells in Chrome Tools
  • Scraping data in Node.js with Cheerio/Request
  • Scraping Company Names in Node.js
  • Scraping all table columns
  • BONUS - dynamic table headers when scraping tables
Scraping software jobs on Craigslist using Puppeteer
  • Intro to project
  • Why are we using Puppeteer instead of Node.js Request?
  • Initialising project
  • Opening a URL with Puppeteer
  • What data are we scraping?
  • Data Structure
  • Job Title CSS Selector
  • Scraping job title using Cheerio
  • Scraping description url
  • Creating array of scraping objects
  • Scraping job post date
  • Scraping Neighborhood data
  • Scraping List of Pages with Puppeteer
  • Limiting Scraping Requests per Second
  • Scraping job descriptions from different pages
  • Scraping compensation from job listings
  • mLab is now MongoDB Atlas
  • Setting up MongoDB database with mLab
  • Connecting to MongoDB database with Mongoose
  • Creating Listing Mongoose schema
  • Saving listing data to MongoDB
Web Scraping Craigslist Jobs using Node.js Request
  • Introduction
  • Project Setup
  • Getting HTML from website
  • Creating sample of data to collect
  • Title/URL From Jobs
  • Scraping Time Job Was Posted
  • Job Neighborhood
  • Scraping Job Descriptions
  • Finish Description and Compensation
  • Outro
What to do if you're blocked?
  • Help! I'm blocked!
  • What can you do if you're blocked?
  • Scraping APIs
  • Using a proxy in Request
Building a web scraper the TDD way
  • Initializing project and adding packages
  • Creating tests folder and setting up test script
  • Writing our first simple test
  • Making our first simple test pass!
  • Getting HTML from the website for our tests
  • Reading HTML file for our tests
  • Writing out our tests
  • Getting title test to pass
  • Making URL test pass!
  • Making hood test pass!
  • Making the final test for datePosted pass!
  • End notes + refactoring
Exporting web scraping results to CSV
  • Exporting web scraping results to CSV
Handling Network Problems
  • Handling Network Problems in our Craigslist scraper
Robots.txt parsing
  • What is robots.txt?
  • Initialising project
  • Example usage of robots-parser
  • Parsing robots.txt from a real site
Scraping Sites with Pagination
  • Simple Pagination Scraper in 10 mins!
Scraping Sites with Authentication
  • Intro to authentication scraping project
  • Looking at Login request
  • Recreating login in Postman
  • Creating our login request in Node.js
  • Using Puppeteer instead of Request
Scraping a website with Cookie/Session authentication and CSRF tokens
  • Intro to project
  • Replicating login request inside Postman - seeing how cookies are required
  • Building out our request inside Node.js and enabling cookieJar
  • Getting CSRF token from saved cookies and using it in our POST login request
Scraping Nordstrom.com - how to find a secret API and avoid building a scraper!
  • Intro to Nordstrom.com project