How to do web scraping using Python?

How to scrape web pages in Python?

Scraping is a very interesting topic, but a serious one as well. Scraping a website without the owner's permission can be illegal, so if you do it, the responsibility is entirely yours. I'm going to show you how to scrape a website and store the results in a CSV file. You can save them to a database instead; the choice is yours. There are various ways to scrape, but here I use the BeautifulSoup library.
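One common way to find out what a site owner allows is the site's robots.txt file. Here is a minimal sketch using the standard library's urllib.robotparser. The robots.txt content, the example.com URLs and the "my-scraper" user agent are all made up for illustration, and the rules are parsed from a string so the snippet runs without touching any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
user_agent = "my-scraper"  # hypothetical user agent name

# Ask whether this user agent may fetch each (made-up) URL.
blog_allowed = parser.can_fetch(user_agent, "https://example.com/blog?page=1")
admin_allowed = parser.can_fetch(user_agent, "https://example.com/admin/users")

# Crawl-delay, if the site declares one, is a hint for how long to sleep.
delay = parser.crawl_delay(user_agent)

print(blog_allowed, admin_allowed, delay)
```

For a live site you would call parser.set_url(...) with the site's robots.txt address and then parser.read() instead of parse(). Keep in mind that robots.txt only expresses the owner's crawling preferences; the site's terms of use may say more.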

The example below shows the complete flow for scraping every page of a website:

 

from csv import writer
import time

import requests
from bs4 import BeautifulSoup

# Don't scrape this URL; it is only here for reference.
BASE_URL = "https://www.rithmschool.com/blog?page={}"

with open("blog_data.csv", "w", newline="", encoding="utf-8") as csv_file:
    csv_writer = writer(csv_file)
    csv_writer.writerow(["Title", "URL", "Date"])
    page = 1
    while True:
        response = requests.get(BASE_URL.format(page), timeout=10)
        # Stop on an HTTP error instead of parsing an error page.
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        articles = soup.find_all("article")
        print(f"Page {page}: {len(articles)} articles")
        # An empty page means we have gone past the last page.
        if not articles:
            break

        for article in articles:
            a_tag = article.find("a")
            time_tag = article.find("time")
            # Skip articles that lack a link or a date.
            if a_tag is None or time_tag is None:
                continue
            url = a_tag.get("href", "")
            title = a_tag.get_text(strip=True)
            # Named "published" so it does not shadow the datetime module.
            published = time_tag.get("datetime", "")
            csv_writer.writerow([title, url, published])

        page += 1
        # Pause between requests to avoid overloading the server.
        time.sleep(5)
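As mentioned in the introduction, a database is an alternative to the CSV file. Below is a rough sketch of that idea with the standard library's sqlite3 module. The save_articles helper, the articles table and its column names are my own choices for illustration, and the sample rows are made up; in the scraper you would pass in the (title, url, date) values collected from each page.

```python
import sqlite3


def save_articles(conn, rows):
    """Store (title, url, published) rows and return the total row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "title TEXT, url TEXT PRIMARY KEY, published TEXT)"
    )
    # INSERT OR REPLACE keeps one row per URL if a page is scraped twice.
    conn.executemany(
        "INSERT OR REPLACE INTO articles (title, url, published) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]


# Made-up rows standing in for scraped data.
sample_rows = [
    ("First post", "/blog/first-post", "2020-01-01"),
    ("Second post", "/blog/second-post", "2020-02-01"),
]

# An in-memory database keeps the example self-contained;
# pass a file name such as "blog_data.db" to keep the data on disk.
conn = sqlite3.connect(":memory:")
stored = save_articles(conn, sample_rows)
print(stored)
```

Using the URL as the primary key is just one possible design; it means re-running the scraper updates existing rows instead of piling up duplicates.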

 

 

I hope this helps you get started with scraping.