
    Web Scraping Using Python: A Step-By-Step Tutorial Guide (2025)

    Category
    Services
    Publish Date
    July 08, 2025
    Author
    Scraping Intelligence

    Web scraping is the process of extracting massive volumes of data from websites. Python is one of the simplest programming languages and comes with many powerful web scraping libraries, making it a perfect fit for this task. Python is widely used for web scraping due to its ease of use, readability, and vast ecosystem of tools. With just a few lines of code, you can develop highly efficient web scrapers. Whether you are a beginner exploring data science or a professional seeking to automate research, Python web scraping is a flexible and scalable solution.

    This is a step-by-step tutorial where you'll learn how to build a basic Python scraper. Once developed, the scraper will navigate through each page of a website and extract data into CSV — one of the most common formats for structured data storage and exchange. After reading this tutorial, you'll understand the best Python libraries for web scraping and how to use them effectively. Walk through this guide on Web Scraping Using Python and you will be able to build your own Python-based data extraction script with ease.

    Requirements to Build a Python Web Scraper

    • Python 3.4+: It offers useful features such as improved security and better library support, making it more effective and future-proof than older versions.
    • pip: If you are using Python version 3.4 or later, you don't need to install pip manually, as it comes pre-installed; you can confirm this with the command shown below.
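
    To confirm that pip is available, run the command below in a terminal (use python3 instead of python on Linux and macOS); it should print the pip version along with the Python version it belongs to:

    python -m pip --version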

    Windows

    On Windows, you need to download and install the latest version of Python. When installing Python, make sure to check the 'Add python.exe to PATH' option. This allows Windows to locate and execute both the python and pip commands from the terminal.

    Linux

    Python is usually preinstalled on Linux; however, it may not be the newest release. Therefore, you may need to run a command to install or update Python. On Debian-based distributions, run:

    sudo apt-get install python3

    After this, verify whether Python was installed successfully. To do so, use the following command:

    python3 --version

    If Python is installed, the command will print a message like 'Python 3.xx', where 3.xx denotes the version of Python installed on your PC.

    If the version of Python is 3.11.0 then you should see:

    Python 3.11.0

    macOS

    To install Python on macOS, you have to download it from the official website. Then, open the downloaded .pkg file to launch the Python installer. Follow the installation wizard. Once the installation is complete, Python will be located at /usr/local/bin/python3.

    To verify installation, open terminal and run:

    python3 --version

    You should see something like Python 3.xx.x.

    Best Python Libraries for Web Scraping in 2025

    A Python web scraping library is a tool that lets you extract data from websites with minimal hassle. While it is possible to extract data in any language without frameworks or libraries, it is not an ideal approach: you would have to handle HTTP requests, parsing, and other complexities manually, which makes the code brittle, slow, and difficult to maintain. Therefore, it's better to use third-party web scraping libraries such as Beautiful Soup and Selenium. Let's learn how to build a scraper using them!

    Beautiful Soup

    The Beautiful Soup Python library helps you scrape information from web pages effortlessly. It performs HTML parsing in Python and can handle both HTML and XML documents. You can also search, navigate, and modify the parsed data through the parser. It is a versatile solution that can save you time.

    To install the Beautiful Soup library for web scraping with Python, run the command below:

    pip install beautifulsoup4
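
    Before moving on, here is a minimal, self-contained sketch of how Beautiful Soup parses an HTML snippet; the HTML string and variable names below are purely illustrative:

    from bs4 import BeautifulSoup

    # parse a small, hard-coded HTML snippet (illustrative only)
    html = '<html><body><h1>Hello</h1><p class="intro">Welcome!</p></body></html>'
    soup = BeautifulSoup(html, 'html.parser')

    # navigate and search the parsed tree
    print(soup.h1.text)                          # Hello
    print(soup.find('p', class_='intro').text)   # Welcome!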

    Selenium

    Selenium is a powerful open-source library that can extract data from almost any website. It is well suited to imitating human browsing behavior, making it ideal for extracting data from websites that rely on AJAX or JavaScript to render content. Selenium can even power a web scraper on its own, without any other Python scraping libraries.

    Run the following command to install the Selenium library:

    pip install selenium
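
    For completeness, here is a minimal sketch of how Selenium could load a JavaScript-rendered page in headless Chrome. It assumes Selenium 4+ (which downloads a matching driver automatically) and a local Chrome installation; the /js/ version of the demo site is used here because it renders its quotes with JavaScript:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # run Chrome without a visible window
    options = Options()
    options.add_argument('--headless=new')
    driver = webdriver.Chrome(options=options)

    # load the page and wait until the JavaScript-rendered quotes appear
    driver.get('https://quotes.toscrape.com/js/')
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '.quote'))
    )

    # read the rendered quote texts
    quotes = driver.find_elements(By.CSS_SELECTOR, '.quote .text')
    print([quote.text for quote in quotes])

    driver.quit()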

    Building a Web Scraper with Python

    Now, let's see how to build your own web scraper in Python.

    We will use the Quotes to Scrape website, which contains a paginated list of quotes, and extract each quote's text, author, and list of tags. The extracted data will then be exported to a CSV file.

    Are you ready now? Let's go and learn web scraping in Python step by step:

    Step 1: Choose the Appropriate Python Libraries for Extracting Data from Web

    Before we proceed with this beginner-friendly Python web scraping example, you need to identify which Python scraping libraries are best suited to the task. First, visit the target website in your browser. Right-click anywhere on the page and select 'Inspect' from the menu, or press F12. The browser DevTools panel will open. Navigate to the Network tab and reload the page. You will notice that the target website does not make any Fetch/XHR requests.

    This indicates that the 'Quotes to Scrape' site does not rely on JavaScript to retrieve data dynamically. The pages returned by the server already contain the data of interest. This is because it is a static content website.

    Because the target site does not use JavaScript to render the page or retrieve data, you do not need Selenium to scrape it. You could still use it, but it would consume extra time and resources, so it is good practice to avoid Selenium in such situations and use Beautiful Soup with Requests instead.

    After understanding the Python libraries needed, we can move on to the next step: building a simple scraper with Beautiful Soup.

    Step 2: Initialize a Python Project

    First, you need to set up your Python web scraping project. Technically, all you need is a .py file. However, you can also use tools such as Google Colab, Jupyter Notebook/JupyterLab, or Replit, which are ideal for quickly writing and testing Python code without additional setup. In this guide, you'll learn how to set up a Python project using the PyCharm IDE.

    Now, open PyCharm and select "File > New Project." A new project popup will appear. In this window, choose "Pure Python" and create a new project. Suppose we name the project "python-web-scraper." Now, click "Create," and you will have access to a blank Python project. By default, PyCharm automatically initializes a main.py file. For better clarity, you can rename it to scraper.py.

    You'll notice that PyCharm automatically initializes the Python file with a few lines of starter code. You'll need to delete these lines to start coding from scratch.

    Next, to install the project dependencies, run the following command in your terminal to install Requests and BeautifulSoup:

    pip install requests beautifulsoup4

    This command will install both libraries simultaneously. Wait for the installation process to complete. You are now ready to use Requests and Beautiful Soup to create your web crawler and scraper in Python. At this point, make sure to import the two libraries by adding the following lines at the beginning of the scraper.py script:

    import requests
    from bs4 import BeautifulSoup

    PyCharm IDE will display the above two lines in grey because the libraries haven't been used in the code yet. If the code is underlined in red, it indicates that something went wrong with the installation. In this case, try reinstalling the libraries. Now, you are all set to start writing the Python web scraping logic.

    Step 3: Connect to the Target URL

    The first thing a web scraper must do is connect to the target site. Grab the complete URL of the target page from your browser. It will look like this:

    https://quotes.toscrape.com
    Use the requests library and enter the following code to download the web page: 
    page = requests.get('https://quotes.toscrape.com')

    The above line assigns the result of the requests.get() method to the variable page. It performs a GET request to the specified URL and returns a Response object containing the server's response to the HTTP request.

    If the request is executed successfully, page.status_code will contain 200. The HTTP 200 OK response code indicates that the HTTP request was executed without any error. If there is an error, the HTTP status code will be 4xx or 5xx. This can happen for a few reasons, but keep in mind that most sites block HTTP requests that do not contain a valid user agent.
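
    As a quick sanity check (a small addition, not part of the original snippet), you can let Requests raise an exception whenever the response code indicates an error:

    # raise an HTTPError if the server returned a 4xx or 5xx status code
    page.raise_for_status()

    print(page.status_code)  # 200 on success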

    Use the following code to set a valid User-Agent header in the request:

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
    }
    page = requests.get('https://quotes.toscrape.com', headers=headers)

    Requests will now execute the HTTP request with the headers passed as a parameter.

    Step 4: Parse HTML Content

    To parse the HTML content returned by the server after the GET request, pass page.text to the BeautifulSoup() constructor as shown below:

    soup = BeautifulSoup(page.text, 'html.parser')

    Here, the second argument defines the parser that Beautiful Soup uses.

    The soup variable now contains a BeautifulSoup object. It represents the tree structure generated by parsing the HTML content from page.text with Python's built-in html.parser.
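
    As a side note, html.parser ships with Python, so no extra installation is required. If the third-party lxml package is installed (pip install lxml), it can be swapped in as a generally faster parser:

    soup = BeautifulSoup(page.text, 'lxml')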

    You can now use it to get the desired HTML element from the web page. Let's see how it can be done.

    Step 5: Select HTML Elements with Beautiful Soup

    Beautiful Soup provides several approaches for selecting elements from the Document Object Model (DOM). The starting points are:

    • find(): This method returns the first HTML element that matches the input selector strategy, if one exists.
    • find_all(): This method returns the list of all HTML elements that match the selector condition passed as a parameter.

    Depending on the parameters passed to the two methods above, they will search for elements on the page in different ways. You can now select HTML elements:

    By tag:

    # get all <h1> elements
    # on the page
    h1_elements = soup.find_all('h1')

    By id:

    # get the element with id="main-title"
    main_title_element = soup.find(id='main-title')

    By text:

    # find the footer element
    # based on the text it contains
    footer_element = soup.find(text='Powered by WordPress')

    By attribute:

    # find the email input element
    # through its "name" attribute
    email_element = soup.find(attrs={'name': 'email'})

    By class:

    # find all the centered elements
    # on the page
    centered_elements = soup.find_all(class_='text-center')
    Combine the above methods to extract the desired HTML elements from the web page. For example:

    # get all "li" elements
    # in the ".navbar" element
    soup.find(class_='navbar').find_all('li')

    For ease of use, Beautiful Soup also provides the select() method, which lets you apply a CSS selector directly:

    # get all "li" elements
    # in the ".navbar" element
    soup.select('.navbar > li')

    You have now learned that the first step is to identify the HTML elements that contain the data you wish to scrape, and then to define a selection strategy for them.

    This can be achieved using your web browser's developer tools. In the Chrome browser, right-click on the HTML tag you are looking for and then select "Inspect." In this case, do it on a quote HTML tag.

    Here you will see that the quote <div> HTML element is identified by the quote class. It contains:

    • The quote text in a <span> HTML tag.
    • The author of the quote in a <small> HTML tag.
    • A list of tags inside a <div> HTML element, each contained in an <a> HTML tag.

    You can now use the following CSS selectors on .quote to extract data:

    • .text
    • .author
    • .tags .tag
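
    As a quick illustration (a sketch, assuming the first quote element has already been selected), these selectors can be applied with Beautiful Soup's select_one() and select() methods:

    # select the first quote element on the page
    quote = soup.select_one('.quote')

    text = quote.select_one('.text').text                    # the quote text
    author = quote.select_one('.author').text                # the author name
    tags = [tag.text for tag in quote.select('.tags .tag')]  # the tag strings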

    Step 6: Extract Data from the Elements

    To begin with, you need a data structure to store the scraped data. To achieve this, initialize an array variable.

    quotes = []

    After this, use soup to extract the quote elements from the DOM using the previously defined .quote CSS selector:

    quote_elements = soup.find_all('div', class_='quote')

    Here, the find_all() method retrieves all <div> HTML elements with the class quote.

    Loop through the list of quote elements and retrieve the quote data, as demonstrated below:

    # loop over the quote elements and extract the data of interest
    for quote_element in quote_elements:
        # extract the text of the quote
        text = quote_element.find('span', class_='text').text
        # extract the author of the quote
        author = quote_element.find('small', class_='author').text

        # extract the tag <a> HTML elements related to the quote
        tag_elements = quote_element.select('.tags .tag')

        # store the list of tag strings in a list
        tags = []
        for tag_element in tag_elements:
            tags.append(tag_element.text)

    In the above code, Beautiful Soup's find() method retrieves a single HTML element of interest. Since there are multiple tag strings associated with each quote, it's best to store them in a list. After this, still inside the loop, you can transform the scraped data into a dictionary and append it to the quotes list as follows:

        quotes.append(
            {
                'text': text,
                'author': author,
                'tags': ', '.join(tags)  # merge the tags into a "A, B, ..., Z" string
            }
        )

    Your target website consists of several other web pages. This is the right time to learn how to crawl the entire site. Let's move on to Step 7 and see how to do that.

    Step 7: Implement the Crawling Logic

    Go to the bottom of the homepage, where you'll see a "Next →" <a> tag that redirects to the next page. This HTML tag appears on all pages except the last one. This typical scenario is common on paginated websites.

    Now, you can easily navigate through the entire website—just follow the link contained in the HTML tag. Let's start from the homepage and see how to move through each page of the target website. Look for the .next <li> tag and extract the relative link to the next page.

    Your crawling logic would be:

    # the URL of the home page of the target website
    base_url = 'https://quotes.toscrape.com'
    
    # retrieve the page and initialize soup...
    
    # get the "Next →" HTML element
    next_li_element = soup.find('li', class_='next')
    
    # if there is a next page to scrape
    while next_li_element is not None:
        next_page_relative_url = next_li_element.find('a', href=True)['href']
    
        # get the new page
        page = requests.get(base_url + next_page_relative_url, headers=headers)
    
        # parse the new page
        soup = BeautifulSoup(page.text, 'html.parser')
    
        # scraping logic...
    
        # look for the "Next →" HTML element in the new page
        next_li_element = soup.find('li', class_='next')

    You can see that the while loop iterates through each page until there are no more pages available. Specifically, it extracts the relative URL of the next page and uses it to construct the full URL to scrape. It then downloads the next page, scrapes the content, and repeats the process.
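
    As a design note (not part of the original logic), when crawling many pages it is generally considered polite to pause briefly between requests so the target server is not overloaded. A minimal sketch using the standard time module, placed inside the crawling loop, could look like this:

    import time

    # wait one second before fetching the next page (illustrative delay)
    time.sleep(1)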

    You are all done scraping the entire website. The only part that remains is learning how to export the extracted data to a CSV file.

    Step 8: Extract the Scraped Data to a CSV File

    Now, let's go over the process of exporting the scraped quote data, which is stored as a list of dictionaries, into a CSV file.

    import csv
    # scraping logic...
    # open the "quotes.csv" file for writing, creating it
    # if not present
    with open('quotes.csv', 'w', encoding='utf-8', newline='') as csv_file:
        # initialize the writer object to insert data
        # in the CSV file
        writer = csv.writer(csv_file)

        # write the header of the CSV file
        writer.writerow(['Text', 'Author', 'Tags'])

        # write each row of the CSV
        for quote in quotes:
            writer.writerow(quote.values())

    # the "with" statement automatically closes the file
    # and releases the resources when the block ends

    This code snippet writes the quote data contained in the list of dictionaries to a quotes.csv file. Keep in mind that the csv module is part of the Python standard library, so you can import it without installing any additional dependency.

    You simply need to create a CSV file using the open() function. After that, you can populate it using the writerow() method of the csv library's writer object. Here, each quote dictionary is written as a separate row in the CSV file.
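
    As an alternative sketch, since each quote is already stored as a dictionary, the csv.DictWriter class could be used instead of csv.writer; the field names below match the dictionary keys built earlier:

    import csv

    with open('quotes.csv', 'w', encoding='utf-8', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=['text', 'author', 'tags'])

        # write the header row, then one row per quote dictionary
        writer.writeheader()
        writer.writerows(quotes)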

    Now that the data extraction from the website is complete, we will see the entire Python data scraper in Step 9.

    Step 9: Put It All Together

    The final Python data scraping script looks like this:

    import csv
    import requests
    from bs4 import BeautifulSoup
    
    def scrape_page(soup, quotes):
        # retrieving all the quote HTML elements on the page
        quote_elements = soup.find_all('div', class_='quote')
    
        # iterating over the list of quote elements
        # to extract the data of interest and store it
        # in quotes
        for quote_element in quote_elements:
            # extracting the text of the quote
            text = quote_element.find(
                'span',
                class_='text'
            ).text
            # extracting the author of the quote
            author = quote_element.find(
                'small',
                class_='author'
            ).text
    
            # extracting the tag HTML elements related to the quote
            tag_elements = quote_element.find(
                'div', class_='tags'
            ).find_all('a', class_='tag')
    
            # storing the list of tag strings in a list
            tags = []
            for tag_element in tag_elements:
                tags.append(tag_element.text)
    
            # appending a dictionary containing the quote data
            # in a new format in the quote list
            quotes.append(
                {
                    'text': text,
                    'author': author,
                    # merging the tags into a "A, B, ..., Z" string
                    'tags': ', '.join(tags)
                }
            )
    
    
    # the url of the home page of the target website
    base_url = 'https://quotes.toscrape.com'
    
    # defining the User-Agent header to use in the GET request below
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/107.0.0.0 Safari/537.36'
    }
    
    # retrieving the target web page
    page = requests.get(base_url, headers=headers)
    
    # parsing the target web page with Beautiful Soup
    soup = BeautifulSoup(page.text, 'html.parser')
    
    # initializing the variable that will contain
    # the list of all quote data
    quotes = []
    
    # scraping the home page
    scrape_page(soup, quotes)
    
    # getting the "Next →" HTML element
    next_li_element = soup.find('li', class_='next')
    
    # if there is a next page to scrape
    while next_li_element is not None:
        next_page_relative_url = next_li_element.find('a', href=True)['href']
    
        # getting the new page
        page = requests.get(base_url + next_page_relative_url, headers=headers)
    
        # parsing the new page
        soup = BeautifulSoup(page.text, 'html.parser')
    
        # scraping the new page
        scrape_page(soup, quotes)
    
        # looking for the "Next →" HTML element in the new page
        next_li_element = soup.find('li', class_='next')
    
    # Open the "quotes.csv" file and create it
    # if not present
    csv_file = open('quotes.csv', 'w', encoding='utf-8', newline='')
    
    # initializing the writer object to insert data
    # in the CSV file
    writer = csv.writer(csv_file)
    
    # writing the header of the CSV file
    writer.writerow(['Text', 'Author', 'Tags'])
    
    # writing each row of the CSV
    for quote in quotes:
        writer.writerow(quote.values())
    
    # terminating the operation and releasing the resources
    csv_file.close()

    As you can see, you can build a complete web data extractor in fewer than 100 lines of code.

    This Python script can extract all the data and export it to a CSV file.

    Step 10: Run the Python Web Scraping Script

    Now, in the PyCharm IDE, run the script by clicking the "Run" button. After that, wait for the entire process to complete. Congratulations! You can now access the .csv file and open it to view the data it contains.
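
    If you prefer the terminal over the IDE, you can also run the script directly (assuming the file is named scraper.py, as in Step 2):

    python scraper.py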

    Python Scraping Use Cases

    The most common use cases of Python web scraping are covered below.

    How Ethical Python Web Scraping Use Cases Apply to AI

    Python web scraping provides the large and varied datasets required for training and enhancing AI models. The common use cases include:

    • Chatbot Training: Scraping customer support conversations and FAQs to build smart virtual assistants.
    • Predictive Analytics: Collecting historical and real-time data to train forecasting models.
    • Sentiment Analysis: Scraping social media content or reviews to train NLP models.
    • Computer Vision: Gathering image data for AI image classification.
    • Recommendation Systems: Gathering product information and user behavior data for AI-based personalization.

    How Python Web Scraping Creates Impactful Results

    Python web scraping delivers impactful results by automating the extraction of important data from websites. It provides real-time, structured data that can reveal market trends, monitor customer sentiment, and power analytics systems or AI models. By transforming scattered data into actionable insights, it enables businesses to make faster, smarter, and data-driven decisions.

    Conclusion

    In this step-by-step web scraping tutorial with Python, you learned how to scrape the web using the Python programming language and how to get started with one of the most effective data extraction techniques using libraries like BeautifulSoup and Selenium. You also explored the steps to build your own web scraper in Python with just a few lines of code. Finally, you understood various Python use cases, how Python web scraping applies to AI, and how it creates impactful results.

    The question that comes to mind is: Is web scraping legal in 2025? Yes, it is legal when used to collect publicly available and non-sensitive data without violating a site's terms and conditions. However, extracting protected content, copyrighted material, or personal data can raise legal concerns. Always check a website's terms of service before scraping data from it.
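
    As a practical aid (a small sketch using only the Python standard library), you can also check a site's robots.txt rules programmatically before scraping; the URL below points to the tutorial's demo site:

    from urllib.robotparser import RobotFileParser

    # download and parse the site's robots.txt rules
    robots = RobotFileParser()
    robots.set_url('https://quotes.toscrape.com/robots.txt')
    robots.read()

    # check whether a generic crawler may fetch the home page
    print(robots.can_fetch('*', 'https://quotes.toscrape.com/'))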

    Scraping Intelligence provides a seamless, efficient e-commerce web scraping service using Python, tailored to your business needs. We offer a simple and effective process to help make your business successful. Have questions or need a custom solution? Reach out to our team—we're here to help!

