A Comprehensive Guide: How to Scrape Real Estate Data from Zillow Using Python?

October 31, 2023

For those immersed in the real estate sector, a profound understanding of property valuations and market trends holds immense importance. Zillow, a highly popular online hub for real estate information, serves as an extensive source of data encompassing properties, neighborhoods, and market conditions.

Nevertheless, if you aim to conduct an in-depth analysis of this data or to keep a vigilant eye on specific property listings, web scraping emerges as an invaluable tool. In this blog post, we will walk through the process of scraping real estate data from Zillow using Python, empowering you to access and scrutinize this information for your real estate pursuits.

What is Zillow Data Scraping?


Zillow takes a prominent position in the real estate and rental arena, with a dedicated mission to empower individuals by providing them with essential information, inspiration, and insights into their dwellings. Additionally, Zillow facilitates connections with top local experts who can offer invaluable support.

As the most visited real estate platform in the United States, Zillow and its affiliated entities offer users a convenient, on-demand platform for various real estate activities, including selling, buying, renting, and securing financing.

Why Scrape Zillow Data?


Zillow.com boasts an extensive real estate database, encompassing details such as property prices, locations, and contact information. This trove of information proves invaluable for conducting market analysis, researching the housing sector, and gaining insights into competitors.

Hence, the ability to extract Zillow data provides access to the most extensive real estate property dataset in the United States.

Why Scrape Zillow Data Using Python?


Python offers a rich array of web scraping libraries known for their user-friendliness and comprehensive documentation. That is not to diminish the documentation available for other programming languages; rather, it is Python's versatility that sets it apart.

Whether your interest lies in scraping Google search results data or aggregating pricing data for business purposes, Python provides access to a myriad of possibilities.

Furthermore, the Python community stands out for its exceptional support, offering a plethora of forums that can aid you in surmounting any obstacles encountered on your Python journey.

In the realm of web data extraction, Python stands as an excellent starting point. It enables you to gather data efficiently and build confidence, particularly as a beginner. These skills will come in handy when scraping Zillow data.

For those seeking valuable Python forums, consider exploring:

  • PythonAnywhere
  • Stack Overflow
  • Python subreddit
  • SitePoint
  • Python Forum

These resources can prove indispensable as you navigate the world of Python and web scraping.

How To Scrape Real Estate Data from Zillow Using Python?

Now, let us go through the process of creating a Zillow web scraper in Python, step by step.

Library Installation

Let's commence with the library selection process. You have two choices:

  • Combining a library for sending HTTP requests (such as Requests or urllib) with a library for parsing the returned HTML (such as BeautifulSoup or lxml).
  • Opting for a comprehensive web scraping framework or browser-automation library (Scrapy, Selenium, Pyppeteer).

For beginners, the first option is more straightforward, while the second is more powerful and more resistant to blocking. Therefore, let's begin by creating a basic scraper using the Requests and BeautifulSoup libraries for data retrieval and parsing. Subsequently, we'll illustrate a scraper employing Selenium.

To begin, ensure that Python is available on your computer. You can check whether it is already installed by running this command in your terminal:

python -V         

If Python is installed, this will display the version you are using. To install the libraries we need, enter these commands in the same terminal:

pip install requests
pip install beautifulsoup4
pip install selenium        

Keep in mind that for Selenium to work properly, you need a webdriver (such as ChromeDriver) whose version matches your installed Chrome browser.

Analysis of Zillow Page

Now, it's time to analyze the webpage to identify the tags that house the vital data. Head over to the Zillow website, specifically the 'buy' section. For this tutorial, we'll be gathering data related to real estate in Portland.

Now, it's time to inspect the HTML code of the page to identify the elements we intend to extract.

To access the HTML page code, launch the Developer Tools (you can do this by pressing F12 or right-clicking on an empty area of the page and selecting "Inspect").

Let's specify the elements for extraction:

1. Address. The address data can be found within the

 <address data-test="property-card-addr">...</address> tag.

2. Price. The pricing data is located within the

 <span data-test="property-card-price">...</span> tag.

3. Seller or Realtor. The relevant information can be found in the

 <div class="cWiizR">...</div> tag.

For all other property cards, the tags will follow a similar pattern. Now, equipped with the gathered information, let's proceed to develop a web scraper.
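These selectors can be sanity-checked offline against a small hardcoded snippet before touching the live site. The markup below is a simplified stand-in for a real property card, not Zillow's actual page source, and the cWiizR class is a generated styling class that changes over time:

```python
from bs4 import BeautifulSoup

# A simplified, illustrative stand-in for one Zillow property card.
html = """
<div class="property-card">
  <address data-test="property-card-addr">123 Example St, Portland, OR 97201</address>
  <span data-test="property-card-price">$500,000</span>
  <div class="cWiizR">EXAMPLE REALTY</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The same attribute-based lookups the scraper will use on the real page.
addr = soup.find("address", {"data-test": "property-card-addr"}).text
price = soup.find("span", {"data-test": "property-card-price"}).text
seller = soup.find("div", {"class": "cWiizR"}).text

print(addr, price, seller)
```

If this prints the three expected values, the selectors themselves are sound, and any failure against the live site points at blocking or changed markup rather than your parsing code.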

Building a Web Scraper

Begin by creating a Python file with a *.py extension and include the required libraries:

import requests
from bs4 import BeautifulSoup     

Now, initiate a request and store the entire page's HTML code in a variable:

data = requests.get('https://www.zillow.com/portland-or/')    

Next, process the data using the BeautifulSoup (BS4) library:

soup = BeautifulSoup(data.text, "lxml")   

Create the variables address, price, and seller, in which we will store the extracted data using the information collected earlier.

address = soup.find_all('address', {'data-test':'property-card-addr'})
price = soup.find_all('span', {'data-test':'property-card-price'})
seller = soup.find_all('div', {'class':'cWiizR'}) 

However, attempting to display the contents of these variables may result in an error because Zillow might return a CAPTCHA page instead of the desired page code. To circumvent this issue, add headers to the request:

header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
          'referer':'https://www.zillow.com/homes/Missoula,-MT_rb/'}

data = requests.get('https://www.zillow.com/portland-or/', headers=header)
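Even with headers set, Zillow may still serve a challenge page. Before parsing, it can help to check whether the response looks blocked. The helper below is a simple heuristic; the marker strings are illustrative assumptions, since the exact challenge text varies:

```python
def looks_blocked(html, status_code):
    """Heuristic check for an anti-bot challenge page (marker strings are guesses)."""
    markers = ("captcha", "press & hold", "denied")
    lowered = html.lower()
    return status_code != 200 or any(m in lowered for m in markers)

print(looks_blocked("<html>px-captcha</html>", 200))                # True
print(looks_blocked("<address>3142 NE Wasco St</address>", 200))    # False
```

In the scraper, you would call `looks_blocked(data.text, data.status_code)` right after the request and retry (or slow down) instead of parsing a challenge page.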

Now, attempt to display the result on the screen:

print(address)
print(price)
print(seller)

The output from such a script would look like this:


[<address data-test="property-card-addr">3142 NE Wasco St, Portland, OR 97232</address>,
<address data-test="property-card-addr">4801 SW Caldew St, Portland, OR 97219</address>,
<address data-test="property-card-addr">16553 NE Fargo Cir, Portland, OR 97230</address>,
<address data-test="property-card-addr">3064 NW 132nd Ave, Portland, OR 97229</address>,
<address data-test="property-card-addr">3739 SW Pomona St, Portland, OR 97219</address>,
<address data-test="property-card-addr">1440 NW Jenne Ave, Portland, OR 97229</address>,
<address data-test="property-card-addr">3435 SW 11th Ave, Portland, OR 97239</address>,
<address data-test="property-card-addr">8023 N Princeton St, Portland, OR 97203</address>,
<address data-test="property-card-addr">2456 NW Raleigh St, Portland, OR 97210</address>]
[<span data-test="property-card-price">$595,000</span>,
<span data-test="property-card-price">$395,000</span>,
<span data-test="property-card-price">$485,000</span>,
<span data-test="property-card-price">$1,185,000</span>,
<span data-test="property-card-price">$349,900</span>,
<span data-test="property-card-price">$599,900</span>,
<span data-test="property-card-price">$575,000</span>,
<span data-test="property-card-price">$425,000</span>,
<span data-test="property-card-price">$1,195,000</span>]
[<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">CASCADE HASSON SOTHEBY'S
INTERNATIONAL REALTY</div>,
<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">PORTLAND CREATIVE REALTORS</div>,
<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">ORCHARD BROKERAGE, LLC</div>,
<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">ELEETE REAL ESTATE</div>,
<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">URBAN NEST REALTY</div>,
<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">KELLER WILLIAMS REALTY
PROFESSIONALS</div>,
<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">KELLER WILLIAMS PDX CENTRAL</div>,
<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">EXP REALTY, LLC</div>,
<div class="StyledPropertyCardDataArea-c11n-8-85-1__sc-yipmu-0 cWiizR">CASCADE HASSON SOTHEBY'S
INTERNATIONAL REALTY</div>]

Now, let's create additional variables and extract only the property listings' text from the received data:

adr=[]
pr=[]
sl=[]
for result in address:
    adr.append(result.text)
for result in price:
    pr.append(result.text)    
for result in seller:
    sl.append(result.text)
print(adr)
print(pr)
print(sl)

The result:

['16553 NE Fargo Cir, Portland, OR 97230', '3142 NE Wasco St, Portland, OR 97232', '8023 N Princeton St, Portland, OR 97203', '3064 NW 132nd Ave, Portland, OR 97229', '1440 NW Jenne Ave, Portland, OR 97229', '10223 NW Alder Grove Ln, Portland, OR 97229', '5302 SW 53rd Ct, Portland, OR 97221', '3435 SW 11th Ave, Portland, OR 97239', '3739 SW Pomona St, Portland, OR 97219']
['$485,000', '$595,000', '$425,000', '$1,185,000', '$599,900', '$425,000', '$499,000', '$575,000', '$349,900']
['ORCHARD BROKERAGE, LLC', "CASCADE HASSON SOTHEBY'S INTERNATIONAL REALTY", 'EXP REALTY, LLC', 'ELEETE REAL ESTATE', 'KELLER WILLIAMS REALTY PROFESSIONALS', 'ELEETE REAL ESTATE', 'REDFIN', 'KELLER WILLIAMS PDX CENTRAL', 'URBAN NEST REALTY']

Here's the full script code:

import requests
from bs4 import BeautifulSoup

header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
          'referer':'https://www.zillow.com/homes/Missoula,-MT_rb/'}

data = requests.get('https://www.zillow.com/portland-or/', headers=header)
soup = BeautifulSoup(data.text, 'lxml')

address = soup.find_all('address', {'data-test':'property-card-addr'})
price = soup.find_all('span', {'data-test':'property-card-price'})
seller = soup.find_all('div', {'class':'cWiizR'})

adr=[]
pr=[]
sl=[]
for result in address:
    adr.append(result.text)
for result in price:
    pr.append(result.text)
for result in seller:
    sl.append(result.text)

print(adr)
print(pr)
print(sl)

With the data now in a user-friendly format, you can proceed to work with it further.

Data Saving

To avoid manual data entry, let's save the information to a CSV file. Begin by creating a file and defining the column names:

with open("zillow.csv", "w") as f:
    f.write("Address; Price; Seller\n")

The "w" flag signifies that if a file named zillow.csv does not exist, it will be created. If the file already exists, it will be truncated and recreated. To prevent overwriting the content each time the script is executed, you can use "a" (append) mode instead.
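The difference between the two modes can be demonstrated with a short standard-library snippet (the filename demo.csv is arbitrary):

```python
import os

path = "demo.csv"

# "w" truncates: a second open in "w" mode discards the first write.
with open(path, "w") as f:
    f.write("first\n")
with open(path, "w") as f:
    f.write("second\n")
after_w = open(path).read()

# "a" appends: existing content is preserved and new lines are added.
with open(path, "a") as f:
    f.write("third\n")
after_a = open(path).read()

os.remove(path)  # clean up the demo file

print(repr(after_w))  # 'second\n'
print(repr(after_a))  # 'second\nthird\n'
```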

Next, write the collected elements into the file row by row:

with open("zillow.csv", "a") as f:
    for i in range(len(adr)):
        f.write(str(adr[i])+"; "+str(pr[i])+"; "+str(sl[i])+"\n")
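A note on the delimiter: semicolons avoid clashing with the commas inside addresses and realtor names, but Python's standard csv module handles quoting automatically, which is the more robust option. A drop-in alternative for the saving step might look like this (the adr, pr, and sl lists are those built above; sample values stand in here for illustration):

```python
import csv

# Sample data standing in for the lists built by the scraper above.
adr = ["3142 NE Wasco St, Portland, OR 97232"]
pr = ["$595,000"]
sl = ["ORCHARD BROKERAGE, LLC"]

with open("zillow.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Address", "Price", "Seller"])
    # zip pairs the lists up positionally and stops at the shortest one,
    # so a missing seller won't raise an IndexError.
    writer.writerows(zip(adr, pr, sl))

print(open("zillow.csv").read())
```

Fields containing commas, such as "ORCHARD BROKERAGE, LLC", are quoted automatically, so spreadsheet software reads the file correctly without any manual escaping.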

Putting it all together, here is the final script, including the CSV export:

import requests
from bs4 import BeautifulSoup

header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
          'referer':'https://www.zillow.com/homes/Missoula,-MT_rb/'}

data = requests.get('https://www.zillow.com/portland-or/', headers=header)
soup = BeautifulSoup(data.text, 'lxml')

address = soup.find_all('address', {'data-test':'property-card-addr'})
price = soup.find_all('span', {'data-test':'property-card-price'})
seller = soup.find_all('div', {'class':'cWiizR'})

adr=[]
pr=[]
sl=[]
for result in address:
    adr.append(result.text)
for result in price:
    pr.append(result.text)
for result in seller:
    sl.append(result.text)

with open("zillow.csv", "w") as f:
    f.write("Address; Price; Seller\n")

with open("zillow.csv", "a") as f:
    for i in range(len(adr)):
        f.write(str(adr[i])+"; "+str(pr[i])+"; "+str(sl[i])+"\n")

In this way, you've successfully created a simple Zillow scraper in Python.
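As promised earlier, the same scraper can also be built on Selenium, which renders the page in a real Chrome browser and is therefore less likely to be served a CAPTCHA. The sketch below assumes Chrome is installed (recent Selenium versions fetch a matching ChromeDriver automatically via Selenium Manager); the parsing is factored out so it also works on any saved HTML:

```python
from bs4 import BeautifulSoup

def extract_cards(html):
    """Parse rendered page HTML into (address, price, seller) rows."""
    soup = BeautifulSoup(html, "html.parser")
    addresses = [a.text for a in soup.find_all("address", {"data-test": "property-card-addr"})]
    prices = [s.text for s in soup.find_all("span", {"data-test": "property-card-price"})]
    sellers = [d.text for d in soup.find_all("div", {"class": "cWiizR"})]
    return list(zip(addresses, prices, sellers))

def fetch_rendered_page(url):
    # Selenium is imported here so extract_cards() is usable without it installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()

# Live usage: extract_cards(fetch_rendered_page("https://www.zillow.com/portland-or/"))
# Offline demonstration on a minimal stand-in snippet:
sample = ('<address data-test="property-card-addr">1 Test Ave</address>'
          '<span data-test="property-card-price">$1</span>'
          '<div class="cWiizR">DEMO</div>')
print(extract_cards(sample))  # [('1 Test Ave', '$1', 'DEMO')]
```

Because the browser executes Zillow's JavaScript, this variant also picks up cards that are rendered client-side, at the cost of being noticeably slower than plain Requests.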

Conclusion

Utilizing Python to scrape Zillow data offers significant potential for acquiring valuable insights into the real estate market. Here, firms like Scraping Intelligence come in handy. This approach proves highly effective for gathering information regarding property listings, pricing, neighborhood details, and more. By constructing well-defined requests and employing Python libraries like Beautiful Soup or Selenium, individuals can harness the Zillow website's resources to access and analyze data pertinent to their local or regional real estate market trends.

Nonetheless, it's essential to bear in mind that the site's structure or class names may change over time. Therefore, before employing our provided examples, it is advisable to verify the current accuracy and relevancy of the data.

If creating a scraper from scratch still presents a significant challenge, consider exploring a no-code scraper. A no-code scraper is a relatively swift and uncomplicated solution that doesn't require any prior coding experience or technical knowledge.

