
    How to Extract Reddit Posts, Subreddits, and Profiles Effectively

    Category
    Social media
    Publish Date
    Sep 05, 2025
    Author
    Scraping Intelligence

    Reddit is a forum-style social media platform used for content discovery, making friends, career advice, news aggregation, and more. The platform holds a treasure trove of valuable data, from niche interests to industry-specific insights, which makes it especially useful for researchers, analysts, and marketers: they can extract Reddit data to gain competitive insights and make their businesses more sustainable. In this comprehensive blog post, you will learn how to extract data from Reddit.

    Why Scrape Reddit Data?

    Reddit hosts a diverse range of interest-based communities and subjects. Scraped Reddit data can be used for:

    Content Creation

    Reddit is packed with ideas and inspiration for creating content, a hidden gem that content writers can tap by scraping data. It lets them find trending topics and discussions and turn them into concise, engaging content.

    Sentiment Analysis

    Reddit is one of the social media platforms where people share their opinions and emotions on numerous topics. You can scrape subreddits and run sentiment analysis to learn whether the feelings about your services, products, or brand are positive, negative, or neutral.

    Market Research

    Scraping Reddit data provides comprehensive insights into customer preferences and needs, giving you a clear picture of the current market scenario. It can also support competitive analysis through the extraction of competitors’ data.

    What Data Can You Scrape From Reddit?

    You can extract numerous types of data from Reddit, including:

    • Post titles and content
    • Subreddit names and topics
    • Number of upvotes and downvotes
    • Usernames, profiles, karma scores, etc.
    • Comments and replies
    • Images, videos, and other media files
    • Creation time of posts and comments

    Various Ways to Extract Reddit Data

    Using Reddit’s API

    Reddit has a robust API for developers that lets them pull the data they need programmatically. Reddit’s API is one of the most reliable and safe ways to access its data; you register an app and authenticate via OAuth.

    Third-Party Tools

    The Python Reddit API Wrapper (PRAW) is a Python library that wraps Reddit’s API, letting you scrape data from it easily. It can also handle a large volume of queries.

    Web Scraping

    Web scraping suits those who need customized data collection. With web scraping, gathering comments, subreddits, posts, and profiles becomes easy. However, it is good practice to follow Reddit’s terms and conditions.

    Data Providers

    Organizations that don’t wish to build their own infrastructure from scratch can use the services of professional data providers, who offer ready-to-use Reddit data scrapers.

    Steps to Extract Reddit Data

    There are many ways to extract Reddit posts, subreddits, and profiles, but we will do it with Python: a widely used programming language and a popular choice for web data scraping thanks to its many useful libraries. We will use PRAW to scrape Reddit data.

    First of all, you have to install PRAW. This can be done by running the command below on the command prompt:
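
    pip install praw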

    Creating a Reddit App

    • Step 1: Reddit data can be extracted very easily by creating a Reddit app. Type https://www.reddit.com/prefs/apps in the browser and press enter.
    • Step 2: Click on the "are you a developer? create an app..." button.
    • Step 3: A form will appear on your screen. Enter your desired name and description.
    • Step 4: Now, enter http://localhost:8080 as the redirect URI.
    • Step 5: Click on the "create app" button.

    This will create a Reddit app. Now, use the combination of Python and PRAW to pull data from Reddit. Note down the client_id, client_secret, and user_agent values; these will help us connect to Reddit.

    Creating a PRAW Instance

    Now, it’s time to create a PRAW instance and connect it to Reddit. Let’s look at the two PRAW instance types in detail.

    There are mainly two types of PRAW instances: 1) the Read-only Instance and 2) the Authorized Instance.

    • Read-only Instance: An instance for scraping publicly available Reddit data.
    • Authorized Instance: An instance that can also act on your Reddit account (posting, voting, and so on) in addition to reading data.

    Write the following code in a Python editor:

    import praw

    # read-only instance: enough for scraping public data
    reddit_read_only = praw.Reddit(client_id="",       # your app's client id
                                   client_secret="",   # your app's client secret
                                   user_agent="")      # a descriptive user agent

    # authorized instance: can also act on your account
    reddit_authorized = praw.Reddit(client_id="",      # your app's client id
                                    client_secret="",  # your app's client secret
                                    user_agent="",     # a descriptive user agent
                                    username="",       # your Reddit username
                                    password="")       # your Reddit password
    

    Great! We have now created a PRAW instance. The next step is to use Reddit’s API to extract data. In this blog, we will work with the read-only instance.

    Scraping Reddit Subreddits

    There are many ways to extract Reddit subreddits; the approach we use here can list a subreddit’s posts by category: new, top, controversial, hot, and more.

    Write the following code:

    import praw
    import pandas as pd
    
    reddit_read_only = praw.Reddit(client_id="",       # your app's client id
                                   client_secret="",   # your app's client secret
                                   user_agent="")      # a descriptive user agent

    # pick a subreddit and print its basic details
    subreddit = reddit_read_only.subreddit("redditdev")

    print("Display Name:", subreddit.display_name)
    print("Title:", subreddit.title)
    print("Description:", subreddit.description)
    
    Now, we will extract 3 hot posts from the r/Python subreddit:

    subreddit = reddit_read_only.subreddit("Python")
    
    for post in subreddit.hot(limit=3):
        print(post.title)
        print()
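
    The other listing methods work the same way. A quick sketch (the limits and time filter here are arbitrary):

    for post in subreddit.new(limit=3):  # newest posts
        print(post.title)

    for post in subreddit.controversial(time_filter="week", limit=3):  # most controversial this week
        print(post.title)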
    
    Once that works, we will save the subreddit’s top posts of the month in a pandas DataFrame:
    
    posts = subreddit.top(time_filter="month")  # top posts of the past month
    
    posts_dict = {"Title": [], "Post Text": [],
                  "ID": [], "Score": [],
                  "Total Comments": [], "Post URL": []
                  }
    
    for post in posts:
        posts_dict["Title"].append(post.title)
        posts_dict["Post Text"].append(post.selftext)
        posts_dict["ID"].append(post.id)
        posts_dict["Score"].append(post.score)
        posts_dict["Total Comments"].append(post.num_comments)
        posts_dict["Post URL"].append(post.url)
    
    top_posts = pd.DataFrame(posts_dict)
    top_posts
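
    If you want to persist the results, pandas can write the DataFrame straight to disk. A minimal sketch (the filename is just an example):

    top_posts.to_csv("top_posts.csv", index=False)  # save the scraped posts to a CSV file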
    

    Scraping Reddit Posts

    To extract the needed data from a Reddit post, you will need the post URL. After getting the URL, create a submission object as shown in the following code snippet.

    import praw
    import pandas as pd
    
    reddit_read_only = praw.Reddit(client_id="",       # your app's client id
                                   client_secret="",   # your app's client secret
                                   user_agent="")      # a descriptive user agent

    url = "https://www.reddit.com/r/IAmA/comments/m8n4vt/im_bill_gates_cochair_of_the_bill_and_melinda/"
    
    submission = reddit_read_only.submission(url=url)
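
    Alternatively, if you already know the post's ID (the m8n4vt fragment in the URL above), you can pass it directly:

    submission = reddit_read_only.submission(id="m8n4vt")  # same post, fetched by ID instead of URL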
    

    Here we will extract the comments from the chosen post. To do so, we use the MoreComments object available in the PRAW module. We iterate over the submission’s comments with a for loop and append each comment body to the post_comments list. Some entries in the comment tree are not comments at all but MoreComments objects, placeholders for collapsed "load more comments" links, so an if statement inside the loop skips them. Finally, we convert the extracted list into a pandas DataFrame.

    from praw.models import MoreComments
    
    post_comments = []
    
    for comment in submission.comments:
        # skip MoreComments placeholders; we only keep fully loaded comments
        if isinstance(comment, MoreComments):
            continue
        post_comments.append(comment.body)
    
    comments_df = pd.DataFrame(post_comments, columns=['comment'])
    comments_df
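
    If you would rather expand those collapsed comments instead of skipping them, PRAW's replace_more() can resolve them first; note that on large threads this costs one extra API request per MoreComments object:

    # expand every MoreComments placeholder, then flatten the comment tree
    submission.comments.replace_more(limit=None)
    all_comments = [comment.body for comment in submission.comments.list()]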
    

    Scraping Reddit Profile Posts

    In this section, we will see how Reddit profile pages can be scraped. To extract posts from profile pages, we will use old.reddit.com, as it has a simpler structure and fewer ads, which makes the profile scraping task easier.

    We will use the following URL as an example:

    https://old.reddit.com/user/scraping_intelligence/submitted?count=25&after=t3_191n6zm

    The count parameter in the above URL controls how many results the HTML page renders, while the after parameter is the pagination cursor: it is simply the ID of the post to start after.

    Let’s integrate our logic into Python code:

    import json
    import asyncio
    from typing import List, Dict, Literal, Optional
    from datetime import datetime
    from httpx import AsyncClient, Response
    from loguru import logger as log
    from parsel import Selector

    # a browser-like User-Agent helps avoid basic blocking on old.reddit
    client = AsyncClient(
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
        follow_redirects=True,
    )
    
    def parse_user_posts(response: Response) -> Dict:
        """parse post data from a user profile page"""
        selector = Selector(response.text)
        data = []
        for box in selector.xpath("//div[@id='siteTable']/div[contains(@class, 'thing')]"):
            author = box.xpath("./@data-author").get()
            link = box.xpath("./@data-permalink").get()
            publishing_date = box.xpath("./@data-timestamp").get()
            publishing_date = datetime.fromtimestamp(int(publishing_date) / 1000.0).strftime('%Y-%m-%dT%H:%M:%S.%f%z') if publishing_date else None
            comment_count = box.xpath("./@data-comments-count").get()
            post_score = box.xpath("./@data-score").get() 
            data.append({
                "authorId": box.xpath("./@data-author-fullname").get(),
                "author": author,
                "authorProfile": "https://www.reddit.com/user/" + author if author else None,
                "postId": box.xpath("./@data-fullname").get(),
                "postLink": "https://www.reddit.com" + link if link else None,
                "postTitle": box.xpath(".//p[@class='title']/a/text()").get(),
                "postSubreddit": box.xpath("./@data-subreddit-prefixed").get(),
                "publishingDate": publishing_date,
                "commentCount": int(comment_count) if comment_count else None,
                "postScore": int(post_score) if post_score else None,
                "attachmentType": box.xpath("./@data-type").get(),
                "attachmentLink": box.xpath("./@data-url").get(),
            })
        next_page_url = selector.xpath("//span[@class='next-button']/a/@href").get()
        return {"data": data, "url": next_page_url}
    
    
    async def scrape_user_posts(username: str, sort: Literal["new", "top", "controversial"], max_pages: Optional[int] = None) -> List[Dict]:
        """scrape posts from a user profile, following pagination"""
        url = f"https://old.reddit.com/user/{username}/submitted/?sort={sort}"
        response = await client.get(url)
        data = parse_user_posts(response)
        post_data, next_page_url = data["data"], data["url"]
    
        while next_page_url and (max_pages is None or max_pages > 0):
            response = await client.get(next_page_url)
            data = parse_user_posts(response)
            next_page_url = data["url"]
            post_data.extend(data["data"])
            if max_pages is not None:
                max_pages -= 1
        log.success(f"scraped {len(post_data)} posts from the {username} reddit profile")
        return post_data
    

    Here, we define two functions for extracting Reddit profile posts: 1) parse_user_posts() and 2) scrape_user_posts().

    • parse_user_posts(): Parses the posts’ data out of the profile HTML using XPath selectors.
    • scrape_user_posts(): Requests the user’s submissions page, then keeps following the next-page link parsed from the pagination button for as long as one is available; see the usage sketch after this list.
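
    A minimal sketch of running the scraper, assuming the two functions above are in the same module (the username and page limit are just examples):

    async def run():
        posts = await scrape_user_posts("scraping_intelligence", sort="new", max_pages=2)
        print(json.dumps(posts[:2], indent=2))  # preview the first two posts

    if __name__ == "__main__":
        asyncio.run(run())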

    Scraping Profile Comments

    We can scrape profile comments much like the profile posts we extracted previously; we just have to change the starting URL and the parsing logic:

    def parse_user_comments(response: Response) -> Dict:
        """parse comment data from a user profile page"""
        selector = Selector(response.text)
        data = []
        for box in selector.xpath("//div[@id='siteTable']/div[contains(@class, 'thing')]"):
            author = box.xpath("./@data-author").get()
            link = box.xpath("./@data-permalink").get()
            parent_link = box.xpath(".//p[@class='parent']/a[@class='title']/@href").get()
            dislikes = box.xpath(".//span[contains(@class, 'dislikes')]/@title").get()
            upvotes = box.xpath(".//span[contains(@class, 'likes')]/@title").get()
            downvotes = box.xpath(".//span[contains(@class, 'unvoted')]/@title").get()
            data.append({
                "authorId": box.xpath("./@data-author-fullname").get(),
                "author": author,
                "authorProfile": "https://www.reddit.com/user/" + author if author else None,
                "commentId": box.xpath("./@data-fullname").get(),
                "commentLink": "https://www.reddit.com" + link if link else None,
                "commentBody": "".join(box.xpath(".//div[contains(@class, 'usertext-body')]/div/p/text()").getall()).replace("\n", ""),
                "attachedCommentLinks": box.xpath(".//div[contains(@class, 'usertext-body')]/div/p/a/@href").getall(),
                "publishingDate": box.xpath(".//time/@datetime").get(),
                "dislikes": int(dislikes) if dislikes else None,
                "upvotes": int(upvotes) if upvotes else None,
                "downvotes": int(downvotes) if downvotes else None,
                "replyTo": {
                    "postTitle": box.xpath(".//p[@class='parent']/a[@class='title']/text()").get(),
                    "postLink": "https://www.reddit.com" + parent_link if parent_link else None,
                    "postAuthor": box.xpath(".//p[@class='parent']/a[contains(@class, 'author')]/text()").get(),
                    "postSubreddit": box.xpath("./@data-subreddit-prefixed").get(),
                }
            })
        next_page_url = selector.xpath("//span[@class='next-button']/a/@href").get()
        return {"data": data, "url": next_page_url}
    
    
    async def scrape_user_comments(username: str, sort: Literal["new", "top", "controversial"], max_pages: Optional[int] = None) -> List[Dict]:
        """scrape comments from a user profile, following pagination"""
        url = f"https://old.reddit.com/user/{username}/comments/?sort={sort}"
        response = await client.get(url)
        data = parse_user_comments(response)
        comment_data, next_page_url = data["data"], data["url"]

        while next_page_url and (max_pages is None or max_pages > 0):
            response = await client.get(next_page_url)
            data = parse_user_comments(response)
            next_page_url = data["url"]
            comment_data.extend(data["data"])
            if max_pages is not None:
                max_pages -= 1
        log.success(f"scraped {len(comment_data)} comments from the {username} reddit profile")
        return comment_data
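
    Running it mirrors the posts scraper (the username is again just an example):

    comments = asyncio.run(scrape_user_comments("scraping_intelligence", sort="new", max_pages=1))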
    

    Conclusion

    Scraping Reddit posts, subreddits, and profiles can help businesses in many ways: content writers can mine it for effective, engaging content; brands can run sentiment analysis to learn how people feel about their products, services, and name; and market researchers can collect competitors’ data. In this blog, we explored how data can be scraped from Reddit.

    We highlighted ways to collect Reddit data using web scraping, Reddit’s API, third-party tools, and more, and we walked step-by-step through using the Python Reddit API Wrapper to extract Reddit posts, subreddits, and profiles. There are many other ways to extract Reddit data; choose the approach that fits your needs to scrape Reddit data and gain valuable insights.


    Frequently Asked Questions

    What is sentiment analysis?
    Sentiment analysis is a process that lets you learn how customers feel about your services, products, or brand.

    What are the alternatives to Reddit?
    There are many alternatives to Reddit, including Threads, Twitter, Instagram, and more.

    What is PRAW?
    PRAW (the Python Reddit API Wrapper) is a Python package that developers use to interact with Reddit's API. With PRAW, they can extract Reddit posts, subreddits, and profiles.

    Can you scrape private subreddits?
    No, you cannot extract private subreddits; scraping private or restricted data goes against Reddit's terms of service.

    Do you need proxies to scrape Reddit?
    Public subreddits can be accessed without a login; beyond that, or to avoid blocking at scale, you may have to use proxies to scrape Reddit data.

    What are the two types of PRAW instances?
    The two types of PRAW instances are the Read-only Instance and the Authorized Instance.
