200+ Best Data Science Projects for High School Students (By Domain and Skill Level)

At the high school level, a good data science project teaches you something a classroom lesson never can: how to handle a question where you don't already know the answer. Most students start with the Titanic dataset because it's the obvious first step, and that's fine as a learning exercise, but the projects that actually build skill, and the ones that stand out in a portfolio or college application, ask a real question about real data.

This guide covers 200+ data science project ideas for high school students, organized by domain and difficulty, from your first exploratory analysis to more advanced machine learning and NLP work. If you want structured guidance turning any of these into a polished, defensible project rather than a tutorial clone, a mentored program like Veritas AI pairs you directly with a data science or AI researcher to build something original. More on that at the end!

What makes a data science project worth doing at the high school level?

Before the list, a quick framing question that will save you a lot of wasted effort: are you trying to answer something you don't already know, or are you trying to demonstrate a technique? Both are valid reasons to do a project, but they should be approached differently.

If you're learning a technique, pick a clean, well-documented dataset and focus on getting the mechanics right. If you're trying to answer a real question, expect the data to be messier, expect your first approach to need revision, and expect the most interesting part of the project to be explaining what you found and why it matters. The second kind of project is what makes a portfolio, application, or competition submission memorable.

Exploratory Data Analysis (EDA) Projects

EDA is the foundation of every data science project and the skill beginners most often undervalue. These projects focus on finding patterns, generating questions, and visualizing data clearly.

  1. Exploring global life expectancy trends and their correlation with healthcare spending using World Bank data

  2. Analyzing decades of Billboard Hot 100 data to find patterns in genre, tempo, and song length over time

  3. Investigating the relationship between screen time and reported sleep quality using a public survey dataset

  4. Exploring NBA player statistics to identify which metrics best predict All-Star selection

  5. Analyzing global CO2 emissions data to compare trends across developed and developing economies

  6. Investigating crime statistics in a major city to find seasonal or geographic patterns

  7. Exploring Spotify's audio feature dataset to characterize what makes a song "danceable" versus "energetic"

  8. Analyzing global coffee production and consumption data to map supply chains and trends

  9. Investigating Olympic medal counts over time and their correlation with GDP and population

  10. Exploring restaurant inspection data from a major city's open data portal to find patterns in violation types

  11. Analyzing airline on-time performance data to identify which airports or routes are most prone to delay

  12. Investigating global internet adoption rates and their relationship to economic development indicators

  13. Exploring a decade of box office data to understand what genre and budget combinations perform best

  14. Analyzing housing price trends across U.S. metro areas using Zillow's public datasets

  15. Investigating global plastic waste generation and mismanagement using Our World in Data datasets

Data Cleaning Projects

These projects focus on the unglamorous but essential skill of taking messy, real-world data and making it usable. Deliberately choose datasets that are imperfect.

  1. Cleaning and standardizing a scraped dataset of job postings with inconsistent salary and location formatting

  2. Reconciling multiple government datasets on the same topic (e.g., unemployment) that use different geographic boundaries

  3. Cleaning a multi-year dataset of restaurant reviews with inconsistent date formats, duplicate entries, and missing ratings

  4. Standardizing a dataset of global city names and country codes pulled from several different sources

  5. Cleaning a public health dataset with significant missing data and comparing imputation strategies

  6. Reconciling currency and unit inconsistencies across an international economic dataset

  7. Cleaning a social media dataset with duplicate accounts, bot-like activity, and inconsistent timestamp formats

  8. Standardizing inconsistent school district names across multiple years of public education data

  9. Cleaning a dataset of historical weather observations with sensor errors and missing readings

  10. Merging and reconciling product data from multiple e-commerce sources with different naming conventions
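
Item 5 above asks you to compare imputation strategies. A minimal sketch of what that comparison can look like, using only Python's standard library. The readings are made-up illustrative values and `impute` is a hypothetical helper, not part of any library:

```python
from statistics import mean, median

# Made-up sensor readings; None marks a missing value and 250.0 is an outlier.
readings = [12.0, 14.0, None, 13.0, 250.0, None, 15.0]

def impute(values, strategy):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

mean_filled = impute(readings, "mean")
median_filled = impute(readings, "median")

# Index 2 was missing in the original list, so it now holds the fill value.
print("mean fill value:  ", mean_filled[2])
print("median fill value:", median_filled[2])
```

In this toy data the single outlier pulls the mean fill far above the typical readings, while the median fill stays close to them. Which strategy is appropriate in a real project depends on why the values are missing, which is exactly the kind of question a data cleaning write-up should discuss.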

Regression and Predictive Modeling Projects

  1. Predicting house prices using property features, with a focus on explaining which features matter most and why

  2. Building a model to predict a city's air quality index from traffic, weather, and industrial activity data

  3. Predicting student performance from attendance, study habits, and demographic survey data

  4. Modeling the relationship between marketing spend and sales for a small business dataset

  5. Predicting NBA player salaries from performance statistics and comparing to actual market value

  6. Building a regression model to predict crop yield from weather and soil data

  7. Predicting restaurant ratings from menu pricing, location, and review sentiment features

  8. Modeling energy consumption in buildings based on size, occupancy, and weather data

  9. Predicting flight delay duration (not just likelihood) using historical airline performance data

  10. Building a model to predict a country's life expectancy from healthcare and economic indicators

  11. Predicting used car prices from mileage, age, brand, and condition features

  12. Modeling the relationship between social media engagement metrics and follower growth rate

Classification Projects

  1. Building a classifier to predict loan default risk from financial and demographic features

  2. Classifying news headlines as clickbait or legitimate using text features

  3. Predicting customer churn for a subscription service using behavioral data

  4. Classifying wine quality (good, average, poor) from chemical composition features

  5. Building a spam email classifier and comparing Naive Bayes, logistic regression, and random forest performance

  6. Classifying handwritten digits using the MNIST dataset and comparing model architectures

  7. Predicting which patients are at risk for diabetes using clinical features from a public health dataset

  8. Classifying mushroom species as edible or poisonous from physical characteristics

  9. Building a model to predict whether a job applicant will be hired based on resume features (and auditing it for bias)

  10. Classifying customer reviews by sentiment and comparing accuracy across different feature representations

  11. Predicting whether a startup will receive follow-on funding using Crunchbase data

  12. Building a classifier to detect fraudulent credit card transactions in an imbalanced dataset
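
Item 12 above mentions an imbalanced dataset, and it is worth seeing why that matters before you start. The sketch below uses made-up labels and a deliberately useless "model" that always predicts the majority class; `accuracy` and `recall` are hypothetical helpers written out for illustration:

```python
# Made-up labels: 1 = fraud, 0 = legitimate. 990 legitimate, 10 fraudulent.
y_true = [0] * 990 + [1] * 10

# A "model" that always predicts the majority class (never flags fraud).
y_pred = [0] * 1000

def accuracy(true, pred):
    """Share of all predictions that match the true label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)

def recall(true, pred, positive=1):
    """Share of actual positives that the model caught."""
    caught = sum(1 for t, p in zip(true, pred) if t == positive and p == positive)
    actual = sum(1 for t in true if t == positive)
    return caught / actual if actual else 0.0

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"recall:   {recall(y_true, y_pred):.3f}")
```

The accuracy looks excellent even though the model catches none of the fraudulent transactions, which is why projects on imbalanced data usually report recall, precision, or similar metrics alongside accuracy.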

Clustering and Unsupervised Learning Projects

  1. Segmenting customers into behavioral groups using purchase history data and K-Means clustering

  2. Clustering countries by socioeconomic indicators to identify natural groupings beyond standard regional categories

  3. Using clustering to identify distinct genres or styles within a large music dataset based on audio features

  4. Segmenting news articles into topic clusters using unsupervised text clustering

  5. Clustering NBA players by playing style using performance statistics rather than position labels

  6. Using anomaly detection to identify unusual patterns in network traffic or transaction data

  7. Clustering U.S. counties by demographic and economic profile to find unexpected similarities

  8. Applying dimensionality reduction (PCA, t-SNE, UMAP) to visualize high-dimensional genetic or survey data

  9. Clustering customer support tickets to identify common issue categories without predefined labels

  10. Using clustering to group similar recipes based on ingredient profiles

Time Series and Forecasting Projects

  1. Forecasting a city's monthly electricity demand using historical consumption and weather data

  2. Predicting stock price movement direction (not exact price) using technical indicators, with honest discussion of limitations

  3. Forecasting retail sales for a seasonal product category using historical transaction data

  4. Modeling and forecasting COVID-19 case trends using an SIR-style model fit to historical data

  5. Forecasting traffic congestion patterns at a specific intersection using historical sensor data

  6. Predicting a river's water level using historical flow and rainfall data for flood risk assessment

  7. Forecasting airline ticket prices over time to identify the best time to book

  8. Modeling seasonal unemployment trends and comparing forecasts to actual subsequent data

  9. Forecasting social media engagement trends for a brand or public figure using historical post data

  10. Predicting future temperature anomalies using historical climate data and comparing to published climate models
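
Item 4 above refers to an SIR-style model. As a rough sketch of the idea, the function below simulates the standard susceptible, infected, recovered compartments with one-day steps. The function name and all parameter values are illustrative assumptions, not estimates for any real outbreak; fitting the model would mean adjusting `beta` and `gamma` until the simulated curve matches observed case counts:

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Simulate an SIR model with one-day Euler steps.

    beta: transmission rate per day; gamma: recovery rate per day.
    Returns a list of (S, I, R) tuples, one per day including day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Illustrative parameter values only.
history = simulate_sir(population=10_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=120)

# Day on which the infected compartment is largest.
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("peak infections on day", peak_day)
```

One-day Euler steps are a simplification chosen for readability; a more careful project might use a numerical ODE solver and compare the results.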

Natural Language Processing (NLP) Projects

  1. Building a sentiment classifier for product reviews and analyzing which words drive the strongest sentiment signal

  2. Creating a text summarizer for news articles using extractive summarization techniques

  3. Building a fake news classifier and analyzing which linguistic features are most predictive

  4. Analyzing the sentiment of presidential debate transcripts over time and across candidates

  5. Building a chatbot that answers questions from a specific knowledge base using simple retrieval methods

  6. Classifying movie reviews by genre based on plot summary text alone

  7. Building a resume keyword matcher that scores resumes against job description requirements

  8. Analyzing the readability level of news articles across different publications using standard readability metrics

  9. Building a topic model (LDA) to identify themes in a large collection of customer feedback

  10. Detecting toxic or harassing language in online comments and evaluating model fairness across demographic proxies

  11. Building a simple language translator using sequence-to-sequence modeling on a small parallel corpus

  12. Analyzing how language use in song lyrics has changed across decades for a specific genre

Computer Vision Projects

  1. Building an image classifier to distinguish between similar-looking species (birds, dog breeds, plant types)

  2. Building a model to detect whether a chest X-ray shows signs of pneumonia, using a public medical imaging dataset

  3. Creating a handwritten digit recognizer and testing its robustness to different handwriting styles

  4. Building a model to classify food images by cuisine type

  5. Detecting and counting objects in images (cars in a parking lot, people in a crowd) using object detection

  6. Building a facial emotion recognition model and evaluating its accuracy across different demographic groups

  7. Creating a model that classifies satellite images by land use type (urban, agricultural, forest)

  8. Building a model to detect plant disease from leaf images

  9. Creating an image-based style classifier for art (impressionist, cubist, realist) using a museum's open dataset

  10. Building a model to detect deepfake images and analyzing what visual artifacts it relies on

Sports Analytics Projects

  1. Building a model to predict NBA game outcomes using team and player statistics

  2. Analyzing whether "clutch" performance in basketball is statistically distinguishable from random variation

  3. Predicting NFL draft success from college performance statistics

  4. Building an expected goals (xG) model for soccer using shot location and situation data

  5. Analyzing whether home field advantage has changed over time across different sports

  6. Predicting tennis match outcomes using player ranking, surface type, and historical head-to-head data

  7. Building a fantasy football points predictor using weekly player performance data

  8. Analyzing pitch sequencing patterns in baseball to identify pitcher tendencies

  9. Building a model to evaluate which basketball shot locations produce the best expected value

Healthcare and Public Health Projects

  1. Analyzing the relationship between food access (food deserts) and obesity rates by county

  2. Building a model to predict hospital readmission risk from patient intake data

  3. Analyzing vaccination rate trends and their correlation with disease incidence at the county level

  4. Investigating disparities in healthcare access using publicly available insurance coverage data

  5. Building a model to predict diabetes risk from lifestyle and demographic survey data

  6. Analyzing mental health crisis trends among adolescents using CDC survey data

  7. Investigating the relationship between air pollution exposure and asthma hospitalization rates

  8. Building a model to estimate the spread of an infectious disease through a simulated contact network

  9. Analyzing maternal health outcome disparities across different U.S. regions

  10. Investigating whether telehealth adoption correlates with improved patient outcomes using available data

Finance and Economics Projects

  1. Building a portfolio optimization model using historical stock return and volatility data

  2. Analyzing the relationship between interest rate changes and stock market sector performance

  3. Building a credit risk model and comparing its decisions to actual loan outcomes

  4. Analyzing cryptocurrency price volatility and its correlation with trading volume and news sentiment

  5. Investigating whether ESG (environmental, social, governance) scores correlate with financial performance

  6. Building a simple algorithmic trading strategy backtester and evaluating its risk-adjusted returns honestly

  7. Analyzing the relationship between minimum wage changes and small business employment at the state level

  8. Investigating wealth inequality trends using historical income distribution data

  9. Building a model to predict small business loan default risk

  10. Analyzing how inflation has affected purchasing power across different income brackets over time

Environmental and Climate Data Science Projects

  1. Analyzing global temperature anomaly data and comparing trends to climate model predictions

  2. Building a model to predict wildfire risk from weather, vegetation, and historical fire data

  3. Investigating deforestation trends using satellite imagery and land use change data

  4. Analyzing the relationship between urban tree coverage and local temperature using city-level data

  5. Building a model to forecast renewable energy output (solar, wind) from weather data

  6. Investigating ocean temperature trends and their correlation with coral bleaching events

  7. Analyzing air quality trends in major cities before and after specific policy changes

  8. Building a model to predict water scarcity risk by region using rainfall and population data

  9. Investigating the relationship between EV adoption rates and charging infrastructure availability by state

  10. Analyzing global biodiversity loss trends using the Living Planet Index and related datasets

Social Science and Survey Data Projects

  1. Analyzing General Social Survey data to track changes in public opinion on a specific issue over decades

  2. Investigating the relationship between social media use and self-reported life satisfaction using survey data

  3. Analyzing voter turnout patterns across demographic groups using public election data

  4. Building a model to predict political affiliation from publicly available, non-invasive survey responses

  5. Investigating gender pay gap persistence after controlling for industry, role, and experience

  6. Analyzing changes in family structure and household composition using Census data over time

  7. Investigating the relationship between education spending and standardized test outcomes by state

  8. Analyzing public trust in institutions over time using Pew Research survey data

  9. Investigating residential segregation patterns using Census tract-level demographic data

  10. Analyzing the relationship between commute time and self-reported wellbeing using survey data

Recommendation Systems Projects

  1. Building a movie recommendation system using collaborative filtering on the MovieLens dataset

  2. Creating a book recommendation engine based on plot similarity and reader rating patterns

  3. Building a music recommendation system based on audio feature similarity rather than collaborative filtering

  4. Creating a recipe recommender that suggests meals based on available ingredients and dietary restrictions

  5. Building a course recommendation system for students based on past enrollment and performance patterns

  6. Creating a news article recommender and evaluating it for filter bubble effects

  7. Building a product recommendation system using purchase history and item similarity
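
Several of the ideas above rely on item similarity. One common way to measure it is cosine similarity between the rating vectors of two items. The sketch below uses a tiny made-up ratings table and treats unrated items as 0, which is a simplifying assumption a real project would want to revisit; `item_vector` and `cosine_similarity` are hypothetical helper names:

```python
from math import sqrt

# Made-up ratings: user -> {item: rating}.
ratings = {
    "ana":   {"A": 5, "B": 4, "C": 1},
    "ben":   {"A": 4, "B": 5, "C": 2},
    "chloe": {"A": 1, "B": 2, "C": 5},
}

def item_vector(item):
    """Ratings for one item across all users, in a fixed user order (0 where unrated)."""
    return [ratings[user].get(item, 0) for user in sorted(ratings)]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sim_ab = cosine_similarity(item_vector("A"), item_vector("B"))
sim_ac = cosine_similarity(item_vector("A"), item_vector("C"))
print(f"A~B: {sim_ab:.2f}, A~C: {sim_ac:.2f}")
```

Here items A and B are rated similarly by the same users, so they score as more similar than A and C. A recommender built on this idea would suggest B to someone who liked A.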

A/B Testing and Experimental Design Projects

  1. Designing and running an A/B test on a simple website change and analyzing statistical significance correctly

  2. Analyzing a public A/B test dataset (many companies publish case studies) and critiquing the experimental design

  3. Designing a survey experiment to test whether question wording affects response patterns

  4. Running a controlled study on whether background music affects task performance, with proper statistical analysis

  5. Analyzing the statistical power of a hypothetical experiment and determining the sample size needed for a reliable result
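
Items 1 and 5 above both come down to a small amount of arithmetic. The sketch below shows a two-sided two-proportion z-test and an approximate sample size calculation, both based on the normal approximation. The conversion counts, function names, and default `alpha` and `power` values are illustrative assumptions:

```python
from math import sqrt, ceil
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate n per group needed to detect p_base -> p_target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_target - p_base) ** 2)

# Made-up counts: 120 of 1000 visitors convert on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
print("n per group for 10% -> 12%:", sample_size_per_group(0.10, 0.12))
```

Running the sample size calculation before collecting data is the useful habit here: small differences between variants typically need far more observations than students expect.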

Network Analysis Projects

  1. Analyzing a social network dataset to identify the most influential nodes using centrality measures

  2. Building a co-authorship network from academic papers in a specific field and identifying key collaboration hubs

  3. Analyzing airline route networks to identify hub airports and network vulnerability to disruption

  4. Building a network model of disease spread through a simulated contact network and testing intervention strategies

  5. Analyzing a citation network to trace the influence of a foundational academic paper over time
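
Item 1 above mentions centrality measures. Degree centrality is the simplest of them: the fraction of other nodes a node is directly connected to. The sketch below computes it by hand on a made-up friendship network; `degree_centrality` here is a hypothetical helper written for illustration, and a real project would more likely use a graph library such as NetworkX:

```python
# Made-up friendship edges; each pair is an undirected connection.
edges = [("ana", "ben"), ("ana", "chloe"), ("ana", "dev"),
         ("ben", "chloe"), ("dev", "eli")]

def degree_centrality(edge_list):
    """Fraction of the other nodes each node is directly connected to."""
    neighbors = {}
    for a, b in edge_list:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

Degree centrality only counts direct connections. Measures such as betweenness or eigenvector centrality capture different notions of influence, and comparing them on the same network can be a project in itself.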

Audio and Signal Processing Projects

  1. Building a music genre classifier using audio features extracted with a library like Librosa

  2. Creating a speech emotion recognition model using vocal feature extraction

  3. Building a simple speaker identification system that distinguishes between a small set of known voices

  4. Analyzing heart rate variability data from wearable devices to detect patterns related to stress or activity

  5. Building a model to classify environmental sounds (sirens, alarms, traffic) for accessibility applications

Data Science x Specific Industries

  1. Analyzing customer support ticket data to identify the most common and most costly issue categories for a business

  2. Building a churn prediction model for a subscription business and estimating the revenue impact of reducing churn by a target percentage

  3. Analyzing supply chain data to identify bottlenecks in a simulated logistics network

  4. Building a demand forecasting model for a retail business to optimize inventory decisions

  5. Analyzing employee survey data to identify the strongest predictors of job satisfaction and retention

  6. Building a pricing elasticity model for a consumer product using historical sales and price data

  7. Analyzing real estate listing data to identify which features most influence time-on-market

  8. Building a model to predict which marketing channel drives the highest-value customers

Open-Ended and Original Research Projects

These are the most ambitious project types and the ones most likely to result in a genuinely novel contribution, suitable for science fairs, research journals, or a flagship portfolio piece.

  1. Using public datasets to test whether a claim made in a news article or popular book actually holds up statistically

  2. Replicating a well-known published study's analysis using publicly available data and checking whether the conclusions still hold

  3. Building a novel composite index (similar to the Human Development Index) for a specific question you care about, and validating it against known outcomes

  4. Investigating a question specific to your own school or community using data you collect yourself, with proper survey design

  5. Combining two unrelated public datasets to investigate a question neither dataset could answer alone (e.g., weather data and crime data, social media data and stock prices)

  6. Building and validating a predictive model for an outcome that genuinely matters to a community you're part of (school, sports team, local nonprofit)

  7. Conducting a meta-analysis of multiple published studies on a topic and synthesizing their findings statistically

How should you choose the right data science project in high school?

With this many options, the hardest part is often deciding where to start. A few questions worth asking yourself: 

  1. Does this project use a skill I haven't tried yet, or does it deepen a skill I'm still shaky on?

  2. Is there a real dataset available, and is it large and clean enough (or appropriately messy) for what I want to do? 

  3. Can I clearly state, in one sentence, the question I'm trying to answer?

If you can't answer that last question clearly, the project needs more thinking before you open a notebook. The strongest data science work, at any level, starts with a sharply defined question.

How can you get feedback on your work?

The single biggest difference between a project that looks like a tutorial exercise and one that looks like real research is feedback from someone who knows what they're looking at. A mentor can tell you when your model's high accuracy is actually a sign of data leakage, when your visualization is technically correct but misleading, or when your interesting result needs one more robustness check before you can trust it.
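
One simple form of data leakage is having the same rows in both the training and test sets. As a minimal sketch of a check you can run yourself, the snippet below looks for exact duplicates across the two sets. The rows are made-up and `overlapping_rows` is a hypothetical helper; leakage can also take subtler forms that this check will not catch:

```python
# Made-up rows: (feature_1, feature_2, label).
train_rows = [(5.1, 3.5, 0), (4.9, 3.0, 0), (6.2, 2.9, 1), (5.9, 3.0, 1)]
test_rows = [(6.2, 2.9, 1), (5.0, 3.4, 0)]

def overlapping_rows(train, test):
    """Return test rows that also appear verbatim in the training set."""
    seen = set(train)
    return [row for row in test if row in seen]

leaked = overlapping_rows(train_rows, test_rows)
print(f"{len(leaked)} of {len(test_rows)} test rows also appear in training data")
```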

A mentor can be a teacher at school, a community on Reddit, or even a senior classmate, but combining mentorship with a structured program is the best way to succeed.

Veritas AI offers two structured paths depending on where you are. The AI Scholars program is a 10-week, mentor-led bootcamp where you build a real project in a small group, covering Python, machine learning fundamentals, and data analysis from the ground up. If you’re ready to go deeper, the AI Fellowship pairs you one-on-one with a mentor from a top university for 12 to 15 weeks of original research, with past projects spanning healthcare, finance, and computer vision, and direct support from Veritas AI's publication team to help you submit your work to a high school research journal.

If any idea on this list is one you keep coming back to, that's usually a sign it's worth doing properly, with someone who can help you do it right.

Explore Veritas AI's programs here!

Frequently Asked Questions

What are good data science project ideas for high school students?

The strongest projects combine an accessible dataset with a genuinely open question, not just a demonstration of a technique. EDA and regression projects are the most accessible starting point. NLP, computer vision, and time series projects offer more advanced challenges once you have the fundamentals down. The best topic is usually one connected to something you're already interested in, whether that's sports, healthcare, music, or your own community.

Where can I find datasets for data science projects?

Kaggle, Google's Dataset Search, Data.gov, the UCI Machine Learning Repository, FiveThirtyEight's data portal, and World Bank Open Data are all strong starting points. For more original projects, government open data portals (city and state level) and public APIs from organizations like NOAA, CDC, and the Bureau of Labor Statistics offer less commonly used data that can lead to more original findings.

How long should a data science project take?

A solid EDA or regression project can be completed in one to two weeks with focused effort. Machine learning projects with proper evaluation and tuning typically take two to four weeks. The most ambitious original research projects, the kind suitable for a science fair or journal submission, often take two to three months including iteration and write-up.

Do I need to know advanced math to do data science projects?

A working understanding of statistics (distributions, hypothesis testing, correlation versus causation) is essential for almost every project on this list. Linear algebra and calculus become more important as you move into deep learning and more advanced model architectures, but plenty of genuinely strong projects, especially in EDA, regression, and classification, are accessible with a solid statistics foundation and working Python skills.

Can a high school data science project be published?

Yes. Journals specifically for high school research, such as the Journal of Emerging Investigators, accept rigorous data science research. Original projects with a clear research question, honest methodology, and a defensible conclusion have a real shot, particularly when developed with mentorship from someone who can help you meet the bar for publication.

P.S. Once you've picked a data science project, our guide to building a data science portfolio covers how to structure, host, and present your work. We've also put together a list of machine learning projects for high school students if you want to go deeper on the ML side specifically, and a roundup of data science programs for high school students in California if you're looking for structured, in-person options at the biggest tech hub in the world!

Tyler Moulton

Tyler Moulton is Head of Academics and Veritas AI Partnerships with 6 years of experience in education consulting, teaching, and astronomy research at Harvard and the University of Cambridge, where they developed a passion for machine learning and artificial intelligence. Tyler is passionate about connecting high-achieving students to advanced AI techniques and helping them build independent, real-world projects in the field of AI!
