15 Data Science Research Ideas for High School Students

If you’re a high school student interested in technology, statistics, or problem-solving, data science offers a powerful way to turn curiosity into a meaningful independent project. Because data science draws on math, computer science, and decision-making, it is especially well-suited to research projects that start small and grow in complexity over time. Rather than locking yourself into a single idea right away, explore a range of research directions to discover what excites you. Surveying multiple ideas also helps you assess feasibility, available datasets, and skill requirements before committing to a topic that you’ll spend weeks or months developing.

Why should I do data science research in high school?

Conducting data science research in high school helps you build both academic and personal skills that extend far beyond a single project. Designing an independent research study requires time management, organization, and persistence, while analyzing datasets strengthens critical thinking, statistical reasoning, and problem-solving abilities. You also gain experience asking meaningful questions, evaluating evidence, and communicating results clearly. 

Below are 15 data science research ideas that high school students can explore, adapt, or use as inspiration for their own independent projects.

If you’re looking for online STEM programs, check out our blog here.

1. Social Media & Natural Language Processing (NLP)

Social media platforms generate an enormous volume of text every day, shaping public opinion, influencing behavior, and reflecting how people think, feel, and communicate in real time. Natural Language Processing enables studying this data at scale, moving beyond surface-level trends to uncover sentiment shifts, misinformation patterns, online bias, and digital well-being concerns. For student researchers, this space is especially compelling because it encompasses technology, psychology, media, and ethics, and provides access to rich, publicly available datasets. Research in this area can involve analyzing language patterns, building models to classify or predict behavior, or examining how online discourse changes across events, communities, or platforms.

  1. Examine how social media sentiment shifts before, during, and after significant public events: Focus on patterns around events such as elections, product launches, or global crises by applying NLP sentiment analysis techniques.

  2. Build a text classification model to detect toxic or harmful language: Study its prevalence and how it varies across platforms, topics, or time periods.

  3. Examine linguistic differences between viral and non-viral posts: Identify language features such as emotion, readability, or framing that contribute to higher engagement.
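To make the sentiment-shift idea concrete, here is a minimal sketch, assuming a tiny hand-written word list (`LEXICON`, hypothetical) and posts already tagged with a numeric timestamp such as a day number. A real project would replace the word list with an established tool (for example, NLTK’s VADER analyzer) or a trained classifier; the point here is only the before/during/after grouping.

```python
from statistics import mean

# Hypothetical mini-lexicon for illustration only.
LEXICON = {"great": 1.0, "love": 1.0, "hope": 0.5,
           "bad": -1.0, "angry": -1.0, "worried": -0.5}


def score_post(text):
    """Average lexicon score of the matched words in a post (0.0 if none match)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return mean(hits) if hits else 0.0


def sentiment_by_phase(posts, event_start, event_end):
    """Split (timestamp, text) posts into before/during/after an event window
    and return the mean post score per phase (None if a phase is empty)."""
    phases = {"before": [], "during": [], "after": []}
    for ts, text in posts:
        if ts < event_start:
            phase = "before"
        elif ts <= event_end:
            phase = "during"
        else:
            phase = "after"
        phases[phase].append(score_post(text))
    return {p: (mean(s) if s else None) for p, s in phases.items()}


posts = [(1, "I love this, great news!"), (5, "Worried and angry."), (9, "Some hope again")]
print(sentiment_by_phase(posts, event_start=4, event_end=6))
# {'before': 1.0, 'during': -0.75, 'after': 0.5}
```

With real data, you would compare these phase averages across many events rather than reading much into a single one.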

2. Analyzing Weather Patterns and Climate Trends

Understand how weather and climate behave over time using data from global sources like NOAA. This information is crucial for forecasting storms and helping farmers plan crops, as well as for understanding long-term societal impacts such as global warming and extreme weather events. According to the IPCC, global surface temperature in 2011–2020 was about 1.1 °C above pre-industrial (1850–1900) levels, and some recent individual years have approached or exceeded 1.5 °C, reflecting a persistent warming trend driven mainly by human greenhouse gas emissions. By analyzing weather and climate data with time-series analysis and other statistical methods, you can track patterns, identify anomalies, and even project future conditions.

  1. Examine trends in global temperature change using time-series data: Collect historical temperature records (e.g., from NOAA or NASA datasets) and apply methods such as moving averages or ARIMA models to detect long-term warming trends and seasonal patterns.

  2. Identify recurring anomaly patterns in temperature and precipitation data: Look for subtle changes, such as warmer nighttime temperatures, irregular rainfall gaps, or rising humidity, that tend to occur before major events like heatwaves or floods, and examine whether these shifts serve as early indicators of large-scale disruptions.

  3. Compare temperature and precipitation data from major cities and nearby rural areas: Analyze how urban expansion influences local climate behavior over time. This research could explore whether certain urban characteristics, such as green cover, population density, or land use, intensify or reduce long-term warming trends.
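The moving-average and trend ideas from the first project can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming you already have one average temperature value per year; the sample numbers are made up, and for ARIMA models you would use a dedicated package such as statsmodels rather than writing one by hand.

```python
def moving_average(values, window):
    """Trailing moving average; positions before a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        None if i < window - 1 else sum(values[i - window + 1:i + 1]) / window
        for i in range(len(values))
    ]


def linear_trend(years, temps):
    """Ordinary least-squares slope: average change in temperature per year."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(temps) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    return sxy / sxx


# Made-up annual means, for illustration only.
years = [2000, 2001, 2002, 2003, 2004, 2005]
temps = [14.1, 14.3, 14.2, 14.5, 14.4, 14.7]
print(moving_average(temps, 3))
print(f"{linear_trend(years, temps) * 10:.2f} °C per decade")
```

The moving average smooths out year-to-year noise so the eye can follow the trend, while the fitted slope puts a single number on it.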

3. Healthcare Data and Disease Trends

Healthcare systems generate massive amounts of data every day, from hospital admissions to disease surveillance reports. Analyzing this data helps identify patterns that can improve early detection, resource planning, and public health responses. Studying disease trends with data science lets students see how illnesses spread and how timely interventions can reduce their impact. As global health threats grow more frequent, data-driven insights have become central to modern medicine and public policy.

  1. Tracking Early Warning Signals of Disease Outbreaks: Analyze historical public health datasets to identify subtle changes (e.g., rising symptom-related searches, clinic visits, or regional case clusters) that precede officially declared outbreaks.

  2. Studying Seasonal and Regional Disease Patterns: Examine how diseases such as influenza, dengue, and asthma-related hospitalizations vary across seasons and locations, and assess which environmental or demographic factors most strongly influence these trends.

  3. Predicting Healthcare System Strain Using Admission Data: Use hospital or emergency room admission data to study patterns that precede overcrowding or shortages and identify indicators that could help hospitals prepare in advance.
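One simple way to turn an “early warning signal” into something testable is a rolling threshold: flag any week whose count is well above what the preceding weeks would predict. The sketch below assumes weekly case or clinic-visit counts in a plain list; `baseline_weeks` and `k` are illustrative choices, not official surveillance settings, and public health agencies rely on more carefully validated methods.

```python
from statistics import mean, stdev


def flag_unusual_weeks(counts, baseline_weeks=8, k=2.0):
    """Return indices of weeks whose count exceeds the mean of the previous
    `baseline_weeks` weeks by more than k sample standard deviations."""
    if baseline_weeks < 2:
        raise ValueError("need at least 2 baseline weeks to estimate spread")
    flagged = []
    for i in range(baseline_weeks, len(counts)):
        baseline = counts[i - baseline_weeks:i]
        threshold = mean(baseline) + k * stdev(baseline)
        if counts[i] > threshold:
            flagged.append(i)
    return flagged


weekly_visits = [10, 12, 11, 10, 12, 11, 10, 12, 30, 11]
print(flag_unusual_weeks(weekly_visits))  # [8]
```

One design choice worth noticing: a flagged spike stays in the baseline for later weeks and raises their thresholds. Excluding flagged weeks from the baseline is one possible variation to test.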

4. Predicting Housing Prices Using Data Models

Housing prices are influenced by a complex mix of economic, social, and environmental factors, making them a strong real-world problem for data-driven analysis. You can study how models translate messy information into predictions that affect policy, affordability, and everyday decisions. This topic also introduces core ideas in regression, feature selection, and bias, concepts central to data science across industries.

  1. Identifying Hidden Drivers of Price Spikes: Analyze housing datasets and examine less obvious factors, such as proximity to new transit projects, zoning changes, or school district reassignments, that can drive sudden increases in home prices.

  2. Comparing Traditional Regression vs. Machine Learning Models: Build and compare models such as linear regression, decision trees, and random forests to evaluate which approaches best capture non-linear relationships in housing price data across different neighborhoods.

  3. Detecting Early Signs of Housing Market Cooling or Overheating: Examine historical price, inventory, and time-on-market data to identify recurring patterns that precede market slowdowns or overheating, and use them to better understand market stability and risk.
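Before comparing linear regression with decision trees or random forests (scikit-learn provides `LinearRegression`, `DecisionTreeRegressor`, and `RandomForestRegressor` for that), it is worth checking every model against a naive baseline on held-out data. The sketch below is a minimal standard-library version, assuming a single feature (square footage) and made-up prices; it reports root-mean-square error (RMSE) for a fitted line and for simply predicting the training-set average.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root-mean-square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def compare_to_baseline(train, test):
    """train/test: lists of (square_feet, price). Returns (line_rmse, baseline_rmse)."""
    xs, ys = zip(*train)
    slope, intercept = fit_line(xs, ys)
    test_x, test_y = zip(*test)
    line_pred = [slope * x + intercept for x in test_x]
    baseline_pred = [sum(ys) / len(ys)] * len(test_y)  # always predict the training mean
    return rmse(test_y, line_pred), rmse(test_y, baseline_pred)


# Made-up listings, for illustration only.
train = [(900, 180000), (1400, 265000), (2000, 410000), (2500, 480000)]
test = [(1200, 250000), (1800, 345000)]
print(compare_to_baseline(train, test))
```

A more complex model is only worth its extra complexity if it beats both numbers on data it has not seen.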

5. Sports Performance Analytics

Sports generate massive amounts of data, from player statistics to biometric measurements. Performance analytics helps teams, analysts, and even bettors make evidence-based decisions rather than relying solely on intuition. For high school researchers, this topic is appealing because the data is often publicly available and directly tied to outcomes people care about: wins, injuries, and predictions. It also introduces students to modeling uncertainty, probability, and real-world decision-making under pressure.

  1. Predicting Player Performance Under Different Conditions: Analyze how factors such as rest days, travel schedules, weather, or home vs. away games influence player performance metrics, and model which conditions consistently lead to performance drops or spikes.

  2. Evaluating the Accuracy of Betting Odds: Compare pre-game betting odds with actual game outcomes to identify systematic biases or patterns, such as teams that are consistently undervalued or overvalued, and test whether simple data models can outperform baseline odds.

  3. Injury Risk and Load Management Analysis: Use historical playing time, performance, and injury data to identify workload patterns that often precede injuries, and explore how data-driven load management could extend athletes' careers.
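For the betting-odds project, the two core calculations are turning odds into probabilities and scoring those probabilities against results. This sketch assumes decimal (European-style) odds for two-outcome games with no draws; dividing each raw 1/odds value by their sum is one simple way to remove the bookmaker’s margin, and other methods exist. The Brier score is a standard accuracy measure for probability forecasts where lower is better, so a model “beats the odds” only if its Brier score is consistently lower on the same games.

```python
def implied_probabilities(decimal_odds):
    """Convert the decimal odds for every outcome of one game into probabilities
    that sum to 1. The raw 1/odds values add up to more than 1 because of the
    bookmaker's margin, so each is divided by their total."""
    raw = [1 / odds for odds in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and outcomes
    (1 = happened, 0 = did not)."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical games: decimal odds for (home win, away win) and whether home won.
games = [((1.8, 2.0), 1), ((1.4, 3.0), 0), ((2.5, 1.6), 0)]
home_probs = [implied_probabilities(odds)[0] for odds, _ in games]
print(f"Odds-based Brier score: {brier_score(home_probs, [won for _, won in games]):.3f}")
```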

6. Urban Data & Smart Cities

Cities produce vast amounts of data daily, ranging from traffic signals and air-quality sensors to public transport logs and energy consumption. Studying urban data helps reveal how infrastructure, policy, and human behavior intersect in real life. As cities grow denser and more climate-stressed, data-driven planning is becoming essential to improving mobility, sustainability, and quality of life.

  1. Detecting “Invisible Congestion” Zones in Cities: Examine traffic speed, stop frequency, and time-of-day data to find areas where congestion isn't officially labeled as a traffic jam but still causes persistent slowdowns, for example near schools, hospitals, or informal crossings.

  2. Mapping Urban Inequality Using Service Access Data: Combine datasets on public transport coverage, healthcare facilities, green spaces, and emergency response times to examine whether certain neighborhoods are systematically underserved compared to others.

  3. Studying Urban Heat Islands at a Micro Level: Use temperature, building density, tree cover, and surface material data to identify street-level heat traps and examine how urban design choices contribute to localized temperature extremes.
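A minimal way to look for “invisible congestion” is to compare observed speeds with each road segment’s typical free-flow speed and flag segment-hour combinations that are slow most of the time. The data layout, the `free_flow` speeds, and both thresholds below are hypothetical; with real sensor or GPS data, free-flow speed could, for example, be estimated from late-night readings.

```python
from collections import defaultdict


def persistent_slow_spots(readings, free_flow, ratio_threshold=0.6, min_share=0.5):
    """readings: (segment_id, hour_of_day, observed_speed) tuples.
    free_flow: segment_id -> typical uncongested speed (same units).
    A reading is 'slow' if observed / free-flow speed < ratio_threshold.
    Returns (segment_id, hour) pairs where at least min_share of readings are slow."""
    slow = defaultdict(int)
    total = defaultdict(int)
    for segment, hour, speed in readings:
        key = (segment, hour)
        total[key] += 1
        if speed / free_flow[segment] < ratio_threshold:
            slow[key] += 1
    return sorted(key for key in total if slow[key] / total[key] >= min_share)


readings = [("A", 8, 20), ("A", 8, 22), ("A", 14, 45), ("B", 8, 48), ("B", 8, 50)]
print(persistent_slow_spots(readings, free_flow={"A": 50, "B": 50}))  # [('A', 8)]
```

Grouping by hour matters here: segment A looks fine on a daily average but is consistently slow during the morning school run.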

7. Financial Markets & Behavioral Economics

Traditional finance assumes that people act rationally, but real markets are driven just as much by emotion, bias, and social influence as by numbers. Behavioral economics uses data to study how fear, overconfidence, herd behavior, and media narratives shape financial decisions, often leading to bubbles, crashes, or irrational trading patterns. With the rise of retail investing apps and online finance communities, this field is more relevant than ever. Research here allows students to combine data analysis with psychology and real-world economic behavior.

  1. Measuring Herd Behavior During Market Volatility: Analyze trading volume, price movements, and timing data to determine whether retail investors tend to buy or sell simultaneously during sharp market swings, suggesting crowd-driven decision-making rather than independent analysis.

  2. Studying the Impact of News Sentiment on Stock Price Movement: Use sentiment analysis on financial news headlines or earnings reports and examine whether shifts in tone precede abnormal returns or increased volatility in specific stocks or sectors.

  3. Analyzing Overconfidence in Retail Trading Patterns: Examine patterns such as frequent trading, short holding periods, or repeated re-entry into losing positions to study whether higher trading activity correlates with lower long-term returns.
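A common first step in studying news effects is the abnormal return: how much a stock moved beyond what the market as a whole did. The sketch below uses the simplest “market-adjusted” version (stock return minus market return) and assumes you already have daily returns and sentiment-labeled headlines indexed by trading day; all numbers are made up, and a real study would also test whether any difference between groups is statistically significant.

```python
from statistics import mean


def abnormal_returns(stock_returns, market_returns):
    """Market-adjusted abnormal return for each day: stock return minus market return."""
    return [s - m for s, m in zip(stock_returns, market_returns)]


def next_day_ar_by_sentiment(ar, headlines):
    """headlines: (day_index, label) pairs, e.g. label 'positive' or 'negative'.
    Averages the abnormal return on the trading day after each headline, per label."""
    groups = {}
    for day, label in headlines:
        if day + 1 < len(ar):  # skip headlines with no following day in the data
            groups.setdefault(label, []).append(ar[day + 1])
    return {label: mean(values) for label, values in groups.items()}


# Made-up daily returns (0.01 = 1%) and headline labels.
ar = abnormal_returns([0.01, 0.03, -0.02, 0.00], [0.01, 0.01, 0.00, 0.01])
print(next_day_ar_by_sentiment(ar, [(0, "positive"), (1, "negative"), (3, "positive")]))
```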

8. Education & Learning Analytics

Education systems generate massive amounts of data, but most of it is still used only for grading. Learning analytics uses data to uncover invisible patterns in attention, fatigue, motivation, and conceptual breakdowns that traditional exams miss. With online platforms, recorded lectures, and digital assignments becoming common, students can now study learning itself as a measurable system. This kind of research sits at the intersection of data science, psychology, and human behavior.

  1. Detecting Early Cognitive Fatigue in Online Learning Sessions: Analyze timestamps, pause frequency, replay behavior, and quiz accuracy during recorded lessons to identify recurring patterns that signal when students' learning efficiency drops.

  2. Mapping Conceptual Bottlenecks Across a Semester: Track which specific concepts consistently trigger repeated errors, rewatching behavior, or delayed submissions across multiple students to identify “hidden choke points” in a curriculum that aren’t obvious from final grades alone.

  3. Peer Influence Networks in Collaborative Classrooms: Use discussion forum replies, group project interactions, or peer feedback data to map influence networks and study whether learning outcomes correlate more strongly with peer exposure than with instructor input.
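To find “hidden choke points,” a simple starting point is to aggregate error and rewatch rates per concept across all students. The event format below (student, concept, answered correctly, rewatched the lesson) is a hypothetical log layout; a real platform export will look different and need some cleaning first.

```python
from collections import defaultdict


def rank_bottlenecks(events, min_attempts=5):
    """events: (student_id, concept, correct, rewatched) tuples with booleans.
    Returns (concept, error_rate, rewatch_rate) rows, highest error rate first,
    skipping concepts with fewer than min_attempts attempts."""
    stats = defaultdict(lambda: [0, 0, 0])  # attempts, errors, rewatches
    for _student, concept, correct, rewatched in events:
        s = stats[concept]
        s[0] += 1
        s[1] += 0 if correct else 1
        s[2] += 1 if rewatched else 0
    rows = [(concept, errors / n, rewatches / n)
            for concept, (n, errors, rewatches) in stats.items() if n >= min_attempts]
    return sorted(rows, key=lambda row: row[1], reverse=True)


events = [("s1", "fractions", False, True), ("s2", "fractions", False, False),
          ("s1", "ratios", True, False), ("s2", "ratios", True, False)]
print(rank_bottlenecks(events, min_attempts=2))
# [('fractions', 1.0, 0.5), ('ratios', 0.0, 0.0)]
```

The `min_attempts` filter keeps rarely practiced concepts from topping the list just because of one or two unlucky answers.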

9. Climate Risk & Environmental Impact Modeling

Today, climate change is also a risk distribution problem that affects communities unevenly based on geography, income, and infrastructure. With the rise of open climate, satellite, and public health datasets, students can study how environmental exposure translates into real-world consequences, such as displacement, illness, or economic loss. Climate risk modeling allows researchers to move from “what is happening” to “who is impacted, when, and why.”

  1. Mapping Micro-Climate Inequality Within a Single City: Combine satellite heat data, tree cover, and income or housing density data to identify neighborhoods that consistently experience higher temperatures, and analyze how these patterns align with historical zoning or infrastructure decisions.

  2. Predicting Flood Risk Using Everyday Urban Signals: Use rainfall data alongside drainage maps, road elevations, and complaint reports (e.g., waterlogging or sewage overflows) to model which streets or blocks flood repeatedly and whether these risks are underreported in official flood maps.

  3. Estimating Health Exposure From Air Quality Fluctuations: Analyze short-term spikes in air pollution (PM2.5 or ozone) and model how frequently schools, hospitals, or residential areas are exposed to unsafe levels, focusing on cumulative exposure rather than yearly averages.
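Focusing on cumulative exposure rather than yearly averages can be as simple as counting hours above a limit and tracking the longest unbroken stretch. This sketch assumes hourly PM2.5 readings tagged by site; the site names are hypothetical, and `threshold` is a parameter you would take from whichever air-quality guideline your project follows (the 35 used in the example is only a placeholder).

```python
from collections import defaultdict


def exposure_summary(readings, threshold):
    """readings: (site, hour_index, pm25) tuples, e.g. one per hour.
    For each site, count hours above `threshold` and the longest run of
    consecutive hours above it."""
    by_site = defaultdict(list)
    for site, hour, value in readings:
        by_site[site].append((hour, value))
    summary = {}
    for site, rows in by_site.items():
        rows.sort()  # order by hour
        hours_above = longest = run = 0
        prev_hour = None
        for hour, value in rows:
            if value > threshold:
                hours_above += 1
                # Extend the streak only if the previous hour was also above.
                consecutive = prev_hour is not None and hour == prev_hour + 1
                run = run + 1 if consecutive and run > 0 else 1
                longest = max(longest, run)
            else:
                run = 0
            prev_hour = hour
        summary[site] = {"hours_above": hours_above, "longest_streak": longest}
    return summary


readings = [("school", h, v) for h, v in enumerate([40, 50, 10, 60, 70])]
readings += [("clinic", 0, 20), ("clinic", 1, 30)]
print(exposure_summary(readings, threshold=35))
# {'school': {'hours_above': 4, 'longest_streak': 2}, 'clinic': {'hours_above': 0, 'longest_streak': 0}}
```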

10. Consumer Behavior and Recommendation Systems

Recommendation systems influence daily choices, such as what we watch before bed or what we buy when we’re tired, bored, or in a rush. Although they are meant to help, they are often optimized for engagement rather than genuinely useful suggestions, which can lead to repetitive recommendations, impulsive buying, and overconsumption fatigue. Studying consumer behavior through data allows students to examine where recommendations genuinely help users and where they nudge behavior in subtle, sometimes unintended ways.

  1. Detecting “Recommendation Fatigue” in Streaming Platforms: Analyze viewing and listening history from platforms such as Netflix, Prime Video, Spotify, and Apple Music to detect patterns where recommendations keep returning to the same genres or titles. Study how excessive personalization might limit user exploration and affect long-term satisfaction.

  2. Impulse Buying Triggers in E-Commerce Browsing Sessions: Examine how time of day, discounts, or “limited stock” labels correlate with unplanned purchases, focusing on moments when users are most likely to buy things they didn’t initially search for.

  3. Bias Toward Popular Products in Recommendation Feeds: Study whether recommendation systems disproportionately push already popular items, making it harder for new or niche products to surface, mirroring how small brands struggle to compete with big names online.
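One way to quantify popularity bias is to count how often each catalog item is recommended and measure how unequal those counts are with the Gini coefficient: 0 means every item gets equal exposure, and values closer to 1 mean a few items get almost all of it. This sketch assumes you have a log of recommended item IDs and the full catalog, so never-recommended items count as zeros; the item IDs are hypothetical.

```python
from collections import Counter


def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even)."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Formula for ascending data: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, i from 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def recommendation_concentration(recommended_ids, catalog):
    """Gini of how often each catalog item was recommended; unrecommended items count as 0."""
    counts = Counter(recommended_ids)
    return gini([counts.get(item, 0) for item in catalog])


print(recommendation_concentration(["a", "a", "a", "b"], ["a", "b", "c", "d"]))  # 0.625
```

Including the whole catalog is the key design choice: measuring only items that were recommended at least once would hide exactly the niche products the research question is about.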

11. Cybersecurity & Fraud Detection

As more of everyday life moves online (payments, messaging, logins, and identity), cyber fraud increasingly shows up as small, repeated patterns that often go unnoticed. Data science plays a critical role in detecting these irregularities at a scale where manual monitoring cannot keep up. For high school researchers, this area is especially compelling because it combines real-world relevance with practical techniques like anomaly detection, pattern analysis, and behavioral modeling.

  1. Detecting Unusual Spending Sequences in Credit Card Transactions: Analyze transaction data to identify patterns such as rapid micro-transactions, sudden category shifts, or geographically inconsistent purchases that often precede confirmed fraud cases.

  2. Identifying Phishing Emails Through Language and Timing Patterns: Examine email metadata and text features (subject lines, urgency cues, sender domains, send times) to distinguish phishing attempts from legitimate communication using classification models.

  3. Spotting Fake or Automated Accounts on Social Platforms: Study account behavior such as posting frequency, content repetition, follower-following ratios, or login timing to identify bot-like or coordinated inauthentic activity.
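The “rapid micro-transactions” pattern from the first project can be checked with a sliding time window over one card’s transactions. This is a minimal rule-based sketch, not a fraud model: `max_amount`, `window_seconds`, and `min_count` are hypothetical thresholds, and in a real study such rule hits would be compared against confirmed fraud labels to see how often they are right.

```python
from collections import deque


def micro_transaction_bursts(transactions, max_amount=5.0, window_seconds=600, min_count=4):
    """transactions: (timestamp_seconds, amount) for one card, in time order.
    Returns the timestamps at which at least `min_count` small transactions
    (amount <= max_amount) fall inside the trailing time window."""
    window = deque()  # timestamps of recent small transactions
    flagged = []
    for ts, amount in transactions:
        if amount > max_amount:
            continue
        window.append(ts)
        # Drop small transactions that are now older than the window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= min_count:
            flagged.append(ts)
    return flagged


card = [(0, 1.00), (60, 2.00), (120, 50.00), (180, 1.50), (240, 0.99), (2000, 1.00)]
print(micro_transaction_bursts(card))  # [240]
```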

12. Transportation and Mobility Analytics

How people move through cities reveals far more than just traffic patterns. It reflects safety, access, inequality, and daily decision-making. With ride-sharing apps, navigation tools, and public transit systems generating massive datasets, transportation has become one of the most tangible ways data science shapes everyday life. For a student researcher, this space is powerful because it connects numbers directly to lived experiences: missed buses, unsafe roads, long commutes, and rising fuel costs.

  1. Mapping “Invisible” Accident Risk Zones Beyond Official Hotspots: Go beyond recorded crash locations by analyzing near-miss indicators such as sudden braking, sharp decelerations, or repeated congestion spikes from traffic sensor or GPS data to identify roads that are risky but underreported.

  2. Studying Ride-Sharing Price Surges as a Proxy for Urban Stress: Analyze surge pricing patterns during rain, festivals, exams, or late-night hours to understand how demand, weather, and social events strain a city’s transport infrastructure.

  3. Evaluating Public Transit Reliability from a Student’s Daily Commute: Use bus or train arrival data to measure variability, delays, and missed connections during peak school or office hours, and assess how unpredictability affects punctuality and route choices.
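Transit reliability is about both average delay and how unpredictable it is. The sketch below assumes scheduled and actual arrival times in minutes after midnight for the same trips, and treats an arrival as on time if it is no more than `on_time_minutes` late (early arrivals count as on time). That 5-minute default is an illustrative assumption, since transit agencies define on-time performance differently.

```python
from statistics import mean, pstdev


def reliability_summary(scheduled, actual, on_time_minutes=5):
    """Compare scheduled and actual arrival times (minutes after midnight) trip by trip."""
    delays = [a - s for s, a in zip(scheduled, actual)]
    return {
        "mean_delay": mean(delays),
        "delay_spread": pstdev(delays),  # higher = less predictable
        "on_time_share": sum(d <= on_time_minutes for d in delays) / len(delays),
        "worst_delay": max(delays),
    }


# Hypothetical morning trips.
print(reliability_summary([480, 490, 500, 510], [482, 497, 500, 520]))
```

Two routes with the same mean delay can feel very different to a student if one has a much larger spread, which is why the summary reports both.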

13. Bias, Fairness, and Ethics in Data Models

As algorithms increasingly influence decisions around college admissions, hiring, loans, content visibility, and even policing, questions of fairness and bias are no longer abstract. Data models often inherit hidden biases from historical data, design choices, or proxy variables, unintentionally disadvantaging certain groups. This makes bias and ethics a particularly meaningful research area for high school students, because it combines technical analysis with social awareness and critical thinking. Research here isn’t about building “better” models alone, but about questioning what those models assume and who they serve.

  1. Detecting Socioeconomic Bias in College Admissions Prediction Tools: Analyze publicly available admissions datasets or simulated applicant profiles to see whether variables like zip code, school type, or extracurricular access disproportionately affect acceptance predictions, even when academic performance is similar.

  2. Examining Gender Bias in Resume-Screening or Skill-Matching Algorithms: Use synthetic resumes with identical qualifications but different names or pronouns to test whether automated screening models rank candidates differently based on perceived gender.

  3. Auditing Recommendation Algorithms for Content Polarization: Study how recommendation systems on platforms like video or news apps amplify extreme or repetitive viewpoints over time by tracking how suggested content shifts after a user interacts with neutral versus opinionated material.
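A standard first measurement in these audits is the ratio of selection rates between groups. The sketch below assumes you have recorded a yes/no decision from the model for each synthetic profile and the group label you varied (the names `name_A` and `name_B` are hypothetical). A common rule of thumb, the “four-fifths rule” from US employment-selection guidelines, treats a ratio below 0.8 as a warning sign worth investigating, not as proof of discrimination.

```python
def selection_rates(decisions, groups):
    """decisions: 1 if the model advances/accepts a person, 0 otherwise.
    groups: the group label for each person, in the same order."""
    totals, selected = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + decision
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions, groups, focus_group, reference_group):
    """Selection rate of focus_group divided by that of reference_group."""
    rates = selection_rates(decisions, groups)
    return rates[focus_group] / rates[reference_group]


# Hypothetical audit: identical synthetic resumes, only the name changed.
decisions = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["name_A"] * 4 + ["name_B"] * 4
ratio = disparate_impact_ratio(decisions, groups, "name_B", "name_A")
print(f"ratio = {ratio:.2f}", "(below 0.8: investigate)" if ratio < 0.8 else "")
```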

14. Human Behavior & Social Patterns

Every day, digital traces such as sleep logs, screen time, location data, and app activity record how people behave, cope, and adapt. Analyzing these patterns allows students to move beyond abstract theories and examine how habits shift under stress, in routines, or during shared excitement. This field of data science is especially valuable because it connects quantitative data to lived experiences like burnout, focus, anxiety, motivation, and social influence. Research can uncover subtle behavioral rhythms that are often unnoticed on an individual level but become clear when viewed at a larger scale.

  1. Sleep Disruption Patterns Around High-Stress Academic Periods: Analyze wearable or self-reported sleep data from students across regular weeks versus exam weeks to identify recurring changes in sleep duration, consistency, and recovery, and examine whether certain patterns predict burnout or declining performance.

  2. Screen Time Fragmentation and Attention Decay: Study not just total screen time, but how frequently users switch between apps within short intervals, to explore whether fragmented usage correlates more strongly with perceived mental fatigue than overall hours spent online.

  3. Mobility and Routine Shifts During Major Collective Events: Use anonymized mobility or location data to examine how daily movement patterns change during events like elections, lockdown announcements, or major sports finals, focusing on whether routines return to baseline or permanently shift afterward.
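For the screen-time project, fragmentation can be measured as app switches per hour and then correlated with fatigue ratings alongside total hours. The sketch below assumes a hypothetical foreground-app log of (timestamp, app) pairs for each user; the Pearson correlation is written out by hand so the example needs only the standard library, and a correlation on its own will not show that fragmentation causes fatigue.

```python
from math import sqrt


def switches_per_hour(events):
    """events: (timestamp_seconds, app_name) foreground log for one user, sorted by time.
    Counts how often the foreground app changes, per hour of logged time."""
    if len(events) < 2:
        return 0.0
    switches = sum(1 for (_, a), (_, b) in zip(events, events[1:]) if a != b)
    hours = (events[-1][0] - events[0][0]) / 3600
    return switches / hours if hours > 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


log = [(0, "chat"), (600, "video"), (900, "chat"), (1200, "chat"), (3600, "notes")]
print(switches_per_hour(log))  # 3.0
```

With one switches-per-hour value, one total-hours value, and one fatigue rating per participant, you could compare `pearson(switch_rates, fatigue)` against `pearson(total_hours, fatigue)`.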

15. Decision-Making Under Uncertainty

Daily life involves dealing with imperfect information, such as deciding when to buy, wait, quit, or act. Data science helps students analyze how people actually make decisions under uncertainty, rather than how they believe they do. This subject is particularly engaging because it combines behavioral science with real-world data, uncovering patterns of risk, delay, regret, and overconfidence that appear across various ages and situations.

  1. Waiting vs. Acting Behavior in Price Drops: Analyze historical price data for products like smartphones, flight tickets, or event passes alongside purchase timing to study when people choose to buy, and whether waiting longer actually leads to better outcomes or missed opportunities.

  2. Overconfidence in Prediction-Based Choices: Examine datasets where users make predictions (sports brackets, stock simulators, fantasy leagues) to identify how confidence levels shift after wins or losses, and whether people systematically overestimate future performance after short-term success.

  3. Abandonment Points in Long-Term Goals: Use progress-tracking data from fitness apps, language-learning platforms, or savings challenges to identify common drop-off points, and analyze whether early setbacks or plateaus are stronger predictors of quitting than overall difficulty.
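Finding common drop-off points usually starts with a retention curve: the share of users still active on each day. This sketch assumes you know each user’s last active day (the numbers below are hypothetical). One caveat: users who are still active when the data ends look as if they quit on the final day, a problem known as right-censoring; survival-analysis methods handle it properly.

```python
def retention_curve(last_active_days, horizon):
    """last_active_days: for each user, the last day (0 = first day) they logged progress.
    Returns the share of users still active on each day from 0 to horizon."""
    n = len(last_active_days)
    return [sum(1 for d in last_active_days if d >= day) / n for day in range(horizon + 1)]


def biggest_drop_day(curve):
    """Day on which the share of still-active users falls the most
    (ties go to the later day)."""
    drops = [(curve[i - 1] - curve[i], i) for i in range(1, len(curve))]
    return max(drops)[1]


users = [0, 1, 1, 1, 5, 7]
curve = retention_curve(users, horizon=7)
print(biggest_drop_day(curve))  # 2
```

Here day 2 is the first day half of the original users failed to return, which is where you would look for an early setback or plateau in the app’s design.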

If you’re looking to build a project/research paper in the field of AI & ML, consider applying to Veritas AI! 

With Veritas AI, which was founded by Harvard graduate students, you can work 1-on-1 with mentors from universities like Harvard, Stanford, MIT, and more to create unique, personalized projects. In the past year, we had over 1000 students learn AI & ML with us. Check out a past student’s experience in the program here. You can apply here!

Tyler Moulton

Tyler Moulton is Head of Academics and Veritas AI Partnerships with 6 years of experience in education consulting, teaching, and astronomy research at Harvard and the University of Cambridge, where they developed a passion for machine learning and artificial intelligence. Tyler is passionate about connecting high-achieving students to advanced AI techniques and helping them build independent, real-world projects in the field of AI!