Trust in AI Math Help with Confidence Scores | Project

ISEF Category: Behavioral and Social Sciences

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Other  ·  Difficulty: Intermediate  ·  Setup: Home Setup  ·  Time: 1 to 2 Months

The Hook

A small number can change what people believe. If an AI says it is 92% confident, you may trust it more, even when the number has no real meaning. That makes this topic a strong science fair project because you can test how a simple label changes judgment. You will study trust, bias, and how people react to AI math help.

What Is It?

A large language model, or LLM, is an AI system, often used in chatbots, that writes a response by predicting the next word again and again. When it explains a math problem, it can sound sure even when it is wrong. A confidence score is a number or label that claims how sure the system is. In your project, you assign that score at random, so it has no link to whether the explanation is right. That lets you test whether people trust the badge more than the explanation itself.

Think of it like a movie trailer with a star rating slapped on the front. The trailer does not change, but the rating can change how you feel before you watch it. Your study asks whether students give the same math explanation more credit when the score looks high, low, or neutral. That connects to calibration, which means matching confidence with real accuracy: a well-calibrated source that says it is 90% confident should be right about 90% of the time.
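To put a number on calibration, one common approach compares average stated confidence with the share of answers that were actually right; a related measure, the Brier score, averages the squared gap item by item. The minimal Python sketch below assumes confidences are recorded as probabilities between 0 and 1 and each answer is scored 1 (right) or 0 (wrong). The function names are hypothetical, chosen for illustration.

```python
from statistics import mean


def _check(confidences, correct):
    # Both lists must line up one-to-one and contain at least one item.
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty lists of equal length")


def calibration_gap(confidences, correct):
    """Mean stated confidence minus the share of answers that were right.

    confidences: probabilities in [0, 1], e.g. 0.92 for "92% confident"
    correct: 1 if the matching answer was right, 0 if it was wrong
    A positive gap means overconfidence; a negative gap means underconfidence.
    """
    _check(confidences, correct)
    return mean(confidences) - mean(correct)


def brier_score(confidences, correct):
    """Mean squared gap between confidence and outcome (0 is perfect, 1 is worst)."""
    _check(confidences, correct)
    return mean((p - y) ** 2 for p, y in zip(confidences, correct))
```

For example, a source that reports 0.9 on four answers but gets only two right has a calibration gap of 0.4 (overconfident) and a Brier score of 0.41.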

Why This Is a Good Topic

This is a good science fair topic because you can change one clear thing, the confidence score, and measure one clear outcome, trust. You can run it with surveys and common devices, so you do not need a wet lab. The topic connects to AI in school, online tutoring, and everyday decision-making. You can learn how to design controls, handle human-subject data, and test whether a cue changes judgment even when it should not.

Research Questions

  • How does a random high confidence score change trust ratings for the same AI math explanation?
  • What is the effect of a random low confidence score on willingness to follow the same AI math explanation?
  • Does a confidence score change how accurately students judge whether the explanation is correct?
  • To what extent do prior AI experience and math confidence change the effect of random scores on trust?
  • Which confidence display (percent, badge, or verbal label) leads to the highest trust in the same explanation?
  • How does explanation length interact with confidence scores when students rate the same answer?
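The questions above depend on scores that are random with respect to correctness. One way to produce them is to draw each displayed number from a fixed range for its condition. In this sketch the ranges (85 to 99 for high, 10 to 30 for low) are placeholder assumptions, not recommendations, and the function name is hypothetical; showing one fixed number per condition is an equally reasonable design.

```python
import random

# Hypothetical score ranges in percent; choose your own before collecting data.
SCORE_RANGES = {"high": (85, 99), "low": (10, 30)}


def draw_scores(condition, n, seed=None):
    """Return n random confidence percentages for one condition.

    The numbers are drawn without looking at whether the explanation is
    correct, so any change in trust comes from the cue, not from accuracy.
    A neutral condition could simply show no score at all.
    """
    low, high = SCORE_RANGES[condition]
    rng = random.Random(seed)  # seeded so the same scores can be regenerated
    return [rng.randint(low, high) for _ in range(n)]
```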

Basic Materials

  • Laptop or Chromebook with internet access.
  • Google Forms or Microsoft Forms for the survey.
  • Spreadsheet software such as Google Sheets or Excel.
  • A set of math explanation prompts you create or adapt.
  • Random assignment tool, such as a random number generator.
  • Consent and assent form templates, if your school requires them.
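If you use a random number generator for assignment, shuffling a list of condition labels keeps group sizes close to equal. This sketch assumes you already have a list of anonymous participant codes; the codes, condition names, and function name are hypothetical. Each student would then receive the survey version for their assigned condition.

```python
import random


def assign_conditions(participant_ids, conditions, seed=None):
    """Randomly assign each participant to one condition with near-equal group sizes.

    The condition labels are repeated in order until there is one label per
    participant, then shuffled, so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    labels = [conditions[i % len(conditions)] for i in range(len(participant_ids))]
    rng.shuffle(labels)
    return dict(zip(participant_ids, labels))


# Example with 30 hypothetical participant codes and three conditions.
ids = [f"S{n:02d}" for n in range(1, 31)]
groups = assign_conditions(ids, ["high", "low", "neutral"], seed=2024)
```

Saving the seed lets you (or a judge) regenerate exactly the same assignment later.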

Advanced Materials

  • Qualtrics or REDCap with embedded random assignment.
  • R or Python for mixed-effects models and visualization.
  • PsychoPy for controlled presentation of prompts and response timing.
  • Eye-tracking software or device, if you want attention data.
  • Secure file storage approved for human-subject data.

Software & Tools

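  • Google Forms: Collects trust ratings. It cannot randomly assign students to conditions by itself, so make one copy of the form per condition and give each student the link for their assigned condition.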
  • Google Sheets: Cleans responses and makes quick charts.
  • R: Runs t-tests, regression models, and effect-size estimates.
  • jamovi: Gives a point-and-click way to run basic statistics.
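R's t.test function and jamovi's independent-samples t-test will report these numbers directly. The standard-library Python sketch below shows roughly what sits behind them for two groups of trust ratings: Welch's t statistic (which does not assume equal variances), Cohen's d, and a p-value. It swaps in a permutation test for the usual t-distribution p-value so that no extra packages are needed. The ratings in it are made-up placeholders, not results.

```python
import random
from statistics import mean, stdev


def welch_t(a, b):
    """Welch's t statistic for two independent groups (unequal variances allowed)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference in group means.

    Shuffles the group labels many times and counts how often a difference
    at least as large as the observed one appears by chance.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        x, y = pooled[: len(a)], pooled[len(a):]
        # Small tolerance so float rounding does not hide exact ties.
        if abs(sum(x) / len(x) - sum(y) / len(y)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 keeps the estimate above zero


# Hypothetical 1-7 trust ratings for the same explanation under two cues.
high = [6, 7, 6, 6, 7, 6]
low = [3, 4, 4, 3, 5, 4]
print(welch_t(high, low), cohens_d(high, low), permutation_p(high, low))
```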

Experiment Steps

  1. Define the exact math explanation you will keep constant across all conditions.
  2. Choose the confidence cues you will compare, such as high, low, and neutral labels.
  3. Build a rating form that separates trust, perceived accuracy, and willingness to use the answer.
  4. Plan your random assignment and control variables, such as math confidence, AI experience, and question difficulty.
  5. Prewrite your analysis plan so you know which score comparisons and subgroup tests matter most.
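If each student rates several explanations instead of just one, step 4 also has to keep question difficulty from lining up with one confidence cue. A Latin square rotation is one standard way to do that: each group of students sees every problem, and across groups every problem is paired once with every cue. The sketch below assumes hypothetical problem IDs and three conditions; you would then randomly assign students to the groups.

```python
def latin_square_schedule(problems, conditions):
    """Build one problem-to-condition schedule per participant group.

    In group g, problem i is shown with conditions[(i + g) % k], where k is the
    number of conditions. Across k groups, every problem appears once with
    every condition, so question difficulty cannot line up with one cue.
    """
    k = len(conditions)
    return [
        [(problem, conditions[(i + g) % k]) for i, problem in enumerate(problems)]
        for g in range(k)
    ]


# Hypothetical problem IDs; use the explanations you wrote in step 1.
for group in latin_square_schedule(["P1", "P2", "P3"], ["high", "low", "neutral"]):
    print(group)
```

With the same number of problems as conditions (or a multiple of it), each group also sees every cue equally often.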

Common Pitfalls

  • Changing the explanation along with the confidence score, so you cannot tell which change caused the trust shift.
  • Matching high confidence with easier questions and low confidence with harder ones, which creates a hidden difficulty effect.
  • Asking one mixed-up question about trust and correctness, which blurs the difference between belief and judgment.
  • Showing the confidence label only after students have read the explanation and formed an opinion, which weakens the cue you wanted to test.
  • Skipping checks for AI familiarity or math confidence, which can hide the real pattern or make a fake one look real.
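Before collecting data, a quick tally can catch the difficulty pitfall above by checking that each confidence condition is paired equally often with each difficulty level. This sketch assumes each trial is stored as a row with condition and difficulty fields; the field and function names are hypothetical and should be matched to your spreadsheet columns.

```python
from collections import Counter


def cell_counts(trials):
    """Count trials in each (condition, difficulty) cell.

    trials: list of dicts with "condition" and "difficulty" keys.
    """
    return Counter((t["condition"], t["difficulty"]) for t in trials)


def is_balanced(trials):
    """True if every condition appears equally often at every difficulty level."""
    counts = cell_counts(trials)
    conditions = {c for c, _ in counts}
    levels = {d for _, d in counts}
    # Counter returns 0 for missing cells, so an empty cell makes this False.
    cells = [counts[(c, d)] for c in conditions for d in levels]
    return len(set(cells)) == 1
```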

What Makes This Competitive

A stronger version of this project does more than compare average trust scores. It checks whether random confidence cues still matter after you account for math ability, AI familiarity, and explanation quality. A factorial design (for example, crossing confidence level with display format), careful randomization, and an analysis plan written before data collection make the study much stronger. If you also test whether the scores help or hurt students' calibration, you show not only that the cue changes trust but whether it makes students' judgments better or worse.
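To check whether a cue's effect depends on something like AI familiarity, a 2 × 2 layout lets you read the interaction from cell means as a difference of differences. This is a descriptive sketch with hypothetical column names (cue, ai_exp, trust) and function names; accounting for several covariates at once would call for a regression or mixed-effects model in R or Python.

```python
from statistics import mean


def cell_means(rows, factor_a, factor_b, outcome):
    """Average the outcome within each (factor_a level, factor_b level) cell."""
    groups = {}
    for row in rows:
        groups.setdefault((row[factor_a], row[factor_b]), []).append(row[outcome])
    return {cell: mean(values) for cell, values in groups.items()}


def interaction_contrast(means, a_levels, b_levels):
    """Difference of differences for a 2 x 2 design.

    Effect of A (first level minus second) at the first B level, minus the
    same effect at the second B level. A value near 0 suggests no interaction.
    """
    (a1, a2), (b1, b2) = a_levels, b_levels
    effect_at_b1 = means[(a1, b1)] - means[(a2, b1)]
    effect_at_b2 = means[(a1, b2)] - means[(a2, b2)]
    return effect_at_b1 - effect_at_b2
```

A positive contrast would mean the high-versus-low gap in trust is larger at the first B level (for example, among students new to AI) than at the second.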

Project Variations

  • Compare trust in percent scores, star ratings, and verbal labels such as "High confidence" to see which cue students believe most.
  • Test the same idea with algebra, geometry, or word-problem explanations to see whether math topic changes the effect.
  • Measure whether a confidence score changes trust, answer copying, or willingness to ask for a second explanation.
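For the first variation, the important control is to change only the display format while the underlying value stays the same. This sketch assumes one star per 20 percentage points (rounded half up) and verbal cutoffs at 80 and 50; those mappings and the function name are illustrative choices, not a standard, so report whichever mapping you use.

```python
def render_cue(percent, style):
    """Show the same confidence value as a percent, star rating, or verbal label."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if style == "percent":
        return f"{percent}% confident"
    if style == "stars":
        filled = int(percent / 20 + 0.5)  # 0-5 stars, rounded half up
        return "★" * filled + "☆" * (5 - filled)
    if style == "verbal":
        if percent >= 80:
            return "High confidence"
        if percent >= 50:
            return "Medium confidence"
        return "Low confidence"
    raise ValueError(f"unknown style: {style}")
```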

Learn More

  • PubMed: Search for review articles on algorithm aversion, trust calibration, and advice taking.
  • PubMed Central: Read full-text open-access papers on human trust in automation and AI explanations.
  • NIH Human Subjects Research Protections: Find consent, privacy, and survey guidance on the NIH website.
  • OpenStax Psychology 2e: Review chapters on judgment, decision-making, and social influence on the OpenStax site.
  • Frontiers in Psychology: Search open-access articles on trust, explanations, and human-AI judgment.
