Flagging Polypharmacy Risks in Adverse Event Reports
ISEF Category: Computational Biology and Bioinformatics
This guide was put together with the help of AI research tools to give you a solid starting point.
Subcategory: Computational Pharmacology · Difficulty: Advanced · Setup: University Lab · Time: Full Year
The Hook
A single side effect report can hide a whole medication puzzle. An older adult might take a prescription drug, two over-the-counter medicines, and a supplement, then report only the final symptom. A model can help surface a pattern that a human reader would miss when reading reports one at a time. That matters because every added medicine raises the chance of an interaction.
What Is It?
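This project uses text data from FAERS, the FDA Adverse Event Reporting System. FAERS collects reports of suspected adverse drug reactions, and many of them are submitted with short narratives. Before you design the model, check which text fields your download actually includes: the public quarterly data files consist mainly of structured tables (demographics, drugs, reactions, outcomes), and full narratives may need to be requested separately. You can train a language model to scan the available report text and flag clusters of medicines that often appear together in reports from older adults.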
Think of it like sorting a messy drawer of receipts. A person can read one report and miss the repeated pattern, but a model can compare many reports at once. Your goal is not to diagnose patients. Your goal is to find combinations of prescription drugs, over-the-counter products, and supplements that show up with certain adverse events more often than expected.
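One common way to make "more often than expected" concrete is a proportional reporting ratio (PRR), which compares the event rate in reports that contain a drug pair with the event rate in all other reports. This guide does not prescribe a specific statistic, so treat the sketch below as one possible choice. The function name, the report format, and the toy reports are made up for illustration and are not FAERS data.

```python
def proportional_reporting_ratio(reports, drug_pair, event):
    """Compare how often `event` appears with `drug_pair` versus without it.

    reports: iterable of (drugs, events) tuples, each a set of strings.
    Returns None when one of the two rates cannot be computed.
    """
    pair = set(drug_pair)
    a = b = c = d = 0  # cells of the 2x2 table
    for drugs, events in reports:
        has_pair = pair <= set(drugs)
        has_event = event in events
        if has_pair and has_event:
            a += 1  # pair present, event present
        elif has_pair:
            b += 1  # pair present, event absent
        elif has_event:
            c += 1  # pair absent, event present
        else:
            d += 1  # pair absent, event absent
    if a + b == 0 or c == 0:
        return None
    return (a / (a + b)) / (c / (c + d))


# Hypothetical toy reports, invented for this example only.
toy_reports = [
    ({"warfarin", "aspirin"}, {"bleeding"}),
    ({"warfarin", "aspirin"}, {"bleeding"}),
    ({"warfarin", "aspirin"}, {"nausea"}),
    ({"warfarin"}, {"bleeding"}),
    ({"aspirin"}, {"nausea"}),
    ({"metformin"}, {"nausea"}),
    ({"metformin"}, {"dizziness"}),
    ({"lisinopril"}, {"cough"}),
]

prr = proportional_reporting_ratio(toy_reports, ("warfarin", "aspirin"), "bleeding")
print(round(prr, 2))  # 3.33
```

A ratio above 1 means the event shows up more often with the pair than without it. With real data, small counts make the ratio unstable, so report the underlying counts next to it, and remember that a high ratio in spontaneous reports is a signal to investigate, not proof of an interaction.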
Why This Is a Good Topic
This is a strong science fair topic because you can test a clear question with public data and measurable outputs. You can compare model performance, compare different text features, and check that your system recovers known risky combinations before you trust any new ones it flags. It also addresses a real problem: older adults often take several medicines at once, and the medication lists in their reports can be messy or incomplete. Along the way you learn data cleaning, NLP, validation, and statistics in one project.
Research Questions
- How does fine-tuning a language model on FAERS narratives change its ability to flag polypharmacy clusters compared with a baseline text model?
- What is the effect of including over-the-counter drugs and supplements on the number of flagged adverse-event clusters in older adults?
- Does separating medication names from symptom descriptions improve cluster detection accuracy in FAERS narratives?
- To what extent do model flags match known high-risk medication combinations listed in public drug interaction references?
- Which narrative features, such as medication count, supplement mentions, or symptom severity words, most strongly predict a flagged cluster?
- What is the effect of age filtering on the precision of cluster detection in FAERS reports?
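Several of these questions depend on turning report text into countable features. The sketch below shows one simple way to compute the three features named above (medication count, supplement mentions, and symptom severity words) with plain word matching. The word lists and the sample sentence are hypothetical placeholders; a real project would build its lists from a drug reference table and would likely use spaCy for better tokenization.

```python
import re

# Hypothetical word lists, for illustration only.
DRUG_LEXICON = {"warfarin", "ibuprofen", "metformin"}
SUPPLEMENT_LEXICON = {"ginkgo", "ginseng"}
SEVERITY_WORDS = {"severe", "hospitalized", "fatal"}


def narrative_features(text):
    """Return simple count features for one piece of report text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        # Distinct medicines, so a repeated name is counted once.
        "medication_count": len(set(tokens) & DRUG_LEXICON),
        # Every supplement mention, including repeats.
        "supplement_mentions": sum(t in SUPPLEMENT_LEXICON for t in tokens),
        "severity_words": sum(t in SEVERITY_WORDS for t in tokens),
    }


sample = "Patient took warfarin and ibuprofen with ginkgo; severe bleeding, hospitalized."
features = narrative_features(sample)
print(features)
```

Features like these also make a reasonable input for the simple baseline model, which gives the fine-tuned model something concrete to beat.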
Basic Materials
- Public FAERS data files from the FDA website.
- A laptop or desktop computer with enough storage for text data.
- Python installed with pandas, scikit-learn, NumPy, and spaCy.
- A spreadsheet program such as LibreOffice Calc or Google Sheets for quick review.
- A note-taking system for manual annotation of sample reports.
- A small set of public drug reference tables from NIH or FDA sources for validation.
Advanced Materials
- Access to cleaned FAERS extracts and code for large-scale text processing.
- A GPU-enabled workstation or cloud notebook for model fine-tuning.
- Python with Hugging Face Transformers, PyTorch, scikit-learn, pandas, and spaCy.
- Annotation software such as doccano for labeling narratives.
- A database or parquet workflow for joining reports, drug names, and demographics.
- A reference set from DrugBank, NIH DailyMed, or FDA labeling data for comparison.
Software & Tools
- Python: Processes FAERS narratives, builds features, and runs model training and evaluation.
- Jupyter Notebook: Lets you document cleaning steps, plots, and validation checks in one place.
- Hugging Face Transformers: Supports fine-tuning and testing language models on report text.
- spaCy: Helps extract medication names, symptom phrases, and simple text features.
Experiment Steps
- Define the exact prediction task, such as flagging reports that mention likely medication clusters in older adults.
- Build a clean text pipeline that separates drug names, supplement names, and symptom descriptions.
- Create a labeled test set with a clear rule for what counts as a known, suspected, or novel cluster.
- Compare a simple baseline model with a fine-tuned language model so you can measure the added value of fine-tuning.
- Plan validation against public drug interaction references and manual spot checks to catch false alarms.
- Design your analysis so you can report precision, recall, and error patterns for OTC and supplement combinations separately.
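The last step above, reporting precision and recall separately for OTC and supplement combinations, can be sketched in a few lines. The group names, the row format, and the labels below are invented for illustration; in practice the true labels come from your manually annotated test set and the predictions come from your model. scikit-learn offers equivalent metric functions if you prefer a library.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall); a value is None when it is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


def metrics_by_group(rows):
    """rows: iterable of (group, true_label, predicted_label) tuples."""
    grouped = {}
    for group, truth, pred in rows:
        truths, preds = grouped.setdefault(group, ([], []))
        truths.append(truth)
        preds.append(pred)
    return {g: precision_recall(ts, ps) for g, (ts, ps) in grouped.items()}


# Hypothetical evaluation rows: 1 = flagged cluster, 0 = no cluster.
rows = [
    ("otc", 1, 1), ("otc", 1, 0), ("otc", 0, 1), ("otc", 0, 0),
    ("supplement", 1, 1), ("supplement", 1, 1),
    ("supplement", 0, 0), ("supplement", 1, 0),
]

results = metrics_by_group(rows)
for group, (precision, recall) in results.items():
    print(group, precision, recall)
```

Keeping the groups separate matters because a single overall score can hide a model that does well on prescription drugs but misses most supplement combinations.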
Common Pitfalls
- Using raw FAERS text without cleaning drug names, which causes the same medicine to appear under several spellings.
- Treating every co-mentioned drug as a true interaction, which inflates false positives.
- Mixing symptom words with medication words, which makes the model learn the wrong signal.
- Ignoring age filters and then claiming the results apply to elderly patients.
- Skipping manual review of flagged clusters, which hides obvious labeling and extraction errors.
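Two of these pitfalls, inconsistent drug spellings and missing age filters, can be handled early in the pipeline. The sketch below is a minimal example under stated assumptions: the synonym table is a tiny hand-made placeholder, the report dictionaries use a made-up "age" key, and the 65-year cutoff is an assumption, since this guide does not define "older adult." A real project would map names with a maintained drug vocabulary and document its age cutoff.

```python
import re

# Tiny hand-made synonym table, for illustration only.
SYNONYMS = {
    "tylenol": "acetaminophen",
    "paracetamol": "acetaminophen",
    "apap": "acetaminophen",
    "coumadin": "warfarin",
}

DOSE_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml)\b")


def normalize_drug(raw, synonyms=SYNONYMS):
    """Map a raw drug string to one canonical lowercase name."""
    name = DOSE_PATTERN.sub("", raw.lower())   # drop dose strings like "500 mg"
    name = re.sub(r"[^a-z0-9 ]", "", name)     # drop punctuation
    name = re.sub(r"\s+", " ", name).strip()   # collapse whitespace
    return synonyms.get(name, name)


def older_adult_reports(reports, min_age=65):
    """Keep reports with a known age at or above min_age (assumed cutoff)."""
    return [r for r in reports if r.get("age") is not None and r["age"] >= min_age]


# Hypothetical reports, invented for this example.
reports = [
    {"age": 72, "drugs": ["TYLENOL 500 MG", "Coumadin."]},
    {"age": 40, "drugs": ["Paracetamol"]},
    {"age": None, "drugs": ["APAP"]},
]

kept = older_adult_reports(reports)
cleaned = [sorted({normalize_drug(d) for d in r["drugs"]}) for r in kept]
print(cleaned)  # [['acetaminophen', 'warfarin']]
```

Reports with a missing age are dropped here instead of guessed, which keeps the "older adult" claim defensible. Count how many reports each filter removes and include that number in your methods.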
What Makes This Competitive
A stronger project goes beyond simple pattern matching. You can compare multiple model types, test a real baseline, and show where the system fails on OTC and supplement combinations. You can also separate known interactions from new clusters and measure whether the model finds signals that a rules-only method misses. Careful validation and clean error analysis matter more than a flashy model name.
Project Variations
- Focus only on elderly reports that mention herbal supplements, then test whether supplement-heavy narratives are easier or harder to flag.
- Compare prescription-only clusters with prescription plus OTC clusters to see which group the model detects better.
- Swap the language model for a rules-based baseline and test whether symptom context improves false-positive control.
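A rules-based baseline like the one in the last variation can be very small: flag a report only when its drug list contains a pair from a reference table. The sketch below assumes drug names are already normalized. The two pairs in the table are illustrative placeholders, not a validated interaction list; build the real table from one of your reference sources.

```python
from itertools import combinations

# Illustrative placeholder pairs; replace with a table from your reference set.
KNOWN_PAIRS = {
    frozenset({"warfarin", "aspirin"}),
    frozenset({"warfarin", "ginkgo"}),
}


def flag_known_pairs(drugs, known_pairs=KNOWN_PAIRS):
    """Return the reference pairs present in one report's drug list."""
    present = sorted(set(drugs))  # sorted so the output order is stable
    return [
        pair for pair in combinations(present, 2)
        if frozenset(pair) in known_pairs
    ]


flags = flag_known_pairs(["warfarin", "metformin", "aspirin"])
print(flags)  # [('aspirin', 'warfarin')]
```

Because this baseline only flags pairs that are in the table, co-mentioned drugs are not treated as interactions by default. It also cannot find anything new, which makes it a useful point of comparison for the language model.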
Learn More
- FDA FAERS Public Dashboard: Search adverse-event reports and learn how reports are structured on the FDA site.
- NIH DailyMed: Look up official drug labels and their adverse reaction sections to validate flagged combinations.
- PubMed: Search for review articles on pharmacovigilance, polypharmacy, and adverse event text mining.
- Hugging Face Course: Read the free documentation on fine-tuning transformer models and text classification.
- MIT OpenCourseWare, Introduction to Machine Learning: Find free lecture materials on model evaluation, overfitting, and classification metrics.
- FDA MedWatch: Review reporting guidance and adverse event terminology on the FDA website.
