Network Pharmacology for Diabetes Herb Formulas

ISEF Category: Computational Biology and Bioinformatics

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Computational Pharmacology · Difficulty: Advanced · Setup: Home Setup · Time: 1 to 2 Months

The Hook

A single herbal formula can contain dozens of compounds. That sounds messy, but it can also be a map. With public databases, you can trace which molecules may hit which proteins, then ask whether the whole formula may act on type-2 diabetes through several pathways at once.

What Is It?

Network pharmacology is a way to study how many compounds may affect many targets at the same time. Instead of asking, “Does this herb work?” you ask, “Which molecules in this formula might bind which proteins, and which disease pathways do those proteins belong to?” Think of it like a subway map. Each compound is a rider, each protein is a stop, and the pathways are the transfer lines that connect them.

For type-2 diabetes, that matters because the disease does not run on one broken switch. Blood sugar control involves insulin signaling, inflammation, glucose transport, lipid metabolism, and more. A polyherbal formula may contain compounds that touch several of those systems. Your project tests that idea in silico, which means you use databases and analysis tools instead of a wet lab. You can map the formula's predicted targets onto KEGG pathways and STRING interaction networks, check them against known bioactivity records from ChEMBL, and then ask whether the pattern is stronger than you would expect from random compound sets.

Why This Is a Good Topic

This is a strong science fair topic because you can ask clear, testable questions with public data only. You can measure overlap, enrichment, network degree, and pathway coverage, so your results are not just opinions about traditional medicine. The topic connects to a real problem, type-2 diabetes, and it teaches skills that matter in modern bioinformatics, like target prioritization, pathway analysis, and database validation. You can also narrow the project to one formula, one disease, and one comparison group, which keeps the scope realistic.

Research Questions

How much do the predicted targets of one Ayurvedic formula and one TCM formula for type-2 diabetes overlap?
What is the effect of changing the compound-filtering rules on the number of high-confidence targets?
Does the formula’s target set show stronger KEGG pathway enrichment for insulin signaling than matched random compound sets?
To what extent do STRING protein-protein interaction clusters concentrate around diabetes-related proteins?
Which compounds contribute the most ChEMBL-supported bioactivity matches to the predicted target network?
How does the network centrality of shared targets compare with formula-specific targets?
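
One way to put a number on a target-overlap question is a set-overlap score. The sketch below is a minimal example, not a required method. The gene symbols are placeholder inputs chosen for illustration, not real predictions for any formula, and the function names are made up for this guide.

```python
def jaccard(a, b):
    """Shared items divided by all distinct items across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Shared items divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Placeholder target lists (illustrative only, not real results).
formula_a_targets = {"INSR", "AKT1", "PPARG", "TNF", "IL6"}
formula_b_targets = {"AKT1", "PPARG", "SLC2A4", "TNF"}

shared = formula_a_targets & formula_b_targets
print(sorted(shared))                                             # ['AKT1', 'PPARG', 'TNF']
print(jaccard(formula_a_targets, formula_b_targets))              # 0.5
print(overlap_coefficient(formula_a_targets, formula_b_targets))  # 0.75
```

The Jaccard score penalizes a size mismatch between the two lists, while the overlap coefficient does not, so reporting both can show whether a low score comes from real disagreement or only from one formula having a longer target list.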

Basic Materials

Laptop or desktop computer with internet access.
Spreadsheet software such as Google Sheets or Excel.
PubChem search access for compound IDs and structures.
KEGG pathway pages for target and pathway lookup.
STRING web tool for protein interaction networks.
ChEMBL database access for bioactivity records.
A note-taking app or document editor for tracking search terms and filters.
A reference manager such as Zotero for saving papers and database citations.

Advanced Materials

Laptop or desktop computer with internet access.
Python installed with pandas, networkx, scipy, matplotlib, and seaborn.
Jupyter Notebook for reproducible analysis.
Cytoscape for visualizing compound-target and protein networks.
R with clusterProfiler or similar enrichment packages.
Access to bulk downloads from ChEMBL, STRING, and KEGG pathway files where available.
PubChem or ChEMBL API access for automated compound and bioactivity retrieval.
Optional access to data from GEO or DisGeNET for disease comparison sets.

Software & Tools

Cytoscape: Visualizes compound-target and protein interaction networks so you can spot hubs and clusters.
Python: Organizes database hits, filters compound lists, and runs network statistics.
Jupyter Notebook: Keeps code, notes, and figures in one reproducible analysis file.
STRING: Finds known and predicted protein-protein interactions for your target set.
PubChem: Helps you standardize compound names, structures, and identifiers before analysis.
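
As a rough illustration of the network statistics mentioned above, this sketch counts how many distinct compounds point at each target, which is that target's degree in a compound-target network. The compound names and edges are hypothetical placeholders, and the code uses only the Python standard library. A graph library such as networkx or a tool like Cytoscape can report the same kind of count.

```python
from collections import defaultdict

# Hypothetical compound-target edges (placeholders, not real evidence).
# The first edge appears twice, as if two databases reported the same link.
edges = [
    ("compound_A", "AKT1"),
    ("compound_A", "AKT1"),
    ("compound_A", "TNF"),
    ("compound_B", "AKT1"),
    ("compound_B", "INSR"),
    ("compound_C", "AKT1"),
    ("compound_C", "TNF"),
]


def target_degrees(edge_list):
    """Count distinct compounds linked to each target."""
    compounds_per_target = defaultdict(set)
    for compound, target in edge_list:
        compounds_per_target[target].add(compound)  # sets ignore repeated edges
    return {target: len(compounds) for target, compounds in compounds_per_target.items()}


degrees = target_degrees(edges)
# Sort by degree (highest first), then alphabetically to break ties.
ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
print(ranked)  # [('AKT1', 3), ('TNF', 2), ('INSR', 1)]
```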

Experiment Steps

Define one formula, one diabetes outcome, and one comparison group so your project has a tight scope.
Build a clean compound list from public databases and decide which identity fields count as a match.
Map each compound to predicted or known protein targets, then set rules for keeping only high-confidence links.
Connect the target list to KEGG pathways and STRING interactions so you can test whether the network clusters around diabetes biology.
Plan a validation step using ChEMBL bioactivity records to see whether your predicted targets already have support in public assay data.
Choose summary metrics, such as pathway count, network degree, enrichment score, and overlap with known diabetes targets, before you run any analysis.
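
The enrichment score in the steps above is often computed with a hypergeometric test, which asks how surprising it is that so many of your targets fall inside one pathway. The sketch below is one possible implementation using only the standard library. The function name and the tiny numbers are assumptions for illustration, and the choice of background gene list is a decision you would need to justify in a real project.

```python
from math import comb


def enrichment_p_value(overlap, background, pathway_size, target_count):
    """Probability of seeing at least `overlap` pathway genes when
    `target_count` genes are drawn without replacement from `background`
    genes, of which `pathway_size` belong to the pathway."""
    max_overlap = min(pathway_size, target_count)
    total_draws = comb(background, target_count)
    tail = sum(
        comb(pathway_size, k) * comb(background - pathway_size, target_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total_draws


# Toy numbers: 10 background genes, 5 in the pathway,
# 4 predicted targets, and all 4 land in the pathway.
p = enrichment_p_value(overlap=4, background=10, pathway_size=5, target_count=4)
print(f"{p:.4f}")  # 0.0238
```

If you test many pathways at once, some will look enriched by chance, so a multiple-testing correction belongs in the plan as well.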

Common Pitfalls

Mixing compound names from different databases, which creates duplicate records and false target counts.
Treating every predicted target as equal evidence, which makes weak hits look as strong as validated ones.
Skipping identifier cleanup for genes and proteins, which breaks KEGG and STRING matching.
Comparing two formulas with different list sizes without normalization, which can make one network look better only because it is larger.
Reading network hubs as proof of real clinical synergy, which overstates what in silico results can support.
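
Two of the pitfalls above, duplicate compound records and unequal list sizes, can be handled with a few lines of code. This is a minimal sketch under stated assumptions: the record fields, the placeholder identifier strings, and the function names are hypothetical, and a real project would use a structure-based identifier taken from PubChem or ChEMBL as the matching key.

```python
# Hypothetical records pulled from two sources. The identifier strings are
# placeholders, not real chemical identifiers.
records = [
    {"name": "Compound X", "source": "db1", "structure_id": "PLACEHOLDER-KEY-1"},
    {"name": "compound x (synonym)", "source": "db2", "structure_id": "placeholder-key-1 "},
    {"name": "Compound Y", "source": "db1", "structure_id": "PLACEHOLDER-KEY-2"},
]


def deduplicate(rows, key="structure_id"):
    """Keep the first record seen for each cleaned identifier."""
    unique = {}
    for row in rows:
        cleaned = row[key].strip().upper()  # ignore case and stray spaces
        unique.setdefault(cleaned, row)
    return list(unique.values())


def coverage_fraction(targets, reference):
    """Share of a target list found in a reference set, so that
    lists of different sizes can be compared fairly."""
    return len(targets & reference) / len(targets) if targets else 0.0


clean = deduplicate(records)
print(len(clean))  # 2

# Placeholder sets: a raw count would favor the larger list.
reference = {"A", "B", "C"}
small_list = {"A", "B"}
large_list = {"A", "B", "C", "D", "E", "F"}
print(coverage_fraction(small_list, reference))  # 1.0
print(coverage_fraction(large_list, reference))  # 0.5
```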

What Makes This Competitive

A stronger project goes beyond making a pretty network map. You can compare several formulas, use a matched random-control set, and test whether the observed pathway overlap beats chance. You can also separate prediction from validation by checking ChEMBL support for the most central targets. Clear logic, careful normalization, and a sound statistical comparison make your conclusions much easier to defend in front of judges.
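
A random-control comparison can be sketched as a simple resampling test. The example below is a rough illustration, not a definitive method: it matches the random sets to the formula only by the number of compounds, all compound and target names are placeholders, and the function name is made up for this guide. It also assumes the compound pool is at least as large as the formula.

```python
import random


def random_control_p_value(formula_compounds, compound_targets, reference,
                           n_draws=1000, seed=0):
    """Empirical p-value: how often a random compound set of the same size
    hits at least as many reference targets as the formula does."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible

    def score(compounds):
        targets = set()
        for compound in compounds:
            targets |= compound_targets.get(compound, set())
        return len(targets & reference)

    observed = score(formula_compounds)
    pool = sorted(compound_targets)  # all compounds available for sampling
    size = len(formula_compounds)
    hits = sum(score(rng.sample(pool, size)) >= observed for _ in range(n_draws))
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


# Placeholder data: ten compounds, each linked to one unique target.
compound_targets = {f"compound_{i}": {f"T{i}"} for i in range(10)}
reference_targets = {"T0", "T1"}          # stand-in for known diabetes targets
formula = ["compound_0", "compound_1"]    # stand-in for the formula's compounds

p_value = random_control_p_value(formula, compound_targets, reference_targets)
print(p_value)
```

A stricter control would also match the random compounds on properties such as molecular weight or the number of known targets, since well-studied compounds tend to have longer target lists.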

Project Variations

Compare one Ayurvedic formula with one TCM formula and test which has broader diabetes pathway coverage.
Swap type-2 diabetes for prediabetes or obesity-related insulin resistance and see whether the target network changes.
Focus on one herb pair inside a formula and test whether the pair explains more of the network than either herb alone.

Learn More

NCBI PubMed: Search review articles on network pharmacology, polyherbal formulas, and type-2 diabetes to find background and methods.
ChEMBL: Use the database to look up bioactivity records for predicted targets and compounds.
STRING: Explore protein-protein interaction networks and read the help pages for scoring and confidence settings.
KEGG: Search pathway maps for insulin signaling, glucose metabolism, and related disease pathways.
PubChem: Find compound identifiers, structures, and standardized names for herbal ingredients.
MIT OpenCourseWare: Search biology, bioinformatics, or data analysis course materials for free background on network analysis and statistics.

Computational Biology and Bioinformatics Category Guide

How to Do Real Computational Biology Research at Home: A High School Student’s Guide to Free Tools, Affordable Kits, and Public Databases →
