Circular RNA Patterns in Colorectal Cancer Stages

ISEF Category: Cellular and Molecular Biology

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Molecular Biology · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

Some RNA molecules in cancer cells form tiny closed rings instead of straight strands. Those rings tend to be more stable than linear RNA, which makes them interesting candidate markers for disease stage. If you can find which ones change early, you may help spot cancer sooner. Public data lets you test that idea without a hospital lab.

What Is It?

Circular RNAs, or circRNAs, are RNA molecules that form a loop. Most RNA is a linear strand with two free ends, but circRNAs close back on themselves like a bracelet. With no free ends for end-attacking enzymes (exonucleases) to grab, they are harder to break down, so they may last longer in cells and body fluids.

Your project asks whether certain circRNAs appear differently in early-stage versus late-stage colorectal cancer. You use public cancer data from TCGA and known circRNA records from circBase, then compare patterns across stages. Think of it like sorting songs by tempo and mood, then asking which tracks show up more often in one playlist than another.
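Here is a minimal sketch of the stage-sorting step in Python. It assumes stage labels arrive as text such as "Stage IIA" and that you define early as stages I and II and late as stages III and IV. Both the label format and the cutoff are assumptions, so check them against your own download and justify the cutoff in your write-up. The sample IDs are made up.

```python
import re

# Assumed grouping for illustration: I and II = early, III and IV = late.
EARLY_NUMERALS = {"I", "II"}

# Accepts labels like "Stage I", "Stage IIA", "stage ivb"; rejects anything else.
STAGE_PATTERN = re.compile(r"\s*stage\s+(IV|III|II|I)[A-C]?\s*", re.IGNORECASE)


def stage_group(stage_text):
    """Return 'early', 'late', or None when the label cannot be interpreted."""
    match = STAGE_PATTERN.fullmatch(stage_text or "")
    if match is None:
        return None
    numeral = match.group(1).upper()
    return "early" if numeral in EARLY_NUMERALS else "late"


# Made-up sample IDs and labels, not real TCGA records.
raw_stages = {
    "sample_01": "Stage I",
    "sample_02": "Stage IIA",
    "sample_03": "Stage IIIB",
    "sample_04": "Stage IV",
    "sample_05": "Not Reported",
}

groups = {sample: stage_group(text) for sample, text in raw_stages.items()}

# Drop samples whose stage could not be read instead of guessing a group.
usable = {sample: group for sample, group in groups.items() if group is not None}
print(usable)
```

Returning None for unreadable labels keeps ambiguous samples out of both groups, which is safer than forcing them into one.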

A graph-attention classifier adds a machine-learning layer. It looks for relationships among genes or RNA features, then gives more weight to the most useful links. The structural-stability part asks which circRNAs may resist degradation better, which matters if you want a biomarker that could survive long enough to detect disease.
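To make "gives more weight to the most useful links" concrete, here is a small sketch of the attention step for a single node, written in plain Python. All numbers are made up. A real graph-attention layer (for example, one built with PyTorch Geometric) learns the link scores during training instead of taking them as input, so treat this as an illustration of the idea, not a working classifier.

```python
import math


def attention_weights(scores):
    """Turn raw link scores into weights that sum to 1 (a softmax)."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbor_values, scores):
    """Attention-weighted average of the neighbors' feature values."""
    weights = attention_weights(scores)
    return sum(w * v for w, v in zip(weights, neighbor_values))


# Hypothetical node with three neighbors.
neighbor_expression = [2.0, 0.5, 1.0]  # made-up expression values
raw_scores = [3.0, 0.0, 1.0]           # made-up link scores (learned in a real model)

weights = attention_weights(raw_scores)
updated_value = aggregate(neighbor_expression, raw_scores)
print([round(w, 3) for w in weights], round(updated_value, 3))
```

The neighbor with the highest score dominates the weighted average, which is how the model can focus on a few informative links.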

Why This Is a Good Topic

This is a strong science fair topic because you can ask a clear, testable question with public data. You do not need to grow cells or run expensive wet-lab assays. You can still do real research by cleaning data, comparing groups, and testing whether a machine-learning model can separate early-stage from late-stage cases. The topic also connects to cancer detection, biomarker discovery, and RNA biology, so your work has a real medical angle.

Research Questions

How does circRNA expression differ between early-stage and late-stage colorectal cancer in TCGA data?
What is the effect of adding circBase annotation on the accuracy of a stage-classification model?
Does a graph-attention classifier outperform a standard random forest or logistic regression model on the same circRNA features?
To what extent do structurally stable circRNAs show stronger stage-specific expression patterns than less stable circRNAs?
Which circRNAs remain most informative after controlling for batch effects and sample imbalance?
How does the choice of feature selection threshold change classifier performance and biomarker ranking?
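The first question above can be explored one circRNA at a time. The sketch below computes a log2 fold change and a permutation-test p-value for a single hypothetical circRNA using only the Python standard library. The expression values, the pseudocount, and the number of permutations are all illustrative assumptions, and the permutation test is just one reasonable choice of test.

```python
import math
import random
from statistics import mean


def log2_fold_change(early, late, pseudocount=1.0):
    """log2 of (late mean / early mean); the pseudocount avoids dividing by zero."""
    return math.log2((mean(late) + pseudocount) / (mean(early) + pseudocount))


def permutation_p_value(early, late, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(late) - mean(early))
    pooled = list(early) + list(late)
    n_early = len(early)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        diff = abs(mean(pooled[n_early:]) - mean(pooled[:n_early]))
        if diff >= observed:
            hits += 1
    # Adding 1 to both counts keeps the p-value from ever being exactly 0.
    return (hits + 1) / (n_permutations + 1)


# Made-up expression values for one hypothetical circRNA.
early_values = [4.1, 3.8, 5.0, 4.4, 3.9, 4.6]
late_values = [7.9, 8.4, 6.8, 9.1, 7.5, 8.0]

lfc = log2_fold_change(early_values, late_values)
p_value = permutation_p_value(early_values, late_values)
print(f"log2 fold change = {lfc:.2f}, permutation p-value = {p_value:.4f}")
```

When you repeat this for thousands of circRNAs, you also need a multiple-testing correction such as Benjamini-Hochberg before calling any single result significant.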

Basic Materials

Computer with at least 16 GB RAM.
Internet access for downloading public cancer datasets.
Spreadsheet software for organizing sample metadata.
Python installed with Jupyter Notebook.
Conda or another package manager for setting up analysis environments.
Access to TCGA data through the GDC Data Portal.
circBase database records for circRNA annotation.
Basic statistics reference guide or textbook.
External storage for large data files.

Advanced Materials

Workstation or cloud server with a GPU for model training.
Python environment with PyTorch or TensorFlow.
Network analysis library for graph-based feature modeling.
RNA structure prediction tool for stability comparison.
Data visualization package for stage and feature plots.
Version control system for tracking code changes.
Command-line tools for large-scale data cleaning.
Access to additional public cancer cohorts for validation.

Software & Tools

Python: Handles data cleaning, statistics, plotting, and model building for the full project.
Jupyter Notebook: Keeps code, notes, and figures in one place while you analyze the data.
Pandas: Organizes sample tables and expression matrices for filtering and merging.
Scikit-learn: Tests baseline classifiers and compares them with your graph model.
PyTorch Geometric: Builds graph-attention models if you want a more advanced classifier.
ImageJ: Measures features in exported plots or RNA structure images if you need image-based comparisons for your presentation.

Experiment Steps

Define the exact stage comparison you will test, then decide which samples count as early-stage and late-stage cases (for example, stages I and II versus stages III and IV).
Build a clean data table that links circRNA annotations, expression values, and sample metadata.
Choose a baseline model first, then decide what the graph-attention model adds beyond that baseline.
Plan your feature selection rule, so you can compare the same candidate circRNAs across models.
Set up a validation strategy that protects you from overfitting, especially if your sample count is uneven.
Design a ranking method for structural stability, then decide how you will compare that ranking with expression changes.
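The validation step above can be sketched as a stratified split, which keeps the early-to-late ratio roughly the same in every fold. This plain-Python version uses made-up labels and an assumed three folds. In a real analysis you would more likely use scikit-learn's StratifiedKFold, which applies the same idea.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=3, seed=0):
    """Assign each sample index to a fold, spreading each class evenly across folds."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    fold_of = {}
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled samples out to folds like cards: 0, 1, 2, 0, 1, ...
        for position, index in enumerate(indices):
            fold_of[index] = position % n_folds
    return fold_of


# Made-up, deliberately uneven labels: 4 early-stage and 8 late-stage samples.
labels = ["early"] * 4 + ["late"] * 8
fold_of = stratified_folds(labels)

for fold in range(3):
    test_indices = [i for i, f in fold_of.items() if f == fold]
    print(fold, Counter(labels[i] for i in test_indices))
```

Every fold ends up with both stage groups, so each test set can actually measure how well the model handles the smaller group.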

Common Pitfalls

Mixing stage labels from different sources, which can put the wrong samples into the early or late group.
Treating circBase entries as if they are all directly measured in TCGA, which can create false matches.
Building a classifier on too many features for too few samples, which makes the model memorize noise.
Ignoring batch effects or platform differences, which can make stage signals look stronger than they really are.
Ranking circRNAs by stability without explaining how the stability score was generated, which weakens the biology behind your final claim.
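One way to guard against the false-match pitfall above is to link annotation records only to circRNAs you actually have measurements for, and to list everything that did not match. The sketch below uses placeholder IDs and gene names, not real circBase or TCGA entries, and assumes both tables are keyed by the same ID format.

```python
def link_annotations(measured, annotations):
    """Keep circRNAs that are both measured and annotated; report the rest."""
    matched = {
        circ_id: {"expression": measured[circ_id], **annotations[circ_id]}
        for circ_id in measured
        if circ_id in annotations
    }
    annotated_not_measured = sorted(set(annotations) - set(measured))
    measured_not_annotated = sorted(set(measured) - set(annotations))
    return matched, annotated_not_measured, measured_not_annotated


# Placeholder expression values per sample (hypothetical IDs).
measured = {
    "circ_A": [5.1, 4.8, 7.2],
    "circ_B": [0.3, 0.0, 0.4],
    "circ_D": [2.2, 2.9, 3.1],
}

# Placeholder annotation records (hypothetical IDs and gene names).
annotations = {
    "circ_A": {"host_gene": "GENE_A"},
    "circ_C": {"host_gene": "GENE_C"},
}

matched, annotated_not_measured, measured_not_annotated = link_annotations(
    measured, annotations
)
print("matched:", sorted(matched))
print("annotated but not measured:", annotated_not_measured)
print("measured but not annotated:", measured_not_annotated)
```

Reporting the two unmatched lists in your notebook makes it clear how many database entries never entered the analysis.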

What Makes This Competitive

A stronger project goes beyond a simple differential-expression list. You would compare at least one baseline model with the graph-attention model on the same samples and features, then show whether the newer method helps and why. You would also test your top circRNA candidates on an independent public cohort or a strict cross-validation setup. Clear biological interpretation, careful validation, and a well-explained stability analysis can lift the project from data summary to real research.
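When you compare models on uneven stage groups, the metric you choose matters. The sketch below uses made-up predictions to show how two models can have the same plain accuracy while differing sharply in balanced accuracy, which averages the recall of each stage group. scikit-learn provides the same metric as balanced_accuracy_score; the plain-Python version here is only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, so each stage group counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        class_indices = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(y_pred[i] == cls for i in class_indices)
        recalls.append(correct / len(class_indices))
    return sum(recalls) / len(recalls)


# Made-up test set: 8 early-stage samples and 2 late-stage samples.
y_true = ["early"] * 8 + ["late"] * 2

# Hypothetical model 1 always guesses the larger group.
always_early = ["early"] * 10

# Hypothetical model 2 finds both late cases but mislabels two early cases.
finds_late = ["early"] * 6 + ["late"] * 2 + ["late"] * 2

for name, y_pred in [("always_early", always_early), ("finds_late", finds_late)]:
    print(name, accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

Reporting balanced accuracy alongside plain accuracy lets judges see that a model is doing more than guessing the majority stage.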

Project Variations

Compare circRNA patterns across colon cancer and rectal cancer instead of grouping all colorectal cases together.
Test whether circRNA signatures separate tumor tissue from normal tissue before you compare cancer stages.
Add an external public dataset and see whether the same circRNAs still rank high in a second cohort.

Learn More

PubMed: Search review articles on circular RNA biomarkers, colorectal cancer, and RNA stability.
NIH National Cancer Institute Genomic Data Commons: Find TCGA clinical and expression data through the GDC Data Portal.
circBase: Look up known circular RNA records and annotation details for candidate biomarkers.
NCBI Gene Expression Omnibus: Search for external colorectal cancer datasets that can support validation.
MIT OpenCourseWare: Use free lectures on machine learning, graph models, and biological data analysis.
Nature Reviews Cancer: Search review articles on RNA biomarkers and colorectal cancer biology through journal databases or your school library.


