RISC-V FPGA Keyword Spotting Speedup

Ready to Turn This Idea Into a Real Project?

This guide was put together with the help of AI research tools to give you a solid starting point. But a competitive science fair project lives in the details: refining your research question, fine-tuning your variables, analyzing your data, and presenting your findings like a seasoned scientist.

For next steps tailored to your interests, skill level, and timeline, work one-on-one with a MehtA+ mentor. Learn more about MehtA+ Science & Engineering Research Mentorship →

Subcategory: Other · Difficulty: Advanced · Setup: University Lab · Time: Full Year

The Hook

A tiny chip can listen for a wake word and make a decision before you finish saying the next syllable. That speed matters when power is low and every cycle counts. Your project asks a sharp question: can one custom instruction make a small processor much better at keyword spotting?

What Is It?

This project studies a simple idea with a big payoff. You start with a RISC-V soft-core, which is a processor design implemented in FPGA logic rather than fabricated as a fixed chip. Then you add one custom instruction for int4 matrix multiply, meaning matrix multiplication on 4-bit integers, which is the core arithmetic in many quantized machine learning models.

Think of it like adding a specialty tool to a toolbox. A normal processor can do the job with many small steps. A custom instruction bundles that work into one step, so the chip may finish faster, use less energy, or both. In this project, you check whether that extra hardware helps on a speech keyword spotting task, which asks a device to recognize short spoken words.
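
To make the "many small steps" concrete, here is a minimal software sketch of the work a packed int4 multiply-accumulate replaces. The packing layout (two values per byte, low nibble first) and the function names are assumptions chosen for illustration, not the layout of any particular core or toolchain.

```python
def pack_int4(values):
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first.

    The nibble order is an assumption for this sketch.
    """
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in int4")
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_nibble(nibble):
    """Sign-extend one 4-bit field to a Python int."""
    return nibble - 16 if nibble & 0x8 else nibble


def dot_int4(packed_a, packed_b):
    """Software reference: multiply-accumulate over packed int4 pairs."""
    acc = 0
    for byte_a, byte_b in zip(packed_a, packed_b):
        for shift in (0, 4):
            a = unpack_nibble((byte_a >> shift) & 0xF)
            b = unpack_nibble((byte_b >> shift) & 0xF)
            acc += a * b
    return acc


weights = [3, -2, 7, -8]
inputs = [1, 4, -1, 2]
print(dot_int4(pack_int4(weights), pack_int4(inputs)))  # -28
```

Each shift, mask, sign-extend, multiply, and add in the inner loop is a separate step for a baseline processor. A custom instruction would aim to do several of these in one operation, and a reference like this can double as a correctness check for the hardware result.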

Why This Is a Good Topic

This is a strong science fair topic because you can measure clear outputs, like speed, resource use, and accuracy. You also get a real-world link to edge AI, where tiny devices need to run models without draining a battery. A student can learn how hardware and software trade off, how benchmarks work, and how to compare one design against a baseline with real data.

Research Questions

How does adding a custom int4 matrix-multiply instruction change inference speed for keyword spotting?
What is the effect of the custom instruction on FPGA resource use compared with a pure software implementation?
Does the custom instruction change keyword spotting accuracy when the same model weights are used?
To what extent does int4 quantization affect speed and accuracy across different keyword sets?
Which benchmark input sizes show the largest speedup from the custom instruction?
How does the custom instruction affect energy use per inference compared with the baseline?
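
The quantization question above can be explored on a computer before touching hardware. The sketch below assumes simple symmetric per-tensor quantization, which is only one of several possible schemes, and the weight values are made-up placeholders rather than numbers from a real model.

```python
def quantize(values, bits):
    """Symmetric quantization to signed integers with the given bit width."""
    qmax = 2 ** (bits - 1) - 1          # 7 for int4, 127 for int8
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate real values."""
    return [x * scale for x in q]


def max_abs_error(values, bits):
    """Largest round-trip error introduced by quantizing at this width."""
    q, scale = quantize(values, bits)
    return max(abs(v - r) for v, r in zip(values, dequantize(q, scale)))


# Placeholder weights for illustration only.
weights = [0.9, -0.31, 0.05, -0.77, 0.42]
for bits in (4, 8):
    print(f"int{bits}: max round-trip error = {max_abs_error(weights, bits):.4f}")
```

A larger round-trip error at int4 does not by itself mean lower keyword spotting accuracy, so the accuracy question still has to be answered with the full model and test set.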

Basic Materials

Tang Nano FPGA board with USB cable.
Computer with Linux or Windows for HDL synthesis and flashing.
Open-source RISC-V soft-core source code.
FPGA design tools that support the Tang Nano board.
Keyword spotting dataset or recorded voice clips.
Python installed with data analysis libraries.
Logic analyzer or serial monitor for timing checks.
Spreadsheet software for recording benchmark results.

Advanced Materials

FPGA board with power measurement access or external power meter.
Oscilloscope or power analyzer for current profiling.
Open-source RISC-V core and custom instruction toolchain.
Verilator or another simulator for cycle-level testing.
Speech keyword spotting model converted to int4 weights.
Test harness for automated benchmark runs.
GTKWave or a similar waveform viewer for inspecting simulation traces if needed.
Git for version control and experiment tracking.

Software & Tools

Python: Runs benchmark scripts, organizes timing data, and plots speed and accuracy comparisons.
NumPy: Handles arrays for performance calculations and summary statistics.
Pandas: Stores results from many test runs in a clean table.
Matplotlib: Makes charts that compare the baseline and the custom instruction.
Verilator: Simulates the processor design before you flash hardware.

Experiment Steps

Define the exact performance claim you want to test, such as speed, energy, or resource use.
Choose one baseline processor build and one modified build so the comparison stays fair.
Plan a keyword spotting benchmark that uses the same model, the same inputs, and the same output metric for both builds.
Decide how you will measure runtime, accuracy, and hardware cost, then write down the measurement method before testing.
Build a control set that checks whether any gain comes from the new instruction, not from unrelated compiler or code changes.
Map out how you will repeat runs, summarize variation, and compare results with simple statistics.
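
The last step, summarizing repeated runs, can be sketched with Python's standard library. The cycle counts and the 27 MHz clock below are placeholder assumptions for illustration, not measurements, so substitute the values from your own builds.

```python
import statistics


def summarize(cycles):
    """Mean, sample standard deviation, and count for one build's runs.

    statistics.stdev needs at least two runs.
    """
    return {
        "mean": statistics.mean(cycles),
        "stdev": statistics.stdev(cycles),
        "n": len(cycles),
    }


def speedup(baseline_cycles, custom_cycles):
    """Ratio of mean cycle counts; values above 1 favor the custom build."""
    return statistics.mean(baseline_cycles) / statistics.mean(custom_cycles)


def cycles_to_ms(cycles, clock_hz):
    """Convert a cycle count to milliseconds at a given clock frequency."""
    return cycles / clock_hz * 1000.0


# Placeholder numbers, not real measurements.
baseline = [1000, 1010, 990, 1000]
custom = [250, 255, 245, 250]
CLOCK_HZ = 27_000_000  # assumed clock; use your design's actual frequency

print("baseline:", summarize(baseline))
print("custom:  ", summarize(custom))
print("speedup: ", speedup(baseline, custom))
print("baseline ms:", cycles_to_ms(statistics.mean(baseline), CLOCK_HZ))
```

Reporting the spread alongside the mean shows whether a measured speedup is larger than the run-to-run variation.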

Common Pitfalls

Changing the benchmark code between the baseline and the modified core, which makes the speed comparison unfair.
Using a model that is not truly int4 quantized, which hides whether the custom instruction helps the intended workload.
Measuring only wall-clock time from the host computer instead of cycle counts from the FPGA, which blurs the processor-level result.
Ignoring synthesis reports, which leaves you blind to whether the custom instruction costs too many logic resources.
Testing on one tiny sample set, which makes keyword spotting accuracy look better or worse than it really is.
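
The last pitfall can be made visible with a confidence interval on accuracy. The sketch below uses the Wilson score interval, which is one common choice and an assumption here rather than a required method. The counts are hypothetical.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical test sets that all score 90% accuracy.
for total in (20, 200, 2000):
    correct = round(0.9 * total)
    low, high = wilson_interval(correct, total)
    print(f"n={total:5d}  accuracy=0.90  interval=({low:.3f}, {high:.3f})")
```

The same measured accuracy carries a much wider interval on a small test set, which is why a tiny sample can make a model look better or worse than it is.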

What Makes This Competitive

A stronger project shows more than a speedup chart. You would compare several model sizes, report hardware cost, and separate compute gains from memory or bus effects. You would also use repeated trials and a fair baseline so your result stands up to scrutiny. A project gets even stronger if you explain why the custom instruction helps and where it stops helping.

Project Variations

Test the same custom instruction on a different edge AI task, such as gesture or sensor classification.
Compare int4, int8, and float implementations to see how quantization changes speed and accuracy.
Measure energy per inference on the FPGA instead of, or in addition to, raw runtime.
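
For the energy variation, one simple estimate multiplies average power by inference time. The sketch below assumes you have an average power reading from an external meter and a cycle count from the core. The function name and the example numbers are placeholders, not measurements.

```python
def energy_per_inference_uj(avg_power_mw, cycles, clock_hz):
    """Estimate energy per inference in microjoules.

    Assumes power is roughly constant over the inference, which is a
    simplification; a power analyzer trace would give a better estimate.
    """
    seconds = cycles / clock_hz
    joules = (avg_power_mw / 1000.0) * seconds
    return joules * 1_000_000


# Placeholder values for illustration only.
CLOCK_HZ = 27_000_000  # assumed clock frequency
baseline_uj = energy_per_inference_uj(avg_power_mw=100.0, cycles=2_700_000, clock_hz=CLOCK_HZ)
custom_uj = energy_per_inference_uj(avg_power_mw=110.0, cycles=675_000, clock_hz=CLOCK_HZ)
print(f"baseline: {baseline_uj:.1f} uJ, custom: {custom_uj:.1f} uJ")
```

A build that draws slightly more power can still use less energy per inference if it finishes in far fewer cycles, so runtime and power need to be reported together.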

Learn More

The RISC-V Instruction Set Manual: Find the official base and custom instruction rules on the RISC-V International site.
MIT OpenCourseWare Digital Systems Laboratory: Search MIT OpenCourseWare for FPGA and processor design labs that explain hardware testing.
IEEE Xplore: Search through your school library for review articles on quantized neural network hardware.
NIH PubMed: Search PubMed for review articles on keyword spotting, edge AI, and low-power inference.
NASA Technology Reports Server: Search NTRS for embedded machine learning hardware reports and benchmarking methods.
Open-source FPGA documentation for Tang Nano: Find board guides, pinouts, and tool setup notes on the vendor and community documentation pages.

