Speech Synthesis and Perception with Envelope Cue

Overview

This project was completed for the Signals and Systems lab course. It implements a tone vocoder: a system that decomposes speech into frequency sub-bands, extracts the amplitude envelope of each band, modulates sinusoidal carriers with those envelopes, and sums the results to resynthesize the signal. This mimics the processing strategy of cochlear implants, which must convey speech through a small number of independent channels.

[Figure: ToneVocoder console and spectrum output]

Results

  • Increasing the number of frequency bands N consistently improved perceptual quality of the resynthesized speech.
  • Raising the cutoff frequency Cf of the envelope low-pass filter improved envelope fidelity and naturalness.
  • Cochlea-inspired segmentation (approximately logarithmically spaced bands) outperformed equal-interval segmentation at low N (e.g., N=4), because the low-frequency range carries disproportionately more speech energy and cochlear spacing gives that range more of the available bands.
  • At large N, equal-interval segmentation reached a higher quality ceiling, while cochlear segmentation became unstable around N≈20, where its narrow passbands made the band-pass filters unstable.
  • With Speech-Shaped Noise (SSN) added at varying SNRs, envelope-based synthesis degraded gracefully but became unintelligible at low SNR.
  • Developed a full MATLAB App Designer GUI for real-time parameter exploration.

Technical Details

Tone Vocoder Pipeline:

  1. Band-pass filtering: Split the 200–7000 Hz speech spectrum into N sub-bands using Butterworth BPFs.
    • Mode 0: Equal-frequency spacing.
    • Mode 1: Cochlear-length mapping, f = 165.4 × (10^(0.06d) − 1), where d is the position along the basilar membrane; this produces approximately logarithmically spaced bands that match the basilar membrane's resonance distribution.
  2. Envelope extraction: Full-wave rectification (abs) followed by a low-pass Butterworth filter (cutoff Cf Hz) to extract the amplitude envelope of each sub-band.
  3. Carrier modulation: Each envelope is multiplied by a sinusoidal carrier at the midpoint frequency of its sub-band.
  4. Synthesis & normalization: Sum all modulated sub-bands; normalize energy to match the input signal level.
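
As a rough illustration of steps 1 and 2, the band-edge computation and the envelope extraction can be sketched in plain Python. This is not the project's MATLAB code. The sketch assumes that Mode 1 takes equal steps of d between the positions that map to 200 Hz and 7000 Hz, and it stands in a single second-order Butterworth section for the low-pass filter, whose order the write-up does not state. The names band_edges, place_to_freq, freq_to_place, lowpass_biquad, and envelope are hypothetical.

```python
import math

F_LO, F_HI = 200.0, 7000.0  # analysis range in Hz, taken from step 1


def place_to_freq(d):
    """Cochlear-length mapping from the text: f = 165.4 * (10^(0.06 d) - 1)."""
    return 165.4 * (10.0 ** (0.06 * d) - 1.0)


def freq_to_place(f_hz):
    """Inverse of place_to_freq: the position d that maps to f_hz."""
    return math.log10(f_hz / 165.4 + 1.0) / 0.06


def band_edges(n_bands, mode):
    """Return n_bands + 1 edge frequencies in Hz.

    mode 0: equal spacing in Hz.
    mode 1: equal spacing in d (assumed reading of the cochlear mode).
    """
    if mode == 0:
        step = (F_HI - F_LO) / n_bands
        return [F_LO + i * step for i in range(n_bands + 1)]
    d_lo, d_hi = freq_to_place(F_LO), freq_to_place(F_HI)
    step = (d_hi - d_lo) / n_bands
    return [place_to_freq(d_lo + i * step) for i in range(n_bands + 1)]


def lowpass_biquad(x, cutoff_hz, fs):
    """Second-order Butterworth low-pass (bilinear-transform biquad, Q = 1/sqrt(2))."""
    w0 = 2.0 * math.pi * cutoff_hz / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    a0 = 1.0 + alpha
    b0 = (1.0 - cos_w0) / 2.0 / a0
    b1 = (1.0 - cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    y = []
    x1 = x2 = y1 = y2 = 0.0  # filter state (previous inputs and outputs)
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(band, cutoff_hz, fs):
    """Full-wave rectify (abs), then low-pass at cutoff_hz to get the envelope."""
    return lowpass_biquad([abs(v) for v in band], cutoff_hz, fs)


# Compare the two segmentation modes for N = 4.
print([round(f, 1) for f in band_edges(4, 0)])
print([round(f, 1) for f in band_edges(4, 1)])
```

With N = 4, the first cochlear-mode band is far narrower than the first equal-interval band, which is the behavior the Results section attributes to the logarithmic spacing.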

Advanced Extensions:

  • Carrier frequency variants: Tested geometric mean, harmonic mean, arithmetic mean, and square mean as alternatives to the midpoint frequency, examining effects on reconstruction fidelity.
  • SSN generation: Synthesized speech-shaped noise matching the input’s power spectral density, using pwelch to estimate the spectrum and fir2 to design the shaping filter, then added it at a controlled SNR.
  • MATLAB App Designer console: Interactive GUI with sliders for band count (0–150) and LPF cutoff (0–200 Hz), BPF mode toggle, SSN on/off switch, and real-time waveform + spectrum display.
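
The carrier variants and the SNR-controlled mixing can be sketched as follows, again in plain Python rather than the project's MATLAB. Two assumptions are baked in: "square mean" is read as the quadratic (root-mean-square) mean of the band edges, and a white Gaussian sequence stands in for the speech-shaped noise, since the pwelch + fir2 shaping step is not reproduced here. The names carrier_candidates, mean_power, and mix_at_snr are hypothetical.

```python
import math
import random


def carrier_candidates(f_lo, f_hi):
    """Candidate carrier frequencies (Hz) for a band with edges f_lo < f_hi."""
    return {
        "harmonic": 2.0 * f_lo * f_hi / (f_lo + f_hi),
        "geometric": math.sqrt(f_lo * f_hi),
        "arithmetic": (f_lo + f_hi) / 2.0,
        # Assumption: "square mean" is the quadratic (RMS) mean of the edges.
        "square": math.sqrt((f_lo ** 2 + f_hi ** 2) / 2.0),
    }


def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so 10*log10(P_speech / P_noise) equals snr_db, then add it.

    Returns (noisy_signal, scaled_noise).
    """
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [gain * v for v in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


# Usage with stand-in signals (a pure tone and white noise, not real speech or SSN).
fs = 16000
speech = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)

for name, freq in carrier_candidates(200.0, 1900.0).items():
    print(name, round(freq, 1))
```

For any band with distinct positive edges the four means are ordered harmonic < geometric < arithmetic < quadratic, so the choice shifts the carrier toward the lower or upper edge of the band.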

Challenges

  • Filter instability at high N: Narrow passbands caused Butterworth BPF coefficients to become numerically unstable; identified N≈20 as the practical upper bound for cochlear-mode segmentation.
  • Energy normalization: Without explicit normalization, synthesized speech energy varied significantly with N and Cf, making perceptual comparisons across conditions unreliable.
  • Code modularity: Refactored the pipeline into reusable functions (Envelope, getSSN, alter) shared across standalone scripts and the App Designer class, which required careful handling of MATLAB’s function scoping rules.
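
The energy normalization mentioned above can be sketched as a single gain applied to the synthesized signal. This is a minimal Python illustration under the assumption that "energy" means the sum of squared samples; the project's MATLAB code may normalize differently (for example by RMS or peak level), and the names energy and match_energy are hypothetical.

```python
import math


def energy(x):
    """Total energy of a signal: the sum of squared samples."""
    return sum(v * v for v in x)


def match_energy(synth, reference):
    """Scale synth so its total energy equals that of reference."""
    e_synth = energy(synth)
    if e_synth == 0.0:
        return list(synth)  # silent output: there is nothing to scale
    gain = math.sqrt(energy(reference) / e_synth)
    return [gain * v for v in synth]
```

Because the gain is a single scalar, the waveform shape is unchanged; only the overall level is matched, which keeps comparisons across N and Cf on an equal footing.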

Reflection and Insights

This project made abstract signal-processing concepts tangible: the effect of filter bank design on speech quality can be heard directly, not just measured. The cochlea-inspired logarithmic spacing illustrates a broader principle: domain-specific knowledge (here, auditory neuroscience) often provides better engineering priors than uniform mathematical choices. The project also showed that an interactive parameter-exploration tool, even a simple slider-based GUI, shortens the path to insight considerably compared with rerunning scripts with hardcoded values.

Team and Role

  • Team: Two-person team.
  • My Role: Implemented the core Tone Vocoder pipeline; designed and built the MATLAB App Designer console; led the cochlear segmentation analysis and carrier frequency experiments.