Assistant Professor

Speech and Hearing Science
Electrical, Computer, and Energy Engineering
Arizona State University
COOR 3472
(480) 727-6455
visar ((at)) asu ((dot)) edu

Non-parametric estimates of fundamental information-theoretic quantities

Information theory (IT) is useful in a wide range of statistical signal processing applications; however, computing fundamental IT quantities such as entropy, divergence, or the Fisher information exactly requires complete knowledge of the underlying data distributions. We have been developing non-parametric, data-driven methods that estimate these quantities directly from samples, without first estimating the distributions or integrating over them.

Relevant Publications:
  • Wisler, A., Berisha, V., Spanias, A., & Hero, A. O. (2018). Direct estimation of density functionals using a polynomial basis. IEEE Transactions on Signal Processing, 66(3), 558-572.
  • Kadambi, P., Wisler, A., & Berisha, V. (2017). Improved finite-sample estimate of a nonparametric f-divergence. In Proceedings of the Asilomar Conference on Signals, Systems, and Computers. IEEE.
  • Berisha, V., Wisler, A., Hero, A. O., & Spanias, A. (2016). Empirically estimable classification bounds based on a nonparametric divergence measure. IEEE Transactions on Signal Processing, 64(3), 580-591.
  • Jiao, Y., Berisha, V., Liss, J., Hsu, S. C., Levy, E., & McAuliffe, M. (2016). Articulation entropy: An unsupervised measure of articulatory precision. IEEE Signal Processing Letters.
  • Berisha, V., & Hero, A. O. (2015). Empirical non-parametric estimation of the Fisher Information. IEEE Signal Processing Letters, 22(7), 988-992.
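As a hedged illustration of what "direct" estimation can look like, the sketch below estimates a divergence between two samples without estimating either density. It assumes one well-known estimator of this kind, the Friedman-Rafsky construction for the Henze-Penrose divergence: build a Euclidean minimum spanning tree (MST) over the pooled samples, count the edges C that join points from different samples, and take 1 - C(m + n)/(2mn), where m and n are the sample sizes. The function names are hypothetical, and the O(n^2) Prim's algorithm is chosen for self-containment rather than speed; this is not presented as the exact method of the publications above.

```python
import math
import random


def mst_edges(points):
    """Euclidean minimum spanning tree via Prim's algorithm, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax connections from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def hp_divergence(x, y):
    """Friedman-Rafsky (MST cross-edge) estimate of the Henze-Penrose divergence."""
    m, n = len(x), len(y)
    pooled = list(x) + list(y)
    labels = [0] * m + [1] * n
    # Count MST edges whose endpoints come from different samples.
    cross = sum(1 for u, v in mst_edges(pooled) if labels[u] != labels[v])
    return 1.0 - cross * (m + n) / (2.0 * m * n)


if __name__ == "__main__":
    random.seed(0)
    same_a = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
    same_b = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
    shifted = [(random.gauss(3, 1), random.gauss(0, 1)) for _ in range(200)]
    print(hp_divergence(same_a, same_b))   # near 0: same distribution
    print(hp_divergence(same_a, shifted))  # larger: distributions differ
```

The appeal of this construction is that the only data-dependent step is a graph over the samples: no bandwidth, histogram bins, or numerical integration is needed. When the two samples are drawn from the same distribution, roughly 2mn/(m + n) MST edges cross between samples, so the estimate hovers near 0; well-separated samples share very few edges, pushing it toward 1.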

Tracking neurological health through speech and language

Neurological disorders and traumatic brain injury may disturb an individual’s speech and language abilities well before the changes are perceptually detectable. For example, Parkinson’s disease can cause changes such as altered speaking rate, reduced intonation, and imprecise articulation, while Alzheimer’s disease can cause changes such as longer pauses during speech, reduced vocabulary, and reduced language complexity. The goal of this project is to develop signal processing and machine learning technology that detects subtle speech and language changes, and to integrate these algorithms into devices for early detection, real-time symptom tracking, and intervention monitoring.

Relevant Publications:
  • Schwedt, T., Peplinski, J., & Berisha, V. (in press). Altered speech during migraine attacks: A prospective, longitudinal study of episodic migraine without aura. Cephalalgia.
  • Rutkove, S., Qi, K., Shelton, K., Liss, J., Berisha, V., & Shefner, J. (in press). ALS longitudinal studies with frequent data collection at home: Study design and baseline data. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration.
  • Song, H., Willi, M., Thiagarajan, J., Berisha, V., & Spanias, A. (2018). Triplet network with attention for speaker diarization. In Proceedings of Interspeech 2018.
  • Jiao, Y., Berisha, V., Liss, J., Hsu, S. C., Levy, E., & McAuliffe, M. (2016). Articulation entropy: An unsupervised measure of articulatory precision. IEEE Signal Processing Letters.
  • Berisha, V., Wang, S., LaCross, A., Liss, J., & Garcia-Filion, P. (2017). Longitudinal changes in linguistic complexity among professional football players. Brain and Language, 169, 57-63.
  • Jiao, Y., Berisha, V., Tu, M., & Liss, J. (2015). Convex weighting criteria for speaking rate estimation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(9), 1421-1430.
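To make one of the measurable changes mentioned above concrete (longer pauses during speech), here is a minimal sketch of an energy-threshold pause detector. It is not the lab's method: the 20 ms frame length, the -30 dB threshold relative to the loudest frame, and the 250 ms minimum pause are hypothetical parameters chosen for illustration, and real recordings would need a noise-robust voice activity detector.

```python
import math


def frame_energies_db(signal, sr, frame_ms=20):
    """RMS energy (dB) of consecutive non-overlapping frames."""
    n = int(sr * frame_ms / 1000)
    energies = []
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n]
        rms = math.sqrt(sum(s * s for s in frame) / n)
        energies.append(20.0 * math.log10(rms + 1e-12))  # floor avoids log(0)
    return energies


def pause_durations(signal, sr, frame_ms=20, rel_threshold_db=-30.0,
                    min_pause_s=0.25):
    """Durations (s) of silent runs that occur between stretches of speech.

    Hypothetical parameters: a frame is 'silent' if its energy is more than
    |rel_threshold_db| below the loudest frame. Leading and trailing silence
    are ignored because they are not pauses within speech.
    """
    energies = frame_energies_db(signal, sr, frame_ms)
    if not energies:
        return []
    threshold = max(energies) + rel_threshold_db
    frame_s = frame_ms / 1000.0
    pauses, run, seen_speech = [], 0, False
    for e in energies:
        if e < threshold:
            run += 1
            continue
        # A speech frame closes the current silent run; keep it only if it
        # followed earlier speech and is long enough to count as a pause.
        if seen_speech and run * frame_s >= min_pause_s:
            pauses.append(run * frame_s)
        seen_speech, run = True, 0
    return pauses
```

Summary statistics of the returned list (count per minute, mean or 90th-percentile duration) are the kind of features one might track over time; the thresholds would need tuning per recording setup.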

Efficient Models for Computing Loudness

Reliable loudness estimation requires elaborate models with high computational complexity, which are often unsuitable for real-time applications. In this project, we developed efficient loudness-estimation algorithms and implemented them on mobile devices. In particular, we proposed several fast algorithms for estimating excitation patterns, specific loudness patterns, and total loudness. Compared with the existing standard for loudness estimation (ANSI S3.4-2005), these algorithms greatly improve computational efficiency while leaving the fidelity of the estimates largely unaffected.

Relevant Publication:
  • Krishnamoorthi, H., Berisha, V., & Spanias, A. (2009, June). A frequency/detector pruning approach for loudness. IEEE Signal Processing Letters.
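The pipeline above ends by turning specific loudness into total loudness. The sketch below illustrates that last stage under stated assumptions: specific loudness follows a compressive Moore-Glasberg-style form N' = C[(G·E + A)^alpha - A^alpha], and total loudness is the sum of N' over detectors spaced evenly on the ERB-number (Cam) scale. The constants are assumed illustrative values, not copied from ANSI S3.4-2005, and the "pruning" here (evaluating only every k-th detector) is a hypothetical stand-in to show the cost/accuracy trade-off, not the scheme of the letter cited above.

```python
import math

# Assumed, illustrative constants (not normative values from the standard).
C = 0.047    # overall scaling constant (assumed)
ALPHA = 0.2  # compressive exponent (assumed)
A = 4.62     # threshold-dependent offset (assumed)
G = 1.0      # low-level gain, assumed 1 for detectors above ~500 Hz


def erb_number(f_hz):
    """Glasberg & Moore (1990) ERB-number scale, in Cams."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def specific_loudness(excitation):
    """Excitation (linear power units) -> specific loudness (sone per Cam)."""
    return C * ((G * excitation + A) ** ALPHA - A ** ALPHA)


def total_loudness(excitations, spacing_cam=0.1, step=1):
    """Sum specific loudness over detectors spaced `spacing_cam` apart.

    step=1 evaluates every detector. step > 1 is a hypothetical, crude
    pruning: evaluate every `step`-th detector and widen the integration
    interval to compensate.
    """
    kept = excitations[::step]
    return sum(specific_loudness(e) for e in kept) * spacing_cam * step


if __name__ == "__main__":
    # Hypothetical excitation pattern: a smooth bump centred at 1 kHz on a
    # grid of detectors 0.1 Cam apart.
    centre = erb_number(1000.0)
    grid = [1.8 + 0.1 * i for i in range(372)]
    exc = [1e4 * math.exp(-0.5 * ((z - centre) / 2.0) ** 2) for z in grid]
    full = total_loudness(exc)
    pruned = total_loudness(exc, step=4)
    print(f"full: {full:.4f} sone, pruned (1/4 of detectors): {pruned:.4f} sone")
```

For a smooth excitation pattern like this one, skipping three of every four detectors changes the total very little while cutting the specific-loudness evaluations by 4x; sharply peaked excitation (e.g., pure tones) would be less forgiving, which is why any real pruning rule must be chosen with care.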

Speech/Audio Compression Based on Loudness Criteria

Based on several psychoacoustic principles, a number of computational auditory models have been developed over the years to mimic aspects of the human auditory system. Embedding these models within existing audio compression algorithms (e.g., MP3) has led to significant gains in coding efficiency. In this project, we consider an alternative psychoacoustic model, based on loudness, for inclusion in speech/audio codecs. Loudness is a subjective phenomenon that represents the magnitude of perceived intensity; that is, it measures the magnitude of the neural activity underlying the hearing sensation. When the model is embedded in an existing compression algorithm, results show that the proposed system improves the quality of narrowband speech while operating at a lower bit rate. Compared with other wideband speech coding schemes, the proposed algorithms provide comparable speech quality at a lower bit rate.

Relevant Publication:
  • Berisha, V., & Spanias, A. (2007). Wideband speech recovery using psychoacoustic criteria. EURASIP Journal on Audio, Speech, and Music Processing.
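To make "magnitude of perceived intensity" concrete, the sketch below uses a widely cited rule of thumb relating loudness in sones to loudness level in phons: above roughly 40 phon, loudness approximately doubles for every 10-phon increase. This rule is an approximation assumed for illustration only, not the loudness model used in the work above; it contrasts perceived loudness with the tenfold change in physical intensity that the same 10 dB step represents.

```python
def phon_to_sone(phon):
    """Approximate loudness (sones) from loudness level (phons), ~40 phon and up."""
    return 2.0 ** ((phon - 40.0) / 10.0)


def intensity_ratio(delta_db):
    """Physical intensity ratio corresponding to a level change of delta_db dB."""
    return 10.0 ** (delta_db / 10.0)


if __name__ == "__main__":
    # At 1 kHz, loudness level in phons equals sound pressure level in dB SPL,
    # so raising a 1-kHz tone from 60 to 70 dB SPL is a 60 -> 70 phon change.
    print(intensity_ratio(10.0))                    # 10x the physical intensity
    print(phon_to_sone(70.0) / phon_to_sone(60.0))  # only about 2x as loud
```

This compressive relationship is why a codec guided by loudness can tolerate larger errors in some spectral regions than a purely intensity-based criterion would suggest.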