Alexander Lerch at desk with computer monitors

Audio Content Analysis Teaches Computers to Understand Music

Wes McRae | Feb 28, 2023 — Atlanta, GA

School of Music Associate Professor Alexander Lerch has expanded his book, An Introduction to Audio Content Analysis: Music Information Retrieval Tasks and Applications, with code examples and visualizations of contemporary problems in the field.

Teaching computers to understand audio requires audio content analysis, Lerch said. His book focuses on music, rather than other types of audio.

“When we listen to music, we extract a lot of information. We can hear what instruments are playing, what the tempo is, and what the musical key is. We can focus on specific instruments: we can listen to the melody line, the vocals, the drums,” he said. “All these aspects are the content in an audio signal. And this book is about how we automatically extract this kind of content from the raw audio signal.”
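For readers who want a concrete picture, here is a minimal sketch of tempo and pitch-class extraction using the open-source librosa library (a stand-in for illustration, not the book's own code; the file name "song.wav" is a placeholder):

    # A minimal sketch, not the book's code: estimate tempo and the
    # dominant pitch class of a recording with librosa.
    import librosa
    import numpy as np

    y, sr = librosa.load("song.wav")  # placeholder file name

    # Tempo: beats per minute estimated from the onset strength envelope.
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    bpm = float(np.atleast_1d(tempo)[0])

    # A crude key-related feature: the strongest average pitch class.
    # Real key detection would correlate chroma against key profiles.
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    strongest = names[int(np.argmax(chroma.mean(axis=1)))]

    print(f"tempo ~ {bpm:.1f} BPM, strongest pitch class: {strongest}")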

For this second edition, Lerch created a learning experience beyond the textbook. “I added code examples to the book, and all the visualizations are online as well. The accompanying materials include slides, videos, and code repositories in MATLAB, Python, and C++.”

“For every illustration in the book, you can see the link to the code generating exactly that illustration,” he said. “You can see how they are generated, so you not only have the visualization itself and the description in the book text, but also the generating code, complemented by baseline implementations in three languages.”

"By offering and integrating these multiple modalities, I hope to support different learning strategies and facilitate a customizable learning environment.”

Audio Analysis Changes Everyone's Listening Experience

Audio content analysis affects people who listen to music every day, Lerch said.

“One of the most obvious use cases is music recommendation. A good content recommendation system needs to understand how preference maps to musical parameters. What is the tempo, what is the key, is it a mixture of styles or a straightforward rock style?”
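To make the content-based idea concrete, a toy sketch might describe each track by a handful of musical parameters and recommend by nearest neighbor. The catalog and features below are hypothetical placeholders, not a production recommender:

    # Toy content-based recommendation: tracks as small feature vectors
    # (tempo in BPM plus two made-up 0..1 descriptors), nearest-neighbor
    # lookup for "more like this". Purely illustrative.
    import numpy as np

    catalog = {
        "track_a": np.array([120.0, 0.8, 0.7]),
        "track_b": np.array([78.0, 0.3, 0.2]),
        "track_c": np.array([124.0, 0.7, 0.8]),
    }

    def recommend(query, k=2):
        """Return the k catalog tracks closest to the query vector."""
        scale = np.array([1 / 200.0, 1.0, 1.0])  # keep tempo from dominating
        dists = {name: np.linalg.norm((vec - query) * scale)
                 for name, vec in catalog.items()}
        return sorted(dists, key=dists.get)[:k]

    print(recommend(np.array([122.0, 0.75, 0.75])))  # -> tracks near 122 BPM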

More intelligent, interactive listening applications are also possible. “I’m not saying this will happen, but I can imagine an interactive listening environment in which you can just say, ‘Hey, make the vocals louder’ or maybe ‘I’d like to hear the same song, but replace the singer with Tom Waits.’”

All music production will be impacted by music analysis systems, Lerch said. “For example, instead of a sound engineer setting everything up from scratch, they could have an analysis system that produces a default mix based on what the audio tracks contain. The sound engineer could save a lot of time by only having to fine-tune this automatically suggested mix.”
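As a sketch of what such a default starting point could involve, assuming simple RMS level balancing stands in for a real content-aware system:

    # A minimal sketch of an automatically suggested starting mix: bring
    # every stem to a common RMS level and sum. A real system would use
    # perceptual loudness and content-aware rules; this is a crude baseline.
    import numpy as np

    def default_mix(stems, target_rms=0.1):
        """Gain-normalize each stem to a common RMS, then sum them."""
        mix = np.zeros(max(len(s) for s in stems))
        for stem in stems:
            rms = np.sqrt(np.mean(stem ** 2))
            gain = target_rms / rms if rms > 0 else 0.0
            mix[: len(stem)] += gain * stem
        peak = np.max(np.abs(mix))
        return mix / peak if peak > 1.0 else mix  # avoid clipping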

Machine Learning for Music Is a Tough Problem

Algorithmic analysis of musical audio signals uses methods driven by artificial intelligence (AI).

Definitions of AI change over time, Lerch said. “The approaches in this book can be referred to as AI because a system understanding content is perceived as being intelligent.”

Analysis systems classify and categorize the characteristics of music, which distinguishes them from generative systems like ChatGPT, Lerch said.

"ChatGPT is a specific flavor of machine learning, a generative system. A generative system is trained on a lot of data, then you trigger this generator with specific variables, then you get something that is synthesized from the training data but not a copy.”

Lerch's book focuses on signal analysis and understanding. “You could argue that in order to get a generative system, you need to have this analysis functionality available.”

“Just throwing data at something doesn’t automatically make it learn. You have to make it learn the right things. For music, there’s often not enough training data.”

“We have a lot of music, but let’s take the example of Vedant Kalbag’s project, scream detection in heavy metal music. We cannot tell a classifier, ‘Here’s some heavy metal, figure it out.’” Instead, somebody (like Master of Science in Music Technology student Kalbag) has to sit down with a piece of music and mark, from beginning to end, exactly when and what type of scream occurs, he said.
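In practice, such labels usually take the form of time-stamped segments. Here is a small sketch of reading them, with a hypothetical start/end/label file format (not the format of Kalbag's actual dataset):

    # Read hand-made annotations of the form "start_sec, end_sec, label",
    # e.g. "12.4, 14.1, high_fry". The format is hypothetical; it only
    # illustrates the kind of labeling described above.
    import csv

    def load_annotations(path):
        """Return a list of (start_sec, end_sec, label) segments."""
        segments = []
        with open(path, newline="") as f:
            for start, end, label in csv.reader(f):
                segments.append((float(start), float(end), label.strip()))
        return segments

    # Each labeled segment can then be cut from the audio and used as a
    # training example for a scream-type classifier.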

Creating analysis systems for music poses unique challenges, especially with respect to data availability, Lerch said, and part of his research tackles this topic. His goal is to “build systems that are still learning what’s relevant, but with a much, much smaller subset of data.”
