Student researcher Vedant Kalbag in front of a display showing digital sound processing.

Algorithms Learn Heavy Metal
Thanks to Student's Innovative Research

Although heavy metal is a well-established subgenre of rock music, its structure and rich, vigorous vocals are still misunderstood, both by listeners and by the algorithms that classify music and tailor it to our tastes. That may soon change thanks to research by music technology student Vedant Kalbag.

Kalbag grew up in India, where music culture is rich, but he was most drawn to the global heavy metal sounds that have always fascinated him. “Heavy metal is actually what got me really interested in music,” he said. “I play the drums and I was always looking for something more challenging when I was discovering music.”

Kalbag saw an opportunity to combine music research with his interest in heavy metal by creating a dataset of heavy metal screams and vocals. “The reason this project began is that I realized there wasn't much research done with vocals in general for heavy metal,” he said. “I wanted to do an analysis of how vocal techniques being used have changed over the years, which would require an annotated dataset. Since I couldn't find anything that existed, I decided to create it.”

He also wanted to increase the heavy metal community's representation in music information retrieval (MIR). The task is important but difficult: there are many types of scream and growl vocals, and they are often buried in the mix beneath distorted guitars.

Kalbag’s research uses the dataset to train a machine learning algorithm that classifies the vocal techniques used in heavy metal music. The algorithm looks for patterns in the features that represent each piece of audio. “So when I feed a new song to this algorithm, it spits out a prediction of what type of scream it thinks it is,” he said.

“We tried different machine learning models and in the end, we ended up with a result of about 87% accuracy.”

Kalbag’s research has received support through his studies as a Master of Science in Music Technology student and as a member of the Music Informatics Group, a Center for Music Technology research lab led by music technology professor Alexander Lerch.

In the summer of 2022, Kalbag and Lerch presented their paper, “Scream Detection in Heavy Metal Music,” at the Sound and Music Computing Conference in Saint-Étienne, France. Kalbag was happy that his hard work and contribution to the field were positively received.

“It's not every day that I get to play examples of heavy metal screams to a crowd of 200 academics and have everybody cheer it on. That was a really, really nice moment, and it was really well received because it is novel in its own right,” he said.

Screams, Growls, and Algorithms

Student researcher Vedant Kalbag using sound visualization software.

To classify music using machine learning, an algorithm must be trained on a sufficiently large dataset of annotated audio.

The annotations contain the start and end time of every vocal event, along with the type of scream present, and can then be used to train the machine learning models. Training the models to see patterns in the data lets them predict and accurately classify music for listeners.
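As a rough illustration of what such annotations might look like in code, the sketch below stores each vocal event with a start time, an end time, and a label, then turns the events into one label per fixed-length block of audio. It is not the format used in Kalbag's dataset: the `VocalEvent` class, the `block_labels` function, the label strings, the `no_vocals` default, and the one-second block length are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VocalEvent:
    """One annotated vocal event (times in seconds)."""
    start: float
    end: float
    label: str  # e.g. "clean" or "low_fry"; these label strings are hypothetical


def block_labels(events, duration, block=1.0, default="no_vocals"):
    """Assign one label to each fixed-length block of a track.

    A block takes the label of the first event that covers its midpoint,
    or `default` when no annotated event covers it.
    """
    n_blocks = math.ceil(duration / block)
    labels = []
    for i in range(n_blocks):
        midpoint = (i + 0.5) * block
        label = default
        for event in events:
            if event.start <= midpoint < event.end:
                label = event.label
                break
        labels.append(label)
    return labels


# Made-up annotations for a six-second excerpt.
events = [
    VocalEvent(0.0, 2.0, "clean"),
    VocalEvent(2.0, 3.5, "low_fry"),
    VocalEvent(5.0, 6.0, "high_fry"),
]
print(block_labels(events, duration=6.0))
# ['clean', 'clean', 'low_fry', 'no_vocals', 'no_vocals', 'high_fry']
```

Per-block labels like these are one common way to pair annotations with audio features computed over the same blocks.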

“To be able to identify the differences you need annotations,” Kalbag said. “That's the only way you'll be able to make a prediction on unseen data.”
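To make the idea of predicting on unseen data concrete, here is a minimal sketch of a classifier that learns from labeled feature vectors and is then scored on examples it has not seen. It uses a nearest-centroid classifier as a simple stand-in and is not one of the models from Kalbag's research. The two-number feature vectors are invented for illustration, and the function names `fit_centroids`, `predict`, and `accuracy` are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(features, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        for i, value in enumerate(vec):
            sums[lab][i] += value
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vec))


def accuracy(predicted, actual):
    """Fraction of predictions that match the annotated labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Invented feature vectors; real systems would compute features from audio.
train_x = [[0.2, 0.1], [0.3, 0.2], [0.8, 0.9], [0.9, 0.7]]
train_y = ["clean", "clean", "high_fry", "high_fry"]
centroids = fit_centroids(train_x, train_y)

# Held-out examples the classifier did not see during training.
test_x = [[0.1, 0.2], [0.7, 0.8], [0.6, 0.4]]
test_y = ["clean", "high_fry", "high_fry"]
predictions = [predict(centroids, vec) for vec in test_x]
print(predictions, accuracy(predictions, test_y))
```

Scoring on held-out examples, rather than on the training data, is what makes an accuracy figure meaningful for new songs.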

When no dataset exists, one must be made. Kalbag is among the first to annotate a heavy metal music dataset manually, and being among the first came with its challenges. “It took me about three weeks of nonstop heavy metal music listening to create the annotations that make up the dataset,” Kalbag said. “So the overall dataset has about 280 minutes of annotated audio.”

“As much as I love listening to this stuff, towards the end of the three weeks, even I was getting a bit tired of it.”

Kalbag’s work not only builds on past terminology that defines three specific types of vocals in heavy metal (fry screams, growls, and rough vocals) but also supports future scholars. “I did also create a series of benchmark systems for anybody who uses this dataset in the future to use as a reference,” he said.

The research ultimately focuses on fry screams, since they are the most common kind in modern music. The annotations distinguish low, mid, and high fry screams, as well as clean and layered vocals.

Below are a few of the audio samples used in Kalbag's research. The audio includes clean vocals (no screaming or distortion) and fry screams at several pitch ranges. Kalbag was able to train his algorithm to detect these different types of metal screams in the music.

Clean Vocals

Low Fry Scream

Mid Fry Scream

High Fry Scream


Identifying the different vocal techniques in heavy metal music can inform genre classification systems and help music recommendation systems account for a listener's preference for a specific vocal type. This is exciting news for rock and heavy metal fans, who may, sooner rather than later, enjoy a better listening experience with more accurate recommendations and new music discoveries.
