
The Thrilling Tension Between AI and Musical Creativity

  • Writer: Eric Doades
  • 22 min read

AI models can truly capture the essence of musical creativity. (Wait. Really?) Join us as Tristra interviews Dr. Christopher White from UMass Amherst about his new book The AI Music Problem: Why Machine Learning Conflicts with Musical Creativity. The two of them get into a really interesting question: Is AI redefining music, or is music reshaping AI? It's a great conversation about the future of music creation.




The News:

Listen wherever you pod your casts:



Looking for Rock Paper Scanner, the newsletter of music tech news curated by the Rock Paper Scissors PR team? Subscribe here to get it in your inbox every Friday!


Join the Music Tectonics team and top music innovators by the beach for the best music tech event of the year:

6th Annual Music Tectonics Conference, October 22-24, 2024, Santa Monica, California


Episode Transcript

Machine transcribed


Tristra: [00:00:00] Hey everybody and welcome to Music Tectonics. This is Tristra Newyear Yeager, Chief Strategy Officer at Rock Paper Scissors, the PR agency focused on music innovation. Today, I get to have a lot of fun. I don't usually get to do these interviews, but I have a really wonderful scholar who's joining us today, and I think you're gonna find this conversation exceedingly thought provoking.


First, let me introduce him. I'm talking today with Dr. Christopher White, who is Associate Professor of Music Theory at UMass Amherst. Chris's research uses big data techniques to study how we hear and write music, which is the subject of his fairly recent book, The Music in the Data, from 2022.


His articles have appeared in a multitude of academic journals as well as in Slate, the Daily Beast, and the Chicago Tribune, among other places. Chris is also an avid organist. Like many people who listen to this podcast, he plays two roles, as a musical [00:01:00] creative person and a deep thinker. Another fun fact: as a member of the 3 Penny Chorus and Orchestra, he appeared on the Today Show on NBC and was a quarterfinalist on America's Got Talent.


However, I brought him here today not to talk about those experiences, as exciting as they may be, but to discuss his most recent book, which is what really intrigued me and made me wanna invite him on the podcast: The AI Music Problem: Why Machine Learning Conflicts with Musical Creativity, published by Routledge just this June.


The book, and this is a quote from the description, "probes the challenges behind AI-generated music, enabling readers of all backgrounds to understand how contemporary AI models work and why music is often a mismatch for those processes." And we'll get into what we mean by a problem and where that mismatch lies, which actually might be a fun challenge for some of our music tech friends out there.


So, it's that purported gap that feels extremely [00:02:00] important to explore right now: the gap between how music works and how AI works. So to explore this, we're zooming out a bit today and diving into how music and AI intertwine and interact, as well as into AI's potential impacts, positive, negative, and indifferent, on music creation and listening, the beating heart of all music tech.


Chris, thank you so much for making time to join us today.


Chris: No, you're very welcome. It's a pleasure.


Tristra: Awesome. And, just so you all know, we're in the middle of a heat wave here, everybody. So, if we sound a little bit languid, it could be because of that. But before we dive into AI, I would love, Chris, and this is like really mean, this is like the meanest question to ask any academic.


Can you give us a quick and dirty understanding of what exactly encompasses music theory? So what are we talking about when we talk about music theory?


Chris: No, I love that you said it's a mean question, because you know who's the best person at answering this? The spouse of a music theorist. My husband gives such a good rundown of what music [00:03:00] theory is, way better than me. All right. So music theory is to musicians like English class is to a high school student, where you're learning the inner workings and the stuff that's going on inside of music.


So like in English class, you learn what makes up a sonnet, and what the difference is between, like, an Italian sonnet versus an English sonnet, and what makes a Shakespearean sonnet particularly Shakespearean, and what's surprising and what is expected, um, what makes something funny.


And it's the same thing in music theory, where you might take a Wolfgang Mozart sonata and try to figure out what would've been expected in the 1790s versus what is innovative in the 1790s. When Mozart writes a phrase, what kinds of structures is he using? What kinds of chords is he using?


Is this chord expected or is this chord not expected? And if you wanted to write that kind of chord, what notes would you put in that chord? And then the same thing in various different styles. Like [00:04:00] a theorist of popular music would be talking about the same kinds of things in the Beatles.


A theorist of Indian ragas would be telling you about what's going on in that style. So music theory is like getting into the building blocks and expectations and norms of each musical style.


Tristra: And some of these things are explicit knowledge, right? Like, musicians and composers or improvisers use them explicitly and are aware of them. And then some, right, are implicit: we can all feel when something modulates in a pop song from an English-speaking country; often there is an emotional association that most people have, but they may not be aware of how that mechanism triggers their emotions.


So do you explore both sides of that, or is it mostly the explicit side?


Chris: No, that's right. So they're sort of happening in parallel, 'cause after you've listened to a bunch of music, you've internalized implicitly, as you say, a lot of these expectations. We know when you get that modulation at the end of


a 2000s-era pop tune, that the energy goes up. You know that feeling, and it's because you've listened to a lot of [00:05:00] that. You get that because you've listened to a lot of them, right? Yeah.


Tristra: looking at the man in the mirror, like you're


You're doing it again. Yeah.


Chris: Yeah. And then when Beyoncé in "Love on Top" does it like four times, you're like, oh my gosh, this is taking this norm and stacking it on top of itself.


So a lot of it is making the implicit explicit, so that you can be thoughtful about it, so that you can name something. Which, as a performer, is really important: when you're playing a piece of music, you can name what's going on, you know exactly what's going on.


And then, there's also this explicit side, where a lot of us didn't grow up listening to Mozart. And so if you want to be a thoughtful listener to somebody like Mozart, you have to study that from the outside in. I didn't go into music school having a lot of implicit


knowledge about Mozart phrases; that had to be taught to me. So we're doing both of these at the same time: giving outside knowledge in, and then trying to bring [00:06:00] internal knowledge out.


Tristra: So how does music theory relate to AI? From what you were just saying, Chris, it feels like there's a lot of potential conversation between how we humans theorize music and how we might show a system how to parse musical expressions or musical structures.


Chris: Yeah, anytime that somebody is taking a piece of music and slicing it up for a computer to learn components of it, to me that's music theory. Because you're saying, what are the building blocks of this song that I can then give to a computer? Then conversely, anytime that a music theorist goes into a song and says, hey, what are the component parts of this?


It feels to me like they're basically doing computer science, but without a computer. Because you're taking each little chunk of it and treating it like you're in a DAW, basically, right? Because you've got these little chunks where you're like, that is located right there, and it's called this thing.


And so there's a very computational [00:07:00] kind of thinking underlying music theory, and I think there's a lot of music theory knowledge that underlies music computation.


Tristra: That is so cool. So these things are kind of harmonious, but you propose a really interesting, somewhat provocative question in the very title of your latest book, which is The AI Music Problem. So just for the listeners out there who don't hang around with scholars very much, a problem is not necessarily the same thing, I would say, as in everyday parlance, but I'll let you answer that.


So when you say there's an AI music problem, what do you mean?


Chris: Okay. What I mean is that music is this big cloud of a bunch of different kinds of things and social practices and styles and ways that we use the same object very differently. And sometimes that works with AI and sometimes it doesn't; it doesn't map one-to-one.


When AI tries to wrap its little mechanical hands around music, it only gets a little bit of it, and a lot of it falls through its [00:08:00] fingers. And so what I'm trying to talk about in the book is the parts of it that AI can get and the parts that are falling through its fingers.


Tristra: And we'll talk about both of those parts. But in some ways, when we say a problem, it's not that AI music is a problem, or that AI is a problem, or that music and AI are incompatible. It's just that there is a challenge, a question, a moment of tension. And I just wanna explain that to our listeners, because in academia, when we talk about a problem, it's not necessarily like you and I have a problem.


Right. It's not like the sort of difficult impasse. It's more a moment of tension, complexity, and need for more understanding.


Chris: Yeah. No, that's a really good way of putting it. Yeah, it's basically saying that it's not easy and what about it is not easy?


Tristra: You mentioned that there are some ways that AI and music play really well together, and so I was wondering if we could explore some of those.


What areas of music do you feel lend themselves best to AI-powered analysis or emulation, and where might we see some real innovation? [00:09:00] What aspects of the harmony between AI and music do you find most potentially fertile or exciting?


Chris: So you just talked about getting music into a probabilistic system, right? Because what's underlying most contemporary AI is, as I'm sure most of your listeners know, a probabilistic model: when you're at point A, are you gonna go to point B, C, D,


or E? And now that you have, like, points A and C chunked together, what's after that? And you're doing this using probabilistic methods, not deterministic methods. In a way, music is exactly that. Music composition is a probability game. If you're trying to write a melody like Amazing Grace, let's say you're at ba, ba, ba, ba, ba, ba.


Knowing what happens next is a question of probabilities. Da.


Those are all possibilities, [00:10:00] and they're all similarly probable. And the reason that they work is because they're probable. Now, there's other things which are less probable. Ba, ba... it also works, right? But it's less probable. And so this idea of music being these strings of probabilities makes it align very well with probabilistic systems.


In fact, music in a lot of ways works better than text, because text can be wrong in a lot of ways. Text can be deterministic. If I say the cat is on the table, I didn't use the word table just because it often follows "the cat is on the." It's probably because there's a cat on the table.


Right? And if there's not a cat on the table, that's wrong. So my sentence was determined by the cat being there, right, by the meaning that I'm trying to convey. There's no wrong answer in music. If I go ba, ba, ba, ba, ba, ba, ba, that's not wrong just because it's less probable, right?


If you just use [00:11:00] probabilities, you're always going to get usable music, in a way that if you use just probabilities in text, you might be like those famous stories of lawyers using ChatGPT to write their law briefs, and then they cite a case that didn't exist. ChatGPT does it because


it's writing a probabilistic document. It's like, well, if I wrote a document like this, it's probably gonna have a case like this in it. And then it's wrong, because that case didn't exist. But if you write music like that, it's fine. It's still music. It still sounds good.
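
For the tinkerers in the audience, here's a minimal sketch of the kind of probabilistic next-note model Chris is describing: a toy first-order Markov chain in Python. The note set and transition probabilities are invented for illustration, not learned from any real corpus, and real generative systems condition on much longer contexts, but the core move, sampling the next event from a distribution over continuations, is the same.

```python
import random

# A toy first-order Markov model of melody, in the spirit of the
# "point A to point B, C, D, or E" description above. These
# transition probabilities are made up for illustration.
transitions = {
    "C": {"D": 0.4, "E": 0.3, "G": 0.2, "A": 0.1},
    "D": {"E": 0.5, "C": 0.3, "F": 0.2},
    "E": {"F": 0.4, "D": 0.3, "G": 0.3},
    "F": {"G": 0.5, "E": 0.3, "D": 0.2},
    "G": {"A": 0.3, "E": 0.3, "C": 0.4},
    "A": {"G": 0.6, "C": 0.4},
}

def next_note(current: str) -> str:
    """Sample the next note from the current note's distribution."""
    options = transitions[current]
    return random.choices(list(options), weights=list(options.values()))[0]

def generate_melody(start: str = "C", length: int = 8) -> list:
    """Walk the chain. Every path yields usable music, just more or
    less probable -- there is no 'wrong' continuation."""
    melody = [start]
    for _ in range(length - 1):
        melody.append(next_note(melody[-1]))
    return melody

print(generate_melody())  # e.g. ['C', 'E', 'F', 'G', 'C', 'D', 'E', 'G']
```

Notice that every walk through the chain yields playable music: as Chris says, a low-probability continuation is quirky, not wrong, which is exactly why pure probability works better here than it does for legal citations.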


Tristra: No judge is gonna discipline you for a really quirky flat note after that lovely arpeggio that's all major. Yeah.


Chris: Yeah, exactly right. And so I think what technically works well is that it's probabilistic. The other thing that I was thinking about is that music, as media, has always been mediated by some sort of technology. And so the idea of adding more technology onto music makes a lot of intuitive sense to musicians.


Of course, computers and digital music, that's one thing. But even the [00:12:00] clarinet is a piece of technology, right? You're mediating your voice, your breath, using a piece of technology. And so it's a very intuitive thing for a musician to just adopt more and more technology, which might not be the case for other...


Tristra: Other art forms.


Chris: Yeah. Right.


Tristra: Yeah. So if I am a painter, right, and I start putting other layers of technology on my oil canvas, I may not be a painter anymore. I may be some other kind of visual artist, right? It can change the very definition of what you're doing, based on the medium.


So that's really interesting.


Chris: And like poetry, how would you layer technology onto poetry? I mean, it would become something different, whereas layering technology onto music, you still have music.


Tristra: Where do music and AI diverge? How can we talk in concrete terms about these areas where music and AI are not in sync?


Chris: Yeah, and this is what my book focuses on. There's five areas that I focus on: the [00:13:00] motivations behind making musical AIs, the data sets that we have to get, how we represent music to a computer, the structure of music that we are trying to find, and then how we interpret it.


So, dear listeners, Tristra and I were at the same conference over the summer, and I heard her talk, and I wanted to introduce myself to her because you said something that really resonated with me. I don't remember what exactly it was responding to, but somebody was like, how do I use AI technology to


make the next big thing, reach a wide audience, or something like that. And your response was, well, don't make it about music, or something like that, because music is such a small fraction of the AI conversation, and it's because it's a small fraction of the commercial conversation. Music punches above its weight in terms of our cultural discussion, but I spend more on coffee than I do on music, and I suspect that's true of most people in...


Tristra: Yeah. That's why [00:14:00] people are talking about superfans and things like that, but that's way off topic. Yeah. But you definitely have a point there.


Chris: Right. The amount of money that goes into music AI research is dwarfed by almost every other field, and it's because music AI doesn't have a lot of revenue streams.


Tristra: Yeah. Yeah. There's very few businesses that are like, for our hospital to function, we must have generative AI music.


Chris: That's right. Yeah. And then the next problem is that, when Sam Altman comes out and talks about how he's gonna make ChatGPT more powerful, it's often about increasing computing power, making more complex models or more responsive models, and bigger data sets.


We need bigger data sets. We need, like, all language that everybody has ever created to make ChatGPT as good as it can be. And let's think about the number of people that create music compared to the number of people that create text and language. It's already a fraction. You and I create music, but


[00:15:00] how much music do we actually create in a day? It's also a fraction of the amount of text that we create in a day. And so the amount of musical data out there is a fraction of a fraction of what visual and text models can rely on. Visual media have way more to rely on than musical data sets.


And so we just have less data to work with. It's because fewer people in the world make music than make text and other things.


Tristra: And you can imagine, say, this is some weird, crazy alternate universe, if we could have recorded all of the folk performances around the world starting at, like, the 18th century, right? Or before, you know, since it's completely made up. Then we might actually have a body of data that rivals what can be scraped in terms of text from the internet or photos from the internet.


Chris: Yeah, 100%. And then we'd still have the problem of actually getting the usable data out of those recordings. Right. And that is [00:16:00] no small feat either, because if you think about written text, it is like a black letter on a white page, and you can just pull that piece of information out and give it to a computer.


Audio signals are so messy; it's really, really hard to extract reliable information from them. Even one rip of a bow string is going to sound a little bit different than the next rip of a bow string, in a way that two Ts on a page are gonna look exactly the same, right?


Musical data is so variable and so messy compared to other data that it's really hard to make a good computational data set from it.


Tristra: That's really interesting. And think too about everything from room noise... you can have, of course, low contrast. For instance, when Google scanned all these books from libraries that were in the public domain, you sometimes see crazy artifacts and things like that, from, you know, just the yellowing paper and the text on there.


But music is, like, infinitely more complex, right? Because you could have that performance on a violin in a tiny [00:17:00] closet, in a giant hall, on a street corner. And all of that, to a human ear, we'd all detect, oh, that's the same sounding thing. But a computer is gonna be like, what?


You know, it's gonna have so much noise to clean up to get the signal. This is an interesting problem. So, if we have these problems, what are some potential ways forward?


How could we embrace more of music's aspects and still have it function within an AI system? Or are there some things we should just not attempt, in your mind? What do you think?


Chris: One of the big mismatches, I think, is that music is primarily a social art.


Tristra: Mm-hmm.


Chris: And generative AI is going to be a non-social commodity. Right. But music and AI-generated content are not totally non-overlapping in their [00:18:00] uses.


Because there are certain musical situations that tend to function like a commodity. What do we mean by commodity? We mean something that's bought and shared on the market where we don't care who made it. When you buy a trash can at Target, you don't care who made that.


But when you go onto Spotify, you might be caring whether Taylor Swift is actually singing that song or whether it's a cover by somebody else. And so there's something un-commodity-like if you care about who made it. And if you care about who made something, it's really hard to swap out generative AI for that, right?


And so that's an instance where they're not overlapping. However, a lot of people just turn on their Spotify to fall asleep at night. Right. Generative AI seems fine with that, right? It seems like a great use for that. You can make an infinite amount of music, and when I'm asleep, I don't care who's playing music to me.


Right? That seems like a good instance. [00:19:00] Background music for a podcast introduction, right? That's the kind of functional music where we're not caring who made it and we're not really paying too much attention to it.


That feels like it would be a good use for generative AI: when we don't care about who is making it, when we're not using music as a social object to connect two people.


Tristra: I'm also curious to hear, as an educator and as someone who plays music regularly, how does AI strike you? And not just generative AI, 'cause there's a lot of other types of AI that can be employed in musical creation, or even in music listening. Coming up now, with stem separation, you can have people remixing their favorite tracks, that kind of thing.


So I was curious if you had any thoughts about how untrained people, so folks who have never had a lesson or don't consider themselves musicians... how are you imagining AI and music, with these interesting mismatches and harmonious moments? Will this [00:20:00] change the game for people who had previously maybe not had access to a one-on-one musical education, or sort of the traditional means of learning music?


Chris: That's a really good question. I bet a bunch of things will happen that nobody can predict, but...


Tristra: Yeah.


Chris: When I ask my music education grad students about this, they are extremely skeptical that AI will have much of a role in explicit music education, because when a middle schooler takes clarinet, they're not taking clarinet to sound good.


They're taking clarinet to, like... right.


Tristra: Sure words were never


Chris: Yeah. They're taking it because it's fun. It's fun to play, it's fun to do it with your friends. Maybe your mom is making you do it or something like that. But there's not an on-ramp that makes it easier or quicker to play the clarinet with generative AI, because [00:21:00] it's such a physical, social thing.


But I suspect that there can be uses in non-music-education contexts. I think it would be cool, for instance, if you're teaching Shakespearean sonnets in English class, to have somebody write their own sonnet, and then you can use generative AI to set that sonnet to music.


That makes that assignment cooler and more interesting. And then you can talk about whether the generative AI set it properly to your expressive expectations, or whether it seems Shakespearean, like it's bringing out the same things that you would in analyzing your poem, or something like that.


There's ways I think that generative AI can be used, but it's probably not gonna be used by amateurs to get into music. I mean, watch me be wrong about this, but


Tristra: We'll have you back on in a couple years and be like, see, Chris?


Chris: [00:22:00] Right.


Tristra: But I think this also is a wonderful challenge that someone who is interested in working with AI to create music tech could think about: well, how could that happen? Could there be an on-ramp? Maybe not for every instrument or every type of learning. But I think you're raising a great point, which is fun


and social interaction, and also the embodied nature of music. That's why music and dance go hand in hand so often. And AI could have interesting roles in all of those, but we're gonna have to figure those out and think about them pretty explicitly, right? So far, it feels like we're only thinking about things in terms of the recorded music that's been so dominant in most people's lives when it comes to their encounters with music in the last, what, 50 years.


But the problem that you're bringing up is like a math problem, and some of the solutions could be really interesting and could push people closer to music who [00:23:00] may not have had an entry point before.


Chris: Yeah. And because music poses such a difficult problem for computational analysis, we might also be seeing folks from the computer side be more interested in this going forward. And, concepts-wise, my impression is, on the AI side, there's a lot of focus on the audio signal,


basically trying to treat chunks of audio as the things that the computer's learning sequences of.


Tristra: Yeah.


Chris: The problem with that is that then you get songs that sound very similar to other songs, because you're just treating little bands of sound as the things that you're reordering.


And so if you are training on The Doors, then something's gonna basically sound like The Doors, 'cause it's using literal sound bands from The Doors. But if we're gonna try to make that better, we're probably going to be like, what notes and what chords are inside of that?


And to me, that's where really interesting conversations with music theory and with working [00:24:00] musicians can come into play. And so musical knowledge can potentially have an active role in making these AI systems actually work better, by being like, okay, what chord are we trying to simulate right now?


What notes are in this chord? And that sort of pushes the ball away from literal audio analysis into note analysis, which is more the domain of music theory.


Tristra: Oh, that's interesting. So in some ways we're talking about advancing some of these models into what is often called, and I say this with scare quotes, the more "reasoning" models, right? Where it's not just like, boom, boom, boom, boom, boom, I'm just lining up these probabilistic options, and more like there is an overlying meta-understanding of not just what is most likely to come next, but this is what a chord is and this is how chords often come together. That is probably a pretty highfalutin concept for AI in its current state. But I think there's many folks [00:25:00] who would love to see... I don't know.


And maybe it's even multiple models working together. I know with a lot of the talk around agentic AI, there's this idea that you can create guardrails for your system so that it's not just throwing out totally wild hallucinatory stuff. Though in music, like you said, that would be awesome.


Chris: It would still be fine, right? Yeah. Yeah. And I don't think it's impossible for AI systems to learn some of those larger structures, and I think some of what's going on in, like, a Suno or a Udio is learning some of that. There are aspects of those chords there,


those harmonic principles there. I think it would be easier to learn, and you could learn more sophisticated things, if instead of paying attention just to the audio signal, you also maybe tagged it or something like that: this is what kind of chord it is. But anyway, this is sort of outside my area.


But it seems like when we as humans talk about larger musical [00:26:00] structures, we end up relying on these music theory concepts. So it seems like it could be a way for these models to learn larger musical structures, if within those models there were these music theory concepts.
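
To make that concrete, here's a hypothetical sketch of what Chris's tagging idea might look like as training data: time-slices of a recording annotated with note and chord labels a model could learn from, rather than raw audio alone. The schema, field names, and labels are our invention for illustration, not anything from Suno, Udio, or the book.

```python
from dataclasses import dataclass

# A hypothetical annotation schema: instead of training only on raw
# audio chunks, each time-slice of a recording also carries
# music-theory labels (note names, a chord symbol, a key) that a
# model could attend to alongside the audio.

@dataclass
class TimeSlice:
    start_sec: float   # position in the recording
    notes: list        # pitches sounding in this slice
    chord: str         # music-theory label, e.g. "V7"
    key: str           # local key context

# Two annotated slices from an imagined training example: a dominant
# seventh chord resolving to the tonic in C major.
training_example = [
    TimeSlice(0.0, ["G", "B", "D", "F"], chord="V7", key="C major"),
    TimeSlice(1.5, ["C", "E", "G"], chord="I", key="C major"),
]

# A model trained on (audio, annotation) pairs could learn that V7
# resolving to I is highly probable -- a larger harmonic structure
# than any individual band of sound.
for ts in training_example:
    print(f"{ts.start_sec:>4}s  {ts.chord:<3} {ts.notes}")
```

The design point is the one Chris makes: the chord symbols carry exactly the kind of larger-structure information that is hard to recover from overlapping bands of raw audio, which is where music theorists could plug into model building.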


Tristra: So it sounds like if you wanna build a really, amazingly nuanced model, you should have a chief music officer.


Chris: I think, yeah, us.


Tristra: Amazing. So one more question before we wrap this up, and this has been so much fun. Thank you. I'm curious how this might change us as listeners, understanding more about how AI works, how music works.


Do you feel like there's a potential for a change in the way we hear music or listen to music?


Chris: I think we're gonna start worrying about whether something was created by AI or not. And I think that's gonna change things. I think that's gonna pop up in the back of our minds often. I think there were seeds of this when you would hear, like, a trombone sample and be [00:27:00] like, I wonder whether that trombone sample was digital


or acoustic. Right? But it's a deeper question where you're like, am I listening to a flesh-and-blood human, or am I listening to an AI?


Tristra: Because a bot wrote a song...


Chris: And that's the thing. I think the first time that you feel an emotional connection to a bot, and then you realize that it was a bot,


I think you're gonna feel cheated, because you want to feel a connection to other human beings when you're opting into a musical experience. And I think if you feel that you're being fooled...


Tristra: Mm-hmm.


Chris: Then I think after that, you're going to really want to make sure that there's a human on the other side of that music production.


And so I think one way that some listeners are really going to change in the next decade is that we're going to want an assurance that there was a human making this in order for us to bring [00:28:00] our emotional vulnerabilities to a song.


Tristra: That's a really interesting point. Or will we split into two groups? There's gonna be folks who've fallen in love with their AI boyfriend or girlfriend, people who are like, I can create a parasocial relationship without knowing there is a human on the other end,


'cause I'm making it all up myself, it doesn't matter. And then I think there are gonna be people, for instance, if you're a fan of jazz, right? You're gonna wanna know that that solo kind of meant something, right? That the person blowing really meant it, that they were in that moment, exploring the music, and you're getting to witness that.


Like, that's kind of the whole point. So I'm sure it'll vary wildly, just like you mentioned before, genre-wise. But, yeah. How are people gonna solve that equation for themselves, right?


Chris: The last time I said this in front of a group, somebody came up to me and was like, you know, I think that the younger generation already doesn't care about this.


Tristra: Yes and no. Yeah.


Chris: That's the thing. Yes and no. It's hard for me to imagine that if you are in an emotional state, you throw your headphones on, you [00:29:00] wanna listen to a sad song, that you'd opt into listening to AI. That feels like that's not what that musical situation is for. You want to feel like your emotions are part of a larger social community, it seems to me. But I think you're absolutely right that some people will just opt into that parasocial AI boyfriend thing and not care.


Tristra: And if it works for them, it's good.


Chris: The other side of this is that, like in the 1880s,


Edison comes out with his phonograph. Some of his funders are like, I just don't see this going anywhere, because nobody's going to sit down and listen to music that they don't see being performed by another human. Right? Like, the phonograph is never going to be anything more than a simple novelty,


Tristra: Mm-hmm.


Chris: because nobody's going to


emotionally connect with a machine. But this is the way that most people consume their music. And so there's probably gonna be something like that [00:30:00] too, where the things that you and I can't imagine right now, a hundred years from now, 10 years from now...


Tristra: Mm-hmm.


Chris: It's gonna be more the norm than the outlier.


Tristra: Yeah. That's amazing. That's a great example. Well, thank you so much, Chris, for your time and for your thoughts. And we'll make sure to put in the show notes how people can find out more about you and your work. I really appreciate the time.


Chris: Thank you. This was a real pleasure.





Let us know what you think! Tweet @MusicTectonics, find us on LinkedIn, Facebook and Instagram, or connect with podcast host Dmitri Vietze on LinkedIn, Twitter, and Facebook.

The Music Tectonics podcast goes beneath the surface of the music industry to explore how technology is changing the way business gets done. Weekly episodes include interviews with music tech movers & shakers, deep dives into seismic shifts, and more.
