Sup and welcome to the site. I'm not sure what you mean by "converts easily to music", but there are real-life examples of spoken languages encoded as purely "musical" information, in the form of
whistled languages. A whistled language is basically just a lossy transmission mode for the underlying, ordinary spoken language, though. Conlangs built in some way around purely melodic information are also a thing, the best-known one being
Solresol (which is, however, not a great example to emulate).
I wouldn't recommend the latter option (a purely melodic conlang), though, unless the aesthetics of your setting absolutely require it. The common consensus here is that even "ordinary" spoken languages are extremely hard to create, and if you work in a medium you are unfamiliar with, it becomes vastly more difficult to make something sensible (although I am assuming your players would also be fine with something not sensible, so that may not be a problem for you).