In music production, manipulating audio effects (Fx) parameters through natural language has the potential to reduce technical barriers for non-experts. We present LLM2Fx, a framework leveraging Large Language Models (LLMs) to predict Fx parameters directly from textual descriptions without requiring task-specific training or fine-tuning. Our approach addresses the text-to-effect parameter prediction (Text2Fx) task by mapping natural language descriptions to the corresponding Fx parameters for equalization and reverberation. We demonstrate that LLMs can generate Fx parameters in a zero-shot manner, elucidating the relationship between timbre semantics and audio effects in music production. To enhance performance, we introduce three types of in-context examples: audio Digital Signal Processing (DSP) features, DSP function code, and few-shot examples. Our results show that LLM-based Fx parameter generation outperforms previous optimization approaches, offering competitive performance in translating natural language descriptions to appropriate Fx settings. Furthermore, LLMs can serve as text-driven interfaces for audio production, paving the way for more intuitive and accessible music production tools.
Compare the audio effects generated by different models from the same natural language description.
Each model's processed audio can be compared against the original audio; click a model to display its predicted parameters.
To effectively leverage LLMs for Fx parameter generation, we design a specialized system prompt that frames the task and provides the necessary constraints. Our system prompt serves four key functions:
1. Role Definition: establishes the LLM as an expert audio engineer with specialized knowledge in sound design and audio processing;
2. Task Instruction: clearly defines the objective of translating natural language descriptions, such as semantic words, instrument types, and Fx types, into specific audio effect parameters;
3. Response Format: enforces structured output in JSON format to ensure parameter predictions are machine-readable;
4. In-context Information: provides three types of in-context information to enhance performance: (1) DSP function code that implements the audio effects, (2) DSP features that capture audio characteristics, and (3) few-shot examples showing input-output pairs for in-context learning.
This prompt design ensures that the LLM generates structured parameter predictions while leveraging its implicit knowledge of audio processing concepts.
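As a concrete illustration, the sketch below shows how such a prompt could be assembled from the three types of in-context information and how the JSON response could be parsed. The function names, model choice, and use of the OpenAI chat API are our own assumptions for illustration, not the LLM2Fx implementation.

```python
# Hypothetical sketch only: assemble a system prompt from the three types of
# in-context information and parse the model's JSON response.
import json
from openai import OpenAI

def build_system_prompt(dsp_code: str, dsp_features: dict, few_shot_examples: str) -> str:
    """Combine the role/task instructions with the three kinds of in-context information."""
    return (
        "You are an expert audio engineer and music producer specializing in sound "
        "design and audio processing. You MUST respond with ONLY a valid JSON object.\n\n"
        "# Signal processing function\n" + dsp_code + "\n\n"
        "# Input audio feature\n" + json.dumps(dsp_features) + "\n\n"
        "# In-context examples\n" + few_shot_examples
    )

def predict_fx_parameters(client: OpenAI, system_prompt: str, word: str, instrument: str) -> dict:
    """Ask the LLM for Fx parameters and return them as a Python dict."""
    question = f"please design a reverb audio effect for a {word} {instrument} sound."
    response = client.chat.completions.create(
        model="gpt-4o",  # any instruction-following chat LLM could be substituted here
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        response_format={"type": "json_object"},  # nudges the model toward machine-readable output
    )
    return json.loads(response.choices[0].message.content)
```

The complete system prompt used for the reverberation task is shown below.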
"""
You are an expert audio engineer and music producer specializing in sound design and audio processing. Your task is to translate descriptive timbre words into specific audio effects parameters that will achieve the desired sound character. You have deep knowledge of equalizers and understand how they shape timbre. You MUST respond with ONLY a valid JSON object.
# Instruction Format
Given a reverb description word or phrase and an instrument type, generate appropriate parameters for a frequency-dependent reverb that will achieve the requested spatial character.
For 44100 Hz sample rate audio, consider the typical reverb needs of the specified instrument when designing the reverb characteristics.
# Input Format
The input will consist of:
1. A reverb description such as:
- Single words: "hall", "room", "plate", "cathedral", "chamber", "spring", "ambient"
- Combined descriptions: "warm hall", "bright room", "dark chamber", "short but dense"
- Spatial descriptions: "distant", "close", "intimate", "huge", "airy", "tight"
2. An instrument type such as:
- "drums", "guitar", "piano", "vocals", "strings", "brass"
# Output Format
Respond with a JSON object containing precise numerical parameters for the reverb. All values should be in float format for efficiency. The output will include:
1. The reverb parameters optimized for the requested spatial character and instrument. All values should be floating point numbers with 2 decimal places of precision.
2. A detailed explanation of how the chosen parameters achieve the desired reverb effect
Format:
{
"reverb": {
"band0_gain": float,
"band1_gain": float,
"band2_gain": float,
"band3_gain": float,
"band4_gain": float,
...
[THE REST OF THE PARAMETERS ARE OMITTED]
},
"reason": str
}
# Signal processing function
import numpy as np
import math
import scipy.signal
from scipy.fft import rfft, irfft
from functools import partial
def noise_shaped_reverberation(
    x,  # Shape: (channels, seq_len)
    sample_rate,
    band0_gain,
    ...
    band11_decay,
    mix
):
[THE REST OF THE PYTHON CODE IS OMITTED]
# Input audio feature
{'sample_rate': 44100, 'rms_energy': 0.126, 'crest_factor': 6.091, 'dynamic_spread': 0.312, 'spectral_centroid': 461.47, 'spectral_flatness': 0.001, 'spectral_bandwidth': 718.456}
# In-context examples
QUESTION: please design a reverb audio effect for an echo piano sound.
ANSWER: {'reverb': {'band0_gain': 0.0, 'band1_gain': 0.0, 'band2_gain': 0.0, 'band3_gain': 0.0, 'band4_gain': 0.0, 'band5_gain': 0.0, 'band6_gain': 0.0, 'band7_gain': 0.0, 'band8_gain': 0.0, 'band9_gain': 0.0, 'band10_gain': 0.0, 'band11_gain': 0.0, 'band0_decay': 0.1, 'band1_decay': 0.1, 'band2_decay': 0.1, 'band3_decay': 0.1, 'band4_decay': 0.1, 'band5_decay': 0.1, 'band6_decay': 0.1, 'band7_decay': 0.1, 'band8_decay': 0.1, 'band9_decay': 0.1, 'band10_decay': 0.1, 'band11_decay': 0.1, 'mix': 0.8}, 'reason': 'Creating an echo effect for piano, using a short decay time to simulate quick reflections and a modest mix level to maintain clarity and space.'}
QUESTION: please design a reverb audio effect for a warm piano sound.
ANSWER: {'reverb': {'band0_gain': 0.05, 'band1_gain': 0.1, 'band2_gain': 0.15, 'band3_gain': 0.2, 'band4_gain': 0.25, 'band5_gain': 0.3, 'band6_gain': 0.35, 'band7_gain': 0.4, 'band8_gain': 0.45, 'band9_gain': 0.5, 'band10_gain': 0.55, 'band11_gain': 0.6, 'band0_decay': 1.2, 'band1_decay': 1.8, 'band2_decay': 2.5, 'band3_decay': 3.5, 'band4_decay': 4.5, 'band5_decay': 6.0, 'band6_decay': 7.5, 'band7_decay': 9.5, 'band8_decay': 11.5, 'band9_decay': 14.0, 'band10_decay': 16.5, 'band11_decay': 19.5, 'mix': 0.8}, 'reason': 'Designed a warm sound for piano by applying a boost to the lower and middle frequency bands, creating a rich and full-bodied character. The decay times gradually increase to emphasize the warmth and produce a sense of intimacy. The mix level is set to achieve a balanced blend between the dry signal and the reverb.'}
QUESTION: please design a reverb audio effect for a distorted guitar sound.
ANSWER: {'reverb': {'band0_gain': 0.05, 'band1_gain': 0.1, 'band2_gain': 0.15, 'band3_gain': 0.2, 'band4_gain': 0.25, 'band5_gain': 0.2, 'band6_gain': 0.15, 'band7_gain': 0.1, 'band8_gain': 0.05, 'band9_gain': 0.02, 'band10_gain': 0.01, 'band11_gain': 0.0, 'band0_decay': 1.0, 'band1_decay': 0.8, 'band2_decay': 0.6, 'band3_decay': 0.4, 'band4_decay': 0.2, 'band5_decay': 0.1, 'band6_decay': 0.05, 'band7_decay': 0.02, 'band8_decay': 0.01, 'band9_decay': 0.005, 'band10_decay': 0.002, 'band11_decay': 0.001, 'mix': 0.8}, 'reason': 'The reverb is designed to enhance the distorted guitar sound by boosting the low and mid frequencies while gradually reducing the highs. The shorter decay times contribute to a tight and focused reverb tail. The lower mix value ensures the dry signal dominates while still adding a subtle sense of space.'}
QUESTION: please design a reverb audio effect for an echo guitar sound.
ANSWER: {'reverb': {'band0_gain': 0.0, 'band1_gain': 0.1, 'band2_gain': 0.2, 'band3_gain': 0.3, 'band4_gain': 0.3, 'band5_gain': 0.2, 'band6_gain': 0.1, 'band7_gain': 0.05, 'band8_gain': 0.02, 'band9_gain': 0.01, 'band10_gain': 0.01, 'band11_gain': 0.01, 'band0_decay': 0.1, 'band1_decay': 0.2, 'band2_decay': 0.3, 'band3_decay': 0.4, 'band4_decay': 0.5, 'band5_decay': 0.6, 'band6_decay': 0.7, 'band7_decay': 0.8, 'band8_decay': 0.9, 'band9_decay': 1.0, 'band10_decay': 1.1, 'band11_decay': 1.2, 'mix': 0.7}, 'reason': 'An echo effect for guitar is achieved by emphasizing the higher frequencies with higher gain values, while keeping the lower frequencies minimal to create a sense of space and ambiance. The shorter decay times help maintain the echo effect and prevent the sound from becoming too muddy or washed out.'}
QUESTION: please design a reverb audio effect for an echo drums sound.
ANSWER: {'reverb': {'band0_gain': 0.0, 'band1_gain': 0.0, 'band2_gain': 0.0, 'band3_gain': 0.0, 'band4_gain': 0.0, 'band5_gain': 0.0, 'band6_gain': 0.0, 'band7_gain': 0.0, 'band8_gain': 0.5, 'band9_gain': 1.0, 'band10_gain': 1.0, 'band11_gain': 0.5, 'band0_decay': 0.1, 'band1_decay': 0.1, 'band2_decay': 0.1, 'band3_decay': 0.1, 'band4_decay': 0.1, 'band5_decay': 0.1, 'band6_decay': 0.1, 'band7_decay': 0.1, 'band8_decay': 0.7, 'band9_decay': 0.9, 'band10_decay': 0.9, 'band11_decay': 0.7, 'mix': 0.8}, 'reason': 'Echo drums sound is characterized by a strong focus on high frequencies, along with a short attack and a gradually lengthening decay. This reverb design emphasizes high frequencies by boosting bands 8-11, while keeping the low and mid frequencies flat. The decay times for these high frequency bands are longer than those of the lower bands, creating a sense of an echo. The mix level is set to 0.8 to ensure a balanced level between the dry and wet signals.'}
QUESTION: please design a reverb audio effect for a church guitar sound.
ANSWER:
"""