ASMR and Digital Media: Case Study – DH187 Capstone: ASMR

We chose to analyze one of the most popular videos from ASMR creator Goodnight Moon, who has over 1 million subscribers on YouTube. The video features a classic roleplay ASMR experience with a unique fantasy twist, titled “Afternoon at the Herbologist’s Greenhouse (ASMR)”.

Digital Media’s Abilities for ASMR

Digital media has given ASMRtists an accessible medium in which to continuously experiment and refine their craft, to the point that they have developed techniques for finely manipulating pauses and volume. ASMR remains a relatively accessible art form, as ASMRtists can produce content with minimal equipment. This low barrier to entry has contributed to ASMR’s rapid growth and diversification in the digital space. However, as ASMR has become more popular, many ASMRtists have gone from using basic built-in laptop microphones to investing in high-sensitivity, low self-noise microphones with stereophonic capabilities that can catch subtler sound events (e.g. fabric ruffling, breathing, crinkling), treating sound capture with the same level of sophistication as professional sound designers.

Intentional Silences in ASMR

One way digital media has shaped ASMR is by letting ASMRtists manipulate silences, engaging the listener in a one-way conversation that creates a sense of intimacy and immersion. Because there is no physical co-presence in ASMR, listeners can focus on and immerse themselves in the auditory experience without having to navigate social interactions.

Figure 1. Full waveform of audio clip for silence analysis

We examined a 15-second clip for silences (see Fig.1) and identified a notable silence that starts at 6 seconds, an intentional pause after the ASMRtist asks the listener a question. Intentional periods of silence between utterances are a common strategy among ASMRtists, particularly in role-play videos, where the viewer’s half of the conversation is simply left implied. This silence lasts for 11.15 seconds, during which small background noises are subtly present, such as the artist lightly smacking their lips and clicking a pen to simulate the act of “recording” the listener’s answer.
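A silence like the one described above can also be located programmatically rather than by eye. The following is a minimal sketch, not the method used for this case study: it assumes the clip has already been decoded into a list of mono samples scaled to the range -1 to 1, and the function name `find_silences`, the 50 ms frame size, the -40 dBFS threshold, and the 1-second minimum duration are all illustrative assumptions. Because faint sounds such as lip smacks and pen clicks remain present during ASMR pauses, the threshold has to sit above that background level.

```python
import math

def find_silences(samples, sample_rate, frame_ms=50,
                  threshold_db=-40.0, min_silence_s=1.0):
    """Return (start_s, end_s) spans where frame RMS stays below threshold_db.

    Assumes `samples` is a mono sequence of floats in [-1, 1]; the default
    threshold and durations are illustrative, not values from the case study.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    spans = []
    start = None  # start time of the quiet run currently being tracked
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        t = i * frame_len / sample_rate
        if level_db < threshold_db:
            if start is None:
                start = t
        elif start is not None:
            # A louder frame ends the quiet run; keep it only if long enough.
            if t - start >= min_silence_s:
                spans.append((start, t))
            start = None
    if start is not None:
        # The clip ended while still quiet.
        end = n_frames * frame_len / sample_rate
        if end - start >= min_silence_s:
            spans.append((start, end))
    return spans
```

Lowering `threshold_db` makes the detector stricter, so a pause containing soft mouth sounds may be split into several shorter spans.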

Waveform Analysis of Whispered Speech

Whispering, one of the most commonly reported audio triggers for ASMR, allows ASMRtists to intentionally control the pacing and volume of their vocal delivery to create a more intimate experience for the listener. We selected a 10-second clip (see Fig.2) from the most viewed part of the video and observed some distinct ADSR (attack, decay, sustain, release) characteristics in the ASMRtist’s enunciation.

Figure 2. Full waveform of audio clip for ADSR analysis

Figure 3. Closer look at waveform for one word

Looking at the enunciation of a single word (see Fig.3), the attack phase is softer and more gradual than the harsher onsets typical of normal speech. The sustain phase exhibits subtle, rhythmic peaks with a soft incline, as the ASMRtist maintains a steady flow between words. The decay is smooth: sounds taper off naturally rather than cutting off abruptly, making the spoken word feel more organic. The release phase after each word appears to be carefully controlled, with each sound fading into silence. The exception is fricative sounds, such as the “s” and “f” phonemes, which have sharper attacks and quicker releases.
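The "softer and more gradual" attack described above is a visual judgment from the waveform, but it could be put into numbers. The sketch below is one possible way to do so, under stated assumptions: the helper names `amplitude_envelope` and `attack_time`, the 10 ms window, and the 10%-to-90% rise criterion are illustrative choices, not measurements taken in this case study.

```python
def amplitude_envelope(samples, sample_rate, window_ms=10):
    """Coarse envelope: peak absolute amplitude in each window_ms window."""
    win = max(1, int(sample_rate * window_ms / 1000))
    return [max(abs(x) for x in samples[i:i + win])
            for i in range(0, len(samples), win)]

def attack_time(envelope, window_ms=10, low=0.1, high=0.9):
    """Seconds the envelope takes to rise from low*peak to high*peak.

    The 10%-90% rise criterion is an assumption borrowed from common
    signal-measurement practice, not something defined in the case study.
    """
    peak = max(envelope)
    if peak == 0:
        return 0.0
    first_low = next(i for i, v in enumerate(envelope) if v >= low * peak)
    first_high = next(i for i, v in enumerate(envelope) if v >= high * peak)
    return (first_high - first_low) * window_ms / 1000
```

Comparing `attack_time` for a whispered word against the same word spoken at normal volume would give a rough measure of how much more gradual the whispered onset is.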

Figure 4. Frequency analysis of audio clip

Audiology research on ASMR has shown that human-generated sounds that are less spatially diffuse and perceived as closer to the listener, indicated by a lower inter-aural cross-correlation coefficient (IACC), are more likely to evoke ASMR sensations. The frequency analysis of the recorded ASMR audio reveals a notable presence of higher frequencies (see Fig.4), suggesting that the microphone is capturing subtler noises, such as breathy whispers and lip smacks. This indicates that the ASMRtist is positioned close to the mic, allowing for a more intimate auditory experience. Through this vocal delivery, which combines whisper tones with deliberate pacing between words, the listener feels closer to the speaker, and this increases the likelihood of experiencing ASMR.
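For readers unfamiliar with the IACC mentioned above, it is conventionally defined as the maximum absolute value of the normalized cross-correlation between the left-ear and right-ear signals over lags of about ±1 ms. The sketch below illustrates that definition only; it was not part of this case study's analysis. It assumes two equal-rate channels given as lists of floats, and the function name `iacc` and its parameters are hypothetical.

```python
import math

def iacc(left, right, sample_rate, max_lag_ms=1.0):
    """Max |normalized cross-correlation| of two channels over +/- max_lag_ms.

    Illustrative sketch of the conventional IACC definition; `left` and
    `right` are assumed to be sample lists recorded at the same rate.
    """
    n = min(len(left), len(right))
    left, right = left[:n], right[:n]
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0:
        return 0.0
    max_lag = int(sample_rate * max_lag_ms / 1000)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(n):
            u = t + lag  # index into the right channel, shifted by the lag
            if 0 <= u < n:
                acc += left[t] * right[u]
        best = max(best, abs(acc) / norm)
    return best
```

Identical channels give a value of 1, while channels that differ strongly between the ears give values closer to 0.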

Digital Media’s Constraints on ASMR

Although digital media has allowed for ASMR to evolve by leaps and bounds, no technology is without its constraints.

One digital constraint ASMR faces is its lack of accessibility for people with disabilities, for example viewers who are hard of hearing and may still want to watch ASMR videos for their visual triggers, accompanied by subtitles. Many ASMR videos either lack subtitles or have only automatically generated ones. We calculated the word error rate (WER) of a roughly five-minute clip of the video (10:19-15:08) by comparing an AI-generated transcript against an authoritative, accurate transcript; the WER came out to 34.79%. This high error rate could be attributed to the fact that YouTube compresses audio when a video is uploaded, which can degrade or distort the audio. This distortion may result not only in inaccurate transcriptions but also in an inadequate ASMR experience, since every little sound and silence matters.
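WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the automatic transcript into the reference, divided by the number of words in the reference. The sketch below shows that calculation using a word-level edit distance. It is an illustration, not the exact procedure behind the 34.79% figure: the function name `word_error_rate` is hypothetical, and the normalization used here (lowercasing and splitting on whitespace, with punctuation left in place) is an assumption that would affect the result.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference.

    Assumed normalization: lowercase and split on whitespace only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences for illustration, not lines from the video.
reference = "please water the moonflower seeds tonight"
hypothesis = "please water moon flower seeds"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the automatic transcript contains many extra words.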

Another potential reason for inaccurate transcriptions is ASMR’s soft-spoken nature: speech is often whispered, may not be clearly enunciated, and is frequently accompanied by mouth sounds, tapping, and other sounds that are difficult to transcribe. Furthermore, automatic speech recognition (ASR) systems are trained on large datasets of audio recordings, many of which are likely not whispered, which means that ASR may not be well equipped to transcribe ASMR videos. Ultimately, this high WER excludes a population of viewers who rely on subtitles to enjoy these videos.

Conclusion

ASMR’s emergence as an internet phenomenon can be attributed to the accessibility of digital media, which has allowed ASMRtists to experiment with sound design, refine techniques, and build global audiences with minimal production barriers. Through waveform and frequency analyses, we observed how ASMRtists manipulate silence, whispering, and sound design to create intimate and engaging content. However, despite its accessibility in terms of production, ASMR’s reliance on audio poses challenges for viewers with hearing impairments due to inaccurate transcriptions and YouTube’s audio compression. If we had more time, we would explore how ASMR interacts with accessibility. Further research could focus on developing more accurate transcription models tailored to whispered audio.

Our takeaway is that ASMR isn’t just an Internet trend: it is a modern expression of auditory traditions both old and new, shaped by our current digital environment. ASMR reflects historical auditory traditions, such as whispered storytelling, vocal pacing, and the use of sound to create emotional or physiological responses, practices seen in the early years of radio dramas and television. These sounds, once used for storytelling and atmosphere, eventually became recognized for their relaxing and tingly effects, leading to the formation of an online ASMR community. As ASMR continues to become more popular, understanding its origins and constraints can help us appreciate how sound interacts with technology to shape our sensory experiences.