Learning by listening
Why you should learn a language through listening rather than by reading a textbook.
Here, I am going to explain why you should learn a language mostly through listening rather than through written text. I will also dive deeper and demonstrate how too much emphasis on written text can directly prevent you from achieving true mastery of the language. In other words:
Attempting to learn a language exclusively from books can interfere with natural acquisition.
Key points:
If we begin our “language learning journey” with written language, we are forcing the brain to store the language in a part of the brain called “the visual cortex.” We are essentially learning to recognize “drawings” of words.
Language, however, belongs in the auditory centers of the brain.
When you learn through listening, you activate deeper and older systems in the brain that are specialized in detecting nuances, pitch, and rhythm.
Visual Memory vs. Auditory Memory
Speech and sound are the primary forms of human communication. They have been with us for hundreds of thousands of years, while the world’s oldest written language is just over 5,000 years old.
Although 5,000 years sounds like a very long time, our species, Homo sapiens, is about 100,000–300,000 years old. The genus Homo, which includes Neanderthals and our earlier hominin ancestors, traces its roots back 1–2 million years.
This means that for the vast majority of human history we have had no written language. We still engaged with symbolism and decoded weather patterns, but this processing in the visual cortex was minuscule compared to what the later invention of written language would demand. We have listened and spoken for far longer than we have used written language.
I am not speaking against the invention of written language! I am just trying to convey the idea that the brain did not develop to acquire language through the written word. It is meant to acquire language primarily through sound, and here I will try to argue why.
The History of Written Language
Literacy only became widespread relatively recently in human history. Just over a hundred years ago, the majority of the world's population was still illiterate. Remarkable traditions, such as Buddhism and the advanced mathematical knowledge of the Indians (called “Gaṇita”), were preserved through oral tradition for centuries.
Indians discovered many of the formulas that we later attributed to Western mathematicians, but they stored them in the form of verse (so-called sutras) rather than in writing. They considered knowledge to be more secure in a living memory than on a static piece of paper and saw no reason to write it down.
Of course, this meant that somewhere along the way the identity of the original discoverer was forgotten, but perhaps the precise person who “flicked the switch” never really mattered.
Socrates
If we look closer to Europe and North America and trace our intellectual history to Socrates, we find an interesting view on writing. This key thinker famously refused to write down his teachings, preferring instead the living dialogue of the Hellenic philosophical tradition.
He warned that written language would weaken the memory, as people would begin to rely on external symbols rather than their own mental capacity. He believed that reading provided only a superficial knowledge; one could read a great deal but understand very little. He pointed out that text can neither answer questions nor engage in a living dialogue. Although Large Language Models (AI) are changing this—since text has finally become interactive—the main point remains:
A “static” text in a textbook can never teach you a living language.
Here, I will argue why really learning a language must involve training yourself to hold a dialogue with a native speaker, using concepts different from those you are used to and different movements of the tongue. In other words, you need to create a feedback loop between yourself and native speakers of that language instead of trying to memorize visual representations of their words.
Written Language is Only a “Visual Representation”
It is important to realize that written language is only a tool for putting ideas into a more permanent form. In reality, written language is merely a representation of the language. It is not the language itself, much like a map is a representation of a landscape but not the landscape itself.
The terms we use to refer to language tell us this very clearly: the Icelandic word tungumál and the Latin word lingua (which is the root of the English word language and the French word langue) both refer directly to the tongue as an organ. A language is something that requires the movement of the tongue; it is living, it is physical, and it is acoustic (it is meant to be spoken, not merely looked at).
Therefore, I argue that it is more natural and effective to learn a language through listening—and subsequently speaking—rather than placing too much emphasis on learning the written representation of the language.
Now, here is the main point I want you to remember:
If we start with written language, we are forcing the brain to store the language in the visual cortex. We are learning to recognize “drawings” of words. But language belongs in the auditory centers of the brain.
When you learn through listening:
You activate deeper and older systems in the brain that are specialized in detecting nuances, pitch, and rhythm.
You form a direct connection between meaning and sound without taking a detour through the visual interpretation of letters.
You “save” the language in the correct format so that it is functional for real-time conversation with people.
My Experience with Chinese
I personally know the cognitive load of learning a language through the written form. I experienced this during my initial studies of Mandarin Chinese at a language school in Taiwan. We were expected to take this massive detour through our visual memory:
See a visual symbol (汉字).
Learn the romanized pronunciation (Pinyin).
Connect it to the English translation.
Finally, translate it into your mother tongue to grasp the meaning.
Here the student has to start by conjuring up an abstract symbol for a small unit of the language, which is often meaningless on its own. Contrary to what many think, many Chinese characters do not have a well-defined meaning. The student then has to connect that abstract symbol to a phonetic representation of its sound, then connect the character and its phonetic representation to an English translation (since most textbooks are written in English), and finally connect it to a word in their own language, sometimes via its written form too if it is an abstract term or technical jargon.
By the time the student has moved from the visual character to the Pinyin to the English and finally to the meaning, the dynamic moment of the conversation has passed: the native speaker has shifted the conversation or maybe even started scrolling TikTok videos.
Compare this to learning a word in a real-world context in conversation or an interaction with a local: Imagine someone pointing at an overweight friend and shouting: “Pàngzi!” (胖子).
In that moment, two things happen that no book can teach you:
Sound and meaning merge: You connect the sound directly to the person and realize it means “fatty” (or “butterball”) without having to “look it up” in your mind.
Cultural literacy: You immediately sense that in this culture, it is not considered unusual or even rude to comment on someone being overweight.
You will never forget this word because you learned it through experience and context rather than as a static symbol on a page. It is stored in the brain’s auditory centers as part of a living reality, not as an image in your visual memory that you have to recall.
Learning a language from a book, or from writing in general, is like trying to learn the piano by only looking at sheet music and never playing or even hearing someone play the instrument. You might understand the theory, but you will never achieve the flow.
But I understand more than I can speak!
I often hear people say this, but when I start quizzing them on the contents of the conversation, it turns out they hardly understood much. They may have understood a few words but failed to grasp the overall meaning or even the context. Just think about how certain words in a sentence can contain most of the meaning.
Compare these two sentences:
“You are a traitor!”
and
“Traitor!”
In a heated confrontation, the word 'traitor' contains 90% of the meaning. Even if there are three other words in the sentence, the core message is concentrated in that one term. If you don't understand that specific word, you understand nothing.
This can all change if the context changes. For instance, if the pronoun ‘you’ is said with an angry face to a specific person in a crowd, then that personal pronoun suddenly gains immense weight. We learn that this person must have done something wrong.
This, I believe, demonstrates one reason why a language must be lived to be truly acquired. Understanding meaning often requires physical presence and partaking in a shared experience, not just mental exercises such as memorizing and recalling.
Visual Pollution
One of the biggest drawbacks of starting your language journey with written language is what could be called “visual pollution.” When we see a new word written down—especially in languages like French or English, where spelling and pronunciation rarely align—the brain automatically tries to pronounce it based on the letters it recognizes from our mother tongue or another language we know.
This creates a “flawed sound-image,” conjured up in our minds by trying to attach sounds to visual images, and it can be incredibly difficult to correct later. However, if we make sure to get more auditory input and hear the words repeatedly in different contexts, we will start to absorb them into the auditory centers of the brain, which is much better for active recall during actual conversation with a native speaker.
So it is of primary importance to become proficient in everyday conversation in that language before you start diving into technical terms, which might actually require you to store them in visual memory simply because of how rarely they are encountered.
I would even argue that everyday conversation is the true foundation of a language. It is not a unit or an item like a “word”; it is partaking in the process of human interaction in a new cultural context and in a new language.
On a more practical level, this means that before ever seeing a word or a phrase in writing, you get to imprint “a functional version of the pronunciation” directly into your auditory memory, grounded in the fact that somebody understands you when you use that pronunciation. This will prevent “visual pollution” from the spelling from later distorting your speech, because, due to the way our memory works, we can keep these visual images of words for a very long time.
My Experience of Learning French
I experienced this firsthand when I was learning French on my own. I became conversationally fluent without opening a single textbook. I used the Michel Thomas method and supplemented it by recording phrases from French travelers while working as a guide. By recording native speakers saying the things I wanted to be able to say and then listening to these phrases over and over again (as described in this article), I learned to produce the sounds accurately enough to be understood in real conversations.
When I finally started reading French, something remarkable happened: I immediately saw an “internal logic” in the spelling because I already knew the pronunciation. While written French is certainly distinct from its pronunciation, there is a deep consistency in how the words are written. Instead of the written language being a complex puzzle to solve, it became a simple encoding of what I already knew.
I still want to point out that I am not particularly fluent in French today, because I haven't spoken it for a long time and it never became as big a part of my life as Thai and Chinese. Still, I can hold simple conversations and chat with children and elderly people who don't speak English. This has often been useful to me when I get French tourists on my tours.
Why most people will still prefer written language
Even though people have often complimented me on my language-learning abilities, I have always maintained that it is just a method anyone can adopt, and I have tried to explain a practical approach to this method here in detail.
Still, what I have noticed is that most people prefer to focus on learning through the written language. I think it feels more like actual acquisition: you are gaining something, covering some material, because you hold it temporarily in your pictorial memory and the images regularly pop up. Learning through listening alone doesn’t really give you that feeling of progress in the short term.
Also, constantly trying to have simple conversations makes people feel silly or unintelligent. However, you just have to push through that phase and put your ego aside. Keep having conversations where you are not freestyling too much and not inventing your own grammar by puzzling together words in a way that feels logical to you.
Just keep copying other people and learning their phrases. When you hear a phrase from a native speaker that you think might be useful in the context of your own life, record it and listen to it again and again. It might be silly and embarrassing at times, but just don’t give up; it will be a very rewarding journey!
Language acquisition of children
Now that we have established how important listening is, I want to underscore that you have to listen to the appropriate material. This is material you can mostly understand from context and that is not too abstract. Don’t start listening to a podcast on economics on day one. Then you can slowly start having simple conversations where you are more concerned with listening and getting feedback than with performing monologues on your own.
You have to do a lot of listening. Just consider how children learn a language. They spend years immersing themselves in the sounds of the language before they ever speak a single word. Then one day they produce their first word, and as soon as that happens they are unstoppable. Long before they learn to read or write, they are capable of relatively complex conversations (considering the complexity of their lives). This is the sequence our brains are evolutionarily hardwired for:
Listening → Understanding → Speaking → Reading → Writing
When we attempt to reverse this order by starting with the written form of the language, we are essentially working against millions of years of evolutionary history and the brain’s natural programming.
Automatic Language Growth (ALG)
In Thailand there is a language school called the “AUA Language Center” that operates on precisely this philosophy I have been describing. At AUA students are encouraged to listen for 800 to 1,000 hours before attempting to speak a single word or read a single letter.
This method, known as Automatic Language Growth (ALG), strictly warns against using written language or drilling grammar rules during the initial phase. Attempting to anchor sounds to symbols too early creates visual pollution that obstructs natural language acquisition.
Instead, students spend the first 800 hours watching and listening to the teachers tell stories and converse using gestures, drawings, and context. Even though the teachers speak Thai the whole time, the material is simple enough that most of what they say can be understood from context.
By mimicking the language acquisition of a child the adult brain absorbs the language naturally. When these students finally begin to speak, they do so with a near-perfect accent and intuitive flow because they have already been exposed to the language before ever trying to produce it.
Our Innate Ability to Mimic Sounds
All human beings possess an incredible innate ability to mimic sounds from their environment. We see this in birdwatchers who learn to whistle complex birdcalls, and in tribes within the Amazon rainforest who can mimic animals with such precision that they can deceive them during a hunt. There is nothing supernatural about this ability; it is a fundamental human trait that serves as the very foundation of our ability to speak. If we could not do this, language would not exist.
The Key: Environment and Repetition
The most important factor when trying to acquire a language in this way is hearing the same sounds repeatedly across many different contexts. We must create the right environment to activate our innate ability to naturally absorb a language. This means listening to material that is challenging yet simple enough for us to follow the main thread. It must contain some new words that can be understood mostly from context or just make sense to memorize at the current stage. The material could also just contain unfamiliar uses of known phrases or words. By immersing ourselves in native content that is challenging yet mostly understandable the brain automatically begins to decode the language without needing to memorize grammar rules or entire dictionaries. I go deeper into this in the section on comprehensible input.
Learning through listening
I hope it has become clear why I recommend using sound as the primary tool for acquiring a new language. While the written word can certainly serve as a helpful aid for reviewing or reinforcing specific vocabulary, it should never be the main focus at the beginning of the journey.
If we begin with the written word and process language primarily through the visual cortex, we become trapped in a cycle of retrieving information from the wrong part of our brain every time we try to speak.
It is like storing your data on an old, slow hard drive with immense latency, so that every time you need to say something, the “read head” of your brain must jump between locations to translate symbols into sounds. By prioritizing listening, however, you store the data in the “RAM” of your speech centers, where the response time is instantaneous and the flow becomes natural.
The Brain Has Hardly Changed
Biologically, our brains have remained largely the same in the 5,000 years since the invention of written language. If you further consider that literacy only became widespread around a hundred years ago, you can see that this is not long enough for any meaningful evolution of the brain to occur.
Still, we have adapted culturally to become masters of symbolic thought. This ability allowed us to evolve symbols into phonetic written languages and numerical systems, and then those languages and numbers into programming languages that command silicon and metal, creating sound and images from thin air.
We now even have Large Language Models (AI) that generate complex text autonomously. It is interesting to note that even these systems learn language primarily by “listening” to data, analyzing patterns and context within massive amounts of information, exactly as I have described here.
AI models do not learn by piecing together individual words using grammar rules from textbooks. They mimic human speech by using probability to find the next logical sound or word within a specific context, precisely as we do when we allow our brains to function without the interference of the written word.
If you are interested in continuing on this topic and want more practical steps on how to implement this in your language study, I recommend reading my other article, “Stop learning vocab and grammar and start listening.”
For a practical approach to how you can make use of this, I recommend reading my article on the key elements of language learning.

