To add another element, I think sentences in any language can be seen as made of two kinds of elements:
- The “fillers”, the “set phrases”, the “common phrases”: “There is a”, “As far as I know”
- The “contextual”, “meaningful” words: “Bird”, “Fly”.
If you only learn vocabulary in a vacuum, you get very good at recognizing the meaningful words, but since those are drowned in a lot of “fillers”, it feels like everything is gibberish.
Conversely, if you immerse a lot, you might get very good at recognizing those fillers but have no clue what the topic of the sentence was.
Let me give you an example:
“今日はこんな感じで終わろうかな” https://youtu.be/AiBz-TZbPk4?si=jOIu_Qz3v6BIHj9X&t=688
This is 90% filler. By listening to things like “今日は” and “こんなかんじで” again and again, you’ll just process them as a “unit” and not really as “words” anymore.
Now, when you listen to something like
“本当に尊敬する。”, https://youtu.be/AiBz-TZbPk4?si=BLC_0_MMvke4iZXa&t=469
the 本当に and する should not even need processing: you hear them, and you know the emotion is not “truly” but 本当に. You also know that it ends with する, which gives you a very good sense that what came before was a suru verb, since it had no particle. So now you have a bit more “brain power” to interpret the “尊敬”, そんけい. You know where it starts, you know where it ends, so you know そんけい is the word. So now you can think about “what is the word?”
So you see, when you give your example “鳥は空に飛ぶ”, the problem is that it’s a very meaning-heavy sentence. Every single word carries some crucial piece of the meaning, and it’s difficult to know where something might or might not start. So I think that even though it’s a very “simple sentence”, it might also be difficult to hear, because every word chains into the next very fast, and your brain has no time and no clue to really see what starts or ends where.
So in my opinion, the fix is as simple as just continuing to listen, so that your “elementary bricks” of understanding become longer and longer. For example, in the podcast I linked, she often says at the end: 聞いてくれてありがとう. At first, I might have had to think “Ah yes, きいて is the て form of きく, くれて or くれる, so I guess it’s the grammar point て+くれる, with ありがとう at the end”. Nowadays, I just hear “きいてくれてありがとう” as one big block. And since I don’t need any “processing power” to decipher this, my brain is more “ready” for new, more complex things, like “meaningful standalone terms”.
Hope that makes a bit of sense.