Differentiation theory describes the way in which we make sense of information. One could draw finer distinctions between "information", "data", "knowledge", and so on, but the point is perhaps best illustrated by something like a Magic Eye puzzle, in which the viewer crosses or unfocuses the eyes and sees a 3D sailboat. Some people, who know how to look for the sailboat, can see it; others cannot. The procedure for picking the "sailboat" information out of the apparent mess of colors and shapes is approximately what differentiation theory describes.
An example involving intermodal perception might be a party, classroom, or other crowded environment with many people talking. At first, the room is merely a jumble of sounds; we might perceive the occasional single word spoken emphatically, an outburst, or a speaker with a distinctive voice or inflection. We might also get an idea of which language is being spoken even if we cannot clearly make out the words. The differentiation process would begin by choosing a method of isolating information. One might take an intermodal path and look at a speaker, so that lip-reading or the interpretation of body language can be combined with isolating the sounds made by that particular speaker. Alternatively, one might close one's eyes, restricting attention to fewer channels of input and thereby concentrating better on isolating the information presented in speech.
An example more specific to hearing might involve listening to a recording of a speaker in a foreign but familiar language (such as German or Spanish) in which inflections typically carry the same meaning as in the listener's own language. The listener, if asked to guess what the speaker is saying, will differentiate familiar information from unfamiliar. For example, they will not know, or even be able to guess, what words like "Krankenwagen" or "gallinero" mean, but they will probably be able to guess the mood of the speaker or the context in which the speech was delivered. They can probably determine the speaker's gender, age, and fluency as well. With a sufficiently long sample, they may also be able to determine whether the language contains unfamiliar sounds, recognize common words, and identify root words or cross-language cognates. Thus, despite not knowing the language or the information communicated by the literal meaning of the words, the listener can still acquire information.