Internalization of Speech: Pronunciation and Perception of the Word

Various philosophers in ancient India discussed the role of language. Among them, Bhartṛhari considered the relation between the superficial appearance of speech and its essential nature. In actual life, we pronounce and perceive the word. He held that there must be some link between ideal logic and worldly truth. His focus in the Brahmakāṇḍa of the Vākyapadīya is on the process of communication, that is, the process of the internalization of speech. He differentiates the perspective of the speaker from that of the hearer, and explains the movement of sounds. The sphoṭa theory addresses both how to pronounce the word and how to perceive it. Traces of his discussion are found in the Sphoṭasiddhi, a work of his follower Maṇḍanamiśra.

see in his distinct use of the terms nāda and dhvani, which elsewhere are used synonymously to mean sound of any kind. The understanding of meaning, whether as speaker or as hearer, is necessarily related to our consciousness. That is, at some point the word exists in our minds. Nonetheless, there is as yet no clear account of the process of its internalization.
In this paper, I will consider two questions: 1) what exactly the difference is between the external and the internal aspects of the word for Bhartṛhari and Maṇḍana, and 2) how the Grammarians' theory of language explains the process of the internalization of speech. 1 I will proceed as follows. In section 2, I will illustrate the relation between sound and the word, and try to show the basic structure of the word using the Grammarians' discussion as a clue. In section 3, I will focus on Bhartṛhari's sphoṭa theory and try to show how he defined sound. Finally, in section 4, I will illustrate how Maṇḍana discussed the manifestation of sphoṭa in his SS and how he followed Bhartṛhari's discussion.
Pāṇini stated in the above rule that if it is not yet related to some other object as its meaning, its own word- or sound-form (śabdasvarūpa) is the primary referent. Regarding this issue, Haradatta, who is a Grammarian in medieval times (7)(8), interpreted the word's own form (śabdasvarūpa) as the universal (sāmānya: generic concept) of different individual variations of one word. 8 The word agniḥ can be pronounced in different ways, namely with different tones, tempos and so on, but it has its own essential form which is the basis of all the variations. Its own word- or sound-form is its most essential 'referent' (vācya: meaning to be referred). And I say that it is this (inner and essential) form of the word or sound that is sphoṭa. All words consist of their own word-form (svarūpa = sāmānya) and the modalities which make them appear differently. Therefore, every word has sphoṭa as the core of its existence. The same thing is also claimed by Bhartṛhari using the concept of "the word's generic form" (śabdākṛti). 9 Neither is the actual (or superficial) form of the word; both are that which is perceived as its true nature, though decorated by sounds. And accordingly, the previous definitions of śabda change slightly:
[Condition 1] When the linguistic convention of a śabda is known,
[1] The śabda is connected both with its own form (svarūpa = ākṛti = sphoṭa) and with its meaning. Such a śabda is equivalent to pada or vākya, namely "the word."
[Condition 2] When the linguistic convention of a śabda is not known,
[2] The śabda is connected only with its own form (svarūpa = ākṛti = sphoṭa). Such a śabda is merely dhvani "sound."

Sphoṭa Theory of Bhartṛhari
The concept of sphoṭa is developed from the idea of śabda in the MBh. Although Patañjali himself did not give a clear explanation of sphoṭa, Bhartṛhari refined this into an elaborate philosophical theory in the VP. After Patañjali's examination of śabda, the Grammarians, including Bhartṛhari, no longer gave any importance to the "phoneme" (varṇa). Instead, sound (dhvani) was emphasized as the cause of manifestation of the word (śabda, more precisely śabdasvarūpa = sphoṭa), beginning, it seems, with Bhartṛhari.

Accumulation of Sounds: External and Bodily Sounds
Bhartṛhari differentiates the usage of the terms dhvani and nāda. According to his explanation in the auto-commentary on I.47, dhvani is the external sound which pervades space (vyoman) and is the fundamental cause of the manifestation of sphoṭa. On the other hand, nāda is the internal form of sound (= bodily resonance) which has been accumulated by the speech-organ. Let us start with the following verse: 10
VP I.47 (p. 105, ll.1-2): 11 The [conceptual word (=sphoṭa)], 12 which has been ascertained (vitarkita) by the intellect before [the utterance] and which has been made to reside (niveśita) in a particular meaning (= a word-form is assigned to a particular meaning), is seized (anugrah) through sound (dhvani) which has been transformed (vivṛtta) [into bodily resonance] by the speech organs.
The content of this verse is split into two phases: 1) the process of residence, namely, pervasion (vyāveśa) by the word-form of the referent (I.47ab), and 2) the process of pronunciation (I.47cd). And as for the second phase, the process of the actual pronunciation of the word is explained by Bhartṛhari as follows: Auto-commentary on I.47 (p. 105, l.6-p. 106, l.3): 13 Regarding "which has been transformed by the speech organs" [in the verse]. Indeed, the reality of the word (śabdatattva) which is not characterized by the transformation (vikriyā) is transformed according to the external sound (dhvani) which is characterized by the transformation. Then when the subtle external sound, which pervades [the ether], is accumulated by the function of the [speech-]organ, it has been transformed as the gross bodily resonance (nāda) which is the object of perception just like a cloud compacted [ Here dhvani, if it pervades everywhere, should be also inside our body. In that sense, the translation "external sound" is not precise. Regardless of this danger, I keep translating it as "external sound" in order to differentiate from sphoṭa as well as nāda. Those subtle, external sounds are developed/transformed into the gross bodily resonances when accumulated by the speech-organ. As we can see from the term "before [the utterance]," this is the explanation of how to pronounce the word. The word is manifested by the subtle, external sounds which have been transformed into the gross, bodily resonances. Subtle, external sounds are pervasive but imperceptible, while gross bodily resonances are perceptible. Pronunciation is the process in which the speech-organ accumulates subtle external sounds and transforms them into the perceptible entities. And because of the sequentiality of bodily resonances, Bhartṛhari explains, we feel that the word is sequential. 14 The bodily resonances appearing inside the speaker manifest sphoṭa. 
That means, in the case of the speaker, that there is a causal relationship between the bodily resonances and sphoṭa. And since the bodily resonances have their own sequence, sphoṭa is affected by this and appears to be sequential, although in reality it has no sequence. This is because sphoṭa is single and fixed (nitya).

Classification of External Sounds and Bodily Resonances
Now another question arises: how exactly are dhvani and nāda different? In the following definitions of Bhartṛhari, we find that each of them is classified into two kinds, namely primary and secondary. 15

Classification of External Sounds
External sounds are first defined by Bhartṛhari as subtle particles pervading the ether. How, then, does he think of actual sound, which is also called dhvani? In the following passages, Bhartṛhari proposes two kinds of dhvani, both of which are actual sound, distinct from any imperceptible entity: primary, external sound and secondary, external sound. First of all, we notice that the word grahaṇa "grasping" is used here. So we should change our perspective from the speaker to the hearer. And accordingly, dhvani is taken as the sound to be perceived, in total contrast to nāda which is the sound to be uttered. Just like the bodily resonances affect sphoṭa when the speaker pronounces the word, now the external sounds play the same role: they affect sphoṭa when the hearer perceives the word.
Each word exists without the delimitation of time or size. We generally feel that the time required to pronounce gauḥ is shorter than that required for devadattaḥ, but such a difference of size does not belong to the word's own form; it is caused by the primary sound. The primary, external sound is referred to by the Grammarians as the cause of the manifestation of sphoṭa. On the other hand, the secondary sound is the cause of the differences in intonation, pitch, accent, or tempo (= vṛtti: any kind of modality). We can readily understand that the physical sound or tone that differentiates a word is considered to be secondary. The modality of the secondary sound is the cause of continuous perception (prabandhanimitta), and it ensures the continuity of the manifestation of sphoṭa. This whole discussion, however, is limited to the side of the hearer. After the hearer perceives the word together with the sound, sphoṭa becomes manifest in his mind. But the perception of the word is inevitably influenced by the external sounds, which were bodily resonances on the side of the speaker.
Bodily resonance is also divided into primary (prākṛtanāda) and secondary (vaikṛtanāda). And we find that the explanation of these two is almost the same as that of the classification of the external sound: Auto-commentary on I.101 (p. 166, l.3-p. 167, l.2): 18 In the duration of the fixed (nitya) things, there is no function of the capacity of time as assistant. As for all these sphoṭas, those we call varṇasphoṭa, padasphoṭa, and vākyasphoṭa, in mundane reality they do have a nature that is tracked by the intellect between two limits, prior and posterior. But [in reality] there is no difference between them as to duration, regardless of how large or small they are. They do not have different durations [themselves]. But when we become aware of them, we falsely attribute to them the duration of our perception of them. Regarding "the time of operation as well as that of one's own" [in the verse]: The primary bodily resonance is that of which form of duration is superimposed, due to the non-distinction [between the dhvani and the sphoṭa], onto the body of the word, and which is the cause of establishing the worldly cognition/expression regarding the distinction of time of short, long, and prolated vowel. On the other hand, the secondary bodily resonance brings about the respective establishment of external time of conditions such as fast.
Now we encounter a problem: the explanations of these two bodily resonances also refer to the state of perception (upalabdhi), and if we take this perception as the hearer's perception, dhvani and nāda would be identical. Therefore, in order to keep logical consistency, this perception is to be taken as that of the speaker. This reading seems workable: as the speaking difficulties of deaf persons show, some aspect of perception is also required of a speaker.
Here, I propose that we understand nāda ('bodily resonance') as the sound on the side of the speaker, and dhvani ('external sound') as that on the side of the hearer. This understanding differs from the one given for I.47 above: there dhvani is the subtle external sound pervading the ether, while nāda is the gross bodily resonance transformed from dhvanis. The contrast between dhvani and nāda is thus on the one hand that of 'subtle' and 'gross,' and on the other hand that of 'the hearer's side' and 'the speaker's side.' But a common feature runs through both. Namely, nāda is always related to the body, or to pronunciation in the speech-organ, and dhvani is related to what is outside the body, or to perception of the external world. So Bhartṛhari's usages are consistent, even though he goes on to introduce different opinions about the word and sound, some of which take these concepts differently. This certainly means that such definitions of sounds were controversial even in his time.

Manifestation of Sphoṭa
Another question arises: how do we know sphoṭa? Or do we really perceive the aspect of sphoṭa in the word? Bhartṛhari thought, I surmise, that 1) such a single, indivisible conception is possible only in our awareness, and that 2) it is to be perceived gradually although its form is single. Thus, Bhartṛhari proposes the schema of the manifestation of sphoṭa in such a way that the unanalyzable cognition (anupākhyeyajñāna) becomes clearer and clearer: In the same way (just as the memory of a vedic verse or a verse in ordinary speech is strengthened by its repetition), through the unanalyzable cognitions that are in conformity with the grasping [of sphoṭa], the [word's] own form is ascertained when the word is manifested by the sounds (dhvani). In the intellect into which the seeds are imparted by the bodily resonances (nāda) and which has reached maturity through repetition, the word (the word in the mind = sphoṭa) is ascertained together with the final sound. Different sounds are first specified by individual efforts of the speaker who intends to pronounce a particular word. The initial sound already manifests the unitary cognition in its entirety. At that point, however, it remains quite unclear as well as ambiguous, and is designated as 'unanalyzable cognition' (anupākhyeyajñāna). This in turn generates the impressions (bhāvanā = saṃskāra) or the seeds, whereby as subsequent sounds are produced, the unanalyzable cognition is made clearer and clearer. As this process is reiterated, the pronunciation of the final sound produces the cognition that embeds the utterly clear image of the word-form (śabdasvarūpa), that is, sphoṭa. In this way, the form of sphoṭa is gradually made clearer by each impression until it is completely manifested.
Bhartṛhari emphasized the close connection between the speaker and the hearer by using the terms nāda and dhvani. Sounds are derived from the speaker's utterance. They are transferred from the speaker to the hearer. As soon as the hearer perceives the physical sounds uttered by the speaker, the latent traces arise in his intellect. Bhartṛhari's sphoṭa theory therefore focuses on communication, which necessarily involves both sides.
phonemes and the understanding. Against this position, Maṇḍana insists that phonemes cannot be the cause of understanding because they have sequence and cannot co-occur. The phonemes cannot convey the meaning singly, nor can they act together. Therefore, the unified meaning cannot arise from them. In the same manner, he emphatically refutes the view that latent impressions can become the cause of the understanding of meaning, either directly or indirectly.
Though most pages of the SS are devoted to the criticism of the varṇa theory, in this paper we do not discuss how Maṇḍana responds to Kumārila's objection. In the middle of the SS, when he proposes the process of the manifestation of sphoṭa, Maṇḍana explains it following Bhartṛhari's system. Let us have a look at how it is the same as or different from the VP's statement. To explain, the efforts [of articulation], whose various forms are being directly perceived through the function of the mind which ascertains the effort that produces (samutthāpaka) the word, always discriminate (vyāvṛt) sounds, by depending on (āyatamāna) [the efforts] themselves, as being based on their intrinsic nature. Therefore, different words do not always appear because they are manifested by certain (fixed) bodily resonances (nāda).
The verse says that external sounds (dhvani), which are differentiated by the efforts of articulation, manifest the word. And in the auto-commentary it is rephrased as follows: the word is revealed by certain internal resonances (nāda) which have been discriminated from the external sounds (dhvani) by means of the efforts of articulation. Maṇḍana interprets the word vivṛtta in VP I.47 as bhinna and vyāvṛtta. Bhartṛhari rephrased it in his auto-commentary as vikriyā and prāptavivarta, and therefore I took vivṛtta as meaning accumulation and, as its consequence, transformation. However, Maṇḍana's understanding of VP I.47 is slightly different from that. The word vyāvṛt can have a meaning not far from what we understand by vivṛt. But by adding the prefix ā-, this passage suggests Maṇḍana's own idea of the relation between dhvani and nāda, namely that the latter is the effect of exclusion from the former. At least, the idea of transformation from the subtle external sounds into the gross bodily resonances cannot be found in the auto-commentary.
Maṇḍana continues the auto-commentary on v.18 as follows, which is in turn closely related to the VP I. Nor do other bodily resonances (itaranāda) become useless, because of the difference of the manifestation. To explain, to the hearer in whose mind specific latent impressions (bhāvanā) have not yet arisen, the preceding sounds (pūrve dhvanayaḥ) make manifest the apprehensions (prakhyā: undifferentiated perceptions), which grasp the unclear (avyakta) form [of the word] and [at the same time] sow the seeds that are the impressions conducive (anuguṇa) to the The hearer first directly perceives particular physical sounds which are uttered by the speaker. Each physical sound generates an impression in the hearer's mind, which helps the perception of the immediate sound. By means of the latent traces generated in the direct perception, the hearer internalizes those sounds as sphoṭa. Here, we can see that the first part of the commentary speaks concisely, from the perspective of the speaker, about the process of the internalization of the external sounds, and then changes the perspective to that of the hearer. The SS v.18 is in fact a concise summary of the VP I.47 (karaṇebhyo vivṛttena dhvaninā: the speaker's perspective) and I.83 (pratyayair anupākhyeyair grahaṇānuguṇais: the hearer's perspective). However, this summary carries a risk of misunderstanding dhvani. For, as we have seen before, the usages of dhvani in I.47 and I.83 are slightly different: the former is the subtle sounds pervading the ether, while the latter is the external sounds perceived by the hearer's sense-organ. And as far as I have checked, Maṇḍana did not adopt the idea of the subtle sounds explained in the auto-commentary on I.47. Indeed, in the SS, we find that Maṇḍana does not talk in such detail about the perspective of the speaker.
He refers to the speaker only as the starting point of the whole process of communication, and also when he criticizes the oneness of the speaker, which is one of the conditions of the understanding of meaning held by the Varṇavādins. His interest focuses on how the hearer perceives the word, and not on how the speaker pronounces the word. And this makes a great deal of sense because "the process of understanding the word" is not relevant to the speaker: he already knows what he wants to say and makes the effort to pronounce it, and therefore for him, the existence of sphoṭa is evident.

Concluding Remarks: Internalization of Speech
The nature of the word is its conceptual form (svarūpa). This form, or sphoṭa, is the signifier (word) as well as the signified (referent), and is consistent (nitya = siddha) as long as one belongs to a particular language community. And consequently, as long as it is called "the word," it must have meaning. Speech is internalized by the speaker at the time of pronunciation, and is transferred by him to the hearer. Focusing on the former, we see the relation between sound (nāda) and the place of articulation. Focusing on the latter, on the other hand, we see the relation between sound (dhvani) and the auditory faculty. The two processes differ, but the sounds are the same. By carefully examining the usage of nāda and dhvani, we can find how Bhartṛhari thought of the process of communication, that is, the circulation of sounds from the subtle sound pervading the ether to the actual sound pronounced by the speaker.
The sphoṭa theory is the theory of how the word is perceived and understood by the hearer.