Research on linguistic rhythm

Research in this field has often been concerned with the dichotomy of stress-timed vs. syllable-timed languages, which is attributed to Pike (1945) and was later reformulated by Abercrombie (1967). According to Abercrombie (1967), syllable-timed languages such as Spanish and French exhibit isochrony at the syllable level (i.e. syllables tend to have the same duration), whereas stress-timed languages such as English and German exhibit isochrony at the foot level (i.e. inter-stress intervals tend to have the same duration). This theory has, however, been refuted by various authors who carried out instrumental measurements on different languages (e.g. Roach, 1982). Bertinetto (1977) and Dauer (1983) hypothesized that the impression of stress-timing/syllable-timing could come from the presence/absence of certain phonological properties in a language, in particular a) the presence/absence of vowel reduction and b) the presence/absence of complex consonantal clusters; moreover, both of them claimed that the distinction between rhythm groups should not be considered as absolute, but rather as a continuum. On these grounds, authors such as Ramus, Nespor & Mehler (1999) and Grabe & Low (2002) proposed their rhythm correlates (see the links below for detailed information about each of them).

Recently, staff at the laboratory have carried out research in the field of linguistic rhythm, focusing particularly on rhythm metrics (sometimes also called rhythm correlates). Put simply, these are formulae applied to instrumental measurements in order to give a rhythmic evaluation of a language. These pages aim to offer an insight into the most frequently used rhythm metrics and to disseminate the results we obtained.
Further down this page you can find a historical account of the research in this field (most of this material has been taken from Paolo Mairano's MA thesis). By following these links, you can instead read about the most frequently used rhythm metrics (the deltas, the varcos, the PVIs and the CCIs), see the formulae and check the results we obtained on data from several languages. You may also be interested in Correlatore, a programme developed at our laboratory, which automatically calculates rhythm correlates from Praat's TextGrids.

The pioneers

Although the terms “stress-timed” and “syllable-timed” were introduced by Pike (1945), the existence of two different rhythm groups of languages had been noticed earlier. Eriksson (1991) reports that the 18th-century phonetician Joshua Steele had already put forward the idea that stresses in English occurred at fixed temporal intervals. His claims were supported only by intuition since, obviously, no tools were available at that time to provide instrumental evidence. In the 20th century, Lloyd James (1940, quoted in Pike, 1945) distinguished between languages with “machine-gun rhythm” (i.e. syllable-timed languages) and “Morse code rhythm” (i.e. stress-timed languages). Classe (1939, quoted in Bertinetto, 1989 and in Eriksson, 1991) tried to provide experimental evidence of the existence of regular inter-stress intervals, but he had to conclude that they only occurred under special circumstances.

The classics: Pike and Abercrombie

It was Pike (1945) who first used the terms “stress-timed” and “syllable-timed” languages, which are still in use today. He claimed that, in stress-timed languages, the duration of inter-stress intervals is constant and therefore independent of the number of syllables (which are consequently compressed as the number of syllables in an interval increases); in syllable-timed languages, by contrast, syllable duration is constant and the duration of inter-stress intervals is therefore proportional to the number of syllables. However, in his book The Intonation of American English, whose aim is to teach American intonation to foreigners, he provides no empirical tests to support his hypotheses.

Abercrombie not only drew on Pike’s distinction and terminology, but also claimed that “as far as we know, every language in the world is spoken with one kind of rhythm or with the other … French, Telugu and Yoruba … are syllable-timed languages, … English, Russian and Arabic … are stress-timed languages” (1967, quoted in Roach, 1982:73). Moreover, he introduced the concept of “isochrony”, both at the foot level (i.e. the temporal duration of inter-stress intervals in stress-timed languages was believed to be constant) and at the syllable level (i.e. the temporal duration of the syllables in syllable-timed languages was believed to be constant).

The consequence of this was, on the one hand, that

“there is considerable variation in syllable length in a language spoken with stress-timed rhythm whereas in a language spoken with a syllable timed rhythm the syllables tend to be equal in length”
and, on the other hand, that
“in syllable-timed languages, stress pulses are unevenly spaced”
(Abercrombie, 1967, quoted in Roach, 1982:74).
Further studies classified most Romance languages as syllable-timed and most Germanic and Slavonic languages as stress-timed. Moreover, a third rhythm group was discovered, which was based on the mora and included, for instance, Japanese and Tamil.

Roach and other sceptics

Roach (1982) carried out an experimental test based on Abercrombie’s assumption that syllable length tends to be greatly variable in stress-timed languages and equal in syllable-timed languages. His study involved six languages, three of which are normally considered stress-timed (English, Russian and Arabic), while the other three are normally classified as syllable-timed (French, Telugu and Yoruba).

Firstly, Roach calculated the standard deviation of syllable durations in the six languages, assuming that if Abercrombie’s hypothesis (syllable durations are constant in syllable-timed languages but greatly variable in stress-timed languages) was right, its value had to be higher for stress-timed languages and lower for syllable-timed languages. However, the results did not confirm Abercrombie’s hypothesis: for some syllable-timed languages (French and Yoruba) the standard deviation of syllable lengths was indeed lower than for English, but it was higher for Yoruba than for both Russian and Arabic, which contradicts Abercrombie’s statement. In any case, Roach notes that the differences among the values obtained (ranging from 66 milliseconds in Telugu to 86 in English) are too small to justify the classification of a language as syllable-timed or as stress-timed.
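The first measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the durations are invented rather than taken from Roach's data, and the population form of the standard deviation is an assumption, since the text does not say which form Roach used.

```python
from statistics import pstdev


def syllable_duration_sd(durations_ms):
    """Standard deviation of syllable durations, in milliseconds.

    Assumption: population standard deviation (pstdev); the source text
    does not specify whether a population or sample formula was used.
    """
    if len(durations_ms) < 2:
        raise ValueError("need at least two syllable durations")
    return pstdev(durations_ms)


# Invented durations for illustration only (NOT Roach's measurements).
even_syllables = [180, 190, 185, 175, 195]    # roughly equal syllables
uneven_syllables = [90, 310, 120, 280, 100]   # strongly varying syllables

sd_even = syllable_duration_sd(even_syllables)
sd_uneven = syllable_duration_sd(uneven_syllables)
```

Under Abercrombie's hypothesis, a syllable-timed language should behave like the first list (low value) and a stress-timed language like the second (high value).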

Secondly, he calculated the standard deviation of inter-stress intervals in order to test Abercrombie’s second statement, i.e. that the length of inter-stress intervals should be constant in stress-timed languages and greatly variable in syllable-timed languages. According to this statement, one would expect the standard deviation of the duration of inter-stress intervals to be lower for stress-timed languages and higher for syllable-timed languages. But again, the results do not confirm the hypothesis: surprisingly enough, the values calculated for syllable-timed languages (French, Yoruba and Telugu) are all higher than those calculated for stress-timed languages (English, Arabic and Russian).
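The second measure can be sketched in the same way. The sketch assumes that an inter-stress interval runs from the onset of one stressed syllable to the onset of the next; the text does not state where exactly Roach placed the boundaries, and the function names and onset times are hypothetical.

```python
from statistics import pstdev


def inter_stress_intervals(stress_onsets_ms):
    """Durations between successive stress onsets (assumed boundary points)."""
    return [later - earlier
            for earlier, later in zip(stress_onsets_ms, stress_onsets_ms[1:])]


def inter_stress_interval_sd(stress_onsets_ms):
    """Population standard deviation of inter-stress interval durations (ms)."""
    intervals = inter_stress_intervals(stress_onsets_ms)
    if len(intervals) < 2:
        raise ValueError("need at least three stress onsets")
    return pstdev(intervals)


# Invented onset times for illustration only (NOT Roach's measurements).
regular_stresses = [0, 500, 1000, 1500]    # perfectly isochronous feet
irregular_stresses = [0, 300, 1000, 1400]  # unevenly spaced stresses
```

Perfect foot-level isochrony, as in the first list, would give a standard deviation of zero; any uneven spacing of stresses raises the value.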

However, as Roach says, the results of this experiment may have been influenced by the fact that only one speaker per language was recorded and by the difficulty in establishing which are the prominent stresses and, consequently, where the precise boundaries of inter-stress intervals fall. Yet, there seems to be no doubt about the fact that the differences are all too small to justify any conclusions as to the classification of a language into a rhythmic category. Therefore, Roach suggests that Abercrombie’s criteria for the distinction between the two rhythm groups are inadequate and that stress-timing and syllable-timing may only be a matter of perception: “a language is syllable-timed if it sounds syllable-timed” (Roach, 1982:78).

It has to be said that Roach is not the only one who is sceptical about Abercrombie’s statements. Many linguists did not find isochrony at the foot level in Germanic languages, while linguists working on Romance languages did not find isochrony at the syllable level. Many of these studies are reported by Bertinetto (1989) and Eriksson (1991); I shall only mention that of Lehiste (1990, quoted in Eriksson, 1991) since it concerns Icelandic: the author did not find evidence of the supposed stress-timing in Icelandic since the duration of feet turned out to be proportional to the number of syllables. Another interesting hypothesis is expressed by Major (1985, quoted by Bertinetto, 1989), who claims that formal Portuguese shows the properties typical of syllable-timed languages, whereas informal Portuguese shows the properties typical of stress-timed languages.
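The proportionality argument mentioned above for Icelandic can be illustrated with a simple least-squares slope of foot duration against the number of syllables in the foot. If feet were isochronous, the slope would be close to zero; if foot duration grows with the number of syllables, the slope approaches the average syllable duration. This is a sketch of the reasoning only, not the procedure actually used in the cited study; the function name and the data are invented.

```python
def foot_duration_slope(syllable_counts, foot_durations_ms):
    """Least-squares slope of foot duration (ms) per additional syllable."""
    if len(syllable_counts) != len(foot_durations_ms):
        raise ValueError("inputs must have the same length")
    n = len(syllable_counts)
    mean_x = sum(syllable_counts) / n
    mean_y = sum(foot_durations_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in syllable_counts)
    if sxx == 0:
        raise ValueError("syllable counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(syllable_counts, foot_durations_ms))
    return sxy / sxx


# Invented feet with 1 to 4 syllables, for illustration only.
counts = [1, 2, 3, 4]
isochronous_feet = [500, 500, 500, 500]    # duration independent of size
proportional_feet = [200, 400, 600, 800]   # 200 ms per syllable
```

With these invented values, the isochronous feet give a slope of 0 ms per syllable and the proportional feet a slope of 200 ms per syllable.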

The turn of the screw

In the 1980s, many phoneticians abandoned Abercrombie’s hypotheses as it was clear that the theory of isochrony was not supported by experimental data. Dauer (1983) stated that languages cannot be classified as either 100% stress-timed or 100% syllable-timed; rather, they have to be classified according to which of the two rhythmic patterns is predominant and placed along a continuum ranging from total stress-timing to total syllable-timing. Dauer also stated that the perception of a language as syllable-timed or as stress-timed may be the result of the presence or absence of a series of phonological phenomena. Bertinetto (1977) proposed a list of these phonological properties:

  1. Vowel reduction vs. full articulation in unstressed syllables;
  2. Relative uncertainty vs. certainty in syllable counting, at least in some cases;
  3. Tempo acceleration obtained (mainly) through compression of unstressed syllables vs. proportional compression;
  4. Complex syllable structure, with relatively uncertain syllable boundaries, vs. simple structure and well-defined boundaries;
  5. Tendency of stress to attract segmental material in order to build up heavy syllables vs. no such tendency;
  6. Relative flexibility in stress placement […] vs. comparatively stronger rigidity of prominence.
  7. Relative density of secondary stresses, with the corresponding tendency towards short ISI (inter-stress intervals, my insertion), and (conversely) relative tolerance for large discrepancies in the extent of the ISI. This feature seems to oppose languages like English or German on the one side, to languages like Italian or Spanish on the other.
    (Bertinetto, 1977, quoted in Bertinetto, 1989:108-109)

Bertinetto (1989) recognised properties 1 and 4 as the most indicative of stress-timing or syllable-timing, a view which is essentially shared by Dauer (1983). Schmid (2004) also added certain properties, such as the preference for closed syllables in stress-timed languages vs. the preference for open syllables in syllable-timed languages.

Vowel reduction is typically a phonological property of stress-timed languages. It contributes to giving prominence to stressed vowels (and, consequently, to stressed syllables) by shortening unstressed vowels and making their quality less distinct (usually moving it towards the schwa area). By contrast, in languages where this phenomenon does not exist or is not consistent (e.g. syllable-timed languages), unstressed vowels tend to have a length and quality comparable to those of stressed vowels, thus creating the impression that stressed and unstressed syllables are nearly alike in duration.

As for property 4, it is normally accepted that the syllabic inventory is larger in stress-timed languages than in syllable-timed languages. As a consequence, we can state that while syllable-timed languages have a simple syllabic structure (i.e. only light consonantal clusters), stress-timed languages have a complex syllabic structure (i.e. they also have heavy consonantal clusters), particularly in stressed syllables, which are thus given further prominence.

In conclusion, Dauer (1983) and Bertinetto (1977 and 1989) suggest that the more phonological properties typical of stress-timing a language possesses, the more it can be placed near the stress-timing pole of the continuum; conversely, the more phonological properties typical of syllable-timing a language has, the more it can be placed near the syllable-timing pole of the continuum. It has to be remarked that some languages possess properties typical of both rhythm groups: Nespor (1990, quoted in Ramus et al., 1999) notes that Catalan has a simple syllabic structure but allows for vowel reduction, while, conversely, Polish has a complex syllabic structure but does not allow for vowel reduction.
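The idea of placing a language along the continuum according to how many stress-timing properties it has can be made concrete with a toy score. The unweighted count used here is purely an assumption for illustration: neither Dauer nor Bertinetto is reported above as proposing such a formula, and the property labels are hypothetical. Only the two properties singled out as most indicative are used, and the values for Catalan and Polish follow the description attributed to Nespor.

```python
# Hypothetical labels for the two properties singled out in the text.
STRESS_TIMING_PROPERTIES = ("vowel_reduction", "complex_syllable_structure")


def stress_timing_score(properties):
    """Fraction of stress-timing properties present.

    0.0 corresponds to the syllable-timing pole and 1.0 to the
    stress-timing pole. Assumption: every property counts equally.
    """
    present = sum(bool(properties[name]) for name in STRESS_TIMING_PROPERTIES)
    return present / len(STRESS_TIMING_PROPERTIES)


# Values as described in the text (Nespor, 1990, quoted in Ramus et al., 1999).
catalan = {"vowel_reduction": True, "complex_syllable_structure": False}
polish = {"vowel_reduction": False, "complex_syllable_structure": True}

# Idealised poles of the continuum, not descriptions of real languages.
stress_timed_pole = {"vowel_reduction": True, "complex_syllable_structure": True}
syllable_timed_pole = {"vowel_reduction": False, "complex_syllable_structure": False}
```

On this toy scale both Catalan and Polish land at 0.5, halfway between the two poles, which mirrors the observation that they combine properties of both rhythm groups.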

Compensatory shortening

The term “compensatory shortening” refers to the phonological phenomenon by which, in certain languages, the stressed syllable of a foot or word tends to be compressed according to the number of unstressed syllables that follow it in that foot or word. This phenomenon is more precisely called intersyllabic compensation, as opposed to intrasyllabic compensation, by which the phonemes of a syllable tend to be compressed as a function of the number of other phonemes present in that syllable. Intuitively, intersyllabic compensation has been associated with stress-timing, whereas intrasyllabic compensation has been associated with syllable-timing: the tendencies to readjust the length of the syllables of each foot (intersyllabic compensation) and of the phonemes of each syllable (intrasyllabic compensation) are in fact interpreted as attempts to standardise the length of the feet and of the syllables respectively.

As for intersyllabic compensation, it has been studied by various linguists (see Bertinetto, 1989 and 1990, for a summary of many of these studies) and the results seem to confirm that it is a characteristic of stress-timed languages.

As for intrasyllabic compensation, however, the results of some studies do not confirm that it is a characteristic of syllable-timed languages. Vayra, Fowler & Avesani (1987) noticed more intrasyllabic compensation in English than in Italian and therefore suggested that English presents “intimations of syllable-timing”. This view is not shared by Bertinetto (1989), who claimed that “no (alleged) isosyllabic language examined so far exhibits strong inclinations towards intrasyllabic compensation” (Bertinetto, 1989:122). He argued, instead, that intrasyllabic and intersyllabic compensation should be considered different facets of the same property, which is symptomatic of stress-timing. He went further, claiming that “the ultimate difference between iso-accentual and iso-syllabic languages might lie in the different degrees of flexibility they exhibit at all relevant levels of structure” (Bertinetto, 1989:123). Finally, he provided a new list which adds certain properties typical of stress-timed languages to the list presented above, mainly including the various types of compensation.

Essential bibliography
Abercrombie, D. (1967) Elements of General Phonetics, Edinburgh University Press.
Bertinetto, P. M. (1977) “Syllabic Blood”, ovvero l’italiano come lingua ad isocronismo sillabico. Studi di Grammatica Italiana, vol. 6, pp. 69-96.
Bertinetto, P. M. (1989) Reflections on the Dichotomy ‘Stress’ vs. ‘Syllable-timing’. Revue de Phonétique Appliquée, Mons, pp. 99-130.
Bertinetto, P. M. (1990) Coarticolazione e ritmo nelle lingue naturali. Rivista Italiana di Acustica, XVI/2-3, pp. 69-74.
Bertinetto, P. M. & Bertini, C. (2008). On modeling the rhythm of natural languages. Proc. of the 4th International Conference on Speech Prosody, Campinas 2008, 427-430.
Dauer, R. M. (1983) Stress-timing and Syllable-timing Reanalysed. Journal of Phonetics, n° 11, pp. 51-62.
Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for deltaC. Language and Language Processing: Proceedings of the 38th Linguistic Colloquium, Piliscsaba 2003, ed. by Pawel Karnowski & Imre Szigeti, 231–241. Frankfurt: Peter Lang.
Eriksson, A. (1991) Aspects of Swedish Speech Rhythm. Doctoral Dissertation, University of Göteborg.
Grabe, E. & Low, E.L. (2002). Durational variability in speech and the rhythm class hypothesis. In: Gussenhoven, C., Warner, N. (eds), Papers in Laboratory Phonology 7, Berlin: Mouton de Gruyter, 515-546.
Pike, K. L. (1945) The Intonation of American English. Ann Arbor. University of Michigan Press.
Ramus, F., Nespor, M. & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73/3, 265-292.
Vayra, M., Fowler, C. & Avesani, C. (1987) Word-level Coarticulation and Shortening in Italian and English Speech. Status Report on Speech Research, Haskins Laboratories, n° 91, pp. 75-89; also in Studi di Grammatica Italiana, n° 13, pp. 249-69.