Lies, Damned Lies and Phonological Statistics.

Yaali Annar The Gote (posts: 94, Initiate Speaker)
So I dicked around with Phoible data and decided to run some queries to find out some statistics about phonemes. A percentage here means the percentage of languages in the database (1673 in total) that have the feature in question.
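
For anyone who wants to reproduce this kind of number, here is a minimal sketch of the counting step, not the actual queries used above. It assumes the data has already been flattened into (language, segment, feature) rows; the toy rows and the name `percent_of_languages` are made up for illustration and are not real Phoible data or API.

```python
from collections import defaultdict

# Hypothetical toy rows: (language, segment, feature tag). Not real Phoible data.
rows = [
    ("lang_a", "i", "front unrounded"),
    ("lang_a", "u", "back rounded"),
    ("lang_b", "i", "front unrounded"),
    ("lang_b", "y", "front rounded"),
    ("lang_c", "u", "back rounded"),
    ("lang_c", "ɯ", "back unrounded"),
    ("lang_d", "a", "central unrounded"),
]


def percent_of_languages(rows):
    """Map each feature to the percentage of languages that have
    at least one segment carrying that feature."""
    all_langs = {lang for lang, _segment, _feature in rows}
    langs_by_feature = defaultdict(set)
    for lang, _segment, feature in rows:
        # A set per feature means a language is counted once,
        # however many matching segments it has.
        langs_by_feature[feature].add(lang)
    return {
        feature: 100.0 * len(langs) / len(all_langs)
        for feature, langs in langs_by_feature.items()
    }


result = percent_of_languages(rows)
for feature, pct in sorted(result.items()):
    print(f"{feature}: {pct:.1f}%")
```

The point of the sets is that the unit being counted is the language, not the segment, so a language with five front unrounded vowels still contributes only once.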

First, vowel point of articulation. As expected, front and central vowels are prototypically unrounded and back vowels are prototypically rounded:

back rounded: 99,3%
front unrounded: 99,7%
central unrounded: 96,6%

And the less common counterparts with the opposite rounding are:

back unrounded: 18,9
front rounded: 6,6
central rounded: 6,5

Now, for the height, Phoible data seems to distinguish only 5 heights, unlike UPSID, which distinguishes 7. But... eh... let's see the frequency of vowels by height:

tense high (high): 98,1
slack high (lowered high): 64,6
tense mid (higher mid): 88,7
slack mid (lower mid): 21,5
low: 99,8

Missing from the data are the true mid (i.e. where the schwa is) and the raised low (where /æ ɐ/ are).

Hey, let's combine them into a matrix! Due to the wonkiness of the Phoible data, I opted to use UPSID data instead:

              frun   frro   ceun   cero   baun   baro
high          91,2    5,8   14,2    1,6    9,1   85,2
lowered high  17,3    0,9    0,9    0,3    1,4   15,6
higher mid    31,5    3,4    4,9    0,9    3,2   34,0
mid            5,0    0,9   19,1    1,4    1,8   42,4
lower mid      4,0    1,8    3,6    0,3    2,3   38,6
raised low     1,0    0,0    3,6    0,3    0,3    0,3
low            6,5    0,0   89,4    0,0    7,4    4,5

(fr/ce/ba = front/central/back, un/ro = unrounded/rounded)
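
A matrix like this is just the per-feature count done once per cell. Below is a rough sketch of that cross-tabulation, assuming each language's vowels have been reduced to (height, backness, rounding) triples. The toy inventories and the name `vowel_matrix` are invented for the example and do not come from UPSID.

```python
# Row and column labels follow the matrix in the post.
HEIGHTS = ["high", "lowered high", "higher mid", "mid",
           "lower mid", "raised low", "low"]
COLUMNS = [("front", "unrounded"), ("front", "rounded"),
           ("central", "unrounded"), ("central", "rounded"),
           ("back", "unrounded"), ("back", "rounded")]

# Hypothetical toy inventories: language -> list of (height, backness, rounding).
inventories = {
    "lang_a": [("high", "front", "unrounded"),
               ("high", "back", "rounded"),
               ("low", "central", "unrounded")],
    "lang_b": [("high", "front", "unrounded"),
               ("mid", "back", "rounded")],
}


def vowel_matrix(inventories):
    """For every (height, backness, rounding) cell, return the percentage
    of languages that have at least one vowel in that cell."""
    total = len(inventories)
    matrix = {}
    for height in HEIGHTS:
        matrix[height] = {}
        for backness, rounding in COLUMNS:
            count = sum(
                1 for vowels in inventories.values()
                if (height, backness, rounding) in vowels
            )
            matrix[height][(backness, rounding)] = 100.0 * count / total
    return matrix


matrix = vowel_matrix(inventories)
for height in HEIGHTS:
    cells = "  ".join(f"{matrix[height][col]:5.1f}" for col in COLUMNS)
    print(f"{height:<13}{cells}")
```

Since each cell is counted independently, rows and columns are not expected to sum to 100.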

Yaali Annar The Gote (posts: 94, Initiate Speaker)
Now, here's some prelim data on the consonants. First the PoA (taken from UPSID; I'll try to find out how Phoible's consonant classification works).

bilabial: 99,7
labiodental: 45
dental: 35
alveolar: 63,6
postalveolar: 64,3
retroflex: 20,1
palatal: 89,8
velar: 99,5
uvular: 18,6
pharyngeal: 4,2
glottal: 74,7

Of course it's not uniform across MoA. For example, on further querying, labiodentals are almost always one of /f v/.

UPSID has a kinda weird classification of dental, alveolar and postalveolar. I lumped UPSID's "alveolar" with its "dental/alveolar" (yeah, it's written like that, with a slash). Also, "postalveolar" is listed as "palatal-alveolar" in UPSID.
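
The lumping described here can be sketched as a relabelling step applied before counting. This is only an illustration under assumptions: the label strings are the ones quoted in this post, and the toy languages and the names `LUMP`, `lump_place` and `place_percentages` are hypothetical.

```python
# Relabelling table, using the label strings as quoted in the post.
LUMP = {
    "dental/alveolar": "alveolar",
    "palatal-alveolar": "postalveolar",
}


def lump_place(label):
    """Return the merged place label, or the label unchanged if it is not lumped."""
    return LUMP.get(label, label)


# Hypothetical toy data: language -> places of articulation found in it.
toy = {
    "lang_a": ["bilabial", "dental/alveolar", "velar"],
    "lang_b": ["bilabial", "alveolar", "palatal-alveolar"],
    "lang_c": ["bilabial", "dental", "velar"],
}


def place_percentages(langs):
    """Percentage of languages having each place, after lumping."""
    total = len(langs)
    counts = {}
    for places in langs.values():
        # Lump first, then deduplicate, so a language that has both
        # "alveolar" and "dental/alveolar" is still counted once.
        for place in {lump_place(p) for p in places}:
            counts[place] = counts.get(place, 0) + 1
    return {place: 100.0 * n / total for place, n in counts.items()}


pct = place_percentages(toy)
for place, value in sorted(pct.items()):
    print(f"{place}: {value:.1f}")
```

Lumping before deduplicating matters: doing it the other way round would count a language twice whenever it has segments under both of the original labels.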

Anyway MoA:

nasal: 96,4
plosive: 100
affricate: 67,1
fricative: 93,3
trill: 35,4
flap: 33
approximant: 96,2
implosive: 11,9
click: 1,1

All 451 languages registered in UPSID have plosives, how surprising. Notice how there's no distinction of voicing. Some MoA are prototypically voiced and some are prototypically voiceless. Let's check them out:

                  voiceless  voiced
nasal                   3,9    96,4
plosive                98,8    73,3
affricate              63,1    34,8
fricative              91,5    50,9
approximant             5,3    96,2
affricated click        1,3     1,1
affricated trill        0,2     0,2
click                   1,1     0,8
flap                    0,2    33
implosive               0,8    11,5
r-sound                 0,2    10,6
trill                   0,6    35,2


Yeah... there's also an "r-sound" category in the UPSID data, a catch-all bucket for rhotics.
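
One thing worth spelling out about the voiceless/voiced split: the two columns don't add up to the overall figure for a manner, because a language that has both voiceless and voiced plosives is counted in both columns but only once overall. A small sketch with made-up data shows this. The toy languages and the function name `share` are assumptions for illustration, not UPSID data.

```python
# Hypothetical toy data: language -> list of (manner, voicing) pairs.
toy = {
    "lang_a": [("plosive", "voiceless"), ("plosive", "voiced"), ("nasal", "voiced")],
    "lang_b": [("plosive", "voiceless"), ("nasal", "voiced")],
    "lang_c": [("plosive", "voiced"), ("fricative", "voiceless")],
    "lang_d": [("plosive", "voiceless"), ("nasal", "voiceless"), ("nasal", "voiced")],
}


def share(langs, manner, voicing=None):
    """Percentage of languages with at least one segment of this manner.
    If voicing is given, only segments with that voicing count."""
    hits = sum(
        1 for segments in langs.values()
        if any(m == manner and (voicing is None or v == voicing)
               for m, v in segments)
    )
    return 100.0 * hits / len(langs)


for manner in ("plosive", "nasal", "fricative"):
    print(
        f"{manner:<10}"
        f" voiceless {share(toy, manner, 'voiceless'):5.1f}"
        f" voiced {share(toy, manner, 'voiced'):5.1f}"
        f" any {share(toy, manner):5.1f}"
    )
```

The overall figure is always at least as large as the bigger of the two columns and at most their sum, which fits the plosive row above (98,8 and 73,3 against 100 overall).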

Yaali Annar The Gote (posts: 94, Initiate Speaker)
I made grafix

BfZKgM5.png

Some explanation:

Somehow the "lax high central" vowel is not recorded in Phoible.
Phoible also classifies schwa as a "lax middle vowel", hence the percentages of the upper mid and lower mid.

Yaali Annar The Gote (posts: 94, Initiate Speaker)
Now, here's the UPSID data turned into grafx

JxTA6C0.png

A few observations can be made here:

Phoible merges true-mid vowels with upper-mid ones, *except* the central ones, which it merges down instead. The lax high central vowel seems to be merged up with the tense high central vowels (it doesn't make much of a difference).

The raised low vowels seem to be merged down with the true low vowels.