Before I start: bear in mind that even though it has data for almost 2000 languages, Phoible's data is rather uneven. IIRC, African languages are better documented in Phoible.
Anyway, I want to see which phonemes are more represented in certain types of languages. For example, here are the 10 most common phonemes in languages with clicks, along with their frequencies.
Phoneme Frequency
m 18
i 18
u 18
a 17
j 16
w 16
s 16
k 15
p 15
h 14
This data doesn't tell us much. All 10 of those phonemes are common not only in click languages but in almost all languages. So what I do instead is compare the phoneme frequency in the click languages to the phoneme frequency across all the languages registered in Phoible. If we take the difference between the two frequencies, we can find the most over-represented phonemes in click languages:
Phoneme   In subset (%)   In all (%)   Difference
kǃ 72.22 0.81 71.41
kǀ 66.67 0.75 65.92
pʰ 72.22 17.21 55.02
kʰ 72.22 17.77 54.45
t̠ʃʰ 61.11 6.86 54.25
tʰ 66.67 13.28 53.39
tsʰ 55.56 5.11 50.44
kǃʰ 44.44 0.50 43.95
ŋǃ 44.44 0.50 43.95
kǁ 44.44 0.50 43.95
kʼ 50.00 9.41 40.59
tsʼ 44.44 5.11 39.33
ɬ 44.44 5.99 38.46
kǀʰ 38.89 0.44 38.45
t̠ʃʼ 44.44 6.92 37.52
tʼ 44.44 7.23 37.21
ɡǃ 33.33 0.37 32.96
ɡǀ 33.33 0.37 32.96
d̠ʒ 66.67 34.23 32.44
pʼ 38.89 6.92 31.97
I use simple subtraction instead of division to avoid "DIVISION BY ZERO" issues.
Anyway, we can see that in click languages, click consonants are over-represented, which is obvious. What is not so obvious is that aspirated stops are over-represented too, as well as ejectives.
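For anyone who wants to try the same comparison, here is a minimal Python sketch of the idea. It is not the query I actually ran. The function names and the toy inventories are made up for illustration, and it assumes each language is just a set of phonemes, with the subset counted as part of "all" languages.

```python
from collections import Counter


def phoneme_percentages(inventories):
    """Map each phoneme to the percentage of languages that have it."""
    counts = Counter()
    for phonemes in inventories.values():
        counts.update(set(phonemes))  # count each phoneme once per language
    total = len(inventories)
    return {p: 100.0 * c / total for p, c in counts.items()}


def representation_difference(all_inventories, subset_names):
    """Rank phonemes by (percentage in subset) - (percentage in all)."""
    subset = {name: all_inventories[name] for name in subset_names}
    in_subset = phoneme_percentages(subset)
    in_all = phoneme_percentages(all_inventories)
    # A phoneme missing from the subset counts as 0%, so the subtraction
    # is always defined (a ratio would break when a percentage is 0).
    diffs = {p: in_subset.get(p, 0.0) - pct for p, pct in in_all.items()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Toy data, invented for illustration (NOT real Phoible inventories)
toy = {
    "A": {"m", "k", "kǃ"},
    "B": {"m", "k", "kǃ", "pʰ"},
    "C": {"m", "k", "ɡ"},
    "D": {"m", "ɡ"},
}
ranking = representation_difference(toy, ["A", "B"])
for phoneme, diff in ranking:
    print(phoneme, round(diff, 2))
```

The top of `ranking` is the over-represented end and the bottom is the under-represented end, so the same list gives both tables.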
How about the most under-represented phonemes? Well...
Phoneme   In subset (%)   In all (%)   Difference
ɡ 0.00 68.89 -68.89
ɔ 0.00 46.13 -46.13
˦ 0.00 29.68 -29.68
ɾ 11.11 33.92 -22.80
oː 5.56 27.00 -21.44
eː 5.56 26.87 -21.31
kp 0.00 19.20 -19.20
ɡb 0.00 18.95 -18.95
iː 16.67 33.85 -17.19
ɨ 5.56 22.26 -16.70
t 66.67 83.23 -16.56
ə 11.11 27.24 -16.13
˧ 0.00 15.71 -15.71
k 83.33 98.69 -15.36
o 61.11 75.81 -14.70
ɣ 0.00 14.65 -14.65
ɔː 0.00 14.15 -14.15
n 77.78 89.71 -11.94
aː 22.22 33.23 -11.01
ɔ̃ 0.00 10.79 -10.79
I have no idea why /g/ appears there.
Edit: There seems to be something funky with the Postgres JOIN. This might cause some values to be listed as 0 when in reality they are larger than 0.