Usage
This page creates categorized word lists from the unique words in a file. By default, there is one category for each vowel, containing the words in which that vowel is the primary sound. There is also a default category for all words that contain /AXR/. The default categories have labels that include the sound and a sample word. A custom category includes every word that matches the category's sound pattern. The portion of each word that matched the criteria for a category is underlined, bolded and shown in the selected color.
The output file can optionally include IPA, Arpabet and/or customizable dictionary-style transcriptions of the words, as in sound search. Transcriptions appear after the written form of each word in a category, on the same line. To include them, check the boxes next to Add IPA to the output file, Add Arpabet to the output file or Add dictionary-style phonetics to the output file.
To customize the dictionary-style transcription, click the Show/hide transcription option box. You can replace the symbol used for each sound by entering a new one in the box next to that sound. You can also use the checkboxes to specify how stressed syllables should be marked and whether syllables should be separated by dashes.
If IPA is added, primary and secondary stress markers are included in the IPA transcriptions by default. To remove them, uncheck the box next to Display primary stress markers in IPA or Display secondary stress markers in IPA. These boxes appear once the box for including IPA is checked.
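The dictionary-style options described above (replaceable symbols, stress marking, dash-separated syllables) can be pictured with a small Python sketch. This is not the page's actual implementation: the function name, the syllabified input format, the CMU-style stress digit (1 for primary stress) and the choice of upper case to mark stress are all assumptions made for illustration. Only the default symbols are taken from the sound table on this page.

```python
# Default dictionary-style symbols for a few sounds, copied from the table
# on this page. A full table would cover every Arpabet sound.
DEFAULT_SYMBOLS = {"AX": "ə", "B": "b", "AW": "ow", "T": "t"}

# Hypothetical input format: a word is a list of syllables, each syllable a
# list of Arpabet sounds; a trailing "1" marks the primary-stressed vowel.
ABOUT = [["AX0"], ["B", "AW1", "T"]]


def dictionary_style(syllables, symbols=None, mark_stress=True,
                     separate_syllables=True):
    """Render a syllabified pronunciation in dictionary style.

    symbols: optional overrides for individual sounds.
    mark_stress: if True, write the primary-stressed syllable in upper case
        (an assumed convention; the page offers its own marking options).
    separate_syllables: if True, join syllables with dashes.
    """
    table = {**DEFAULT_SYMBOLS, **(symbols or {})}
    rendered = []
    for syllable in syllables:
        stressed = any(sound.endswith("1") for sound in syllable)
        text = "".join(table[sound.rstrip("012")] for sound in syllable)
        rendered.append(text.upper() if stressed and mark_stress else text)
    return ("-" if separate_syllables else "").join(rendered)


print(dictionary_style(ABOUT))
print(dictionary_style(ABOUT, symbols={"AW": "ou"}, mark_stress=False,
                       separate_syllables=False))
```

Passing overrides as a separate dictionary keeps the defaults intact, which mirrors the idea of editing only the symbols you want to change.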
The pronunciations used to determine whether a word belongs to a category are based on those in the specified lexicon. Words that cannot be found in the lexicon are placed in the "Unknown" category.
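As a rough illustration of the default behavior described so far (one category per primary vowel, an extra /AXR/ category, and "Unknown" for words missing from the lexicon), here is a minimal Python sketch. The mini-lexicon, the function names and the use of CMU-style stress digits (1 for primary stress) are assumptions for the example, not the tool's real data or code.

```python
from collections import defaultdict

# Hypothetical mini-lexicon. A trailing "1" marks the vowel with primary
# stress, "0" an unstressed vowel (CMU-style digits, assumed here).
LEXICON = {
    "beat": ["B", "IY1", "T"],
    "hit": ["HH", "IH1", "T"],
    "another": ["AX0", "N", "AH1", "DH", "AXR0"],
}


def primary_vowel(sounds):
    """Return the primary-stressed vowel without its stress digit, or None."""
    for sound in sounds:
        if sound.endswith("1"):
            return sound[:-1]
    return None


def categorize(words, lexicon):
    """Group the unique words by primary vowel, /AXR/ and "Unknown"."""
    categories = defaultdict(list)
    for word in sorted({w.lower() for w in words}):
        sounds = lexicon.get(word)
        if sounds is None:
            categories["Unknown"].append(word)
            continue
        vowel = primary_vowel(sounds)
        if vowel is not None:
            categories[vowel].append(word)
        # A word can also land in the extra /AXR/ category.
        if any(sound.rstrip("012") == "AXR" for sound in sounds):
            categories["AXR"].append(word)
    return dict(categories)


print(categorize(["Beat", "hit", "another", "beat", "zzyzx"], LEXICON))
```

Because the lookup depends entirely on the lexicon, switching lexicons can move a word between categories or into "Unknown".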
The input file can be plain text, RTF or PDF. RTF and PDF files should have the matching extension (.rtf or .pdf). RTF and PDF files are converted automatically, which can result in some formatting being mistaken for words.
The input file can be encoded in UTF-8 or ASCII; however, not all non-ASCII characters will be recognized.
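To make the idea of "unique words in a file" concrete, the following Python sketch reads a plain-text file and collects its distinct words. The word pattern (ASCII letters with optional internal apostrophes) and the function names are assumptions; the page does not document how the tool tokenizes its input. Opening the file as UTF-8 also handles plain ASCII, since ASCII is a subset of UTF-8.

```python
import re

# Assumed definition of a word: ASCII letters, optionally with internal
# apostrophes (as in "cat's"). The tool's real rule may differ.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def unique_words(text):
    """Return the distinct words in text, lower-cased and sorted."""
    return sorted({match.group(0).lower() for match in WORD_RE.finditer(text)})


def unique_words_from_file(path):
    """Read a UTF-8 (or ASCII) plain-text file and return its unique words."""
    with open(path, encoding="utf-8") as handle:
        return unique_words(handle.read())


print(unique_words("The cat's hat. The HAT!"))
```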
Modifying the categories used
Selecting default categories
You can choose which of the default categories will be used by clicking the Display/hide default category options button. This displays a checkbox next to each category that is included by default. You can then uncheck any categories you do not want used for a particular file.
Creating custom categories
You can create your own categories by defining sound patterns that a word's pronunciation must match for the word to be included in a category. You can create up to 9 custom categories based on the basic mark-up interface by clicking Show/hide basic custom categories. You can also create a separate set of up to 9 categories based on the advanced mark-up sound pattern specification interface. Categories based on both pattern types can be used at the same time. In addition to defining the sound pattern for a category, you can optionally enter a label in the text box. If no label is entered, the label used for the category is based on the sound pattern. For example, a category based on an advanced pattern in which Preceding sound is set to S=/s/ and Primary sound is set to Nasal will have the label "S,Nasal".
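The labeling rule in the example above can be sketched in a few lines of Python. This is only a guess at the behavior, based on the single "S,Nasal" example: the function name, the following parameter and the rule of keeping the text before "=" are assumptions.

```python
def category_label(custom_label, preceding=None, primary=None, following=None):
    """Return the custom label if one was entered, else one built from the pattern.

    Pattern parts are given as they appear in the interface, for example
    "S=/s/" or "Nasal". For parts of the form "NAME=/ipa/", only NAME is kept.
    """
    if custom_label and custom_label.strip():
        return custom_label.strip()
    parts = [part.split("=")[0] for part in (preceding, primary, following) if part]
    return ",".join(parts)


print(category_label("", preceding="S=/s/", primary="Nasal"))
print(category_label("S plus nasal", preceding="S=/s/", primary="Nasal"))
```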
The interfaces used to specify the sound patterns are identical to the ones used for mark-up. Please see the usage information for basic mark-up and advanced mark-up for more information on specifying sound patterns.
Available patterns and categories:
Positions:
- Syllable-start : Specify that the pattern should start at the beginning of a syllable. Must occur at the start of the preceding context (but after Syllable, if used).
- Word-start : Specify that the pattern should start at the beginning of a word. Must occur at the start of the preceding context (but after Syllable, if used).
- NOT_Syllable-start : Specify that the beginning of the pattern, including its preceding context, must not occur at a syllable boundary. Must occur at the start of the preceding context (but after Syllable, if used).
- NOT_Word-start : Specify that the beginning of the pattern, including its preceding context, must not occur at a word boundary. Must occur at the start of the preceding context (but after Syllable, if used).
- Syllable-end : Specify that the pattern should end at the end of a syllable. Must occur at the end of the following context.
- Word-end : Specify that the pattern should end at the end of a word. Must occur at the end of the following context.
- NOT_Syllable-end : Specify that the pattern, including its following context, must match sounds occurring before the end of a syllable. Must occur at the end of the following context.
- NOT_Word-end : Specify that the pattern, including its following context, must match sounds occurring before the end of a word. Must occur at the end of the following context.
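One way to read the position options above is as checks on where a match begins and ends relative to syllable and word boundaries. The Python sketch below illustrates that reading; the function names, the syllabified input format and the half-open index convention are assumptions, and the tool's own matcher may work differently.

```python
def syllable_starts(syllables):
    """Return (start indices of each syllable, total number of sounds)."""
    starts, index = [], 0
    for syllable in syllables:
        starts.append(index)
        index += len(syllable)
    return starts, index


def position_matches(syllables, start, end, start_rule=None, end_rule=None):
    """Check a match covering sounds [start, end) against position rules.

    The span is assumed to include any preceding and following context.
    """
    starts, total = syllable_starts(syllables)
    # A syllable ends where the next one starts, or at the end of the word.
    ends = set(starts[1:]) | {total}
    start_checks = {
        None: True,
        "Syllable-start": start in starts,
        "Word-start": start == 0,
        "NOT_Syllable-start": start not in starts,
        "NOT_Word-start": start != 0,
    }
    end_checks = {
        None: True,
        "Syllable-end": end in ends,
        "Word-end": end == total,
        "NOT_Syllable-end": end not in ends,
        "NOT_Word-end": end != total,
    }
    return start_checks[start_rule] and end_checks[end_rule]


# "another", with an assumed syllabification: AX . N AH . DH AXR
ANOTHER = [["AX"], ["N", "AH"], ["DH", "AXR"]]
print(position_matches(ANOTHER, 1, 2, start_rule="Syllable-start"))
print(position_matches(ANOTHER, 1, 2, start_rule="Word-start"))
```

Here the N of "another" starts a syllable but not the word, so it satisfies Syllable-start and NOT_Word-start but not Word-start.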
Sound category descriptions:
- Vowel : Any vowel including R-vowels
- Stressed Vowel : Any vowel which has a primary stress
- Unstressed Vowel : Any vowel which does not have a primary stress (including ones with secondary stress)
- Consonant : Any consonant (stops, liquids, glides, fricatives and affricates).
- R-colored-vowel : Vowels that are followed by an R sound in the same syllable, including ER
- Vowel-no-R : Vowels that are not followed by an R sound
- Nasal : n, m and ŋ
- Liquid : l or ɹ
- Glide : j or w
- Sonorant : A vowel, nasal, liquid or glide
- Sonorant-consonant : A nasal, liquid or glide (but not a vowel)
- Stop : Any voiced or voiceless stop
- Voiced-stop : b, d, g or ʔ
- Voiceless-stop : p, t or k
- Fricative : Any voiced or voiceless fricative
- Affricate : tʃ or dʒ
- Voiced-fricative : v, ð, z, or ʒ
- Voiceless-fricative : f, θ, s, ʃ or h
- Stop-or-flap : Any voiced or voiceless stop or a flap (ɾ). Note that the flap (ɾ) is not used in the CMU dictionary.
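The consonant categories above nest inside one another, which is easy to see when they are written as sets. The Python sketch below covers only the categories whose members are listed explicitly on this page; the vowel-based categories and Consonant are left out, and the names CLASSES and classes_of are invented for the example.

```python
# Members exactly as listed in the category descriptions above (IPA symbols).
NASAL = {"n", "m", "ŋ"}
LIQUID = {"l", "ɹ"}
GLIDE = {"j", "w"}
VOICED_STOP = {"b", "d", "g", "ʔ"}
VOICELESS_STOP = {"p", "t", "k"}
AFFRICATE = {"tʃ", "dʒ"}
VOICED_FRICATIVE = {"v", "ð", "z", "ʒ"}
VOICELESS_FRICATIVE = {"f", "θ", "s", "ʃ", "h"}

# Broader categories are unions of the narrower ones.
STOP = VOICED_STOP | VOICELESS_STOP
CLASSES = {
    "Nasal": NASAL,
    "Liquid": LIQUID,
    "Glide": GLIDE,
    "Sonorant-consonant": NASAL | LIQUID | GLIDE,
    "Stop": STOP,
    "Voiced-stop": VOICED_STOP,
    "Voiceless-stop": VOICELESS_STOP,
    "Fricative": VOICED_FRICATIVE | VOICELESS_FRICATIVE,
    "Affricate": AFFRICATE,
    "Voiced-fricative": VOICED_FRICATIVE,
    "Voiceless-fricative": VOICELESS_FRICATIVE,
    "Stop-or-flap": STOP | {"ɾ"},
}


def classes_of(sound):
    """Return the sorted names of every category that contains the sound."""
    return sorted(name for name, members in CLASSES.items() if sound in members)


print(classes_of("n"))
print(classes_of("t"))
print(classes_of("ɾ"))
```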
Individual Arpabet sounds:
Arpabet | IPA | Dictionary style phonics |
IY | /i/ | [ē] as in "beat" (IY) |
IH | /ɪ/ | [ĭ] as in "hit" (IH) |
EH | /ɛ/ | [ĕ] as in "pet" (EH) |
AE | /æ/ | [ă] as in "hat" (AE) |
AH | /ʌ/ | [ʌ] as in "cup" (AH) |
UW | /u/ | [Ū] as in "shoe" (UW) |
UH | /ʊ/ | [Ŭ] as in "could" (UH) |
AO | /ɔ/ | [ô] as in "ball" (AO) |
AA | /ɑ/ | [ä] as in "father" (AA) |
EY | /eɪ/ | [ā] as in "made" (EY) |
AY | /aɪ/ | [ī] as in "tight" (AY) |
OY | /ɔɪ/ | [oi] as in "voice" (OY) |
OW | /oʊ/ | [ō] as in "go" (OW) |
AW | /ɑʊ/ | [ow] as in "cow" (AW) |
ER | /ɝ/ | [ər] as in "heard" (ER) |
IH R | /ɪɚ/ | [ear] as in "beer" (IH R) |
EH R | /ɛɚ/ | [air] as in "bare" (EH R) |
UH R | /ʊɚ/ | [oor] as in "cure" (UH R) |
AO R | /ɔɚ/ | [oar] as in "door" (AO R) |
AA R | /ɑɚ/ | [ar] as in "car" (AA R) |
AX | /ə/ | [ə] as in "about" (AX) |
AXR | /ɚ/ | [ər] as in "another" (AXR) |
P | /p/ | [p] as in "pan" (P) |
B | /b/ | [b] as in "bat" (B) |
T | /t/ | [t] as in "tag" (T) |
D | /d/ | [d] as in "dog" (D) |
K | /k/ | [k] as in "kite" (K) |
G | /ɡ/ | [ɡ] as in "game" (G) |
CH | /tʃ/ | [ch] as in "chair" (CH) |
JH | /dʒ/ | [dg] as in "judge" (JH) |
F | /f/ | [f] as in "fan" (F) |
V | /v/ | [v] as in "van" (V) |
TH | /θ/ | [th] as in "thin" (TH) |
DH | /ð/ | [th] as in "these" (DH) |
S | /s/ | [s] as in "some" (S) |
Z | /z/ | [z] as in "zoo" (Z) |
SH | /ʃ/ | [sh] as in "ship" (SH) |
ZH | /ʒ/ | [zh] as in "rouge" (ZH) |
HH | /h/ | [h] as in "hand" (HH) |
M | /m/ | [m] as in "move" (M) |
N | /n/ | [n] as in "nose" (N) |
NG | /ŋ/ | [ng] as in "sing" (NG) |
L | /l/ | [l] as in "late" (L) |
R | /ɹ/ | [r] as in "red" (R) |
DX | /ɾ/ | [t] as in "matter" (DX) |
Y | /j/ | [y] as in "yellow" (Y) |
W | /w/ | [w] as in "will" (W) |
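The table above can be used as a lookup to convert an Arpabet pronunciation to IPA. The Python sketch below includes only a subset of the rows and treats a vowel followed by R as a single R-vowel whenever the pair appears in the table. That merging rule ignores syllable boundaries and is an assumption for illustration, as is the function name to_ipa.

```python
# Subset of the Arpabet-to-IPA table on this page, including the
# two-symbol R-vowels.
ARPABET_TO_IPA = {
    "IY": "i", "IH": "ɪ", "EH": "ɛ", "AE": "æ", "AA": "ɑ", "AO": "ɔ",
    "IH R": "ɪɚ", "EH R": "ɛɚ", "UH R": "ʊɚ", "AO R": "ɔɚ", "AA R": "ɑɚ",
    "AX": "ə", "AXR": "ɚ",
    "B": "b", "D": "d", "K": "k", "T": "t", "HH": "h", "R": "ɹ",
}


def to_ipa(sounds):
    """Convert a list of Arpabet sounds (no stress digits) to an IPA string."""
    pieces = []
    i = 0
    while i < len(sounds):
        pair = " ".join(sounds[i:i + 2])
        if i + 1 < len(sounds) and pair in ARPABET_TO_IPA:
            # Vowel + R listed as one R-vowel in the table.
            pieces.append(ARPABET_TO_IPA[pair])
            i += 2
        else:
            pieces.append(ARPABET_TO_IPA[sounds[i]])
            i += 1
    return "/" + "".join(pieces) + "/"


print(to_ipa(["B", "IH", "R"]))   # "beer"
print(to_ipa(["K", "AA", "R"]))   # "car"
print(to_ipa(["R", "EH", "D"]))   # "red"
```

Checking the two-symbol entries first is what lets "IH R" come out as /ɪɚ/ rather than /ɪɹ/; an R that starts a word is still converted on its own.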