This directory contains a typical US diphone voice built using the
simplest diphone method described in the festvox document
(http://www.festvox.org/festvox/), in the section "US/UK English
Walkthrough". Note that although this is based on the same recordings as
the distributed festvox_kal*.tar.gz voice, this version does not have
the tidy-ups that the standard release has.

The included recordings are actually the KAL voice (again), as taken
from http://www.festvox.org/databases/cmu_us_kal_diphone/.

Here we assume there are no LAR (EGG) files, even though there were in
that original recording, and appropriate parameters have been set in
bin/make_pm_wave to extract the pitchmarks from the waveforms directly.

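A pitchmark marks one point (typically the peak) in each pitch period of voiced speech. As far as we know, festvox's bin/make_pm_wave does this by running the Edinburgh Speech Tools `pitchmark` program with filter settings suited to speech waveforms; the Python below does NOT reproduce that program. It is a minimal sketch of the underlying idea, assuming a clean, voiced signal held as a list of numbers: smooth the waveform with a short centered moving average, then place a mark at each positive local maximum that is at least one period (at a hypothetical `max_f0`) after the previous mark. The function name `pitchmarks_from_wave` is hypothetical.

```python
import math
from typing import List, Sequence


def pitchmarks_from_wave(samples: Sequence[float], sample_rate: int,
                         max_f0: float = 300.0) -> List[float]:
    """Hypothetical helper: naive peak-picking pitchmarks, in seconds.

    Not the festvox/Speech Tools algorithm; a sketch for clean voiced speech.
    """
    n = len(samples)
    # Centered moving average over roughly 1 ms: a crude low-pass filter.
    half = max(1, int(sample_rate * 0.001) // 2)
    prefix = [0.0]
    for x in samples:
        prefix.append(prefix[-1] + x)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))

    # Peaks closer than one period at max_f0 are treated as the same cycle.
    min_gap = int(sample_rate / max_f0)
    marks: List[float] = []
    last = -min_gap
    for i in range(1, n - 1):
        is_peak = smoothed[i - 1] <= smoothed[i] > smoothed[i + 1]
        if is_peak and smoothed[i] > 0 and i - last >= min_gap:
            marks.append(i / sample_rate)
            last = i
    return marks


if __name__ == "__main__":
    # 0.1 s of a 100 Hz sine at 16 kHz: one mark per 10 ms period.
    rate = 16000
    tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(rate // 10)]
    print(pitchmarks_from_wave(tone, rate))
```

Real recordings are far less tidy than a sine wave, which is why pitchmark extraction from waveforms needs the careful filtering and parameter tuning mentioned above, and why it is more error-prone than extraction from an EGG signal.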
Note that there has been *NO* tidy-up of the phone labels, power, or
pitchmarks, which can make a real difference to the final quality. This
is released as a pedagogical example; our real release of the diphone
voice based on these recordings has a number of extra corrections.

There is *NO* need to copy this whole directory. Everything is derivable
from the festvox document, except the files in the wav/ directory.

The directory structure is:

bin/
    basic scripts for building prompts, labelling, feature files, etc.
cep/
    cepstrum files dynamically created during phone autolabelling
dic/
    final diphone dictionary (used at run-time)
etc/
    prompt file and some labelling templates
festival/
    not used in diphone databases
festvox/
    Scheme voice definition files (used at run-time)
group/
    extracted diphones in a single group file for distribution
lab/
    autolabelled phone labels
lar/
    recorded EGG signal files (not used in this example)
lpc/
    LPC parameters plus residuals (used at run-time for the non-grouped version)
mcep/
    MFCC (Mel Frequency Cepstrum Coefficients), not used in diphone databases
pm/
    pitchmark files as extracted from the waveforms (or the EGG signal)
pm_lab/
    pitchmark label files derived from pm/, enabling emulabel (and other
    display programs) to show the pitchmarks with the waveform files
prompt-cep/
    cepstrum files for synthesized prompts
prompt-lab/
    label files for synthesized prompts
prompt-wav/
    waveforms of synthesized prompts
wav/
    recorded spoken nonsense words (in Microsoft RIFF (wav) format).
    If you are using Xwaves, you should convert these to NIST format.
wrd/
    word label files (not used in diphone databases)
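The dic/ directory holds the diphone dictionary: for each diphone, the file it comes from and where it starts, where the phone boundary falls, and where it ends. A common simplification is to cut each diphone from the middle of its first phone, through the boundary, to the middle of its second phone. The Python below is a minimal sketch of that rule under stated assumptions: labels arrive as (end_time, phone) pairs in time order, as in xlabel-style lab/ files; plain midpoints stand in for any smarter choice of the stable part of a phone; and the function name `diphone_index` is hypothetical. A real build also has to choose which occurrence of each diphone to keep and writes its own index format; this sketch simply lists every adjacent phone pair.

```python
from typing import List, Tuple

Label = Tuple[float, str]                 # (end time in seconds, phone name)
Entry = Tuple[str, float, float, float]   # (diphone, start, mid, end)


def diphone_index(labels: List[Label], start_time: float = 0.0) -> List[Entry]:
    """Hypothetical helper: cut every adjacent phone pair at phone midpoints."""
    # Turn end-time labels into (phone, start, end) segments.
    segments = []
    prev_end = start_time
    for end, phone in labels:
        segments.append((phone, prev_end, end))
        prev_end = end

    entries: List[Entry] = []
    for (p1, s1, e1), (p2, s2, e2) in zip(segments, segments[1:]):
        start = (s1 + e1) / 2.0   # middle of the first phone
        mid = e1                  # boundary between the two phones
        end = (s2 + e2) / 2.0     # middle of the second phone
        entries.append((f"{p1}-{p2}", start, mid, end))
    return entries


if __name__ == "__main__":
    labs = [(0.10, "pau"), (0.30, "t"), (0.50, "aa"), (0.62, "t")]
    for entry in diphone_index(labs):
        print(entry)
```

Cutting at phone midpoints is what makes concatenation work: adjacent diphones are joined in the relatively stable middle of a phone rather than at a transition.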
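The wav/ files are RIFF; Xwaves users are advised to convert them to NIST format. With the Edinburgh Speech Tools installed, `ch_wave` with its `-otype nist` option is the usual route. As a dependency-free alternative, here is a minimal Python sketch that rewrites a 16-bit PCM RIFF file with a 1024-byte NIST SPHERE header. The function name `riff_to_nist` and the particular header fields chosen are our assumptions; they cover the basic SPHERE fields but have not been checked against Xwaves itself.

```python
import wave


def riff_to_nist(in_path: str, out_path: str) -> None:
    """Hypothetical helper: copy 16-bit PCM samples under a NIST SPHERE header."""
    with wave.open(in_path, "rb") as w:
        n_channels = w.getnchannels()
        width = w.getsampwidth()
        rate = w.getframerate()
        n_frames = w.getnframes()
        data = w.readframes(n_frames)   # raw little-endian PCM, interleaved
    if width != 2:
        raise ValueError("this sketch only handles 16-bit PCM")

    fields = [
        "NIST_1A",
        "   1024",                        # total header size in bytes
        f"sample_count -i {n_frames}",
        f"sample_rate -i {rate}",
        f"channel_count -i {n_channels}",
        "sample_n_bytes -i 2",
        "sample_byte_format -s2 01",      # 01 = least significant byte first
        "sample_coding -s3 pcm",
        "end_head",
    ]
    # Pad the ASCII header with spaces to exactly 1024 bytes.
    header = ("\n".join(fields) + "\n").encode("ascii").ljust(1024, b" ")
    with open(out_path, "wb") as f:
        f.write(header)
        f.write(data)
```

For example, calling `riff_to_nist("wav/kal_0001.wav", "nist/kal_0001.wav")` would convert one file; both paths are hypothetical, and a real run would loop over every file in wav/.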