Developer's Guide to Creating Talking Menus for Set-top Boxes and DVDs
For DVD Developers

The rate of DVD adoption and penetration into American homes has accelerated in a manner unprecedented in the history of consumer electronic sales. It's easy to understand why the format has become so popular. Picture quality, audio quality and packaging have all played a significant role, but many industry analysts point to another feature unique to DVDs: the ability to include extra features, such as outtakes, commentaries, interviews and even games.

All of these additional features are made possible by the fact that DVDs are not simply recordings of media. They are actually specialized computer programs run by the computers embedded in DVD players. The actual media content is organized and presented by the software. As the discs grow in popularity, that software will include a growing variety of features and functionalities. This is great news for accessibility advocates, since it means that developers can create DVD software that supports built-in interactive audio navigation to assist those who are blind or have limited vision. The bad news is that the current DVD technical specification presents some obstacles that inevitably complicate the task of building in an audio-navigation feature. These difficulties can be overcome, and once the workarounds have been mastered, developers will be on their way to making their products fully accessible.

For developers, authoring a disc that includes audio navigation means thinking in new ways about the following:

  • The workflow, from design to delivery
  • The structure of the menu tree
  • The means by which users will toggle audio navigation on and off
  • The bit budget
  • File naming conventions and other asset management issues
  • Testing and quality control


Adding audio navigation to a DVD does alter the typical developer workflow by adding several extra steps to the development process. Typically, DVD authoring begins with the creation of an overall design that includes the scripting of a menu tree — a kind of road map for the user interface. That script determines the range of options to be presented to users as well as the interconnections between those options, their organization into discrete menu screens and the manner of access to the actual content. When adding audio navigation, an additional script must be created. Each text block in that script must be recorded by a narrator and prepared for inclusion on the disc along with other media files.

There are several questions that title designers must answer before completing these steps. They are:

  • Will the spoken text exactly match the on-screen text in menu items?
  • Does the current menu have an on-screen title? If not, will the audio-navigation feature add a spoken title to the menu?
  • What will be the characteristics of the recorded voice (e.g., age, gender)?

Once the script is complete and the navigating voice recorded, designers must consider two more problems:

  • What fidelity and data compression will the developer require?
  • Are there buffering concerns that require silent pads to be added to individual audio clips?

Because the talking-menu media take up space on the final disc, developers need to compress the audio files. This reduces both size and fidelity. The exact extent of compression depends on the capacity of the disc and the total amount of content it must carry. In some cases, audio clips will not require significant compression. In other cases, storage concerns may drive the developer to compress the clips significantly. Either way, the determination needs to be made early in the process.


One of the most important interface decisions to be made in the early stages of development is how to toggle the audio navigation feature on and off. It's not a trivial matter, because if the mechanism for doing so is awkward or counter-intuitive, audio navigation may become an irritant rather than an aid.

Generally speaking, there are two ways of enabling or disabling the feature. Developers should always include an enable/disable feature within the menu tree, as shown below in the accessibility options screen from the Marcus Garvey: Look for Me in the Whirlwind DVD.

The Garvey Help menu with the Accessibility Options submenu selected. Choices are Turn Audio Descriptions On/Off, Turn Audio Navigation On/Off, Turn English Subtitles On/Off, and What is the Descriptive Video Service?.

Users should also be able to enable or disable the feature using the remote. One problem persists, however, in using the remote control: manufacturers have not equipped remote controls with a button dedicated to the audio-navigation feature. Some pioneering developers have worked around this limitation by co-opting existing button combinations.

Within the four DVDs that incorporated NCAM's access suggestions, we created a de facto standard for turning audio navigation on and off: pressing the 1 and Select keys on the remote control. This choice is generally seen as prudent, since it does not normally affect any other function. But it is not ideal. There is no guarantee that DVD developers won't, in the future, find some widely accepted use for this particular key combination, thereby causing a conflict. For that reason, it is imperative that developers also include an enable/disable selection within the disc's menu tree.

Furthermore, the opening screen of the disc, which announces the availability of audio navigation, should provide clear instructions for locating the enable/disable menu option. Which menu should contain the selection depends on the overall design of the disc and interface. Obvious choices are to include it within the same menu that controls subtitling or foreign-language choices, or even to build an entirely separate menu dedicated to accessibility options, as shown below in the Abraham and Mary Lincoln — A House Divided DVD. Wherever it ends up, it must be easy to find and easy to use for both sighted and blind users.

The Lincolns main menu with the accessibility option highlighted in the list of five menu items.

Discs that offer audio navigation ought to announce that fact upon insertion into the DVD player. As previously shown, an opening screen should include visible text as well as an audio clip that informs the user that the disc is equipped with an audio-navigation feature. The method for toggling the feature on and off should be clearly enunciated at the beginning of the DVD. Small-scale usability testing suggests that sighted users will benefit from having talking menus on by default. However, until audio navigation is a more widely accepted feature, the default setting will in most cases leave the talking menu disabled.

Bit Budget

Managing the bit budget is a straightforward affair for DVD developers, made only slightly more complicated by the inclusion of an audio-navigation feature. Currently, DVDs come in two sizes, 5 gigabytes and 9 gigabytes. In general, commercial media, such as film and television programs, are released on 9-gigabyte discs while 5-gigabyte discs are more popular among developers of educational and institutional titles. One reason for the difference is that 5-gigabyte discs can be burned at the desktop while 9-gigabyte discs can only be created using more expensive mastering and duplication equipment. They are therefore less practical for small developers. However, either disc size is certainly adequate for accommodating audio navigation, both for the digitized audio clips and for the additional menus and code required to implement the system.

For planning purposes, developers typically assume that each menu screen will be about 100 kilobytes in size. While the number of menu screens varies widely from title to title, developers who include an audio-navigation feature will soon find that they must manage many times the normal number of menus, thanks to a sharp limitation on the number of audio clips that can be associated with a single menu.

Imagine a screen with five menu items. The user who depends on audio navigation expects to hear each menu item spoken as the cursor moves from one to the next. For the five items, the developer would need to somehow link five separate audio clips to the screen. Unfortunately, this process is somewhat more cumbersome than one would like.

For historical reasons, the technical specification for DVDs allows only one audio clip to be linked to any given menu screen. In order to create the illusion of a five-item audio menu, the developer will actually need to create six visually identical menus, each with its own audio clip. One of the six would necessarily be the menu screen that appears before the user has made a selection. This screen would have an audio clip attached that announces the name of the menu screen and gives navigation information. Then, each of the five menu items would require its own screen with an attached audio clip. Each of these screens would be identical except for the visual highlighting added to the appropriate menu item. (Visual highlights remain important even when using talking menus. Not all users who will depend on these menus will be totally blind — some sighted users may find the talking menus helpful.)

Suddenly, a single menu has turned into six menus. The amount of disc space required increases proportionally, from 100 kilobytes to 600 kilobytes, plus the amount of space required for the six audio clips. This last number is harder to pin down since it depends very much on the length of the clip and the severity of compression imposed upon it during digitizing. What we do know is that the dynamic range of the human voice is narrow and can be compressed significantly without losing intelligibility. Developers can probably get away with compressing to 96 kilobits per second (kbps), which is 12 kilobytes per second.

A rough calculation can determine the amount of space required for menus and media. First, we'll make the following assumptions:

number of menus = 15
number of audio clips / menu = 6
average length of spoken clip = 5 seconds

Each menu actually requires seven menu screens: the silent menu used when audio navigation is off, the opening menu for audio navigation, and one menu for each of the five items. The total space required for the graphic menus is calculated as follows:

7 screens x 15 menus x 100 kilobytes = 10.5 megabytes

Second, we can calculate the total size of the audio clips:

6 clips x 15 menus x 5 seconds x 12 kilobytes per second = 5.4 megabytes

Therefore, the total storage cost of the graphic and audio menus can be calculated as follows:

10.5 megabytes (graphic menus) + 5.4 megabytes (audio clips) = 15.9 megabytes.

Though that may sound like a considerable amount of media, the good news is that it's really just a drop in the bit bucket. On a 5-gigabyte disc, 15.9 megabytes consumes only about 0.3% of available space. On a 9-gigabyte disc, it amounts to a little less than 0.2% of available space.
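The arithmetic above can be reproduced with a short script, which makes it easy to try other menu counts or clip lengths. This is only a sketch: the function name `menu_budget_mb` and its parameters are illustrative, and it assumes 1 megabyte = 1,000 kilobytes and the planning figures quoted in the text.

```python
# Rough bit-budget estimate for talking menus, using the planning figures
# from the text. All names here are illustrative, not part of any DVD tool.
KB_PER_MENU_SCREEN = 100        # assumed size of one menu screen, in kilobytes
AUDIO_KB_PER_SECOND = 96 / 8    # 96 kilobits per second = 12 kilobytes per second


def menu_budget_mb(menus, clips_per_menu, clip_seconds):
    """Return (graphics_mb, audio_mb, total_mb), taking 1 MB as 1000 KB."""
    # One silent screen for when audio navigation is off, plus one per clip.
    screens_per_menu = clips_per_menu + 1
    graphics_kb = screens_per_menu * menus * KB_PER_MENU_SCREEN
    audio_kb = clips_per_menu * menus * clip_seconds * AUDIO_KB_PER_SECOND
    return graphics_kb / 1000, audio_kb / 1000, (graphics_kb + audio_kb) / 1000


graphics, audio, total = menu_budget_mb(menus=15, clips_per_menu=6, clip_seconds=5)
print(f"graphics {graphics:.1f} MB, audio {audio:.1f} MB, total {total:.1f} MB")

for disc_gb in (5, 9):
    share = total / (disc_gb * 1000) * 100
    print(f"{disc_gb} GB disc: {share:.2f}% of capacity")
```

Doubling the average clip length to 10 seconds, for instance, only raises the audio share from 5.4 to 10.8 megabytes, which is still negligible next to the main content.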

The conclusion is clear: adding a talking menu does increase the total size of the menu system dramatically, but it shouldn't come close to squeezing out the intended content of the disc. It therefore presents no real barrier to implementation.

Menus and the Development Environment

As mentioned, audio navigation presents a novel challenge to developers because the DVD specification allows only a single file (for our purposes, an audio clip) to be attached to a given menu. The workaround for this limitation is straightforward, though somewhat cumbersome: developers must create duplicate menus to provide links to the audio clips. For example, the Partners of the Heart DVD includes several menus. The main menu has a title plus five options, as shown below.

Partners of the Heart Main Menu with five options, 'Play Film,' 'Special Features,' 'Chapters,' 'Accessibility,' and 'Credits'

With audio navigation enabled, the developer intends for the menu to behave in a very straightforward manner. When the screen first appears, an audio clip announces the name of the screen. Each time the user moves the cursor to a new selection, the appropriate audio clip plays to announce the name of the item selected. To create this behavior, the developer will need to create six menus, one for each audio clip. Each time the cursor moves from one selection to the next, the entire menu will be replaced by a new menu. Graphically, however, the behavior of the menu system will be as sighted viewers expect.

For the sake of simplicity in this narrative, we'll assign menu labels to each screen (one for each selection plus the menu title), from M0 to M5, respectively. Similarly, let's assign the labels A0 to A5 to the individual audio clips that will be played.

When the user first enters the menu, M0 is displayed and A0 is played. In this case, M0 shows the entire graphic screen with no highlighting. A0 is the clip that announces the name of the screen and gives navigation instructions. When the user presses the down key on the remote, M0 is replaced by M1 while A1 is played. (M1 is the same screen but with the first selection, "Play Film," highlighted. A1 is an audio clip that says "Play Film.") As the user scrolls down the list, successive screens, with the appropriate selection highlighted, appear as the appropriate clip plays. It is important to note that when the screen is showing M5, the last selection, and the user presses the down button, the most logical behavior is to return to M1 and to play A1. Of course, at any point the user can select the highlighted choice and the DVD player will respond appropriately.
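The screen-swapping behavior just described can be modeled as a small state table. The sketch below is only a model for reasoning about the links between duplicate menus; on a real disc these links are set up in the authoring tool, and the names `MenuScreen`, `SCREENS` and `press_down` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuScreen:
    label: str      # screen label, e.g. "M1"
    audio: str      # the single audio clip attached to this screen, e.g. "A1"
    highlight: str  # highlighted item, or "" for the unhighlighted title screen


ITEMS = ["Play Film", "Special Features", "Chapters", "Accessibility", "Credits"]

# M0/A0 is the title screen; M1..M5 each highlight exactly one item.
SCREENS = [MenuScreen("M0", "A0", "")] + [
    MenuScreen(f"M{i}", f"A{i}", item) for i, item in enumerate(ITEMS, start=1)
]


def press_down(index):
    """Return the index of the screen shown after Down is pressed.

    From the last item the cursor wraps to M1, not back to the title screen.
    """
    return 1 if index >= len(ITEMS) else index + 1


# Walk down the menu from the title screen, past the last item.
index = 0
for _ in range(len(ITEMS) + 1):
    index = press_down(index)
    screen = SCREENS[index]
    print(f"show {screen.label}, play {screen.audio}: {screen.highlight}")
```

Writing the table out this way before authoring makes it easier to check that every screen has exactly one clip and that the wrap-around link from the last item points where intended.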

While this workaround is straightforward, it does require careful preparation. One significant requirement is the addition of silent buffers at the head and tail of each audio clip. Apparently many DVD players experience a delay in playback of audio clips due to buffering speed. This delay can cause the head and even the tail of the clip to become truncated. Developers have found that the best solution to this problem is to add a small bit of silence to the head and tail of each clip. The pads ensure that during playback, all meaningful content will be audible.

The length of the silent pad should be determined empirically, as the optimum length depends upon the size of the clip and the horsepower of the DVD player, which varies among models. To begin, a developer might try adding 15 to 20 frames (about 0.5 to 0.7 seconds at roughly 30 frames per second) to the head and tail of the clip. Of course a longer silence will also work, but if it becomes too long, the silence will create the perception of a performance lag on players that handle buffering more effectively, so the shorter the better.
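As a minimal sketch of the padding step, the function below adds equal silence to the head and tail of a clip held as raw PCM bytes. The function name and the defaults are assumptions chosen for illustration (30 video frames per second, 48 kHz, 16-bit signed stereo audio); a real workflow would more likely do this in an audio editor before encoding.

```python
def pad_with_silence(pcm, pad_frames=15, video_fps=30, sample_rate=48000,
                     channels=2, bytes_per_sample=2):
    """Return raw signed PCM audio with silence added at the head and tail.

    pad_frames is counted in video frames, as in the text; the defaults for
    frame rate and audio format are assumptions, not requirements.
    """
    pad_seconds = pad_frames / video_fps
    pad_samples = round(pad_seconds * sample_rate)
    # For signed PCM, silence is simply a run of zero bytes.
    silence = bytes(pad_samples * channels * bytes_per_sample)
    return silence + pcm + silence


clip = bytes(4 * 48000)                      # placeholder: 1 second of stereo audio
padded = pad_with_silence(clip, pad_frames=15)
extra_seconds = (len(padded) - len(clip)) / (48000 * 2 * 2)
print(f"added {extra_seconds:.2f} seconds of silence in total")
```

Keeping the pad length as a parameter makes it simple to regenerate all clips with a shorter or longer pad after testing on several players.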

Asset Management and Quality Assurance

Adding audio navigation to a DVD title will probably mean adding hundreds of menus and audio clips to the disc. Those additional files should be managed carefully to ensure a smooth development process.

Before beginning work, developers should create a means of referencing and organizing the large number of files. One solution is to create a naming system for each graphic and audio file that encodes the content and function of each file. A file name could be assembled from a series of hierarchical references, beginning with the disc reference, followed by a screen number, menu selection number and finally audio navigation clip reference. For example, the file name 011304A10 would resolve as:

Disc 01
Screen 13
Selection 04
Audio File A10

With a bit of practice, it should quickly become second nature to read the name of a file and to understand its contents and its purpose, helping to avoid costly mistakes in which an audio file is linked to the wrong action.
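A naming scheme like this is also easy to check by machine. The sketch below assumes two digits each for disc, screen and selection, followed by an "A" clip reference, as in the 011304A10 example; the field widths and the helper names `parse_asset_name` and `build_asset_name` are assumptions for illustration only.

```python
import re
from typing import NamedTuple


class AssetName(NamedTuple):
    disc: int
    screen: int
    selection: int
    audio: str


# Assumed layout: 2-digit disc, 2-digit screen, 2-digit selection, "A" + clip number.
_PATTERN = re.compile(r"(\d{2})(\d{2})(\d{2})(A\d+)")


def parse_asset_name(name):
    """Split a name such as '011304A10' into its hierarchical parts."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid asset name: {name!r}")
    disc, screen, selection, audio = match.groups()
    return AssetName(int(disc), int(screen), int(selection), audio)


def build_asset_name(disc, screen, selection, audio):
    """Assemble a file name from its parts, zero-padding the numeric fields."""
    return f"{disc:02d}{screen:02d}{selection:02d}{audio}"


print(parse_asset_name("011304A10"))
print(build_asset_name(disc=1, screen=13, selection=4, audio="A10"))
```

Running a check like this over the whole asset folder before authoring can catch misnamed files early, when they are still cheap to fix.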

As implied, creating these efficiencies can have a profound impact on the amount of time it takes to test and debug a DVD title that includes audio navigation. Initial accounts from participating developers suggest that quality assurance takes about five times as long for a disc with audio navigation as for a disc without it. To prevent quality assurance from taking even longer, developers must be extremely careful to keep their file trees organized.

Keep in Mind

Only a few pioneering DVD developers are creating titles that include audio navigation. It is still somewhat uncharted territory. These guidelines offer support, but each developer who begins the work is, in some way, an innovator. In that spirit, here are three broad recommendations that developers can take to heart when setting out.

Learn the behavior of users

Simply put, developers must understand how their intended end-users would like to interact with the audio interface. Developers should internalize these preferences as much as possible, with the goal of developing an intuitive sense of what is likely to help or confuse a blind or visually impaired user.

Accept the limitations of the spec

The DVD specification was not developed with audio navigation in mind. If it had been, then no doubt it would have allowed developers to link more than one audio file to each menu. But that's not the case, so rather than curse the fates, move forward.

Hack and work around

In the interest of satisfying the user despite the limitations imposed by the spec, developers should tackle the challenge of adding audio navigation to their discs with imagination and determination. Any insistence on quality is sure to pay off.