Accessible Digital Media Guidelines
Guideline J: Multimedia in DTBs


Provide accessible multimedia in Digital Talking Books (DTBs).

Strictly speaking, a digital talking book (DTB) is a multimedia representation of a print publication that provides access to the text through digitally recorded human voice or synthetic text-to-speech technology. DTBs are largely aimed at blind or visually impaired users, but are also used to help improve reading and comprehension skills in students with learning disabilities. They provide various levels of navigation, from partial to complete, and can be read on dedicated hardware devices or on software players that run on Windows or Macintosh computers. There is also at least one DTB player for Windows Pocket PC/Mobile devices.

There are six types of DTBs, defined in the current DAISY/NISO Standard:

  1. Audio with title element only: DTB without structure. This is the simplest class of DTB and is used for books where structure will not be applied.
  2. Audio with NCX (an XML file that defines the structure of the DTB) only: DTB with structure. The NCX, if present, contains only the structure of the book and may contain links to features such as narrated footnotes, etc. This is the most common form of DTB and is ideal for stand-alone players.
  3. Audio with NCX and partial text: DTB with structure and some additional text. The XML textual-content file contains only the structure of the book and the text of components where keyword searching and direct access to the text would be beneficial, e.g., index, glossary, etc.
  4. Audio and full text: DTB with structure and complete text and audio. This form of a DTB is the most complex but provides the greatest level of access. The XML textual-content file contains the structure and the full text of the book. The audio and the text are synchronized.
  5. Full text and some audio: DTB with structure, complete text and limited audio. The XML textual content file contains the structure and the text of the book. The audio files contain recordings of parts of the text. This type of DTB could be used for a dictionary where only pronunciations were provided in audio form.
  6. Text and no audio: E-text with structure. The XML textual-content file contains the structure and text of the book. There are no audio files.

While accessible in-line audio clips can be used in any player capable of playing recorded speech, players that support external linking (that is, hyperlinks leading to external sources such as audio or video clips) can, in theory, integrate either audio or video by launching the media in a stand-alone player. Not all reading devices support linking, however, so authors must be sure to provide a fallback for devices that lack this capability. And when it comes to embedding video elements into DTBs, there is currently only one software DTB player that supports this feature: Dolphin's EasyReader.

A question that often arises is whether to provide human-recorded or synthetic speech (i.e., text-to-speech) in a DTB. The most sophisticated DTB-creation software allows the author to incorporate either or both, whereas some applications may only accommodate text-to-speech integration. Human speech sounds the best to most human ears, but is time-consuming and expensive to create: narrators must spend hours carefully reading and recording text, ideally in soundproof rooms. The recordings must then be edited and carefully checked for accuracy before being integrated into the DTB. In the case of a Type 4 DTB, text and audio must also be synchronized, which adds to the workload. The result, however, is a completely natural-sounding, easy-to-listen-to audio book.

Synthetic speech, on the other hand, can be created in a fraction of the time it takes to record, edit and process human speech. Some DTB-creation software can import the text of the book and automatically generate structured, searchable text with synchronized speech in a matter of minutes, compared to the many hours it takes to do the same with human speech. Many DTB players can read text using built-in text-to-speech software, so the author of the book need not even include audio files in the package (Type 6 DTB), simplifying the process even further.

In recent years, synthetic speech technology has undergone improvements leading to voices that sound less robotic and artificial. It is generally believed that these improvements make them easier to listen to for long periods of time. There has been little formal research thus far to determine user preferences of human vs. synthetic speech, but the National Library for the Blind (NLB) has conducted one survey investigating the acceptance of leisure-reading material using synthetic audio synchronized with full text and navigational structure. Their findings, as well as anecdotal evidence, suggest that human speech is generally preferred over synthetic, and that human speech may be best for leisure reading (e.g., novels or magazines) whereas synthetic speech may be sufficient for all other materials (such as textbooks, technical documents, bank statements, medical records, etc.). Objective research to explore the efficacy of synthetic vs. natural speech will be necessary to definitively prove this, however.

There are several applications available for creating DTBs, ranging in price from free to several hundred dollars. Visit the Daisy Consortium's Production Tools page for more information. At one end of the spectrum is the free application DTBMaker, available in a simple on-line version that imports text and generates a DTB containing structured text and synthetic speech. (Downloadable versions are available as well.) The user need only follow basic markup instructions before uploading the file. When DTBMaker is finished creating the book, the user is notified via e-mail that the materials can be downloaded in a ZIP archive. The DTB can then be opened in any DAISY-compatible player. If the source files contain links to external multimedia, DTBMaker will keep those links intact when it performs the DTB conversion.

At the other end of the spectrum is Dolphin's EasePublisher, a sophisticated DTB-creation package. EasePublisher can record, edit, synchronize and process human speech, create synchronized text-to-speech DTBs, edit text, create different styles and perform a multitude of other functions.

Note: the checkpoints below describe the bare basics for recording and synchronizing human speech, or generating and synchronizing synthetic speech, using EasePublisher 1.02. For complete, step-by-step details about recording and processing audio, see the Help files that are associated with EasePublisher.

Also note that this guideline does not provide anything beyond a very basic summary about the creation and structuring of DTBs themselves. Rather, the information here is focused on the integration of multimedia into DTBs. For complete information about DTB markup and structure, authors are encouraged to read the following two resources:

Checkpoint J1
Add in-line multimedia to DTBs

All DTBs that contain audio can accommodate described, in-line audio clips (that is, those that are pre-recorded and then inserted directly into the structure of the book). For example, a DTB that is created with recorded speech will treat accessible audio clips as just another audio element in the book, and when the book is played back with recorded speech the described clip will be played automatically by the DTB reader as with any other audio element.

Technique J1.1
Integrate in-line multimedia in DTBs

After the source files (e.g., HTML or plain text) of the book have been prepared, use EasePublisher 1.02 to convert the text to a DTB. You may synchronize the text with human-recorded speech or convert the text to synthetic speech. Both processes are briefly summarized below (for full details, see the Help files that are associated with EasePublisher).

To create and synchronize human speech:

  1. Click on the first sentence of the text.
  2. Press Right-Ctrl+F5 to launch the recorder, then begin speaking the text into a microphone.
  3. After you complete the first sentence, immediately press Ctrl+Enter to move the on-screen focus to the next phrase. Speak the text into the microphone.
  4. Repeat this process until the entire book has been recorded.
  5. Press Spacebar or F5 to stop the recording at any time.

To create and synchronize synthetic speech:

  1. Click on the first sentence of the text.
  2. From the Tools menu, choose TTS Encode.
  3. Choose Settings..., then select the voices you want to use for differentiating headings from body text. Adjust the voice settings as desired, choose an audio format, then press OK.
  4. Open Tools/TTS Encode again, then choose Whole Project to convert the entire book to synthetic speech, or choose Current Chapter to convert just the current chapter.
  5. After the conversion is complete, test it by pressing the spacebar to start and stop the playback of audio. To re-record, delete the audio from the waveform area and repeat the steps above.

Regardless of which approach you choose, the end result will be a project that looks similar to the one shown below, with both the book text and audio waveforms visible on the screen.

The EasePublisher application showing text and corresponding audio waveforms.

To add an in-line described audio clip, first record and edit the audio clip in a separate application such as NCAM's MAGpie, Audacity or Sony's Sound Forge. Save the file as MP3 or WAV (the former provides the most compact file size), then follow the steps below.

  1. In EasePublisher, click once on the text of the phrase that occurs immediately prior to the insertion point of the described audio clip.
  2. Press the spacebar to play the audio to the end of this phrase. Press the spacebar to stop the playback at the end of the phrase. Use the right and left arrow keys to precisely position the cursor at the end of the waveform, if necessary.
  3. From the Project menu, choose Import/Import audio file(s)...
  4. Browse to the audio file you want to insert. Select the file and press Open.
  5. In the Import Audio dialog, double click on the audio file.
  6. From the Audio File Import Settings dialog, choose "insert at pos". Press OK.
  7. Press the Start Import Audio button. The clip will be imported and inserted into the timeline at the cursor position.
  8. Click again on the text of the phrase that precedes the described audio clip, and press the spacebar to listen to your work. If the insertion was not successful, delete the described clip and re-import.
  9. Repeat these steps for subsequent described audio clips.

Once you have finished preparing the DTB (cleaning up and formatting all text, importing audio files, etc.), prepare the book for playback on all devices by following the steps below. (Note: these instructions describe a simplified process for building a project. For full details, see the Help files that are associated with EasePublisher.)

  1. Press F9 to open the Build Options dialog.
  2. Choose Full Text from the "Build distribution as..." list at the bottom of the window if you want the book to contain full text as well as audio (ideal for software readers that display text); choose NCC Only if you only want the NCC file (similar to a table of contents) plus audio (sufficient for hardware players that do not display text).
  3. Choose +build/encode from the Build Options list at the bottom of the window. (This will clean up any extraneous audio files in addition to encoding the book for distribution.)
  4. Select the options you want from the other tabs, being sure to specify a folder target in the Folders tab.
  5. Press the Start button.
  6. Test the presentation by opening the .NCC file in a software player, or by copying the distribution files onto a CD for playback on a hardware player.

Technique J1.2
Integrate linked multimedia into DTBs

Before creating a DTB that contains links to external or local multimedia, take care to note that not all DTB players (e.g., hardware players) support linking. If you want to provide a DTB with links, it may also be a good idea to provide the same book with in-line, accessible audio clips as an alternative for those readers who do not have a player capable of activating links.

To create a book with links, code the links in the source documents as you would any HTML hyperlink, e.g., <a href="http://mywebsite.org/mymovie.qtl">Movie: How to Build a Doghouse</a>. Links can point to video or audio files stored on a remote server, in which case the user must have Internet access in order to reach the multimedia, or to files stored on the reader's local hard drive, which are delivered as part of the DTB package downloaded by the user (e.g., in a .zip archive).
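
Because remote and local links place different demands on the reader (Internet access versus files shipped in the package), it can help to list them before building the book. The following is a minimal sketch in Python, not part of any DTB tool: the `MEDIA_EXTENSIONS` list, the class and function names, and the sample file `media/doghouse.mp3` are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed list of multimedia extensions; adjust it for your own project.
MEDIA_EXTENSIONS = (".mov", ".qtl", ".smil", ".rm", ".asx", ".asf", ".mp3", ".wav")


class MediaLinkCollector(HTMLParser):
    """Collects href values of <a> elements that point at multimedia files."""

    def __init__(self):
        super().__init__()
        self.remote = []  # links that require an Internet connection
        self.local = []   # links to files that must ship in the DTB package

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        # Ignore ordinary links such as other chapters of the book.
        if not parsed.path.lower().endswith(MEDIA_EXTENSIONS):
            return
        if parsed.scheme in ("http", "https"):
            self.remote.append(href)
        else:
            self.local.append(href)


def classify_media_links(html_text):
    """Return (remote_links, local_links) found in one source document."""
    collector = MediaLinkCollector()
    collector.feed(html_text)
    return collector.remote, collector.local


# Hypothetical source document with one remote and one local media link.
sample = (
    '<p><a href="http://mywebsite.org/mymovie.qtl">Movie: How to Build a Doghouse</a></p>'
    '<p><a href="media/doghouse.mp3">Audio: How to Build a Doghouse</a></p>'
    '<p><a href="chapter2.html">Next chapter</a></p>'
)
remote, local = classify_media_links(sample)
print("Remote (needs Internet access):", remote)
print("Local (include in the package):", local)
```

Every entry in the local list should correspond to a file that is actually included in the distributed archive.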

Next, follow the instructions for converting source files to a DTB. Add human-recorded or synthetic speech, and then build the project for distribution. Always test the book by opening it in a DTB reader that supports linking.

Technique J1.3
Integrate embedded, accessible multimedia into DTBs

A DTB with embedded, accessible multimedia represents perhaps the most flexible type of digital talking book. Authors may create structured, searchable text and can integrate audio or even video right into the book.

The irony of embedding video into a DTB must be acknowledged, yet it is not an unreasonable proposition given that a) it can be done today, and b) it will serve a useful purpose as DTBs enter the mainstream. Students could, for example, read a biology text and watch (or listen to) a movie describing cell division, all without having to open an external application to play the multimedia. And embedded movies can be captioned as well as described, something useful for deaf or hard-of-hearing users with or without learning disabilities. Embedding movies can eliminate the need for an Internet connection (the movies are downloaded and installed with the rest of the DTB) and keep the book portable and easy to use.

One obstacle to embedding multimedia (video or audio) directly into a DTB is support: no hardware DTB player currently on the market has any accommodation for embedded multimedia, let alone visual display. There is, however, one software player that supports the integration of embedded audio or video clips with captions and audio descriptions: Dolphin's EasyReader. It can play embedded multimedia in QuickTime, Windows Media and Real formats; all three of these formats can accommodate captions, and two (Real and QuickTime) can accommodate descriptions. SMIL (Synchronized Multimedia Integration Language) presentations for both QuickTime and Real can even be embedded into DTBs. The usability of embedded clips may be limited at this point, however: this player does not yet support convenient control of the embedded multimedia players for blind or keyboard-only users. Future versions of DTB players may offer improved support.

Multimedia in each of the three major formats (Real, QuickTime and Windows Media) can be embedded in a DTB through the use of the <object> element, as described in the following three techniques.

Authors embedding video clips should be sure to include the <prodnote>, or producer's note, element. <prodnote> contains language commonly used to provide verbal descriptions of visual elements (e.g., charts, graphs or videos), to supply operating instructions, or to describe what the video is showing for users who cannot play the video. A <prodnote> could, for example, provide a one- or two-sentence summary of the video embedded in the DTB as well as an alternative URL for the user to access the presentation via a browser or multimedia player. The <prodnote> can be displayed visually and/or rendered in audio. See the DAISY/NISO Structure Guidelines for more information about using <prodnote>. <prodnote> is shown in the examples below.

Technique J1.3.1
Embed accessible QuickTime multimedia into DTBs

First, add captions or audio descriptions to your QuickTime presentation and embed them or create a SMIL presentation. You may even add control toggles for the captions and audio descriptions.

To embed the QuickTime clip, add <object> to the HTML source documents, as shown below.

<object classid="clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b" codebase="http://www.apple.com/qtactivex/qtplugin.cab" height="356" width="320">
<param name="src" value="mymovie.mov" />
<param name="autoplay" value="false" />
<param name="controller" value="true" />
</object>

<prodnote render="required">This movie illustrates the process of cell division. See http://ncam.wgbh.org/mymovie.qtl for a described version of this movie.</prodnote>

classid and codebase are mandatory; enter them exactly as shown above. You must also enter the height and width of the movie. QuickTime supports many different parameters for controlling the behavior and appearance of embedded movies. Three are illustrated in the sample above and are explained below.

  • src (required): defines what media to play
  • autoplay: defines whether or not the media will play upon loading. It is recommended to set autoplay to a value of false so the movie doesn't begin playing when the book is loaded.
  • controller: defines whether or not the player's controls are visible. It is recommended to set this value to true. Note that you must add 16 pixels to the height attribute to accommodate the controller.

<prodnote> is used to illustrate how to describe the embedded video for users who do not have a DTB reader that supports this type of multimedia. Including the URL in the <prodnote> instructs users how to access the movie in an alternative setting, such as a stand-alone multimedia player.
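
The height adjustment for the controller is easy to overlook, so the arithmetic is sketched here as a small Python helper that prints the <object> markup for a movie of a given size. The function name `quicktime_object` and the 320 by 240 pixel movie are assumptions for illustration; the classid and codebase values are copied from the sample above.

```python
from xml.sax.saxutils import quoteattr

QT_CLASSID = "clsid:02bf25d5-8c17-4b23-bc80-d3488abddc6b"
QT_CODEBASE = "http://www.apple.com/qtactivex/qtplugin.cab"
CONTROLLER_HEIGHT = 16  # pixels to add when the controller is visible


def quicktime_object(src, movie_width, movie_height, controller=True, autoplay=False):
    """Return <object> markup sized for the movie plus the optional controller."""
    height = movie_height + (CONTROLLER_HEIGHT if controller else 0)
    lines = [
        f'<object classid="{QT_CLASSID}" codebase="{QT_CODEBASE}" '
        f'height="{height}" width="{movie_width}">',
        f'<param name="src" value={quoteattr(src)} />',
        f'<param name="autoplay" value="{str(autoplay).lower()}" />',
        f'<param name="controller" value="{str(controller).lower()}" />',
        "</object>",
    ]
    return "\n".join(lines)


# A hypothetical 320x240 movie with the controller shown needs height="256".
markup = quicktime_object("mymovie.mov", 320, 240)
print(markup)
```

With the controller hidden (`controller=False`), the same call would leave the height at the movie's own 240 pixels.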

Next, follow the instructions for converting source files to a DTB. Add human or synthetic speech, and then build the project for distribution. Finally, test the book by opening it in a DTB reader that supports embedded multimedia.

Technique J1.3.2
Embed accessible Real media into DTBs

First, add captions or audio descriptions to your Real presentation and create a SMIL presentation. To embed the Real clip, add <object> to the HTML source documents, as shown below.

<!-- This is the movie.-->
<object id="RVOCX" classid="clsid:CFCDAA03-8BE4-11cf-B84B-0020AFBBCCFA" width="250" height="180">
<param name="CONSOLE" value="myConsoleName"/>
<param name="SRC" value="mymovie.smil"/>
<param name="CONTROLS" value="ImageWindow"/>
</object>

<!-- This is the controller.-->
<object id="RVOCX" classid="clsid:CFCDAA03-8BE4-11cf-B84B-0020AFBBCCFA" width="250" height="20">
<param name="console" value="myConsoleName"/>
<param name="controls" value="ControlPanel"/>
<param name="autostart" value="false"/>
</object>

<prodnote render="required">This movie illustrates the process of cell division. See http://ncam.wgbh.org/mymovie.smil for a described version of this movie.</prodnote>

classid is mandatory; enter it exactly as shown above. You must also enter the height and width of the movie. Note that the controller and the movie itself are two separate elements. RealPlayer supports many different parameters for controlling the behavior and appearance of embedded movies. Three are illustrated in the sample above and are explained below.

  • console: specifies whether multiple controls should be linked together
  • controls: embeds the RealPlayer's controls in the page
  • autostart: defines whether or not the media will play upon loading. It is recommended to set autostart to a value of false so the movie doesn't begin playing when the book is loaded.

<prodnote> is used to illustrate how to describe the embedded video for users who do not have a DTB reader that supports this type of multimedia. Including the URL in the <prodnote> instructs users how to access the movie in an alternative setting, such as a stand-alone multimedia player.

Next, follow the instructions for converting source files to a DTB. Add human-recorded or synthetic speech, and then build the project for distribution. Finally, test the book by opening it in a DTB reader that supports embedded multimedia.

Technique J1.3.3
Embed accessible Windows Media multimedia into DTBs

First, add captions to your Windows Media presentation and create a SAMI presentation. Note that Windows Media does not accommodate an additional audio-description track. If you wish to add audio descriptions to a Windows Media presentation you must create an additional audio track and encode it with the video and program audio, or create a program audio track that also contains integrated descriptions.

To embed the Windows Media clip, add <object> to the HTML source documents, as shown below.

<object id="mediaplayer" classid="CLSID:22D6F312-B0F6-11D0-94AB-0080C74C7E95" height="300" width="320" type="application/x-oleobject">
<param name="AutoStart" value="false" />
<param name="FileName" value="mymovie.asx" />
<param name="ShowControls" value="true" />
<param name="CaptioningID" value="captext" />

<embed type="application/x-mplayer2" src="mymovie.asx" name="MediaPlayer" width="320" height="300" ShowControls="1" CaptioningID="captext" />
</object>
<br />

<!-- partition for captions -->
<div id="captext" style="width: 320px; height: 60px;"></div>

<prodnote render="required">This movie describes the process of cell division. See http://ncam.wgbh.org/mymovie.asx for a described version of this movie.</prodnote>

classid is mandatory; enter it exactly as shown above. You must also enter the height and width of the movie. Windows Media Player supports many different parameters for controlling the behavior and appearance of embedded movies. Four are illustrated in the sample above and are explained below.

  • FileName (required): defines what media to play.
  • AutoStart: defines whether or not the media will play upon loading. It is recommended to set AutoStart to a value of false so the movie doesn't begin playing when the book is loaded.
  • ShowControls: defines whether or not the player's controls are visible. It is recommended to set this value to true.
  • CaptioningID: determines the location of the captions. In this example, they are to be placed in <div id="captext">, which has a width of 320 pixels and a height of 60 pixels.

<prodnote> is used to illustrate how to describe the embedded video for users who do not have a DTB reader that supports this type of multimedia. Including the URL in the <prodnote> instructs users how to access the movie in an alternative setting, such as a stand-alone multimedia player.

For ideal file maintenance, and to ensure that Windows Media-compatible players will always play the associated SAMI file, store the video and SAMI files in separate directories and coordinate their playback using an ASX file. The ASX file contains the paths (and other optional parameters) for the video and SAMI files, as shown below.

<asx version="3.0">
<title>MyCaptionedMovie</title>
<entry>
<ref href="http://ncam.wgbh.org/movies/mymovie.asf?sami=http://ncam.wgbh.org/captions/mymovie.smi" />
</entry>
</asx>

Read more about other elements that can be used in an ASX file in the Windows Media Library.

Note that the video and SAMI sources are contained in a single HREF, separated only by a question mark (?) without spaces. When entering the value of the FileName parameter in <object>, be sure to point to the ASX file, not the base media.
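
As a rough illustration of how the single HREF is assembled, the Python sketch below joins a video URL and a SAMI URL with `?sami=` and wraps the result in a minimal ASX document. The function name `build_asx` is hypothetical, and the check that rejects whitespace is an assumption based on the "without spaces" requirement above rather than on any documented player behavior.

```python
from xml.sax.saxutils import escape, quoteattr


def build_asx(title, video_url, sami_url):
    """Return a minimal ASX document pairing a video with its SAMI captions."""
    # The two sources share one href, separated only by "?sami=" with no spaces.
    if any(ch.isspace() for ch in video_url + sami_url):
        raise ValueError("video and SAMI URLs must not contain spaces")
    href = f"{video_url}?sami={sami_url}"
    return "\n".join([
        '<asx version="3.0">',
        f"<title>{escape(title)}</title>",
        "<entry>",
        f"<ref href={quoteattr(href)} />",
        "</entry>",
        "</asx>",
    ])


asx = build_asx(
    "MyCaptionedMovie",
    "http://ncam.wgbh.org/movies/mymovie.asf",
    "http://ncam.wgbh.org/captions/mymovie.smi",
)
print(asx)
```

The printed text can be saved as mymovie.asx, the file that the FileName parameter in <object> points to.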

Next, follow the instructions for converting source files to a DTB. Add human-recorded or synthetic speech, and then build the project for distribution. Finally, test the book by opening it in a DTB reader that supports embedded multimedia.