Introduction and background

The use of video is now an established method for delivering information and entertainment on the Web, and its inclusion in Web pages and other electronic materials has been greatly simplified thanks to the introduction of new technology and markup, such as HTML5's video element. Accessible video, while not exactly mainstream, is slowly becoming more available thanks to innovative approaches to delivering captions and video descriptions as well as new regulations and legislation, such as the Twenty-First Century Communications and Video Accessibility Act.

When it comes to making video accessible to blind and visually impaired audiences, video descriptions (also known as audio descriptions) have been the traditional route. New approaches to creating and delivering video descriptions are now in development, such as those that replace human narration with machine-generated, text-to-speech (TTS) narration. For examples of how TTS descriptions can be integrated into Web-based video, see the solutions from IBM-Research Tokyo and the Carl and Ruth Shapiro Family National Center for Accessible Media (NCAM) at WGBH, which illustrate effective methods for using HTML5 and JavaScript to deliver text-based audio descriptions.

Some authors are now experimenting with more than passive video playback: they launch supplemental materials, or enhancements, on the page in real time while the video continues to play. For example, while a video of a biology lecture about Eastern Cricket Frogs plays on a page, pop-up windows appear at appropriate intervals showing maps of the frog's habitat, Wikipedia entries about Eastern Cricket Frogs, or pictures of predators. While this can be a useful and exciting way to give students supplemental information in real time, users who are blind or visually impaired may be completely unaware that these enhancements are opening on the page. These users must be alerted to the presence of these on-screen resources, and the resources themselves must be accessible to assistive technology.
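The idea of launching enhancements at set points along a video's timeline can be sketched in a few lines of JavaScript. This is a minimal illustration only, not NCAM's implementation: the cue times, the cue contents and the function name cuesDue are all assumptions made for the example. In a browser, a function like this would typically be called from the video element's "timeupdate" event handler.

```javascript
// Hypothetical cue list: the times (in seconds) and text are illustrative only.
const cues = [
  { time: 12, kind: "map", text: "Map of the frog's habitat" },
  { time: 45, kind: "wikipedia", text: "Wikipedia entry: Eastern Cricket Frog" },
  { time: 80, kind: "image", text: "Picture of a predator" },
];

// Return the cues whose start time falls in the interval
// (previousTime, currentTime], so each cue fires exactly once
// as playback moves forward.
function cuesDue(cueList, previousTime, currentTime) {
  return cueList.filter((c) => c.time > previousTime && c.time <= currentTime);
}

// Simulate a handful of playback-position updates.
let last = 0;
const fired = [];
for (const now of [10, 13, 44.9, 45.2, 90]) {
  for (const cue of cuesDue(cues, last, now)) {
    fired.push(cue.kind);
  }
  last = now;
}

// Logs the kinds "map", "wikipedia" and "image", in that order.
console.log(fired);
```

Comparing against the previous playback position, rather than testing for an exact time match, matters because playback-position updates arrive at irregular intervals and rarely land exactly on a cue's start time.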

Now, with funding provided by the Reader's Digest Partners for Sight Foundation, NCAM has combined TTS technology with advances in markup languages and applied them in new ways to deliver accessible supplemental information that helps make science- or math-focused video materials more accessible to blind and visually impaired users. This new project aims to demonstrate the inclusion of enhancements in ways that are both visual and non-visual, all of which are screen-reader accessible and delivered using HTML5, JavaScript and the Popcorn.js HTML5 Media Framework. Read on to learn about the demonstration model that is now available for you to test.

Demonstrating TTS Enhancements with Video

Before playing the demonstration video, please read the system requirements to learn about which screen readers and browsers can be used to take full advantage of the enhancements.

The project's first prototype model, the Human Genome Project, demonstrates how supplemental materials can be launched along the timeline of a four-minute video. The TTS enhancements are listed below.

With the exception of the Wikipedia entries, every enhancement automatically pauses the player, which then resumes playing after a pre-determined interval. The enhancements may be turned on and off collectively or individually, and all of the controls for the player and the enhancements are keyboard and screen-reader accessible. See below for more information on controlling the presentation and the player. If you find the pre-determined pause too short or too long, you can increase or decrease the length of the description pause by using the increase/decrease buttons on the player.
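The pause-and-resume behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the demonstration's actual code: the class name DescriptionPause, the default pause of 5 seconds, the 1-second step and the 1-second minimum are all hypothetical. The player argument can be any object with pause() and play() methods, which an HTML5 video element provides.

```javascript
// Enhancement kinds that do not interrupt playback. Per the text above,
// only the Wikipedia entries fall into this group.
const NON_PAUSING = new Set(["wikipedia"]);

class DescriptionPause {
  // All default values here are assumptions chosen for illustration.
  constructor(seconds = 5, step = 1, min = 1) {
    this.seconds = seconds;
    this.step = step;
    this.min = min;
  }

  // Backs the hypothetical "increase" button on the player.
  increase() {
    this.seconds += this.step;
  }

  // Backs the hypothetical "decrease" button; never drops below the minimum.
  decrease() {
    this.seconds = Math.max(this.min, this.seconds - this.step);
  }

  // Pause the player for one enhancement, then resume after the interval.
  // `schedule` defaults to setTimeout and can be swapped out for testing.
  run(player, kind, schedule = setTimeout) {
    if (NON_PAUSING.has(kind)) return false;
    player.pause();
    schedule(() => player.play(), this.seconds * 1000);
    return true;
  }
}

// Usage with a stand-in player and a scheduler that fires immediately.
const log = [];
const fakePlayer = {
  pause: () => log.push("pause"),
  play: () => log.push("play"),
};
const immediate = (fn, ms) => {
  log.push(`wait ${ms}ms`);
  fn();
};

const descriptionPause = new DescriptionPause();
descriptionPause.increase(); // pause length is now 6 seconds
descriptionPause.run(fakePlayer, "glossary", immediate);
descriptionPause.run(fakePlayer, "wikipedia", immediate); // does not pause

// Logs "pause", "wait 6000ms" and "play", in that order.
console.log(log);
```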

While the glossary terms, Wikipedia entries and images are visible as well as accessible to screen readers, the image descriptions and video descriptions are hidden and are available only to screen readers. If you are not using a screen reader, you will not hear these descriptions. The speed, voice and pitch at which all of the enhancements are delivered will depend on your screen reader's settings. Be aware that if your screen reader is set to read slowly, some enhancements may overlap with, and thus obscure, the video's audio track.
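One common way to make text available to screen readers while keeping it off the visible page is an ARIA live region that is positioned off-screen. The sketch below shows that general technique; it is an assumption about how such descriptions might be delivered, not a copy of the demonstration's markup. The function names, the "polite" setting and the inline style values are illustrative choices. The doc argument is the browser's document object.

```javascript
// Create a live region that is hidden from sighted users but still read by
// screen readers. Off-screen positioning is used instead of "display: none",
// because content hidden with "display: none" is not announced.
function createLiveRegion(doc) {
  const region = doc.createElement("div");
  // "polite" waits for the screen reader to finish its current speech;
  // "assertive" would interrupt it.
  region.setAttribute("aria-live", "polite");
  region.setAttribute(
    "style",
    "position:absolute; left:-10000px; width:1px; height:1px; overflow:hidden;"
  );
  doc.body.appendChild(region);
  return region;
}

// Changing the text of a live region prompts the screen reader to speak it.
function announce(region, text) {
  region.textContent = text;
}

// In a browser (not executed here):
//   const region = createLiveRegion(document);
//   announce(region, "Description text for the current scene.");
```

Because the screen reader does the speaking, the voice, speed and pitch are whatever the user has already configured.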

The player itself can be controlled with a mouse or from the keyboard using the buttons located below each player. Additionally, there are checkboxes located below the player that can be used to select which enhancements you want to see or hear. Keyboard shortcuts, also known as access keys, have been assigned to each player function. These shortcuts are listed below.

Keyboard shortcuts for controlling the player

Keyboard shortcuts for turning the enhancements on or off (default is on)

You will need to press certain modifier keys along with the numbers or letters above in order to activate the shortcuts. Which modifiers you press will depend on the browser you're using. A list of browsers and their modifier keys is shown below. (Note: some screen readers will announce the modifier keys for you.)

Keyboard-shortcut modifier keys:

Please see special notes and troubleshooting if you are having problems hearing the enhancements, seeing the video or controlling the player.

System Requirements

To run the demonstration model and access the TTS enhancements, you will need the following:

On the Mac, the demonstration model will only work with VoiceOver and Safari. It will not work with Firefox, Chrome or Opera. On Windows, you must use either the JAWS or the NVDA screen reader. Currently, Window-Eyes does not provide support for text displayed in live regions and so cannot be used to read the TTS descriptions.


Screen-reader problems

Can't hear the enhancements with a screen reader; screen reader behaves oddly

  1. First, review the system requirements. Using a Mac? You must be running OS X 10.7 (Lion), not 10.6 (Snow Leopard), in order to hear the TTS descriptions. Note that if you are not using a screen reader, you will not be able to hear the TTS enhancements.
  2. If you find that VoiceOver always reads the TTS descriptions twice, shut down VoiceOver and restart it. (You should not need to close and re-open your browser.)
  3. If your screen reader is not reading the TTS descriptions, reload the page, then shut down and restart the screen reader. This usually causes the descriptions to be read aloud.

Also remember that the speed at which the enhancements are delivered depends entirely on your screen-reader settings. If you find that the descriptions are competing with or obscuring the program audio, you can increase or decrease the length of the description pause by using the increase/decrease buttons on the player. You can also simply increase your screen reader's reading speed.

Browser problems

Can't see video or hear audio

  1. The videos are supplied in MP4 and Ogg/Theora formats using the HTML5 video element. ARIA markup is used to help screen readers locate and read the TTS enhancements. For best results, use the most current versions of the browsers and screen readers listed in the system requirements.
  2. Firefox may occasionally display erratic video, video and program audio that are somewhat out of sync, or stuttering audio. If this happens, pause or stop the video, wait a few seconds and then resume playback.
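Supplying the same video in two formats lets each browser play the one it supports. The selection logic can be sketched as below. The file names and the function name pickSource are hypothetical; the canPlayType argument stands in for the HTML5 video element's canPlayType method, which returns an empty string for unsupported types and "maybe" or "probably" otherwise. In practice, listing both files as source elements inside the video element lets the browser make this choice itself.

```javascript
// Hypothetical file names; the MIME types match the two formats named above.
const sources = [
  { src: "demo.mp4", type: "video/mp4" },
  { src: "demo.ogv", type: "video/ogg" },
];

// Return the first source the browser reports it may be able to play,
// or null if none is supported.
function pickSource(canPlayType, candidates) {
  for (const candidate of candidates) {
    if (canPlayType(candidate.type) !== "") return candidate;
  }
  return null;
}

// In a browser, the real check would be:
//   const video = document.createElement("video");
//   pickSource((type) => video.canPlayType(type), sources);

// Stand-in for a browser that supports only Ogg video.
const oggOnly = (type) => (type === "video/ogg" ? "maybe" : "");
const chosen = pickSource(oggOnly, sources);

// Logs "demo.ogv".
console.log(chosen.src);
```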

VoiceOver isn't passing the keyboard-shortcut commands for the players

VoiceOver will take control of the keyboard-shortcut combinations by default. To use VoiceOver with the keyboard shortcuts, first press Control+Option+Tab to activate the pass-through command, then press the keyboard-shortcut commands you want to use.

Download the source code

Download a zip file of the entire demonstration (26 MB) to see how it works and to try it yourself.

Your Comments

NCAM is interested in hearing your comments about this demonstration model. Please let us know if you have suggestions for improvements or have problems accessing the TTS enhancements.