The research issues are split into five broad areas:
There are two main data delivery issues: storage and transmission. How can we address the huge storage requirements of the MPEG-I encoded video data that accumulates through daily news broadcasts? An hour of video takes up about 600MB of disk space. To populate the news-on-demand library we need to investigate how much data is necessary to keep the index current and when the data can be "forgotten". The data can be degraded to lower quality video at fewer frames per second and lower resolution. We can also eliminate the video entirely and save only the audio portion. Finally, we can retain only the text transcript without audio or video.
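The degradation ladder above can be turned into a back-of-the-envelope storage estimate. In this sketch, only the 600MB-per-hour MPEG-I figure comes from the text; the per-hour sizes for the degraded tiers are illustrative assumptions.

```python
# Per-hour storage cost of each retention tier, in megabytes.
# Only "mpeg1_full" is from the text; the rest are assumed figures.
MB_PER_HOUR = {
    "mpeg1_full": 600,      # full-quality MPEG-I video (from the text)
    "low_res_video": 150,   # assumed: reduced frame rate and resolution
    "audio_only": 30,       # assumed: audio track alone
    "text_only": 0.1,       # assumed: transcript text alone
}

def archive_size_mb(hours_per_day: float, days: int, tier: str) -> float:
    """Storage needed to keep `days` worth of broadcasts at a given tier."""
    return hours_per_day * days * MB_PER_HOUR[tier]

# One hour of news per day, kept at full quality for a year:
print(archive_size_mb(1, 365, "mpeg1_full"))   # 219000.0 MB (~214 GB)
# The same year degraded to audio only:
print(archive_size_mb(1, 365, "audio_only"))   # 10950.0 MB (~10.7 GB)
```

Even this rough arithmetic shows why aging stories down the ladder matters: a year of full-quality video costs roughly twenty times as much disk as the audio-only tier.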
The second data delivery issue concerns the transmission of the video news story to a remote user. Essentially, we need networks fast enough to sustain continuous MPEG-I bit rates, and servers that can keep up with this demand for several simultaneous users.
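The server-capacity question can likewise be sketched numerically. The 1.5 Mbit/s figure is the nominal MPEG-I stream rate; the link capacities and the 80% utilization headroom are illustrative assumptions, not measurements of any deployed system.

```python
# Nominal MPEG-I stream rate in megabits per second.
MPEG1_MBITS_PER_SEC = 1.5

def max_concurrent_streams(link_mbits_per_sec: float,
                           utilization: float = 0.8) -> int:
    """How many continuous MPEG-I streams a link can carry,
    reserving headroom via the `utilization` factor (assumed 80%)."""
    return int(link_mbits_per_sec * utilization / MPEG1_MBITS_PER_SEC)

# A shared 10 Mbit/s Ethernet segment at 80% usable capacity:
print(max_concurrent_streams(10))    # 5 streams
# A 100 Mbit/s link under the same assumption:
print(max_concurrent_streams(100))   # 53 streams
```

The point of the sketch is that even a modest number of simultaneous viewers saturates a shared segment, which is why both network bandwidth and server throughput appear as research issues.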
The user interface issues deal with the way users explore the library once it is available. Can the user intuitively navigate the space of features and options provided in the Informedia:News-on-Demand interface? What other features should the system provide to allow users to obtain the information they are looking for? Our plan is to move the system to a testbed deployment and gain insights from users as well as iterate on various interface design alternatives.
Natural language processing research for News-on-Demand has to provide acceptable segmentation of the news broadcasts into stories. We also want to generate more meaningful short summaries of the news stories in natural-sounding English. Natural language processing also has a role in query matching for optimal retrieval from the story texts. Finally, the system would greatly improve if queries could be parsed to separate out dates, major concepts, and types of news sources.
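The query-parsing goal above can be illustrated with a minimal sketch that splits a free-text query into dates, known news sources, and remaining concept terms. The source list, the date patterns, and the function name are all illustrative assumptions, not part of the actual system.

```python
import re

# Assumed list of recognizable news-source names (illustrative only).
SOURCES = {"cnn", "npr", "reuters"}
# Assumed date patterns: mm/dd/yy[yy] or a bare year like 1994.
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4}|19\d{2})\b")

def parse_query(query: str) -> dict:
    """Separate a query into dates, news sources, and concept terms."""
    dates = DATE_RE.findall(query)
    remaining = DATE_RE.sub("", query).lower().split()
    sources = [w for w in remaining if w in SOURCES]
    concepts = [w for w in remaining if w not in SOURCES]
    return {"dates": dates, "sources": sources, "concepts": concepts}

print(parse_query("cnn stories on the 1994 earthquake"))
# → {'dates': ['1994'], 'sources': ['cnn'],
#    'concepts': ['stories', 'on', 'the', 'earthquake']}
```

A real implementation would need stopword removal and richer date grammar, but the sketch shows how the three query components could be routed to different retrieval constraints.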
Image processing research is continuing to refine the scene segmentation (the identification of cuts in the video). Within a scene and within a story, image processing gives us the key frame to represent that scene or story. The choice of a single key frame to best represent a whole scene is a subject of active research. In the longer term, we plan to add text detection and optical character recognition (OCR) capabilities for reading captions and text off the screen background. In the future, we also hope to include similarity-based image matching in the retrieval features available to a user.
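A common approach to the cut detection behind scene segmentation is to compare colour histograms of consecutive frames and flag a cut wherever the distance spikes. The following is a minimal sketch of that idea, not the system's actual algorithm: frames are stubbed as tiny grey-level histograms, and the threshold value is an assumption.

```python
def hist_distance(h1, h2):
    """Sum of absolute bin differences between two frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def find_cuts(histograms, threshold):
    """Indices of frames that start a new scene: a cut is declared
    wherever the histogram distance to the previous frame exceeds
    the (assumed) threshold."""
    return [i for i in range(1, len(histograms))
            if hist_distance(histograms[i - 1], histograms[i]) > threshold]

# Three near-identical frames, then an abrupt change of content:
frames = [[10, 5, 0], [11, 4, 0], [10, 5, 1], [0, 2, 14]]
print(find_cuts(frames, threshold=6))   # [3]
```

Choosing the threshold is the hard part in practice: too low and camera motion registers as cuts, too high and gradual transitions are missed, which is one reason segmentation remains a research topic.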
Speech recognition helps create a time-aligned transcript of the spoken words. When closed-captioning is available, speech recognition is used in conjunction with the closed-captioning to improve the time-alignment of the transcripts. For news broadcasts that are not closed-captioned, we need a transcript generated exclusively by the speech recognition system. The vocabulary and language model used here approximate a "general English news" language model. It is based on a large corpus of North American business news from 1987 to 1994.
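The caption/recognizer combination above can be sketched as a word-level alignment: match the closed-caption words against the recognizer's time-stamped output, so caption words inherit the recognizer's timestamps. This uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the word lists and timestamps are illustrative assumptions.

```python
import difflib

def align_captions(caption_words, recognized):
    """Align caption words with recognizer output.

    `recognized` is a list of (word, start_time) pairs from the speech
    recognizer. Returns a dict mapping each caption word that the
    recognizer also produced to its start time; misrecognized words
    are simply left unaligned.
    """
    rec_words = [w for w, _ in recognized]
    matcher = difflib.SequenceMatcher(a=caption_words, b=rec_words)
    aligned = {}
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            aligned[caption_words[a + k]] = recognized[b + k][1]
    return aligned

caps = ["the", "president", "said", "today"]
rec = [("the", 0.0), ("president", 0.4), ("set", 1.1), ("today", 1.5)]
print(align_captions(caps, rec))
# → {'the': 0.0, 'president': 0.4, 'today': 1.5}
```

Note that "said" stays unaligned because the recognizer output "set"; in practice the missing timestamps would be interpolated from the neighbouring aligned words.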
During library exploration, speech recognition also allows a user to query the system by voice, simplifying the interface by making the interaction more direct and enabling more immediate entry of queries against the stored video data.