A video document retrieval system automatically indexes a large collection of video recordings from their subtitles and then retrieves relevant clips in response to a user query.
The Subtitles Search System addresses the problem of finding all the relevant documents in a video collection of TV programmes for a given user query.
In this project, the video documents are TV programmes with accompanying subtitles.
Potential users of the system are likely to watch video on their computer rather than on their TV, and to examine individual clips within a programme store rather than entire programmes. This project is a step towards Video On Demand (VOD) and interactive TV (iTV), where customers can watch whatever they want, whenever they want. However, VOD and iTV still have a long way to go.
Although devices like the TiVo can record and time-shift TV, they can only hold 30 hours of programmes. The web, especially over a broadband connection, can give access to a practically unlimited amount of video.
To retrieve the desired information from such huge video repositories, a search engine is needed. Some current video search web sites are based on descriptions of the videos. Each result plays back either a trailer of a relevant film or the complete version, so users cannot skip around in the film to find the scene they are interested in.
Many television programmes (more than 70%), films on DVD and pre-recorded video tapes carry specially coded closed-caption subtitles, a feature intended for deaf and hard-of-hearing viewers. This makes it feasible to search by keywords in the subtitles without requiring any voice or image recognition.
On television, subtitles are provided on teletext page 888. They are available on ITV, Channel 4, Channel 5, BBC1 and BBC2. Both programmes and their accompanying subtitles can be captured by a TV card installed in a workstation. It is therefore possible to use currently available devices to obtain both the video source and the subtitles, and to develop a web-based search engine for video documents that lets users access specific programme clips in response to the keywords they enter.
The Subtitles Search System will be expected to meet the following objectives.
A main element of the work in this project is index construction. It involves a series of pre-processing steps on the subtitles before keywords are written to the database. Keyword matching should be insensitive to case and inflection: changes in a word’s form (mostly its suffix) that mark number, tense, person or mood must not affect searching.
As well as recording the root of every ‘sensible’ word in the database, the index should be kept as small as possible, to speed up searching and save disk space, provided that search precision is not impaired.
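The pre-processing described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the stop-word list, the suffix list and the function names stem, keywords and build_index are all assumptions made for illustration, and the suffix stripper is a crude stand-in for a proper stemming algorithm.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}

# Hypothetical suffix list, checked in this order. This is a crude
# stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip one common inflectional suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(text):
    """Lower-case the text, split it into words, drop stop words, and stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(blocks):
    """Map each stem to the set of block ids whose subtitle text contains it."""
    index = defaultdict(set)
    for block_id, text in blocks.items():
        for key in keywords(text):
            index[key].add(block_id)
    return index
```

Dropping stop words keeps the index small, and storing only stems means that a query for "walking" and a subtitle containing "walks" meet at the same index entry.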
To play back a video clip relevant to the user’s query, a video file needs to be segmented. The conventional technique for segmenting written text is “chapterization”, which cannot be applied to subtitles. Instead, non-lexical information extracted from the start and end times of dialogues gives an indication of the natural segmentation into blocks.
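One possible reading of this timing-based segmentation is to start a new block whenever the pause between two consecutive subtitles is long. The sketch below assumes that interpretation; the function name segment, the (start, end, text) tuple layout in seconds and the 3-second threshold are hypothetical and not taken from the project.

```python
def segment(subtitles, max_gap=3.0):
    """Group (start, end, text) subtitles into blocks separated by long pauses.

    The subtitles are assumed to be sorted by start time. max_gap is a
    hypothetical threshold in seconds, not a value from the project.
    """
    blocks = []
    for start, end, text in subtitles:
        if blocks and start - blocks[-1]["end"] <= max_gap:
            # Short pause: extend the current block.
            blocks[-1]["end"] = end
            blocks[-1]["text"] += " " + text
        else:
            # First subtitle or long pause: open a new block.
            blocks.append({"start": start, "end": end, "text": text})
    return blocks
```

Each resulting block carries its own start time, which is what a result link would need in order to begin playback at the right point.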
The system must locate information in the video in response to the user’s query and rank the matching results.
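The ranking method is not specified here, so the following is only a sketch under a simple assumption: each block is scored by how many times the query terms occur in it. The function name rank and the block_terms mapping (block id to list of already pre-processed keywords) are hypothetical.

```python
from collections import Counter


def rank(query_terms, block_terms):
    """Return (block_id, score) pairs for matching blocks, best first.

    The score is the total number of times any query term occurs in the block.
    """
    wanted = set(query_terms)
    results = []
    for block_id, terms in block_terms.items():
        counts = Counter(terms)
        score = sum(counts[t] for t in wanted)
        if score > 0:
            results.append((block_id, score))
    # Highest score first; ties are broken by block id for a stable order.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results
```

A more refined scheme could weight rare terms more heavily, but raw term counts are enough to show how matching blocks are ordered before being shown on the result page.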
In order to make the Subtitles Search System accessible to as many users as possible, it will be accessed through the Internet using a web browser. In addition to good usability of the web interface, the playback of selected programme clips must be as fast as possible. Streaming media technology will therefore be used: when the user selects a link on the result page, playback starts from the start time of the block in which the search keyword(s) appear.
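The streaming technology is not named here. As one illustration only, a result link could carry the block's start time using the temporal fragment syntax of W3C Media Fragments (#t=start,end), assuming a player that honours it. The function name clip_url and the example address are hypothetical.

```python
def clip_url(base_url, start, end=None):
    """Append a temporal fragment so playback begins at the block's start time.

    Times are in seconds and are written with two decimal places.
    """
    fragment = f"t={start:.2f}"
    if end is not None:
        fragment += f",{end:.2f}"
    return f"{base_url}#{fragment}"


# Example: link to the block that runs from 754.5 s to 790 s.
link = clip_url("http://example.org/news.mp4", 754.5, 790)
```

With this approach the result page only needs the start (and optionally end) time stored for each block; no separate clip file has to be cut in advance.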
To facilitate the management of TV programmes, a web interface for the system administrator is also provided. The administrator can add programmes via this web page.