K3: Searching TV Programs Using Teletext Subtitles


2. Background and Related Work

2.1 Web search engines

Finding information has always been very difficult. In recent years, World Wide Web search engines have rapidly become a primary access point for electronically stored information. These systems seek, download, and index web pages on a massive scale. Typically they permit a full-text search on the contents, or most partial contents, of web pages and some files they point to.

The Subtitles Search System differs from web search engines on the data format they retrieve. It concerns video document retrieve instead of full-text. However, the technologies applied on web search engines are also effective for retrieving and browsing multimedia information. The search engine of this system mimics web search engines.

Table 1 shows some features of famous search engines on the Internet.

Search Engine    | Boolean                      | Default | Proximity    | Truncation              | Case | Fields                 | Stop Words      | Sorting
-----------------|------------------------------|---------|--------------|-------------------------|------|------------------------|-----------------|-------------------------------
Google           | -, OR                        | and     | Phrase       | No                      | No   | title, URL, more       | Yes, + searches | Relevance, site
All the Web      | +, -, or with ( )            | and     | Phrase       | No                      | No   | title, URL, link, more | No              | Relevance, site
Lycos            | +, -                         | and     | Phrase       | No                      | No   | title, URL, link, more | No              | Relevance
MSN Search       | and, or, not, ( ), +, -      | and     | Phrase       | No                      | Yes  | title, link            | Yes             | Relevance
Northern Light   | and, or, not, ( ), +, -      | and     | Phrase       | Yes (* %), auto plurals | No   | title, URL, more       | No              | Relevance, site, date, folders
iWon             | AND, OR, NOT, ( ), +, -      | and     | Phrase       | Yes (* ?)               | Yes  | title, link, domain    | Yes             | Relevance, site
AltaVista Simple | +, -, AND, OR, AND NOT, ( )  | OR      | Phrase, NEAR | Yes (*)                 | Yes  | title, URL, link, more | No              | Relevance, site
AltaVista Adv.   | and, or, and not, ( )        | phrase  | Phrase, near | Yes (*)                 | Yes  | title, URL, link, more | No              | Relevance, if used
HotBot           | and, or, not, ( ), +, -      | and     | Phrase       | Yes (*)                 | Yes  | title, more            | Yes             | Relevance, site
NBCi             | AND, OR, NOT, ( ), +, -      | and     | Phrase       | Yes (*)                 | Yes  | title, more            | Yes             | Relevance
Excite           | AND, OR, NOT, ( ), +, -      | or      | Phrase       | No                      | No   | No                     | Yes             | Relevance, site

Table 1. Search engine features chart [ENG]

It lists the Boolean operations each engine supports and, when multiple terms are entered, the default operation applied to them. Phrase searching is designated by double quotes around the search phrase.

“Truncation” refers to the ability to search on just a portion of a word, i.e. wildcards are enabled. Typically, a symbol such as the asterisk represents the rest of the term. With end truncation, several letters at the beginning of a word are specified but the ending can vary. With internal truncation, a symbol can represent one or more characters within a word. For instance, college* finds college, colleges, collegiums and collegial, while col*r finds color, colour and colander.
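The truncation behaviour described above can be sketched by translating the wildcard pattern into a regular expression. This is a minimal illustration of the idea, not any particular engine's implementation:

```java
import java.util.regex.Pattern;

public class Truncation {
    // Translate a truncation pattern into a regex: '*' stands for any
    // run of characters; everything else is matched literally.
    static boolean matches(String pattern, String term) {
        String regex = Pattern.quote(pattern).replace("*", "\\E.*\\Q");
        return term.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("college*", "colleges")); // true  (end truncation)
        System.out.println(matches("col*r", "colour"));      // true  (internal truncation)
        System.out.println(matches("col*r", "college"));     // false
    }
}
```

Quoting the pattern first ensures that regex metacharacters typed by the user are treated literally, with only the asterisk acting as a wildcard.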

“Case” indicates whether the engine is case sensitive. In general, most search engines treat upper case, lower case, and mixed case as the same term. Some search engines can match exact case. Entering a search term in lower case will usually find all cases; in a case-sensitive engine, any upper-case letter in a query term invokes an exact-case match, i.e. exceed finds exceed, eXceed and Exceed, while eXceed finds only eXceed.
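The case-matching rule above (lower-case queries match any casing; any upper-case letter forces an exact match) can be sketched as follows. This is an illustration of the rule, not code from any real engine:

```java
public class CaseMatch {
    // A query in all lower case matches any casing of the indexed term;
    // a query containing an upper-case letter must match exactly.
    static boolean matches(String query, String indexedTerm) {
        boolean hasUpper = !query.equals(query.toLowerCase());
        return hasUpper ? indexedTerm.equals(query)
                        : indexedTerm.equalsIgnoreCase(query);
    }

    public static void main(String[] args) {
        System.out.println(matches("exceed", "eXceed")); // true
        System.out.println(matches("eXceed", "Exceed")); // false
        System.out.println(matches("eXceed", "eXceed")); // true
    }
}
```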

“Fields” relates to the web indexes each search engine builds. Early web indexes provided three separate indexes of HTML pages: titles, headings, and the most frequent words in the text. Field searching allows the user to designate where a specific search term must appear. Rather than searching for words anywhere on a web page, fields define specific structural units of a document; the title, the URL, an image tag, or a hyperlink are common fields on HTML pages. The Subtitles Search System uses a full-content index, in which every “meaningful” word appearing in the subtitles is indexed.
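Full-content indexing of subtitle text can be sketched with a simple inverted index mapping each word to the subtitle lines it occurs in. This is a minimal illustration of the idea, not the system's actual index structure:

```java
import java.util.*;

public class InvertedIndex {
    // word -> identifiers of the subtitle lines it occurs in
    private final Map<String, Set<Integer>> index = new HashMap<>();

    void add(int lineId, String subtitleLine) {
        // Split on non-word characters and index every word, lower-cased
        for (String word : subtitleLine.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                index.computeIfAbsent(word, w -> new TreeSet<>()).add(lineId);
            }
        }
    }

    Set<Integer> lookup(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "Weather forecast for tonight");
        idx.add(2, "Tonight's football highlights");
        System.out.println(idx.lookup("tonight")); // [1, 2]
    }
}
```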

“Stop words” are frequently occurring words that are not searchable. Some stop lists include only common words such as ‘the’; others may also include numbers or frequent HTML strings. The stop list of the Subtitles Search System is given in Appendix B.
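Stop-word removal amounts to filtering the query (and index) terms against the stop list. The list below is purely illustrative; the system's full list is in Appendix B:

```java
import java.util.*;

public class StopWords {
    // Illustrative stop list only; the real list is in Appendix B.
    static final Set<String> STOP = Set.of("the", "a", "an", "of", "and", "to");

    // Keep only the terms that are not stop words.
    static List<String> filter(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String t : terms) {
            if (!STOP.contains(t.toLowerCase())) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of("the", "history", "of", "flight")));
        // [history, flight]
    }
}
```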

“Sorting” is the ability to organise the results of a search. Usually, Internet search engines sort results by relevance, determined by their proprietary relevance-ranking algorithms. Other options are to arrange the results by date, alphabetically by title, or by root URL or host name. The Subtitles Search System uses relevance ranking.
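Relevance sorting itself is straightforward once each result carries a score: order the result list by descending score. A minimal sketch (the class and field names are illustrative, not the system's):

```java
import java.util.*;

public class ResultSorting {
    static class Result {
        final String title;
        final double relevance;  // score from the ranking algorithm
        Result(String title, double relevance) {
            this.title = title;
            this.relevance = relevance;
        }
    }

    // Order results by descending relevance score.
    static void sortByRelevance(List<Result> results) {
        results.sort((a, b) -> Double.compare(b.relevance, a.relevance));
    }

    public static void main(String[] args) {
        List<Result> r = new ArrayList<>(List.of(
            new Result("clip A", 0.42), new Result("clip B", 0.87)));
        sortByRelevance(r);
        System.out.println(r.get(0).title); // clip B
    }
}
```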

2.2 Search video documents

2.2.1 Locate clips on DVD

Most DVDs are “chapterized”: the movie is broken down into several tracks, enabling the viewer to skip around and locate a desired section. However, only movies have widely adopted this technology; other video material such as sports and news usually has to be viewed as a whole.

Chapterization is especially useful in training, advertising and marketing. If a customer needs information on a specific function of a product, (s)he can go to the demo video (whether tape or web streaming media), skip directly to the relevant part, and get the information. Locating that section manually by rewinding and fast-forwarding through the video would take a great deal of time.

Being able to search through video can be very powerful. However, keyword searching that relies on speech recognition software, such as Cambridge’s Multimedia Document Retrieval [MDR], is difficult.

2.2.2 Searchable Video

The next step beyond chapterized video is searchable video: the user can search for where a specific word was said during a presentation. This may be achieved by turning the spoken word into searchable text via speech recognition, making video as easy to search as web pages through a search engine. Unfortunately this technology suffers from a high error rate.

Speech recognition software becomes even less valuable when the video contains a wide variety of accents or poorly recorded sound; such speech simply cannot be recognised by current speech recognition programs.

Even with a high-quality video source, only around 50% of words can be recognised correctly by speech recognition software. It has particular problems with proper nouns (people and places), yet these are precisely the words users most often want to search for.

True searchable video relying entirely on speech recognition therefore remains a technology that cannot yet be put into practical use.

The best searches happen when the programme has closed captioning and the search engine can be based on text taken directly from the Vertical Blanking Interval (VBI). The VBI consists of extra lines in the TV signal that are not devoted to the picture; in recent years they have been dedicated to closed-captioning text. In Europe the VBI carries teletext channels that display text-only pages on TV.

2.3 Streaming media on the Internet – From downloading to streaming

For a long time and until recently, the only way to access audio and video files on the web was to download them temporarily to the local disk, a process which could take some time depending on the size of the file. Of course, this process is handled automatically by the browser: it runs a media player installed on the machine according to the file type and plays back the file once it has been received in its entirety.

Today the web delivery of multimedia has evolved to the point that audio and video can be played immediately, rather than downloaded first and then played. This kind of real-time delivery (“streaming”) of audio is already a reality, even over a modem connection. A number of radio stations now “web-cast” their programmes, and many special events, conference presentations and training sessions are delivered “live” over the Internet in streaming format.

Video streaming has not yet reached the same level of performance. Real-time video provides a small playback window, and delivery tends to be choppy and erratic, since the enormous volume of visual information required for reliable streaming video must be transmitted and synchronised over the congested Internet. This is a tremendous technical challenge. As compression technologies develop and network bandwidth increases, video broadcasting on the Internet should steadily improve in quality and performance.

2.4 Versatile Media Players

The players (or plug-in controls) allow the user to play, stop and pause video files, and in most cases also fast forward or rewind. With streaming formats it is possible to have usable multimedia files that are quite long (some on the web are an hour or more) while still allowing the user to navigate through the file as desired. For downloaded audio files (in au, aiff, or wav formats), that kind of length would not be practical, since it would take too long to download over the Internet and would require too much temporary storage space on the local PC. Streaming formats are compressed much more than other sound or video formats and are thus easier to receive and store.

Several versatile streaming media players are available for developers to play back video on the Internet.

2.4.1 Java Media Framework API

“Developers can use the Java Media Framework (JMF) API to implement a tailored media player that receives and plays multimedia data from sources stored locally or on the network. It allows for cross-platform rendering, control, and synchronisation of all major media types and file formats independent of the network protocol. In addition, a Java Media Framework API player can be created directly from a URL thereby providing an easy way to embed multimedia in Java technology-based applets and applications.” (see 4.7.1 for details)

Figure 1. Java media player [JMF]

2.4.2 Windows Media API

2.4.2.1 Windows Media Player

Microsoft Windows Media Player [WMP] delivers a complete, easy-to-use, all-in-one player. It combines many features in a single application: CD player, audio and video player, media guide, Internet radio, portable-device music file transfer, etc.

Most of all, it is very easy for a developer to use, and it enables customisation via Interactive Skins, which let users personalise the look and features of Windows Media Player by changing the user interface. Users can also extend both the look and the features of the Player using standard Extensible Markup Language (XML) and JScript.

Using Windows Media Player, a developer does not need to go down to the event-controller level to tailor the player, as is necessary with the Java media player. That is why it was chosen for the Subtitles Search System after a comparison with JMF.

2.4.2.2 Windows Media On-Demand Producer

Microsoft produces another tool, Windows Media On-Demand Producer, which is used exclusively for on-demand production and includes no live encoding capabilities. In addition to encoding, it allows the producer to chapterize video so that users can jump around as they would on a DVD.

It can also work with a video capture card, if one is installed, to capture video directly into the program, or it can load source WAV or AVI files. The scripts the producer creates, including the commands, can be imported and exported between files, making them easy to reuse.

2.5 Related work

2.5.1 Text documents indexing – semantic database

Word matching suffers from two problems: synonymy (many words with similar meanings) and homonymy (one word with dissimilar meanings). Disambiguation addresses homonymy by indexing word senses rather than words [WDNT], while synonymy can be addressed by thesaurus-based query expansion.

George A. Miller (1990) developed WordNet, which offers the possibility of discriminating word senses in documents and queries. This would prevent homonymy errors, i.e. matching spring in its “metal device” sense with documents mentioning spring in the sense of “spring time”, and so improve retrieval accuracy. WordNet also provides a way to handle synonymy by matching semantically related words. For example, spring, fountain, outflow and outpouring, in the appropriate senses, can be identified as occurrences of the same concept, “natural flow of ground water”.
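Thesaurus-based query expansion of this kind can be sketched as follows. The synonym map here is a hypothetical stand-in; a real system would draw these synonyms from WordNet synsets:

```java
import java.util.*;

public class QueryExpansion {
    // Hypothetical thesaurus: each term maps to synonyms for one sense.
    static final Map<String, List<String>> THESAURUS = Map.of(
        "spring", List.of("fountain", "outflow", "outpouring"));

    // Expand a query term with its synonyms so that documents using a
    // related word for the same concept can still match.
    static Set<String> expand(String term) {
        Set<String> expanded = new LinkedHashSet<>();
        expanded.add(term);
        expanded.addAll(THESAURUS.getOrDefault(term, List.of()));
        return expanded;
    }

    public static void main(String[] args) {
        System.out.println(expand("spring"));
        // [spring, fountain, outflow, outpouring]
    }
}
```

In practice the expansion must be sense-aware, otherwise a query for the “metal device” sense of spring would also be expanded with water-related synonyms.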

Beyond synonymy, WordNet can be used to measure the semantic distance between occurring terms, giving more sophisticated ways of comparing documents and queries. This technology makes identifying commercials possible (see “Remove commercials” in chapter 5).

2.5.2 Multimedia data modelling

In the “three phase multimedia document retrieval” [3PR] project for a digital library in Seoul, each document instance containing text or multimedia data is modelled in a multimedia-enriched database using an object-relational data model, as illustrated in Figure 2.

The data in multimedia document databases is assumed to be stored as pairs <data_type, data>. The data type can be integer, text, audio, image, video, etc. A tuple can represent a composite data type, which refers to another tuple, or a complex data type, which represents more than one datum of the same type.

For instance, an article can be defined as in Figure 2. The section attribute of article refers to one or more sections, each of which in turn contains a title and bodies. The body attribute of section refers to one or more figures and/or paragraphs. The section attribute is a composite object type whose value is another tuple or a set of tuples.

 
create type multimedia_t (
    type    text,
    id      text,
    naming  text,
    file    LO,
    objects setof(multimedia_t))

create type bodies_t (
    figure  setof(LO),
    para    setof(doc),
    film    setof(multimedia_t),
    audio   setof(multimedia_t))

create type sections_t (
    title   text not null,
    body    bodies_t,
    subsect setof(sections_t))

create table article (
    title    text not null,
    author   setof(text),
    afil     text,
    abst     doc,
    keyword  text,
    section  setof(sections_t),
    ack      text,
    bib      setof(bibliograph_t),
    appendix setof(appendices_t))

Note: LO is a Large Object type, which may be used for multimedia data

Figure 2. Object relational data model of 3PR

Multimedia data is represented by meta-attributes, logical attributes, and semantic attributes. Meta-attributes hold information represented externally, without reference to internal contents. Logical attributes are typical database attributes that represent the internal contents of multimedia data. Semantic attributes are user annotations about the multimedia data, and are therefore neither meta nor logical attributes.

Queries posed to multimedia document databases must therefore be expressed and processed in different ways. That is, the select clause of a multimedia query is not only text-based but may also be image-, video-, or audio-based, and a condition (in the where clause) may specify not only logical comparisons but also semantic comparisons. A possible SQL-like query on a multimedia document database is:

SELECT *
FROM   article d
WHERE  d.section.body.film.type = 'Image'
  AND  d.section.body.film.object.naming = 'mountain'
  AND  d.season = 'Fall'

This design is aimed at a digital library, where media data is combined with data in other formats. Since a large number of television programmes are captioned, a text transcript of the audio component of a programme provides a significant amount of information, so conventional text retrieval techniques are adequate for the content-based multimedia retrieval system, the Subtitles Search System.

2.5.3 Previous audio/video retrieval systems

Recent years have seen a rapid increase in multimedia applications. Most of them collect text content using speech recognition, Optical Character Recognition (OCR), or even image understanding.

The Video Mail Retrieval [VMR] project at Cambridge University developed a system to retrieve stored video material using the spoken audio soundtrack. The Spoken Document Retrieval (SDR) system [SUKT] at Imperial College of Science implemented a broadcast-news retrieval system using speech recognition. In particular, this project focuses on the content-based location, retrieval, and playback of potentially relevant video data.

Previous work on the VMR project and SDR demonstrated practical retrieval of audio messages using speech recognition for content identification. Both of them collected television news broadcasts (along with accompanying subtitle transcriptions) as test data source. The enormous potential size of the news broadcast archive dramatically shows the need for ways of automatically finding and retrieving multimedia information. Quantitative experiments demonstrate that Information Retrieval (IR) methods developed for searching full text archives can accurately retrieve multimedia data, given suitable subtitle transcriptions. In addition, the same techniques could be used to locate interesting stories within an individual news broadcast [SUKT].

2.5.3.1 Segmentation

Those multimedia retrieval projects searched radio/television news or video mail, and retrieval performance was measured by success in retrieving full stories. That measure is therefore affected by the performance of the segmentation involved.

In this project, evaluation is measured only by success in retrieving full sentences, which is not influenced by segmentation performance.

2.6 Language and tools used in Subtitles Search System

Java servlets/beans/JSP – JDBC – ODBC – MS Access on Tomcat 3.2.3 (running on Windows 2000) are the tools used in this system.

The use of servlets allowed convenient management of events and program flow, but made generating responses cumbersome. Conversely, JSP provided excellent definition of response pages with easy yet powerful tags, but its scriptlet code raises maintenance concerns.

Figure 3. Languages/tools of Subtitles Search System

The Subtitles Search System unites the three technologies (servlets, beans and JSP), blending the merits of all three approaches. This solves most of the integration problems.

In the implementation phase, some standalone Java programs/applets were developed. In the later integration phase these were converted to servlets or beans so that they work smoothly with the other components through web-page events, instead of requiring Java commands and error-prone long paths on the command line.

The hybrid system is mapped to the MVC (Model, View and Controller) design paradigm as shown in Figure 4.

Figure 4. MVC model of a hybrid java-based web-oriented system [JSP]

For example, when the user clicks the search button on search.html, it triggers stem.jsp, which passes the keywords the user entered to a bean named Stemmer.class (this normalises the input: changing to lower case, removing punctuation and suffixes). Stem.jsp then reads the property back from the bean and redirects to a servlet, SearchEngineServ.class, which performs the search and writes the results as links in HTML. These links lead to a Playback servlet that plays a programme clip from the designated start position.
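The normalisation performed by the Stemmer bean described above can be sketched as follows. This is a simplified illustration, not the project's actual code; the particular suffix rules shown are assumptions:

```java
public class Stemmer {
    // Normalise a query word: lower-case it, strip punctuation, and
    // remove a few common English suffixes (illustrative rules only).
    static String normalise(String word) {
        String w = word.toLowerCase().replaceAll("\\p{Punct}", "");
        if (w.endsWith("ing") && w.length() > 5) {
            w = w.substring(0, w.length() - 3);
        } else if (w.endsWith("es") && w.length() > 4) {
            w = w.substring(0, w.length() - 2);
        } else if (w.endsWith("s") && w.length() > 3) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(normalise("Searching!")); // search
        System.out.println(normalise("programs"));   // program
    }
}
```

A production stemmer would use a full algorithm such as Porter's; the point here is only the shape of the bean's processing step in the JSP-to-servlet flow.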

In the Subtitles Search System, there is no link from servlet to bean, or from servlet to jsp, as shown in Figure 4.