
However, our method differs from that of Evans, Fernandez, et al. (2006) in three respects. 1) We keep only the few category nodes with the largest weights, rather than the whole weight vector, to represent a user interest. Intuitively, this reduces noise. 2) Evans et al. (2006) used a domain-specific multimedia ontology with the goal of personalized multimedia search, whereas we use a general ontology to tackle the problem of targeted advertising. Adopting a general ontology is important not only because ads themselves cover a large variety of concepts, but also because the concept space covered by user-generated photos is very rich and is large enough to describe the natural concepts of webpage content (Li, Guo, & Zhao, 2008). 3) We use a different method to represent a category node. We adopt the ontology of the Open Directory Project (ODP, http://dmoz.org/) and represent each node as a term distribution learnt from the web pages assigned to that node by human experts. Each category node is therefore called a "topic". In short, a user interest in our approach is defined as a topic distribution, and each topic is in turn a term distribution that represents certain semantics or concepts. We detail this step in Section 3.2.2.

The entire approach to user interest learning and ads suggestion proceeds as follows. In an offline step, we learn a hierarchical topic space that supports real-time matching of textual queries. Then, in the online stage, given a group of images (in one webpage), we first adopt a data-driven image annotation approach to annotate each image automatically. Second, we combine the generated annotations with user-submitted tags (if available) and use the combination as a query to match against the hierarchical topic space. The output is a topic distribution that represents a user interest.
Third, a ranking model ranks ads by their relevance to the user interest, and the top-ranked ads are returned as the suggestions. Figure 3 illustrates the process.

Note that we focus only on ad relevance and do not discuss the bidding problem in this research. Relevance is very important because current online advertising revenue is mainly based on user behavior (Pay-Per-Click or Pay-Per-Action). Thus, the key to attracting a user's click is to suggest ads that are relevant to either the user's information need or interest (Broder, Fontoura, et al., 2007; Chen, Xue, & Yu, 2008).

The remainder of this chapter is organized as follows. In Section 2 we detail the offline topic space learning method. Section 3 describes the online advertising approach, which comprises image understanding (Section 3.1), user interest modeling (Section 3.2), and ads ranking (Section 3.3). We evaluate the proposed approach in Section 4 and conclude in Section 5.

2. LEARNING EFFICIENT TOPIC SPACE

Previous work has demonstrated the effectiveness of a hierarchical ontology in learning user interest. Some researchers learnt an ontology from users' browsing history, click-through data, bookmarks, etc. (Kim & Chan, 2003; Grcar, Mladenic, & Grobelnik, 2005; Zhou, Wu, et al., 2006). Broder et al. (2007) leveraged the taxonomy built commercially by Yahoo! US. Many other researchers adopted the publicly available ontology provided by ODP (Ma, Pant, & Sheng, 2007; Trajkova & Gauch, 2004; Chen, Xue, & Yu, 2008).

ODP is a manually edited directory. It currently contains 4.6 million URLs that have been categorized into 787,774 categories by 68,983 human editors. A desirable feature is that, for each category node of ODP, a large number of manually chosen webpages are freely available for either learning a topic or categorizing a document at query time.
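As a rough illustration of how the pages assigned to a category node could yield a term distribution, and how a textual query could then be mapped to a truncated topic distribution, consider the minimal sketch below. It is not the chapter's actual method: the whitespace tokenizer, the summed-probability scoring rule, the function names `learn_topic` and `topic_distribution`, and the toy categories and pages are all assumptions made for illustration only.

```python
from collections import Counter


def learn_topic(pages):
    """Estimate a term distribution from the pages assigned to one category node.

    Assumption: naive lowercase whitespace tokenization, relative frequencies.
    """
    counts = Counter()
    for text in pages:
        counts.update(text.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def topic_distribution(query_terms, topics, k=2):
    """Map a query to a topic distribution, keeping only the k largest weights.

    Assumption: a topic's score is the sum of its term probabilities over the
    query terms; the kept scores are renormalized to sum to 1.
    """
    scores = {
        name: sum(dist.get(term, 0.0) for term in query_terms)
        for name, dist in topics.items()
    }
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    top = [(name, score) for name, score in top if score > 0]
    total = sum(score for _, score in top)
    return {name: score / total for name, score in top}


# Toy data (hypothetical category names and page texts, not taken from ODP).
topics = {
    "Sports/Soccer": learn_topic(["soccer ball goal", "soccer match"]),
    "Travel/Beaches": learn_topic(["beach sand sea", "beach holiday"]),
    "Computers": learn_topic(["laptop keyboard"]),
}

# Query terms stand in for image annotations combined with user tags.
print(topic_distribution(["beach", "sea", "ball"], topics, k=2))
```

Keeping only the `k` largest weights mirrors the noise-reduction idea of representing a user interest by a few dominant category nodes rather than the full weight vector.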
Therefore we build our hierarchical topic space based on the ODP tree