[0121] The statistical model, derived in block 520 above, and the obtained session features may be used to determine predictive values 1530 that the ad is a good ad and/or a bad ad (block 1415). The predictive values may include a probability value (e.g., derived using Eqn. (3) or (5) above) that indicates the probability of a good ad given session features associated with user selection of that ad. The predictive values may also include a probability value (Eqn. (4) above) that indicates the probability of a bad ad given measured session features associated with user selection of that ad. Therefore, session feature values may be input into Eqn. (3), (4) and/or (5) to obtain a predictive value(s) that the selected ad is good or bad. For example, values for session features x.sub.1, x.sub.2, x.sub.3 and x.sub.4 may be input into Eqn. (3) to obtain a probability value for P(good ad|session features x.sub.1, x.sub.2, x.sub.3, x.sub.4). As shown in FIG. 15, the measured session features 1520 may be input into statistical model 130 and statistical model 130 may output predictive values 1530 for the ad 1505.
[0122] Ad/query features associated with the selection of the advertisement may be obtained (block 1420). As shown in FIG. 15, the ad/query features 1535 may be obtained in association with selection 1500 of the ad 1505. The ad/query features 1535 may include an identifier associated with the advertiser of ad 1505 (e.g., a visible uniform resource locator (URL) of the advertiser), a keyword that ad 1505 targets, words in the search query issued by the user that ad 1505 did not target, and/or a word in the search query issued by the user that the advertisement did not target but which is similar to a word targeted by advertisement 1505. Other types of ad or query features, not described above, may be used consistent with principles of the invention. For example, any of the above-described ad/query features observed in combination (e.g., a pairing of two ad/query features) may be used as a single ad/query feature.
[0123] For each obtained ad/query feature (i.e., obtained in block 1420 above), the determined predictive values may be summed with stored values that correspond to the ad/query feature (block 1425). The determined predictive values may be summed with values stored in a data structure, such as, for example, data structure 1600 shown in FIG. 16. As shown in FIG. 16, data structure 1600 may include multiple ad/query features 1610-1 through 1610-N, with a “total number of ad selections” 1620, a total “good” predictive value 1630 and a total “bad” predictive value 1640 being associated with each ad/query feature 1610. Each predictive value determined in block 1415 can be summed with a current value stored in entries 1630 or 1640 that corresponds to each ad/query feature 1610 that is further associated with the advertisement and query at issue. As an example, assume that an ad for “1800flowers.com” is provided to a user in response to the search query “flowers for mother’s day.” The session features associated with the selection of the ad return a probability P(good ad|ad selection) of 0.9. Three ad/query features are associated with the ad and query: the query length (the number of terms in the query), the visible URL of the ad, and the number of words that are in the query, but not in the keyword that is associated with the ad. For each of the three ad/query features, a corresponding “total number of ad selections” value in entry 1620 is incremented by one, and 0.9 is added to each value stored in the total good predictive value 1630 that corresponds to each of the ad/query features.
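The accumulation of block 1425 may be illustrated with a minimal Python sketch. It is offered only as one possible reading of data structure 1600, not as the implementation. The class name, field names, feature keys, the count of three unmatched query words, and the “bad” predictive value of 0.1 are hypothetical; the description above gives only the “good” value of 0.9.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FeatureTotals:
    """One row of a table resembling data structure 1600 (field names are hypothetical)."""
    total_selections: int = 0   # "total number of ad selections" 1620
    total_good: float = 0.0     # total "good" predictive value 1630
    total_bad: float = 0.0      # total "bad" predictive value 1640


def record_selection(table, features, p_good, p_bad):
    """Sum one ad selection's predictive values into every associated ad/query feature."""
    for feature in features:
        row = table[feature]
        row.total_selections += 1
        row.total_good += p_good
        row.total_bad += p_bad


# Example from the text: an ad for "1800flowers.com" shown for the
# query "flowers for mother's day", with P(good ad | ad selection) = 0.9.
table = defaultdict(FeatureTotals)
features = [
    ("query_length", 4),                   # number of terms in the query
    ("visible_url", "1800flowers.com"),    # visible URL of the ad
    ("unmatched_query_words", 3),          # hypothetical count; the keyword is not given
]
# p_bad = 0.1 is an assumed value used only for illustration.
record_selection(table, features, p_good=0.9, p_bad=0.1)
```

Keying each row on a (feature type, feature value) pair is a design choice of the sketch; it lets a pairing of two ad/query features be stored as a single key in the same way.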
[0124] As shown in FIG. 15, each of the determined predictive values 1530 may be summed with a current value in data structure 1600. Blocks 1400 through 1425 may be selectively repeated for each selection of an ad, by one or more users, to populate data structure 1600 with numerous summed predictive values that are associated with one or more ad/query features.
EXEMPLARY ODDS ESTIMATION PROCESS
[0125] FIGS. 17 and 18 are flowcharts of an exemplary process for estimating odds of good or bad qualities associated with advertisements using the total predictive values 1630 or 1640 determined in block 1425 of FIG. 14. As one skilled in the art will appreciate, the process exemplified by FIGS. 17 and 18 can be implemented in software and stored on a computer-readable memory, such as main memory 430, ROM 440, or storage device 450 of servers 320 or 330 or client 310, as appropriate.
[0126] The estimated odds that a given advertisement is good or bad are a function of prior odds that the given advertisement was good or bad, and one or more model parameters associated with ad/query features associated with selection of the given advertisement. The model parameters may be calculated using an iterative process that attempts to solve for the parameter values that produce the best fit of the predicted odds of a good or bad advertisement to the actual historical data used for training.
[0127] The model parameters associated with each ad/query feature may consist of a single parameter, such as a multiplier on the probability or odds of a good advertisement or bad advertisement. Alternatively, each ad/query feature may have several model parameters associated with it that may affect the predicted probability of a good or bad advertisement in more complex ways.
[0128] In the following description, various odds and probabilities are used. The odds of an event occurring and the probability of an event occurring are related by the expression: probability=odds/(odds+1). For example, if the odds of an event occurring are 1/2 (i.e., the odds are “1:2” as it is often written), the corresponding probability of the event occurring is 1/3. According to this convention, odds and probabilities may be considered interchangeable. It is convenient to express calculations in terms of odds rather than probabilities because odds may take on any non-negative value, whereas probabilities must lie between 0 and 1. However, it should be understood that the following implementation may be performed using probabilities exclusively, or using some other similar representation such as log(odds), with only minimal changes to the description below.
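The relationship between odds and probabilities may be sketched in Python as follows. The function names are hypothetical. The inverse conversion is not stated in the text; it follows algebraically from probability=odds/(odds+1).

```python
import math


def odds_to_probability(odds):
    """probability = odds / (odds + 1), for any non-negative odds."""
    return odds / (odds + 1.0)


def probability_to_odds(probability):
    """Inverse of the expression above; undefined for probability == 1."""
    return probability / (1.0 - probability)


def odds_to_log_odds(odds):
    """The log(odds) representation mentioned in the text; requires odds > 0."""
    return math.log(odds)


# Odds of 1/2 ("1:2") correspond to a probability of 1/3.
p = odds_to_probability(0.5)
```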
[0129] FIG. 17 is a flow diagram illustrating one implementation of a prediction model for generating an estimation of the odds that a given advertisement is good or bad based on ad/query features associated with selection of the advertisement. In accordance with one implementation of the principles of the invention, the odds of a good or bad ad may be calculated by multiplying the prior odds (q.sub.0) of a good ad or bad ad by a model parameter (m.sub.i) associated with each ad/query feature (k.sub.i), henceforth referred to as an odds multiplier. Such a solution may be expressed as: q=q.sub.0m.sub.1m.sub.2m.sub.3 . . . m.sub.n.
[0130] In essence, the odds multiplier m for each ad/query feature k may be a statistical representation of the predictive power of this ad/query feature in determining whether an advertisement is good or bad.
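As a minimal sketch, the product of the prior odds and the odds multipliers may be computed in Python as shown below. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
from math import prod


def estimated_odds(prior_odds, multipliers):
    """q = q0 * m1 * m2 * ... * mn, one multiplier per ad/query feature."""
    return prior_odds * prod(multipliers)


# Hypothetical values: prior odds of 1.0 and three odds multipliers.
q = estimated_odds(1.0, [2.0, 0.5, 1.5])
```

A multiplier above 1 raises the estimated odds, a multiplier below 1 lowers them, and a multiplier of exactly 1 leaves them unchanged.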
[0131] In one implementation consistent with principles of the invention, the model parameters described above may be continually modified to reflect the relative influence of each ad/query feature k on the estimated odds that an advertisement is good or bad. Such a modification may be performed by comparing the average predicted odds that advertisements with this ad/query feature are good or bad, disregarding the given ad/query feature, to an estimate of the historical quality of advertisements with this ad/query feature. In this manner, the relative value of the analyzed ad/query feature k may be identified and refined.
[0132] Turning specifically to FIG. 17, for each selected ad/query feature (k.sub.i), an average self-excluding probability (P.sub.i) may be initially calculated or identified (act 1700). In one implementation, the self-excluding probability (P.sub.i) is a value representative of the relevance of the selected ad/query feature and may measure the resulting odds that an advertisement is good or bad when the selected ad/query feature’s model parameter (m.sub.i) is removed from the estimated odds calculation. For ad/query feature 3, for example, this may be expressed as: P.sub.3n=((q.sub.0m.sub.1m.sub.2m.sub.3 . . . m.sub.n)/m.sub.3)/(((q.sub.0m.sub.1m.sub.2m.sub.3 . . . m.sub.n)/m.sub.3)+1).
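The self-excluding probability of act 1700 may be sketched in Python as follows. The function name, the zero-based index, and the numeric values are hypothetical. The sketch skips the selected multiplier rather than dividing the full product by it; the two are equivalent whenever the multiplier is nonzero, and skipping avoids a division by zero.

```python
def self_excluding_probability(prior_odds, multipliers, i):
    """Probability implied by the estimated odds with multiplier i left out.

    Equivalent to (q / m_i) / ((q / m_i) + 1), where q = q0 * m1 * ... * mn
    and m_i is nonzero. The index i is zero-based in this sketch.
    """
    odds = prior_odds
    for j, m in enumerate(multipliers):
        if j != i:
            odds *= m
    return odds / (odds + 1.0)


# Hypothetical values: excluding the third multiplier (index 2) leaves
# odds of 1.0 * 2.0 * 0.5 = 1.0, i.e., a probability of 0.5.
p3 = self_excluding_probability(1.0, [2.0, 0.5, 4.0], 2)
```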
[0133] In one embodiment, the self-excluding probability for each ad/query feature may be maintained as a moving average, to ensure that the identified self-excluding probability converges more quickly following identification of a model parameter for each selected ad/query feature. Such a moving average may be expressed as: P.sub.in(avg)=.alpha.P.sub.i(n-1)(avg)+(1-.alpha.)P.sub.in, where .alpha. is a statistically defined variable very close to 1 (e.g., 0.999) used to control the half-life of the moving average. As shown in the above expression, the value of P.sub.i for the current number of ad selections (n) (e.g., a current value for “total number of ad selections” 1620 for ad/query feature k.sub.i) is weighted and averaged with the average value of P.sub.i as determined at the previous ad selection (e.g., n-1).
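The moving average may be sketched in Python as shown below. The function names are hypothetical. The half-life helper assumes the usual meaning of the term for an exponentially weighted average, namely the number of updates after which an observation's weight has fallen to one half; the text does not define it.

```python
import math


def update_moving_average(previous_avg, current_value, alpha=0.999):
    """P_in(avg) = alpha * P_i(n-1)(avg) + (1 - alpha) * P_in."""
    return alpha * previous_avg + (1.0 - alpha) * current_value


def half_life(alpha):
    """Number of updates n with alpha ** n == 0.5 (assumed definition)."""
    return math.log(0.5) / math.log(alpha)


# With alpha = 0.999, a single new observation moves the average only slightly.
avg = update_moving_average(previous_avg=0.5, current_value=1.0)
```

Under this assumed definition, an .alpha. of 0.999 corresponds to a half-life of roughly 693 ad selections.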
[0134] Next, the average self-excluding probability (P.sub.i(avg)), may be compared to historical information relating to the number of advertisement selections observed and the odds of a good or bad advertisement observed for the observed selections (act 1710). The model parameter m.sub.i associated with the selected ad/query feature k.sub.i may then be generated or modified based on the comparison of act 1710 (act 1720) (as further described below with respect to blocks 1820 and 1830 of FIG. 18).