[0110] Returning to FIG. 5, a statistical model may then be derived that determines the probability that each selected ad is a good quality ad given the measured session features associated with the ad selection (block 520). An existing statistical technique, such as, for example, logistic regression may be used to derive the statistical model consistent with principles of the invention. Regression involves finding a function that relates an outcome variable (dependent variable y) to one or more predictors (independent variables x.sub.1, x.sub.2, etc.). Simple linear regression assumes a function of the form: y=c.sub.0+c.sub.1*x.sub.1+c.sub.2*x.sub.2 +. . . Eqn. (1) and finds the values of c.sub.0, c.sub.1, c.sub.2, etc. (c.sub.0 is called the “intercept” or “constant term”). In the context of the present invention, each predictor variable x.sub.1, x.sub.2, x.sub.3, etc. corresponds to a different session feature measured during ad selection. Logistic regression is a variation of ordinary regression, useful when the observed outcome is restricted to two values, which usually represent the occurrence or non-occurrence of some outcome event, (usually coded as 1 or 0, respectively), such as a good advertisement or a bad advertisement in the context of the present invention.
[0111] Logistic regression produces a formula that predicts the probability of the occurrence as a function of the independent predictor variables. Logistic regression fits a special s-shaped curve by taking the linear regression (Eqn. (1) above), which could produce any y-value between minus infinity and plus infinity, and transforming it with the function: P=exp(y)/(1+exp(y)) Eqn. (2) which produces P-values between 0 (as y approaches minus infinity) and 1 (as y approaches plus infinity). Substituting Eqn. (1) into Eqn. (2), the probability of a good advertisement, thus, becomes the following: P .function. ( good .times. .times. ad | ad .times. .times. selection ) = f g .function. ( session .times. .times. features .times. .times. x 1 , x 2 , x 3 .times. ) = e ( c g .times. .times. 0 + c g .times. .times. 1 * x 1 + c g .times. .times. 2 * x 2 + ) 1 + e ( c g .times. .times. 0 + c g .times. .times. 1 * x 1 + c g .times. .times. 2 * x 2 + ) Eqn . .times. ( 3 ) where c.sub.g0 is the constant of the equation, and c.sub.gn is the coefficient of the session feature predictor variable x.sub.n. The probability of a bad advertisement may, similarly, be determined by the following: P .function. ( bad .times. .times. ad | ad .times. .times. selection ) = f b .function. ( session .times. .times. features .times. .times. x 1 , x 2 , x 3 .times. ) = e ( c b .times. .times. 0 + c b .times. .times. 1 * x 1 + c b .times. .times. 2 * x 2 + ) 1 + e ( c b .times. .times. 0 + c b .times. .times. 1 * x 1 + c b .times. .times. 2 * x 2 + ) Eqn . .times. ( 4 ) where c.sub.b0 is the constant of the equation, and c.sub.bn is the coefficient of the session feature predictor variables x.sub.n.
[0112] A fit of the statistical model may be tested to determine which session features are correlated with good or bad quality advertisements. If a logistic regression technique is used to determine the statistical model, the goal of logistic regression is to correctly predict the outcome for individual cases using the most parsimonious model. To accomplish this goal, a model is created that includes all predictor variables (e.g., session features) that are useful in predicting the outcome of the dependent y variable. To construct the statistical model, logistic regression can test the fit of the model after each coefficient (c.sub.n) is added or deleted, called stepwise regression. For example, backward stepwise regression may be used, where model construction begins with a full or saturated model and predictor variables, and their coefficients, are eliminated from the model in an iterative process. The fit of the model is tested after the elimination of each variable to ensure that the model still adequately fits the data. When no more predictor variables can be eliminated from the model, the model construction has been completed. The predictor variables that are left in the model, each corresponding to a measured session feature, identify the session features that are correlated with good or bad advertisements. Logistic regression, thus, can provide knowledge of the relationships and strengths among the different predictor variables. The process by which coefficients, and their corresponding predictor variables, are tested for significance for inclusion or elimination from the model may involve several different known techniques. Such techniques may include the Wald test, the Likelihood-Ratio test, or the Hosmer-Lemshow Goodness of Fit test. These coefficient testing techniques are known in the art and are not further described here. In other implementations, existing techniques for cross validation and independent training may be used instead of techniques of classical estimation and testing of regression coefficients, as described above.
[0113] Other existing statistical techniques, instead of, or in addition to logistic regression, may be used to derive a statistical model consistent with principles of the invention. For example, a “stumps” model, using “boosting” techniques may be used to derive the statistical model. As one skilled in the art will recognize, “boosting” is a machine learning technique for building a statistical model by successively improving an otherwise weak statistical model. The basic idea is to repeatedly apply the same algorithm to an entire training data set, but differentially weight the training data at each stage. The weights are such that cases that are well-fit by the model through stage k receive relatively small weights at stage k+1, while cases that are ill-fit by the model through stage k receive relatively large weights at stage k+1.
[0114] Stumps are a weak statistical model that can be applied at each stage. A stump is a 2-leaf classification tree consisting of a root node and a binary rule that splits the cases into two mutually exclusive subsets (i.e., the leaf nodes). A rule could take the form “ClickDuration<120sec” and all cases with ClickDuration satisfying the rule go into one leaf node and those not satisfying the rule go into the other leaf node. Another rule could take the form “AdSelection was the last ad selection” and all cases with AdSelection satisfying the rule go into one leaf node and those not satisfying the rule go into the other leaf node.
[0115] Various algorithms can be used to fit the “boosted stump” model including, for example, gradient-based methods. Such algorithms may proceed as follows: given a set of weights, among all possible binary decision rules derived from session features that partition the cases into two leaves, choose that one which minimizes the (weighted) loss function associated with the algorithm. Some examples of loss functions are “Bernoulli loss” corresponding to a maximum likelihood method, and “exponential loss” corresponding to the well-known ADABoost method. After choosing the best binary decision rule at this stage, the weights may be recomputed and the process may be repeated whereby the best binary rule is chosen which minimizes the new (weighted) loss function. This process may be repeated many times (e.g., several hundred to several thousand) and a resampling technique (such as cross-validation) may be used to define a stopping rule in order to prevent over-fitting.
[0116] Boosted stumps have been shown to approximate additive logistic regression models whereby each feature makes an additive nonlinear contribution (on the logistic scale) to the fitted model. The sequence of stumps define the relationship between session features and the probability that an ad is rated “good”. The sequence can be expressed by the statistical model: P .function. ( good .times. .times. ad | session .times. .times. feature .times. .times. x ) = e ( c 0 + c 1 .times. B .times. .times. 1 .times. ( x ) + c 2 * B .times. .times. 2 .times. ( x ) + ) 1 + e ( c 0 + c 1 .times. B .times. .times. 1 .times. ( x ) + c 2 * B .times. .times. 2 .times. ( x ) + ) Eqn . .times. ( 5 ) where Bk(x)=1 if session feature x satisfies the kth binary rule, or Bk(x)=0 if session feature x does not satisfy the kth binary rule. The coefficients c.sub.k, k=1, . . . , are a by-product of the algorithm and relate to the odds of a good ad at the kth binary rule. In practice, given session feature x, each binary rule can be evaluated and the corresponding coefficients accumulated to get the predicted probability of a good ad. A statistical model, similar to Eqn. (5) above, may similarly be derived that defines the relationship between session features and the probability that an ad is rated “bad.”
[0117] Though logistic regression and boosted stumps have been described above as exemplary techniques for constructing a statistical model, one skilled in the art will recognize that other existing statistical techniques, such as, for example, regression trees may be used to derive the statistical model consistent with principles of the invention.
EXEMPLARY PROCESS FOR DETERMINING PREDICTIVE VALUES RELATED TO AD QUALITY
[0118] FIG. 14 is a flowchart of an exemplary process for determining predictive values relating to the quality of an advertisement according to an implementation consistent with the principles of the invention. As one skilled in the art will appreciate, the process exemplified by FIG. 14 can be implemented in software and stored on a computer-readable memory, such as main memory 430, ROM 440, or storage device 450 of servers 320 or 330 or client 310, as appropriate.
[0119] The exemplary process may begin with the receipt of a search query (block 1400). A user may issue the search query to server 320 for execution by search engine system 325. A set of ads that match the received search query may be obtained by search engine system 325 (block 1405). Search engine system 325 may execute a search, based on the received search query, to ascertain the set of ads, and other documents, that match the search query. Search engine system 325 may provide the set of ads, and a list of the other documents, to the user that issued the search query.
[0120] Session features associated with the selection of an ad from the set of ads may be obtained (block 1410). The session features may be measured in real-time during user ad selection or may be obtained from logs of recorded user behavior associated with ad selection. As shown in FIG. 15, a user may select 1500 an ad 1505 associated with a document 1510 (e.g., a document containing search results and relevant ads). An ad landing document 1515 may be provided to the user in response to selection of the ad 1505. As shown in FIG. 15, session features 1520 associated with the selection 1500 of ad 1505 may be measured. The measured session features may include any type of user behavior associated with the selection of an advertisement, such as those described above with respect to block 510 (FIG. 5).