Estimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach

Introduction

Background

Estimating the accuracy of classifiers is central to machine learning and many other fields. Most existing approaches to estimating accuracy are supervised, meaning that a set of labeled examples is required for the estimation. However, being able to estimate the accuracies of classifiers using only unlabeled data is very important for many applications. Furthermore, tasks that involve making several predictions tied together by logical constraints are abundant in machine learning.

Intuition

Mutual Exclusion

If domains $d_1$ and $d_2$ are mutually exclusive, then $f^{d_1}=1$ implies that $f^{d_2}=0$.
For example, in the NELL setting, if a noun phrase (NP) belongs to the “city” category, then it cannot also belong to the “animal” category.

Subsumption

If domain $d_1$ is subsumed by domain $d_2$, then $f^{d_1}=1$ implies that $f^{d_2}=1$.
For example, in the NELL setting, if an NP belongs to the “cat” category, then it must also belong to the “animal” category.
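The two kinds of constraints can be sketched as simple boolean checks on a label assignment. This is a minimal sketch with made-up domain pairs and a hypothetical `violates` helper, not the paper's actual constraint machinery:

```python
# Hard logical constraints between domains, for one input X.
# ME and SUB pairs here are illustrative examples only.
ME = {("city", "animal")}   # mutually exclusive domain pairs
SUB = {("cat", "animal")}   # ("cat", "animal"): "cat" is subsumed by "animal"

def violates(f):
    """Return True if the assignment f (domain -> {0,1}) breaks a constraint."""
    for d1, d2 in ME:
        if f.get(d1) == 1 and f.get(d2) == 1:   # f^{d1}=1 forces f^{d2}=0
            return True
    for d1, d2 in SUB:
        if f.get(d1) == 1 and f.get(d2) == 0:   # f^{d1}=1 forces f^{d2}=1
            return True
    return False

print(violates({"city": 1, "animal": 1}))  # True: mutual exclusion violated
print(violates({"cat": 1, "animal": 1}))   # False: subsumption satisfied
```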

Proposed Method

Define a set of Logic Rules

(i) defining a set of logic rules that model the logical constraints between the $f^d$ and the $\hat{f}^d_j$, in terms of the error rates $e^d_j$ and the known logical constraints.

Probabilistic Logic

In classical logic, we have a set of predicates (e.g., mammal(x) indicating whether x is a mammal, where x is a variable) and a set of rules defined in terms of these predicates. These ground predicates (predicates with their variables instantiated) and rules take boolean values.

In probabilistic logic, we are instead interested in inferring the probabilities of these ground predicates and rules being true, given a set of observed ground predicates and rules. Furthermore, the truth values of ground predicates and rules may be continuous and lie in the interval $[0,1]$ instead of being boolean, representing the probability that the corresponding ground predicate or rule is true.

Model

Function Approximation Outputs: $\hat{f}^d_j(X)$, $j=1,\dots,N^d$, for inputs $X\in \chi$
Target Function Outputs: $f^d(X)$, for inputs $X\in \chi$
Function Approximation Error Rates: $e^d_j$, $j=1,\dots,N^d$

Ensemble Rules

$$\hat{f}^d_j(X)\land \neg e^d_j \rightarrow f^d(X), \qquad \neg \hat{f}^d_j(X)\land \neg e^d_j \rightarrow \neg f^d(X)$$
$$\hat{f}^d_j(X)\land e^d_j \rightarrow \neg f^d(X), \qquad \neg \hat{f}^d_j(X)\land e^d_j \rightarrow f^d(X)$$
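In boolean form, these four rules say that a classifier with error indicator $e^d_j$ false must agree with the target, and one with $e^d_j$ true must disagree. A minimal sketch, with a hypothetical `consistent` helper checking all four groundings at once:

```python
# Ensemble rules in boolean form: f_hat is the classifier output,
# e its error indicator, f the target value (all booleans).
def consistent(f_hat, e, f):
    """True iff all four ensemble rule groundings hold."""
    rules = [
        (not (f_hat and not e)) or f,            # f_hat ∧ ¬e → f
        (not ((not f_hat) and not e)) or not f,  # ¬f_hat ∧ ¬e → ¬f
        (not (f_hat and e)) or not f,            # f_hat ∧ e → ¬f
        (not ((not f_hat) and e)) or f,          # ¬f_hat ∧ e → f
    ]
    return all(rules)

# A correct classifier (e=False) must match the target:
print(consistent(True, False, True))   # True
print(consistent(True, False, False))  # False
```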

Constraints

Mutual Exclusion Rule

$$ME(d_1,d_2)\land \hat{f}^{d_1}_j(X) \land f^{d_2}(X) \rightarrow e^{d_1}_j, \quad \text{for } d_1\neq d_2,\ d_1,d_2=1,\dots,D,\ j=1,\dots,N^{d_1},\ \text{and } X\in \chi$$

Subsumption Rule

$$SUB(d_1,d_2)\land \neg\hat{f}^{d_1}_j(X) \land f^{d_2}(X) \rightarrow e^{d_1}_j, \quad \text{for } d_1\neq d_2,\ d_1,d_2=1,\dots,D,\ j=1,\dots,N^{d_1},\ \text{and } X\in \chi$$
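A single grounding of the mutual-exclusion rule can be evaluated with the Łukasiewicz connectives used by PSL. This is an illustrative sketch with made-up truth values; `me_rule` is a hypothetical helper, not part of any library:

```python
# Soft-logic evaluation of one grounding of the ME rule:
# ME(d1,d2) ∧ f_hat^{d1}_j(X) ∧ f^{d2}(X) → e^{d1}_j
def l_and(p, q): return max(p + q - 1.0, 0.0)   # Łukasiewicz conjunction
def l_imp(p, q): return min(1.0 - p + q, 1.0)   # Łukasiewicz implication

def me_rule(me, f_hat_d1, f_d2, e_d1):
    """Truth value in [0,1] of the grounded ME rule."""
    body = l_and(l_and(me, f_hat_d1), f_d2)
    return l_imp(body, e_d1)

# With a fully confident contradiction (body = 1.0), the rule's truth
# value equals the error rate, so inference is pushed to raise e^{d1}_j:
print(me_rule(1.0, 1.0, 1.0, 0.2))  # 0.2
```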

Perform probabilistic inference

(ii) performing probabilistic inference using these rules as priors, in order to obtain the most likely values of the $e^d_j$ and the $f^d$, which are not observed.

Probabilistic Soft Logic (PSL)

Define the terms:

Unobserved ground predicate values: $\boldsymbol{Y}=\{Y_1,\dots,Y_m\}$, with domain $\boldsymbol{D}=[0,1]^m$.
Observed ground predicate values: $\boldsymbol{X}=\{X_1,\dots,X_n\}$, with domain $[0,1]^n$.
Continuous potential functions: $\phi=\{\phi_1,\dots,\phi_k\}$, where $\phi_j(\boldsymbol{X},\boldsymbol{Y})=(\max\{\ell_j(\boldsymbol{X},\boldsymbol{Y}),0\})^{p_j}$, $\ell_j$ is a linear function of $\boldsymbol{X}$ and $\boldsymbol{Y}$, and $p_j\in\{1,2\}$.
Free parameters: $\lambda=\{\lambda_1,\dots,\lambda_k\}$.

Define HL-MRF Density:

$$f(\boldsymbol{Y})=\frac{1}{Z}\exp\left(-\sum^k_{j=1}\lambda_j\phi_j(\boldsymbol{X},\boldsymbol{Y})\right)$$
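The density can be sketched directly from the definitions above: each potential is a hinge of a linear function, raised to the power $p_j$. A minimal sketch with made-up linear functions $\ell_j$ and weights $\lambda_j$, and the normalizer $Z$ omitted:

```python
import math

# Two hinge-loss potentials phi_j(x, y) = max(l_j(x, y), 0)**p_j over one
# observed value x and one unobserved value y, both in [0, 1].
# The linear functions and weights are illustrative, not from the paper.
potentials = [
    (lambda x, y: max(x + y - 1.0, 0.0) ** 1, 0.5),  # (phi_1, lambda_1)
    (lambda x, y: max(y - x, 0.0) ** 2, 2.0),        # (phi_2, lambda_2)
]

def unnormalized_density(x, y):
    """exp(-sum_j lambda_j * phi_j(x, y)); the normalizer Z is omitted."""
    return math.exp(-sum(lam * phi(x, y) for phi, lam in potentials))

# MAP inference would maximize this over y in [0, 1]; at x = y = 0.5 both
# hinges are zero, so the unnormalized density is exp(0) = 1:
print(unnormalized_density(0.5, 0.5))  # 1.0
```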

Define logical operators

$$P\land Q \triangleq \max\{P+Q-1,\,0\},\quad P\lor Q \triangleq \min\{P+Q,\,1\}$$
$$\neg P \triangleq 1-P,\quad P\rightarrow Q \triangleq \min\{1-P+Q,\,1\} $$
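These Łukasiewicz relaxations are simple enough to write as plain functions on truth values in $[0,1]$; a minimal sketch, not tied to any PSL library:

```python
# Łukasiewicz connectives on soft truth values in [0, 1].
def l_and(p, q): return max(p + q - 1.0, 0.0)   # P ∧ Q
def l_or(p, q):  return min(p + q, 1.0)          # P ∨ Q
def l_not(p):    return 1.0 - p                  # ¬P
def l_imp(p, q): return min(1.0 - p + q, 1.0)    # P → Q

# On {0, 1} inputs these coincide with classical boolean logic:
assert l_and(1, 1) == 1 and l_and(1, 0) == 0
assert l_imp(0, 0) == 1 and l_imp(1, 0) == 0

# On soft values they interpolate, so a rule can be partially satisfied:
print(round(l_imp(0.9, 0.6), 2))  # 0.7
```

Because each connective is piecewise linear, every grounded rule yields a hinge-shaped penalty, which is exactly what makes MAP inference in the resulting HL-MRF a convex optimization problem.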

Grounding

Solving the optimization problem

Experiments