Probabilistic Checkable Proof (PCP)
April 30, 2023
This post gives a gentle introduction to the Probabilistic Checkable Proof (PCP), the pearl of computational complexity theory.
Introduction
Consider the canonical NP problem, boolean satisfiability (SAT). To determine if a Boolean formula is satisfiable, a deterministic verifier needs to check a polynomial length proof — the assignment of each variable. But what if the verifier is allowed to use randomness and to fail with a small probability? It turns out that the probablistic verifier still needs a polynomial-size proof, but it does not need to look at all of it. With oracle access to the proof, it only needs to make constant queries to the proof, i.e. check constant number of locations of the proof. The proof string is of course much more complex than the assignment of each variable. We call these proofs “probablistic checkable proofs” (PCP). The aforementioned fact about NP is the famouse PCP theorem, and in this post, we will walk through the key components of the proof.
Definition
A Language is in the complexity class PCP if there exists a polynomial-time probablistic oracle Turing machine called the PCP verifier such that
- is the alphabet for the proofs
- uses bits of randomness and queries
- Completeness: If , there exists a proof such that
- Soundness: If , for all proofs ,
Basic Examples and Results
- PCP = P
- PCP = P
- PCP = BPP
- PCP = NP
- PCP = NP
Usually and we simply drop . Further, if we drop the completeness and soundness parameters, we implicity assume that . Note that given bits of randomness and queries, the verifier can check at most locations of the proof, which is the effective size of any proof. (Here, we assume the verifier is non-adaptive.)
The Statement of the PCP Theorem
The theorem demands a very intricate proof, which is of polynomial size and requires only constant number of queries. The proof we will explore is based on “proof composition”. We will construct two not so strong proofs: a very long proof (exponential-size) that requires only constant number of queries, and a short proof (polynomial size) that requires polylog queries. Then we will compose them to take the merits of both. Thus, the outline of the proof will contain three main components: (1) an exponential-size PCP, (2) a polynomial-size PCP, and (3) the proof composition. Generally speaking, every PCP is a smart coding of the plain proof, so we will use different codes borrowed from coding theory throughout the journey.
Exponential-size PCP
Note that the effective proof size is , hence the name exponential-size PCP. To prove this statement, we will consider a NP-complete problem called QUADEQ, which determines if a system of quadratic equations has a satisfying assignment. The prover should use Walsh-Hadamard code to construct a exponential-size PCP of the satisfying assignment. The verifier will first use linearity testing to verify the validity of the codeword, i.e., to check that the proof is indeed the Walsh-Hadamard code of some string. Then it can use constant queries to verify if the string is a satisfying assignment with high probability.
QUADEQ
A quadratic equation of variables over has the form . A system of quadratic equations over is in QUADEQ if the system has a satisfying assignment .
The polynomial-size proof of QUADEQ is simply the satisfying assignment of the variables , so QUADEQ is clearly in NP. To see that QUADEQ is NP-complete, it is easist to reduce from circuit-satisfiabily. We can assign a variable to represent the output of each gate in the circuit. Then
- .
Walsh-Hadamard Code
To construct a exponential-size PCP, we will make use of the Walsh-Hadamard code. Given a message , its Walsh-Hadamard codeword is , where . Here is the index of , a string of exponential size, and is the dot-product of and .
Random Subsum Principle
As a fundamental principle of coding, if the messages and differs in a few locations, we want their codewords to differ in almost all the locations. This is indeed true to Walsh-Hadamard Code, which can be summarized as the random subsum principle:
Linear Functions
Another principle of coding is that the verifier should be able to check the validity of a codeword, i.e., the given string is the codeword of some string, or the given string is close to the codeword of some string. To demonstrate this property, we will consider the Walsh-Hadamard codeword of as a linear function defined by . In general, a linear function satisfies . It’s straightforward to see that is indeed a linear function. Conversely, every linear function can be written as for some . Let be the string with a single at the -th location. Then . Thus, the Walsh-Hadamard code is equivalent to linear functions over . To check the validity of a codeword is to test whether it is close to a linear function.
BLR Linearity Testing
Formally, we say that a function is -close to a linear function if
The BLR linearity testing is actually very simple: we randomly query two locations and , and test if . Nevertheless, it is very powerful. We state without a proof of the following theorem: If
is -close to some linear function.
This suggests a simple probabilistic algorithm for linearity testing, and hence verifying a Walsh-Hadamard code.
Local Decoding
Walsh-Hadamard code is also a locallly decodable code, i.e., we can recover a single bit of a corrupted codeword while making constant number of queries. Suppose is -close to a linear function (a Walsh-Hadamard codeword), and we want to decode , but is corrupted. Then we can pick a random $r \in {0,1}^n, and since
we can decode by querying if these two locations are not corrupted. If the codeword is corrupted at every location with the same probability, then this procedure actually makes things worse. However, in our PCP system in the later sections, we will only look for specific locations of the codeword. The local decoding procedure guards against a malicious prover who will concentrate the corruptions on the certain locations we are looking at.
An Exponential-size PCP for QUADEQ
QUADEQ asks, given quadratic equations of variables, if there is satisfying assignment of the variables. Equivalently, QUADEQ asks, given inputs and , where is a matrix, and is an dimensional vector, if there is a dimensional vector such that and . Here should be a satisfying assignment.
Our exponential-size PCP should be and , where . Now the verifier will carry out the following steps:
- (Linearity Test) Check that , are linear functions i.e., , are Walsh-Hadamard code- words.
- (Tensor Test) Check that , encode the same source word i.e., encodes and f encodes .
- (Satisfiability Test) Check that .
We’ve already discussed the linearity test. We will describe the remaining two steps.
Tensor Test
The idea of tensor test is very straightforward. If and for some , then
Thus, given two linear functions and , the tensor test is as following:
- Choose uniformly at random.
- Accept if
The protocol is clearly complete. It is also sound: based on random subsum principle, it can be shown that each tensor test rejects correctly with at least probability. Thus, it is also efficient: we only need to repeat the test constant number of times to achieve desired parameters.
Satisfiability Test
The satisfiability test is also quite straightforward:
- Choose uniformly at random.
- Accept if
Again, based on the random subsum principle, each test rejects correctly with at least probability. Again, we only need to repeat the test constant number of times to achieve desired parameters.
Polynomial-size PCP
Reed-Muller code
SumCheck Protocol
Low-degree Testing
PCP Theorem via Proof Composition
Now we’ve introduced two weaker PCP theorems, we will compose them to prove the PCP theorem. In order to do that, we introduce two variants of PCPs, whose purposes will be explicit once we get to proof composition
PCP of Proximity (PCPP)
We now present a relaxation of PCPs that only verify that the input is close to an element of the language. The advantage of this relaxation is that it allows the possibility that the verifier may read only a small number of bits from the input. Actually, for greater generality, we will divide the input into two parts , giving the verifier explicitly and as an oracle, and we only count the verifier’s queries to the latter. Thus we consider languages consisting of pairs of strings. For a pair language , we define .
Definition
A Language is in the complexity class PCPP if there exists a polynomial-time probablistic oracle Turing machine called the PCPP verifier such that
- Completeness: If , there exists a proof such that
- Soundness: If , for all proofs ,
We introduce a new proximity parameter , and the distance is defined as
if and if .
On one hand, the soundness criterion of PCPPs is a relaxation of the soundness of PCPs. Whereas, a PCP is required to reject (with high probability) every input that is not in the language, a PCPP is only required to reject input pairs in which the second element (i.e., ) is far from being suitable for the first element (i.e., is far from L(x)). That is, in a PCPP, nothing is required in the case that is close to and yet . On the other hand, the query complexity of a PCPP is measured more stringently, as it accounts also for the queries to the input-part y (on top of the standard queries to the proof ). This should be contrasted with a standard PCP that has free access to all its input, and is only charged for access to an auxiliary proof.
Construction
Robust PCP
We now present a strengthening of the standard PCP soundness condition. Instead of asking that any proof is rejected with high probability, we ask that any proof is far from acceptance with high probability.
Definition
Given an input and a proof , if we fix the random bits , we have a local view . We define the accepting views as
A Language is in the complexity class RPCP if there exists a polynomial-time probablistic oracle Turing machine called the PCP verifier such that
- Completeness: If , there exists a proof such that
- Soundness: If , for all proofs ,
Here, the distance function is defined similarly as in the case of PCPP. Note that PCP RPCP, i.e., a vanilla PCP has robustness.
Construction
Proof Composition
To carry out proof composition, we will have an outer RPCP which has a small size, and an inner PCPP which uses a few queries. When the outer RPCP determines the oracle positions to query given random bits , , the inner PCPP will verify that the selected positions are accepted, . The new proof will be the concatenation of and for all . Note that we need the outer RPCP to be robust, because the inner PCPP only has soundness gurantee if is far from accepting.
References
[1] CS 294 at UC Berkeley, Probabilistically Checkable and Interactive Proof Systems, Spring 2019, Website