Author: mutourend & lynndell, Bitlayer Labs

Original title: (Binius STARKs Analysis and Its Optimization)

Original link: https://blog.bitlayer.org/Binius_STARKs_Analysis_and_Its_Optimization/

Abstract: The bit widths of the 1st, 2nd, and 3rd generation STARK proof systems are 252, 64, and 32 bits respectively. Although the coding efficiency has been improved, there is still wasted space. Binius directly operates on bits, and the coding is compact and efficient. It is likely to be the fourth generation of STARK in the future. Binius uses arithmetic based on the tower binary field, an improved version of HyperPlonk product and permutation check, small field polynomial commitment and other technologies to improve efficiency from all angles. Further optimization can be made in binary field multiplication, ZeroCheck, SumCheck, PCS, etc. to further improve the proof speed and reduce the proof size.

1 Introduction

Different from elliptic curve-based SNARKs, STARKs can be considered as hash-based SNARKs. One of the main reasons for the low efficiency of current STARKs is that most values ​​in actual programs are small, such as indexes, true and false values, counters, etc. in for loops. However, in order to ensure the security of Merkle tree-based proofs, when data is expanded using Reed-Solomon encoding, many additional redundant values ​​will occupy the entire domain, even if the original value itself is very small. To solve this problem, reducing the size of the domain has become a key strategy.

As shown in Table 1, the encoding bit width of the first-generation STARKs is 252 bits, the encoding bit width of the second-generation STARKs is 64 bits, and the encoding bit width of the third-generation STARKs is 32 bits, but the 32-bit encoding bit width still has a lot of wasted space. In comparison, the binary domain allows direct bit operations, and the encoding is compact and efficient without any wasted space, that is, the fourth-generation STARKs.Bitlayer Research:Binius STARKs原理解析及其优化思考

Table 1: STARKs evolution path

Compared with the finite fields discovered in recent years such as Goldilocks, BabyBear, and Mersenne31, the research on binary fields can be traced back to the 1980s. Currently, binary fields have been widely used in cryptography, with typical examples including:

  • Advanced Encryption Standard (AES), based on the F28 domain;

  • Galois Message Authentication Code (GMAC), based on the F2128 domain;

  • QR code, using Reed-Solomon encoding based on F28;

  • The original FRI and zk-STARK protocols, as well as the SHA-3 finalist Grøstl hash function, which is based on the F28 domain and is a hash algorithm that is well suited for recursion.

When a smaller domain is used, the expansion domain operation becomes more important to ensure security. The binary domain used by Binius relies entirely on the expansion domain to ensure its security and practical usability. Most polynomials involved in Prover calculations do not need to enter the expansion domain, but only need to operate in the base domain, thus achieving high efficiency in a small domain. However, random point checks and FRI calculations still need to go deeper into the larger expansion domain to ensure the required security.

There are two practical problems when building a proof system based on binary fields: when calculating the trace representation in STARKs, the field size used should be larger than the order of the polynomial; when making Merkle tree commitments in STARKs, Reed-Solomon encoding is required, and the field size used should be larger than the size after the encoding is expanded.

Binius proposed an innovative solution to deal with these two problems separately and to achieve this by representing the same data in two different ways: first, using multivariate (specifically multilinear) polynomials instead of univariate polynomials, and representing the entire computational trajectory through its values ​​on "hypercubes"; second, since the length of each dimension of the hypercube is 2, it is not possible to perform a standard Reed-Solomon expansion like STARKs, but the hypercube can be regarded as a square, and the Reed-Solomon expansion can be performed based on the square. This method greatly improves coding efficiency and computing performance while ensuring security.

2 Principle Analysis

The construction of most current SNARKs systems usually consists of the following two parts:

  • Information-Theoretic Polynomial Interactive Oracle Proof (PIOP): PIOP, as the core of the proof system, converts the input computational relationship into a verifiable polynomial equation. Different PIOP protocols allow the prover to send polynomials step by step through interaction with the verifier, so that the verifier can verify whether the calculation is correct by querying the evaluation results of a small number of polynomials. Existing PIOP protocols include: PLONK PIOP, Spartan PIOP and HyperPlonk PIOP, etc. They each handle polynomial expressions differently, thus affecting the performance and efficiency of the entire SNARK system.

  • Polynomial Commitment Scheme (PCS): PCS is used to prove whether the polynomial equation generated by PIOP is true. PCS is a cryptographic tool that allows the prover to commit to a polynomial and later verify the evaluation result of the polynomial while hiding other information of the polynomial. Common polynomial commitment schemes include KZG, Bulletproofs, FRI (Fast Reed-Solomon IOPP), and Brakedown. Different PCS have different performance, security, and applicable scenarios.

According to specific needs, different PIOPs and PCSs can be selected and combined with appropriate finite fields or elliptic curves to build proof systems with different properties. For example:

• Halo2: Combining PLONK PIOP with Bulletproofs PCS and based on the Pasta curve. Halo2 is designed with a focus on scalability and the removal of the trusted setup in the ZCash protocol.

• Plonky2: PLONK PIOP combined with FRI PCS, based on Goldilocks domain. Plonky2 is designed to achieve efficient recursion. When designing these systems, the selected PIOP and PCS must match the finite field or elliptic curve used to ensure the correctness, performance and security of the system. The choice of these combinations not only affects the proof size and verification efficiency of SNARK, but also determines whether the system can achieve transparency without trusted setup, and whether it can support extended functions such as recursive proofs or aggregate proofs.

Binius: HyperPlonk PIOP + Brakedown PCS + Binary Domain. Specifically, Binius includes five key technologies to achieve its efficiency and security. First, arithmetic based on towers of binary fields forms the basis of its calculation, which enables simplified operations in binary domains. Second, Binius adapted the HyperPlonk product and permutation check in its interactive oracle proof protocol (PIOP) to ensure safe and efficient consistency checks between variables and their permutations. Third, the protocol introduces a new multilinear shift argument that optimizes the efficiency of verifying multilinear relations on small domains. Fourth, Binius uses an improved version of the Lasso search argument, which provides flexibility and strong security for the search mechanism. Finally, the protocol uses a small-field polynomial commitment scheme (Small-Field PCS), which enables it to implement an efficient proof system on binary domains and reduces the overhead typically associated with large domains.

2.1 Finite fields: arithmetic based on towers of binary fields

Tower-like binary fields are key to enabling fast verifiable computations, primarily due to two aspects: efficient computation and efficient arithmetic. Binary fields inherently support highly efficient arithmetic operations, making them ideal for performance-sensitive cryptographic applications. Furthermore, the binary field structure supports simplified arithmetic operations, i.e., operations performed on binary fields can be expressed in a compact and easily verifiable algebraic form. These properties, coupled with the ability to fully exploit their hierarchical nature through the tower structure, make binary fields particularly suitable for scalable proof systems such as Binius.Bitlayer Research:Binius STARKs原理解析及其优化思考

Here, "canonical" refers to the unique and direct representation of elements in the binary field. For example, in the most basic binary field F2, any k-bit string can be directly mapped to a k-bit binary field element. This is different from the prime field, which cannot provide such a canonical representation within a given number of bits. Although a 32-bit prime field can be contained in 32 bits, not every 32-bit string can uniquely correspond to a field element, while the binary field has the convenience of this one-to-one mapping. In the prime field Fp, common reduction methods include Barrett reduction, Montgomery reduction, and special reduction methods for specific finite fields such as Mersenne-31 or Goldilocks-64. In the binary field F2k, commonly used reduction methods include special reduction (such as used in AES), Montgomery reduction (such as used in POLYVAL), and recursive reduction (such as Tower). The paper (Exploring the Design Space of Prime Field vs. Binary Field ECC-Hardware Implementations) points out that the binary field does not need to introduce carry in addition and multiplication operations, and the square operation of the binary field is very efficient because it follows the simplified rule of (X + Y )2 = X2 + Y 2.

As shown in Figure 1, a 128-bit string: This string can be interpreted in many ways in the context of a binary field. It can be viewed as a unique element in a 128-bit binary field, or parsed as two 64-bit tower field elements, four 32-bit tower field elements, 16 8-bit tower field elements, or 128 F2 field elements. This flexibility of representation does not require any computational overhead, just typecasting the bit string, which is a very interesting and useful property. At the same time, small field elements can be packed into larger field elements without additional computational overhead. The Binius protocol exploits this property to improve computational efficiency. In addition, the paper (On Efficient Inversion in Tower Fields of Characteristic Two) explores the computational complexity of multiplication, squaring, and inversion operations in an n-bit tower binary field (which can be decomposed into m-bit subfields).

Bitlayer Research:Binius STARKs原理解析及其优化思考

Figure 1: Tower Binary Fields

Bitlayer Research:Binius STARKs原理解析及其优化思考

2.2 PIOP: Adapted HyperPlonk Product and PermutationCheck for Binary Domains

The PIOP design in the Binius protocol draws on HyperPlonk and uses a series of core checking mechanisms to verify the correctness of polynomials and multivariate sets. These core checks include:

  1. GateCheck: Verifies that the confidential witness ω and the public input x satisfy the circuit operation relation C(x,ω)=0 to ensure that the circuit operates correctly.

  2. PermutationCheck: Verifies that two multivariate polynomials f and g evaluated on a Boolean hypercube result in a permutation relation f(x) = f(π(x)) to ensure consistency of permutation between polynomial variables.

  3. LookupCheck: Verifies that the evaluation of a polynomial is in a given lookup table, i.e., f(Bµ) ⊆ T(Bµ), ensuring that certain values ​​are within a specified range.

  4. MultisetCheck: Checks whether two multivariable sets are equal, that is, {(x1,i,x2,)}i∈H={(y1,i,y2,)}i∈H, ensuring consistency between multiple sets.

  5. ProductCheck: Checks whether a rational polynomial evaluated on a Boolean hypercube is equal to some declared value ∏x∈Hµ f(x) = s, ensuring the correctness of the polynomial product.

  6. ZeroCheck: Verify that a multivariate polynomial is zero at any point on the Boolean hypercube ∏x∈Hµ f(x) = 0,∀x ∈ Bµ to ensure the zero point distribution of the polynomial.

  7. SumCheck: Checks whether the sum of a multivariate polynomial is the declared value ∑x∈Hµ f(x) = s. By converting the evaluation problem of a multivariate polynomial into the evaluation of a univariate polynomial, the computational complexity of the verifier is reduced. In addition, SumCheck also allows batch processing, by introducing random numbers and constructing linear combinations to achieve batch processing of multiple sum verification instances.

  8. BatchCheck: Based on SumCheck, it verifies the correctness of multiple multivariate polynomial evaluations to improve protocol efficiency.

Although Binius and HyperPlonk have many similarities in protocol design, Binius makes improvements in the following three aspects:

  • ProductCheck optimization: In HyperPlonk, ProductCheck requires that the denominator U is non-zero everywhere on the hypercube and the product must be equal to a specific value; Binius simplifies this check process by specializing this value to 1, thereby reducing the computational complexity.

  • Handling of division by zero: HyperPlonk fails to adequately handle the division by zero case, resulting in the inability to assert that U is non-zero on the hypercube; Binius handles this issue correctly, and Binius's ProductCheck continues to handle it even when the denominator is zero, allowing generalization to arbitrary product values.

  • Cross-column PermutationCheck: HyperPlonk does not have this feature; Binius supports PermutationCheck between multiple columns, which enables Binius to handle more complex polynomial permutations.

Therefore, Binius improves the flexibility and efficiency of the protocol by improving the existing PIOPSumCheck mechanism, especially providing stronger functional support when dealing with more complex multivariate polynomial verification. These improvements not only solve the limitations in HyperPlonk, but also lay the foundation for future proof systems based on binary domains.

2.3 PIOP: New multilinear shift argument - for boolean hypercubes

In the Binius protocol, the construction and processing of virtual polynomials is one of the key technologies, which can effectively generate and operate polynomials derived from input handles or other virtual polynomials. The following are two key methods:

  • Packing: This method optimizes the operation by packing smaller elements at adjacent positions in the lexicographic order into larger elements. The Pack operator operates on blocks of size 2κ and combines them into a single element in the high-dimensional domain. Through multilinear extension (MLE), this virtual polynomial can be evaluated and processed efficiently, transforming the function t into another polynomial, thereby improving the computational performance.

  • Shift Operator: The shift operator rearranges the elements within a block, performing a circular shift based on a given offset o. This method works for blocks of size 2b, with each block performing a shift based on the offset. The shift operator is defined using the support of a detection function, ensuring consistency and efficiency when dealing with virtual polynomials. The complexity of evaluating this construction grows linearly with the block size, making it particularly useful for dealing with large datasets or high-dimensional scenarios in Boolean hypercubes.

2.4 PIOP: Adapted Lasso lookup argument for binary domains

The Lasso protocol allows the prover to commit to a vector a ∈ Fm and prove that all its elements exist in a pre-specified table t ∈ Fn. Lasso unlocks the concept of "lookup singularities" and can be applied to multilinear polynomial commitment schemes. Its efficiency is reflected in the following two aspects:

  • Proof efficiency: For m lookups in a table of size n, the prover only needs to commit to m+n domain elements. These domain elements are small and all lie in the set {0,...,m}. In a commitment scheme based on multiple exponentiation, the computational cost for the prover is O(m + n) group operations (such as elliptic curve point addition), plus the evaluation cost of proving whether the multilinear polynomial is a table element on a Boolean hypercube.

  • No need to commit to large tables: If table t is structured, no commitment is required to it, so very large tables (e.g., 2128 or larger) can be processed. The prover's runtime is only related to the number of table entries visited. For any integer parameter c > 1, the main cost of the prover is the proof size, and the number of committed domain elements is 3·c m + c·n1/c . These domain elements are small and lie in the set {0,...,max{m,n1/c,q} − 1}, where q is the maximum value in a.

The Lasso protocol consists of the following three components:

  • Virtual polynomial abstraction for large tables: Operations on large tables are implemented by combining virtual polynomials, ensuring efficient lookup and processing within the table.

  • Small table search: The core of Lasso is small table search, which is the core construction of the virtual polynomial protocol. It uses offline memory detection to verify whether the evaluation of a virtual polynomial on a Boolean hypercube is a subset of the evaluation of another virtual polynomial. This search process is reduced to the task of multi-set detection.

  • Multi-set checking: Lasso introduces a virtual protocol to perform multi-set checking, verifying whether the elements of two sets are equal or satisfy a certain condition.

The Binius protocol adapts Lasso to operations in binary fields, assuming that the current field is a prime field with large characteristics (much larger than the length of the column being searched). Binius introduces a multiplication version of the Lasso protocol, requiring the prover and the verifier to jointly increment the "memory count" operation of the protocol, not by simply adding 1, but by multiplication generators in the binary field. However, this multiplication adaptation introduces more complexity. Unlike the increment operation, the multiplication generator is not incremented in all cases. There is a single track at 0, which may become an attack point. To prevent this potential attack, the prover must commit to a read count vector that is non-zero everywhere to ensure the security of the protocol.

2.5 PCS: Adapted Brakedown PCS - Applicable to Small-Field

The core idea of ​​building BiniusPCS is packing. The Binius paper provides two Brakedown polynomial commitment schemes based on binary fields: one is instantiated using concatenated code; the other uses block-level encoding technology and supports the use of Reed-Solomon codes alone. The second Brakedown PCS scheme simplifies the proof and verification process, but the proof size is slightly larger than the first one. However, the simplification and implementation advantages it brings make this trade-off worthwhile.

Binius polynomial commitment mainly uses small field polynomial commitment and extended field evaluation, small field universal construction and block-level coding and Reed-Solomon code technology.

Small field polynomial commitment and extended field evaluation: The commitment in the Binius protocol is a polynomial commitment over a small field K and evaluated in a larger extended field L/K. This approach ensures that every multilinear polynomial t(X0,...,Xℓ−1) belongs to the field K[X0,...,Xℓ−1], while the evaluation point can be located in the larger extended field L. The commitment scheme is specifically designed for small field polynomials and can be queried over the extended field, while ensuring the security and efficiency of the commitment.

Small Domain Universal Construction: The small domain universal construction ensures that the extended domain L is large enough to support security evaluation by defining the parameter ℓ, the domain K and its associated linear block code C. To improve security while maintaining computational efficiency, the protocol guarantees the robustness of the commitment through the characteristics of the extended domain and the use of linear block codes to encode polynomials.

Block-level encoding and Reed-Solomon codes: Binius proposed a block-level encoding scheme for polynomials with fields smaller than the alphabet of linear block codes. With this scheme, even polynomials defined in small fields (such as F2) can be efficiently committed using Reed-Solomon codes with large alphabets such as F216. Reed-Solomon codes were chosen because of their efficiency and maximum distance separation properties. The scheme simplifies the operation complexity by packaging the messages and encoding them line by line, and then using Merkle trees for commitment. Block-level encoding allows efficient commitment of polynomials in small fields without the high computational overhead usually associated with large fields, making it possible to commit polynomials in small fields such as F2 and maintain computational efficiency in generating proofs and verifications.

3. Optimization thinking

In order to further improve the performance of the Binius protocol, this paper proposes four key optimization points:

  1. GKR-based PIOP: For binary domain multiplication operations, the GKR protocol is used to replace the Lasso Lookup algorithm in the Binius paper, which can significantly reduce Binius's commitment overhead;

  2. ZeroCheck PIOP optimization: trade-off between computational overhead between Prover and Verifier, making ZeroCheck operation more efficient;

  3. Sumcheck PIOP optimization: Optimization of Sumcheck for small domains, further reducing the computational burden on small domains;

  4. PCS Optimization: Reduce proof size and improve overall protocol performance through FRI-Binius optimization.

3.1 GKR-based PIOP: GKR-based binary field multiplication

The Binius paper introduces a lookup-based scheme to achieve efficient binary-domain multiplication. The binary-domain multiplication algorithm adapted from the Lasso lookup argument relies on the linear relationship between lookups and addition operations, which are proportional to the number of limbs in a single word. Although this algorithm optimizes the multiplication operation to some extent, it still requires auxiliary commitments that are linearly related to the number of limbs.

The core idea of ​​the GKR (Goldwasser-Kalai-Rothblum) protocol is that the prover (P) and the verifier (V) agree on a layered arithmetic circuit over a finite field F. Each node of the circuit has two inputs, which are used to calculate the required function. In order to reduce the computational complexity of the verifier, the protocol uses the SumCheck protocol to gradually simplify the statement about the circuit output gate value into a lower-level gate value statement, until the statement is finally simplified to a statement about the input. In this way, the verifier only needs to check the correctness of the circuit input.

The GKR-based integer multiplication algorithm converts "checking whether two 32-bit integers A and B satisfy A·B =? C" into "checking whether (gA)B =? gC holds", and uses the GKR protocol to significantly reduce commitment overhead. Compared with the previous Binius lookup scheme, the GKR-based binary domain multiplication operation only requires one auxiliary commitment, and by reducing the overhead of Sumchecks, the algorithm is more efficient, especially in scenarios where Sumchecks operations are cheaper than commitment generation. With the advancement of Binius optimization, GKR-based multiplication operations have gradually become an effective way to reduce the overhead of binary domain polynomial commitments.

3.2 ZeroCheck PIOP Optimization: Prover and Verifier Computational Overhead Tradeoff

The paper (Some Improvements for the PIOP for ZeroCheck) adjusts the workload distribution between the prover (P) and the verifier (V) and proposes multiple optimization schemes to balance the overhead. This work explores different k value configurations to achieve a cost trade-off between the prover and the verifier, especially in terms of reducing transmitted data and reducing computational complexity.

Reduce the prover's data transmission: By transferring part of the work to the verifier V, the amount of data sent by the prover P is reduced. In the i-th round, the prover P needs to send vi+1(X) to the verifier V, where X=0,...,d + 1. The verifier V checks the following equation to verify the correctness of the data

vi = vi+1(0) + vi+1(1).

Optimization method: The prover P can choose not to send vi+1(1), but let the verifier V calculate the value by itself in the following way

vi+1(1) = vi − vi+1(0).

Furthermore, in round 0, the honest prover P always sends v1(0) = v1(1) = 0, which means that no evaluation calculation is required, thus significantly reducing the computational and transmission costs to d2n−1CF + (d + 1)2n−1CG.

Reduce the number of evaluation points for the prover: In the i-th round of the protocol, the verifier has sent a sequence of values ​​r = (r0, ..., ri−1) in the previous i rounds. The current protocol requires the prover (P) to send the polynomial

vi+1(X) = ∑ δˆn(α,(r,X,x))C(r,X,x).x∈H −−1

Optimization method: The prover P sends the following polynomial. The relationship between these two functions is:

vi(X) = vi′(X)·δi+1((α0,...,αi),(r,X))

where δˆi+1 is completely known by the verifier because he has α and r. The benefit of this modification is that the degree of vi′(X) is 1 less than that of vi(X), which means that the prover needs to evaluate fewer points. Therefore, the main protocol changes occur in the checking phase between rounds.

In addition, the original constraint vi = vi+1(0)+vi+1(1) is optimized to (1−αi)vi′+1(0)+αivi′+1(1) = vi′(X). The prover needs to evaluate and send less data, further reducing the amount of data transmitted. Calculating δˆn−i−1 is also more efficient than calculating δˆn. With these two improvements, the cost is reduced to approximately: 2n−1(d− 1)CF + 2n−1dCG. In the common case of d=3, these optimizations reduce the cost by a factor of 5/3.

Algebraic interpolation optimization: For an honest prover, C(x0,...,xn−1) is zero on Hn, which can be expressed as: C(x0,...,xn-1)= ∑xi(xi-1)Qi(x0,...,xn-1). Although Qi is not unique, an orderly decomposition can be constructed by polynomial long division: starting from Rn=C, Qi and Ri are calculated by dividing by xi(xi−1) successively, where R0 is the multilinear extension of C on Hn and is assumed to be zero. Analyzing the degree of Qi, it can be concluded that for j>i, the degree of Qj on xi is the same as C; for j = i, the degree is reduced by 2; for j i, the degree is at most 1. Given a vector r = (r0,...,ri), Qj(r,X) is multilinear for all j ≤ i. In addition, 1) Qj(r,X) is the only multilinear polynomial equal to C(r,X) on Hn−i. For any X and x ∈ Hn−i−1, it can be expressed as:

C(r,X,x) − Qi(r,X,x) = X(X − 1)Qi+1(r,X,x)

Therefore, in each round of the protocol, only a new Q is introduced, whose evaluation value can be calculated from C and the previous Q, achieving interpolation optimization.

3.3 Sumcheck PIOP Optimization: Sumcheck Protocol Based on Small Domains

The STARKs scheme implemented by Binius has a very low commitment overhead, so that the prover bottleneck is no longer PCS, but the sum-check protocol. In 2024, Ingonyama proposed an improved solution for the small-domain-based Sumcheck protocol (corresponding to the Algo3 and Algo4 algorithms in Figure 2) and open-sourced the implementation code. Algorithm 4 focuses on merging the Karatsub algorithm into Algorithm 3, minimizing the number of extended-domain multiplications at the cost of additional base-domain multiplications, so when extended-domain multiplications are much more expensive than base-domain multiplications, Algorithm 4 performs better.

  • The impact of switching rounds and improvement factors

The improvement of the Sumcheck protocol based on small domains focuses on the selection of the switching round t. The switching round refers to the time point when switching from the optimization algorithm back to the naive algorithm. The experiments in the paper show that at the optimal switching point, the improvement factor reaches the maximum value and then shows a parabolic trend. If this switching point is exceeded, the performance advantage of the optimization algorithm weakens and the efficiency decreases. This is because the base domain multiplication on the small domain has a higher time ratio than the extended domain multiplication, so it is crucial to switch back to the naive algorithm at the right time.

Bitlayer Research:Binius STARKs原理解析及其优化思考

Figure 2: Relationship between switching rounds and improvement factor

For specific applications, such as Cubic Sumcheck (d = 3), the small domain Sumcheck protocol improves upon the naive algorithm by an order of magnitude. For example, when the base domain is GF[2], Algorithm 4 outperforms the naive algorithm by nearly 30 times.

  • Effect of base domain size on performance

The experimental results of the paper show that a smaller base domain (such as GF[2]) can make the optimization algorithm show more significant advantages. This is because the time ratio of extended domain to base domain multiplication is higher on smaller base domains, and thus the optimization algorithm exhibits a higher improvement factor under this condition.

  • Optimization benefits of Karatsuba algorithm

The Karatsuba algorithm has shown significant results in improving the performance of Sumcheck based on small domains. For the base domain GF[2], the peak improvement factors of Algorithm 3 and Algorithm 4 are 6 and 30 respectively, indicating that Algorithm 4 is five times more efficient than Algorithm 3. Karatsuba optimization not only improves the operating efficiency, but also optimizes the switching point of the algorithm, reaching the best at t=5 in Algorithm 3 and t=8 in Algorithm 4 respectively.

  • Improved memory efficiency

In addition to improving the running time, the Sumcheck protocol based on small domains also shows significant advantages in memory efficiency. The memory requirement of Algorithm 4 is O(d·t), while the memory requirement of Algorithm 3 is O(2d·t). When t=8, Algorithm 4 only requires 0.26MB of memory, while Algorithm 3 requires 67MB to store the product of the base domain. This makes Algorithm 4 more adaptable on memory-constrained devices, especially suitable for client-side proof environments with limited resources.

3.4 PCS Optimization: FRI-Binius reduces Binius proof size

A major drawback of the Binius protocol is its relatively large proof size, which scales as the square root of the witness size as O(√N). This square root proof size is a limitation compared to more efficient systems. In contrast, polylogarithmic proof size is critical to achieving truly "succinct" validators, as demonstrated in advanced systems like Plonky3, which achieves logarithmic proofs through advanced techniques such as FRI.

The paper (Polylogarithmic Proofs for Multilinears over Binary Towers), referred to as FRI-Binius, implements the binary domain FRI folding mechanism, bringing four innovations:

  • Flattened Polynomial: The initial multilinear polynomial is converted to a LCH (Low Code Height) novel polynomial basis.

  • Subspace vanishing polynomials: used to perform FRI on the coefficient domain and achieve FFT-like decomposition via additive NTT (number theoretic transform).

  • Algebraic basis packing: supports efficient packing of information in the protocol, removing embedding overhead.

  • Ring-Exchange SumCheck: A novel SumCheck method that utilizes ring-exchange techniques to optimize performance.

The core idea of ​​the multilinear polynomial commitment scheme (PCS) based on binary field FRI-Binius is that the FRI-Binius protocol operates by packaging the initial binary field multilinear polynomial (defined on F2) into a multilinear polynomial defined on a larger field K.

In the binary domain-based FRI-BiniusPCS, the process is as follows:

  • Commitment stage: A commitment to a multilinear polynomial of ℓ variables (defined on F2) is transformed into a commitment to a packed multilinear polynomial of ℓ′ variables (defined on F2128), thus reducing the number of coefficients by a factor of 128.

  • Evaluation phase: The prover and the verifier conduct ℓ′ rounds of cross-ring switching SumCheck and FRI interactive proof (IOPP):

– The FRI open proof takes up the majority of the proof size.

– The SumCheck cost for the prover is similar to the SumCheck cost on a regular large domain.

– The FRI cost for the prover is the same as that on a regular large domain.

– The verifier receives 128 elements from F2128 and performs 128 additional multiplications.

With FRI-Binius, the size of Binius proofs can be reduced by an order of magnitude. This brings Binius proof size closer to state-of-the-art systems while maintaining compatibility with the binary domain. The FRI folding technology customized for the binary domain, combined with algebraic packing and SumCheck optimizations, enables Binius to generate more concise proofs while maintaining efficient verification.

Bitlayer Research:Binius STARKs原理解析及其优化思考

表 2: Binius vs. FRI-Binius Proof Size

Bitlayer Research:Binius STARKs原理解析及其优化思考

Table 3: Plonky3 FREE vs. FRI-Binius

4 Summary

The entire value proposition of Binius is that the smallest power-of-two domain can be used for witnesses, so you only need to choose the domain size based on your needs. Binius is a co-design solution that "uses hardware, software, and the Sumcheck protocol accelerated in FPGAs" to quickly prove with very low memory usage. Proof systems such as Halo2 and Plonky3 have 4 key steps that take up most of the computation: generating witness data, committing to witness data, vanishing argument, and opening proof. Taking the Keccak in Plonky3 and the Grøstl hash function in Binius as examples, the corresponding computational proportions of the above 4 key steps are shown in Figure 3:

Bitlayer Research:Binius STARKs原理解析及其优化思考

Figure 3: Smaller commit cost

It can be seen that the Prover commit bottleneck has been almost completely removed in Binius, and the new bottleneck lies in the Sumcheck protocol. The large number of polynomial evaluations and domain multiplication problems in the Sumcheck protocol can be efficiently solved with the help of dedicated hardware. The FRI-Binius scheme, a variant of FRI, provides a very attractive option - eliminating the embedding overhead from the domain proof layer without causing a surge in the cost of the aggregate proof layer. Currently, the Irreducible team is developing its recursive layer and announced a collaboration with the Polygon team to build a Binius-based zkVM; JoltzkVM is switching from Lasso to Binius to improve its recursive performance; the Ingonyama team is implementing an FPGA version of Binius.

 

References

  1. 2024.04 Binius Succinct Arguments over Towers of Binary Fields

  2. 2024.07 Fri-Binius Polylogarithmic Proofs for Multilinears over Binary Towers

  3. 2024.08 Integer Multiplication in Binius: GKR-based approach

  4. 2024.06 zkStudyClub - FRI-Binius: Polylogarithmic Proofs for Multilinears over Binary Towers

  5. 2024.04 ZK11: Binius: a Hardware-Optimized SNARK - Jim Posen

  6. 2023.12 Episode 303: A Dive into Binius with Ulvetanna

  7. 2024.09 Designing high-performance zkVMs

  8. 2024.07 Sumcheck and Open-Binius

  9. 2024.04 Binius: highly efficient proofs over binary fields

  10. 2023.12 SNARKs on binary fields: Binius - Part 1

  11. 2024.06 SNARKs on binary fields: Binius - Part 2

  12. 2022.10 HyperPlonk, a zk-proof system for ZKEVMs