Quasi-Systolic Processor and Quasi-Systolic Array

Patent Number: 11,651,231

Problem

[Figure: The matrix decomposition system block.]

During training of a typical AI model, updating the matrix of weights in a given network layer requires computing an additional update matrix that is then transferred into the main weight array. This update matrix is as large as the weight array itself, which is already very large. Computing it consumes substantial conventional computing resources, which erodes much of the advantage of emerging classes of vector-matrix multipliers such as crossbar arrays. Even in conventional training, it is well known that transmitting weight updates and network gradients consumes significant computing resources.
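To make the memory cost concrete, here is a minimal NumPy sketch (all sizes are illustrative, not from the patent) of a standard gradient-descent batch update: the update is a sum of outer products of error signals and activations, and it occupies exactly as many memory locations as the weight matrix it modifies.

```python
import numpy as np

# Hypothetical layer dimensions, chosen only for illustration.
n_in, n_out, batch = 512, 256, 32
rng = np.random.default_rng(0)

W = rng.standard_normal((n_out, n_in))        # main weight array
a = rng.standard_normal((n_in, batch))        # layer inputs (activations)
delta = rng.standard_normal((n_out, batch))   # backpropagated error signals

# Batch gradient-descent update: a sum of outer products,
# accumulated into one full-size matrix per layer.
lr = 0.01
dW = lr * (delta @ a.T)

# The update matrix is exactly as large as the weight array itself.
assert dW.shape == W.shape
```

Storing and transferring `dW` therefore costs `n_out * n_in` memory locations per layer, which is the overhead the invention targets.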

Invention

The invention is a computer architecture that efficiently calculates a weight-matrix update for an AI model, especially models designed to run on hardware neural networks. The architecture uses an approximation algorithm to perform this calculation with less memory overhead: special matrix decomposition methods, such as streaming principal component analysis, compute an approximation of the update matrix using far fewer parameters that need to be stored in memory.

Potential Commercial Applications

The primary use of this technology is training AI models in the cloud or at the edge. The approach potentially yields many of the critical benefits of batch updates while requiring substantially less overhead and a significantly lower computational cost.

Competitive Advantage

The fewer memory locations and calculations needed to train the network, the less time, area, and energy are needed to operate an AI hardware system.

Created August 17, 2023, Updated September 15, 2025