Memorism Processors

Technical Manual

This manual explains Memorism Processor technology in detail.

Contents

  1. Introduction
  2. Trends in the semiconductor industry and computer problems
    1. Trends in the semiconductor industry and computer research institutions
    2. Problems in von Neumann-type computers
    3. Pre-processing and complex algorithms
  3. Solving computer issues with Memorism Processors
    1. Future information processing society and Memorism Processors
    2. DBP processor
    3. XOP processor
    4. SOP processor
    5. Optimization of information processing
    6. Specialization of information processing
  4. Solving problems with Memorism Processors
    1. Program simplification
    2. Paradigm shift toward the modernization of information processing
    3. Renewal of von Neumann-type computers
    4. Theoretical verification thus far

1. Introduction

“To increase the information processing speed, we can use lots of servers,” “A better performance always means higher power consumption,” “For processing such as information searches, we can rely on indexes,” “Image recognition should be left to machine learning.” To prepare for the arrival of big data, IoT, and AI-based societies, it is necessary to review these kinds of common sense ideas and values (paradigms) concerning information processing.

Memorism Processors were developed in anticipation of the big data, IoT, and AI-based societies of the future. Having “super-fast” and “super energy-efficient” information detection processing in place is so critically important that saying it “can solve all information processing issues” is not an overstatement. However, there are still many misconceptions about Memorism Processors; people think that they are difficult to understand or get used to. For this reason, this technical manual provides a detailed explanation of the main aspects of Memorism Processors.

2. Trends in the semiconductor industry and computer problems

2-1. Trends in the semiconductor industry and computer research institutions

As semiconductor miniaturization technology approaches its limit, it is no longer possible to expect major improvements in CPU and GPU performance. This situation prompted a surge in research on new types of computers such as quantum computers. However, quantum computers are problematic in that the processing they can perform is limited to solving combinatorial problems, and a long time frame is still required before they can be miniaturized and commercialized as general-purpose products. This scenario has boosted research on heterogeneous computing and on in-memory computing (IMC) and the closely related near-memory computing (NMC) and processing-in-memory (PIM), all of which integrate memory and operation functions.
Unfortunately, these types of research are still virtually non-existent in Japan.

2-2. Problems in von Neumann-type computers

The computers that we all use today are called von Neumann-type computers, which were developed to automate arithmetic operations. Thanks to its versatility, this type of computer is used in various fields and has undoubtedly built today’s information society. However, it is also true that it handles many tasks inefficiently and slowly.

A typical type of processing that current computers struggle with is “information processing that involves detecting the desired data from a massive volume of information (data)”. Examples of this type of processing include “search, collation, recognition, authentication, classification, and sorting”. The configuration of von Neumann-type computers is well suited to information processing such as arithmetic operations, but the architecture, with its physically separated operation functions and memory, causes bus bottlenecks, which directly affect “information processing involving data detection”.

2-3. Pre-processing and complex algorithms

Due to the conditions noted above, the only way to perform processing such as search, collation, recognition, authentication, classification, and sorting at high speed is to create an index in advance or to use a complex algorithm. However, these indexes and algorithms require pre-processing and updates, which not only sacrifice real-time capability but also reduce the system’s overall performance and energy efficiency, in addition to making it more specialized. For these reasons, they are not permanent solutions. Moreover, the bus bottleneck described above has been recognized as a problem since shortly after the development of current computers, yet it has been left without an effective solution ever since. It is therefore necessary to keep in mind that computers today are used in ways that work around these problems rather than solve them.

3. Solving computer issues with Memorism Processors

3-1. Future information processing society and Memorism Processors

Information processing such as search, collation, recognition, authentication, classification, and sorting, which involves data detection, is deeply connected with the big data, IoT, and AI-based information processing societies of the future. For this reason, it is essential to improve the performance and functions of processors; however, as the semiconductor miniaturization described by Moore’s law approaches its limit, it is no longer possible to expect a continuous improvement in the performance of conventional (classic) processors, namely CPUs and GPUs. Consequently, it is necessary to develop information processors based on a new concept. The only way to achieve a configuration (architecture) suitable for “information processing involving data detection” is to integrate the memory and the operation function. Hence, Memorism Processors, which integrate memory and operation functions, were invented to solve the above-mentioned problems.

CPUs and GPUs contain two types of operation functions: the ALU (Arithmetic and Logic Unit), which performs arithmetic and logical operations, and the FPU (Floating Point Unit), which performs floating-point operations. These operation functions contain a large number of transistors and are therefore expensive and very power-consuming. For this reason, only a few dozen operation units can be implemented in a CPU chip and only a few thousand in a GPU chip. Therefore, it is important to keep in mind that repeatedly performing simple processing such as search, collation, recognition, authentication, classification, and sorting with the expensive and highly power-consuming functions of CPUs and GPUs is a waste of capacity.

However, the operation functions of Memorism Processors are specialized for searching, collating, recognizing, authenticating, classifying, and sorting. Further, they use a small number of transistors and are space-saving, low-cost, and energy-efficient, which makes it possible to execute massively parallel processing of more than a million tasks with a single chip. Therefore, Memorism Processors have a configuration well suited to tasks such as search, collation, recognition, authentication, classification, and sorting.
Memorism Processors can be seen as coprocessors and accelerators configured to process more efficiently the tasks essential to the future progress of information processing, namely search, collation, recognition, authentication, classification, and sorting. Since the CPU, a conventional processor, is good at “OS processing, control processing, and communication processing,” and the GPU is good at “arithmetic operations,” they can perform those tasks and leave the others (search, collation, recognition, authentication, classification, and sorting) to the coprocessors and accelerators, that is, the Memorism Processors.

Memorism Processors are designed to take over “information processing involving data detection,” a typical example of the tasks that current computers struggle with. It is a computing technology comprising three types and six kinds of processors, systematized according to the purpose of the information processing and the type of information (data).
This is an exclusive computing technology that exists nowhere else in the world. Not only does it provide a significant upgrade in operation performance, but it also promotes the renewal of computers (see glossary) that can truly modernize the information processing industry (see glossary).

Moreover, the fact that Memorism Processors can be produced immediately on current semiconductor production lines, without requiring additional equipment investment from semiconductor manufacturers, is also a significant advantage that allows Memorism Processors to be supplied to the market at a low price.

3-2. DBP processor

The DBP is a processor designed to speed up the search or collation of general data. With a conventional CPU and memory, searching or collating data requires the CPU to scan the memory.

For example, consider the task of detecting “all 0” data among the data stored in the memory. In this case, the majority of the data scanned by the CPU (for example, 99%) is not the desired data; in other words, 99% of the work is wasted. To avoid this, pre-processing such as preparing an index in advance is necessary, with which true real-time capability is lost. Moreover, this index needs to be updated every time the underlying data changes. Since this update is a very demanding task, speeding it up requires a high-performance (and highly power-consuming) CPU. As a result, the search system becomes not only large and power-consuming, but also highly specialized, which makes its development costly and lengthy.

The DBP is a processor that solves the information processing problems mentioned above at their root. It is based on the concept of inverting the matrix arrangement of the data in the memory, and it has a configuration with very wide data columns and super space-saving (super low-cost) operators built into the memory chip. Since this configuration requires no index, power consumption is reduced and the data can be searched and collated at super high speed. Further, by connecting the necessary number of DBPs in parallel, the system can be scaled to any data size. Since the DBPs perform the search and collation themselves, the processing is executed in a constant amount of time even if the number of DBPs increases (i.e., even if the data capacity increases). As a result, search systems using DBPs are compact, lightweight, and energy efficient, and they are less specialized, which reduces their development cost and time frame.
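To make the idea of “inverting the matrix arrangement” more concrete, below is a minimal software sketch in Python, written purely for illustration (the variable and function names are ours, not a DBP interface; the real DBP performs these operations with parallel operators inside the memory chip). It contrasts a conventional record-by-record scan with a search over a transposed (bit-column) layout, in which each pass over a bit column narrows the candidates for all records at once.

    import numpy as np

    # Toy data set: N records of W bits each (rows = records).
    rng = np.random.default_rng(0)
    N, W = 100_000, 64
    records = rng.integers(0, 2, size=(N, W), dtype=np.uint8)
    records[12345] = 0            # plant one "all 0" record to be found

    # Conventional layout: the CPU scans record after record,
    # so most of the scanned data is not the desired data.
    def row_scan_all_zero(recs):
        return [i for i, r in enumerate(recs) if not r.any()]

    # "Inverted" (transposed) layout: the data is viewed column-wise, so one
    # pass over each of the W bit columns updates a match flag for every
    # record at once -- conceptually what the DBP's in-memory operators do
    # in parallel, independent of the number of records.
    def column_search_all_zero(recs):
        columns = recs.T                     # W columns of N bits each
        match = np.ones(len(recs), dtype=bool)
        for col in columns:                  # W steps, regardless of N
            match &= (col == 0)
        return np.flatnonzero(match)

    print(row_scan_all_zero(records))        # [12345]
    print(column_search_all_zero(records))   # [12345]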

Below is the case of a DBP implemented in DRAM:

  1.  Just by storing a data table of any size in a DBP, 64 billion records can be searched in less than 100 milliseconds with no index, no tuning, and a single CPU. (* If the intention is simply to increase the speed further, it can be scaled up indefinitely.) A rough arithmetic check of these figures is given after this list.
  2.  It complies with the general DRAM standards (JEDEC), adds the DBP’s massively parallel operation function, and can also be used as general memory. It is very reliable, and its temperature characteristics and calculation accuracy are guaranteed.
  3.  Its power consumption is at most twice that of a regular DRAM chip, and it can drastically reduce the number of CPUs. As a result, the whole system generates little heat and becomes very energy efficient.
  4.  Since it requires no pre-processing (index creation) or tuning for search and collation, it can execute almost everything in real-time and requires no index update (only data rewriting).
  5.  Since it requires no pre-processing or tuning, even non-specialist engineers (programmers) can construct a database.
  6.  Due to its high affinity with CPUs, it has a wide range of applications, from servers to PCs. Therefore, high demand is expected in the coming years.
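As the rough arithmetic check referenced in item 1, the sketch below uses only figures quoted in this manual (64 billion records, about 100 milliseconds, and the 4 TB data size from the feasibility study in section 4-4); the derived per-record size is our own inference, not a figure from the manual.

    # Rough arithmetic using only figures quoted in this manual
    # (64 billion records searched in ~100 ms; ~4 TB of data, per section 4-4).
    records = 64e9            # records searched
    time_s = 0.1              # ~100 milliseconds
    data_bytes = 4e12         # ~4 TB of stored data

    print(f"{records / time_s:.1e} records/s")         # ~6.4e+11 records per second
    print(f"{data_bytes / time_s / 1e12:.0f} TB/s")    # ~40 TB/s effective scan rate
    print(f"{data_bytes / records:.1f} bytes/record")  # ~62.5 bytes per record (inferred)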

Two kinds of DBP are available: the Standard DBP (S-DBP), suitable for the search or collation of big data, and the Bitmap DBP (B-DBP), suitable for bitmap operations.

3-3. XOP processor

The XOP processor is designed to make efficient (massively parallel) comparisons between two data groups in order to detect matching or similar data. When a CPU compares two data groups containing n and m pieces of data, on the order of n×m comparison operations are required; as n and m increase, the burden on the CPU becomes massive. The XOP is a processor designed to solve this problem with CPU-based comparison operations.

The chip contains two sets of memory that store the X and Y data, as well as a large number of operators that compare data at the intersections of the data lines from the two memory sets. For example, if X and Y each hold 1K entries, 1M (one million) bit-serial comparison operators are built in, one at each intersection. When an X entry and a Y entry match or are similar, the processor outputs that address. By repeatedly batch-processing X and Y, it can handle not only 1K×1K operations but also comparisons of 100 million×100 million data items. In such a case, 1M operators perform comparison operations in parallel, which is a fundamental evolution over serial comparison by CPUs. Moreover, by loading known data into one of the two XOP data sets, it can take over processes that heavily tax the CPU, such as the creation of histograms, sorting, correlation verification, and classification.
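The snippet below is a minimal software model, in Python and with names of our own choosing (it is not an XOP interface), of the comparison grid described above: every element of X is compared against every element of Y, and the addresses of matching pairs are reported. In the XOP itself, this n×m grid of comparisons is carried out simultaneously by bit-serial comparators at the intersections of the X and Y data lines; here it is merely simulated with a vectorized operation.

    import numpy as np

    def xop_match_addresses(x, y):
        """Return (i, j) address pairs where x[i] == y[j].

        Software model of the comparison grid: conceptually, one comparator
        sits at every intersection of an X data line and a Y data line, and
        all n*m comparisons happen at once.
        """
        x = np.asarray(x)
        y = np.asarray(y)
        hits = x[:, None] == y[None, :]     # n x m grid of comparison results
        return np.argwhere(hits)            # addresses of the matching pairs

    x = [3, 7, 42, 7]
    y = [7, 1, 42]
    print(xop_match_addresses(x, y))
    # [[1 0]
    #  [2 2]
    #  [3 0]]

Making Y a set of known data (for example, histogram bin values) and counting the hits per Y address is the software analogue of the histogram, sorting, and classification uses mentioned above.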

3-4. SOP processor

The SOP is a processor designed for the standardization and generalization of pattern recognition technologies. While pattern recognition technologies are widely seen as the origin of knowledge processing, CPUs and GPUs struggle with pattern recognition. One of the reasons for this is that the basic means of pattern recognition technology is pattern matching, but pattern matching by CPUs and GPUs is so inefficient that it is not worth using.

Patterns include one-dimensional arrays, such as character strings and time-series data, and N-dimensional arrays. For one-dimensional arrays such as character strings, pattern matching in the form of regular expressions has long been widely used; regular expressions are applied not only to character strings but also to the analysis of DNA base sequences. However, while pattern matching of images (two-dimensional arrays) has been in use for many decades, today its use is limited to fields such as the positioning and external inspection of objects for which it is easy to obtain a template pattern. This is because creating an ideal template pattern for the pattern matching of image data is a complex task and, even more crucially, pattern matching by CPUs and GPUs takes too long.

As a result, the only alternatives at present are complex algorithms, such as the Haar algorithm for face detection, HOG algorithms for body detection, and SIFT for general detection, or machine learning such as deep learning. However, these algorithms and machine learning are highly specialized in that they require choosing the most appropriate options for the “detection and recognition target and purpose,” and their development is often long and expensive.

The SOP is a processor designed to solve the problems noted above by performing pattern matching in hardware, which was considered difficult until now, and thereby to promote a fundamental evolution and standardization of pattern recognition technology. While the SOP can handle the pattern matching of one-dimensional to N-dimensional arrays, here we explain the pattern matching of images (two-dimensional arrays), for which there is high demand.

The SOP expands conventional set operations (Boolean operations) into “set operations with position added.” By adding position to the set operations, it becomes possible to detect edges, corners, areas, and patterns in images (i.e., pattern matching in the broad sense); in other words, various objects in an image, as well as their characteristics, can be detected at ultra high speed without involving complex algorithms or learning.
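Below is a minimal sketch, in Python and based on our own reading of the description above (it is not the SOP’s actual operation set), of what a “set operation with position added” looks like in software: a small binary template is tested against the image at every offset, and an offset counts as a hit when every pixel of the template is contained in the image there. This is broad-sense pattern matching on a two-dimensional array; the SOP performs such position-shifted set operations in hardware, whereas this model runs serially on a CPU.

    import numpy as np

    def positional_match(image, template):
        """Positions (row, col) at which every 1-pixel of `template` is also
        1 in `image` -- an AND/inclusion-type set operation applied at every
        position (a software model only, not the SOP itself)."""
        ih, iw = image.shape
        th, tw = template.shape
        hits = []
        for r in range(ih - th + 1):
            for c in range(iw - tw + 1):
                window = image[r:r + th, c:c + tw]
                if np.all(window[template == 1] == 1):   # set-inclusion test at (r, c)
                    hits.append((r, c))
        return hits

    # Tiny binary image containing an "L"-shaped pattern, and the template to find.
    image = np.array([[0, 0, 0, 0, 0],
                      [0, 1, 0, 0, 0],
                      [0, 1, 1, 0, 0],
                      [0, 0, 0, 0, 0]])
    template = np.array([[1, 0],
                         [1, 1]])

    print(positional_match(image, template))   # [(1, 1)]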

We have verified that an SOP implemented as an ASIC can perform pattern matching more than 10,000 times faster than the CPU of a laptop (TDP of about 15 W), even with a prototype-level ASIC.

As a result, one set operation (pattern matching in the broad sense) takes just a few microseconds and consumes about 1 W while processing, and almost 0 W while idle. With this, once the ASIC is commercialized, it will be possible to achieve a power efficiency as much as one million times higher than what is possible today.

With ultra-fast pattern matching, it will be possible to solve long-standing problems in the image recognition field, such as template pattern optimization issues, as well as problems involving the rotation and scaling of images.
The use of standardized template patterns makes software development easier and makes real-time image recognition possible.

These are the main examples of two-dimensional applications (images) of the SOP:

  1.  Extraction of image characteristics and image filtering based on specific edges, corners, areas, and patterns
  2.  External inspection and flaw and abnormality detection in the FA field
  3.  Object tracking and stereo matching
  4.  Similar (illegal) image search, similar (illegal) video frame search, image database
  5.  Character recognition, character reading
  6.  Detection (recognition) of specific objects such as faces
  7.  General object recognition by comprehensive pattern matching of edges, corners, areas, and patterns

Due to its ultra high speed and power efficiency, the SOP not only offers a new recognition method that is difficult to implement with conventional machine learning and deep learning, but may also lead to faster and more advanced AI processing when combined with them.
It can be applied not only to specialized equipment, such as surveillance cameras and on-board equipment, but also to general cameras, video equipment, computers, and smartphones.

A one-dimensional SOP can be used for the high-speed detection (recognition) and analysis, by pattern matching, of character-string data (as with regular expressions) and DNA base sequences, while a three-dimensional SOP can be used for the high-speed detection (recognition) and analysis, by pattern matching, of three-dimensional structures (e.g., molecular structures) and ordinary spaces.
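As a CPU-side illustration of the kind of one-dimensional pattern matching meant here (regular expressions over character strings and DNA base sequences), the short example below uses Python’s standard re module; a 1D SOP would be aimed at accelerating this class of matching in hardware, which this snippet does not attempt to model.

    import re

    # A toy DNA fragment and a motif written as a regular expression:
    # "TATA" followed by any two bases and then "AA".
    sequence = "GCGCTATACGAAATTATAGGAACG"
    motif = re.compile(r"TATA..AA")

    for m in motif.finditer(sequence):
        print(m.start(), m.group())
    # prints:
    # 4 TATACGAA
    # 14 TATAGGAA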

There are three kinds of SOP, which can be used for the pattern matching (recognition) of 1D, 2D, and N-dimensional array data.

3-5. Optimization of information processing

As suggested by Amdahl’s law, accelerating a single part of an information processing system through parallelization does not contribute much to improving the performance of the whole system. Only when many parts of the system are parallelized, optimized, and sped up does performance improve significantly.
However, it should be noted that parallelizing processes that the system struggles with or performs inefficiently does not yield major performance improvements either.
It is most important to use a processing device with high parallelization efficiency that matches the purpose of the information processing and the type of information.
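As a minimal worked example of this Amdahl’s-law argument (the numbers are our own and purely illustrative): if only 10% of the total processing is accelerated, even a very large speed-up of that part yields barely a 1.1× overall gain, whereas accelerating 90% of the processing allows a gain approaching 10×.

    def amdahl_speedup(parallel_fraction, parallel_speedup):
        """Overall speed-up when only `parallel_fraction` of the work is
        accelerated by a factor of `parallel_speedup` (Amdahl's law)."""
        serial = 1.0 - parallel_fraction
        return 1.0 / (serial + parallel_fraction / parallel_speedup)

    # Accelerating a small part of the system hardly helps...
    print(amdahl_speedup(0.10, 1000))   # ~1.11x
    # ...while accelerating most of it helps a great deal.
    print(amdahl_speedup(0.90, 1000))   # ~9.91x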
Based on the principles of information processing mentioned above, we developed three types of Memorism Processors for each purpose of information processing, and six types of processors for each type of information. The maximum computing efficiency can be achieved by using the most suitable type of processor for each purpose and type of information.
This hyper heterogeneous computing technology concept is original to our company.

3-6. Specialization of information processing

While conventional computing is based on decentralized (divided) processing, which is characterized by multi-core processors and the cloud, Memorism Processor-based computing is specialized in that a dedicated processor performs the processing it excels at, which is significant progress in the information processing field.
Through computing technology that combines conventional processors and Memorism Processors, we offer various advantages over the traditional method of information processing.

4. Solving problems with Memorism Processors

4-1. Program simplification

Developing databases with products such as SQL and Oracle without the help of experts or specialized companies is a difficult task. The main reason, as explained before, is that it requires pre-processing such as creating a search index and tuning (optimizing) it, which is very time-consuming and difficult for engineers (programmers) without database experience.
While it is difficult to prove numerically that a program is simpler, we conducted an empirical test on the creation of a database program using a DBP, as described below.
We asked an engineer (programmer) with no previous database experience to develop a search system using each of the following methods:
・A database search system using MySQL (open source database software)
・A database search system using DBP (with the user manual)
The engineer was able to build the MySQL program, but tuning (optimization) was difficult; he could only do it with a specialist’s advice, and it took a considerably long time.
In the case of the search system using the DBP, however, since it only requires installing the program according to the manual, with no pre-processing such as indexing or tuning (optimization), the engineer (programmer) was able to construct the system very quickly.
This is an example with the DBP, but the SOP and XOP do not require complex pre-processing or tuning either and can also simplify programs.
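To make the contrast concrete, the sketch below compares the two workflows from the empirical test at a purely conceptual level. The SQL strings are ordinary MySQL-style statements; the DBP-side steps are plain-language descriptions in our own wording and do not represent an actual DBP interface or API.

    # Conceptual comparison of the two workflows in the empirical test above.
    # The SQL strings are ordinary MySQL-style statements; the DBP-side steps
    # are descriptive placeholders only -- they are NOT a real DBP API.

    mysql_workflow = [
        "CREATE TABLE items (id BIGINT, payload VARBINARY(64));",
        "CREATE INDEX idx_payload ON items (payload);",   # pre-processing (index)
        "ANALYZE TABLE items;",                           # tuning / statistics
        "SELECT id FROM items WHERE payload = x'00';",    # fast only thanks to the index
        "-- ...and the index must be maintained on every data change",
    ]

    dbp_workflow = [
        "store the data table in the DBP chips (per the user manual)",
        "issue the search/collation request directly",    # no index, no tuning
    ]

    for step in mysql_workflow:
        print("MySQL:", step)
    for step in dbp_workflow:
        print("DBP:  ", step)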

4-2. Paradigm shift toward the modernization of information processing

The software systems of current computers are based on technology from European and American IT companies, which, naturally, was developed without considering the existence of Memorism Processors.
As a result, pre-processing with complex algorithms, such as indexes and metadata, and tuning are indispensable for information detection processing, which makes it difficult for non-specialists to construct a system.
For this reason, software developers today are divided among database specialists, image and voice recognition specialists, IoT specialists, and others.
However, as detailed in “Program simplification,” we have proven that with Memorism technology it is possible to escape from the constraints of indexes and metadata. Even software programmers not specialized in databases can construct a database system or an image recognition system.
With this, companies will no longer need to rely on the same software as before, most of which is European and American IT technology, and, with reduced development costs and time frames, they will be able to enjoy much more economical and responsive (fast-delivery) information processing.
This radical change in common sense is a huge paradigm shift in information processing and a major feature of Memorism technology that can truly modernize the information processing industry.

4-3. Renewal of von Neumann-type computers

Since the development of ENIAC in 1946, current von Neumann-type computers have improved greatly in terms of the performance of the CPU, memory, and peripherals, but the system configuration has seen absolutely no progress. To prepare for the arrival of the post-Moore era and big data, AI, and IoT-based societies, system configurations must be renewed to meet modern needs.

Summary of information detection problems in von Neumann-type computers solved (renewed) by Memorism:

  • I need more speed… Memorism is thousands of times faster through ASIC.
  • I need more energy efficiency… It consumes just a few W per chip and requires minimal to no cooling.
  • I want to process larger volumes of data… It can handle large volumes because of its high processing speed.
  • I want to simplify programs… It needs no complex algorithms.
  • IoT needs (better edge performance)… It is fast and offers low-power edge devices.
  • AI needs (smarter and more energy-efficient)… It offers new low-power AI devices.
  • Economy… Simplified software and hardware can be developed at lower costs.
  • Responsiveness… Simplified software and hardware can be developed more quickly.

The various advantages listed above may contribute to the standardization of information processing (software) and make software more user friendly (simpler) and environmentally friendly, boosting the progress of information processing.
This is the renewal of von Neumann-type computers that is set to modernize the information processing industry.

4-4. Theoretical verification thus far

The patented theory of the DBP, XOP, and SOP has been verified with FPGAs.
The RTL and GDS intellectual property of the ASIC used in the two-dimensional SOP (2D-SOP) for image recognition has been completed.
A feasibility study with a DRAM development company demonstrated that a DRAM-type DBP can search 64 billion cases of data (4TB) in about 100 milliseconds/1CPU with no index or tuning.
Software simplification by a Memorism Processor was empirically tested.
