WO2001016761A2 - Efficient page allocation - Google Patents

Efficient page allocation

Info

Publication number
WO2001016761A2
Authority
WO
WIPO (PCT)
Prior art keywords
memory
shared
bit
shared memory
pages
Prior art date
Application number
PCT/US2000/024216
Other languages
French (fr)
Other versions
WO2001016761A3 (en)
Inventor
Karlon K. West
Chris Miller
Original Assignee
Times N Systems, Inc.
Priority date
Filing date
Publication date
Application filed by Times N Systems, Inc. filed Critical Times N Systems, Inc.
Priority to AU71085/00A priority Critical patent/AU7108500A/en
Publication of WO2001016761A2 publication Critical patent/WO2001016761A2/en
Publication of WO2001016761A3 publication Critical patent/WO2001016761A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/52Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526Mutual exclusion algorithms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0284Multiple user address space allocation, e.g. using different base addresses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0817Cache consistency protocols using directory methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/45Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/457Communication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0837Cache consistency protocols with software control, e.g. non-cacheable data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/52Indexing scheme relating to G06F9/52
    • G06F2209/523Mode

Abstract

Methods, systems and devices are described for efficient page allocation. A method includes: writing a bit-map representing a freedom state of a plurality of memory pages in a shared memory unit; and scrambling an order of said plurality of memory pages prior to writing said bit-map to reduce contention. The methods, systems and devices provide advantages because the speed and scalability of parallel processor systems are enhanced.

Description

EFFICIENT PAGE ALLOCATION
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to the field of computer systems where one or more CPUs are connected to one or more RAM subsystems, or portions thereof. More particularly, the invention relates to computer science techniques that utilize efficient page allocation.
2. Discussion of the Related Art
The clustering of workstations is a well-known art. In the most common cases, the clustering involves workstations that operate almost totally independently, utilizing the network only to share such services as a printer, license-limited applications, or shared files.
In more-closely-coupled environments, some software packages (such as NQS) allow a cluster of workstations to share work. In such cases the work arrives, typically as batch jobs, at an entry point to the cluster where it is queued and dispatched to the workstations on the basis of load.
In both of these cases, and all other known cases of clustering, the operating system and cluster subsystem are built around the concept of message-passing. The term message-passing means that a given workstation operates on some portion of a job until communication (typically to send or receive data) with another workstation is necessary. Then, the first workstation prepares and communicates with the other workstation.
Another well-known art is that of clustering processors within a machine, usually called a Massively Parallel Processor or MPP, in which the techniques are essentially identical to those of clustered workstations. Usually, the bandwidth and latency of the interconnect network of an MPP are more highly optimized, but the system operation is the same.
In the general case, the passing of a message is an extremely expensive operation; expensive in the sense that many CPU cycles in the sender and receiver are consumed by the process of sending, receiving, bracketing, verifying, and routing the message, CPU cycles that are therefore not available for other operations. A highly streamlined message-passing subsystem can typically require 10,000 to 20,000 CPU cycles or more.
There are specific cases wherein the passing of a message requires significantly less overhead. However, none of these specific cases is adaptable to a general-purpose computer system.
Message-passing parallel processor systems have been offered commercially for years but have failed to capture significant market share because of poor performance and difficulty of programming for typical parallel applications. Message-passing parallel processor systems do have some advantages. In particular, because they share no resources, message-passing parallel processor systems are easier to provide with high-availability features. What is needed is a better approach to parallel processor systems.
There are alternatives to the passing of messages for closely-coupled cluster work. One such alternative is the use of shared memory for inter-processor communication.
Shared-memory systems have been much more successful at capturing market share than message-passing systems because of the dramatically superior performance of shared-memory systems, up to about four-processor systems. In Search of Clusters, Gregory F. Pfister, 2nd ed. (January 1998), Prentice Hall Computer Books, ISBN 0138997098, describes a computing system with multiple processing nodes in which each processing node is provided with private, local memory and also has access to a range of memory which is shared with other processing nodes. The disclosure of this publication in its entirety is hereby expressly incorporated herein by reference for the purpose of indicating the background of the invention and illustrating the state of the art.
However, providing high availability for traditional shared-memory systems has proved to be an elusive goal. The nature of these systems, which share all code and all data, including that data which controls the shared operating systems, is incompatible with the separation normally required for high availability. What is needed is an approach to shared-memory systems that improves availability. Although the use of shared memory for inter-processor communication is a well-known art, prior to the teachings of U.S. Ser. No. 09/273,430, filed March 19, 1999, entitled Shared Memory Apparatus and Method for Multiprocessing Systems, the processors shared a single copy of the operating system. The problem with such systems is that they cannot be efficiently scaled beyond four- to eight-way systems except in unusual circumstances. All known cases of said unusual circumstances are such that the systems are not good price-performance systems for general-purpose computing.
The entire contents of U.S. Patent Applications 09/273,430, filed March 19, 1999 and PCT/US00/01262, filed January 18, 2000 are hereby expressly incorporated by reference herein for all purposes. U.S. Ser. No. 09/273,430 improved upon the concept of shared memory by teaching the concept which will herein be referred to as a tight cluster. The concept of a tight cluster is that of individual computers, each with its own CPU(s), memory, I/O, and operating system, but for which collection of computers there is a portion of memory which is shared by all the computers and via which they can exchange information. U.S. Ser. No. 09/273,430 describes a system in which each processing node is provided with its own private copy of an operating system and in which the connection to shared memory is via a standard bus. The advantage of a tight cluster in comparison to an SMP is "scalability," which means that a much larger number of computers can be attached together via a tight cluster than an SMP with little loss of processing efficiency.
What is needed are improvements to the concept of the tight cluster. What is also needed is an expansion of the concept of the tight cluster. In a typical computing system, every CPU can access all of RAM, either directly with Load and Store instructions, or indirectly, such as with a message passing scheme. When more than one CPU can access or manage the RAM subsystem or a portion thereof, certain accesses to that RAM must be synchronized to ensure mutually exclusive access to portions of the RAM subsystem. This in turn generates contention by multiple CPUs for those portions of the RAM subsystem, herein referred to as "pages" or "memory pages," and thereby reduces overall system performance. One problem in any shared-memory system is the allocation of free pages to the processors to use on an as-needed basis. Part of this problem is caused by the need to represent (and find) which pages are free. One technique used in the past is to form a page table with a pointer per free page (plus control words). One difficulty with this solution is that the table is a huge, sparse matrix, so that not only does it consume a large memory space, but also traversing it to find empty pages requires large amounts of time.
Another technique known in the art is to arrange free pages in a linked list. This consumes less, albeit still significant, space but requires more time for management.
SUMMARY OF THE INVENTION
A goal of the invention is to simultaneously satisfy the above-discussed requirements of improving and expanding the tight cluster concept which, in the case of the prior art, are not satisfied.
One embodiment of the invention is based on a method, comprising: writing a bit-map representing a freedom state of a plurality of memory pages in a shared memory unit; and scrambling an order of said plurality of memory pages prior to writing said bit-map to reduce contention.
Another embodiment of the invention is based on an apparatus, comprising: a shared memory node including a plurality of shared memory pages; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein a first portion of said shared memory pages owned by said first processing node is coupled to a first separate memory bus and a second portion of said shared memory pages owned by said second processing node is coupled to a second separate memory bus to reduce contention.
Another embodiment of the invention is based on an electronic media, comprising: a computer program adapted to write a bit-map representing a freedom state of a plurality of memory pages in a shared memory unit; and scramble an order of said plurality of memory pages prior to writing said bit-map to reduce contention.
Another embodiment of the invention is based on a computer program comprising computer program means adapted to perform the steps of writing a bit-map representing a freedom state of a plurality of memory pages in a shared memory unit; and scrambling an order of said plurality of memory pages prior to writing said bit-map to reduce contention when said computer program is run on a computer.
Another embodiment of the invention is based on a system, comprising a multiplicity of processors, each with some private memory and the group with some shared memory, interconnected and arranged such that memory accesses to a first set of address ranges will be to local, private memory whereas memory accesses to a second set of address ranges will be to shared memory, and arranged so that one particular member of said second set is a small region, encoded so that each small elemental portion of said small region represents one of the minimum assignable sub-regions of said shared memory.
Another embodiment of the invention is based on a computer system in which each of one or more CPUs has access to a shared area of RAM, such that each CPU may access any area of this shared area.
Another embodiment of the invention is based on a computer system, comprising a shared memory node; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein one or more CPUs has access to a shared area of RAM, such that each CPU may access any area of the shared area.
These, and other goals and embodiments of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawing. It should be understood, however, that the following description, while indicating preferred embodiments of the invention and numerous specific details thereof, is given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the invention without departing from the spirit thereof, and the invention includes all such modifications.
BRIEF DESCRIPTION OF THE DRAWING
A clear conception of the advantages and features constituting the invention, and of the components and operation of model systems provided with the invention, will become more readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawing accompanying and forming a part of this specification, wherein like reference characters (if they occur in more than one view) designate the same parts. It should be noted that the features illustrated in the drawing are not necessarily drawn to scale.
FIG. 1 illustrates a block schematic view of a system, representing an embodiment of the invention.
DESCRIPTION OF PREFERRED EMBODIMENTS
The invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawing and detailed in the following description of preferred embodiments. Descriptions of well known components and processing techniques are omitted so as not to unnecessarily obscure the invention in detail.
The teachings of U.S. Ser. No. 09/273,430 include a system which is a single entity; one large supercomputer. The invention is also applicable to a cluster of workstations, or even a network.
The invention is applicable to systems of the type of Pfister or the type of U.S. Ser. No. 09/273,430 in which each processing node has its own copy of an operating system. The invention is also applicable to other types of multiple processing node systems. The context of the invention can include a tight cluster as described in
U.S. Ser. No. 09/273,430. A tight cluster is defined as a cluster of workstations or an arrangement within a single, multiple-processor machine in which the processors are connected by a high-speed, low-latency interconnection, and in which some but not all memory is shared among the processors. Within the scope of a given processor, accesses to a first set of ranges of memory addresses will be to local, private memory but accesses to a second set of memory address ranges will be to shared memory. The significant advantage to a tight cluster in comparison to a message-passing cluster is that, assuming the environment has been appropriately established, the exchange of information involves a single STORE instruction by the sending processor and a subsequent single LOAD instruction by the receiving processor.
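As a minimal sketch of that exchange, assume the shared address range has already been mapped at a common location on every node; the base address, the single-word layout, and the function names below are hypothetical, since the passage does not specify them.

```c
#include <stdint.h>

/* Hypothetical base of the shared memory address range on every node. */
#define SHARED_BASE ((volatile uint64_t *)0x80000000u)

/* Sender: the whole transfer is one STORE into the shared range. */
static inline void post_value(uint64_t v)
{
    SHARED_BASE[0] = v;
}

/* Receiver: one LOAD from the same shared location. */
static inline uint64_t read_value(void)
{
    return SHARED_BASE[0];
}
```

In practice the two nodes still need an agreed notification or polling convention, which is part of the one-time environment setup discussed next.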
The establishment of the environment, taught by U.S. Ser. No. 09/273,430 and more fully by companion disclosures (U.S. Provisional Application Ser. No. 60/220,794, filed July 26, 2000; U.S. Provisional Application Ser. No. 60/220,748, filed July 26, 2000; WSGR 15245-711;
WSGR 15245-712; WSGR 15245-713; WSGR 15245-716; WSGR 15245-717; WSGR 15245-718; WSGR 15245-719; and WSGR 15245-720, the entire contents of all of which are hereby expressly incorporated herein by reference for all purposes) can be performed in such a way as to require relatively little system overhead, and to be done once for many, many information exchanges. Therefore, a comparison of 10,000 instructions for message-passing to a pair of instructions for tight-clustering is valid.
The invention can include a shared-memory cluster and means of providing highly-efficient operating system control for such a system. Among the means of controlling shared memory in such a tight cluster for improved performance is the provision of a highly efficient method of page mapping utilized by the operating system extensions running on different processors within the cluster.
In the context of a computing system for which the memory (e.g., RAM) subsystem or a portion of the subsystem is connected to one or more central processing units (CPUs), the invention can include reducing subsystem contention. The invention can include methods to efficiently and correctly manage memory ownership of portions of the subsystem.
The invention can be used in the environment described in U.S. Ser. No. 09/273,430 where multiple computers are provided with means to selectively address a first set of memory address ranges which will be to private memory and a second set of memory ranges which will be to shared memory. The invention can be a free-page management scheme which is far more efficient in both memory space used and maintenance time required.
Only one single page of memory is required; a page filled with a bit-map representing free pages. In order to avoid allocating pages on an overly-structured basis, the bits in the bit-map are scrambled prior to being written. These bits can be scrambled via a conformal mapping technique. The purpose of the scrambling technique is to assure that bit n of row m is not necessarily page number nk+m, where k is the length of a row in the bit map. This technique precludes the allocation of pages in a fashion so uniform as to tend to cause hot-spots in memory. The preferred embodiment of the scrambling technique is to generate a linear bit-map of the pages, then multiply the resulting polynomial by a known, fixed polynomial chosen to have no real factors. Of course, the invention can be implemented with other scrambling algorithms. The length (k) of the rows in the bit-mapped page is selected to be an entity efficiently handled by the underlying cache and memory system. In the preferred embodiment, the length is the same as the length of a cache line. Of course, the invention can be implemented with different row lengths.
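One concrete reading of that scrambling, sketched below, treats a page index as a polynomial over GF(2) and multiplies it by a fixed constant in GF(2^16). The field size, the two polynomials, and the function name are assumptions for illustration; the patent pins down none of them.

```c
#include <stdint.h>

#define POLY_MOD  0x1100Bu /* x^16 + x^12 + x^3 + x + 1, a commonly
                              tabulated primitive polynomial (assumed) */
#define POLY_MULT 0x9E37u  /* fixed nonzero multiplier, chosen arbitrarily */

/* Map a linear page index to its scrambled bit position. Multiplication
 * by a nonzero constant in GF(2^16) is invertible, so this is a
 * permutation of the 16-bit index space. */
static uint16_t scramble_index(uint16_t page)
{
    uint32_t acc = 0;
    uint32_t a = page;

    /* Carry-less (GF(2)) multiplication by the fixed polynomial. */
    for (uint32_t b = POLY_MULT; b != 0; b >>= 1) {
        if (b & 1u)
            acc ^= a;
        a <<= 1;
    }
    /* Reduce modulo the degree-16 polynomial to get back to 16 bits. */
    for (int bit = 31; bit >= 16; bit--)
        if (acc & (1u << bit))
            acc ^= (uint32_t)POLY_MOD << (bit - 16);
    return (uint16_t)acc;
}
```

Because the map is a permutation, an allocator can run it backwards (multiply by the constant's inverse in the same field) to recover page numbers from bit positions, which is the "inverse polynomial" step that appears later in the FIG. 1 flow.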
After an initial setup, a pointer is established to the first row of the hashed page. When a first processor needs free pages from the shared-memory free-page list, the kernel running thereon locks the pointer via semaphore, then reads the row pointed to, sets all the bits within that row to indicate unavailable status, and increments the pointer, then releases the semaphore. The kernel process then reverses the scrambling process, records the addresses of the pages marked free, and adds them to a pool of shared pages available to that processor. The technique involves use of two thresholds. The first, lower, of these thresholds is the lowest number of free pages in a particular subsystem's free page list below which it goes to the bit-mapped page to request additional pages. The higher of these thresholds is the number of freed pages in a particular list above which the subsystem returns pages to the common pool, and restores the bit-maps corresponding to those pages.
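A sketch of that acquisition step follows. The row size, the shared structures (shared_bitmap, row_ptr), the semaphore primitives, and the pool and unscramble helpers are hypothetical stand-ins for what the passage describes.

```c
#include <stdint.h>
#include <string.h>

#define ROW_BITS  512               /* one 64-byte cache line of bits   */
#define ROW_WORDS (ROW_BITS / 64)
#define NUM_ROWS  128               /* 128 rows x 512 bits = 64K pages  */

typedef struct { uint64_t w[ROW_WORDS]; } bitmap_row_t;

extern bitmap_row_t shared_bitmap[NUM_ROWS]; /* the bit-mapped page     */
extern unsigned     row_ptr;                 /* shared row pointer      */
extern void semaphore_lock(void);
extern void semaphore_unlock(void);
extern uint16_t unscramble_index(uint16_t bit);  /* inverse of the hash */
extern void node_pool_add(uint32_t page);        /* this node's pool    */

/* Claim every free page recorded in the row under the shared pointer:
 * lock, copy the row, mark all its bits unavailable, advance the
 * pointer, unlock, then unscramble bit positions into page numbers
 * outside the critical section. */
static unsigned get_pages(void)
{
    bitmap_row_t row;
    unsigned r, got = 0;

    semaphore_lock();
    r = row_ptr;
    row = shared_bitmap[r];                   /* read the pointed-to row */
    memset(&shared_bitmap[r], 0, sizeof row); /* 0 = unavailable         */
    row_ptr = (r + 1) % NUM_ROWS;             /* increment the pointer   */
    semaphore_unlock();

    for (unsigned b = 0; b < ROW_BITS; b++) {
        if (row.w[b / 64] & (1ull << (b % 64))) {   /* 1 = page was free */
            node_pool_add(unscramble_index((uint16_t)(r * ROW_BITS + b)));
            got++;
        }
    }
    return got;
}
```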
Restoration of pages may cause accessing of more than one row of the bit map. When restoration is done, the pointer is semaphore-locked, then each row for which free pages are to be returned is read and the particular bits representing the newly-freed pages are bit-flipped by the processor and the row written back to the bit-mapped page. After the bits are written, the pointer is returned to its value at entry and the semaphore is released.
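The restoration path can be sketched the same way, reusing the hypothetical names above plus scramble_index from the earlier sketch; note that the shared pointer is left at its entry value, as the passage requires.

```c
/* Return freed pages to the common pool: re-hash each page number to
 * its bit position and flip that bit back to 1 (free). */
static void return_pages(const uint32_t *pages, unsigned n)
{
    semaphore_lock();
    for (unsigned i = 0; i < n; i++) {
        uint16_t bit = scramble_index((uint16_t)pages[i]);
        shared_bitmap[bit / ROW_BITS].w[(bit % ROW_BITS) / 64] |=
            1ull << (bit % 64);
    }
    /* The row pointer is not advanced; it keeps its value at entry. */
    semaphore_unlock();
}
```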
Eventually, therefore, the number of free pages an acquiring process acquires upon reading a row will become less than the number of bits in the row, and can be substantially less. If the lower threshold is not satisfied by one such get_pages operation, the operation will be repeated until the lower threshold is satisfied. When some subsystem cannot obtain sufficient pages after a system-settable number of reads, all processors are signaled, and the upper threshold for each is reduced.
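The two-threshold policy might then be driven by a routine such as the following; the watermark values, the retry limit, and the helpers (pool_count, pool_pop_surplus, the cluster-wide signal) are hypothetical tunables, not values from the patent.

```c
#define LOW_WATER  64    /* replenish below this many pooled free pages */
#define HIGH_WATER 1024  /* return surplus above this many              */
#define MAX_READS  8     /* system-settable number of bit-map reads     */

extern unsigned pool_count(void);
extern unsigned pool_pop_surplus(uint32_t *out, unsigned max);
extern void signal_all_nodes_reduce_high_water(void);

static void balance_pool(void)
{
    unsigned reads = 0;

    /* Below the lower threshold: go to the bit-mapped page for more. */
    while (pool_count() < LOW_WATER) {
        get_pages();
        if (++reads >= MAX_READS) {               /* still starved      */
            signal_all_nodes_reduce_high_water(); /* shrink every pool  */
            break;
        }
    }

    /* Above the upper threshold: give pages back to the common pool. */
    if (pool_count() > HIGH_WATER) {
        uint32_t surplus[256];
        unsigned n = pool_pop_surplus(surplus, 256);
        return_pages(surplus, n);
    }
}
```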
In a computing system where more than one CPU has access to the RAM subsystem, or portion thereof, some means of distributing memory pages among the multiple memory buses should be provided. In the general case, memory pages are allocated from contiguous pools, and hot spots (places in memory that are accessed by more than one CPU a high percentage of the time) tend to form.
In a computing system where each CPU can communicate with the other CPUs, a methodology can be designed where the page allocation is, with minimal overhead, sufficiently distributed across the entire set of memory pages that multiple banks of RAM can be utilized and overall system performance is thereby increased.
U.S. Ser. No. 09/273,430 described a system in which each compute node has its own, private memory, but in which there is also provided a shared global memory, accessible by all compute nodes. In this case, contention for the shared memory bus only occurs when more than one node is attempting to access the shared memory pages located on the same memory bus at the same time. Other distributed, share-everything compute systems, including but not limited to cc-NUMA, as well as traditional SMP machines that contain more than one memory bus (usually connected via cross-bar switches), can benefit from the techniques taught by this disclosure.
A computing system of the type described in U.S. Ser. No. 09/273,430 can be designed where shared memory pages can be located on physically separate memory buses such that there is no contention in accessing memory on the separate buses at the same time by different CPUs. When a CPU needs access to one or more pages of shared memory, the CPU first determines what physical shared memory pages to use, and if the pages are not owned, then those pages may be linked into a traditional page frame database for use by that CPU. The means taught by this disclosure involve the selection process of those shared memory pages. A first, simplistic method might be to split shared memory into blocks, one block per shared memory bus; as shared memory is used, multiple blocks will eventually become used, and bus contention will be reduced. It is the general case, however, that multiple sequential pages are used at a single time by any given CPU, and bus contention is not reduced by much.
A second method would be to stripe the pages across all shared memory buses, such that sequential page accesses will always use different shared memory buses. This scenario also leads to problems in that system data structures and other highly used memory pages tend to form hot spots, and shared memory bus contention is still not reduced as much as possible.
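For contrast, the two naive placements just described reduce to trivial page-to-bus maps, sketched below with a hypothetical bus count and block size; the hashed scheme that follows exists to avoid the weaknesses of both.

```c
#include <stdint.h>

#define NUM_BUSES       4u      /* hypothetical shared memory buses   */
#define PAGES_PER_BLOCK 65536u  /* hypothetical block size, in pages  */

/* Method 1: one contiguous block per bus. Sequential pages used by a
 * single CPU all land on the same bus, so contention barely drops. */
static unsigned bus_by_block(uint32_t page)
{
    return (page / PAGES_PER_BLOCK) % NUM_BUSES;
}

/* Method 2: stripe pages round-robin across buses. Better, but hot
 * system data structures still sit at fixed, predictable positions. */
static unsigned bus_by_stripe(uint32_t page)
{
    return page % NUM_BUSES;
}
```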
The invention can include sufficiently randomizing shared memory pages, and tracking where those pages are by using a large polynomial hash function (including but not limited to the standard IEEE 32-bit CRC function) such that in a special shared memory page, each bit in the page represents a shared memory page available for general application use, and the location of the bits in that special page use a CRC hash function to determine which shared memory pages on which shared memory buses should be used. As pages are used, bits are set from one to zero; as they are released, bits are set from zero to one. A given CPU can read a cache-line size of bits to get a set of usable pages, to help eliminate the need for future shared memory accesses. This randomization helps prevent hot spots from developing and thereby greatly reduces shared memory bus contention.
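A sketch of the CRC-style placement follows. It uses the reflected form of the IEEE 802.3 CRC-32 polynomial the passage names, but drops the usual init and final-XOR conventions for brevity, and the slot and bus helpers are assumptions.

```c
#include <stdint.h>

#define NUM_BUSES 4u  /* hypothetical number of shared memory buses */

/* One Galois-LFSR step per input bit with the reflected IEEE CRC-32
 * polynomial 0xEDB88320. As a linear map on 32-bit values this is
 * invertible, which matters because the allocator must be able to run
 * the hash backwards to recover page numbers from bit positions. */
static uint32_t crc32_hash(uint32_t x)
{
    uint32_t crc = x;
    for (int i = 0; i < 32; i++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

/* Pick the bit slot in the special page for a shared memory page. The
 * modulo keeps the sketch short; a real mapping would remain a
 * permutation over the page range so each page gets exactly one bit. */
static uint32_t page_to_slot(uint32_t page, uint32_t total_pages)
{
    return crc32_hash(page) % total_pages;
}

/* The slot then determines which bus the page lives on. */
static uint32_t slot_to_bus(uint32_t slot)
{
    return slot % NUM_BUSES;
}
```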
Referring to FIG. 1, the available shared pages are allocated (e.g., associated with bits of a bit-map) at block 101. A random hash is generated into the bit-map page at block 102. A cache line of bits representing available shared memory pages is read by a processing node from the bit-map page at block 103. (In this example, shared memory pages that are available are represented in the bit-map by 1's; shared memory pages that are not available are represented in the bit-map by 0's.) At block 104, it is determined whether enough unused pages were read by the cache line. If insufficient unused page bits are read, the invention cycles back to block 102 where another random hash is generated onto the bit-map page. When sufficient unused page bits are read, these bits are set to 0 at block 105. At block 106, an inverse polynomial is taken to find (identify) the shared memory pages associated with the read bits. The shared pages are returned to a calling function at block 107.
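Putting the FIG. 1 blocks together, one hypothetical shape for the allocation loop, reusing the names from the earlier sketches and with the block numbers in comments, is:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate up to 'want' shared pages for the calling node; shared_bitmap,
 * NUM_ROWS, ROW_BITS, bitmap_row_t, the semaphore calls, and
 * unscramble_index are the hypothetical helpers sketched earlier. */
static unsigned alloc_shared_pages(uint32_t *out, unsigned want)
{
    unsigned got = 0;

    while (got < want) {                          /* 104: enough pages?  */
        unsigned r = (unsigned)rand() % NUM_ROWS; /* 102: random hash    */

        semaphore_lock();
        bitmap_row_t *row = &shared_bitmap[r];    /* 103: one cache line */
        for (unsigned b = 0; b < ROW_BITS && got < want; b++) {
            uint64_t m = 1ull << (b % 64);
            if (row->w[b / 64] & m) {             /* 1 = available       */
                row->w[b / 64] &= ~m;             /* 105: set bit to 0   */
                out[got++] = unscramble_index(    /* 106: inverse poly   */
                    (uint16_t)(r * ROW_BITS + b));
            }
        }
        semaphore_unlock();
        /* A real allocator would bound these retries; the threshold
         * signaling described earlier handles the starved case. */
    }
    return got;                                   /* 107: hand to caller */
}
```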
While not being limited to any particular performance indicator or diagnostic identifier, preferred embodiments of the invention can be identified one at a time by testing for the substantially highest performance. The test for the substantially highest performance can be carried out without undue experimentation by the use of a simple and conventional benchmark (speed) experiment. The term substantially, as used herein, is defined as at least approaching a given state (e.g., preferably within 10% of, more preferably within 1% of, and most preferably within 0.1% of). The term coupled, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically. The term means, as used herein, is defined as hardware, firmware and/or software for achieving a result. The term program or phrase computer program, as used herein, is defined as a sequence of instructions designed for execution on a computer system. A program may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, and/or other sequence of instructions designed for execution on a computer system.
Practical Applications of the Invention
A practical application of the invention that has value within the technological arts is an environment where there are multiple compute nodes, each with one or more CPU and each CPU with private RAM, and where there are one or more RAM units which are accessible by some or all of the compute nodes, and where two or more shared RAM units can be accessed simultaneously with no memory bus contention. Another practical application of the invention that has value within the technological arts is waveform transformation. Further, the invention is useful in conjunction with data input and transformation (such as are used for the purpose of speech recognition), or in conjunction with transforming the appearance of a display (such as are used for the purpose of video games), or the like. There are virtually innumerable uses for the invention, all of which need not be detailed here.
Advantages of the Invention
A system, representing an embodiment of the invention, can be cost effective and advantageous for at least the following reasons. The invention improves the speed of parallel computing systems. The invention improves the scalability of parallel computing systems.
All the disclosed embodiments of the invention described herein can be realized and practiced without undue experimentation. Although the best mode of carrying out the invention contemplated by the inventors is disclosed above, practice of the invention is not limited thereto. Accordingly, it will be appreciated by those skilled in the art that the invention may be practiced otherwise than as specifically described herein.
For example, although the efficient page allocation described herein can be a separate module, it will be manifest that the efficient page allocation may be integrated into the system with which it is associated. Furthermore, all the disclosed elements and features of each disclosed embodiment can be combined with, or substituted for, the disclosed elements and features of every other disclosed embodiment except where such elements or features are mutually exclusive.
It will be manifest that various additions, modifications and rearrangements of the features of the invention may be made without deviating from the spirit and scope of the underlying inventive concept. It is intended that the scope of the invention as defined by the appended claims and their equivalents cover all such additions, modifications, and rearrangements.
The appended claims are not to be interpreted as including means-plus- function limitations, unless such a limitation is explicitly recited in a given claim using the phrase "means for." Expedient embodiments of the invention are differentiated by the appended subclaims.

Claims

CLAIMS
What is claimed is:
1. A method, comprising: writing a bit-map representing a freedom state of a plurality of memory pages in a shared memory unit; and scrambling an order of said plurality of memory pages prior to writing said bit-map to reduce contention.
2. The method of claim 1, wherein writing said bit-map is effected with an operating system extension.
3. The method of claim 1, wherein scrambling said order is effected with an operating system extension.
4. The method of claim 1, wherein scrambling includes scrambling with a conformal mapping technique.
5. The method of claim 1, wherein scrambling includes scrambling with a large polynomial hash function.
6. The method of claim 1, wherein scrambling includes scrambling with a 32-bit CRC function.
7. An apparatus, comprising: a shared memory node including a plurality of shared memory pages; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein a first portion of said shared memory pages owned by said first processing node is coupled to a first separate memory bus and a second portion of said shared memory pages owned by said second processing node is coupled to a second separate memory bus to reduce contention.
8. The apparatus of claim 7, wherein said first portion consists of a first memory block and said second portion consists of a second memory block.
9. A computer system comprising the apparatus of claim 8.
10. An electronic media, comprising: a computer program adapted to write a bit-map representing a freedom state of a plurality of memory pages in a shared memory unit; and scramble an order of said plurality of memory pages prior to writing said bit-map to reduce contention.
11. A computer program comprising computer program means adapted to perform the steps of writing a bit-map representing a freedom state of a plurality of memory pages in a shared memory unit; and scrambling an order of said plurality of memory pages prior to writing said bit-map to reduce contention when said computer program is run on a computer.
12. A computer program as claimed in claim 11, embodied on a computer-readable medium.
13. A system, comprising a multiplicity of processors, each with some private memory and the group with some shared memory, interconnected and arranged such that memory accesses to a first set of address ranges will be to local, private memory whereas memory accesses to a second set of address ranges will be to shared memory, and arranged so that one particular member of said second set is a small region, encoded so that each small elemental portion of said small region represents one of the minimum assignable sub-regions of said shared memory.
14. The system of claim 13, wherein the small elemental portions are arranged in rows, a row being defined as a group of said small elemental portions, and are conveniently accessible by a processor sharing access to said small region.
15. The system of claim 14, further comprising a locking mechanism utilized to assure that when a first processor is accessing said small region, no other processor may access said small region until said first processor relinquishes said locking mechanism.
16. The system of claim 15, wherein a processor may claim any or all of said minimum assignable sub-regions represented by said small elemental portions of said small region, and does so by altering the value of each said small elemental portion and placing said altered values back into said row of said small region.
17. The system of claim 13, wherein said small region is a page of memory as defined by the operating system and machine organization of said shared-memory system.
18. The system of claim 13, wherein said small elemental portion is a binary bit.
19. The system of claim 13, wherein said minimum assignable sub-region is a page of memory as defined by the operating system and machine organization of said shared-memory system.
20. The system of claim 13, wherein said arrangement of said elemental portions is scrambled so that the position of one of said elemental portions is not linearly related to the position in memory of the minimum assignable sub-region it represents.
21. The system of claim 13, wherein said processors utilize said small region for management of said minimum assignable regions.
22. The system of claim 21, wherein said processors each run a separate copy of a management subsystem, herein called a micro-kernel.
23. The system of claim 22, wherein each copy of said micro-kernel, running on its own said processor, maintains multiple threshold values relating to management of said minimum assignable sub-region, said threshold values being utilized by said micro-kernels for determining when to obtain additional said minimum assignable sub-regions and when to return some of said minimum assignable sub-regions to the common pool.
24. A computer system in which each of one or more CPUs has access to a shared area of RAM, such that each CPU may access any area of this shared area.
25. A computer system as described in claim 24 where shared RAM is located on one or more separate memory buses that support simultaneous access.
26. A computer system as described in claim 25 in which the data structures for tracking current page locations are maintained by a randomizing polynomial, using a bit mask page as the managing data structure.
27. A computer system, comprising a shared memory node; a first processing node coupled to said shared memory node; and a second processing node coupled to said shared memory node, wherein one or more CPUs has access to a shared area of RAM, such that each CPU may access any area of the shared area.
28. A computer system as described in claim 27, wherein shared RAM is located on one or more separate memory buses that support simultaneous access.
29. A computer system as described in claim 27, wherein data structures for tracking current page locations are maintained by a randomizing polynomial, using a bit mask page as the managing data structure.
PCT/US2000/024216 1999-08-31 2000-08-31 Efficient page allocation WO2001016761A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU71085/00A AU7108500A (en) 1999-08-31 2000-08-31 Efficient page allocation

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US15215199P 1999-08-31 1999-08-31
US60/152,151 1999-08-31
US22097400P 2000-07-26 2000-07-26
US22074800P 2000-07-26 2000-07-26
US60/220,974 2000-07-26
US60/220,748 2000-07-26

Publications (2)

Publication Number Publication Date
WO2001016761A2 true WO2001016761A2 (en) 2001-03-08
WO2001016761A3 WO2001016761A3 (en) 2001-12-27

Family

ID=27387201

Family Applications (9)

Application Number Title Priority Date Filing Date
PCT/US2000/024216 WO2001016761A2 (en) 1999-08-31 2000-08-31 Efficient page allocation
PCT/US2000/024217 WO2001016741A2 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory
PCT/US2000/024039 WO2001016760A1 (en) 1999-08-31 2000-08-31 Switchable shared-memory cluster
PCT/US2000/024329 WO2001016750A2 (en) 1999-08-31 2000-08-31 High-availability, shared-memory cluster
PCT/US2000/024147 WO2001016737A2 (en) 1999-08-31 2000-08-31 Cache-coherent shared-memory cluster
PCT/US2000/024248 WO2001016742A2 (en) 1999-08-31 2000-08-31 Network shared memory
PCT/US2000/024150 WO2001016738A2 (en) 1999-08-31 2000-08-31 Efficient page ownership control
PCT/US2000/024298 WO2001016743A2 (en) 1999-08-31 2000-08-31 Shared memory disk
PCT/US2000/024210 WO2001016740A2 (en) 1999-08-31 2000-08-31 Efficient event waiting

Family Applications After (8)

Application Number Title Priority Date Filing Date
PCT/US2000/024217 WO2001016741A2 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory
PCT/US2000/024039 WO2001016760A1 (en) 1999-08-31 2000-08-31 Switchable shared-memory cluster
PCT/US2000/024329 WO2001016750A2 (en) 1999-08-31 2000-08-31 High-availability, shared-memory cluster
PCT/US2000/024147 WO2001016737A2 (en) 1999-08-31 2000-08-31 Cache-coherent shared-memory cluster
PCT/US2000/024248 WO2001016742A2 (en) 1999-08-31 2000-08-31 Network shared memory
PCT/US2000/024150 WO2001016738A2 (en) 1999-08-31 2000-08-31 Efficient page ownership control
PCT/US2000/024298 WO2001016743A2 (en) 1999-08-31 2000-08-31 Shared memory disk
PCT/US2000/024210 WO2001016740A2 (en) 1999-08-31 2000-08-31 Efficient event waiting

Country Status (4)

Country Link
EP (3) EP1214653A2 (en)
AU (9) AU7474200A (en)
CA (3) CA2382927A1 (en)
WO (9) WO2001016761A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1895413A3 (en) * 2006-08-18 2009-09-30 Fujitsu Limited Access monitoring method and device for shared memory

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040205217A1 (en) * 2001-07-13 2004-10-14 Maria Gabrani Method of running a media application and a media system with job control
US6999998B2 (en) 2001-10-04 2006-02-14 Hewlett-Packard Development Company, L.P. Shared memory coupling of network infrastructure devices
US6920485B2 (en) 2001-10-04 2005-07-19 Hewlett-Packard Development Company, L.P. Packet processing in shared memory multi-computer systems
US7254745B2 (en) 2002-10-03 2007-08-07 International Business Machines Corporation Diagnostic probe management in data processing systems
US7685381B2 (en) 2007-03-01 2010-03-23 International Business Machines Corporation Employing a data structure of readily accessible units of memory to facilitate memory access
US7899663B2 (en) 2007-03-30 2011-03-01 International Business Machines Corporation Providing memory consistency in an emulated processing environment
US9442780B2 (en) * 2011-07-19 2016-09-13 Qualcomm Incorporated Synchronization of shader operation
US9064437B2 (en) * 2012-12-07 2015-06-23 Intel Corporation Memory based semaphores
WO2014190486A1 (en) 2013-05-28 2014-12-04 Huawei Technologies Co., Ltd. Method and system for supporting resource isolation under multi-core architecture

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4403283A (en) * 1980-07-28 1983-09-06 Ncr Corporation Extended memory system and method
US4484262A (en) * 1979-01-09 1984-11-20 Sullivan Herbert W Shared memory computer method and apparatus
EP0313787A2 (en) * 1987-10-29 1989-05-03 International Business Machines Corporation A hardware mechanism for the dynamic customization of permutation using bit-matrix multiplication
EP0350713A2 (en) * 1988-07-11 1990-01-17 International Business Machines Corporation Bit map search by competitive processors
US5784699A (en) * 1996-05-24 1998-07-21 Oracle Corporation Dynamic memory allocation in a computer using a bit map index

Family Cites Families (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3668644A (en) * 1970-02-09 1972-06-06 Burroughs Corp Failsafe memory system
US4414624A (en) * 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
US4725946A (en) * 1985-06-27 1988-02-16 Honeywell Information Systems Inc. P and V instructions for semaphore architecture in a multiprogramming/multiprocessing environment
US5175839A (en) * 1987-12-24 1992-12-29 Fujitsu Limited Storage control system in a computer system for double-writing
DE68925064T2 (en) * 1988-05-26 1996-08-08 Hitachi Ltd Task execution control method for a multiprocessor system with post/wait procedure
US4965717A (en) * 1988-12-09 1990-10-23 Tandem Computers Incorporated Multiple processor system having shared memory with private-write capability
EP0457308B1 (en) * 1990-05-18 1997-01-22 Fujitsu Limited Data processing system having an input/output path disconnecting mechanism and method for controlling the data processing system
US5206952A (en) * 1990-09-12 1993-04-27 Cray Research, Inc. Fault tolerant networking architecture
US5434970A (en) * 1991-02-14 1995-07-18 Cray Research, Inc. System for distributed multiprocessor communication
JPH04271453A (en) * 1991-02-27 1992-09-28 Toshiba Corp Composite electronic computer
DE69227956T2 (en) * 1991-07-18 1999-06-10 Tandem Computers Inc Multiprocessor system with mirrored memory
US5315707A (en) * 1992-01-10 1994-05-24 Digital Equipment Corporation Multiprocessor buffer system
US5398331A (en) * 1992-07-08 1995-03-14 International Business Machines Corporation Shared storage controller for dual copy shared data
US5434975A (en) * 1992-09-24 1995-07-18 At&T Corp. System for interconnecting a synchronous path having semaphores and an asynchronous path having message queuing for interprocess communications
DE4238593A1 (en) * 1992-11-16 1994-05-19 Ibm Multiprocessor computer system
JP2963298B2 (en) * 1993-03-26 1999-10-18 富士通株式会社 Recovery method of exclusive control instruction in duplicated shared memory and computer system
US5590308A (en) * 1993-09-01 1996-12-31 International Business Machines Corporation Method and apparatus for reducing false invalidations in distributed systems
US5664089A (en) * 1994-04-26 1997-09-02 Unisys Corporation Multiple power domain power loss detection and interface disable
US5636359A (en) * 1994-06-20 1997-06-03 International Business Machines Corporation Performance enhancement system and method for a hierarchical data cache using a RAID parity scheme
US6587889B1 (en) * 1995-10-17 2003-07-01 International Business Machines Corporation Junction manager program object interconnection and method
US5940870A (en) * 1996-05-21 1999-08-17 Industrial Technology Research Institute Address translation for shared-memory multiprocessor clustering
JPH10142298A (en) * 1996-11-15 1998-05-29 Advantest Corp Testing device for IC device
US5829029A (en) * 1996-12-18 1998-10-27 Bull Hn Information Systems Inc. Private cache miss and access management in a multiprocessor system with shared memory
US5918248A (en) * 1996-12-30 1999-06-29 Northern Telecom Limited Shared memory control algorithm for mutual exclusion and rollback
US6360303B1 (en) * 1997-09-30 2002-03-19 Compaq Computer Corporation Partitioning memory shared by multiple processors of a distributed processing system
DE69715203T2 (en) * 1997-10-10 2003-07-31 Bull Sa A data processing system with cc-NUMA (cache coherent, non-uniform memory access) architecture and cache memory contained in local memory for remote access

Also Published As

Publication number Publication date
WO2001016737A2 (en) 2001-03-08
CA2382728A1 (en) 2001-03-08
AU7110000A (en) 2001-03-26
AU7108300A (en) 2001-03-26
WO2001016750A2 (en) 2001-03-08
WO2001016750A3 (en) 2002-01-17
WO2001016738A8 (en) 2001-05-03
WO2001016740A2 (en) 2001-03-08
WO2001016741A3 (en) 2001-09-20
AU7108500A (en) 2001-03-26
EP1214652A2 (en) 2002-06-19
WO2001016761A3 (en) 2001-12-27
WO2001016738A3 (en) 2001-10-04
WO2001016738A2 (en) 2001-03-08
EP1214653A2 (en) 2002-06-19
AU7100700A (en) 2001-03-26
WO2001016760A1 (en) 2001-03-08
WO2001016743A3 (en) 2001-08-09
CA2382927A1 (en) 2001-03-08
WO2001016737A3 (en) 2001-11-08
WO2001016741A2 (en) 2001-03-08
AU7112100A (en) 2001-03-26
AU7474200A (en) 2001-03-26
WO2001016742A3 (en) 2001-09-20
WO2001016743A2 (en) 2001-03-08
AU6949600A (en) 2001-03-26
WO2001016743A8 (en) 2001-10-18
WO2001016742A2 (en) 2001-03-08
CA2382929A1 (en) 2001-03-08
AU7113600A (en) 2001-03-26
EP1214651A2 (en) 2002-06-19
WO2001016738A9 (en) 2002-09-12
AU6949700A (en) 2001-03-26
WO2001016740A3 (en) 2001-12-27

Similar Documents

Publication Publication Date Title
US6816947B1 (en) System and method for memory arbitration
Chai et al. Designing high performance and scalable MPI intra-node communication support for clusters
US6088770A (en) Shared memory multiprocessor performing cache coherency
Unrau et al. Hierarchical clustering: A structure for scalable multiprocessor operating system design
Cheriton et al. Paradigm: A highly scalable shared-memory multicomputer architecture
US6820174B2 (en) Multi-processor computer system using partition group directories to maintain cache coherence
US6601089B1 (en) System and method for allocating buffers for message passing in a shared-memory computer system
Vaidyanathan et al. Improving concurrency and asynchrony in multithreaded MPI applications using software offloading
US20090125907A1 (en) System and method for thread handling in multithreaded parallel computing of nested threads
WO1990007154A1 (en) Memory address mechanism in a distributed memory architecture
WO2001016761A2 (en) Efficient page allocation
US6789256B1 (en) System and method for allocating and using arrays in a shared-memory digital computer system
Karlsson et al. Performance evaluation of a cluster-based multiprocessor built from ATM switches and bus-based multiprocessor servers
CN111240853B (en) Bidirectional transmission method and system for large-block data in node
CN102571580A (en) Data receiving method and computer
US6760743B1 (en) Instruction memory system for multi-processor environment and disjoint tasks
US20020052868A1 (en) SIMD system and method
US20080294832A1 (en) I/O Forwarding Technique For Multi-Interrupt Capable Devices
US7406554B1 (en) Queue circuit and method for memory arbitration employing same
US7114040B2 (en) Default locality selection for memory objects based on determining the type of a particular memory object
Stumm et al. Experiences with the Hector multiprocessor
CA1138119A (en) Shared memory computer method and apparatus
US7073004B2 (en) Method and data processing system for microprocessor communication in a cluster-based multi-processor network
Agarwala et al. Experimenting with a shared virtual memory environment for hypercubes
CN1042979C (en) Apparatus and method for distributed program stack

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US US US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US US US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)