WO2009101563A1 - Multiprocessing implementing a plurality of virtual processors - Google Patents
- Publication number: WO2009101563A1 (international application PCT/IB2009/050505)
- Authority: WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5011—Pool
Definitions
- task assigning unit 14 may use counts of threads 20 in a requesting state A in respective processing cores 10, in combination with the "block-count".
- task assigning unit 14 may be configured to select one of those processing cores 10 that has the most requesting threads 20, or at least no lower number of requesting threads 20 than any other of those processing cores 10.
- the priority of the block count and the count of requesting threads 20 may be reversed: the task assigning unit 14 may be configured to select one of the processing cores 10 with the most requesting threads 20 and, if a plurality of processing cores 10 has the same count of requesting threads 20, one of those processing cores 10 that has the highest block count, or at least no lower block count than any other of those processing cores 10.
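The two priority orders described above amount to ranking candidate cores by different lexicographic keys. A hedged sketch (the tuple representation and function name are illustrative assumptions, not taken from the patent):

```python
def pick(cores, block_count_first=True):
    """Select a physical processing core for a new task.

    cores: list of (core_id, block_count, requesting_count) tuples.
    Only cores with at least one requesting (vacant) thread are candidates.
    With block_count_first, the block count is the primary criterion and the
    number of requesting threads breaks ties; otherwise the priority of the
    two criteria is reversed.
    """
    candidates = [c for c in cores if c[2] > 0]
    if not candidates:
        return None
    if block_count_first:
        key = lambda c: (c[1], c[2])  # block count first, requesting count as tie-break
    else:
        key = lambda c: (c[2], c[1])  # requesting count first, block count as tie-break
    return max(candidates, key=key)[0]
```

With cores `(0, 3, 1)`, `(1, 3, 2)`, `(2, 1, 5)`, the block-count-first order ties cores 0 and 1 on block count 3 and breaks the tie in favour of core 1; the reversed order selects core 2 outright.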
- each physical processing core 10 executes a predetermined number of threads 20 and no more.
- since the number of threads 20 per physical processing core 10 is predetermined, the count of blocked threads 20 may also be obtained by counting the threads 20 in all states but the blocked state B and subtracting that count from the predetermined total.
- task assigning unit 14 may poll information in processing cores 10 to determine which of processing cores 10 have threads 20 in a requesting state A. In this embodiment task assigning unit 14 may select from processing cores 10 that are detected to have threads in the requesting state, dependent on counts of threads 20 in the blocked state in these processing cores 10.
- a sum of the block count and the count of requesting threads 20 may be used, the task assigning unit 14 being configured to select one of the processing cores 10 with the highest sum or at least no lower sum than any other of the processing cores 10 with a requesting thread 20.
- task assigning unit 14 obtains aggregate counts of threads 20 in selected states, such as the blocked state, for each of the physical processing cores and searches for a physical processing core 10 with a vacant thread 20 and a highest count, or at least no lower count than for any such physical processing core 10. Subsequently, one of the vacant threads on that physical processing core is selected to execute the new task.
- task assigning unit 14 performs a selection among the physical processing cores 10 on a round-robin basis, after adjusting the number of occurrences of different physical processing cores 10 in a round robin list from which physical processing cores 10 are selected.
- task assigning unit 14 obtains aggregate counts of threads 20 in selected states, such as the blocked state, for each of the physical processing cores 10 and adjusts the number of occurrences dependent on the counts, for example increasing the number of occurrences with increasing count of blocked threads. Round robin selection is repeated, if necessary until a physical processing core 10 with a vacant thread is selected.
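The weighted round-robin embodiment can be sketched as follows: each physical processing core occurs in the round-robin list a number of times that grows with its blocked-thread count. The `1 + blocked` weighting and function names are illustrative choices, not specified by the patent:

```python
from itertools import cycle

def build_round_robin_list(cores):
    """cores: dict mapping core_id -> count of blocked threads.

    Each core appears (1 + blocked) times, so cores with more blocked
    threads are offered tasks more often.
    """
    entries = []
    for core_id, blocked in cores.items():
        entries.extend([core_id] * (1 + blocked))
    return entries

def select(cores, has_vacant_thread):
    # Repeat round-robin selection until a core with a vacant thread turns up.
    # Assumes at least one core currently has a vacant thread; otherwise
    # this loops indefinitely.
    for core_id in cycle(build_round_robin_list(cores)):
        if has_vacant_thread(core_id):
            return core_id
```

For example, with block counts `{0: 0, 1: 2}` the list is `[0, 1, 1, 1]`, so core 1 is visited three times per round.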
- Task assigning unit 14 is a circuit configured to perform functions that have been described in the preceding.
- task assigning unit 14 may be implemented using a programmable processor circuit, programmed with a program that makes it perform such functions.
- a dedicated circuit may be used, designed to perform these functions.
- Such a dedicated circuit may comprise a buffer memory for storing a queue of new task identifiers, a demultiplexer for demultiplexing task identifiers from the buffer memory to selected processor cores 10 and a selection circuit to control the demultiplexer.
- the selection circuit may have inputs coupled to outputs of the physical processor cores 10 that supply signals indicating the presence of a thread 20 in a requesting state and a count of threads in one or more predetermined states, such as a count of threads 20 in the blocked state in the physical processor cores 10.
- the selection circuit may control the demultiplexer to send the task identifier of the top task in the buffer memory to one of the physical processor cores that supplies a signal indicating a requesting thread and a highest count among the processor cores with a requesting thread.
- a computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.
Abstract
Tasks are executed in a data processing circuit that comprises a plurality of physical processing cores (10). Each physical processing core (10) executes a group of threads on a time multiplex basis. Each thread defines a virtual processing core, for taking on and executing tasks sequentially. A task assigning unit (14) determines for each task which of the virtual processing cores will execute the task. To select the virtual processing core, the task assigning unit (14) looks through the virtual processing cores at the underlying physical processing core (10), determining an aggregate count of threads (20) in one or more predetermined states in the respective group of threads (20) of the physical processing core (10). The aggregate counts are used to determine a priority among the virtual processing cores. The task assigning unit (14) selects a vacant thread (20) with a highest priority according to the aggregate counts. Each aggregate count may be a count of threads (20) that are blocked because they are waiting for access to a resource such as main memory.
Description
Multiprocessing implementing a plurality of virtual processors
FIELD OF THE INVENTION
The invention relates to a multiprocessing system and a method of processing data processing tasks in a multiprocessing system.
BACKGROUND
EP 1416377 describes a multi-processing system with a task dispatcher that dispatches tasks to different processors. Dispatching a task involves a signal from the dispatcher to a processor indicating that it must start executing the task. When a task has been dispatched from the task dispatcher, the receiving processor retrieves the instructions of the task if necessary, and starts executing the instructions. Typically, the task dispatcher selects processors that are "free", i.e. not executing a task, and dispatches new tasks to these processors.
It is also known to define virtual processors, which are implemented by software threads on different physical processors. Concurrent processing by different virtual processors may be realized by allocating physical processors cyclically, on a time-division multiplex basis, to successive ones of a group of virtual processors. In this case, tasks have to be dispatched to different virtual processors. EP 1416377 does not discuss assignment of different tasks to threads when there is a plurality of processing cores that can each execute a plurality of threads concurrently.
SUMMARY
Among others, it is an object to make it possible to improve efficiency of task execution in a data processing circuit with a plurality of processing cores that each implements a plurality of virtual processing cores.
A data processing circuit according to claim 1 is provided. Herein a number of virtual processing cores is implemented on a smaller number of physical processing cores. Each virtual processing core is implemented using a software thread that executes successive tasks on the virtual processing core. Each processing core executes a group of such threads. A task assigning unit assigns a new task to a selected one of the virtual processing cores for execution.
The task assigning unit uses a dynamic property of the threads underlying the virtual processing cores to determine selection preferences, for example to define a priority order among vacant virtual processing cores so that the vacant virtual processing core with the highest priority can be selected. To determine this dynamic property, the task assigning unit "looks through" the virtual processing cores and uses an aggregate count for the physical processing core that executes the thread, obtained by counting the threads in one or more predetermined states in that physical processing core. It has been found that execution efficiency of virtual processing cores can be increased by "looking through" the virtual processing cores in this way.
In an embodiment each aggregate count is a count of blocked threads in a respective physical processing core. Blocked threads are threads executing tasks that are waiting for a resource. In this embodiment the task assigning unit is configured to give selection preference to vacant threads executing on physical processing cores with higher count of blocked threads over vacant threads executing on physical processing cores with lower count of blocked threads. It has been found that execution efficiency is increased by using this type of property.
BRIEF DESCRIPTION OF THE DRAWING
These and other advantageous aspects will become apparent from a description of exemplary embodiments, using the following Figures:
Fig. 1 shows a data processing circuit
Fig. 2 shows a software architecture
DESCRIPTION OF EXEMPLARY EMBODIMENT
Fig. 1 shows a data processing circuit, comprising a plurality of physical processing cores 10, a resource circuit 12 and a task assigning unit 14. Processing cores 10 are coupled to resource circuit 12. Resource circuit 12 may comprise a main memory circuit, function specific computation circuits, input interface circuits, output interface circuits etc. (not shown) shared by the physical processing cores and coupled to the processing cores via one or more shared busses, one or more networks and/or dedicated connections. Task assigning unit 14 may be implemented using a programmable processor, programmed with a program that makes it perform the functions described in the following. Task assigning unit 14 is coupled to processing cores 10. Task assigning unit 14 may be coupled to resource circuit 12, for example to a memory circuit in resource circuit 12 wherein information about a collection of tasks is stored.
Fig. 2 shows a software architecture of the system, showing processing cores 10 containing threads 20, some of which have an associated task 22, 24. Furthermore, a queue 26 of tasks 28 is shown that is waiting at task assigning unit 14 to be assigned to threads 20. (Software) threads, also called threads of execution, are known per se. As used herein, a thread is a set of instances of execution of instructions that are executed by a physical processing core in a sequence that is logically defined by the instructions and their order in the program or programs of which they are part. The sequence of execution of any particular thread may comprise sequence parts whose execution is separated from each other by execution of parts of other threads, with instances of execution of instructions that define no unique sequence relative to the particular thread. Typically, each thread is defined to a processing core by a context accessible to the physical processing core 10, and by instructions for the processing core that provide for transfer of control to instructions of tasks 22, 24, reception back of control from these instructions, and various set-up functions.
The use of continuing threads 20 to execute successive tasks avoids the overhead of starting the tasks on their own, as temporary threads. The threads 20 continue to run on the processing cores 10 after completing tasks, each time taking up a next task without terminating the thread and restarting a new thread in between. The majority of tasks may be small in the sense that it is inefficient to move execution of these tasks from one processing core 10 to another or to start the task anew, because the execution time of the task is comparable to the time needed to move or start the task (for example if the execution time is less than ten times the time needed to move and/or start the task). For such small tasks the overhead is significantly reduced by using continuing threads to execute tasks.
In operation, each physical processing core 10 executes a plurality of processing threads 20 concurrently. Concurrent execution may be implemented e.g. on a time-multiplexed basis, but as far as processing cores 10 have a parallel processing capability, for example as part of pipelining, concurrency may also be implemented by means of such parallel processing. The implementation of concurrent execution of multiple threads 20 is known per se. For example, it may involve context switching between contexts, wherein execution involves different stored contexts for a plurality of threads 20. A context may include a program counter value and register contents, for example.
Each thread 20 effectively defines a respective, different virtual processing core designed to take on tasks 22, 24, 28 successively. The plurality of threads 20 implements a corresponding plurality of such virtual processing cores. Each physical processing core 10 switches between executing threads 20 with different ones of the tasks 22, 24, so that each of the threads 20 runs part of the time.
Threads 20 can be in different execution states, indicated by letters R, W, A, B in the figure. A thread 20 can be in a running state R, wherein it is actually executed by processing core 10, or in a waiting state W, waiting for its turn to run on a processing core 10 in the time division multiplex scheme. The thread 20 may be in a requesting state A, wherein it sends a request for a task to task assigning unit 14 to obtain a new task for execution, when it has finished a previous task. Also, a thread 20 that has a task may be in a blocked state B, where it is blocked from running, for example when the task has to wait for a resource before execution can continue. Waiting for a resource may involve waiting for data from a main memory when a cache miss has occurred, waiting for a specialized circuit such as an I/O circuit or a specialized computation circuit to become free, or for such a specialized circuit to complete an operation.
When a thread 20 has finished a task 22, 24, it continues to exist, switches to the requesting state A, and signals to task assigning unit 14 to request a next task 28. Typically, each physical processing core 10 has a predetermined number of threads 20, of which at most one at a time is in the running state R, a first number of threads 20 is in the waiting state W, a second number of threads 20 is in the blocked state B e.g. because it is waiting for a resource, and a third number of threads 20 is in the requesting state A waiting for a new task 28.
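The per-thread lifecycle described above can be sketched as a small state machine. The state names follow the R/W/A/B labels of the figure; the class and method names are illustrative, not part of the patent:

```python
from enum import Enum

class State(Enum):
    RUNNING = "R"     # actually executing on the physical core
    WAITING = "W"     # waiting for its time-multiplex turn
    REQUESTING = "A"  # finished its task, asking the task assigning unit for a new one
    BLOCKED = "B"     # its task is waiting for a resource (e.g. main memory after a cache miss)

class VirtualCore:
    """One continuing thread: it takes on successive tasks without terminating."""
    def __init__(self):
        self.state = State.REQUESTING
        self.task = None

    def assign(self, task):
        # A new task arrives from the task assigning unit.
        self.state = State.WAITING
        self.task = task

    def finish_task(self):
        # The thread continues to exist and immediately requests a next task.
        self.task = None
        self.state = State.REQUESTING
```

The key point of the sketch is that `finish_task` returns the thread to the requesting state A rather than terminating it, which is what makes each thread a persistent virtual processing core.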
Task assigning unit 14 receives new tasks 28 that may be assigned to any of the virtual processing cores for execution. In an embodiment, task assigning unit 14 maintains a queue 26 of such tasks 28, but alternatively a plurality of queues or a pool of tasks without one fixed order may be used. When task assigning unit 14 receives a request for a new task 28 from a thread 20, this indicates a "free" virtual processing core. When task assigning unit 14 has a new task 28 waiting for assignment, task assigning unit 14 may send the task to the requesting free virtual processing core.
Task assigning unit 14 needs to perform selection between virtual processing cores when more than one of such threads 20 have sent requests, because they are all in the requesting state A. Task assigning unit 14 does not arbitrarily select any vacant virtual processing core. Instead, task assigning unit 14 uses dynamic properties of the virtual processing cores to give preference to certain vacant virtual processing cores, for example by defining a ranking of different vacant threads 20 dependent on the properties of the threads 20, and selecting a thread 20 with a highest ranking.
To determine the preference, task assigning unit 14 "looks through" the properties of the virtual processing core and uses a property of the underlying physical processing core 10 that is shared with other virtual processing cores. In particular, an aggregate count of threads in predetermined selected states on a physical processing core 10 may be used. Thus, instead of the properties of the virtual processing core, the properties of the physical processing core 10 are used. It has been found that this may improve execution efficiency.
In an embodiment, task assigning unit 14 performs resolution by assigning the new task 28 to a requesting thread 20 on a physical processing core 10 dependent on the number of threads 20 in the physical processing core 10 that are in the blocked state B. Thus, task assigning unit 14 "looks through" the properties of the virtual processing core and uses a property of the underlying physical processing core 10 that is shared with other virtual processing cores. A virtual processing core may be selected that is implemented using a thread 20 on a processing core 10 that has a highest count of threads 20 in the blocked state B, or at least has no lower count of threads 20 in the blocked state B than any other processing core 10. This has the advantage that processing time lost due to waiting for resources may be reduced. Simulations have shown that this increases execution efficiency.
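As a sketch, the selection rule of this embodiment amounts to taking, among the physical processing cores that currently have a requesting (vacant) thread, the one with the highest blocked-thread count. The `CoreStatus` structure and function names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class CoreStatus:
    core_id: int
    requesting: int  # threads in requesting state A (vacant virtual cores)
    blocked: int     # threads in blocked state B

def select_core(statuses):
    # Consider only cores that have at least one vacant (requesting) thread,
    # then prefer the core with the highest count of blocked threads.
    candidates = [s for s in statuses if s.requesting > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.blocked).core_id

cores = [CoreStatus(0, 1, 0), CoreStatus(1, 2, 3), CoreStatus(2, 0, 5)]
select_core(cores)  # -> 1: core 2 has more blocked threads but no vacant thread
```

Note that a core with many blocked threads but no requesting thread cannot be selected, which is why the filter on `requesting` precedes the maximization.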
It should be noted that it is not necessary to use a count of threads 20 in the blocked state that is up to date at the time of assignment of a new task to a thread 20. An earlier count or an averaged count may be used, wherein threads are counted that may in fact no longer be blocked at the time of selection of a vacant thread. Such a count is still predictive of future blocking. In an embodiment, the selection of a thread 20 may use the instantaneous count of threads 20 in the blocked state at the time of selection. In an alternative embodiment, the selection may be based on a sampled count of threads 20 in the blocked state that has been sampled at an arbitrary time point in a time interval preceding the selection. As a further alternative, the selection may be based on an average count of threads 20 in the blocked state, corresponding to the instantaneous count averaged over a predetermined time interval prior to the selection. By using non-instantaneous counts, simpler circuits satisfying less stringent timing requirements may be used. Averaging may increase the prediction accuracy of future blocking.
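The averaged-count alternative may be sketched as a sliding window over periodically sampled block counts; the class and method names below are illustrative only.

```python
from collections import deque

class BlockCountAverager:
    """Keep a sliding window of sampled blocked-thread counts and report
    their average; an empty window reports 0.0."""
    def __init__(self, window):
        self.samples = deque(maxlen=window)  # oldest samples fall out

    def sample(self, blocked_count):
        self.samples.append(blocked_count)

    def average(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)
```

The selection among vacant threads would then compare `average()` values per physical processing core instead of instantaneous counts.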
In an embodiment, each processing core 10 is configured to send requests for new tasks 28 to task assigning unit 14 in combination with count values of threads 20 in the processing core 10 that are in the blocked state. In this embodiment each processing core 10 is configured to keep a "block count" of threads 20 in the blocked state, and optionally additional counts of threads 20 in different states, and to send the block count with the request for a new task 28. In an alternative embodiment these block counts may be made accessible to task assigning unit 14 separately from requests, for use to resolve the assignment of a new task 28 to threads 20. In another embodiment task assigning unit 14 may be configured to determine the number of threads 20 in the blocked state B for each of physical processing cores 10. Task assigning unit 14 may be configured to keep information about each of the threads 20, indicating the task 22, 24, if any, executed by the thread 20 and the state of the thread 20. For this purpose, physical processing cores 10 may be configured to indicate state changes of threads 20 to task assigning unit 14, which are used by task assigning unit 14 to update its information about the tasks. In this embodiment, task assigning unit 14 may be configured to determine a count of blocked threads 20 from this information.
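A request that piggybacks the block count, as in the first variant above, may be sketched as a small message type; the names are hypothetical, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class TaskRequest:
    """Illustrative request message: a core asks for a new task and
    carries its identity plus its current blocked-thread count."""
    core_id: int
    block_count: int
```

A core with a vacant thread would emit, for example, `TaskRequest(core_id=3, block_count=2)`, so that the task assigning unit can resolve between requesters without polling the cores.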
In a further embodiment, the counts of blocked threads 20 are used in combination with other parameters to control the selection of a thread 20 for a new task 28. Thus, for example, task assigning unit 14 may use counts of threads 20 in a requesting state A in respective processing cores 10 in combination with the "block count". When a plurality of processing cores 10 with requesting threads 20 have equal highest block counts, task assigning unit 14 may be configured to select one of those processing cores 10 that has the most requesting threads 20, or at least no lower number of requesting threads 20 than any other of those processing cores 10.
In another embodiment, the priority of the block count and the count of requesting threads 20 may be reversed: the task assigning unit 14 may be configured to select one of the processing cores 10 with the most requesting threads 20 and, if a plurality of processing cores 10 has the same count of requesting threads 20, one of those processing cores 10 that has the highest block count, or at least no lower block count than any other of those processing cores 10.
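Both orderings of the two criteria amount to a lexicographic comparison, as the following sketch illustrates; the function and field names are illustrative assumptions.

```python
def select_core_prioritized(cores, primary, secondary):
    """Among cores with at least one requesting thread, prefer the highest
    primary count and break ties on the secondary count. 'primary' and
    'secondary' name the dictionary fields, e.g. "blocked"/"requesting"."""
    candidates = [c for c in cores if c["requesting"] > 0]
    if not candidates:
        return None
    # Tuples compare element by element, giving the lexicographic order.
    return max(candidates, key=lambda c: (c[primary], c[secondary]))
```

Calling it with `("blocked", "requesting")` gives the first embodiment, and with `("requesting", "blocked")` the reversed-priority embodiment.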
In an embodiment, each physical processing core 10 executes a predetermined number of threads 20 and no more. In this case, the count of blocked threads 20 may also be obtained by counting the threads 20 in all states other than the blocked state B and subtracting the result from the predetermined number.
Instead of using explicit requests for new tasks 28 from processing cores 10, task assigning unit 14 may poll information in processing cores 10 to determine which of the processing cores 10 have threads 20 in a requesting state A. In this embodiment, task assigning unit 14 may select from processing cores 10 that are detected to have threads in the requesting state, dependent on counts of threads 20 in the blocked state in these processing cores 10.
In another embodiment, a sum of the block count and the count of requesting threads 20 may be used, the task assigning unit 14 being configured to select one of the processing cores 10 with the highest sum or at least no lower sum than any other of the processing cores 10 with a requesting thread 20.
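The sum-based variant replaces the lexicographic comparison by a single score; as before, the function and field names are illustrative only.

```python
def select_core_by_sum(cores):
    """Among cores with at least one requesting thread, prefer the core
    with the highest sum of blocked and requesting thread counts."""
    candidates = [c for c in cores if c["requesting"] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["blocked"] + c["requesting"])
```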
Any known prioritized selection scheme may be used to give preference to threads 20 during selection. In an embodiment, task assigning unit 14 obtains aggregate counts of threads 20 in selected states, such as the blocked state, for each of the physical processing cores and searches for a physical processing core 10 with a vacant thread 20 and a highest count, or at least no lower count than for any other such physical processing core 10. Subsequently, one of the vacant threads on that physical processing core is selected to execute the new task.
However, preference can be given in many other ways. In another example of an embodiment, task assigning unit 14 performs a selection among the physical processing cores 10 on a round-robin basis, after adjusting the number of occurrences of different physical processing cores 10 in a round-robin list from which physical processing cores 10 are selected. In this embodiment, task assigning unit 14 obtains aggregate counts of threads 20 in selected states, such as the blocked state, for each of the physical processing cores 10 and adjusts the number of occurrences dependent on the counts, for example increasing the number of occurrences with increasing count of blocked threads. Round-robin selection is repeated, if necessary, until a physical processing core 10 with a vacant thread is selected. After selecting the physical processing core 10 on the round-robin basis, one of the vacant threads on that physical processing core 10 is selected to execute the new task. A similar scheme may be used in combination with a weighted (pseudo-)random selection instead of round-robin selection, wherein the selection probability of different physical processing cores is adjusted according to the aggregate counts.
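A weighted round-robin list of this kind may be sketched as follows, with one extra occurrence per blocked thread as an example weighting; all names are hypothetical.

```python
import itertools

def build_rotation(cores):
    """Round-robin list in which each core appears once plus once per
    blocked thread, so heavily blocked cores are visited more often."""
    rotation = []
    for c in cores:
        rotation.extend([c["id"]] * (1 + c["blocked"]))
    return rotation

def select_round_robin(cores):
    """Cycle through the weighted rotation until a core with a vacant
    thread is reached; None if no core has a vacant thread."""
    if not any(c["vacant"] > 0 for c in cores):
        return None  # guard: otherwise the cycle below never terminates
    by_id = {c["id"]: c for c in cores}
    for cid in itertools.cycle(build_rotation(cores)):
        if by_id[cid]["vacant"] > 0:
            return cid
```

A production version would additionally remember the rotation position between selections rather than restart from the head of the list.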
Task assigning unit 14 is a circuit configured to perform the functions that have been described in the preceding. As noted, task assigning unit 14 may be implemented using a programmable processor circuit, programmed with a program that makes it perform such functions. Alternatively, a dedicated circuit may be used, designed to perform these functions. Such a dedicated circuit may comprise a buffer memory for storing a queue of new task identifiers, a demultiplexer for demultiplexing task identifiers from the buffer memory to selected processor cores 10, and a selection circuit to control the demultiplexer. The selection circuit may have inputs coupled to outputs of the physical processor cores 10 that supply signals indicating the presence of a thread 20 in a requesting state and a count of threads in one or more predetermined states, such as a count of threads 20 in the blocked state in the physical processor cores 10. In this case, the selection circuit may control the demultiplexer to send a task identifier of a top task in the buffer memory to one of the physical processor cores that supplies a signal indicating a requesting thread and a highest count among the processor cores with a requesting thread.
Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.
Claims
1. A data processing circuit for providing a plurality of virtual processing cores executing in a plurality of threads (20), the plurality of threads (20) being subdivided into groups of threads (20), the circuit comprising: a plurality of physical processing cores (10) each configured to execute the threads (20) of a respective one of the groups on a time multiplex basis, each thread (20) defining a respective virtual processing core for executing tasks sequentially; a task assigning unit (14), configured to assign a new task to a selected one of the virtual processing cores for execution, the task assigning unit (14) being configured to select the selected one of the virtual processing cores from among virtual processing cores defined by vacant ones of the threads (20) that are not used by any task, the task assigning unit (14) being configured to give selection preference to vacant ones of the threads dependent on respective aggregate counts for respective ones of the physical processing cores (10) on which the vacant ones of the threads (20) are executed, the respective aggregate counts being aggregate counts of threads (20) in one or more predetermined states (B) in the respective ones of the physical processing cores.
2. A data processing circuit according to claim 1, wherein the aggregate count is a count of blocked ones of the threads (20) in the respective group of threads of the respective one of physical processing cores (10), the blocked ones of the threads (20) being threads (20) executing tasks that are waiting for a resource, the task assigning unit (14) being configured to give selection preference to vacant ones of the threads (20) executing on physical processing cores (10) with higher count of blocked ones of the threads (20) over vacant ones of the threads (20) executing on physical processing cores with lower count of blocked ones of the threads (20).
3. A data processing circuit according to claim 2, wherein the task assigning unit (14) is configured to determine further counts, each a count of the vacant ones of the threads (20) in a respective one of the physical processing cores (10), the task assigning unit (14) being configured to give selection preference to vacant ones of the threads (20) dependent on combinations of the further counts and the counts of blocked ones of the threads (20) for the respective ones of the physical processing cores (10).
4. A data processing circuit according to claim 1, wherein the aggregate count for each respective one of the physical processing cores (10) is a respective count of blocked ones of the threads (20) and the vacant ones of the threads (20) in the respective groups of threads (20) of the physical processing core (10), the blocked ones of the threads (20) being threads (20) executing tasks that are waiting for a resource, the task assigning unit (14) being configured to give selection preference to vacant ones of the threads (20) executing on physical processing cores (10) with higher count of blocked and vacant ones of the threads (20) over vacant ones of the threads (20) executing on physical processing cores (10) with lower count of blocked and vacant ones of the threads (20).
5. A data processing circuit according to claim 1, wherein the task assigning unit (14) is configured to determine the aggregate counts by sampling states of the threads (20) and/or counts of threads (20) in the one or more predetermined states (B), during a predetermined time interval prior to selection among the vacant ones of the threads (20).
6. A data processing circuit according to claim 1, wherein the task assigning unit is configured to determine the count by averaging a number of threads (20) in the one or more predetermined states during a predetermined time interval prior to selection among the vacant ones of the threads (20).
7. A data processing circuit according to claim 1, wherein the threads (20) are configured to signal to the task assigning unit (14) when they are vacant.
8. A data processing circuit according to claim 1, wherein the physical processing cores are configured to signal a number of threads in the one or more predetermined states to the task assigning unit during operation.
9. A data processing circuit according to claim 1, comprising a resource circuit (12) comprising at least one of a main memory, a function specific computation circuit, an input interface circuit, an output interface circuit and an input-output interface circuit, the physical processing cores (10) being configured to block threads (20) that execute tasks waiting for grant and/or completion of access to the resource circuit (12).
10. A method of executing tasks in a data processing circuit that comprises a plurality of physical processing cores (10) executing threads subdivided into groups of threads, the method comprising: executing the threads (20) of each group on a time multiplex basis in a respective one of the physical processing cores (10); executing respective tasks, each using a virtual processing core implemented by a respective one of the threads (20); determining, at least for each physical processing core whose respective group of threads (20) includes a vacant one of the threads (20) not used by a task, an aggregate count of threads (20) in one or more predetermined states in the respective group of threads (20) of the physical processing core (10); selecting one of the vacant ones of the threads (20) from among the vacant ones of the threads (20), giving selection preference to vacant ones of the threads (20) dependent on the aggregate counts; when a new task is available, assigning the new task to the selected vacant one of the threads (20) for execution of the new task.
11. A method according to claim 10, comprising: detecting blocked ones of the threads (20) that are blocked from proceeding with execution of the tasks executed by the blocked ones of the threads (20); and wherein the aggregate counts are counts of blocked ones of the threads (20) on respective ones of the physical processing cores (10), and selection preference is given to vacant ones of the threads (20) executing on physical processing cores (10) with higher count of blocked ones of the threads (20) over vacant ones of the threads (20) executing on physical processing cores (10) with lower count of blocked ones of the threads (20).
12. A computer program product that comprises a program of instructions that, when executed by a programmable processor, causes the programmable processor to operate as a task assigning unit (14) in a data processing circuit with virtual processing cores defined by threads (20), the threads (20) being organized in groups, each group executing on a respective physical processing core (10), the program of instructions being configured to cause the programmable processor to determine, at least for each physical processing core (10) whose respective group of threads (20) includes a vacant one of the threads (20) not used by a task, an aggregate count of threads (20) in one or more predetermined states (B) in the respective group of threads (20) of the physical processing core; select one of the vacant ones of the threads (20) from among the vacant ones of the threads (20), giving selection preference to vacant ones of the threads (20) dependent on the aggregate counts; when a new task is available, assign the new task to the selected vacant one of the threads (20) for execution of the new task.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP08101490.4 | 2008-02-11 | ||
EP08101490 | 2008-02-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009101563A1 true WO2009101563A1 (en) | 2009-08-20 |
Family
ID=40585546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2009/050505 WO2009101563A1 (en) | 2008-02-11 | 2009-02-09 | Multiprocessing implementing a plurality of virtual processors |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2009101563A1 (en) |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8209512B1 (en) * | 2009-03-16 | 2012-06-26 | Hewlett-Packard Development Company, L.P. | Selecting a cell that is a preferred candidate for executing a process |
WO2012135050A3 (en) * | 2011-03-25 | 2012-11-29 | Soft Machines, Inc. | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US9195493B2 (en) | 2014-03-27 | 2015-11-24 | International Business Machines Corporation | Dispatching multiple threads in a computer |
US9213569B2 (en) | 2014-03-27 | 2015-12-15 | International Business Machines Corporation | Exiting multiple threads in a computer |
US9223574B2 (en) | 2014-03-27 | 2015-12-29 | International Business Machines Corporation | Start virtual execution instruction for dispatching multiple threads in a computer |
KR101738641B1 (en) | 2010-12-17 | 2017-05-23 | 삼성전자주식회사 | Apparatus and method for compilation of program on multi core system |
US9766893B2 (en) | 2011-03-25 | 2017-09-19 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
US9772867B2 (en) | 2014-03-27 | 2017-09-26 | International Business Machines Corporation | Control area for managing multiple threads in a computer |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US9811377B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for executing multithreaded instructions grouped into blocks |
US9823930B2 (en) | 2013-03-15 | 2017-11-21 | Intel Corporation | Method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
US9842005B2 (en) | 2011-03-25 | 2017-12-12 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US9858080B2 (en) | 2013-03-15 | 2018-01-02 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
US9886416B2 (en) | 2006-04-12 | 2018-02-06 | Intel Corporation | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US9898412B2 (en) | 2013-03-15 | 2018-02-20 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US9934042B2 (en) | 2013-03-15 | 2018-04-03 | Intel Corporation | Method for dependency broadcasting through a block organized source view data structure |
US9940134B2 (en) | 2011-05-20 | 2018-04-10 | Intel Corporation | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
US9965281B2 (en) | 2006-11-14 | 2018-05-08 | Intel Corporation | Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer |
US10031784B2 (en) | 2011-05-20 | 2018-07-24 | Intel Corporation | Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
US10146548B2 (en) | 2013-03-15 | 2018-12-04 | Intel Corporation | Method for populating a source view data structure by using register template snapshots |
US10146592B2 (en) | 2015-09-18 | 2018-12-04 | Salesforce.Com, Inc. | Managing resource allocation in a stream processing framework |
US10169045B2 (en) | 2013-03-15 | 2019-01-01 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US10191746B2 (en) | 2011-11-22 | 2019-01-29 | Intel Corporation | Accelerated code optimizer for a multiengine microprocessor |
US10198298B2 (en) * | 2015-09-16 | 2019-02-05 | Salesforce.Com, Inc. | Handling multiple task sequences in a stream processing framework |
US10198266B2 (en) | 2013-03-15 | 2019-02-05 | Intel Corporation | Method for populating register view data structure by using register template snapshots |
US10228949B2 (en) | 2010-09-17 | 2019-03-12 | Intel Corporation | Single cycle multi-branch prediction including shadow cache for early far branch prediction |
CN110597639A (en) * | 2019-09-23 | 2019-12-20 | 腾讯科技(深圳)有限公司 | CPU distribution control method, device, server and storage medium |
US10521239B2 (en) | 2011-11-22 | 2019-12-31 | Intel Corporation | Microprocessor accelerated code optimizer |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003007105A2 (en) * | 2000-11-24 | 2003-01-23 | Catharon Productions, Inc. | Computer multi-tasking via virtual threading |
- 2009-02-09: WO PCT/IB2009/050505 patent/WO2009101563A1/en active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003007105A2 (en) * | 2000-11-24 | 2003-01-23 | Catharon Productions, Inc. | Computer multi-tasking via virtual threading |
Non-Patent Citations (4)
Title |
---|
ANONYMOUS: "PANAGIOTIS E. HADJIDOUKAS Home Page", INTERNET ARTICLE, XP002530791, Retrieved from the Internet <URL:http://www.cs.uoi.gr/~phadjido/> [retrieved on 20090507] * |
ANONYMOUS: "Transactions on HiPEAC: Volume 3, Issue 2", INTERNET ARTICLE, XP002530792, Retrieved from the Internet <URL:http://www.hipeac.net/node/2414> [retrieved on 20090507] * |
JAN HOOGERBRUGGE, ANDREI TERECHKO: "A Multithreaded Multicore System for Embedded Media Processing", TRANSACTIONS ON HIPEAC, vol. 3, no. 2, 2 June 2008 (2008-06-02), pages 168 - 187, XP002526866, Retrieved from the Internet <URL:http://www.hipeac.net/system/files/paper_2.pdf> [retrieved on 20090507] * |
P.E. HADJIDOUKAS, V.V. DIMAKOPOULOS: "A Runtime Library for Lightweight Process-Scope Threads", INTERNET ARTICLE, September 2007 (2007-09-01), XP002526868, Retrieved from the Internet <URL:http://www.cs.uoi.gr/~phadjido/courses/E-85/download/software/psthreads_report.pdf> [retrieved on 20090507] * |
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9886416B2 (en) | 2006-04-12 | 2018-02-06 | Intel Corporation | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US10289605B2 (en) | 2006-04-12 | 2019-05-14 | Intel Corporation | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US11163720B2 (en) | 2006-04-12 | 2021-11-02 | Intel Corporation | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US10585670B2 (en) | 2006-11-14 | 2020-03-10 | Intel Corporation | Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer |
US9965281B2 (en) | 2006-11-14 | 2018-05-08 | Intel Corporation | Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer |
US8209512B1 (en) * | 2009-03-16 | 2012-06-26 | Hewlett-Packard Development Company, L.P. | Selecting a cell that is a preferred candidate for executing a process |
US10228949B2 (en) | 2010-09-17 | 2019-03-12 | Intel Corporation | Single cycle multi-branch prediction including shadow cache for early far branch prediction |
KR101738641B1 (en) | 2010-12-17 | 2017-05-23 | 삼성전자주식회사 | Apparatus and method for compilation of program on multi core system |
US9274793B2 (en) | 2011-03-25 | 2016-03-01 | Soft Machines, Inc. | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
KR101826121B1 (en) * | 2011-03-25 | 2018-02-06 | 인텔 코포레이션 | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US9990200B2 (en) | 2011-03-25 | 2018-06-05 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
US9934072B2 (en) | 2011-03-25 | 2018-04-03 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US9766893B2 (en) | 2011-03-25 | 2017-09-19 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
US9842005B2 (en) | 2011-03-25 | 2017-12-12 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US11204769B2 (en) | 2011-03-25 | 2021-12-21 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
KR101966712B1 (en) | 2011-03-25 | 2019-04-09 | 인텔 코포레이션 | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
CN103635875A (en) * | 2011-03-25 | 2014-03-12 | 索夫特机械公司 | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US10564975B2 (en) | 2011-03-25 | 2020-02-18 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
KR20180015754A (en) * | 2011-03-25 | 2018-02-13 | 인텔 코포레이션 | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
WO2012135050A3 (en) * | 2011-03-25 | 2012-11-29 | Soft Machines, Inc. | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US9921845B2 (en) | 2011-03-25 | 2018-03-20 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US10031784B2 (en) | 2011-05-20 | 2018-07-24 | Intel Corporation | Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines |
US10372454B2 (en) | 2011-05-20 | 2019-08-06 | Intel Corporation | Allocation of a segmented interconnect to support the execution of instruction sequences by a plurality of engines |
US9940134B2 (en) | 2011-05-20 | 2018-04-10 | Intel Corporation | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
US10191746B2 (en) | 2011-11-22 | 2019-01-29 | Intel Corporation | Accelerated code optimizer for a multiengine microprocessor |
US10521239B2 (en) | 2011-11-22 | 2019-12-31 | Intel Corporation | Microprocessor accelerated code optimizer |
US9823930B2 (en) | 2013-03-15 | 2017-11-21 | Intel Corporation | Method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
US10146548B2 (en) | 2013-03-15 | 2018-12-04 | Intel Corporation | Method for populating a source view data structure by using register template snapshots |
US11656875B2 (en) | 2013-03-15 | 2023-05-23 | Intel Corporation | Method and system for instruction block to execution unit grouping |
US10146576B2 (en) | 2013-03-15 | 2018-12-04 | Intel Corporation | Method for executing multithreaded instructions grouped into blocks |
US10169045B2 (en) | 2013-03-15 | 2019-01-01 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US9898412B2 (en) | 2013-03-15 | 2018-02-20 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US10740126B2 (en) | 2013-03-15 | 2020-08-11 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
US10198266B2 (en) | 2013-03-15 | 2019-02-05 | Intel Corporation | Method for populating register view data structure by using register template snapshots |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US10248570B2 (en) | 2013-03-15 | 2019-04-02 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
US10255076B2 (en) | 2013-03-15 | 2019-04-09 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US9934042B2 (en) | 2013-03-15 | 2018-04-03 | Intel Corporation | Method for dependency broadcasting through a block organized source view data structure |
US9858080B2 (en) | 2013-03-15 | 2018-01-02 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US9811377B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for executing multithreaded instructions grouped into blocks |
US10503514B2 (en) | 2013-03-15 | 2019-12-10 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US9772867B2 (en) | 2014-03-27 | 2017-09-26 | International Business Machines Corporation | Control area for managing multiple threads in a computer |
US9223574B2 (en) | 2014-03-27 | 2015-12-29 | International Business Machines Corporation | Start virtual execution instruction for dispatching multiple threads in a computer |
US9213569B2 (en) | 2014-03-27 | 2015-12-15 | International Business Machines Corporation | Exiting multiple threads in a computer |
US9195493B2 (en) | 2014-03-27 | 2015-11-24 | International Business Machines Corporation | Dispatching multiple threads in a computer |
US10198298B2 (en) * | 2015-09-16 | 2019-02-05 | Salesforce.Com, Inc. | Handling multiple task sequences in a stream processing framework |
US11086687B2 (en) | 2015-09-18 | 2021-08-10 | Salesforce.Com, Inc. | Managing resource allocation in a stream processing framework |
US11086688B2 (en) | 2015-09-18 | 2021-08-10 | Salesforce.Com, Inc. | Managing resource allocation in a stream processing framework |
US10146592B2 (en) | 2015-09-18 | 2018-12-04 | Salesforce.Com, Inc. | Managing resource allocation in a stream processing framework |
CN110597639A (en) * | 2019-09-23 | 2019-12-20 | 腾讯科技(深圳)有限公司 | CPU distribution control method, device, server and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2009101563A1 (en) | Multiprocessing implementing a plurality of virtual processors | |
JP3678414B2 (en) | Multiprocessor system | |
US6748593B1 (en) | Apparatus and method for starvation load balancing using a global run queue in a multiple run queue system | |
US6658449B1 (en) | Apparatus and method for periodic load balancing in a multiple run queue system | |
US7065766B2 (en) | Apparatus and method for load balancing of fixed priority threads in a multiple run queue environment | |
US8875151B2 (en) | Load balancing method and apparatus in symmetric multi-processor system | |
US7950016B2 (en) | Apparatus for switching the task to be completed in a processor by switching to the task assigned time slot | |
JP5770721B2 (en) | Information processing system | |
US7487317B1 (en) | Cache-aware scheduling for a chip multithreading processor | |
US8695004B2 (en) | Method for distributing computing time in a computer system | |
US20030037091A1 (en) | Task scheduling device | |
US9870228B2 (en) | Prioritising of instruction fetching in microprocessor systems | |
CN109564528B (en) | System and method for computing resource allocation in distributed computing | |
US7818747B1 (en) | Cache-aware scheduling for a chip multithreading processor | |
US8627325B2 (en) | Scheduling memory usage of a workload | |
CN106569887B (en) | Fine-grained task scheduling method in cloud environment | |
US20090183166A1 (en) | Algorithm to share physical processors to maximize processor cache usage and topologies | |
US20030110203A1 (en) | Apparatus and method for dispatching fixed priority threads using a global run queue in a multiple run queue system | |
JP5397544B2 (en) | Multi-core system, multi-core system scheduling method, and multi-core system scheduling program | |
CN111597044A (en) | Task scheduling method and device, storage medium and electronic equipment | |
US8539491B1 (en) | Thread scheduling in chip multithreading processors | |
Horowitz | A run-time execution model for referential integrity maintenance | |
US11275621B2 (en) | Device and method for selecting tasks and/or processor cores to execute processing jobs that run a machine | |
US20160267621A1 (en) | Graphic processing system and method thereof | |
US9977751B1 (en) | Method and apparatus for arbitrating access to shared resources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09709637; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 09709637; Country of ref document: EP; Kind code of ref document: A1 |