WO2009101563A1 - Multiprocessing implementing a plurality of virtual processors - Google Patents

Multiprocessing implementing a plurality of virtual processors

Info

Publication number
WO2009101563A1
Authority
WO
WIPO (PCT)
Prior art keywords
threads
ones
vacant
task
processing cores
Prior art date
Application number
PCT/IB2009/050505
Other languages
French (fr)
Inventor
Jan Hoogerbrugge
Original Assignee
Nxp B.V.
Priority date
Filing date
Publication date
Application filed by Nxp B.V. filed Critical Nxp B.V.
Publication of WO2009101563A1 publication Critical patent/WO2009101563A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5011Pool


Abstract

Tasks are executed in a data processing circuit that comprises a plurality of physical processing cores (10). Each physical processing core (10) executes a group of threads on a time multiplex basis. Each thread defines a virtual processing core, for taking on and executing tasks sequentially. A task assigning unit (14) determines for each task which of the virtual processing cores will execute the task. To select the virtual processing core, the task assigning unit (14) looks through the virtual processing cores at the underlying physical processing core (10), determining an aggregate count of threads (20) in one or more predetermined states in the respective group of threads (20) of the physical processing core (10). The aggregate counts are used to determine a priority among the virtual processing cores. The task assigning unit (14) selects a vacant thread (20) with a highest priority according to the aggregate counts. Each aggregate count may be a count of threads (20) that are blocked because they are waiting for access to a resource such as main memory.

Description

Multiprocessing implementing a plurality of virtual processors
FIELD OF THE INVENTION
The invention relates to a multiprocessing system and a method of processing data processing tasks in a multiprocessing system.
BACKGROUND
EP 1416377 describes a multi-processing system with a task dispatcher that dispatches tasks to different processors. Dispatching a task involves a signal from the dispatcher to a processor that it must start executing the task. When a task has been dispatched from the task dispatcher, the receiving processor retrieves the instructions of the task if necessary, and starts executing the instructions. Typically, the task dispatcher selects processors that are "free", i.e. not executing a task, and dispatches new tasks to these processors.
It is also known to define virtual processors, which are implemented by software threads on different physical processors. Concurrent processing by different virtual processors may be realized by allocating a physical processor cyclically, on a time-division multiplex basis, to successive ones of a group of virtual processors. In this case, tasks have to be dispatched to different virtual processors. EP 1416377 does not discuss assignment of different tasks to threads when there is a plurality of processing cores that can each execute a plurality of threads concurrently.
SUMMARY
Among others, it is an object to make it possible to improve efficiency of task execution in a data processing circuit with a plurality of processing cores that each implements a plurality of virtual processing cores.
A data processing circuit according to claim 1 is provided. Herein a number of virtual processing cores is implemented on a smaller number of physical processing cores. Each virtual processing core is implemented using a software thread that executes successive tasks on the virtual processing core. Each physical processing core executes a group of such threads. A task assigning unit assigns a new task to a selected one of the virtual processing cores for execution.
The task assigning unit uses a dynamic property of the threads underlying the virtual processing cores to determine selection preferences, for example to define a priority order among vacant virtual processing cores so that a vacant virtual processing core with the highest priority can be selected. To determine this dynamic property, the task assigning unit "looks through" the virtual processing cores and uses an aggregate count for the physical processing core that executes the thread, obtained by counting threads in one or more predetermined states in that physical processing core. It has been found that execution efficiency of virtual processing cores can be increased by "looking through" the virtual processing cores in this way.
In an embodiment each aggregate count is a count of blocked threads in a respective physical processing core. Blocked threads are threads executing tasks that are waiting for a resource. In this embodiment the task assigning unit is configured to give selection preference to vacant threads executing on physical processing cores with a higher count of blocked threads over vacant threads executing on physical processing cores with a lower count of blocked threads. It has been found that execution efficiency is increased by using this type of property.
BRIEF DESCRIPTION OF THE DRAWING
These and other advantageous aspects will become apparent from a description of exemplary embodiments, using the following Figures:
Fig. 1 shows a data processing circuit
Fig. 2 shows a software architecture
DESCRIPTION OF EXEMPLARY EMBODIMENT
Fig. 1 shows a data processing circuit, comprising a plurality of physical processing cores 10, a resource circuit 12 and a task assigning unit 14. Processing cores 10 are coupled to resource circuit 12. Resource circuit 12 may comprise a main memory circuit, function specific computation circuits, input interface circuits, output interface circuits etc. (not shown) shared by the physical processing cores and coupled to the processing cores via one or more shared busses, one or more networks and/or dedicated connections. Task assigning unit 14 may be implemented using a programmable processor, programmed with a program that makes it perform the functions described in the following. Task assigning unit 14 is coupled to processing cores 10. Task assigning unit 14 may be coupled to resource circuit 12, for example to a memory circuit in resource circuit 12 wherein information about a collection of tasks is stored.
Fig. 2 shows a software architecture of the system, showing processing cores 10 containing threads 20, some of which have an associated task 22, 24. Furthermore, a queue 26 of tasks 28 is shown that is waiting at task assigning unit 14 to be assigned to threads 20. (Software) threads, also called threads of execution, are known per se. As used herein, a thread is a set of instances of execution of instructions that are executed by a physical processing core in a sequence that is logically defined by the instructions and their order in the program or programs of which they are part. The sequence of execution of any particular thread may comprise sequence parts whose execution is separated from each other by execution of parts of other threads, with instances of execution of instructions that define no unique sequence relative to the particular thread. Typically, each thread is defined to a processing core by a context accessible to the physical processing core 10, and by instructions for the processing core that provide for transfer of control to instructions of tasks 22, 24, reception back of control from these instructions, and various set-up functions.
The use of continuing threads 20 to execute successive tasks avoids the overhead of starting the tasks on their own, as temporary threads. The threads 20 continue to run on the processing cores 10 after completing tasks, each time taking up a next task without terminating the thread and restarting a new thread in between. The majority of tasks may be small in the sense that it is inefficient to move execution of these tasks from one processing core 10 to another or to start the task anew, because the execution time of the task is comparable to the time needed to move or start the task (for example if the execution time is less than ten times the time needed to move and/or start the task). For such small tasks the overhead is significantly reduced by using continuing threads to execute tasks.
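For illustration only, the continuing-thread model can be sketched in software as below. This is a minimal sketch, not the patent's embedded implementation; the task queue, the sentinel shutdown value and the worker count are invented for the example.

```python
import queue
import threading

# Minimal sketch of "continuing threads": each worker persists across tasks,
# taking up a next task without being terminated and restarted in between.
task_queue = queue.Queue()

def worker_loop():
    while True:
        task = task_queue.get()   # comparable to the requesting state A
        if task is None:          # invented sentinel used to shut a worker down
            break
        task()                    # tasks are assumed to be plain callables
        task_queue.task_done()    # the thread continues to exist afterwards

workers = [threading.Thread(target=worker_loop) for _ in range(4)]
for w in workers:
    w.start()
```

Because each worker persists, the per-task cost is a single queue operation rather than thread creation and teardown, which is the overhead saving described above for small tasks.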
In operation, each physical processing core 10 executes a plurality of processing threads 20 concurrently. Concurrent execution may be implemented e.g. on a time-multiplexed basis, but as far as processing cores 10 have a parallel processing capability, for example as part of pipelining, concurrency may also be implemented by means of such parallel processing. The implementation of concurrent execution of multiple threads 20 is known per se. For example, it may involve context switching, wherein execution involves different stored contexts for a plurality of threads 20. A context may include a program counter value and register contents, for example. Each thread 20 effectively defines a respective, different virtual processing core designed to take on tasks 22, 24, 28 successively. The plurality of threads 20 implements a corresponding plurality of such virtual processing cores. Each physical processing core 10 switches between executing threads 20 with different ones of the tasks 22, 24, so that each of the threads 20 runs part of the time.
Threads 20 can be in different execution states, indicated by the letters R, W, A, B in the figure. A thread 20 can be in a running state R, wherein it is actually executed by a processing core 10, or in a waiting state W, waiting for its turn to run on a processing core 10 in the time-division multiplex scheme. The thread 20 may be in a requesting state A, wherein it sends a request for a task to task assigning unit 14 to obtain a new task for execution when it has finished a previous task. Also, a thread 20 that has a task may be in a blocked state B, where it is blocked from running, for example when the task has to wait for a resource before execution can continue. Waiting for a resource may involve waiting for data from a main memory when a cache miss has occurred, waiting for a specialized circuit, such as an I/O circuit or a specialized computation circuit, to become free, or waiting for such a specialized circuit to complete an operation.
When a thread 20 has finished a task 22, 24, it continues to exist, switches to the requesting state A, and signals to task assigning unit 14 to request a next task 28. Typically, each physical processing core 10 has a predetermined number of threads 20, of which at most one at a time is in the running state R, a first number of threads 20 is in the waiting state W, a second number of threads 20 is in the blocked state B, e.g. because they are waiting for a resource, and a third number of threads 20 is in the requesting state A, waiting for a new task 28.
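For illustration, the four states R, W, A, B and the per-core aggregate counts can be modelled as follows; this is a sketch of the bookkeeping only, and the names ThreadState and PhysicalCore are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum

class ThreadState(Enum):
    RUNNING = "R"      # actually executing on the physical core
    WAITING = "W"      # waiting for its time-multiplex turn
    REQUESTING = "A"   # finished its task, requesting a new one (vacant)
    BLOCKED = "B"      # its task is waiting for a resource, e.g. a cache miss

@dataclass
class PhysicalCore:
    thread_states: list  # one ThreadState per thread in this core's group

    def block_count(self) -> int:
        return sum(1 for s in self.thread_states if s is ThreadState.BLOCKED)

    def requesting_count(self) -> int:
        return sum(1 for s in self.thread_states if s is ThreadState.REQUESTING)
```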
Task assigning unit 14 receives new tasks 28 that may be assigned to any of the virtual processing cores for execution. In an embodiment, task assigning unit 14 maintains a queue 26 of such tasks 28, but alternatively a plurality of queues or a pool of tasks without one fixed order may be used. When task assigning unit 14 receives a request for a new task 28 from a thread 20, this indicates a "free" virtual processing core. When task assigning unit 14 has a new task 28 waiting for assignment, task assigning unit 14 may send the task to the requesting free virtual processing core.
Task assigning unit 14 needs to perform a selection between virtual processing cores when more than one such thread 20 has sent a request, because they are all in the requesting state A. Task assigning unit 14 does not arbitrarily select any vacant virtual processing core. Instead, task assigning unit 14 uses dynamic properties of the virtual processing cores to give preference to certain vacant virtual processing cores, for example by defining a ranking of the different vacant threads 20 dependent on the properties of the threads 20, and selecting a thread 20 with a highest ranking.
To determine the properties that determine the preference, task assigning unit 14 "looks through" the properties of the virtual processing core and uses a property of the underlying physical processing core 10 that is shared with other virtual processing cores. In particular, an aggregate count of threads in predetermined selected states on a physical processing core 10 may be used. Thus, instead of the properties of the virtual processing core, the properties of the physical processing core 10 are used. It has been found that this may improve execution efficiency.
In an embodiment, task assigning unit 14 performs resolution by assigning the new task 28 to a requesting thread 20 on a physical processing core 10 dependent on the number of threads 20 in the physical processing core 10 that are in the blocked state B. Thus, task assigning unit 14 "looks through" the properties of the virtual processing core and uses a property of the underlying physical processing core 10 that is shared with other virtual processing cores. A virtual processing core may be selected that is implemented using a thread 20 on a processing core 10 that has a highest count of threads 20 in the blocked state B, or at least has no lower count of threads 20 in the blocked state B than any other processing core 10. This has the advantage that processing time lost due to waiting for resources may be reduced. Simulations have shown that this increases execution efficiency.
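For illustration, this selection rule can be sketched as below, reusing the PhysicalCore model from the earlier sketch: among cores that have at least one requesting (vacant) thread, one is chosen whose count of blocked threads is no lower than that of any other such core.

```python
def select_core(cores):
    # consider only cores that have a vacant (requesting) thread
    candidates = [c for c in cores if c.requesting_count() > 0]
    if not candidates:
        return None  # no vacant virtual processing core at the moment
    # prefer the core with the highest count of blocked threads
    return max(candidates, key=lambda c: c.block_count())
```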
It should be noted that it is not necessary to use a count of threads 20 in the blocked state that is up to date at the time of assignment of a new task to a thread 20. An earlier count or an averaged count may be used, wherein threads are counted that may in fact no longer be blocked at the time of selection of a vacant thread. Such a count is still predictive of future blocking. In an embodiment, the instantaneous count of threads 20 in the blocked state at the time of selection may be used for the selection of a thread 20. In an alternative embodiment, the selection may be based on a sampled count of threads 20 in the blocked state that has been sampled at an arbitrary time point in a time interval preceding the selection. As a further alternative, the selection may be based on an average count of threads 20 in the blocked state, corresponding to the instantaneous count averaged over a predetermined time interval prior to the selection. By using non-instantaneous counts, simpler circuits satisfying less stringent timing requirements may be used. Averaging may increase the accuracy of predicting future blocking.
In an embodiment, each processing core 10 is configured to send requests for new tasks 28 to task assigning unit 14 in combination with count values of threads 20 in the processing core 10 that are in the blocked state. In this embodiment each processing core 10 is configured to keep a "block count" of threads 20 in the blocked state, and optionally additional counts of threads 20 in different states, and to send the block count with the request for a new task 28. In an alternative embodiment these block counts may be made accessible to task assigning unit 14 separately from the requests, for use in resolving the assignment of a new task 28 to threads 20. In another embodiment task assigning unit 14 may itself be configured to determine the number of threads 20 in the blocked state B for each of the physical processing cores 10. Task assigning unit 14 may be configured to keep information about each of the threads 20, indicating the task 22, 24, if any, executed by the thread 20 and the state of the thread 20. For this purpose, physical processing cores 10 may be configured to indicate state changes of threads 20 to task assigning unit 14, which uses them to update its information about the tasks. In this embodiment, task assigning unit 14 may be configured to determine a count of blocked threads 20 from this information.
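For the non-instantaneous variants described above, a sampled or averaged block count might be maintained as sketched below; the use of an exponential moving average and the smoothing factor are illustrative assumptions, not taken from the text.

```python
class BlockCountEstimator:
    """Keeps a non-instantaneous estimate of a core's block count."""

    def __init__(self, alpha: float = 0.25):
        self.alpha = alpha       # illustrative smoothing factor
        self.estimate = 0.0

    def sample(self, instantaneous_block_count: int) -> None:
        # called periodically (e.g. from a timer), not at selection time,
        # so the selection logic faces less stringent timing requirements
        self.estimate = (self.alpha * instantaneous_block_count
                         + (1.0 - self.alpha) * self.estimate)
```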
In a further embodiment, the counts of blocked threads 20 are used in combination with other parameters to control the selection of a thread 20 for a new task 28. Thus, for example, task assigning unit 14 may use counts of threads 20 in the requesting state A in respective processing cores 10, in combination with the "block count". When a plurality of processing cores 10 with requesting threads 20 have equal highest block counts, task assigning unit 14 may be configured to select one of those processing cores 10 that has the most requesting threads 20, or at least no lower number of requesting threads 20 than any other of those processing cores 10.
In another embodiment, the priority of the block count and the count of requesting threads 20 may be reversed: the task assigning unit 14 may be configured to select one of the processing cores 10 with the most requesting threads 20 and, if a plurality of processing cores 10 has the same count of requesting threads 20, one of those processing cores 10 that has the highest block count, or at least no lower block count than any other of those processing cores 10.
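Both orderings amount to choosing a different sort key, as sketched below with the PhysicalCore model from before; the block_first flag is an invented convenience for showing the two embodiments side by side.

```python
def select_core_tiebreak(cores, block_first=True):
    candidates = [c for c in cores if c.requesting_count() > 0]
    if not candidates:
        return None
    if block_first:
        # highest block count first; most requesting threads breaks ties
        return max(candidates, key=lambda c: (c.block_count(), c.requesting_count()))
    # reversed priority: most requesting threads first, block count breaks ties
    return max(candidates, key=lambda c: (c.requesting_count(), c.block_count()))
```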
In an embodiment, each physical processing core 10 executes a predetermined number of threads 20 and no more. In this case, the counts of blocked threads 20 may also be obtained by counting the threads in all states other than the blocked state B and subtracting the result from the predetermined number of threads.
Instead of using explicit requests for new tasks 28 from processing cores 10, task assigning unit 14 may poll information in processing cores 10 to determine which of processing cores 10 have threads 20 in a requesting state A. In this embodiment task assigning unit 14 may select from processing cores 10 that are detected to have threads in the requesting state, dependent on counts of threads 20 in the blocked state in these processing cores 10.
In another embodiment, a sum of the block count and the count of requesting threads 20 may be used, the task assigning unit 14 being configured to select one of the processing cores 10 with the highest sum or at least no lower sum than any other of the processing cores 10 with a requesting thread 20.
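In a sketch, this sum-based embodiment differs from the previous one only in the key:

```python
def select_core_sum(cores):
    candidates = [c for c in cores if c.requesting_count() > 0]
    if not candidates:
        return None
    # combined score: block count plus requesting count, highest wins
    return max(candidates, key=lambda c: c.block_count() + c.requesting_count())
```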
Any known prioritized selection scheme may be used to give preference to threads 20 during selection. In an embodiment, task assigning unit 14 obtains aggregate counts of threads 20 in selected states, such as the blocked state, for each of the physical processing cores and searches for a physical processing core 10 with a vacant thread 20 and a highest count, or at least no lower count than any other physical processing core 10 with a vacant thread 20. Subsequently, one of the vacant threads on that physical processing core is selected to execute the new task.
However, preference can be given in many other ways. In another example of an embodiment, task assigning unit 14 performs a selection among the physical processing cores 10 on a round-robin basis, after adjusting the number of occurrences of different physical processing cores 10 in a round-robin list from which physical processing cores 10 are selected. In this embodiment task assigning unit 14 obtains aggregate counts of threads 20 in selected states, such as the blocked state, for each of the physical processing cores 10 and adjusts the number of occurrences dependent on the counts, for example increasing the number of occurrences with increasing count of blocked threads. Round-robin selection is repeated, if necessary, until a physical processing core 10 with a vacant thread is selected. After selecting the physical processing core 10 on the round-robin basis, one of the vacant threads on that physical processing core 10 is selected to execute the new task. A similar scheme may be used in combination with a weighted (pseudo-)random selection instead of round-robin selection, wherein the selection probability of different physical processing cores is adjusted according to the aggregate counts.
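For illustration, the weighted round-robin variant might look as follows; the occurrence rule (one slot per core plus one per blocked thread) and the step bound are invented for the sketch, and in practice the list would be rebuilt as the counts change.

```python
import itertools

def build_round_robin_list(cores):
    # each core appears once, plus once more per blocked thread, so heavily
    # blocked cores come up more often in the round-robin order
    slots = []
    for core in cores:
        slots.extend([core] * (1 + core.block_count()))
    return slots

def select_core_round_robin(cores, max_steps=1000):
    # cycle through the weighted list until a core with a vacant thread turns
    # up; bounded only so the sketch cannot spin forever when none is vacant
    rr = itertools.cycle(build_round_robin_list(cores))
    for core in itertools.islice(rr, max_steps):
        if core.requesting_count() > 0:
            return core
    return None
```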
Task assigning unit 14 is a circuit configured to perform the functions that have been described in the preceding. As noted, task assigning unit 14 may be implemented using a programmable processor circuit, programmed with a program that makes it perform such functions. Alternatively, a dedicated circuit may be used, designed to perform these functions. Such a dedicated circuit may comprise a buffer memory for storing a queue of new task identifiers, a demultiplexer for demultiplexing task identifiers from the buffer memory to selected processor cores 10 and a selection circuit to control the demultiplexer. The selection circuit may have inputs coupled to outputs of the physical processor cores 10 that supply signals indicating the presence of a thread 20 in a requesting state and a count of threads in one or more predetermined states, such as a count of threads 20 in the blocked state in the physical processor cores 10. In this case, the selection circuit may control the demultiplexer to send the task identifier of the top task in the buffer memory to one of the physical processor cores that supplies a signal indicating a requesting thread and a highest count among the processor cores with a requesting thread.
Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.

Claims

CLAIMS:
1. A data processing circuit for providing a plurality of virtual processing cores executing in a plurality of threads (20), the plurality of threads (20) being subdivided into groups of threads (20), the circuit comprising: a plurality of physical processing cores (10) each configured to execute the threads (20) of a respective one of the groups on a time multiplex basis, each thread (20) defining a respective virtual processing core for executing tasks sequentially; a task assigning unit (14), configured to assign a new task to a selected one of the virtual processing cores for execution, the task assigning unit (14) being configured to select the selected one of the virtual processing cores from among virtual processing cores defined by vacant ones of the threads (20) that are not used by any task, the task assigning unit (14) being configured to give selection preference to vacant ones of the threads dependent on respective aggregate counts for respective ones of the physical processing cores (10) on which the vacant ones of the threads (20) are executed, the respective aggregate counts being aggregate counts of threads (20) in one or more predetermined states (B) in the respective ones of the physical processing cores.
2. A data processing circuit according to claim 1, wherein the aggregate count is a count of blocked ones of the threads (20) in the respective group of threads of the respective one of physical processing cores (10), the blocked ones of the threads (20) being threads (20) executing tasks that are waiting for a resource, the task assigning unit (14) being configured to give selection preference to vacant ones of the threads (20) executing on physical processing cores (10) with higher count of blocked ones of the threads (20) over vacant ones of the threads (20) executing on physical processing cores with lower count of blocked ones of the threads (20).
3. A data processing circuit according to claim 2, wherein the task assigning unit (14) is configured to determine further counts, each being a count of the vacant ones of the threads (20) in a respective one of the physical processing cores (10), the task assigning unit (14) being configured to give selection preference to vacant ones of the threads (20) dependent on combinations of the further counts and the counts of blocked ones of the threads (20) for the respective ones of the physical processing cores (10).
4. A data processing circuit according to claim 1, wherein the aggregate count for each respective one of the physical processing cores (10) is a respective count of blocked ones of the threads (20) and the vacant ones of the threads (20) in the respective groups of threads (20) of the physical processing core (10), the blocked ones of the threads (20) being threads (20) executing tasks that are waiting for a resource, the task assigning unit (14) being configured to give selection preference to vacant ones of the threads (20) executing on physical processing cores (10) with higher count of blocked and vacant ones of the threads (20) over vacant ones of the threads (20) executing on physical processing cores (10) with lower count of blocked and vacant ones of the threads (20).
5. A data processing circuit according to claim 1, wherein the task assigning unit (14) is configured to determine the aggregate counts by sampling states of the threads (20) and/or counts of threads (20) in the one or more predetermined states (B), during a predetermined time interval prior to selection among the vacant ones of the threads (20).
6. A data processing circuit according to claim 1, wherein the task assigning unit is configured to determine the count by averaging a number of threads (20) in the one or more predetermined states during a predetermined time interval prior to selection among the vacant ones of the threads (20).
7. A data processing circuit according to claim 1, wherein the threads (20) are configured to signal to the task assigning unit (14) when they are vacant.
8. A data processing circuit according to claim 1, wherein the physical processing cores are configured to signal a number of threads in the one or more predetermined states to the task assigning unit during operation.
9. A data processing circuit according to claim 1, comprising a resource circuit (12) comprising at least one of a main memory, a function specific computation circuit, an input interface circuit, an output interface circuit and an input-output interface circuit, the physical processing cores (10) being configured to block threads (20) that execute tasks waiting for grant and/or completion of access to the resource circuit (12).
10. A method of executing tasks in a data processing circuit that comprises a plurality of physical processing cores (10) executing threads subdivided into groups of threads, the method comprising: executing the threads (20) of each group on a time multiplex basis in a respective one of the physical processing cores (10); executing respective tasks, each using a virtual processing core implemented by a respective one of the threads (20); determining, at least for each physical processing core whose respective group of threads (20) includes a vacant one of the threads (20) not used by a task, an aggregate count of threads (20) in one or more predetermined states in the respective group of threads (20) of the physical processing core (10); selecting one of the vacant ones of the threads (20) from among the vacant ones of the threads (20), giving selection preference to vacant ones of the threads (20) dependent on the aggregate counts; when a new task is available, assigning the new task to the selected vacant one of the threads (20) for execution of the new task.
11. A method according to claim 10, comprising: detecting blocked ones of the threads (20) that are blocked from proceeding with execution of the tasks executed by the blocked ones of the threads (20); and wherein the aggregate counts are counts of blocked ones of the threads (20) on respective ones of the physical processing cores (10), and selection preference is given to vacant ones of the threads executing on physical processing cores (10) with higher count of blocked ones of the threads (20) over vacant ones of the threads (20) executing on physical processing cores (10) with lower count of blocked ones of the threads (20).
12. A computer program product that comprises a program of instructions that, when executed by a programmable processor, causes the programmable processor to operate as a task assigning unit (14) in a data processing circuit with virtual processing cores defined by threads (20), the threads (20) being organized in groups, each group executing on a respective physical processing core (10), the program of instructions being configured to cause the programmable processor to determine, at least for each physical processing core (10) whose respective group of threads (20) includes a vacant one of the threads (20) not used by a task, an aggregate count of threads (20) in one or more predetermined states (B) in the respective group of threads (20) of the physical processing core; select one of the vacant ones of the threads (20) from among the vacant ones of the threads (20), giving selection preference to vacant ones of the threads (20) dependent on the aggregate counts; and, when a new task is available, assign the new task to the selected vacant one of the threads (20) for execution of the new task.
PCT/IB2009/050505 2008-02-11 2009-02-09 Multiprocessing implementing a plurality of virtual processors WO2009101563A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP08101490.4 2008-02-11
EP08101490 2008-02-11

Publications (1)

Publication Number Publication Date
WO2009101563A1 (en) 2009-08-20

Family

ID=40585546

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2009/050505 WO2009101563A1 (en) 2008-02-11 2009-02-09 Multiprocessing implementing a plurality of virtual processors

Country Status (1)

Country Link
WO (1) WO2009101563A1 (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8209512B1 (en) * 2009-03-16 2012-06-26 Hewlett-Packard Development Company, L.P. Selecting a cell that is a preferred candidate for executing a process
WO2012135050A3 (en) * 2011-03-25 2012-11-29 Soft Machines, Inc. Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9195493B2 (en) 2014-03-27 2015-11-24 International Business Machines Corporation Dispatching multiple threads in a computer
US9213569B2 (en) 2014-03-27 2015-12-15 International Business Machines Corporation Exiting multiple threads in a computer
US9223574B2 (en) 2014-03-27 2015-12-29 International Business Machines Corporation Start virtual execution instruction for dispatching multiple threads in a computer
KR101738641B1 (en) 2010-12-17 2017-05-23 삼성전자주식회사 Apparatus and method for compilation of program on multi core system
US9766893B2 (en) 2011-03-25 2017-09-19 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US9772867B2 (en) 2014-03-27 2017-09-26 International Business Machines Corporation Control area for managing multiple threads in a computer
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9811377B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US9823930B2 (en) 2013-03-15 2017-11-21 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US9842005B2 (en) 2011-03-25 2017-12-12 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9858080B2 (en) 2013-03-15 2018-01-02 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US9886416B2 (en) 2006-04-12 2018-02-06 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9898412B2 (en) 2013-03-15 2018-02-20 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9934042B2 (en) 2013-03-15 2018-04-03 Intel Corporation Method for dependency broadcasting through a block organized source view data structure
US9940134B2 (en) 2011-05-20 2018-04-10 Intel Corporation Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US9965281B2 (en) 2006-11-14 2018-05-08 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US10031784B2 (en) 2011-05-20 2018-07-24 Intel Corporation Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US10146548B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for populating a source view data structure by using register template snapshots
US10146592B2 (en) 2015-09-18 2018-12-04 Salesforce.Com, Inc. Managing resource allocation in a stream processing framework
US10169045B2 (en) 2013-03-15 2019-01-01 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US10191746B2 (en) 2011-11-22 2019-01-29 Intel Corporation Accelerated code optimizer for a multiengine microprocessor
US10198298B2 (en) * 2015-09-16 2019-02-05 Salesforce.Com, Inc. Handling multiple task sequences in a stream processing framework
US10198266B2 (en) 2013-03-15 2019-02-05 Intel Corporation Method for populating register view data structure by using register template snapshots
US10228949B2 (en) 2010-09-17 2019-03-12 Intel Corporation Single cycle multi-branch prediction including shadow cache for early far branch prediction
CN110597639A (en) * 2019-09-23 2019-12-20 腾讯科技(深圳)有限公司 CPU distribution control method, device, server and storage medium
US10521239B2 (en) 2011-11-22 2019-12-31 Intel Corporation Microprocessor accelerated code optimizer


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003007105A2 (en) * 2000-11-24 2003-01-23 Catharon Productions, Inc. Computer multi-tasking via virtual threading

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "PANAGIOTIS E. HADJIDOUKAS Home Page", INTERNET ARTICLE, XP002530791, Retrieved from the Internet <URL:http://www.cs.uoi.gr/~phadjido/> [retrieved on 20090507] *
ANONYMOUS: "Transactions on HiPEAC: Volume 3, Issue 2", INTERNET ARTICLE, XP002530792, Retrieved from the Internet <URL:http://www.hipeac.net/node/2414> [retrieved on 20090507] *
JAN HOOGERBRUGGE, ANDREI TERECHKO: "A Multithreaded Multicore System for Embedded Media Processing", TRANSACTIONS ON HIPEAC, vol. 3, no. 2, 2 June 2008 (2008-06-02), pages 168 - 187, XP002526866, Retrieved from the Internet <URL:http://www.hipeac.net/system/files/paper_2.pdf> [retrieved on 20090507] *
P.E. HADJIDOUKAS, V.V. DIMAKOPOULOS: "A Runtime Library for Lightweight Process-Scope Threads", INTERNET ARTICLE, September 2007 (2007-09-01), XP002526868, Retrieved from the Internet <URL:http://www.cs.uoi.gr/~phadjido/courses/E-85/download/software/psthreads_report.pdf> [retrieved on 20090507] *

Cited By (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9886416B2 (en) 2006-04-12 2018-02-06 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US10289605B2 (en) 2006-04-12 2019-05-14 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US11163720B2 (en) 2006-04-12 2021-11-02 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US10585670B2 (en) 2006-11-14 2020-03-10 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US9965281B2 (en) 2006-11-14 2018-05-08 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US8209512B1 (en) * 2009-03-16 2012-06-26 Hewlett-Packard Development Company, L.P. Selecting a cell that is a preferred candidate for executing a process
US10228949B2 (en) 2010-09-17 2019-03-12 Intel Corporation Single cycle multi-branch prediction including shadow cache for early far branch prediction
KR101738641B1 (en) 2010-12-17 2017-05-23 삼성전자주식회사 Apparatus and method for compilation of program on multi core system
US9274793B2 (en) 2011-03-25 2016-03-01 Soft Machines, Inc. Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
KR101826121B1 (en) * 2011-03-25 2018-02-06 인텔 코포레이션 Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9990200B2 (en) 2011-03-25 2018-06-05 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US9934072B2 (en) 2011-03-25 2018-04-03 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9766893B2 (en) 2011-03-25 2017-09-19 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US9842005B2 (en) 2011-03-25 2017-12-12 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US11204769B2 (en) 2011-03-25 2021-12-21 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
KR101966712B1 (en) 2011-03-25 2019-04-09 인텔 코포레이션 Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN103635875A (en) * 2011-03-25 2014-03-12 索夫特机械公司 Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US10564975B2 (en) 2011-03-25 2020-02-18 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
KR20180015754A (en) * 2011-03-25 2018-02-13 인텔 코포레이션 Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
WO2012135050A3 (en) * 2011-03-25 2012-11-29 Soft Machines, Inc. Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9921845B2 (en) 2011-03-25 2018-03-20 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US10031784B2 (en) 2011-05-20 2018-07-24 Intel Corporation Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines
US10372454B2 (en) 2011-05-20 2019-08-06 Intel Corporation Allocation of a segmented interconnect to support the execution of instruction sequences by a plurality of engines
US9940134B2 (en) 2011-05-20 2018-04-10 Intel Corporation Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US10191746B2 (en) 2011-11-22 2019-01-29 Intel Corporation Accelerated code optimizer for a multiengine microprocessor
US10521239B2 (en) 2011-11-22 2019-12-31 Intel Corporation Microprocessor accelerated code optimizer
US9823930B2 (en) 2013-03-15 2017-11-21 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US10146548B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for populating a source view data structure by using register template snapshots
US11656875B2 (en) 2013-03-15 2023-05-23 Intel Corporation Method and system for instruction block to execution unit grouping
US10146576B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US10169045B2 (en) 2013-03-15 2019-01-01 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US9898412B2 (en) 2013-03-15 2018-02-20 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US10740126B2 (en) 2013-03-15 2020-08-11 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US10198266B2 (en) 2013-03-15 2019-02-05 Intel Corporation Method for populating register view data structure by using register template snapshots
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US10248570B2 (en) 2013-03-15 2019-04-02 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US10255076B2 (en) 2013-03-15 2019-04-09 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9934042B2 (en) 2013-03-15 2018-04-03 Intel Corporation Method for dependency broadcasting through a block organized source view data structure
US9858080B2 (en) 2013-03-15 2018-01-02 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9811377B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US10503514B2 (en) 2013-03-15 2019-12-10 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9772867B2 (en) 2014-03-27 2017-09-26 International Business Machines Corporation Control area for managing multiple threads in a computer
US9223574B2 (en) 2014-03-27 2015-12-29 International Business Machines Corporation Start virtual execution instruction for dispatching multiple threads in a computer
US9213569B2 (en) 2014-03-27 2015-12-15 International Business Machines Corporation Exiting multiple threads in a computer
US9195493B2 (en) 2014-03-27 2015-11-24 International Business Machines Corporation Dispatching multiple threads in a computer
US10198298B2 (en) * 2015-09-16 2019-02-05 Salesforce.Com, Inc. Handling multiple task sequences in a stream processing framework
US11086687B2 (en) 2015-09-18 2021-08-10 Salesforce.Com, Inc. Managing resource allocation in a stream processing framework
US11086688B2 (en) 2015-09-18 2021-08-10 Salesforce.Com, Inc. Managing resource allocation in a stream processing framework
US10146592B2 (en) 2015-09-18 2018-12-04 Salesforce.Com, Inc. Managing resource allocation in a stream processing framework
CN110597639A (en) * 2019-09-23 2019-12-20 腾讯科技(深圳)有限公司 CPU distribution control method, device, server and storage medium

Similar Documents

Publication Publication Date Title
WO2009101563A1 (en) Multiprocessing implementing a plurality of virtual processors
JP3678414B2 (en) Multiprocessor system
US6748593B1 (en) Apparatus and method for starvation load balancing using a global run queue in a multiple run queue system
US6658449B1 (en) Apparatus and method for periodic load balancing in a multiple run queue system
US7065766B2 (en) Apparatus and method for load balancing of fixed priority threads in a multiple run queue environment
US8875151B2 (en) Load balancing method and apparatus in symmetric multi-processor system
US7950016B2 (en) Apparatus for switching the task to be completed in a processor by switching to the task assigned time slot
JP5770721B2 (en) Information processing system
US7487317B1 (en) Cache-aware scheduling for a chip multithreading processor
US8695004B2 (en) Method for distributing computing time in a computer system
US20030037091A1 (en) Task scheduling device
US9870228B2 (en) Prioritising of instruction fetching in microprocessor systems
CN109564528B (en) System and method for computing resource allocation in distributed computing
US7818747B1 (en) Cache-aware scheduling for a chip multithreading processor
US8627325B2 (en) Scheduling memory usage of a workload
CN106569887B (en) Fine-grained task scheduling method in cloud environment
US20090183166A1 (en) Algorithm to share physical processors to maximize processor cache usage and topologies
US20030110203A1 (en) Apparatus and method for dispatching fixed priority threads using a global run queue in a multiple run queue system
JP5397544B2 (en) Multi-core system, multi-core system scheduling method, and multi-core system scheduling program
CN111597044A (en) Task scheduling method and device, storage medium and electronic equipment
US8539491B1 (en) Thread scheduling in chip multithreading processors
Horowitz A run-time execution model for referential integrity maintenance
US11275621B2 (en) Device and method for selecting tasks and/or processor cores to execute processing jobs that run a machine
US20160267621A1 (en) Graphic processing system and method thereof
US9977751B1 (en) Method and apparatus for arbitrating access to shared resources

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 09709637
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 09709637
    Country of ref document: EP
    Kind code of ref document: A1