US20100235845A1 - Sub-task processor distribution scheduling - Google Patents

Sub-task processor distribution scheduling

Info

Publication number
US20100235845A1
Authority
US
United States
Prior art keywords
sub, tasks, task, nodes, local node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/786,250
Inventor
John P. Bates
Payton R. White
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Interactive Entertainment Inc
Original Assignee
Sony Computer Entertainment Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Computer Entertainment Inc
Priority to US12/786,250
Publication of US20100235845A1
Assigned to SONY INTERACTIVE ENTERTAINMENT INC. (change of name; assignor: SONY COMPUTER ENTERTAINMENT INC.)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5044: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals, considering hardware capabilities
    • G06F 2209/00: Indexing scheme relating to G06F9/00
    • G06F 2209/50: Indexing scheme relating to G06F9/50
    • G06F 2209/5017: Task decomposition

Definitions

  • As used herein, the term cache generally refers to a region of high-speed storage (e.g., static random access memory (SRAM)) associated with a particular processor or node. A cache is commonly used to store data and/or instructions that are frequently used. To facilitate distribution of tasks it may therefore be desirable to determine the contents of caches associated with one or more distributed nodes, in order to determine whether code and/or data needed for processing a task or sub-task is already present. Tasks and/or sub-tasks may then be preferentially distributed at block 216 or block 218 to nodes having the needed code and/or data.

Abstract

A method for processing of processor executable tasks and a processor readable medium having embodied therein processor executable instructions for implementing the method are disclosed. A system for distributing processing work amongst a plurality of distributed processors is also disclosed. A task generated at a local node is divided into one or more sub-tasks. An optimum number of nodes x on which to process the sub-tasks is determined. If x is greater than one, a determination is made to either (1) execute the task at the local node with the processor unit, (2) distribute the task among two or more local node processors, (3) distribute the task to one or more of the distributed nodes accessible to the local node over a LAN, or (4) distribute the task to one or more of the distributed nodes that are accessible to the local node over a WAN.

Description

    CLAIM OF PRIORITY
  • This application is a continuation and claims the priority benefit of commonly-assigned, co-pending U.S. patent application Ser. No. 11/459,301 entitled “SUB-TASK PROCESSOR DISTRIBUTION SCHEDULING” to John P. Bates and Payton R. White, filed Jul. 21, 2006, the entire disclosure of which is incorporated herein by reference.
  • FIELD OF THE INVENTION
  • Embodiments of the present invention are related to distributed computing and more particularly to distribution of computing tasks among multiple processors.
  • BACKGROUND OF THE INVENTION
  • A major advance in electronic computation has been the development of systems that can perform multiple operations simultaneously. Such systems are said to perform parallel processing. Many computation tasks can be regarded as interdependent sub-tasks. Often, some of these sub-tasks may be implemented by parallel processing by distributing the tasks amongst local or remote processors.
  • It is within this context that embodiments of the present invention arise.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
  • FIG. 1 is a block diagram of a distributed processing system according to an embodiment of the present invention.
  • FIG. 2 is a flow diagram of a method according to an embodiment of the present invention.
  • FIG. 3 is a graph depicting plots of effective execution time versus number of processors for determination of whether or not to distribute a task according to an embodiment of the present invention.
  • FIG. 4 is a block diagram depicting distributed processing cost determination using a look-up table model according to an embodiment of the present invention.
  • FIG. 5 is a block diagram depicting distributed processing cost determination using a liquid capital market model according to an embodiment of the present invention.
  • DESCRIPTION OF THE SPECIFIC EMBODIMENTS
  • Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.
  • Embodiments of the present invention may be understood by reference to FIG. 1 and FIG. 2. FIG. 1 depicts an example of a system 100 that may implement embodiments of the present invention. FIG. 2 depicts a flow diagram of a method 200 that may be implemented, e.g., using the system 100 of FIG. 1. The system 100 generally includes a local processor node 102 that is operably coupled to one or more other processors referred to herein as distributed nodes. The processor node 102 generally includes one or more individual processor units 104 and may include a memory 106. By way of example, and without loss of generality, the processor units 104 may include one or more cell processor units. Each cell processor unit may include a power processing unit (PU) and one or more synergistic processor units (SPUs). Each SPU may have an associated local memory. Each of the processor units 104 may be regarded as a node. Similarly, each of the processor elements within a processor unit 104, e.g., the PU and SPUs, may be regarded as nodes. The PU and SPUs may be connected to each other through a bus 108.
  • The processor unit 104 may exchange data and/or instructions with locally distributed nodes such as other processor units 104 within the local node 102, e.g., through an input/output (I/O) element 110 and a data bus 112 sometimes referred to as a “blade”. The processor units 104 may communicate with other locally distributed processor nodes 103 through the I/O element 110 and a local data bus 114, such as a peripheral component interconnect (PCI) or PCI express (PCIE) data bus. The local processor nodes 103 may include multiple processor units 105, an I/O element 107 and internal data bus 109. The processor units 104 in the local processor node 102 may communicate with remotely distributed processor nodes 116 via a network interface 118 coupled to the I/O element 110 and one or more networks 120. Each remotely distributed processor node 116 may include multiple processors 117, which may be configured as described above.
  • The networks 120 may include one or more local area networks and/or one or more wide area networks. As used herein, a local area network (LAN) refers to a computer network that spans a relatively small area, e.g., a single building or group of buildings. Each node (individual device) in a LAN typically has one or more processors with which it executes programs. Each node may also be able to access data and devices anywhere on the LAN. A LAN may be connected to other LANs over any distance, e.g., via telephone lines or radio waves. A system of LANs connected in this way is referred to as a wide-area network (WAN). The Internet is an example of a WAN. Any suitable architecture may be used to implement such networks, e.g., client/server or peer-to-peer architecture. In a peer-to-peer (P2P) architecture, each node has equivalent capabilities and responsibilities. This differs from client/server architectures, in which some nodes are dedicated to serving the others.
  • Each processor unit 104 may operate by executing coded instructions 122. The coded instructions 122 may be broken down into a set of tasks. Many processing tasks that are to be executed by the local processor node 102 (or a processor unit 104 within the local processor node 102) may be further broken down into interdependent sub-tasks. The sub-tasks may be executed in parallel. The coded instructions 122 may include a distributed scheduler that distributes the sub-tasks amongst multiple processors that are accessible to the local node 102. The distributed scheduler may be implemented in hardware, software, firmware or some combination of two or more of these. A key task for the distributed scheduler is to determine whether and how to distribute the sub-tasks amongst available processing resources.
  • The determination of the number of nodes x may be based on a vector of resource quantities that describes both resource availability and the characteristics of the request. Such a vector may include parameters such as processor cycles, memory space, storage (e.g., hard disk) space and network bandwidth required to process the task. Estimates for the availability of distributed nodes may be made by middleware. Instantaneous, average, and expected utilization may be taken into account. Applications may make rough estimates of the resources required by their tasks.
  • The flow diagram of FIG. 2 illustrates one method 200 among others for distributing processing tasks. Parameters relating to the vector of resource quantities may be collected as indicated at block 202. For example, the code 122 may include a resource overlay routine that runs in the background of an application and collects information on parameters relevant to determining whether and how to distribute processing tasks. Such parameters may include, but are not limited to, execution times of tasks on a given node, size of data to be divided among sub-tasks, size of code or data needed by every sub-task, size of output data produced by a task, outgoing and incoming bandwidths for one or more nodes, round-trip message times to distributed nodes, processor availability, and processor usage.
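  • By way of illustration, the collected parameters map naturally onto a small record type. The following Python sketch uses field names of our own choosing (the patent does not prescribe a data layout); the ET/TS/CS/RS abbreviations anticipate the variables defined with Equation 1 below:

    from dataclasses import dataclass

    @dataclass
    class ResourceVector:
        """Parameters collected at block 202 (illustrative names only)."""
        exec_time: float       # ET: execution time of the task on one node, seconds
        divided_data: float    # TS: total data divided among sub-tasks, bytes
        constant_data: float   # CS: code/data needed by every sub-task, bytes
        result_data: float     # RS: output data produced by the task, bytes
        bw_out: float          # outgoing bandwidth of a node, bytes/s
        bw_in: float           # incoming bandwidth of a node, bytes/s
        rtt: float             # round-trip message time to a distributed node, s
        availability: float    # fraction of the node's processor currently free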
  • The parameters may optionally be stored at block 204 in a parameter table 205, which may be stored in a memory accessible by the local node 102. For example, the parameter table 205 may be part of a database DB stored in the main memory 106 of the local node 102. The parameter table 205 may be updated from time to time as conditions change while waiting for new tasks. When the local processor node 102 generates a new task, as indicated at block 206, a determination is made whether or not to distribute processing of the task. Specifically, the task may be divided into one or more sub-tasks, as indicated at block 208. An optimum number of nodes x on which to process the one or more sub-tasks may then be determined, as indicated at block 210. The number of nodes x may be based at least partly on parameters relating to processing the sub-tasks at nodes accessible by the local node, e.g., parameters stored in the parameter table 205. It is noted that many tasks may involve multiple sub-tasks of the same kind, e.g., same input data and code. It is also noted that the resources and parameters may change over time. Thus, it may be useful to return to block 202 to re-collect the parameters. This can provide a check as to whether the resources and/or parameters have changed before determining at block 210 the optimum number of nodes x for the next sub-task of the same kind.
  • Based on the value of x, a determination may be made at block 212 whether to process the task at the local node 102 or distribute the task to one or more distributed nodes accessible by the local node. Where distribution does not make sense, e.g., where x=1, where local processing is faster, and where local processing resources are available, the task may be processed locally, as indicated at block 214.
  • If at block 212 it is determined that distribution makes sense, e.g., if x>1 and/or other criteria are satisfied, the tasks (or sub-tasks) may be allocated for processing at one or more distributed nodes, e.g., as sketched below. The nature of the distribution may depend on the value of x and the number of nodes available. For example, if x nodes are available the task (or sub-tasks) may be sent to x distributed nodes for processing, as indicated at block 216. If fewer than x nodes are available the task may be split among the available nodes, as indicated at block 218. For example, suppose it is determined that x=20 nodes are optimal for processing a particular task and only ten are available. In such a case, the task may be split into two portions. Half of the work may be assigned to the ten available nodes and the remainder may be assigned to other nodes as they become available.
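  • The decision logic of blocks 212-218 can be sketched as a simple batching routine. This is a minimal illustration in Python, not the patent's implementation; the round-robin split and the handling of fewer-than-x nodes are our own simplifications:

    from typing import Callable, List

    SubTask = Callable[[], object]

    def allocate(sub_tasks: List[SubTask], x: int, available: int) -> List[List[SubTask]]:
        """Group sub-tasks into one batch per node (blocks 212-218)."""
        if x <= 1:
            return [sub_tasks]        # block 214: process at the local node
        nodes = min(x, available)     # block 216 if available >= x, else block 218
        # Round-robin the sub-tasks over the nodes usable now; with fewer than
        # x nodes, the leftover work would be re-assigned to other nodes as
        # they become available (block 218), which is not modeled here.
        return [sub_tasks[i::nodes] for i in range(nodes)]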
  • Determining whether and how to distribute tasks often depends on the bandwidth available for transmission of data to a distributed node. The available bandwidth may depend on the nature of the data transmission path. For example, each of the data busses 108, 112, 114 and the networks 120 connected to the local node 102 may have different bandwidths. As used herein, the term bandwidth generally refers to a rate of transfer of information in a given amount of time. Bandwidths for digital data are typically expressed in a number of bits or bytes per second. By way of example, in a cell processor, the bandwidth for data transfers between the SPUs may be as high as about 100 gigabytes per second (GByte/s). Bandwidth for data transfers from an SPU to the memory 106 may be as high as about 20 GByte/s. Data transfers between cells over a “blade” may have a bandwidth of about 30 GByte/s. The local bus 114, e.g., PCIE, may have a bandwidth of 20 GByte/s. LAN bandwidths may range from about 10 megabytes per second (MByte/s) to about 10 GByte/s. Bandwidths for a WAN, such as the Internet, may range from about 128 kilobytes per second (KByte/s) to about 10 MByte/s.
  • The determination of whether to distribute tasks at block 212 and how to distribute tasks at block 216 may also depend on the type of task involved. Many different types of data processing tasks and sub-tasks may be implemented by distributed computing according to embodiments of the present invention. Each sub-task may be characterized by a sub-task type that distinguishes one type of sub-task from another and/or provides information for determining the optimum number of nodes x and/or how to distribute the task or sub-task amongst accessible nodes. Such task and sub-task types include, but are not limited to, Complex start-to-finish tasks, Divisible-by-N tasks, Intercommunicative persistent tasks, and Stateless persistent tasks or services.
  • In Complex start-to-finish tasks, a static task file describes how tasks depend on each other and their required data/code. An optimal number of processors may be statically determined from the task file. One option for doing this is to statically determine an optimal distribution of tasks. Another option is to dynamically execute tasks by treating the allocated processors as a thread pool. As processors finish tasks, they notify a task master and wait for the next task. Code and data resources may be pre-fetched, e.g., if the size in bytes is known, and the destination node's bandwidth is known. In the absence of an estimate of the execution time of each sub-task, pre-fetching may be done for all sub-tasks at a start time for the task.
  • Divisible-by-N tasks are tasks that can be divided up as much as there are available resources. Equations for the optimal number of nodes N to use for executing such tasks may be derived if the one-node-execution-time is known for the task. Factors such as bandwidth and distribution method (linear/server vs. logarithmic/P2P) may be taken into account in the equation. Divisible-by-N tasks may fall into a number of different task types. For example, in one type of task, the same data is sent to all nodes. Ray tracing tasks are an example of this task type. In another task type, a different piece of data is sent to each node. The SETI (search for extra-terrestrial intelligence) project is an example of this task type. In divisible-by-N tasks, each allocated task is initialized with its unique index in the range of [0, N), where N is the actual number of allocated processors. The required data for each task is distributed in some suitable manner. Note that parallel sub-tasks of complex hierarchical tasks might fit into this category.
  • In intercommunicative persistent tasks (sometimes also referred to as persistent interactive tasks), as many processors are allocated as there are tasks. The tasks begin executing and communicating freely. Complex hierarchical tasks may execute in this environment by pre-allocating all sub-tasks and passing messages to progress through the execution stages. As long as parallel tasks are allocated to different nodes/processors, the same (if not better) performance may be achieved. Servers and game objects are a few examples, among others, of intercommunicative persistent tasks.
  • Stateless persistent tasks or services are global functions that have inputs and outputs, but no state. Such tasks are generic, so they may be redundantly duplicated on multiple nodes to load balance. Stateless persistent tasks may be executed on any available node. Certain distributed scheduling implications are present with this type of task. For example, in determining at block 210 how many nodes x to distribute the task to and/or how to distribute the task at blocks 216, 218, it may be useful to know how the task spawns new copies when it gets overloaded. In addition, factors such as usage percent and locality may be useful for determining how to distribute the task at blocks 216, 218.
  • Processing tasks, including those listed above, may be further categorized into one of two categories: one-time tasks and persistent tasks. One-time tasks use all or nearly all resources of a processor for some amount of time, which may be estimated by the application. Persistent tasks, by contrast, use on average less than all resources of a processor for an unknown amount of time. Persistent tasks may be characterized by bursty processor usage, which is often based on message traffic.
  • Appropriate distribution of sub-tasks may utilize unused processor resources throughout a WAN, such as the Internet. Such use of available resources may be implemented, e.g., within the context of computer gaming to enhance a game experience, serve shared-space objects, trans-code media streams, or serve a game in the traditional client-server model. Designing a deployable framework that can support these use cases is a non-trivial task. For example, it is desirable for such a framework to be scalable, decentralized, secure, cheat-proof, and economically sound. Unfortunately, there are some common problems with existing distributed computing frameworks. Most such frameworks utilize a centralized core or are designed for “infinite workpile” tasks such as the search for extra-terrestrial intelligence (SETI).
  • Execution time estimates (and other resource usage estimates) may be required in order to determine the number of processor nodes x at block 210 and to determine if a particular task should be allocated locally, on a local area network (LAN), or on a wide area network (WAN), e.g., at block 216. Precise automatic estimates may be difficult, if not impossible, to obtain. Run-time averages and/or static code analysis may generally only be useful for determining the execution time of constant-time algorithms (O(c)). In addition, computationally-intensive tasks that are suitable for parallel execution are not often composed of constant-time algorithms and are more often composed of variable-length algorithms. The execution time of such variable-length algorithms may depend on input-parameter values which are not available until run-time. The execution-time of a given function may be predicted by a second function that computes resource requirements based on input-parameters. However, the second function may cause a significant amount of overhead, because it would have to be executed many times during run-time for every new set of parameters to a function. The efficiency of the distributed computing depends roughly on the accuracy of the processor-usage estimates.
  • In embodiments of the present invention it may also be desirable to consider other measures of available processor power when determining when and how to distribute multiple tasks or sub-tasks at blocks 212-216. For one-time tasks, the execution time may be a useful measure when determining available processing power in a homogeneous computing environment. In a heterogeneous environment, the number of cycles would be more useful. For persistent tasks, it may be more useful to know the percentage of CPU usage over time (or cycles per second in a heterogeneous environment). Therefore, the processor usage value may have to be interpreted alongside the task type.
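  • The interpretation rule described above is easy to make concrete. A short Python sketch; the patent states the rule only in prose, and the names below are ours:

    from enum import Enum

    class TaskKind(Enum):
        ONE_TIME = "one-time"      # uses ~all of a processor for an estimable time
        PERSISTENT = "persistent"  # bursty, partial usage for an unknown time

    def usage_metric(kind: TaskKind, homogeneous: bool) -> str:
        """Pick the processor-usage measure to interpret for a task type."""
        if kind is TaskKind.ONE_TIME:
            return "execution time" if homogeneous else "processor cycles"
        return "percent CPU usage over time" if homogeneous else "cycles per second"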
  • A significant threshold question at block 212 is whether or not to distribute a processing task. Many tasks may be visualized as a group of two or more interdependent sub-tasks. Often, some of the sub-tasks may be executed in parallel. Given several available options for processing a particular task it is useful for a distributed scheduler to decide whether the sub-tasks should be (1) executed on a single local node processor, (2) distributed among multiple local node processors, (3) distributed on a LAN, (4) distributed on a WAN, or (5) distributed in some other way. There is a sliding scale from local node to LAN to WAN. A number of factors may be used to help determine where a particular task lies on this scale.
  • One factor that may be considered in determining whether to distribute at block 212 is the execution time for each parallel sub-task. The execution time for each parallel sub-task refers to the time it takes to execute each sub-task on a single processor of a given type. The execution time generally depends on the nature of the sub-task and the type of processor used to execute the sub-task. If each parallel sub-task has a long execution time it may be more appropriate to distribute such sub-tasks over a WAN. By contrast, short execution time sub-tasks may be more suitable for distribution amongst processors available at a local node.
  • Another factor to consider is the number of sub-tasks to be executed. If there are a large number of sub-tasks it may be more appropriate to distribute them over a WAN. If there are only a few (particularly with short execution times) it may be more appropriate to distribute them to processors available at a local node.
  • An additional factor to consider is the amount of synchronous interaction between the parallel sub-tasks. As used herein, synchronous interaction generally refers to a blocking communication between the sub-tasks. For example, a first task may send a message to a second task and wait for a reply from the second task before continuing any computation. If there is a significant amount of synchronous interaction between parallel sub-tasks it may be more appropriate to distribute the parallel sub-tasks over processors available at a local node.
  • If there is a relatively small amount of synchronous interaction between parallel sub-tasks (or none at all) it may be more appropriate to distribute the parallel sub-tasks over a WAN. A number of factors may determine whether a given amount of synchronous interaction is “significant” or “relatively small”. For example, synchronous interaction may be significant if sub-tasks are spending more time waiting for synchronous replies from other sub-tasks than they are spending on computation. In such cases, performance may be enhanced by reducing the communication latency between the sub-tasks. Communication latency may be reduced, e.g., by distributing the sub-tasks over a LAN instead of a WAN or over a local node instead of a LAN.
  • Yet another factor to consider is the amount of data needed for each sub-task, e.g., the size of input data and/or binary code to be utilized by the sub-task. Where each sub-task requires a significant amount of data it may be more appropriate to distribute the parallel sub-tasks amongst processors at a local node. Where each sub-task requires relatively little data it may be more appropriate to distribute the sub-tasks over a WAN.
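  • Taken together, the four factors suggest a rough scoring heuristic for placing a task on the local-node/LAN/WAN sliding scale. A minimal sketch; all thresholds are invented for illustration, since the patent gives only qualitative guidance:

    def placement_tier(exec_time: float, n_subtasks: int,
                       sync_ratio: float, data_per_subtask: float) -> str:
        """Score the four factors; higher scores favor wider distribution.

        sync_ratio is time blocked on synchronous replies divided by
        compute time. All thresholds here are illustrative only.
        """
        score = 0
        score += exec_time > 1.0            # long sub-tasks tolerate WAN latency
        score += n_subtasks > 100           # many sub-tasks amortize setup cost
        score += sync_ratio < 0.1           # little blocking communication
        score += data_per_subtask < 64_000  # small inputs are cheap to ship
        return ("local node", "local node", "LAN", "WAN", "WAN")[score]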
  • To determine the number of nodes x at block 210 and/or distribute the tasks or sub-tasks at blocks 216, 218 it is often desirable to obtain estimates for task execution times and data requirements in order to assure better performance through distributed computing. Without such estimates, distributed computation may degrade the performance of a task (compared to executing the task locally).
  • According to embodiments of the present invention, equations may be derived to determine an optimal number of nodes x on which to execute a given task at block 210 for many types of parallel computing tasks. Derivation of such equations may involve determining an effective execution time (EET) in terms of the number of processor nodes (x). The equations may also consider additional variables, including:
      • ET: representing the execution time of all tasks on one node.
      • TS: representing a total size of data which is divided among sub-tasks.
      • CS: representing a constant sized data needed by every sub-task (ex: code size).
      • RS: representing a total size of output data produced by the tasks.
      • BWo, BWi: respectively representing outgoing and incoming bandwidths for all processor nodes.
      • RTT: representing a round-trip message time to a processor node.
  • These quantities may be obtained as part of the collection of parameters at block 202 and may be stored in the parameter table 205.
  • According to an embodiment of the invention, the effective execution time EET in terms of number of processor nodes x may be approximated by:
  • $EET = \frac{f(x, TS, CS)}{BW_o} + \frac{ET}{x} + \frac{RS}{BW_i} + RTT$   (Equation 1)
  • The first term on the right hand side of Equation 1 represents the time it takes for all processor nodes to receive the data needed to start execution. The second term represents the time required to execute a single sub-task. The third term represents the time required to send the results of the sub-tasks back to the source.
  • By way of example, the expression f(x, TS, CS) in the first term of Equation 1 may be a distribution metric that calculates how much data is sent serially before every processor node begins executing. Data may be distributed either linearly ($f_1$), where the source node sends the data to all processor nodes, or logarithmically ($f_2$), where the source node is the root of a binary distribution tree. The expression f(x, TS, CS) may take on the form

  • $f_1 = CS \cdot x + TS$   (Equation 2)
  • if the data is distributed linearly. Alternatively, the expression f(x, TS, CS) may take on the form:
  • $f_2 = CS \cdot \log_2 x + 2\,TS + \frac{TS - 2\,TS \cdot \log_2 x}{x} - \frac{2\,TS}{x^2}$   (Equation 3)
  • if the data is distributed logarithmically.
  • According to embodiments of the present invention the expression f(x, TS, CS) may preferably be in the form of a function $f_h$ that is a hybrid of the linear form $f_1$ of Equation 2 and the logarithmic form $f_2$ of Equation 3. It is generally more efficient to send out TS data (which is divided among all processor nodes) linearly. Furthermore, it is usually more efficient to send out the CS data logarithmically. The $f_2$ equation sends the entire CS data each step. The timing may be improved if an infinitesimally small piece of CS data is sent each step. Then, the time it takes to distribute the entire amount of data becomes limited mostly by the RTT. The hybrid distribution function $f_h$ may take on the form:
  • $f_h(x) = \begin{cases} CS + TS + BW_o \cdot RTT \cdot \log_2(x+1), & TS \le CS \\ \max\!\left(CS + TS,\; 2\,CS + BW_o \cdot RTT \cdot \log_2(x+1)\right), & TS > CS \end{cases}$   (Equation 4)
  • When TS<CS, it is most efficient to have the source node help distribute the CS data logarithmically, and when that is complete, divide and send out TS data to all peers. When TS>CS, it is more efficient to distribute CS and TS data in parallel. The max function in the lower expression on the right hand side of Equation 4 above describes the parallel process of sending the CS out logarithmically and the TS out linearly. The max function returns the maximum of $CS + TS$ and $2\,CS + BW_o \cdot RTT \cdot \log_2(x+1)$. The execution phase cannot start until both the CS and TS data are received by the processor nodes; hence the use of the max function.
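  • Equations 1 through 4 are straightforward to evaluate numerically, which allows block 210 to find the optimal x by direct scan instead of the closed-form derivative used below. A Python sketch of the model as reconstructed above (the function and parameter names are ours, not the patent's):

    import math

    def f_hybrid(x: int, ts: float, cs: float, bw_o: float, rtt: float) -> float:
        """Hybrid distribution metric f_h of Equation 4."""
        log_term = bw_o * rtt * math.log2(x + 1)
        if ts <= cs:
            return cs + ts + log_term
        return max(cs + ts, 2 * cs + log_term)

    def eet(x: int, et: float, ts: float, cs: float, rs: float,
            bw_o: float, bw_i: float, rtt: float) -> float:
        """Effective execution time of Equation 1 on x processor nodes."""
        return f_hybrid(x, ts, cs, bw_o, rtt) / bw_o + et / x + rs / bw_i + rtt

    def best_node_count(max_x: int, **params: float) -> int:
        """Scan x = 1..max_x for the smallest EET (block 210)."""
        return min(range(1, max_x + 1), key=lambda x: eet(x, **params))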
  • As an example, consider a distributed ray-tracing task, where all nodes need the same scene data. CS is large, and TS is effectively zero, so the equation for EET is:
  • $EET = \frac{CS}{BW_o} + RTT \cdot \log_2(x+1) + \frac{ET}{x} + \frac{RS}{BW_i} + RTT$   (Equation 5)
  • To find an equation for the optimal number of nodes on which to execute the task one may calculate the number of nodes x for which EET is smallest. To determine this value of x, one may take a derivative of Equation 5 for EET with respect to x, and then find the value of x for which EET′=0.
  • $EET' = \frac{RTT}{x \cdot \log 2} - \frac{ET}{x^2} = 0$   (Equation 6)
  • Solving Equation 6 for x yields:
  • $x = \frac{ET \cdot \log 2}{RTT}$   (Equation 7)
  • For realistic tasks, there may be a maximum possible number of task subdivisions. However, if the computed optimal number of execution nodes ends up being greater than the maximum possible subdivisions, the scheduler can simply allocate the maximum. Based on these results, small, equal-length, equal-size tasks are best suited for distributed computing. This is because the scheduler can determine the optimal number of nodes based on the total execution time and total data size, and then distribute the small tasks evenly among the chosen processor nodes.
  • Determining the optimum number of nodes x at block 210 and/or determining whether to distribute at block 212 and/or determining the allocation of tasks or sub-tasks to distributed nodes at block 216 may involve consideration of additional metrics beyond the parameters discussed above. For example, such additional metrics include determining a cost of processing the sub-tasks. Determining the cost may involve determining a cost per node for executing the tasks on x nodes. The cost per node may depend on a number of factors, including amounts of bandwidth and/or execution time used on each distributed node. In addition, the cost per node may be based on one or more of the number of nodes x, a desired quality of service, an amount of constant sized data, a cost of transmission and a cost of usage. All of these factors may be taken into account when determining the cost of distributed processing.
  • FIG. 3 illustrates numerical examples of whether to distribute or not to distribute. Specifically, the solid plot in FIG. 3 depicts a graph of EET versus x for the following values of parameters:
      • ET=5 seconds
      • TS=1 KByte
      • CS=1 KByte
      • RS=10 KBytes
      • BWo=30 KBytes/s
      • BWi=400 KBytes/s.
      • RTT=0.2 seconds.
  • As can be seen from the solid plot in FIG. 3, the processing time for x=1 processor is 5 seconds, and a minimum EET of about 1.4 seconds is obtained for distributed processing using x=17 processors. In this example, the significantly shorter EET at 17 processors makes it sensible to distribute.
  • The dashed plot in FIG. 3 depicts a graph of EET versus x for the following values of parameters:
      • ET=1 second
      • TS=100 KBytes
      • CS=0 KByte
      • RS=10 KBytes
      • BWo=30 KBytes/s
      • BWi=400 KBytes/s.
      • RTT=0.2 seconds.
  • As can be seen from the dashed plot in FIG. 3, the processing time for x=1 processor is 1 second, which is the minimum EET. Even on 100 nodes, the next-best EET value is about 3.5 seconds. In this example, the significantly longer EET for any number of processors greater than one makes it sensible not to distribute the task.
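  • Both plots can be reproduced numerically. The sketch below reuses the hybrid_distribution_cost function from the earlier sketch (sizes in KBytes, bandwidths in KBytes/s, times in seconds) and is illustrative only; x = 1 is treated as purely local execution:

```python
def eet(x, et, ts, cs, rs, bw_o, bw_i, rtt):
    """Effective execution time on x nodes (x = 1 means local execution)."""
    if x == 1:
        return et
    f = hybrid_distribution_cost(x, ts, cs, bw_o, rtt)  # Equation 4, sketched above
    return f / bw_o + et / x + rs / bw_i + rtt

# Solid plot: distribution wins.
solid = dict(et=5.0, ts=1.0, cs=1.0, rs=10.0, bw_o=30.0, bw_i=400.0, rtt=0.2)
best = min(range(1, 101), key=lambda x: eet(x, **solid))
print(best, round(eet(best, **solid), 2))  # EET ~1.42 s near x = 17-18, vs 5 s locally

# Dashed plot: local execution wins.
dashed = dict(et=1.0, ts=100.0, cs=0.0, rs=10.0, bw_o=30.0, bw_i=400.0, rtt=0.2)
print(round(eet(1, **dashed), 2),                              # 1.0 s locally ...
      round(min(eet(x, **dashed) for x in range(2, 101)), 2))  # ... vs ~3.57 s distributed
```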
  • It is possible that costs for distributed processing may vary based on who is providing the resources for processing. Owners of such resources may reasonably be expected to be compensated for making them available, and the amount of such compensation may vary from provider to provider. Thus, it is useful for users of distributed processing resources to be able to determine the cost of using such resources. FIG. 4 depicts one possible model 300 for determining the costs of using remotely distributed resources. In the model 300, user nodes 302 and provider nodes 304 are connected over a network 306. Owners of the provider nodes 304 may post their available resources and corresponding costs in a look-up table 308 that is accessible over the network 306. By way of example, the look-up table 308 may be stored in the memory of a node connected to the network 306. The user nodes 302 may find out what processing resources are available, and at what cost, by consulting the look-up table 308.
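  • A toy sketch of such a look-up-table query (the table fields and provider names are hypothetical):

```python
# Hypothetical provider table, as might be posted at a node on the network.
PROVIDER_TABLE = [
    {"node": "provider-a", "cpu_cycles_per_s": 4e9, "price_per_s": 0.02},
    {"node": "provider-b", "cpu_cycles_per_s": 2e9, "price_per_s": 0.01},
    {"node": "provider-c", "cpu_cycles_per_s": 8e9, "price_per_s": 0.05},
]

def cheapest_providers(table, count):
    """Select the `count` lowest-priced providers from the posted table."""
    return sorted(table, key=lambda entry: entry["price_per_s"])[:count]

print([p["node"] for p in cheapest_providers(PROVIDER_TABLE, 2)])
# ['provider-b', 'provider-a']
```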
  • In an alternative model 400 shown in FIG. 5, user nodes 402 and provider nodes 404 may interact via a network 406 with a liquid capital market 408. The user nodes 402 may submit task requests 410 to the liquid capital market 408 for listing. The liquid capital market 408 may list each task request 410, e.g., on a website that is accessible to other user nodes 402 and provider nodes 404. Each task request 410 may include information about a task that the user node 402 wishes to distribute amongst available providers. Such information may include various parameters discussed above. The task requests 410 may also include a price that an operator of the user node is willing to pay for use of resources on provider nodes. Such a price may be stated, e.g., in terms of currency or resources the node operator is willing to relinquish in exchange for use of the provider node's resources. Provider nodes 404 may submit bids 412 on task requests 410 listed on the liquid capital market 408. User node operators and provider node operators may then come to an agreement on resources to be provided and the price for those resources through a process of open and competitive bidding.
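  • The bid-matching step of such a market might, purely for illustration, be sketched as follows (a real market would also involve listing, negotiation, and settlement):

```python
def award_task(listed_price, bids):
    """Accept the lowest bid that does not exceed the user's listed price.

    bids -- list of (provider, bid_price) tuples submitted to the market
    """
    provider, best_price = min(bids, key=lambda bid: bid[1])
    return provider if best_price <= listed_price else None

print(award_task(5.00, [("provider-a", 6.50), ("provider-b", 4.25)]))  # provider-b
```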
  • In alternative embodiments, consideration of additional metrics may involve determining whether a proposed distribution of sub-processing tasks is consistent with one or more user-defined policies. For example, if a user wants to execute sub-tasks only in the state of California, this user-defined policy may be applied as a resource selection metric to cull resources offered from other states.
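  • In the simplest illustrative case, such policy-based culling reduces to filtering candidate nodes through a set of predicates (the node attributes here are hypothetical):

```python
def apply_policies(candidates, policies):
    """Cull candidate nodes that violate any user-defined policy predicate."""
    return [node for node in candidates
            if all(policy(node) for policy in policies)]

# Example policy from the text: only execute sub-tasks in California.
def in_california(node):
    return node.get("state") == "CA"

nodes = [{"id": 1, "state": "CA"}, {"id": 2, "state": "NV"}]
print(apply_policies(nodes, [in_california]))  # [{'id': 1, 'state': 'CA'}]
```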
  • Additional considerations may enter into the determination of how to distribute processing tasks at blocks 216 and 218. For example, data transmission and processing times may be greatly reduced if code and/or data needed for performing a given task or sub-task are already present in the cache of a distributed node. As used herein, the term cache generally refers to a region of high-speed storage (e.g., static random access memory (SRAM)) associated with a particular processor or node. A cache is commonly used to store data and/or instructions that are frequently used. To facilitate distribution of tasks it may therefore be desirable to determine the contents of caches associated with one or more distributed nodes to determine whether code and/or data needed for processing a task or sub-task are present. Tasks and/or sub-tasks may then be preferentially distributed at block 216 or block 218 to nodes having the needed code and/or data.
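  • One illustrative way to express such cache-aware preference is to rank candidate nodes by how much of the needed code and data their caches already hold (node and file names hypothetical):

```python
def rank_by_cache_affinity(nodes, needed_items):
    """Prefer nodes whose caches already hold more of the needed code/data."""
    def cache_hits(node):
        return sum(1 for item in needed_items if item in node["cache"])
    return sorted(nodes, key=cache_hits, reverse=True)

nodes = [{"id": "n1", "cache": {"scene.bin"}},
         {"id": "n2", "cache": {"scene.bin", "raytrace.elf"}}]
print([n["id"] for n in rank_by_cache_affinity(nodes, {"scene.bin", "raytrace.elf"})])
# ['n2', 'n1']
```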
  • While the above is a complete description of the preferred embodiment of the present invention, it is possible to use various alternatives, modifications and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article “A”, or “An” refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase “means for.”

Claims (36)

1. A method for processing of processor executable tasks, comprising:
generating a task with a local node, wherein the local node includes one or more processor units operably coupled to one or more distributed nodes;
dividing the task into one or more sub-tasks;
determining an optimum number of nodes x on which to process the one or more sub-tasks, wherein x is based at least partly on parameters relating to processing the sub-tasks at nodes accessible by the local node; and
if x is greater than one, making a determination, based on the value of x, as to whether to (1) execute the task at the local node with the processor unit, (2) distribute the task among two or more local node processors, (3) distribute the task to one or more of the distributed nodes accessible to the local node over a LAN, or (4) distribute the task to one or more of the distributed nodes that are accessible to the local node over a WAN; and
implementing (1), (2), (3), or (4) with the local node according to the determination.
2. The method of claim 1, further comprising distributing the sub-tasks to x nodes for processing.
3. The method of claim 2, further comprising retrieving output data for the sub-tasks from the x nodes.
4. The method of claim 1, further comprising collecting the parameters relating to processing the sub-tasks.
5. The method of claim 4, further comprising storing the parameters in a data storage medium that is accessible by the local node.
6. The method of claim 4, wherein the parameters include one or more of a number of available nodes, data transfer rates between the local node and one or more distributed nodes, round trip times between the local node and one or more distributed nodes, a number of processor cycles for each sub-task, an amount of memory space required for each sub-task, an amount of storage space required for each sub-task, and a network bandwidth available for transmitting data related to the sub-tasks.
7. The method of claim 1 wherein determining the optimum number of nodes x is based on a minimum effective execution time (EET) for the task.
8. The method of claim 7 wherein the EET is calculated by a formula of the type:
EET \approx \frac{f(x, TS, CS)}{BW_o} + \frac{ET}{x} + \frac{RS}{BW_i} + RTT,
where ET represents an execution time of all sub-tasks on one node,
TS represents a total size of data which is divided among the sub-tasks,
CS represents a constant sized data needed by each sub-task,
RS represents a total size of output data produced by the tasks,
BWo, BWi respectively represent outgoing and incoming bandwidths for all nodes, and
RTT represents a round-trip message time from the local node to a distributed node.
9. The method of claim 1 wherein x is based on an execution time of all tasks on one node (ET) and a round-trip message time (RTT) from the local node to a distributed node.
10. The method of claim 9 wherein the determination of x is based on a ratio of ET to RTT.
11. The method of claim 1 wherein determining the optimum number of nodes x includes a consideration of additional metrics.
12. The method of claim 11 wherein the consideration of the additional metrics include determining a cost of processing the sub-tasks.
13. The method of claim 12 wherein determining the cost of processing the sub-tasks includes determining a cost per node.
14. The method of claim 13 wherein determining the cost per node includes determining an amount of bandwidth or execution time used on each distributed node.
15. The method of claim 13 wherein determining the cost per node is based on one or more of the number of nodes x, a desired quality of service, an amount of constant sized data, a cost of transmission and a cost of usage.
16. The method of claim 13 wherein determining a cost of processing the sub-tasks includes obtaining a cost from a look-up table of costs from providers.
17. The method of claim 13 wherein determining a cost of processing the sub-tasks includes the use of a liquid capital market in which providers bid for customers.
18. The method of claim 11 wherein the additional metrics include one or more user defined policies.
19. The method of claim 1 wherein each sub-task is characterized by a sub-task type, wherein the sub-task type distinguishes one type of sub-task from another and/or provides information for determining the optimum number of nodes x.
20. The method of claim 1, further comprising determining the contents of caches associated with one or more distributed nodes.
21. The method of claim 20, further comprising preferentially distributing sub-tasks to nodes having code and/or data needed for processing the sub-task.
22. A system for distributing processing work amongst a plurality of distributed nodes, the system comprising:
a local node connected to the plurality of distributed nodes, wherein the local node includes one or more processor units;
processor executable instructions embodied in a processor readable storage medium for execution by the one or more processor units, the instructions including:
one or more instructions for generating a task with the local node;
one or more instructions for dividing the task into one or more sub-tasks;
one or more instructions for determining an optimum number of nodes x on which to process the one or more sub-tasks, wherein x is based at least partly on parameters relating to processing the sub-tasks at nodes accessible by the local node; and
one or more instructions for determining, if x is greater than 1, based on the value of x, whether to (1) execute the task at the local node with the processor unit, (2) distribute the task among two or more local node processors, (3) distribute the task to one or more of the distributed nodes that are accessible to the local node over a LAN, or (4) distribute the task to one or more of the distributed nodes that are accessible to the local node over a WAN; and
one or more instructions for implementing (1), (2), (3), or (4) with the local node according to the determination.
23. The system of claim 22, further comprising a memory coupled to the local node.
24. The system of claim 23, further comprising information relating to the parameters stored in the memory.
25. The system of claim 22 wherein the parameters include one or more of a number of available nodes, data transfer rates between the local node and one or more other nodes, round trip times between the local node and one or more other nodes, a number of processor cycles for each sub-task, an amount of memory space required for each sub-task, an amount of storage space required for each sub-task, and a network bandwidth available for transmitting data related to the sub-tasks.
26. The system of claim 22, wherein the processor executable instructions further include an instruction for collecting the parameters from the distributed nodes.
27. The system of claim 26, further comprising a memory coupled to the local node, the processor executable instructions including one or more instructions for storing the parameters in the memory.
28. The system of claim 27, wherein the processor executable instructions include one or more instructions for updating the parameters in the memory.
29. The system of claim 22, wherein the processor executable instructions include one or more instructions for determining x based on a minimum effective execution time (EET).
30. The system of claim 29 wherein the EET is calculated by a formula of the type:
EET \approx \frac{f(x, TS, CS)}{BW_o} + \frac{ET}{x} + \frac{RS}{BW_i} + RTT,
where ET represents an execution time of all tasks on one node,
TS represents a total size of data which is divided among the sub-tasks,
CS represents a constant sized data needed by each sub-task,
RS represents a total size of output data produced by the tasks,
BWo, BWi, respectively represent outgoing and incoming bandwidths for all processor nodes, and
RTT represents a round-trip message time from the local node to a node accessible by the local node.
31. The system of claim 22 wherein x is based on an execution time of all tasks on one node (ET) and a round-trip message time (RTT) from the local node to a node accessible by the local node.
32. The system of claim 31 wherein the determination of x is based on a ratio of ET to RTT.
33. The system of claim 22 wherein x is based on one or more additional metrics.
34. The system of claim 33 wherein the additional metrics include a cost of processing the sub-tasks and/or one or more user-defined policies.
35. The system of claim 22 wherein each sub-task is characterized by a sub-task type, wherein the sub-task type distinguishes one type of sub-task from another and/or provides information for determining the optimum number of nodes x.
36. A processor readable storage medium having embodied therein processor executable instructions for implementing a method for processing of processor executable tasks, the instructions including:
one or more instructions for generating a task with a local node, wherein the local node includes one or more processor units;
one or more instructions for dividing the task into one or more sub-tasks;
one or more instructions for determining an optimum number of nodes x on which to process the one or more sub-tasks, wherein x is based at least partly on parameters relating to processing the sub-tasks at nodes accessible by the local node; and
one or more instructions for determining, if x is greater than 1, based on the value of x, whether to (1) execute the task at the local node with the processor unit, (2) distribute the task among two or more local node processors, (3) distribute the task to one or more of the distributed nodes that are accessible to the local node over a LAN, or (4) distribute the task to one or more of the distributed nodes that are accessible to the local node over a WAN; and
one or more instructions for implementing (1), (2), (3), or (4) with the local node according to the determination.
US12/786,250 2006-07-21 2010-05-24 Sub-task processor distribution scheduling Abandoned US20100235845A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/786,250 US20100235845A1 (en) 2006-07-21 2010-05-24 Sub-task processor distribution scheduling

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/459,301 US7730119B2 (en) 2006-07-21 2006-07-21 Sub-task processor distribution scheduling
US12/786,250 US20100235845A1 (en) 2006-07-21 2010-05-24 Sub-task processor distribution scheduling

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/459,301 Continuation US7730119B2 (en) 2006-07-21 2006-07-21 Sub-task processor distribution scheduling

Publications (1)

Publication Number Publication Date
US20100235845A1 true US20100235845A1 (en) 2010-09-16

Family

ID=38972678

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/459,301 Active 2029-03-02 US7730119B2 (en) 2006-07-21 2006-07-21 Sub-task processor distribution scheduling
US12/786,250 Abandoned US20100235845A1 (en) 2006-07-21 2010-05-24 Sub-task processor distribution scheduling

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/459,301 Active 2029-03-02 US7730119B2 (en) 2006-07-21 2006-07-21 Sub-task processor distribution scheduling

Country Status (2)

Country Link
US (2) US7730119B2 (en)
JP (1) JP4421637B2 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110276870A1 (en) * 2007-10-29 2011-11-10 Microsoft Corporation Calculation of spreadsheet data
WO2013021223A1 (en) * 2011-08-05 2013-02-14 Intel Corporation Method and system for work partitioning between processors with work demand feedback
US8589939B2 (en) 2011-03-03 2013-11-19 International Business Machines Corporation Composite contention aware task scheduling
US20140109102A1 (en) * 2012-10-12 2014-04-17 Nvidia Corporation Technique for improving performance in multi-threaded processing units
US20140245319A1 (en) * 2013-02-27 2014-08-28 Greenbutton Limited Method for enabling an application to run on a cloud computing system
US9098329B1 (en) * 2011-08-30 2015-08-04 Amazon Technologies, Inc. Managing workflows
US20160004570A1 (en) * 2014-07-01 2016-01-07 Samsung Electronics Co., Ltd. Parallelization method and electronic device
US20170093731A1 (en) * 2015-09-25 2017-03-30 Intel Corporation Technologies for network round-trip time estimation
WO2017106619A1 (en) * 2015-12-18 2017-06-22 Interdigital Patent Holdings, Inc. Systems and methods associated with edge computing
CN108153589A (en) * 2016-12-06 2018-06-12 国际商业机器公司 For the method and system of the data processing in the processing arrangement of multithreading
CN108572863A (en) * 2017-03-13 2018-09-25 国家新闻出版广电总局广播电视卫星直播管理中心 Distributed task dispatching system and method
WO2021034024A1 (en) * 2019-08-16 2021-02-25 Samsung Electronics Co., Ltd. Electronic apparatus and method for controlling thereof
CN114020442A (en) * 2022-01-04 2022-02-08 连连宝(杭州)信息技术有限公司 Task processing method and device, electronic equipment and readable storage medium
US20220179687A1 (en) * 2020-12-03 2022-06-09 Fujitsu Limited Information processing apparatus and job scheduling method

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7730119B2 (en) * 2006-07-21 2010-06-01 Sony Computer Entertainment Inc. Sub-task processor distribution scheduling
US8112751B2 (en) * 2007-03-01 2012-02-07 Microsoft Corporation Executing tasks through multiple processors that process different portions of a replicable task
US20080320088A1 (en) * 2007-06-19 2008-12-25 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Helping valuable message content pass apparent message filtering
WO2009080108A1 (en) * 2007-12-20 2009-07-02 Phonak Ag Hearing system with joint task scheduling
US9043801B2 (en) * 2008-01-15 2015-05-26 International Business Machines Corporation Two-tiered dynamic load balancing using sets of distributed thread pools
JP5218548B2 (en) * 2008-03-13 2013-06-26 富士通株式会社 Job allocation apparatus, control program and control method for job allocation apparatus
KR20100018289A (en) * 2008-08-06 2010-02-17 삼성전자주식회사 System and method for simulating multi-tasking performance
US8201176B2 (en) * 2008-08-06 2012-06-12 International Business Machines Corporation Detecting the starting and ending of a task when thread pooling is employed
US9910708B2 (en) * 2008-08-28 2018-03-06 Red Hat, Inc. Promotion of calculations to cloud-based computation resources
US8661129B2 (en) * 2008-11-05 2014-02-25 Xerox Corporation System and method for decentralized job scheduling and distributed execution in a network of multifunction devices
US9270783B2 (en) 2008-12-06 2016-02-23 International Business Machines Corporation System and method for photorealistic imaging workload distribution
US20130302644A1 (en) * 2009-02-20 2013-11-14 Nucor Corporation Hot rolled thin cast strip product and method for making the same
US8266289B2 (en) * 2009-04-23 2012-09-11 Microsoft Corporation Concurrent data processing in a distributed system
US20120204183A1 (en) * 2009-09-02 2012-08-09 Plurality Ltd. Associative distribution units for a high flowrate synchronizer/schedule
JP2011076513A (en) * 2009-10-01 2011-04-14 Olympus Corp Distributed processing system
CN102043673B (en) * 2009-10-21 2015-06-03 Sap欧洲公司 Calibration of resource allocation during parallel processing
US8504400B2 (en) * 2010-03-24 2013-08-06 International Business Machines Corporation Dynamically optimized distributed cloud computing-based business process management (BPM) system
US20110235592A1 (en) * 2010-03-26 2011-09-29 Qualcomm Incorporated Network resource leasing
EP2580633A4 (en) * 2010-06-11 2015-12-30 Optuminsight Inc Apparatuses and methods for parallel analytics
JPWO2012023175A1 (en) * 2010-08-17 2013-10-28 富士通株式会社 Parallel processing control program, information processing apparatus, and parallel processing control method
JP5598229B2 (en) * 2010-10-01 2014-10-01 富士ゼロックス株式会社 Job distributed processing system, information processing apparatus, and program
US8768748B2 (en) * 2010-12-17 2014-07-01 Verizon Patent And Licensing Inc. Resource manager
JP6138701B2 (en) * 2011-03-04 2017-05-31 富士通株式会社 Distributed calculation method and distributed calculation system
US8949305B1 (en) * 2011-07-15 2015-02-03 Scale Computing, Inc. Distributed dynamic system configuration
US10257109B2 (en) 2012-01-18 2019-04-09 International Business Machines Corporation Cloud-based content management system
US8902774B1 (en) * 2012-03-28 2014-12-02 Amdocs Software Systems Limited System, method, and computer program for distributing telecommunications resources
US8869157B2 (en) * 2012-06-21 2014-10-21 Breakingpoint Systems, Inc. Systems and methods for distributing tasks and/or processing recources in a system
US9367357B2 (en) * 2013-01-18 2016-06-14 Nec Corporation Simultaneous scheduling of processes and offloading computation on many-core coprocessors
US9152467B2 (en) * 2013-01-18 2015-10-06 Nec Laboratories America, Inc. Method for simultaneous scheduling of processes and offloading computation on many-core coprocessors
US10180880B2 (en) * 2013-07-31 2019-01-15 International Business Machines Corporation Adaptive rebuilding rates based on sampling and inference
US9565252B2 (en) * 2013-07-31 2017-02-07 International Business Machines Corporation Distributed storage network with replication control and methods for use therewith
WO2015071975A1 (en) * 2013-11-13 2015-05-21 株式会社日立製作所 Application- and data-distribution management method, application- and data-distribution management system, and storage medium
US10108686B2 (en) * 2014-02-19 2018-10-23 Snowflake Computing Inc. Implementation of semi-structured data as a first-class database element
CN103905337B (en) 2014-03-31 2018-01-23 华为技术有限公司 A kind of processing unit of Internet resources, method and system
GB2513779B (en) * 2014-08-14 2015-05-13 Imp Io Ltd A method and system for scalable job processing
US10509683B2 (en) * 2015-09-25 2019-12-17 Microsoft Technology Licensing, Llc Modeling resource usage for a job
US10476322B2 (en) * 2016-06-27 2019-11-12 Abb Schweiz Ag Electrical machine
US10078468B2 (en) 2016-08-18 2018-09-18 International Business Machines Corporation Slice migration in a dispersed storage network
JP6904169B2 (en) 2017-08-30 2021-07-14 富士通株式会社 Task deployment program, task deployment method, and task deployment device
US11354106B2 (en) 2018-02-23 2022-06-07 Idac Holdings, Inc. Device-initiated service deployment through mobile application packaging
US10606636B2 (en) 2018-07-30 2020-03-31 Lendingclub Corporation Automated predictions for not-yet-completed jobs
US10866837B2 (en) 2018-07-30 2020-12-15 Lendingclub Corporation Distributed job framework and task queue
CN110532077B (en) * 2019-08-22 2021-12-07 腾讯科技(深圳)有限公司 Task processing method and device and storage medium
CN112395085B (en) * 2020-11-05 2022-10-25 深圳市中博科创信息技术有限公司 HDFS-based distributed relational database scheduling method
CN113434273B (en) * 2021-06-29 2022-12-23 平安科技(深圳)有限公司 Data processing method, device, system and storage medium

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5031089A (en) * 1988-12-30 1991-07-09 United States Of America As Represented By The Administrator, National Aeronautics And Space Administration Dynamic resource allocation scheme for distributed heterogeneous computer systems
US5414849A (en) * 1992-10-30 1995-05-09 Hitachi, Ltd. Evaluating method of data division patterns and a program execution time for a distributed memory parallel computer system, and parallel program producing method using such an evaluating method
US6009455A (en) * 1998-04-20 1999-12-28 Doyle; John F. Distributed computation utilizing idle networked computers
US6112225A (en) * 1998-03-30 2000-08-29 International Business Machines Corporation Task distribution processing system and the method for subscribing computers to perform computing tasks during idle time
US20030005068A1 (en) * 2000-12-28 2003-01-02 Nickel Ronald H. System and method for creating a virtual supercomputer using computers working collaboratively in parallel and uses for the same
US20030177240A1 (en) * 2001-12-04 2003-09-18 Powerllel Corporation Parallel computing system, method and architecture
US20030193683A1 (en) * 1998-08-27 2003-10-16 Margaret Motamed Printing method and apparatus having multiple raster image processors
US20030237084A1 (en) * 2002-06-20 2003-12-25 Steven Neiman System and method for dividing computations
US20050131893A1 (en) * 2003-12-15 2005-06-16 Sap Aktiengesellschaft Database early parallelism method and system
US20060064689A1 (en) * 2004-08-31 2006-03-23 Chao Zhang Systems and methods for assigning tasks to derived timers of various resolutions in real-time systems to maximize timer usage
US20060070078A1 (en) * 2004-08-23 2006-03-30 Dweck Jay S Systems and methods to allocate application tasks to a pool of processing machines
US20070088828A1 (en) * 2005-10-18 2007-04-19 International Business Machines Corporation System, method and program product for executing an application
US20070288638A1 (en) * 2006-04-03 2007-12-13 British Columbia, University Of Methods and distributed systems for data location and delivery
US20080021987A1 (en) * 2006-07-21 2008-01-24 Sony Computer Entertainment Inc. Sub-task processor distribution scheduling
US7346906B2 (en) * 2002-07-09 2008-03-18 International Business Machines Corporation Workload management in a computing environment
US7376693B2 (en) * 2002-02-08 2008-05-20 Jp Morgan Chase & Company System architecture for distributed computing and method of using the system
US7467180B2 (en) * 2003-05-29 2008-12-16 International Business Machines Corporation Automatically segmenting and populating a distributed computing problem
US8463971B2 (en) * 2005-08-22 2013-06-11 Oracle America Inc. Approach for distributing interrupts from high-interrupt load devices

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2800884B2 (en) 1995-10-27 1998-09-21 日本電気株式会社 Semiconductor device having lateral DSA power MOSFET
JPH1139271A (en) 1997-07-15 1999-02-12 Toshiba Corp Library system and method for assigning library
JP2001325041A (en) 2000-05-12 2001-11-22 Toyo Eng Corp Method for utilizing computer resource and system for the same
JP2004038226A (en) 2002-06-28 2004-02-05 Hitachi Ltd Pc cluster and its intermediate software

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5031089A (en) * 1988-12-30 1991-07-09 United States Of America As Represented By The Administrator, National Aeronautics And Space Administration Dynamic resource allocation scheme for distributed heterogeneous computer systems
US5414849A (en) * 1992-10-30 1995-05-09 Hitachi, Ltd. Evaluating method of data division patterns and a program execution time for a distributed memory parallel computer system, and parallel program producing method using such an evaluating method
US6112225A (en) * 1998-03-30 2000-08-29 International Business Machines Corporation Task distribution processing system and the method for subscribing computers to perform computing tasks during idle time
US6009455A (en) * 1998-04-20 1999-12-28 Doyle; John F. Distributed computation utilizing idle networked computers
US20030193683A1 (en) * 1998-08-27 2003-10-16 Margaret Motamed Printing method and apparatus having multiple raster image processors
US20030005068A1 (en) * 2000-12-28 2003-01-02 Nickel Ronald H. System and method for creating a virtual supercomputer using computers working collaboratively in parallel and uses for the same
US20030177240A1 (en) * 2001-12-04 2003-09-18 Powerllel Corporation Parallel computing system, method and architecture
US7376693B2 (en) * 2002-02-08 2008-05-20 Jp Morgan Chase & Company System architecture for distributed computing and method of using the system
US20070260669A1 (en) * 2002-06-20 2007-11-08 Steven Neiman Method for dividing computations
US20030237084A1 (en) * 2002-06-20 2003-12-25 Steven Neiman System and method for dividing computations
US7346906B2 (en) * 2002-07-09 2008-03-18 International Business Machines Corporation Workload management in a computing environment
US7467180B2 (en) * 2003-05-29 2008-12-16 International Business Machines Corporation Automatically segmenting and populating a distributed computing problem
US20050131893A1 (en) * 2003-12-15 2005-06-16 Sap Aktiengesellschaft Database early parallelism method and system
US20060070078A1 (en) * 2004-08-23 2006-03-30 Dweck Jay S Systems and methods to allocate application tasks to a pool of processing machines
US20060064689A1 (en) * 2004-08-31 2006-03-23 Chao Zhang Systems and methods for assigning tasks to derived timers of various resolutions in real-time systems to maximize timer usage
US8463971B2 (en) * 2005-08-22 2013-06-11 Oracle America Inc. Approach for distributing interrupts from high-interrupt load devices
US20070088828A1 (en) * 2005-10-18 2007-04-19 International Business Machines Corporation System, method and program product for executing an application
US20070288638A1 (en) * 2006-04-03 2007-12-13 British Columbia, University Of Methods and distributed systems for data location and delivery
US20080021987A1 (en) * 2006-07-21 2008-01-24 Sony Computer Entertainment Inc. Sub-task processor distribution scheduling

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110276870A1 (en) * 2007-10-29 2011-11-10 Microsoft Corporation Calculation of spreadsheet data
US8589938B2 (en) 2011-03-03 2013-11-19 International Business Machines Corporation Composite contention aware task scheduling
US8589939B2 (en) 2011-03-03 2013-11-19 International Business Machines Corporation Composite contention aware task scheduling
US9262230B2 (en) 2011-08-05 2016-02-16 Intel Corporation Method and system for work partitioning between processors with work demand feedback
CN103748559A (en) * 2011-08-05 2014-04-23 英特尔公司 Method and system for work partitioning between processors with work demand feedback
EP2740034A4 (en) * 2011-08-05 2016-06-22 Intel Corp Method and system for work partitioning between processors with work demand feedback
WO2013021223A1 (en) * 2011-08-05 2013-02-14 Intel Corporation Method and system for work partitioning between processors with work demand feedback
US9098329B1 (en) * 2011-08-30 2015-08-04 Amazon Technologies, Inc. Managing workflows
US9483747B2 (en) * 2011-08-30 2016-11-01 Amazon Technologies, Inc. Managing workflows
US20140109102A1 (en) * 2012-10-12 2014-04-17 Nvidia Corporation Technique for improving performance in multi-threaded processing units
US10095526B2 (en) * 2012-10-12 2018-10-09 Nvidia Corporation Technique for improving performance in multi-threaded processing units
US20140245319A1 (en) * 2013-02-27 2014-08-28 Greenbutton Limited Method for enabling an application to run on a cloud computing system
US20160004570A1 (en) * 2014-07-01 2016-01-07 Samsung Electronics Co., Ltd. Parallelization method and electronic device
US9727382B2 (en) * 2014-07-01 2017-08-08 Samsung Electronics Co., Ltd. Parallelization method and electronic device based on profiling information
US20170093731A1 (en) * 2015-09-25 2017-03-30 Intel Corporation Technologies for network round-trip time estimation
US10554568B2 (en) * 2015-09-25 2020-02-04 Intel Corporation Technologies for network round-trip time estimation
WO2017106619A1 (en) * 2015-12-18 2017-06-22 Interdigital Patent Holdings, Inc. Systems and methods associated with edge computing
CN108153589A (en) * 2016-12-06 2018-06-12 国际商业机器公司 For the method and system of the data processing in the processing arrangement of multithreading
US10387207B2 (en) * 2016-12-06 2019-08-20 International Business Machines Corporation Data processing
US10394609B2 (en) * 2016-12-06 2019-08-27 International Business Machines Corporation Data processing
US10915368B2 (en) * 2016-12-06 2021-02-09 International Business Machines Corporation Data processing
US11036558B2 (en) * 2016-12-06 2021-06-15 International Business Machines Corporation Data processing
CN108572863A (en) * 2017-03-13 2018-09-25 国家新闻出版广电总局广播电视卫星直播管理中心 Distributed task dispatching system and method
WO2021034024A1 (en) * 2019-08-16 2021-02-25 Samsung Electronics Co., Ltd. Electronic apparatus and method for controlling thereof
US11609793B2 (en) 2019-08-16 2023-03-21 Samsung Electronics Co., Ltd. Electronic apparatus determining GPUs in a network for effective data learning and method for controlling thereof
US20220179687A1 (en) * 2020-12-03 2022-06-09 Fujitsu Limited Information processing apparatus and job scheduling method
CN114020442A (en) * 2022-01-04 2022-02-08 连连宝(杭州)信息技术有限公司 Task processing method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
JP4421637B2 (en) 2010-02-24
US20080021987A1 (en) 2008-01-24
JP2008027442A (en) 2008-02-07
US7730119B2 (en) 2010-06-01

Similar Documents

Publication Publication Date Title
US7730119B2 (en) Sub-task processor distribution scheduling
Purohit et al. Improving online algorithms via ML predictions
US11010188B1 (en) Simulated data object storage using on-demand computation of data objects
Nayak et al. Deadline sensitive lease scheduling in cloud computing environment using AHP
US7793290B2 (en) Grip application acceleration by executing grid application based on application usage history prior to user request for application execution
US20200137151A1 (en) Load balancing engine, client, distributed computing system, and load balancing method
US9201690B2 (en) Resource aware scheduling in a distributed computing environment
Di et al. Dynamic optimization of multiattribute resource allocation in self-organizing clouds
Doulamis et al. Fair scheduling algorithms in grids
US20150242234A1 (en) Realtime Optimization Of Compute Infrastructure In A Virtualized Environment
Kumar et al. Improving online algorithms via ML predictions
US7610381B2 (en) System and method for evaluating a capacity of a streaming media server for supporting a workload
US20200104177A1 (en) Resource allocation system, management device, method, and program
US10666570B2 (en) Computing infrastructure resource-workload management methods and apparatuses
US9588799B1 (en) Managing test services in a distributed production service environment
US7886055B1 (en) Allocating resources in a system having multiple tiers
Liu et al. QoS-oriented Web Service Framework by Mixed Programming Techniques.
CN115167984B (en) Virtual machine load balancing placement method considering physical resource competition based on cloud computing platform
Patel et al. An improved approach for load balancing among heterogeneous resources in computational grids
Tsenos et al. Amesos: A scalable and elastic framework for latency sensitive streaming pipelines
CN115421930A (en) Task processing method, system, device, equipment and computer readable storage medium
US7739686B2 (en) Grid managed application branching based on priority data representing a history of executing a task with secondary applications
Zedan et al. Load balancing based active monitoring load balancer in cloud computing
Babaioff et al. Truthful Online Scheduling of Cloud Workloads under Uncertainty
Simmons et al. Dynamic provisioning of resources in data centers

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: SONY INTERACTIVE ENTERTAINMENT INC., JAPAN

Free format text: CHANGE OF NAME;ASSIGNOR:SONY COMPUTER ENTERTAINMENT INC.;REEL/FRAME:039239/0343

Effective date: 20160401