US20020124209A1 - Method and apparatus for saving data used in error analysis - Google Patents

Method and apparatus for saving data used in error analysis

Info

Publication number
US20020124209A1
Authority
US
United States
Prior art keywords
processing system
data processing
data
power independent
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/798,169
Other versions
US7010726B2 (en)
Inventor
Robert Faust
Kevin Kehne
Sayileela Nulu
Gary Ruzek
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Twitter Inc
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US09/798,169 (patent US7010726B2)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: KEHNE, KEVIN GENE, FAUST, ROBERT ALLAN, NULU, SAYILEELA, RUZEK, GARY LEE
Publication of US20020124209A1
Application granted
Publication of US7010726B2
Assigned to TWITTER, INC. reassignment TWITTER, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. reassignment MORGAN STANLEY SENIOR FUNDING, INC. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TWITTER, INC.
Adjusted expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766: Error or fault reporting or storing
    • G06F11/0787: Storage of error reports, e.g. persistent data storage, storage using memory protection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706: Error or fault processing not based on redundancy, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0712: Error or fault processing not based on redundancy, the processing taking place in a virtual computing platform, e.g. logically partitioned systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766: Error or fault reporting or storing
    • G06F11/0778: Dumping, i.e. gathering error/state information after a fault for later diagnosis

Definitions

  • the present invention relates generally to an improved data processing system and in particular to a method and apparatus for managing data. Still more particularly, the present invention provides a method and apparatus for saving data used in error analysis within a data processing system.
  • a logical partitioning option (LPAR) within a data processing system allows multiple copies of a single operating system (OS) or multiple heterogeneous operating systems to be simultaneously run on a single data processing system platform.
  • a partition, within which an operating system image runs, is assigned a non-overlapping sub-set of the platform's resources.
  • These platform allocable resources include one or more architecturally distinct processors with their interrupt management area, regions of system memory, and I/O adapter bus slots.
  • the partition's resources are represented by its own open firmware device tree to the OS image.
  • Each distinct OS or image of an OS running within the platform is protected from the others such that software errors on one logical partition cannot affect the correct operation of any of the other partitions. This is provided by allocating a disjoint set of platform resources to be directly managed by each OS image and by providing mechanisms for ensuring that the various images cannot control any resources that have not been allocated to them. Furthermore, software errors in the control of an OS's allocated resources are prevented from affecting the resources of any other image. Thus, each image of the OS (or each different OS) directly controls a distinct set of allocable resources within the platform.
  • The configuration of these different partitions is typically managed through a terminal, such as a hardware system console (HSC).
  • HSC hardware system console
  • These terminals use objects, also referred to as profiles, that are defined and modified in the HSC.
  • The profiles are used to configure LPARs within the data processing system.
  • Multiple HSCs may be present and used for maintaining and configuring LPARs in the data processing system.
  • These profiles used to configure the data processing system in LPARs are often required to be accessible to any HSC that is in communication with the data processing system. Maintaining profiles between these HSCs is often difficult and requires processes for maintaining synchronization of the profiles at each HSC.
  • LPARs are often assigned processors and other hardware. For example, one LPAR may be assigned two processors, while another LPAR may be assigned three processors.
  • SP service processor
  • This type of data is also referred to as dump data.
  • This dump data and other information, such as error logs, are typically stored in a non-volatile random access memory (NVRAM) for retrieval at a later time.
  • NVRAM non-volatile random access memory
  • This type of memory has a number of limitations. As multiprocessor systems have grown larger, the amount of data that is stored has outgrown the available NVRAM space. Additionally, this type of memory may be easily corrupted because many software components may access it during normal operation. Further, the loss of battery power will cause the contents of the memory to be lost.
  • the present invention provides a method, apparatus, and computer implemented instructions for saving data in a logically partitioned data processing system.
  • An error is detected in the logically partitioned data processing system.
  • Data needed for error analysis of the error is saved in a power independent memory for a service processor.
  • FIG. 1 is a pictorial representation of a distributed data processing system in which the present invention may be implemented.
  • FIG. 2 is a block diagram of a data processing system in accordance with the present invention.
  • FIG. 3 is a block diagram of a data processing system, which may be implemented as a logically partitioned server.
  • FIG. 4 is a diagram illustrating a service processor and a storage device in accordance with a preferred embodiment of the present invention.
  • FIG. 5 is a flowchart of a process used for saving data in accordance with a preferred embodiment of the present invention.
  • FIG. 6 is a flowchart of a process used for analyzing stored data in accordance with a preferred embodiment of the present invention.
  • With reference to FIG. 1, a pictorial representation of a distributed data processing system is depicted in which the present invention may be implemented.
  • Distributed data processing system 100 is a network of computers in which the present invention may be implemented.
  • Distributed data processing system 100 contains network 102, which is the medium used to provide communications links between various devices and computers connected within distributed data processing system 100.
  • Network 102 may include permanent connections, such as wire or fiber optic cables, or temporary connections made through telephone connections.
  • Server 104 is connected to hardware system console 150.
  • Server 104 is also connected to network 102, along with storage unit 106.
  • Clients 108, 110 and 112 are also connected to network 102.
  • These clients, 108, 110 and 112, may be, for example, personal computers or network computers.
  • A network computer is any computer coupled to a network that receives a program or other application from another computer coupled to the network 102.
  • Server 104 is a logically partitioned platform and provides data, such as boot files, operating system images and applications, to clients 108-112.
  • Hardware system console 150 may be a laptop computer and is used to display messages to an operator from each operating system image running on server 104, as well as to send input information, received from the operator, to server 104.
  • Clients 108, 110 and 112 are clients to server 104.
  • Distributed data processing system 100 may include additional servers, clients, and other devices not shown.
  • Distributed data processing system 100 also includes printers 114, 116 and 118.
  • A client, such as client 110, may print directly to printer 114.
  • Clients, such as client 108 and client 112, do not have directly attached printers.
  • These clients may print to printer 116, which is attached to server 104, or to printer 118, which is a network printer that does not require connection to a computer for printing documents.
  • Client 110, alternatively, may print to printer 116 or printer 118, depending on the printer type and the document requirements.
  • distributed data processing system 100 is the Internet, with network 102 representing a worldwide collection of networks and gateways that use the TCP/IP suite of protocols to communicate with one another.
  • At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers consisting of thousands of commercial, government, education, and other computer systems that route data and messages.
  • distributed data processing system 100 also may be implemented as a number of different types of networks, such as, for example, an intranet or a local area network.
  • FIG. 1 is intended as an example and not as an architectural limitation for the processes of the present invention.
  • Data processing system 200 is an example of a hardware system console, such as hardware system console 150 depicted in FIG. 1.
  • Data processing system 200 employs a peripheral component interconnect (PCI) local bus architecture.
  • PCI peripheral component interconnect
  • Processor 202 and main memory 204 are connected to PCI local bus 206 through PCI bridge 208.
  • PCI bridge 208 may also include an integrated memory controller and cache memory for processor 202. Additional connections to PCI local bus 206 may be made through direct component interconnection or through add-in boards.
  • Local area network (LAN) adapter 210, SCSI host bus adapter 212, and expansion bus interface 214 are connected to PCI local bus 206 by direct component connection.
  • Audio adapter 216, graphics adapter 218, and audio/video adapter (A/V) 219 are connected to PCI local bus 206 by add-in boards inserted into expansion slots.
  • Expansion bus interface 214 provides a connection for a keyboard and mouse adapter 220, modem 222, and additional memory 224.
  • SCSI host bus adapter 212 provides a connection for hard disk drive 226, tape drive 228, CD-ROM drive 230, and digital video disc read only memory drive (DVD-ROM) 232.
  • Typical PCI local bus implementations will support three or four PCI expansion slots or add-in connectors.
  • An operating system runs on processor 202 and is used to coordinate and provide control of various components within data processing system 200 in FIG. 2.
  • The operating system may be a commercially available operating system, such as AIX, which is available from International Business Machines Corporation. "AIX" is a trademark of International Business Machines Corporation.
  • An object-oriented programming system, such as Java, may run in conjunction with the operating system, providing calls to the operating system from Java programs or applications executing on data processing system 200. Instructions for the operating system, the object-oriented programming system, and applications or programs are located on a storage device, such as hard disk drive 226, and may be loaded into main memory 204 for execution by processor 202.
  • The hardware in FIG. 2 may vary depending on the implementation.
  • Other peripheral devices, such as optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIG. 2.
  • the depicted example is not meant to imply architectural limitations with respect to the present invention.
  • the processes of the present invention may be applied to multiprocessor data processing systems.
  • Data processing system 300 may be a symmetric multiprocessor (SMP) system including a plurality of processors 301, 302, 303, and 304 connected to system bus 306.
  • SMP symmetric multiprocessor
  • data processing system 300 may be an IBM pSeries eServer, a product of International Business Machines Corporation in Armonk, N.Y.
  • a single processor system may be employed.
  • Also connected to system bus 306 is memory controller/cache 308, which provides an interface to a plurality of local memories 360-363.
  • I/O bus bridge 310 is connected to system bus 306 and provides an interface to I/O bus 312 .
  • Memory controller/cache 308 and I/O bus bridge 310 may be integrated as depicted.
  • Data processing system 300 is a logically partitioned data processing system.
  • Data processing system 300 may have multiple heterogeneous operating systems (or multiple instances of a single operating system) running simultaneously. Each of these multiple operating systems may have any number of software programs executing within it.
  • Data processing system 300 is logically partitioned such that different I/O adapters 320-321, 328-329, 336-337, and 346-347 may be assigned to different logical partitions.
  • Processor 301, memory 360, and I/O adapters 320, 328, and 329 may be assigned to logical partition P1; processors 302-303, memory 361, and I/O adapters 321 and 337 may be assigned to partition P2; and processor 304, memories 362-363, and I/O adapters 336 and 346-347 may be assigned to logical partition P3.
  • Each operating system executing within data processing system 300 is assigned to a different logical partition. Thus, each operating system executing within data processing system 300 may access only those I/O units that are within its logical partition. For example, one instance of the Advanced Interactive Executive (AIX) operating system may be executing within partition P1, a second instance (image) of the AIX operating system may be executing within partition P2, and a LINUX operating system may be operating within logical partition P3.
  • AIX Advanced Interactive Executive
  • LINUX is a version of UNIX and is an open source software operating system.
  • Peripheral component interconnect (PCI) host bridge 314, connected to I/O bus 312, provides an interface to PCI local bus 315.
  • A number of terminal bridges 316-317 may be connected to PCI bus 315.
  • Typical PCI bus implementations will support four terminal bridges for providing expansion slots or add-in connectors.
  • Each of terminal bridges 316-317 is connected to a PCI I/O adapter 320-321 through a PCI bus 318-319.
  • Each I/O adapter 320-321 provides an interface between data processing system 300 and input/output devices such as, for example, other network computers, which are clients to server 300.
  • Each terminal bridge 316-317 is configured to prevent the propagation of errors up into the PCI host bridge 314 and into higher levels of data processing system 300. By doing so, an error received by any of terminal bridges 316-317 is isolated from the shared buses 315 and 312 of the other I/O adapters 321, 328-329, and 336-337 that may be in different partitions. Therefore, an error occurring within an I/O device in one partition is not "seen" by the operating system of another partition. Thus, the integrity of the operating system in one partition is not affected by an error occurring in another logical partition. Without such isolation of errors, an error occurring within an I/O device of one partition may cause the operating systems or application programs of another partition to cease to operate or to cease to operate correctly.
  • Additional PCI host bridges 322, 330, and 340 provide interfaces for additional PCI buses 323, 331, and 341.
  • Each of additional PCI buses 323, 331, and 341 is connected to a plurality of terminal bridges 324-325, 332-333, and 342-343, which are each connected to a PCI I/O adapter 328-329, 336-337, and 346-347 by a PCI bus 326-327, 334-335, and 344-345.
  • Additional I/O devices, such as, for example, modems or network adapters, may be supported through each of PCI I/O adapters 328-329, 336-337, and 346-347.
  • Server 300 allows connections to multiple network computers.
  • A memory mapped graphics adapter 348 and hard disk 350 may also be connected to I/O bus 312 as depicted, either directly or indirectly.
  • Hard disk 350 may be logically partitioned between various partitions without the need for additional hard disks. However, additional hard disks may be utilized if desired.
  • HSC hardware system consoles
  • service processor 366 nonvolatile random access memory (NVRAM) 368
  • I/O input/output
  • Service processor 366 also includes a storage device that does not depend on a power source, such as a battery, to maintain its contents.
  • The storage device is a flash RAM 372, which is a programmable and reusable chip that holds its content until erased and reprogrammed (reflashed). Flash RAMs have a life span of about 100,000 write cycles.
  • The hardware depicted in FIG. 3 may vary.
  • Other peripheral devices, such as optical disk drives and the like, also may be used in addition to or in place of the hardware depicted.
  • the depicted example is not meant to imply architectural limitations with respect to the present invention.
  • the present invention provides an improved method, apparatus, and computer implemented instructions for saving data, such as system dump data and error logs.
  • Dump data includes system registers, stack contents, and the origin of the fault, including the partition and code instruction.
  • This dump data may result from a firmware detected fault, as opposed to a fault detected by the operating system or a service processor. Knowing this fact may be valuable because many computer manufacturers desire to dump errors detected by the basic input/output system (BIOS), as BIOS functions increase in complexity.
  • BIOS basic input/output system
  • the mechanism of the present invention saves the data to a flash area.
  • the data is stored in a portion of the memory unused in firmware in these examples.
  • the present invention takes advantage of extra, unused space as well as the reliability of this type of memory in preventing the loss of critical data.
  • The mechanism of the present invention may be implemented using any type of flash memory.
  • A flash memory is a memory device that can be rewritten and holds its content without power. Flash memory is widely used for digital film and for data and programs for communications and industrial products as well as a variety of handheld devices. Flash chips generally have life spans from 100K to 300K write cycles.
  • service processor 400 includes a power independent memory 402 .
  • power independent memory 402 takes the form of a flash memory.
  • other types of power independent memories may be used, such as an EEPROM.
  • Dump data 404 is stored within a system flash area within power independent memory 402.
  • The portion of system flash area 406 used is the portion unused by other firmware.
  • Flash RAMs are often large compared to NVRAMs.
  • a flash RAM may range in size to as much as 128 megabytes.
  • NVRAMs are typically in the neighborhood of 512 Kbytes in size.
  • With reference to FIG. 5, a flowchart of a process used for saving data is depicted in accordance with a preferred embodiment of the present invention.
  • The process illustrated in FIG. 5 may be implemented in the form of computer instructions executed by the host processor running firmware instructions.
  • These instructions are called the Hypervisor, which is a partition manager.
  • The process begins by determining whether a fault state is detected (step 500). If a fault state is detected, dump data is collected (step 502). Next, the dump data is modified to appear as a firmware lid (step 504). This modification adds a header to the data so that the data appears to be a flashable module. Then, update functions are called to store the dump data (step 506), with the process terminating thereafter. In the depicted example, the data is flashed into the flash memory by the same software that manages firmware upgrades.
  • With reference again to step 500, if a fault state is not detected, the process begins again.
  • With reference to FIG. 6, a flowchart of a process used for analyzing stored data is depicted in accordance with a preferred embodiment of the present invention.
  • the process illustrated in FIG. 6 may be implemented in the form of computer instructions residing in firmware and executed by the host processor.
  • The process begins by retrieving dump data from the power independent memory (step 600).
  • The host processor, executing firmware instructions, requests a pointer to the dump data from the flash memory manager; the dump data is then read from that location.
  • The flash memory manager may be implemented using currently available instructions used to access flash memories.
  • Next, an error analysis is performed (step 602), with the process terminating thereafter. This error analysis may be performed using any presently available analysis programs.
  • the present invention provides an improved method, apparatus, and computer implemented instructions for saving data used in error analysis.
  • the mechanism of the present invention stores the data in a power independent memory associated with the service processor.
  • The memory is a flash RAM. In this fashion, extra, unused space is employed, and reliability in preventing the loss of critical data is increased.
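
The save process of FIG. 5 (steps 500-506) and the retrieval step of FIG. 6 (step 600) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the header layout (`HEADER_MAGIC`, a length field, and a CRC-32), the `FlashManager` class, and the `save_on_fault` helper do not appear in the patent, which only states that a header is added so the dump looks like a flashable module and that the firmware update software stores it in flash unused by other firmware.

```python
import struct
import zlib

# Hypothetical header layout; the patent does not specify one.
HEADER_MAGIC = b"LID0"
HEADER_FORMAT = ">4sII"  # magic, payload length, CRC-32 of payload
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def wrap_as_firmware_module(dump_data: bytes) -> bytes:
    """Step 504: prepend a header so the dump resembles a flashable module."""
    header = struct.pack(
        HEADER_FORMAT, HEADER_MAGIC, len(dump_data), zlib.crc32(dump_data)
    )
    return header + dump_data


class FlashManager:
    """Toy stand-in for the software that manages firmware updates."""

    def __init__(self, size: int, firmware_end: int):
        self.flash = bytearray(b"\xff" * size)  # erased flash reads as 0xFF
        self.free_start = firmware_end  # start of the area unused by firmware
        self.dump_pointer = None

    def store_module(self, module: bytes) -> int:
        """Step 506: write the module into the unused area, keep a pointer."""
        end = self.free_start + len(module)
        if end > len(self.flash):
            raise ValueError("dump does not fit in the unused flash area")
        self.flash[self.free_start:end] = module
        self.dump_pointer = self.free_start
        return self.dump_pointer

    def read_dump(self) -> bytes:
        """Step 600: follow the pointer and return the stored dump data."""
        if self.dump_pointer is None:
            raise LookupError("no dump data has been stored")
        start = self.dump_pointer
        magic, length, crc = struct.unpack_from(HEADER_FORMAT, self.flash, start)
        if magic != HEADER_MAGIC:
            raise ValueError("no dump header at the stored pointer")
        body_start = start + HEADER_SIZE
        payload = bytes(self.flash[body_start:body_start + length])
        if zlib.crc32(payload) != crc:  # integrity check is an added assumption
            raise ValueError("stored dump data is corrupted")
        return payload


def save_on_fault(fault_detected: bool, collect_dump, manager: FlashManager):
    """Steps 500-506: on a fault, collect, wrap, and store the dump data."""
    if not fault_detected:  # step 500: nothing to save yet
        return None
    dump = collect_dump()  # step 502
    return manager.store_module(wrap_as_firmware_module(dump))


if __name__ == "__main__":
    manager = FlashManager(size=4096, firmware_end=1024)
    save_on_fault(True, lambda: b"registers|stack|fault origin", manager)
    print(manager.read_dump())
```

The sketch reuses one storage path for firmware modules and dump data, mirroring the depicted example in which the data is flashed by the same software that manages firmware upgrades. The checksum is an extra safeguard added here, not something the source describes.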

Abstract

A method, apparatus, and computer implemented instructions for saving data in a logically partitioned data processing system. An error is detected in the logically partitioned data processing system. Data needed for error analysis of the error is saved in a power independent memory associated with a service processor.

Description

    BACKGROUND OF THE INVENTION
  • 1. Technical Field [0001]
  • The present invention relates generally to an improved data processing system and in particular to a method and apparatus for managing data. Still more particularly, the present invention provides a method and apparatus for saving data used in error analysis within a data processing system. [0002]
  • 2. Description of Related Art [0003]
  • A logical partitioning option (LPAR) within a data processing system (platform) allows multiple copies of a single operating system (OS) or multiple heterogeneous operating systems to be simultaneously run on a single data processing system platform. A partition, within which an operating system image runs, is assigned a non-overlapping sub-set of the platform's resources. These platform allocable resources include one or more architecturally distinct processors with their interrupt management area, regions of system memory, and I/O adapter bus slots. The partition's resources are represented by its own open firmware device tree to the OS image. [0004]
  • Each distinct OS or image of an OS running within the platform is protected from the others such that software errors on one logical partition cannot affect the correct operation of any of the other partitions. This is provided by allocating a disjoint set of platform resources to be directly managed by each OS image and by providing mechanisms for ensuring that the various images cannot control any resources that have not been allocated to them. Furthermore, software errors in the control of an OS's allocated resources are prevented from affecting the resources of any other image. Thus, each image of the OS (or each different OS) directly controls a distinct set of allocable resources within the platform. [0005]
  • The configuration of these different partitions is typically managed through a terminal, such as a hardware system console (HSC). These terminals use objects, also referred to as profiles, that are defined and modified in the HSC. The profiles are used to configure LPARs within the data processing system. Multiple HSCs may be present and used for maintaining and configuring LPARs in the data processing system. These profiles used to configure the data processing system in LPARs are often required to be accessible to any HSC that is in communication with the data processing system. Maintaining profiles between these HSCs is often difficult and requires processes for maintaining synchronization of the profiles at each HSC. [0006]
  • These LPARs are often assigned processors and other hardware. For example, one LPAR may be assigned two processors, while another LPAR may be assigned three processors. If an error occurs, a service processor (SP), separate from the other five processors, will store data gathered from the processors and other hardware for analysis. This type of data is also referred to as dump data. This dump data and other information, such as error logs, are typically stored in a non-volatile random access memory (NVRAM) for retrieval at a later time. This type of memory, however, has a number of limitations. As multiprocessor systems have grown larger, the amount of data that is stored has outgrown the available NVRAM space. Additionally, this type of memory may be easily corrupted because many software components may access it during normal operation. Further, the loss of battery power will cause the contents of the memory to be lost. [0007]
  • Therefore, it would be advantageous to have an improved method, apparatus, and computer implemented instructions for saving data used for error analysis. [0008]
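
The requirement that each partition receive a non-overlapping subset of platform resources can be illustrated with a small check. This is a sketch under stated assumptions: the function name `find_overlaps` and the resource labels (`cpu0`, `mem0`, and so on) are hypothetical and are not taken from the patent. The partition sizes follow the example of one LPAR with two processors and another with three.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def find_overlaps(partitions: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return the resources shared by any pair of partitions.

    An empty result means the assignment is disjoint, which is the
    property the logical partitioning scheme relies on.
    """
    conflicts = {}
    for (name_a, res_a), (name_b, res_b) in combinations(partitions.items(), 2):
        shared = res_a & res_b
        if shared:
            conflicts[(name_a, name_b)] = shared
    return conflicts


# Hypothetical assignment: one LPAR with two processors, another with three.
partitions = {
    "LPAR_A": {"cpu0", "cpu1", "mem0", "slot0"},
    "LPAR_B": {"cpu2", "cpu3", "cpu4", "mem1", "slot1"},
}

if __name__ == "__main__":
    print(find_overlaps(partitions))
```

A configuration tool could run a check of this kind before applying a profile, rejecting any profile for which the result is not empty.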
  • SUMMARY OF THE INVENTION
  • The present invention provides a method, apparatus, and computer implemented instructions for saving data in a logically partitioned data processing system. An error is detected in the logically partitioned data processing system. Data needed for error analysis of the error is saved in a power independent memory for a service processor. [0009]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein: [0010]
  • FIG. 1 is a pictorial representation of a distributed data processing system in which the present invention may be implemented; [0011]
  • FIG. 2 is a block diagram of a data processing system in accordance with the present invention; [0012]
  • FIG. 3 is a block diagram of a data processing system, which may be implemented as a logically partitioned server; [0013]
  • FIG. 4 is a diagram illustrating a service processor and a storage device in accordance with a preferred embodiment of the present invention; [0014]
  • FIG. 5 is a flowchart of a process used for saving data in accordance with a preferred embodiment of the present invention; and [0015]
  • FIG. 6 is a flowchart of a process used for analyzing stored data in accordance with a preferred embodiment of the present invention. [0016]
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
  • With reference now to the figures, and in particular with reference to FIG. 1, a pictorial representation of a distributed data processing system is depicted in which the present invention may be implemented. [0017]
  • Distributed data processing system 100 is a network of computers in which the present invention may be implemented. Distributed data processing system 100 contains network 102, which is the medium used to provide communications links between various devices and computers connected within distributed data processing system 100. Network 102 may include permanent connections, such as wire or fiber optic cables, or temporary connections made through telephone connections. [0018]
  • In the depicted example, server 104 is connected to hardware system console 150. Server 104 is also connected to network 102, along with storage unit 106. In addition, clients 108, 110 and 112 are also connected to network 102. These clients, 108, 110 and 112, may be, for example, personal computers or network computers. For purposes of this application, a network computer is any computer coupled to a network that receives a program or other application from another computer coupled to the network 102. In the depicted example, server 104 is a logically partitioned platform and provides data, such as boot files, operating system images and applications, to clients 108-112. Hardware system console 150 may be a laptop computer and is used to display messages to an operator from each operating system image running on server 104, as well as to send input information, received from the operator, to server 104. Clients 108, 110 and 112 are clients to server 104. Distributed data processing system 100 may include additional servers, clients, and other devices not shown. Distributed data processing system 100 also includes printers 114, 116 and 118. A client, such as client 110, may print directly to printer 114. Clients, such as client 108 and client 112, do not have directly attached printers. These clients may print to printer 116, which is attached to server 104, or to printer 118, which is a network printer that does not require connection to a computer for printing documents. Client 110, alternatively, may print to printer 116 or printer 118, depending on the printer type and the document requirements. [0019]
  • In the depicted example, distributed data processing system 100 is the Internet, with network 102 representing a worldwide collection of networks and gateways that use the TCP/IP suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers consisting of thousands of commercial, government, education, and other computer systems that route data and messages. Of course, distributed data processing system 100 also may be implemented as a number of different types of networks, such as, for example, an intranet or a local area network. [0020]
  • FIG. 1 is intended as an example and not as an architectural limitation for the processes of the present invention. [0021]
  • With reference now to FIG. 2, a block diagram of a data processing system in accordance with the present invention is illustrated. [0022] Data processing system 200 is an example of a hardware system console, such as hardware system console 150 depicted in FIG. 1. Data processing system 200 employs a peripheral component interconnect (PCI) local bus architecture. Although the depicted example employs a PCI bus, other bus architectures, such as Micro Channel and ISA, may be used. Processor 202 and main memory 204 are connected to PCI local bus 206 through PCI bridge 208. PCI bridge 208 may also include an integrated memory controller and cache memory for processor 202. Additional connections to PCI local bus 206 may be made through direct component interconnection or through add-in boards. In the depicted example, local area network (LAN) adapter 210, SCSI host bus adapter 212, and expansion bus interface 214 are connected to PCI local bus 206 by direct component connection. In contrast, audio adapter 216, graphics adapter 218, and audio/video adapter (A/V) 219 are connected to PCI local bus 206 by add-in boards inserted into expansion slots. Expansion bus interface 214 provides a connection for a keyboard and mouse adapter 220, modem 222, and additional memory 224. In the depicted example, SCSI host bus adapter 212 provides a connection for hard disk drive 226, tape drive 228, CD-ROM drive 230, and digital video disc read only memory drive (DVD-ROM) 232. Typical PCI local bus implementations will support three or four PCI expansion slots or add-in connectors.
  • [0023] An operating system runs on processor 202 and is used to coordinate and provide control of various components within data processing system 200 in FIG. 2. The operating system may be a commercially available operating system, such as AIX, which is available from International Business Machines Corporation. AIX is a trademark of International Business Machines Corporation. An object-oriented programming system, such as Java, may run in conjunction with the operating system, providing calls to the operating system from Java programs or applications executing on data processing system 200. Instructions for the operating system, the object-oriented programming system, and applications or programs are located on a storage device, such as hard disk drive 226, and may be loaded into main memory 204 for execution by processor 202.
  • [0024] Those of ordinary skill in the art will appreciate that the hardware in FIG. 2 may vary depending on the implementation. For example, other peripheral devices, such as optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIG. 2. The depicted example is not meant to imply architectural limitations with respect to the present invention. For example, the processes of the present invention may be applied to multiprocessor data processing systems.
  • [0025] With reference now to FIG. 3, a block diagram of a data processing system, which may be implemented as a logically partitioned server, such as server 104 in FIG. 1, is depicted in accordance with the present invention. Data processing system 300 may be a symmetric multiprocessor (SMP) system including a plurality of processors 301, 302, 303, and 304 connected to system bus 306. For example, data processing system 300 may be an IBM pSeries eServer, a product of International Business Machines Corporation in Armonk, N.Y. Alternatively, a single processor system may be employed. Also connected to system bus 306 is memory controller/cache 308, which provides an interface to a plurality of local memories 360-363. I/O bus bridge 310 is connected to system bus 306 and provides an interface to I/O bus 312. Memory controller/cache 308 and I/O bus bridge 310 may be integrated as depicted.
  • [0026] Data processing system 300 is a logically partitioned data processing system. Thus, data processing system 300 may have multiple heterogeneous operating systems (or multiple instances of a single operating system) running simultaneously. Each of these multiple operating systems may have any number of software programs executing within it. Data processing system 300 is logically partitioned such that different I/O adapters 320-321, 328-329, 336-337, and 346-347 may be assigned to different logical partitions.
  • [0027] Thus, for example, suppose data processing system 300 is divided into three logical partitions, P1, P2, and P3. Each of I/O adapters 320-321, 328-329, and 336-337, each of processors 301-304, and each of local memories 360-363 is assigned to one of the three partitions. For example, processor 301, memory 360, and I/O adapters 320, 328, and 329 may be assigned to logical partition P1; processors 302-303, memory 361, and I/O adapters 321 and 337 may be assigned to partition P2; and processor 304, memories 362-363, and I/O adapters 336 and 346-347 may be assigned to logical partition P3.
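The example assignment can be written down as a small lookup table. The following Python sketch is illustrative only: the numbers are the reference numerals from the FIG. 3 example, while the dictionary layout and the helper names `owner_of` and `is_exclusive` are assumptions made for this sketch and do not appear in the patent.

```python
# Illustrative only: numerals come from the FIG. 3 example; the data layout
# and function names are hypothetical.
PARTITIONS = {
    "P1": {"processors": {301}, "memories": {360}, "io_adapters": {320, 328, 329}},
    "P2": {"processors": {302, 303}, "memories": {361}, "io_adapters": {321, 337}},
    "P3": {"processors": {304}, "memories": {362, 363}, "io_adapters": {336, 346, 347}},
}


def owner_of(kind, numeral):
    """Return the name of the partition that owns a resource, or None."""
    for name, resources in PARTITIONS.items():
        if numeral in resources[kind]:
            return name
    return None


def is_exclusive(partitions):
    """Check that no resource is assigned to more than one partition."""
    seen = set()
    for resources in partitions.values():
        for kind, numerals in resources.items():
            for numeral in numerals:
                if (kind, numeral) in seen:
                    return False
                seen.add((kind, numeral))
    return True
```

A lookup such as `owner_of("io_adapters", 337)` answers which partition's operating system may access a given adapter, and `is_exclusive(PARTITIONS)` checks that the example assigns each resource to exactly one partition.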
  • [0028] Each operating system executing within data processing system 300 is assigned to a different logical partition. Thus, each operating system executing within data processing system 300 may access only those I/O units that are within its logical partition. Thus, for example, one instance of the Advanced Interactive Executive (AIX) operating system may be executing within partition P1, a second instance (image) of the AIX operating system may be executing within partition P2, and a LINUX operating system may be operating within logical partition P3. LINUX is a version of UNIX and is an open source software operating system.
  • [0029] Peripheral component interconnect (PCI) host bridge 314, connected to I/O bus 312, provides an interface to PCI local bus 315. A number of terminal bridges 316-317 may be connected to PCI bus 315. Typical PCI bus implementations will support four terminal bridges for providing expansion slots or add-in connectors. Each of terminal bridges 316-317 is connected to a PCI I/O adapter 320-321 through a PCI bus 318-319. Each I/O adapter 320-321 provides an interface between data processing system 300 and input/output devices such as, for example, other network computers, which are clients to server 300. Only a single I/O adapter 320-321 may be connected to each terminal bridge 316-317. Each of terminal bridges 316-317 is configured to prevent the propagation of errors up into PCI host bridge 314 and into higher levels of data processing system 300. By doing so, an error received by any of terminal bridges 316-317 is isolated from the shared buses 315 and 312 of the other I/O adapters 321, 328-329, and 336-337 that may be in different partitions. Therefore, an error occurring within an I/O device in one partition is not “seen” by the operating system of another partition. Thus, the integrity of the operating system in one partition is not affected by an error occurring in another logical partition. Without such isolation of errors, an error occurring within an I/O device of one partition may cause the operating systems or application programs of another partition to cease to operate or to cease to operate correctly.
  • [0030] Additional PCI host bridges 322, 330, and 340 provide interfaces for additional PCI buses 323, 331, and 341. Each of additional PCI buses 323, 331, and 341 is connected to a plurality of terminal bridges 324-325, 332-333, and 342-343, which are each connected to a PCI I/O adapter 328-329, 336-337, and 346-347 by a PCI bus 326-327, 334-335, and 344-345. Thus, additional I/O devices, such as, for example, modems or network adapters, may be supported through each of PCI I/O adapters 328-329, 336-337, and 346-347. In this manner, server 300 allows connections to multiple network computers. A memory mapped graphics adapter 348 and hard disk 350 may also be connected to I/O bus 312 as depicted, either directly or indirectly. Hard disk 350 may be logically partitioned between various partitions without the need for additional hard disks. However, additional hard disks may be utilized if desired.
  • [0031] Management of logical partitions is achieved through terminals, such as hardware system consoles (HSC). This access is provided in these examples through service processor 366, nonvolatile random access memory (NVRAM) 368, and input/output (I/O) adapter 370, which may be implemented as a Universal Asynchronous Receiver Transmitter (UART). Service processor 366 also includes a storage device that does not depend on a power source, such as a battery, to maintain its contents. In this example, the storage device is a flash RAM 372, which is a programmable and reusable chip that holds its content until erased and reprogrammed (reflashed). Flash RAMs have a life span of about 100,000 write cycles.
  • [0032] Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 3 may vary. For example, other peripheral devices, such as optical disk drives and the like, also may be used in addition to or in place of the hardware depicted. The depicted example is not meant to imply architectural limitations with respect to the present invention.
  • [0033] The present invention provides an improved method, apparatus, and computer implemented instructions for saving data, such as system dump data and error logs. In these examples, dump data includes system registers, stack contents, and origin of fault including partition and code instruction. This dump data may result from a firmware detected fault, as opposed to a fault detected by the operating system or a service processor. Knowing this fact may be valuable because many computer manufacturers desire to dump errors detected by the basic input/output system (BIOS), as BIOS functions increase in complexity.
  • [0034] The mechanism of the present invention saves the data to a flash area. In these examples, the data is stored in a portion of the memory that is not used by firmware. In this manner, the present invention takes advantage of extra, unused space as well as the reliability of this type of memory in preventing the loss of critical data. The mechanism of the present invention may be implemented using any type of flash memory. A flash memory is a memory device that can be rewritten and holds its content without power. Flash memory is widely used for digital film and for data and programs for communications and industrial products as well as a variety of handheld devices. Flash chips generally have life spans from 100K to 300K write cycles.
  • [0035] Turning next to FIG. 4, a diagram illustrating a service processor and a storage device is depicted in accordance with a preferred embodiment of the present invention. In this example, service processor 400 includes a power independent memory 402. As depicted, power independent memory 402 takes the form of a flash memory. Of course, other types of power independent memories may be used, such as an EEPROM. Dump data 404 is stored within system flash area 406 within power independent memory 402. In this example, the portion of system flash area 406 that is used is the portion not used by other firmware. These types of memories are often large compared to NVRAM. For example, a flash RAM may range in size to as much as 128 megabytes. In contrast, NVRAMs are typically in the neighborhood of 512 Kbytes in size.
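The size and endurance figures quoted above lend themselves to a quick back-of-the-envelope check. The Python sketch below uses only the numbers stated in the description (128 megabytes, 512 Kbytes, 100K write cycles). The function names, the idea of subtracting the firmware footprint from the total capacity, and the assumption that each saved dump costs one write cycle of the dump region are illustrative assumptions, not statements from the patent.

```python
# Figures taken from the description; everything else is a hypothetical sketch.
FLASH_BYTES = 128 * 1024 * 1024   # upper size quoted for a flash RAM
NVRAM_BYTES = 512 * 1024          # typical NVRAM size quoted
MIN_WRITE_CYCLES = 100_000        # lower end of the quoted flash life span


def size_ratio():
    """How many times larger the quoted flash size is than the quoted NVRAM size."""
    return FLASH_BYTES // NVRAM_BYTES


def dump_fits(dump_bytes, firmware_bytes, capacity=FLASH_BYTES):
    """True if the dump fits in the flash space that firmware leaves unused."""
    return dump_bytes <= capacity - firmware_bytes


def years_until_worn(dumps_per_day, write_cycles=MIN_WRITE_CYCLES):
    """Years of service, assuming one write cycle of the dump region per dump."""
    return write_cycles / (dumps_per_day * 365)
```

Under these assumptions the quoted flash size is 256 times the quoted NVRAM size, and even one dump per day would take well over two centuries to reach 100,000 write cycles.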
  • [0036] Turning next to FIG. 5, a flowchart of a process used for saving data is depicted in accordance with a preferred embodiment of the present invention. The process illustrated in FIG. 5 may be implemented in the form of computer instructions executed by the host processor running firmware instructions. In this case, these instructions are called the Hypervisor, which is a partition manager.
  • [0037] The process begins by determining whether a fault state is detected (step 500). If a fault state is detected, dump data is collected (step 502). Next, the dump data is modified to appear as a firmware lid (step 504). This modification adds a header to the data so that the data appears to be a flashable module. Then, update functions are called to store the dump data (step 506) with the process terminating thereafter. In the depicted example, the data is flashed into the flash memory by the same software that manages firmware upgrades.
  • [0038] With reference again to step 500, if a fault state is not detected, the process begins again.
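The FIG. 5 flow (steps 500 through 506) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Hypervisor's actual code: the header layout (magic marker, module identifier, payload length, CRC-32), the constants `MAGIC` and `DUMP_MODULE_ID`, the JSON encoding of the dump, and the `FlashUpdater` class standing in for the firmware upgrade software are all hypothetical.

```python
import json
import struct
import zlib

# Hypothetical header: 4-byte magic, module id, payload length, CRC-32.
HEADER = struct.Struct(">4sIII")
MAGIC = b"LID0"           # assumed marker, not from the patent
DUMP_MODULE_ID = 0xD0D0   # assumed identifier for the dump module


def collect_dump(registers, stack, origin):
    """Step 502: gather registers, stack contents, and fault origin as bytes."""
    payload = {"registers": registers, "stack": stack, "origin": origin}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def wrap_as_module(dump, module_id=DUMP_MODULE_ID):
    """Step 504: prepend a header so the dump looks like a flashable module."""
    return HEADER.pack(MAGIC, module_id, len(dump), zlib.crc32(dump)) + dump


class FlashUpdater:
    """Stand-in for the software that manages firmware upgrades."""

    def __init__(self, unused_capacity):
        self.unused_capacity = unused_capacity
        self.modules = {}

    def flash_module(self, image):
        """Step 506: store the module in the unused flash area."""
        if len(image) > self.unused_capacity:
            raise ValueError("module does not fit in the unused flash area")
        _magic, module_id, _length, _crc = HEADER.unpack_from(image)
        self.modules[module_id] = image


def handle_fault_state(fault, updater):
    """Run steps 500-506 once; return True if a dump was stored."""
    if fault is None:  # step 500: no fault state detected
        return False
    dump = collect_dump(fault["registers"], fault["stack"], fault["origin"])
    updater.flash_module(wrap_as_module(dump))
    return True
```

The point of the wrapper is that the dump travels through the same update path as a firmware module, so no separate flash-writing code is needed for dumps.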
  • [0039] Turning next to FIG. 6, a flowchart of a process used for analyzing stored data is depicted in accordance with a preferred embodiment of the present invention. The process illustrated in FIG. 6 may be implemented in the form of computer instructions residing in firmware and executed by the host processor.
  • [0040] The process begins by retrieving dump data from the power independent memory (step 600). The host processor, executing firmware instructions, makes a request of the flash memory manager for a pointer to the dump data, then the dump data is simply read from that location. The flash memory manager may be implemented using currently available instructions used to access flash memories. Next, an error analysis is performed (step 602) with the process terminating thereafter. This error analysis may be performed using any presently available analysis programs.
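The retrieval path of FIG. 6 (steps 600 and 602) can be sketched the same way. In this hypothetical Python example the flash contents are modeled as a `bytes` object, the "pointer" is an offset and length pair, and the analysis step merely decodes a JSON dump and reports the fault origin. The class `FlashMemoryManager`, its methods, and the dump encoding are assumptions made for illustration; the patent does not specify them.

```python
import json


class FlashMemoryManager:
    """Hypothetical manager over a flash image held in a bytes object."""

    def __init__(self, flash, dump_offset, dump_length):
        self._flash = flash
        self._dump = (dump_offset, dump_length)

    def dump_pointer(self):
        """Return where the dump data lives as (offset, length)."""
        return self._dump

    def read(self, offset, length):
        """Read raw bytes from the flash image."""
        return self._flash[offset:offset + length]


def retrieve_dump(manager):
    """Step 600: ask the manager for the pointer, then read from that location."""
    offset, length = manager.dump_pointer()
    return manager.read(offset, length)


def analyze(dump):
    """Step 602 stand-in: decode the dump and summarize the fault origin."""
    record = json.loads(dump.decode("utf-8"))
    origin = record["origin"]
    return "fault in partition {} at instruction {}".format(
        origin["partition"], origin["instruction"]
    )
```

In practice the analysis step would be whatever existing dump analysis program is available; the stand-in above only shows that the stored bytes are sufficient input for it.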
  • [0041] Thus, the present invention provides an improved method, apparatus, and computer implemented instructions for saving data used in error analysis. The mechanism of the present invention stores the data in a power independent memory associated with the service processor. In these examples, the memory is a flash RAM. In this fashion, extra, unused space is employed as well as increased reliability in preventing the loss of critical data.
  • [0042] It is important to note that while the present invention has been described in the context of a fully functioning data processing system, those of ordinary skill in the art will appreciate that the processes of the present invention are capable of being distributed in the form of a computer readable medium of instructions and a variety of forms and that the present invention applies equally regardless of the particular type of signal bearing media actually used to carry out the distribution. Examples of computer readable media include recordable-type media, such as a floppy disk, a hard disk drive, a RAM, CD-ROMs, DVD-ROMs, and transmission-type media, such as digital and analog communications links, wired or wireless communications links using transmission forms, such as, for example, radio frequency and light wave transmissions. The computer readable media may take the form of coded formats that are decoded for actual use in a particular data processing system.
  • [0043] The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (39)

What is claimed is:
1. A method for saving data in a logically partitioned data processing system, the method comprising:
detecting an error in the logically partitioned data processing system; and
saving data needed for error analysis of the error in a power independent memory associated with a service processor.
2. The method of claim 1, wherein flash memory is a type of electrically erasable programmable read only memory.
3. The method of claim 1, wherein the data includes at least one of stack registers, general purpose registers, and floating point registers.
4. The method of claim 1 further comprising:
retrieving the data from the flash memory for analysis.
5. The method of claim 1, wherein the data is saved between system boots of the data processing system.
6. The method of claim 1, wherein the flash memory has a size of 16 megabytes.
7. The method of claim 1, wherein the data includes an error log.
8. The method of claim 1, wherein the data includes dump data.
9. A method in a data processing system for saving data, the method comprising:
detecting a fault state in a partition manager on the data processing system; and
saving data relating to the fault state in a power independent memory associated with a service processor in the data processing system.
10. The method of claim 9, wherein memory is the power independent memory.
11. The method of claim 10, wherein the power independent memory is an erasable programmable read only memory.
12. A data processing system comprising:
a bus system;
a communications unit connected to the bus system, wherein data is sent and received using the communications unit;
a firmware connected to the bus system, wherein a set of instructions are located in the firmware;
a service processor connected to the bus system;
a power independent memory associated with the service processor; and
a host processor connected to the bus system, wherein the host processor executes the set of instructions in firmware to detect an error in the logically partitioned data processing system; and save data needed for error analysis of the error in a power independent memory associated with the service processor.
13. The data processing system of claim 12, wherein the bus system includes a primary bus and a secondary bus.
14. The data processing system of claim 12, wherein the communications unit is an Ethernet adapter.
15. A data processing system comprising:
a bus system;
a communications unit connected to the bus system, wherein data is sent and received using the communications unit;
a firmware connected to the bus system, wherein a set of instructions are located in the firmware;
a service processor connected to the bus system;
a power independent memory associated with the service processor; and
a host processor connected to the bus system, wherein the host processor executes the set of instructions to detect a fault state in the data processing system; and save data relating to the fault state in the power independent memory associated with the service processor in the data processing system.
16. The data processing system of claim 15, wherein the bus system includes a primary bus and a secondary bus.
17. The data processing system of claim 16, wherein the communications unit is an Ethernet adapter.
18. A data processing system for saving data in a logically partitioned data processing system, the data processing system comprising:
detecting means for detecting an error in the logically partitioned data processing system; and
saving means for saving data needed for error analysis of the error in a power independent memory associated with a service processor.
19. The data processing system of claim 18, wherein the power independent memory is an erasable programmable read only memory.
20. The data processing system of claim 18, wherein the data includes at least one of stack registers, general purpose registers, and floating point registers.
21. The data processing system of claim 18 further comprising:
retrieving means for retrieving the data from the power independent memory for analysis.
22. The data processing system of claim 18, wherein the data is saved between system boots of the data processing system.
23. The data processing system of claim 18, wherein the power independent memory has a size of 16 megabytes.
24. The data processing system of claim 18, wherein the data includes an error log.
25. The data processing system of claim 18, wherein the data includes dump data.
26. A data processing system for saving data, the data processing system comprising:
detecting means for detecting a fault state in a partition manager on the data processing system; and
saving means for saving data relating to the fault state in a power independent memory associated with a service processor in the data processing system.
27. The data processing system of claim 26, wherein memory is a power independent memory.
28. The data processing system of claim 27, wherein the power independent memory is an erasable programmable read only memory.
29. A computer program product in a computer readable medium for saving data in a logically partitioned data processing system, the computer program product comprising:
first instructions for detecting an error in the logically partitioned data processing system; and
second instructions for saving data needed for error analysis of the error in a power independent memory associated with a service processor.
30. The computer program product of claim 29, wherein the power independent memory is an erasable programmable read only memory.
31. The computer program product of claim 29, wherein the data includes at least one of stack registers, general purpose registers, and floating point registers.
32. The computer program product of claim 29 further comprising:
third instructions for retrieving the data from the power independent memory for analysis.
33. The computer program product of claim 29, wherein the data is saved between system boots of the data processing system.
34. The computer program product of claim 29, wherein the power independent memory has a size of 16 megabytes.
35. The computer program product of claim 29, wherein the data includes an error log.
36. The computer program product of claim 29, wherein the data includes dump data.
37. A computer program product in a computer readable medium for saving data, the computer program product comprising:
first instructions for detecting a fault state in a partition manager on the data processing system; and
second instructions for saving data relating to the fault state in a power independent memory associated with a service processor in the data processing system.
38. The computer program product of claim 37, wherein memory is a power independent memory.
39. The computer program product of claim 38, wherein the power independent memory is an erasable programmable read only memory.
US09/798,169 2001-03-01 2001-03-01 Method and apparatus for saving data used in error analysis Expired - Lifetime US7010726B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/798,169 US7010726B2 (en) 2001-03-01 2001-03-01 Method and apparatus for saving data used in error analysis

Publications (2)

Publication Number Publication Date
US20020124209A1 (en) 2002-09-05
US7010726B2 (en) 2006-03-07

Family

ID=25172707

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/798,169 Expired - Lifetime US7010726B2 (en) 2001-03-01 2001-03-01 Method and apparatus for saving data used in error analysis

Country Status (1)

Country Link
US (1) US7010726B2 (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040103412A1 (en) * 2002-11-21 2004-05-27 Rao Bindu Rama Software self-repair toolkit for electronic devices
US20060041869A1 (en) * 2003-04-07 2006-02-23 Seth Houston System and method for analyzing consumer specified issues associated with a software application
US20060143058A1 (en) * 2000-11-17 2006-06-29 Jeffrey Brunet Operator network that routes customer care calls based on subscriber/device profile and CSR skill set
US20060294422A1 (en) * 2005-06-28 2006-12-28 Nec Electronics Corporation Processor and method of controlling execution of processes
US20070006049A1 (en) * 2003-04-25 2007-01-04 Agha Salim A Preservation of error data on a diskless platform
US20070016810A1 (en) * 2005-06-30 2007-01-18 Seiko Epson Corporation Information processing apparatus and program for causing computer to execute power control method
US20070033281A1 (en) * 2005-08-02 2007-02-08 Hwang Min J Error management system and method of using the same
US20070168980A1 (en) * 2006-01-06 2007-07-19 Reed David C Apparatus and method to debug a software program
US20080244296A1 (en) * 2007-03-26 2008-10-02 International Business Machines Corporation Computer system fault detection
US20100318717A1 (en) * 2009-06-16 2010-12-16 International Business Machines Corporation Status information saving among multiple computers
US8468515B2 (en) 2000-11-17 2013-06-18 Hewlett-Packard Development Company, L.P. Initialization and update of software and/or firmware in electronic devices
US8479189B2 (en) 2000-11-17 2013-07-02 Hewlett-Packard Development Company, L.P. Pattern detection preprocessor in an electronic device update generation system
US8526940B1 (en) 2004-08-17 2013-09-03 Palm, Inc. Centralized rules repository for smart phone customer care
US8555273B1 (en) 2003-09-17 2013-10-08 Palm. Inc. Network for updating electronic devices
US8578361B2 (en) 2004-04-21 2013-11-05 Palm, Inc. Updating an electronic device with update agent code
US8752044B2 (en) 2006-07-27 2014-06-10 Qualcomm Incorporated User experience and dependency management in a mobile device
US20140331092A1 (en) * 2013-05-02 2014-11-06 Microsoft Corporation Activity based sampling of diagnostics data
US8893110B2 (en) 2006-06-08 2014-11-18 Qualcomm Incorporated Device management in a network
CN105719049A (en) * 2016-01-12 2016-06-29 深圳大尚网络技术有限公司 Realization method and system of intelligent logs

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7028213B2 (en) * 2001-09-28 2006-04-11 Hewlett-Packard Development Company, L.P. Error indication in a raid memory system
US7437618B2 (en) * 2005-02-11 2008-10-14 International Business Machines Corporation Method in a processor for dynamically during runtime allocating memory for in-memory hardware tracing
US20060184837A1 (en) * 2005-02-11 2006-08-17 International Business Machines Corporation Method, apparatus, and computer program product in a processor for balancing hardware trace collection among different hardware trace facilities
US7418629B2 (en) * 2005-02-11 2008-08-26 International Business Machines Corporation Synchronizing triggering of multiple hardware trace facilities using an existing system bus
US7437617B2 (en) * 2005-02-11 2008-10-14 International Business Machines Corporation Method, apparatus, and computer program product in a processor for concurrently sharing a memory controller among a tracing process and non-tracing processes using a programmable variable number of shared memory write buffers
US7926034B2 (en) * 2007-03-13 2011-04-12 Seiko Epson Corporation Application software flight recorder developer client
US20080229283A1 (en) * 2007-03-13 2008-09-18 Steve Nelson Application Software Flight Recorder Tester Client
US20080235667A1 (en) * 2007-03-13 2008-09-25 Steve Nelson Application Software Flight Recorder Test Server
US8122291B2 (en) * 2010-01-21 2012-02-21 Hewlett-Packard Development Company, L.P. Method and system of error logging

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5265207A (en) * 1990-10-03 1993-11-23 Thinking Machines Corporation Parallel computer system including arrangement for transferring messages from a source processor to selected ones of a plurality of destination processors and combining responses
US5335341A (en) * 1990-12-20 1994-08-02 International Business Machines Corporation Dump analysis system and method in data processing systems
US5349675A (en) * 1990-09-04 1994-09-20 International Business Machines Corporation System for directly displaying remote screen information and providing simulated keyboard input by exchanging high level commands
US5564040A (en) * 1994-11-08 1996-10-08 International Business Machines Corporation Method and apparatus for providing a server function in a logically partitioned hardware machine
US5619644A (en) * 1995-09-18 1997-04-08 International Business Machines Corporation Software directed microcode state save for distributed storage controller
US5748884A (en) * 1996-06-13 1998-05-05 Mci Corporation Autonotification system for notifying recipients of detected events in a network environment
US5805790A (en) * 1995-03-23 1998-09-08 Hitachi, Ltd. Fault recovery method and apparatus
US5872970A (en) * 1996-06-28 1999-02-16 Mciworldcom, Inc. Integrated cross-platform batch management system
US6199179B1 (en) * 1998-06-10 2001-03-06 Compaq Computer Corporation Method and apparatus for failure recovery in a multi-processor computer system
US6233680B1 (en) * 1998-10-02 2001-05-15 International Business Machines Corporation Method and system for boot-time deconfiguration of a processor in a symmetrical multi-processing system
US6256756B1 (en) * 1998-12-04 2001-07-03 Hewlett-Packard Company Embedded memory bank system
US6401174B1 (en) * 1997-09-05 2002-06-04 Sun Microsystems, Inc. Multiprocessing computer system employing a cluster communication error reporting mechanism
US6412089B1 (en) * 1999-02-26 2002-06-25 Compaq Computer Corporation Background read scanning with defect reallocation
US20020120884A1 (en) * 2001-02-26 2002-08-29 Tetsuaki Nakamikawa Multi-computer fault detection system
US6457138B1 (en) * 1999-04-19 2002-09-24 Cisco Technology, Inc. System and method for crash handling on redundant systems
US6493656B1 (en) * 1999-02-26 2002-12-10 Compaq Computer Corporation, Inc. Drive error logging
US6496945B2 (en) * 1998-06-04 2002-12-17 Compaq Information Technologies Group, L.P. Computer system implementing fault detection and isolation using unique identification codes stored in non-volatile memory
US6516429B1 (en) * 1999-11-04 2003-02-04 International Business Machines Corporation Method and apparatus for run-time deconfiguration of a processor in a symmetrical multi-processing system
US6543010B1 (en) * 1999-02-24 2003-04-01 Hewlett-Packard Development Company, L.P. Method and apparatus for accelerating a memory dump
US6594785B1 (en) * 2000-04-28 2003-07-15 Unisys Corporation System and method for fault handling and recovery in a multi-processing system having hardware resources shared between multiple partitions
US6658594B1 (en) * 2000-07-13 2003-12-02 International Business Machines Corporation Attention mechanism for immediately displaying/logging system checkpoints
US6658599B1 (en) * 2000-06-22 2003-12-02 International Business Machines Corporation Method for recovering from a machine check interrupt during runtime

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5349675A (en) * 1990-09-04 1994-09-20 International Business Machines Corporation System for directly displaying remote screen information and providing simulated keyboard input by exchanging high level commands
US5265207A (en) * 1990-10-03 1993-11-23 Thinking Machines Corporation Parallel computer system including arrangement for transferring messages from a source processor to selected ones of a plurality of destination processors and combining responses
US5335341A (en) * 1990-12-20 1994-08-02 International Business Machines Corporation Dump analysis system and method in data processing systems
US5564040A (en) * 1994-11-08 1996-10-08 International Business Machines Corporation Method and apparatus for providing a server function in a logically partitioned hardware machine
US5805790A (en) * 1995-03-23 1998-09-08 Hitachi, Ltd. Fault recovery method and apparatus
US5619644A (en) * 1995-09-18 1997-04-08 International Business Machines Corporation Software directed microcode state save for distributed storage controller
US5748884A (en) * 1996-06-13 1998-05-05 Mci Corporation Autonotification system for notifying recipients of detected events in a network environment
US5872970A (en) * 1996-06-28 1999-02-16 Mciworldcom, Inc. Integrated cross-platform batch management system
US6401174B1 (en) * 1997-09-05 2002-06-04 Sun Microsystems, Inc. Multiprocessing computer system employing a cluster communication error reporting mechanism
US6496945B2 (en) * 1998-06-04 2002-12-17 Compaq Information Technologies Group, L.P. Computer system implementing fault detection and isolation using unique identification codes stored in non-volatile memory
US6199179B1 (en) * 1998-06-10 2001-03-06 Compaq Computer Corporation Method and apparatus for failure recovery in a multi-processor computer system
US6233680B1 (en) * 1998-10-02 2001-05-15 International Business Machines Corporation Method and system for boot-time deconfiguration of a processor in a symmetrical multi-processing system
US6256756B1 (en) * 1998-12-04 2001-07-03 Hewlett-Packard Company Embedded memory bank system
US6543010B1 (en) * 1999-02-24 2003-04-01 Hewlett-Packard Development Company, L.P. Method and apparatus for accelerating a memory dump
US6493656B1 (en) * 1999-02-26 2002-12-10 Compaq Computer Corporation, Inc. Drive error logging
US6412089B1 (en) * 1999-02-26 2002-06-25 Compaq Computer Corporation Background read scanning with defect reallocation
US6457138B1 (en) * 1999-04-19 2002-09-24 Cisco Technology, Inc. System and method for crash handling on redundant systems
US6516429B1 (en) * 1999-11-04 2003-02-04 International Business Machines Corporation Method and apparatus for run-time deconfiguration of a processor in a symmetrical multi-processing system
US6594785B1 (en) * 2000-04-28 2003-07-15 Unisys Corporation System and method for fault handling and recovery in a multi-processing system having hardware resources shared between multiple partitions
US6658599B1 (en) * 2000-06-22 2003-12-02 International Business Machines Corporation Method for recovering from a machine check interrupt during runtime
US6658594B1 (en) * 2000-07-13 2003-12-02 International Business Machines Corporation Attention mechanism for immediately displaying/logging system checkpoints
US20020120884A1 (en) * 2001-02-26 2002-08-29 Tetsuaki Nakamikawa Multi-computer fault detection system

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8848899B2 (en) 2000-11-17 2014-09-30 Qualcomm Incorporated Operator network that routes customer care calls based on subscriber / device profile and CSR skill set
US8468515B2 (en) 2000-11-17 2013-06-18 Hewlett-Packard Development Company, L.P. Initialization and update of software and/or firmware in electronic devices
US20080175372A1 (en) * 2000-11-17 2008-07-24 Jeffrey Brunet Operator network that routes customer care calls based on subscriber / device profile and csr skill set
US7401320B2 (en) 2000-11-17 2008-07-15 Hewlett-Packard Development Company, L.P. Operator network that routes customer care calls based on subscriber/device profile and CSR skill set
US8479189B2 (en) 2000-11-17 2013-07-02 Hewlett-Packard Development Company, L.P. Pattern detection preprocessor in an electronic device update generation system
US20060143058A1 (en) * 2000-11-17 2006-06-29 Jeffrey Brunet Operator network that routes customer care calls based on subscriber/device profile and CSR skill set
US7047448B2 (en) * 2002-11-21 2006-05-16 Bitfone Corporation Software self-repair toolkit for electronic devices
US20060190773A1 (en) * 2002-11-21 2006-08-24 Rao Bindu R Software self-repair toolkit for electronic devices
US20040103412A1 (en) * 2002-11-21 2004-05-27 Rao Bindu Rama Software self-repair toolkit for electronic devices
WO2004049104A3 (en) * 2002-11-21 2005-09-29 Bitfone Corp Software self-repair toolkit for electronic devices
WO2004049104A2 (en) * 2002-11-21 2004-06-10 Bitfone Corporation Software self-repair toolkit for electronic devices
US7640458B2 (en) * 2002-11-21 2009-12-29 Hewlett-Packard Development Company, L.P. Software self-repair toolkit for electronic devices
US20060041869A1 (en) * 2003-04-07 2006-02-23 Seth Houston System and method for analyzing consumer specified issues associated with a software application
US20070006049A1 (en) * 2003-04-25 2007-01-04 Agha Salim A Preservation of error data on a diskless platform
US7765431B2 (en) 2003-04-25 2010-07-27 International Business Machines Corporation Preservation of error data on a diskless platform
US7467331B2 (en) * 2003-04-25 2008-12-16 International Business Machines Corporation Preservation of error data on a diskless platform
US20090164851A1 (en) * 2003-04-25 2009-06-25 Salim Ahmed Agha Preservation of error data on a diskless platform
US8555273B1 (en) 2003-09-17 2013-10-08 Palm. Inc. Network for updating electronic devices
US8578361B2 (en) 2004-04-21 2013-11-05 Palm, Inc. Updating an electronic device with update agent code
US8526940B1 (en) 2004-08-17 2013-09-03 Palm, Inc. Centralized rules repository for smart phone customer care
US9342416B2 (en) 2005-06-28 2016-05-17 Renesas Electronics Corporation Processor and method of controlling execution of processes
US10235254B2 (en) 2005-06-28 2019-03-19 Renesas Electronics Corporation Processor and method of controlling execution of processes
US20060294422A1 (en) * 2005-06-28 2006-12-28 Nec Electronics Corporation Processor and method of controlling execution of processes
US8984334B2 (en) 2005-06-28 2015-03-17 Renesas Electronics Corporation Processor and method of controlling execution of processes
US8296602B2 (en) * 2005-06-28 2012-10-23 Renesas Electronics Corporation Processor and method of controlling execution of processes
US20070016810A1 (en) * 2005-06-30 2007-01-18 Seiko Epson Corporation Information processing apparatus and program for causing computer to execute power control method
US7747880B2 (en) * 2005-06-30 2010-06-29 Seiko Epson Corporation Information processing apparatus and program for causing computer to execute power control method
US7702959B2 (en) * 2005-08-02 2010-04-20 Nhn Corporation Error management system and method of using the same
US20070033281A1 (en) * 2005-08-02 2007-02-08 Hwang Min J Error management system and method of using the same
US8230396B2 (en) 2006-01-06 2012-07-24 International Business Machines Corporation Apparatus and method to debug a software program
US20070168980A1 (en) * 2006-01-06 2007-07-19 Reed David C Apparatus and method to debug a software program
US8893110B2 (en) 2006-06-08 2014-11-18 Qualcomm Incorporated Device management in a network
US8752044B2 (en) 2006-07-27 2014-06-10 Qualcomm Incorporated User experience and dependency management in a mobile device
US9081638B2 (en) 2006-07-27 2015-07-14 Qualcomm Incorporated User experience and dependency management in a mobile device
US20080244296A1 (en) * 2007-03-26 2008-10-02 International Business Machines Corporation Computer system fault detection
US7818597B2 (en) * 2007-03-26 2010-10-19 International Business Machines Corporation Computer system fault detection
US8793414B2 (en) 2009-06-16 2014-07-29 International Business Machines Corporation Status information saving among multiple computers
US8271704B2 (en) 2009-06-16 2012-09-18 International Business Machines Corporation Status information saving among multiple computers
US9229658B2 (en) 2009-06-16 2016-01-05 International Business Machines Corporation Status information saving among multiple computers
US20100318717A1 (en) * 2009-06-16 2010-12-16 International Business Machines Corporation Status information saving among multiple computers
US20140331092A1 (en) * 2013-05-02 2014-11-06 Microsoft Corporation Activity based sampling of diagnostics data
US9092332B2 (en) * 2013-05-02 2015-07-28 Microsoft Technology Licensing, Llc Activity based sampling of diagnostics data
CN105719049A (en) * 2016-01-12 2016-06-29 深圳大尚网络技术有限公司 Realization method and system of intelligent logs

Also Published As

Publication number Publication date
US7010726B2 (en) 2006-03-07

Similar Documents

Publication Publication Date Title
US7010726B2 (en) Method and apparatus for saving data used in error analysis
US6836855B2 (en) Recovery from data fetch errors in hypervisor code
JP3943538B2 (en) Method for managing error logs in a logically partitioned data processing system
US7100163B2 (en) Hypervisor virtualization of OS console and operator panel
US9213623B2 (en) Memory allocation with identification of requesting loadable kernel module
US6910160B2 (en) System, method, and computer program product for preserving trace data after partition crash in logically partitioned systems
US6629162B1 (en) System, method, and product in a logically partitioned system for prohibiting I/O adapters from accessing memory assigned to other partitions during DMA
US6839892B2 (en) Operating system debugger extensions for hypervisor debugging
US6651182B1 (en) Method for optimal system availability via resource recovery
US7711991B2 (en) Error monitoring of partitions in a computer system using partition status indicators
US7062517B2 (en) Method and apparatus for centralized computer management
US7120823B2 (en) Method and apparatus for recovering logical partition configuration data
US8751696B2 (en) Performing device configuration rediscovery
JP3815569B2 (en) Method and apparatus for simultaneously updating and activating partition firmware in a logical partition data processing system
US7039692B2 (en) Method and apparatus for maintaining profiles for terminals in a configurable data processing system
JP4366336B2 (en) Method for managing trace data in logical partition data processing system, logical partition data processing system for managing trace data, computer program for causing computer to manage trace data, logical partition data Processing system
US7913251B2 (en) Hypervisor virtualization of OS console and operator panel
US6654906B1 (en) Recovery from instruction fetch errors in hypervisor code
US20030163651A1 (en) Apparatus and method of transferring data from one partition of a partitioned computer system to another
US6832342B2 (en) Method and apparatus for reducing hardware scan dump data
US6898731B2 (en) System, method, and computer program product for preventing machine crashes due to hard errors in logically partitioned systems
US6658594B1 (en) Attention mechanism for immediately displaying/logging system checkpoints
US20010011335A1 (en) Data processing system having a network and method for managing memory by storing discardable pages in a local paging device
US6938114B2 (en) Method and apparatus for managing access to a service processor

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FAUST, ROBERT ALLAN;KEHNE, KEVIN GENE;NULU, SAYILEELA;AND OTHERS;REEL/FRAME:011631/0577;SIGNING DATES FROM 20010223 TO 20010226

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed

AS Assignment

Owner name: TWITTER, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:032075/0404

Effective date: 20131230

FPAY Fee payment

Year of fee payment: 8

SULP Surcharge for late payment

Year of fee payment: 7

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553)

Year of fee payment: 12

AS Assignment

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: SECURITY INTEREST;ASSIGNOR:TWITTER, INC.;REEL/FRAME:062079/0677

Effective date: 20221027

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: SECURITY INTEREST;ASSIGNOR:TWITTER, INC.;REEL/FRAME:061804/0086

Effective date: 20221027

Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND

Free format text: SECURITY INTEREST;ASSIGNOR:TWITTER, INC.;REEL/FRAME:061804/0001

Effective date: 20221027