US7676509B2 - Methods and apparatus for modifying a backup data stream including a set of validation bytes for each data block to be provided to a fixed position delta reduction backup application - Google Patents
- Publication number
- US7676509B2 (US 11/345,819)
- Authority
- US
- United States
- Prior art keywords
- backup
- bytes
- sets
- validation
- recited
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
- G06F11/1451—Management of the data involved in backup or backup restore by selection of backup contents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1469—Backup restoration techniques
Definitions
- the present invention relates to modifying a backup data stream to be processed by a fixed position delta reduction backup process. More particularly, the present invention relates to modifying a backup data stream to be processed by a fixed position delta reduction backup method, where the backup data stream includes a set of validation bytes for each data block.
- Data backups are often performed via what is commonly referred to as a “backup application.” During a data backup, the backup application sends the data to be stored either to a local storage medium or via a network interface for remote transmission.
- The amount of data that is stored by the backup application varies with the method implemented by the backup application. For instance, some backup applications back up all data in the specified directory, database or file, while other applications attempt to increase the efficiency of the backup process by storing only the data that has been modified since the last backup.
- One commonly used method is the fixed position delta reduction method, which determines which fixed position segments of data have been modified since the last backup and stores the data reflecting those changes.
- the fixed position delta reduction method determines which segments of data have been modified by comparing one segment of data at a fixed position in a file or data stream received during a current backup with the segment of data previously at that same fixed position in the file or data stream during the last backup for that particular file.
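- The position-by-position comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `changed_segments`, the use of SHA-256 as the per-segment signature, and the 8-byte segment size are all assumptions chosen for readability.

```python
import hashlib


def changed_segments(stream, previous_signatures, segment_size):
    """Compare each fixed-position segment with the signature stored for
    the same position during the previous backup.

    Returns the indices of segments that are new or changed, plus the
    signature list to keep for the next backup.
    """
    new_signatures = []
    changed = []
    for index, offset in enumerate(range(0, len(stream), segment_size)):
        segment = stream[offset:offset + segment_size]
        signature = hashlib.sha256(segment).digest()  # assumed signature function
        new_signatures.append(signature)
        is_new = index >= len(previous_signatures)
        if is_new or previous_signatures[index] != signature:
            changed.append(index)
    return changed, new_signatures


# First backup: every segment is new, so all signatures are recorded.
first = b"A" * 8 + b"B" * 8 + b"C" * 8
_, stored = changed_segments(first, [], segment_size=8)

# Second backup: only the middle segment differs.
second = b"A" * 8 + b"X" * 8 + b"C" * 8
changed, stored = changed_segments(second, stored, segment_size=8)
print(changed)  # [1]
```

Only segment 1 would be stored on the second backup; segments 0 and 2 match what was previously seen at the same positions.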
- The manner in which a backup application implementing a fixed position delta reduction method executes, and the effectiveness of that process, vary with the format in which data is stored.
- data associated with a particular file or database may be retrieved in the form of separate physical-organized streams or in a single stream including a plurality of data segments (i.e., blocks).
- Problems are introduced into a backup application implementing a fixed position delta reduction backup method when data is retrieved from a system providing a backup data stream including a plurality of data blocks, where each of the data blocks has an associated set of validation bytes.
- FIG. 1 is a diagram illustrating an exemplary data stream including a plurality of blocks of data.
- backup data is typically sent to the backup application as a data stream.
- a database or Application Programming Interface (API) 102 transmits the data stream 104 to a fixed position delta reduction backup application 105 for storing to a storage medium 106 .
- the data stream 104 includes data blocks 1 , 2 , and 3 , where each of the data blocks has an associated set of validation bytes.
- When a data stream is received via an application implemented by an IBM iSeries™ platform, the data stream includes a set of validation bytes for each block of data. More particularly, the set of validation bytes includes a Cyclic Redundancy Check (CRC) value. Since each set of validation bytes generated by an IBM iSeries™ platform also includes an “unknown seeding” component, the set of validation bytes associated with each data block will change with each request to the API. As a result, the validation bytes will appear to be changed data to the fixed position delta reduction backup application, regardless of whether the corresponding data block has been modified.
- each set of validation bytes 107 in the data stream 104 includes an “unknown seed” component. More particularly, the set of validation bytes 107 for the data blocks 1 , 2 , and 3 of the data stream 104 includes a CRC that is calculated using an “unknown seed,” seed 1 , that changes from one data backup to the next data backup.
- a second data stream 110 is received, which again includes a set of validation bytes 108 for each of data blocks 1 , 2 and 3 .
- Each set of validation bytes 108 in the second data stream 110 includes a CRC that is calculated using an unknown seed, seed 2 , that changes from one data backup to the next data backup.
- the CRC and therefore the set of validation bytes associated with a particular data block will differ from one backup session to the next, regardless of whether the contents of the data block have changed.
- When the set of validation bytes associated with each data block in the modified data stream 110 is compared to the corresponding set of validation bytes of the original data stream 104 (represented by corresponding arrows), the sets of validation bytes appear to have been modified or to be new data.
- the fixed position delta reduction backup application monitors segments of data for changes.
- Since each segment of the data stream being backed up typically includes both a data block and a set of validation bytes (and possibly other data block(s) and associated set(s) of validation bytes), the detection of a change in a set of validation bytes typically requires that the data blocks in that segment also be stored.
- the set of validation bytes 108 associated with blocks 1 , 2 , and 3 of the modified data stream 110 are compared to the corresponding set of validation bytes 107 associated with data blocks 1 , 2 , and 3 in the original data stream 104 , respectively. Since the unknown seed component used to generate the CRC of each set of validation bytes 107 of the data stream 104 differs from that of each set of validation bytes 108 of the data stream 110 , the sets of validation bytes appear to have changed. The sets of validation bytes therefore appear to the backup application to be modified data, resulting in the storing of the segment(s) of the data stream including the validation bytes 108 and the corresponding data blocks 1 , 2 , and 3 .
- each of the data blocks may be perceived as new (or modified) data upon a determination that the associated set of validation bytes in the same segment of the data stream has “changed.”
- the detection of this “new data” requires that all of the “new data” be written to a local storage medium or transmitted via a network interface for storing to a remote storage medium in order to perform a complete backup. Accordingly, this “new data” is stored unnecessarily, resulting in an inefficient processing of backup data provided to the fixed position delta reduction backup application.
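- The effect described above can be reproduced with a toy model. The sketch below is illustrative only: the validation layout (a 4-byte CRC-32 zero-padded to 16 bytes), the use of the seed as the CRC starting value, and the 64-byte block size are assumptions, since the actual iSeries validation format is not given in the text.

```python
import zlib

BLOCK_SIZE = 64        # toy size; the text cites 64 kilobyte blocks
VALIDATION_SIZE = 16   # matches the 16-byte sets cited in the text


def validation_bytes(block, seed):
    # Hypothetical layout: seeded CRC-32 in 4 bytes, zero-padded to 16 bytes.
    crc = zlib.crc32(block, seed)
    return crc.to_bytes(4, "big").ljust(VALIDATION_SIZE, b"\x00")


def build_stream(blocks, seed):
    # Each data block is immediately followed by its set of validation bytes.
    return b"".join(block + validation_bytes(block, seed) for block in blocks)


def count_changed(old, new, segment_size):
    # Fixed position comparison: same offset, previous backup vs. current.
    return sum(
        old[i:i + segment_size] != new[i:i + segment_size]
        for i in range(0, len(new), segment_size)
    )


blocks = [bytes([n]) * BLOCK_SIZE for n in (1, 2, 3)]  # identical in both backups
stream_104 = build_stream(blocks, seed=1)  # first backup session
stream_110 = build_stream(blocks, seed=2)  # next session, new seed

segment_size = BLOCK_SIZE + VALIDATION_SIZE
print(count_changed(stream_104, stream_110, segment_size))  # 3
```

Although no data block changed, every segment differs because each one carries a set of validation bytes computed with a different seed.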
- Methods and apparatus for modifying a data stream of backup data to be provided to a fixed position delta reduction backup method are disclosed. This is accomplished, in part, by modifying a stream of backup data prior to processing the backup data stream via a fixed position delta reduction backup method.
- the amount of data that is detected by the fixed position delta reduction backup method as new or changed is minimized. Accordingly, the amount of data that is stored by the fixed position delta reduction backup method to complete a data backup is substantially reduced.
- a backup data stream is modified prior to providing one or more modified backup data streams to the fixed position delta reduction backup application.
- the disclosed embodiments may also be implemented by the fixed position delta reduction backup application.
- the modified data stream may be processed by the corresponding fixed position delta reduction backup method without requiring that the modified data stream(s) be provided to a separate application.
- a data stream including a set of validation bytes for each data block is received.
- Once the data stream is received, at least a portion of the data stream is parsed into a plurality of data blocks and a plurality of sets of validation bytes, wherein each of the plurality of data blocks corresponds to one of the plurality of sets of validation bytes and each of the plurality of sets of validation bytes includes a signature.
- a signature may be calculated, for example, by calculating a cyclic redundancy check (CRC) or checksum.
- One or more modified data streams are then generated such that the plurality of data blocks are separate from the plurality of sets of validation bytes.
- a single modified data stream is generated such that the plurality of sets of validation bytes are appended to the plurality of data blocks in a contiguous manner.
- two different modified data streams are generated, where the first modified data stream includes the plurality of data blocks and the second modified data stream includes the plurality of sets of validation bytes.
- each set of validation bytes associated with each data block changes from one data backup to the next data backup.
- the set of validation bytes is not entirely based upon the contents of the corresponding data block.
- each set of validation bytes may include a signature calculated using a seed component that changes from one backup session to the next.
- a seed may include, for example, a date and/or time component.
- each set of validation bytes may include a value that is transmitted separately from the signature, where the value changes from one backup session to the next. For instance, such a value may include a date and/or time component.
- each set of validation bytes is a fixed length.
- each of the data blocks is a fixed length data block.
- the length of a fixed length data block is a specific, predetermined length.
- the length may be fixed with respect to position (e.g., with respect to other data blocks), as well as with respect to time.
- the length is fixed with respect to position when each of the fixed length data blocks includes a predetermined, identical number of bytes of data.
- the length is fixed with respect to time when the length of a data block remains the same across time, and therefore across multiple data backups.
- each data block is described as being a fixed length with respect to position, as well as with respect to time across multiple backups. However, it is important to note that the data blocks may be fixed length only with respect to position or with respect to time. Moreover, the data blocks may also be of variable length with respect to position and/or with respect to time.
- each set of validation bytes is 16 bytes and each data block is 64 kilobytes.
- the data stream includes separate 1 megabyte portions (i.e., buffers).
- Each 1 megabyte portion includes individual data blocks, each followed by a validation segment (i.e., set of validation bytes).
- Each validation segment includes a signature.
- the signature may be calculated using a seed component or, alternatively, the validation segment may further include a separate value, where the seed component/value changes from one backup session to the next.
- the invention pertains to a system operable to perform and/or initiate any of the disclosed methods.
- the system includes one or more processors and one or more memories. At least one of the memories and processors are adapted to provide at least some of the above described method operations.
- the invention pertains to a computer program product for performing the disclosed methods.
- The computer program product has at least one tangible computer readable medium and computer program instructions stored on at least one of the computer readable media and configured to perform at least some of the above described method operations.
- FIG. 1 is a diagram illustrating an exemplary data stream including a plurality of data blocks as processed by a typical backup application.
- FIG. 2 is a diagram illustrating the result of introducing a set of validation bytes for each data block into a data stream including a plurality of data blocks as shown in FIG. 1 .
- FIG. 3 is a diagram illustrating an exemplary system in which the present invention may be implemented.
- FIG. 4 is a process flow diagram illustrating a method of implementing a stream modification method as shown at block 304 of FIG. 3 in accordance with one embodiment of the invention.
- FIG. 5 is a process flow diagram illustrating a method of parsing a backup data stream as shown at block 404 of FIG. 4 in accordance with one embodiment of the invention.
- FIG. 6 is a process flow diagram illustrating a method of reversing the save method previously performed upon restore of a file in accordance with one embodiment of the invention.
- FIG. 7 is a process flow diagram illustrating a method of obtaining the data blocks and corresponding sets of validation bytes from the modified backup data stream(s) as shown at block 604 of FIG. 6 in accordance with one embodiment of the invention.
- FIG. 8 is a process flow diagram illustrating a method of generating a backup data stream from the obtained data blocks and corresponding sets of validation bytes as shown at block 606 of FIG. 6 in accordance with one embodiment of the invention.
- FIG. 9 is a block diagram illustrating a typical, general-purpose computer system suitable for implementing the present invention.
- the disclosed embodiments enable a backup data stream that is received from a system generating a set of validation bytes (i.e., validation segment) for each data block, where the set of validation bytes changes with each data backup, to be efficiently processed by a fixed position delta reduction backup method. This is accomplished, in part, by modifying the backup data stream prior to passing it to the fixed position delta reduction backup method. By modifying the backup data stream, the amount of data that is perceived by the fixed position delta reduction backup method to be new or changed is minimized. In this manner, inefficiencies typically introduced into the fixed position delta reduction backup process when a backup data stream includes sets of validation bytes that vary with each backup are eliminated.
- each set of validation bytes associated with each data block changes with each backup execution. More particularly, each set of validation bytes includes a component that changes from one backup to the next.
- each set of validation bytes may include a signature that is calculated using a seed component that changes from one data backup to the next data backup.
- the seed component may include a date and/or time.
- a signature may be generated, for example, by calculating a cyclic redundancy check (CRC) or checksum.
- each set of validation bytes may include a value that is transmitted separately from the signature, where the value changes from one data backup to the next data backup. For example, such a value may include a date and/or time component.
- the backup data stream is modified such that the data blocks are separated from the corresponding sets of validation bytes. More particularly, at least a portion of the data stream is parsed into a plurality of data blocks and a plurality of sets of validation bytes, where each of the plurality of data blocks corresponds to one of the plurality of sets of validation bytes. One or more modified data streams are then generated such that the plurality of data blocks are separate from the plurality of sets of validation bytes. For instance, the plurality of sets of validation bytes may be appended to the plurality of data blocks such that a single modified stream is generated.
- two different modified data streams may be generated, where the first modified data stream includes the plurality of data blocks and the second modified data stream includes the corresponding plurality of sets of validation bytes.
- Data associated with a particular file or database may be stored in variable length data blocks or fixed length data blocks.
- the length of a data block may vary or be fixed with respect to position (e.g., with respect to other data blocks) and/or time (e.g., over time).
- a variable length data block for which the length varies with respect to time may be any length, which varies with the content of the data block.
- the length of a variable length data block for which the length varies over time may increase or decrease over time.
- each of the variable length data blocks may include any number of bytes of data.
- each data block may include a different number of bytes of data, and therefore the length of the data blocks need not be the same.
- a variable length data block for which the length varies with respect to position need not vary with respect to time, and vice versa.
- the length of a fixed length data block is a specific, predetermined length.
- the length is fixed with respect to other data blocks when each of the fixed length data blocks includes a predetermined, identical number of bytes of data.
- the length is fixed with respect to time when the length of a data block remains the same across time, and therefore across multiple data backups.
- a fixed length data block for which the length is fixed with respect to position need not be fixed with respect to time, and vice versa.
- Many common database programs divide databases into fixed length data blocks, where the length is fixed with respect to both position and time.
- the disclosed embodiments may be implemented with systems storing data in the form of fixed length data blocks or variable length data blocks.
- the length may vary with respect to position (e.g., with respect to other data blocks) and/or time (e.g., across multiple data backups).
- the disclosed embodiments may also be implemented with systems storing data in the form of fixed length data blocks where the length is fixed with respect to only position or time.
- the backup data stream includes one or more segments, each of which includes a plurality of data blocks and corresponding sets of validation bytes. These segments may be separated logically, and may be referred to as logical components. Each logical component may be defined as a logically distinct segment within a file or database, such as a backed-up file within a backup dump file or a database file/tablespace within a database dump stream.
- Such logical components are addressed in application Ser. No. 11/280,545, entitled “Methods and Apparatus for Modifying a Backup Data Stream including Logical Partitions of Data Blocks to be Provided to a Fixed Position Delta Reduction Backup Application,” naming Boldt et al. as inventors, filed on Nov. 15, 2005, which is incorporated herein by reference for all purposes.
- the segments may be separated into equal length portions, as will be described in further detail below with reference to the IBM iSeries™ platform.
- each of the segments includes one or more data blocks, where each of the data blocks is followed by an associated set of validation bytes.
- the backup data stream is received from an IBM iSeries™ platform.
- the backup data stream is typically obtained via an Application Programming Interface (API), which is referred to as the SAV API.
- the portions of the data stream are typically 1 megabyte in length.
- Each of the data blocks in a particular portion of the data stream is 64 kilobytes in length, while each associated set of validation bytes consists of 16 bytes.
- each set of validation bytes includes a value that changes from one data backup to the next or, alternatively, a signature calculated using a seed component that changes from one data backup to the next.
- a segment including the sets of validation bytes is generated.
- the segment is then appended to the data blocks obtained from that portion of the data stream.
- the segment of validation bytes is 256 bytes.
- a number of “padding” bytes may be used to pad the segment of validation bytes. This may be desirable, for example, in order to pad a 256 byte segment of validation bytes to generate a 32 kilobyte segment, thereby maintaining consistent 32 kilobyte boundaries.
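- The padding step can be sketched as follows. The helper name `pad_validation_segment` and the choice of zero bytes as padding are assumptions; the text specifies only that a 256 byte segment may be padded to 32 kilobytes.

```python
VALIDATION_SIZE = 16             # bytes per set of validation bytes
SEGMENT_BOUNDARY = 32 * 1024     # 32 kilobyte boundary cited in the text


def pad_validation_segment(validation_sets, boundary=SEGMENT_BOUNDARY,
                           pad_byte=b"\x00"):
    """Concatenate the sets of validation bytes and pad the result up to
    the next multiple of `boundary` (zero padding is an assumption)."""
    segment = b"".join(validation_sets)
    remainder = len(segment) % boundary
    if remainder:
        segment += pad_byte * (boundary - remainder)
    return segment


# Sixteen 16-byte sets form the 256 byte segment mentioned in the text.
sets = [bytes([i]) * VALIDATION_SIZE for i in range(16)]
padded = pad_validation_segment(sets)
print(len(b"".join(sets)), len(padded))  # 256 32768
```

On restore, the padding would have to be stripped again, which is possible here because the number and length of the validation sets are known.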
- The data that is provided to or obtained by a fixed position delta reduction backup application may be obtained from a database or file.
- a plurality of data blocks are obtained from a database and a set of validation bytes is generated for each data block.
- the data and corresponding sets of validation bytes may correspond to one or more files stored in a database.
- Data associated with a file or database may be received by a backup application as one contiguous stream of data.
- the backup application may call an application programming interface (API) offered by a database engine to request backup data.
- the database API will send the backup data as a stream to the requesting application.
- FIG. 3 is a diagram illustrating an exemplary system in which the present invention may be implemented to modify a stream of backup data for transmission to a fixed position delta reduction backup application.
- a stream of backup data 302 is received by a stream modification method (i.e., Save Stream method) 304 .
- When the stream modification method 304 receives the stream of backup data, it modifies the stream of backup data, generating one or more modified data streams. Techniques for modifying the stream of backup data will be described in further detail below with reference to FIGS. 4-8 .
- the modified data stream(s) 306 of backup data are then provided to a fixed position delta reduction backup application 308 .
- Upon receipt of the modified data stream(s), the fixed position delta reduction backup application 308 processes the modified data stream(s) 306 according to standard fixed position delta reduction backup methods. It is important to note that in this example, the stream modification method 304 is performed separately from the fixed position delta reduction backup application 308 . However, the stream modification method 304 and a fixed position delta reduction method may also be performed by a single application. Thus, a single application may implement any of the disclosed embodiments, as well as a fixed position delta reduction method and associated backup processes.
- Each object, file or database, and therefore each stream of backup data 302 associated with a file or database includes one or more segments (i.e., partitions).
- each of the segments may be a logical component or a fixed length segment.
- the length of each of the partitions and each of the data blocks is a fixed length.
- the length of each of the partitions and each of the data blocks may vary with respect to one another.
- FIG. 4 is a process flow diagram illustrating a method of implementing a stream modification method as shown at block 304 of FIG. 3 in accordance with one embodiment of the invention.
- At least a portion of the data stream, which includes a plurality of data blocks and a corresponding plurality of sets of validation bytes, is obtained at 402 .
- Where the portion of the data stream is a specified length, the specified length portion is obtained.
- Where a portion of the data stream is variable in length, such as where the portion is a logical segment of the data stream, the logical segment may be identified.
- Each of the plurality of data blocks corresponds to one of the plurality of sets of validation bytes.
- each of the plurality of data blocks is followed by one of the plurality of sets of validation bytes. Stated another way, each pair of data blocks is separated by a set of validation bytes such that the data blocks and sets of validation bytes are alternating.
- the at least a portion of the data stream is then parsed to generate one or more modified data streams such that the sets of validation bytes are separated from the plurality of data blocks at 404 .
- each of the sets of validation bytes may be removed from the portion of the data stream, leaving only the data blocks.
- each of the data blocks may be removed from the portion of the data stream, resulting in a contiguous stream including the sets of validation bytes.
- One method of parsing a data stream (or portion thereof) such that the plurality of sets of validation bytes are separated from the plurality of data blocks will be described in further detail below with reference to FIG. 5 .
- the one or more modified data streams may then be provided (e.g., transmitted) to a fixed position delta reduction backup application at 406 .
- steps 402 - 406 may be repeated for any remaining portions of the data stream.
- the fixed position delta reduction backup application determines which data blocks to store to remote or local storage. For instance, the fixed position delta reduction backup application may determine which data blocks have been modified (e.g., changed, added, or deleted) by calculating signatures associated with the data blocks in the current modified data stream and comparing the calculated signatures to previously stored signatures associated with a previous backup of the same file or database. The fixed position delta reduction backup application also replaces the previously stored signatures with the newly calculated signatures, enabling the fixed position delta reduction backup application to detect changes made to the file or database since the most recent backup.
- Because the data blocks are separated from the sets of validation bytes, the fixed position delta reduction backup application will be able to correctly compare signatures of each of the data blocks with those signatures that have previously been stored by the fixed position delta reduction backup application for those data blocks. Since the sets of validation bytes that are received by the delta reduction backup application will differ from those previously received by the delta reduction backup application due to the changing seed component (or value), the sets of validation bytes will still be perceived as new or modified data by the delta reduction backup application and stored, but the unchanged data blocks will not.
- Once the fixed position delta reduction backup application determines which data has been modified since the last data backup, the fixed position delta reduction backup application stores the modified data. This data may be sent to a local data storage medium or may be sent via a network interface for transmission to a remote storage medium.
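- The benefit of separating the data blocks from the sets of validation bytes can be illustrated with a toy model. As before, the validation layout (a seeded CRC-32 zero-padded to 16 bytes), the 64-byte block size, and the helper names are assumptions made for the sketch, not details taken from the disclosure.

```python
import zlib

BLOCK_SIZE = 64        # toy size; the text cites 64 kilobyte blocks
VALIDATION_SIZE = 16


def validation_bytes(block, seed):
    # Hypothetical layout: seeded CRC-32 in 4 bytes, zero-padded to 16 bytes.
    return zlib.crc32(block, seed).to_bytes(4, "big").ljust(VALIDATION_SIZE, b"\x00")


def modified_stream(blocks, seed):
    # Single modified stream: all data blocks first, then all validation sets.
    data = b"".join(blocks)
    validation = b"".join(validation_bytes(block, seed) for block in blocks)
    return data + validation


def changed_indices(old, new, segment_size):
    # Indices of fixed-position segments that differ between two backups.
    return [
        offset // segment_size
        for offset in range(0, len(new), segment_size)
        if old[offset:offset + segment_size] != new[offset:offset + segment_size]
    ]


blocks = [bytes([n]) * BLOCK_SIZE for n in (1, 2, 3)]  # unchanged between backups
previous = modified_stream(blocks, seed=1)
current = modified_stream(blocks, seed=2)

print(changed_indices(previous, current, BLOCK_SIZE))  # [3]
```

Segments 0 through 2 hold only data blocks and match the previous backup; only the trailing segment, which holds the sets of validation bytes, is detected as changed.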
- FIG. 5 is a process flow diagram illustrating an exemplary method of parsing a backup data stream as shown at block 404 of FIG. 4 .
- A string variable representing the SET OF PREVIOUSLY OBTAINED DATA BLOCKS is initialized to NULL at 502 , and a string variable representing the SET OF PREVIOUSLY OBTAINED SETS OF VALIDATION BYTES is initialized to NULL at 504 .
- At least a portion of a backup stream is obtained at 506 . If there are more data blocks at 508 , the process continues at 510 to obtain the next data block. The obtained data block is then concatenated (e.g., appended) to the SET OF PREVIOUSLY OBTAINED DATA BLOCKS at 512 . The set of validation bytes associated with the obtained data block is obtained at 514 and concatenated (e.g., appended) to the SET OF PREVIOUSLY OBTAINED SETS OF VALIDATION BYTES at 516 . The process continues at 508 for all remaining data blocks (and corresponding sets of validation bytes) in the data stream.
- one or more modified data streams are generated such that the plurality of data blocks are separate from the plurality of sets of validation bytes. This may be accomplished by generating a single data stream or two (or more) different data streams. More particularly, as shown at 518 , the SET OF PREVIOUSLY OBTAINED SETS OF VALIDATION BYTES may be concatenated (e.g., appended) to the SET OF PREVIOUSLY OBTAINED DATA BLOCKS to generate a single modified data stream. In other words, the sets of validation bytes may be placed at the end of the modified data stream in a contiguous manner.
- two different “modified” data streams may be generated, where a first data stream includes the SET OF PREVIOUSLY OBTAINED SETS OF VALIDATION BYTES and a second data stream includes the SET OF PREVIOUSLY OBTAINED DATA BLOCKS.
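- The parsing loop of FIG. 5 can be sketched as follows, assuming fixed length data blocks and fixed length sets of validation bytes. The function name `parse_backup_stream` and the tiny 4-byte and 2-byte sizes are hypothetical; the step numbers in the comments refer to the reference numerals used in the description.

```python
def parse_backup_stream(portion, block_size, validation_size):
    """Separate alternating (data block, validation bytes) records into
    two contiguous byte strings."""
    record_size = block_size + validation_size
    if len(portion) % record_size:
        raise ValueError("portion is not a whole number of records")

    data_blocks = bytearray()       # 502: set of previously obtained data blocks
    validation_sets = bytearray()   # 504: set of previously obtained validation sets

    offset = 0
    while offset < len(portion):    # 508: more data blocks?
        data_blocks += portion[offset:offset + block_size]            # 510, 512
        offset += block_size
        validation_sets += portion[offset:offset + validation_size]   # 514, 516
        offset += validation_size
    return bytes(data_blocks), bytes(validation_sets)


# Two toy records: 4-byte blocks, each followed by 2 validation bytes.
portion = b"AAAA" + b"v1" + b"BBBB" + b"v2"
data, validation = parse_backup_stream(portion, block_size=4, validation_size=2)

single_stream = data + validation   # 518: single modified data stream
print(single_stream)  # b'AAAABBBBv1v2'
```

For the two-stream variant, `data` and `validation` would simply be handed to the backup application as separate streams instead of being concatenated.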
- the data stream is processed by the fixed position delta reduction backup application and the data that has been modified since the last backup is stored by the fixed position delta reduction backup application to local or remote storage.
- When the fixed position delta reduction backup application retrieves the stored data, it is necessary to reverse the method that was previously performed to modify the backup data stream that was provided to the fixed position delta reduction backup application.
- FIG. 6 is a process flow diagram illustrating a method of reversing the save method previously performed upon restore of a file in accordance with one embodiment of the invention.
- At least one modified backup data stream is obtained from the delta reduction backup application at 602 .
- the modified backup data stream(s) may correspond to a file.
- the modified backup data stream(s) are then parsed such that the plurality of data blocks and the corresponding sets of validation bytes are obtained at 604 from the modified backup data stream(s).
- One method of parsing the modified backup data stream(s) will be described in further detail below with reference to FIG. 7 .
- A backup data stream is then generated (e.g., restored) at 606 such that each set of validation bytes is individually concatenated (e.g., appended) to its corresponding data block. Stated another way, the data blocks and sets of validation bytes alternate, such that each pair of adjacent data blocks is separated by a set of validation bytes.
- One method of generating (e.g., restoring) a backup data stream will be described in further detail below with reference to FIG. 8 .
- the backup data stream(s) are then provided (e.g., via API) at 608 , thereby enabling the file to be restored. In this manner, a physical file may be restored from the backup data that has been modified as set forth above.
- FIG. 7 is a process flow diagram illustrating a method of obtaining the data blocks and corresponding sets of validation bytes from the modified backup data stream(s) as shown at block 604 of FIG. 6 in accordance with one embodiment of the invention.
- One or more modified data streams are obtained (e.g., received) at 702 .
- In one case, a single modified data stream is obtained, in which the sets of validation bytes have been appended to the data blocks.
- In another case, two modified data streams are obtained, where a first one of the modified data streams includes the data blocks while a second one of the modified data streams includes the sets of validation bytes.
- Where a single modified data stream is obtained, the sets of validation bytes are separated from the set of data blocks in the modified data stream. More particularly, the set of data blocks is initialized to NULL at 706. If there are more data blocks at 708, the next data block is obtained at 710 and appended to the set of data blocks at 712. The process repeats at 708 until all of the data blocks have been encountered. The sets of validation bytes that were previously appended to the set of data blocks may then be obtained at 714 from the modified data stream.
- Where two modified data streams are obtained, one of the modified data streams includes the data blocks, while the other of the modified data streams includes the sets of validation bytes.
- In that case, the data stream that includes the data blocks is identified at 716. The set of data blocks may therefore be obtained from this first modified data stream at 718, while the sets of validation bytes may be obtained from the second modified data stream at 720.
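- The parsing described with reference to FIG. 7 can be sketched as follows. This is a hedged illustration only: it assumes fixed, known `block_size` and `check_size` values so that the boundary between the data blocks and the appended validation bytes can be computed, and it assumes the caller already knows which of two streams holds the data blocks (step 716). The function names are hypothetical.

```python
from typing import List, Tuple


def _chunks(data: bytes, size: int) -> List[bytes]:
    """Cut data into consecutive pieces of the given size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def parse_single_stream(
    modified: bytes, block_size: int, check_size: int
) -> Tuple[List[bytes], List[bytes]]:
    """Single-stream case: data blocks first, validation bytes appended."""
    record = block_size + check_size
    if block_size <= 0 or check_size <= 0 or len(modified) % record != 0:
        raise ValueError("unexpected modified stream length")
    count = len(modified) // record       # number of data blocks
    boundary = count * block_size         # where the validation bytes start
    blocks = _chunks(modified[:boundary], block_size)  # steps 706-712
    checks = _chunks(modified[boundary:], check_size)  # step 714
    return blocks, checks


def parse_two_streams(
    check_stream: bytes, block_stream: bytes, block_size: int, check_size: int
) -> Tuple[List[bytes], List[bytes]]:
    """Two-stream case: one stream per kind of content."""
    if len(block_stream) % block_size or len(check_stream) % check_size:
        raise ValueError("stream length is not a multiple of its unit size")
    blocks = _chunks(block_stream, block_size)  # steps 716-718
    checks = _chunks(check_stream, check_size)  # step 720
    if len(blocks) != len(checks):
        raise ValueError("each data block needs exactly one set of validation bytes")
    return blocks, checks
```

- Both functions return the data blocks and the sets of validation bytes as parallel lists, so that the i-th set of validation bytes corresponds to the i-th data block.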
- the original backup data stream may be restored. This may be accomplished by reversing the changes that were initially made to modify the backup data stream. Once the original backup data stream is restored, it may be provided via an API to the system that originated the backup data stream, enabling a file or portion thereof corresponding to the backup data stream to be restored.
- FIG. 8 is a process flow diagram illustrating a method of generating a backup data stream from the obtained data blocks and corresponding sets of validation bytes as shown at block 606 of FIG. 6 in accordance with one embodiment of the invention.
- The restored data stream is initialized to NULL at 802. If there are more data blocks (and associated sets of validation bytes) at 804, the process continues to obtain the next data block in the set of data blocks, which is appended to the restored data stream at 806. In addition, the next set of validation bytes is obtained from the sets of validation bytes and appended to the restored data stream at 808. In other words, the set of validation bytes is appended to the corresponding data block. In this manner, a set of validation bytes may be inserted between two data blocks.
- the process repeats at 804 for all remaining data blocks/sets of validation bytes. When no data blocks/sets of validation bytes remain, the process ends at 810 .
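- The loop of FIG. 8 can be sketched in a few lines. This is an illustrative sketch rather than the claimed implementation; the function name `restore_stream` and the representation of the data blocks and sets of validation bytes as parallel lists are assumptions.

```python
from typing import List


def restore_stream(blocks: List[bytes], checks: List[bytes]) -> bytes:
    """Interleave each data block with its set of validation bytes."""
    if len(blocks) != len(checks):
        raise ValueError("each data block needs exactly one set of validation bytes")
    restored = bytearray()                    # step 802: start with an empty stream
    for block, check in zip(blocks, checks):  # step 804: more blocks remaining?
        restored += block                     # step 806: append the data block
        restored += check                     # step 808: append its validation bytes
    return bytes(restored)                    # step 810: done
```

- For example, `restore_stream([b"AAAA", b"BBBB"], [b"xx", b"yy"])` yields `b"AAAAxxBBBByy"`, in which the validation bytes `xx` sit between the two data blocks.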
- a backup data stream may be restored.
- This may be accomplished, for example, where a set of modified data streams (e.g., a single or two different modified data streams) are generated for each portion of the original backup data stream.
- the set of modified data streams may therefore each be processed to separate the sets of validation bytes from the data blocks as set forth above with reference to FIG. 7 .
- the sets of validation bytes and the data blocks for each portion of the original backup data stream may be combined for all portions or maintained separately for each portion.
- the original backup data stream may then be generated (e.g., restored) as set forth above with reference to FIG. 8 from the data blocks and corresponding sets of validation bytes. For instance, if the sets of validation bytes and the data blocks for all of the portions of the backup data stream are combined, the original backup data stream may be restored as described above with reference to FIG. 8 . Alternatively, if the sets of validation bytes and the data blocks for each portion of the original backup data stream are maintained separately from those for other portions of the original backup data stream, each portion of the backup data stream may be separately restored. The restored portions may be separately transmitted or concatenated prior to transmission to restore the original backup data stream.
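- The two alternatives for a backup data stream handled in portions can be compared in a short sketch. The representation of each portion as a pair of parallel lists and the names `restore_portion`, `restore_combined`, and `restore_separately` are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# One portion: (data blocks, corresponding sets of validation bytes).
Portion = Tuple[List[bytes], List[bytes]]


def restore_portion(blocks: List[bytes], checks: List[bytes]) -> bytes:
    """Re-interleave data blocks and validation bytes for one portion."""
    if len(blocks) != len(checks):
        raise ValueError("each data block needs exactly one set of validation bytes")
    return b"".join(block + check for block, check in zip(blocks, checks))


def restore_combined(portions: Sequence[Portion]) -> bytes:
    """Combine the blocks and validation bytes of all portions, then restore once."""
    all_blocks: List[bytes] = []
    all_checks: List[bytes] = []
    for blocks, checks in portions:
        all_blocks.extend(blocks)
        all_checks.extend(checks)
    return restore_portion(all_blocks, all_checks)


def restore_separately(portions: Sequence[Portion]) -> bytes:
    """Restore each portion on its own, then concatenate the restored portions."""
    return b"".join(restore_portion(blocks, checks) for blocks, checks in portions)
```

- Provided the portions are kept in their original order, both approaches produce the same restored backup data stream.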
- the file restore process operates to reverse the stream modification method previously performed to modify the backup data stream.
- the process illustrated in FIGS. 6-8 corresponds to a system in which the processes set forth above have been performed to modify a backup data stream.
- the file restore process will differ depending upon the format of the backup data stream received and the combination of steps performed to modify the backup data stream. Accordingly, the above-described embodiments are merely illustrative, and other methods of modifying a backup data stream may be performed to separate sets of validation bytes from the corresponding data blocks.
- The techniques of the present invention may be implemented in software and/or hardware.
- the technique of the present invention is implemented in software.
- the present invention relates to machine-readable media that include program instructions, state information (e.g., tables), etc. for performing various operations described herein.
- machine-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM).
- the invention may also be embodied in or associated with a tangible computer-readable medium in which a carrier wave travels over an appropriate medium such as airwaves, optical lines, electric lines, etc.
- Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by a computer using an interpreter.
- FIG. 9 illustrates a typical, general-purpose computer system 1502 suitable for implementing the present invention.
- the computer system may take any suitable form.
- the computer system 1502 includes any number of processors 1504 (also referred to as central processing units, or CPUs) that may be coupled to memory devices including primary storage device 1506 (typically a read only memory, or ROM) and primary storage device 1508 (typically a random access memory, or RAM).
- Both the primary storage devices 1506 , 1508 may include any suitable computer-readable media.
- A secondary storage medium 1510, which is typically a mass memory device, may also be coupled bi-directionally to CPUs 1504 and provides additional data storage capacity.
- the mass memory device 1510 is a computer-readable medium that may be used to store programs including computer code, data, and the like.
- the mass memory device 1510 is a storage medium such as a hard disk, which is generally slower than primary storage devices 1506 , 1508 .
- the mass memory device 1510 may be a storage device such as a SCSI storage device.
- the CPUs 1504 optionally may be coupled to a computer or telecommunications network, e.g., an internet network or an intranet network, using a network connection as shown generally at 1514 .
- With such a network connection, it is contemplated that the CPUs 1504 might receive information from the network (e.g., data associated with a restore process), or might output information to the network (e.g., data that has been processed by a fixed position delta reduction backup process or data that has been modified prior to being processed by a fixed position delta reduction backup application) in the course of performing the above-described method steps.
- backup data may be transmitted over a network to be processed, or to be stored to or retrieved from a remote storage device.
- the network may be a storage area network (SAN) such as a fibre-channel SAN.
- the invention may be installed for use across a network such as the Internet, thereby enabling data retrieval from and backup to disparate sources.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/345,819 US7676509B2 (en) | 2006-02-01 | 2006-02-01 | Methods and apparatus for modifying a backup data stream including a set of validation bytes for each data block to be provided to a fixed position delta reduction backup application |
PCT/US2007/002684 WO2007089861A2 (en) | 2006-02-01 | 2007-01-30 | Methods and apparatus for modifying a backup data stream including a set of validation bytes for each data block to be provided to a fixed position delta reduction backup application |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/345,819 US7676509B2 (en) | 2006-02-01 | 2006-02-01 | Methods and apparatus for modifying a backup data stream including a set of validation bytes for each data block to be provided to a fixed position delta reduction backup application |
Publications (2)
Publication Number | Publication Date |
---|---|
US20070179998A1 (en) | 2007-08-02 |
US7676509B2 (en) | 2010-03-09 |
Family
ID=38283330
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/345,819 Active 2026-08-10 US7676509B2 (en) | 2006-02-01 | 2006-02-01 | Methods and apparatus for modifying a backup data stream including a set of validation bytes for each data block to be provided to a fixed position delta reduction backup application |
Country Status (2)
Country | Link |
---|---|
US (1) | US7676509B2 (en) |
WO (1) | WO2007089861A2 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7925630B1 (en) * | 2007-03-30 | 2011-04-12 | Symantec Corporation | Method of inserting a validated time-image on the primary CDP subsystem in a continuous data protection and replication (CDP/R) subsystem |
US8060479B1 (en) * | 2008-03-28 | 2011-11-15 | Symantec Corporation | Systems and methods for transparently restoring data using file streaming |
US20130311765A1 (en) * | 2012-05-17 | 2013-11-21 | Sony Computer Entertainment Inc. | Information processing apparatus, data generation method, information processing method, and information processing system |
US9037545B2 (en) | 2006-05-05 | 2015-05-19 | Hybir Inc. | Group based complete and incremental computer file backup system, process and apparatus |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8856332B2 (en) * | 2007-10-09 | 2014-10-07 | International Business Machines Corporation | Integrated capacity and architecture design tool |
US8315985B1 (en) * | 2008-12-18 | 2012-11-20 | Symantec Corporation | Optimizing the de-duplication rate for a backup stream |
CN102792281B (en) * | 2010-03-04 | 2015-11-25 | 日本电气株式会社 | Memory device |
WO2015024603A1 (en) * | 2013-08-23 | 2015-02-26 | Nec Europe Ltd. | Method and system for authenticating a data stream |
US9501369B1 (en) * | 2014-03-31 | 2016-11-22 | Emc Corporation | Partial restore from tape backup |
US9946603B1 (en) | 2015-04-14 | 2018-04-17 | EMC IP Holding Company LLC | Mountable container for incremental file backups |
US10078555B1 (en) * | 2015-04-14 | 2018-09-18 | EMC IP Holding Company LLC | Synthetic full backups for incremental file backups |
US9996429B1 (en) | 2015-04-14 | 2018-06-12 | EMC IP Holding Company LLC | Mountable container backups for files |
US11755231B2 (en) | 2019-02-08 | 2023-09-12 | Ownbackup Ltd. | Modified representation of backup copy on restore |
US20200257594A1 (en) * | 2019-02-08 | 2020-08-13 | OwnBackup LTD | Modified Representation Of Backup Copy On Restore |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5604487A (en) * | 1993-07-30 | 1997-02-18 | Lockheed Martin Tactical Systems, Inc. | Apparatus and method for user-selective data communication with verification |
US5745906A (en) | 1995-11-14 | 1998-04-28 | Deltatech Research, Inc. | Method and apparatus for merging delta streams to reconstruct a computer file |
US5752039A (en) * | 1993-03-22 | 1998-05-12 | Ntt Data Communications Systems Corp. | Executable file difference extraction/update system and executable file difference extraction method |
US5765172A (en) * | 1996-01-23 | 1998-06-09 | Dsc Communications Corporation | System and method for verifying integrity of replicated databases |
US5781911A (en) * | 1996-09-10 | 1998-07-14 | D2K, Incorporated | Integrated system and method of data warehousing and delivery |
WO1999038097A1 (en) | 1998-01-21 | 1999-07-29 | Microsoft Corporation | Native data signatures in a file system |
US5990810A (en) * | 1995-02-17 | 1999-11-23 | Williams; Ross Neil | Method for partitioning a block of data into subblocks and for storing and communcating such subblocks |
US6233589B1 (en) | 1998-07-31 | 2001-05-15 | Novell, Inc. | Method and system for reflecting differences between two files |
US6237068B1 (en) * | 1998-08-18 | 2001-05-22 | International Business Machines Corp. | System for multi-volume, write-behind data storage in a distributed processing system |
US20030056139A1 (en) | 2001-09-20 | 2003-03-20 | Bill Murray | Systems and methods for data backup over a network |
US6785786B1 (en) * | 1997-08-29 | 2004-08-31 | Hewlett Packard Development Company, L.P. | Data backup and recovery systems |
US20050080823A1 (en) * | 2003-10-10 | 2005-04-14 | Brian Collins | Systems and methods for modifying a set of data objects |
US6968349B2 (en) * | 2002-05-16 | 2005-11-22 | International Business Machines Corporation | Apparatus and method for validating a database record before applying journal data |
US20070100913A1 (en) * | 2005-10-12 | 2007-05-03 | Sumner Gary S | Method and system for data backup |
US20070136741A1 (en) * | 2005-12-09 | 2007-06-14 | Keith Stattenfield | Methods and systems for processing content |
US20070288533A1 (en) * | 2003-03-28 | 2007-12-13 | Novell, Inc. | Methods and systems for file replication utilizing differences between versions of files |
Non-Patent Citations (1)
Title |
---|
International Search Report and Written Opinion Of The International Searching Authority dated Aug. 8, 2007, for related PCT Application No. PCT/US2007/002684. |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9037545B2 (en) | 2006-05-05 | 2015-05-19 | Hybir Inc. | Group based complete and incremental computer file backup system, process and apparatus |
US9679146B2 (en) | 2006-05-05 | 2017-06-13 | Hybir Inc. | Group based complete and incremental computer file backup system, process and apparatus |
US10671761B2 (en) | 2006-05-05 | 2020-06-02 | Hybir Inc. | Group based complete and incremental computer file backup system, process and apparatus |
US7925630B1 (en) * | 2007-03-30 | 2011-04-12 | Symantec Corporation | Method of inserting a validated time-image on the primary CDP subsystem in a continuous data protection and replication (CDP/R) subsystem |
US8060479B1 (en) * | 2008-03-28 | 2011-11-15 | Symantec Corporation | Systems and methods for transparently restoring data using file streaming |
US20130311765A1 (en) * | 2012-05-17 | 2013-11-21 | Sony Computer Entertainment Inc. | Information processing apparatus, data generation method, information processing method, and information processing system |
US9231927B2 (en) * | 2012-05-17 | 2016-01-05 | Sony Corporation | Information processing apparatus, data generation method, information processing method, and information processing system for updating and verifying software programs |
Also Published As
Publication number | Publication date |
---|---|
WO2007089861A3 (en) | 2007-10-11 |
US20070179998A1 (en) | 2007-08-02 |
WO2007089861A2 (en) | 2007-08-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7676509B2 (en) | Methods and apparatus for modifying a backup data stream including a set of validation bytes for each data block to be provided to a fixed position delta reduction backup application | |
US7761766B2 (en) | Methods and apparatus for modifying a backup data stream including logical partitions of data blocks to be provided to a fixed position delta reduction backup application | |
US8165221B2 (en) | System and method for sampling based elimination of duplicate data | |
US9690802B2 (en) | Stream locality delta compression | |
US9454318B2 (en) | Efficient data storage system | |
US7478113B1 (en) | Boundaries | |
US8527455B2 (en) | Seeding replication | |
US8751462B2 (en) | Delta compression after identity deduplication | |
US7730031B2 (en) | Method and system for updating an archive of a computer file | |
US8200933B2 (en) | System and method for removing a storage server in a distributed column chunk data store | |
US7305532B2 (en) | Efficient data storage system | |
US8849772B1 (en) | Data replication with delta compression | |
US9098513B1 (en) | Methods and systems for differencing orderly dependent files | |
US7574418B1 (en) | Method and apparatus for storing composite data streams | |
US7631144B1 (en) | Write latency efficient storage system |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: EVAULT, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SWANEPOEL, JACQUES DIEDERIK; FU, GUANGSHENG; REEL/FRAME: 017542/0745. Effective date: 20060131 |
AS | Assignment | Owner name: I365 INC., CALIFORNIA. Free format text: CHANGE OF NAME; ASSIGNOR: EVAULT, INC.; REEL/FRAME: 022634/0047. Effective date: 20080923 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: EVAULT, INC., CALIFORNIA. Free format text: CHANGE OF NAME; ASSIGNOR: I365 INC.; REEL/FRAME: 027508/0769. Effective date: 20111220 |
FEPP | Fee payment procedure | Free format text: PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
FPAY | Fee payment | Year of fee payment: 4 |
SULP | Surcharge for late payment | |
AS | Assignment | Owner name: CARBONITE GMBH, SWITZERLAND. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: EVAULT, INC.; REEL/FRAME: 037617/0911. Effective date: 20160105 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552). Year of fee payment: 8 |
AS | Assignment | Owner name: SILICON VALLEY BANK, MASSACHUSETTS. Free format text: SECURITY INTEREST; ASSIGNOR: CARBONITE, INC.; REEL/FRAME: 045640/0335. Effective date: 20180319 |
AS | Assignment | Owner name: CARBONITE, INC., MASSACHUSETTS. Free format text: TERMINATION OF PATENT SECURITY AGREEMENT FILED AT R/F 045640/0335; ASSIGNOR: SILICON VALLEY BANK, AS ADMINISTRATIVE AGENT; REEL/FRAME: 048702/0929. Effective date: 20190326 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 12 |
AS | Assignment | Owner name: OPEN TEXT INC., CALIFORNIA. Free format text: ASSIGNMENT AND ASSUMPTION AGREEMENT; ASSIGNOR: CARBONITE, LLC; REEL/FRAME: 065222/0310. Effective date: 20221001 |
AS | Assignment | Owner name: CARBONITE, LLC, MASSACHUSETTS. Free format text: CERTIFICATE OF CONVERSION; ASSIGNOR: CARBONITE, INC.; REEL/FRAME: 065222/0303. Effective date: 20220929 |