US20090234799A1 - Efficient processing of queries in federated database systems - Google Patents


Publication number: US20090234799A1
Application number: US12/046,273
Authority: US (United States)
Other versions: US8538985B2
Inventors: Anjali Betawadkar-Norwood, Hamid Pirahesh, David Everett Simmen
Current and original assignee: International Business Machines Corp
Application filed by International Business Machines Corp; assigned to International Business Machines Corporation (assignment of assignors' interest from David Everett Simmen, Anjali Betawadkar-Norwood, and Hamid Pirahesh)
Legal status: Granted (published as US8538985B2); Expired - Fee Related


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471: Distributed queries

Definitions

  • The SEND procedure (204) receives a query and the name of a message queue as input.
  • The SEND procedure (204) creates the message queue (202), runs the subquery, and inserts the subquery results into the message queue (202).
  • The insertion of subquery results into the message queue (202) is done in a pipelined fashion, that is, as rows are produced.
  • The special table function referred to in FIG. 2 as the RECEIVE table function (206) takes a description of its output schema and the name of a message queue (202) as input.
  • The table function (206) receives input data from the specified message queue (202), formats that data into rows and columns of appropriate types as per the provided schema, and returns the formatted results as output.
  • In the scenario of FIG. 3, sales information for European products is all contained in a single table managed by the Oracle source server, and the sales information for US products is distributed across tables managed by the IDS and DB2 for z Series source servers.
  • The query is the same as in Table 1 above.
  • FIG. 3 shows the scenario described above and illustrates the extensions to the sideways data movement execution strategy that are required to distribute data dynamically to the appropriate source server in accordance with one implementation of the invention.
  • The Oracle source server (104) and the DB2 for z Series source server (102) are running essentially the same subqueries that were illustrated and discussed above with respect to FIG. 2.
  • The SEND stored procedure (204) and the RECEIVE table function (206) receive the subqueries to execute, the message queues (202, 302) where subquery results will be inserted and received, and any needed output schema information.
  • The SEND stored procedure (204) on the Oracle source server (104) now receives an additional argument, called “pid_hash_func.”
  • This additional argument specifies the name of the distribution function to be used to direct rows from the Oracle source server (104) to either the DB2 for z Series source server (102) or the IDS source server (304).
  • The SEND stored procedure (204) uses this function to decide whether a given result row is inserted into a message queue (202) bound for the DB2 for z Series source server (102), or a message queue (302) bound for the IDS source server (304).
  • The names of the different outbound message queues are also provided as input to the SEND procedure (204).
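A minimal sketch of how SEND might use a distribution function to route each result row to exactly one of several outbound queues. Here `pid_hash_func` is given a hypothetical even/odd rule (even product ids to the DB2 for z Series queue, odd ids to the IDS queue); the real function is whatever the execution plan names. `send_distributed` and `drain` are illustrative helpers, and in-process queues stand in for the message queues (202, 302).

```python
import queue
from typing import Callable, Iterable


def pid_hash_func(prodid: int) -> int:
    """Hypothetical distribution function: even ids -> queue 0 (DB2 for z), odd -> queue 1 (IDS)."""
    return prodid % 2


def send_distributed(rows: Iterable[tuple], key_index: int,
                     dist_func: Callable[[int], int],
                     outbound: list[queue.Queue]) -> None:
    """Route each result row to the single outbound queue chosen by dist_func."""
    for row in rows:
        target = dist_func(row[key_index])
        if not 0 <= target < len(outbound):
            raise ValueError(f"distribution function returned unknown target {target}")
        outbound[target].put(row)


def drain(mq: queue.Queue) -> list:
    """Collect everything currently in a queue (for inspection only)."""
    items = []
    while not mq.empty():
        items.append(mq.get_nowait())
    return items


to_db2z, to_ids = queue.Queue(), queue.Queue()
send_distributed([(101, 9.5), (102, 3.25), (104, 1.0)], 0, pid_hash_func, [to_db2z, to_ids])
db2z_rows, ids_rows = drain(to_db2z), drain(to_ids)
print(db2z_rows)  # [(102, 3.25), (104, 1.0)]
print(ids_rows)   # [(101, 9.5)]
```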
  • The query optimizer component of the system uses additional metadata in order to generate such an execution strategy involving sideways data movement and distributed federated tables.
  • This additional metadata includes “server groups”, “distribution functions”, and “partitioning keys”, which together define a distributed federated table to the optimizer component.
  • A server group represents a set of source servers over which partitions, or replicas, of distributed federated tables reside.
  • Table 2 below shows the declaration of the server group “sales_group”, which includes source servers named “DB2Z” and “IDS”.
  • A distributed federated table is declared by specifying a server group, a distribution function, and (if needed) partitioning keys.
  • The distribution function and partitioning keys indicate how rows are distributed across the source servers identified by the server group.
  • Table 3 shows an example of the declaration of a distributed federated table that is partitioned across a server group called “sales_group” using a distribution function “part-prod” applied to the partitioning key attribute “PRODID”.
  • The distribution function part-prod is a “sourced function” that is declared to the federated server, so that the federated server knows how to find and invoke the function on each source server, in a separate step not shown in the example of FIG. 3.
  • Distributed federated tables might also be declared as replicated across source servers, as illustrated in Table 4 below. No actual distribution function is needed on the source servers to implement dynamic replication.
  • Whenever replication is required, the SEND procedure (204) simply inserts a given row into all identified outbound message queues.
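The metadata that defines a distributed federated table (server group, distribution function, partitioning key) could be modeled as follows. This is an assumed in-memory representation, not the DDL of Tables 2 to 4; the class names, the `sales` and `products` table names, and `part_prod` (Python identifiers cannot contain the hyphen in “part-prod”) are hypothetical. A table declared without a distribution function is treated as replicated, so each row is sent to every server in the group.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ServerGroup:
    name: str
    servers: tuple[str, ...]


@dataclass(frozen=True)
class DistributedFederatedTable:
    name: str
    group: ServerGroup
    dist_func: Optional[Callable[[int], int]] = None  # None means replicated
    partitioning_key: Optional[str] = None

    def __post_init__(self) -> None:
        if self.dist_func is not None and self.partitioning_key is None:
            raise ValueError("a partitioned table needs a partitioning key")

    def target_servers(self, row: dict) -> list[str]:
        """Source servers that must receive this row."""
        if self.dist_func is None:
            return list(self.group.servers)       # replicate to every server
        index = self.dist_func(row[self.partitioning_key])
        return [self.group.servers[index]]        # exactly one partition


def part_prod(prodid: int) -> int:
    """Hypothetical stand-in for the "part-prod" distribution function."""
    return prodid % 2


sales_group = ServerGroup("sales_group", ("DB2Z", "IDS"))
sales = DistributedFederatedTable("sales", sales_group, part_prod, "PRODID")
products = DistributedFederatedTable("products", sales_group)  # replicated
print(sales.target_servers({"PRODID": 7}))     # ['IDS']
print(products.target_servers({"PRODID": 7}))  # ['DB2Z', 'IDS']
```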
  • The information received by the optimizer about replicas and distributed tables can be used in a variety of ways. One example is as follows. Assume that the source servers are numbered 1, 2, 3, and so on. Furthermore, assume that the optimizer knows, from received metadata, that one table, say table T1, is distributed on source servers 1 and 2 using a prod_key attribute (for example, odd prod_keys reside on source server 1 and even prod_keys reside on source server 2), and that another table T2 is distributed the same way on the same source servers. Then any join between tables T1 and T2 that looks for matching prod_key attributes in the two tables can be ‘collocated’: each source server joins its own partitions of T1 and T2, so no data transfer needs to happen in order to perform this join operation.
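The collocation argument can be checked with a small sketch, assuming each table's layout is described by its server list, the name of its distribution function, and its partitioning key. `Distribution`, `is_collocated`, and the sample rows are hypothetical; `odd_even` encodes the odd/even prod_key example above. The last lines confirm that joining partition by partition on each server yields the same rows as a global join, which is why no data needs to move.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Distribution:
    servers: tuple[int, ...]  # source servers holding the partitions
    func: str                 # name of the distribution function
    key: str                  # partitioning key column


def is_collocated(d1: Distribution, d2: Distribution,
                  equi_join: list[tuple[str, str]]) -> bool:
    """Same layout plus an equality predicate on both partitioning keys: no transfer needed."""
    same_layout = d1.servers == d2.servers and d1.func == d2.func
    return same_layout and (d1.key, d2.key) in equi_join


def odd_even(prod_key: int) -> int:
    return 1 if prod_key % 2 else 2  # odd keys on server 1, even keys on server 2


def partition(rows: list[tuple], func: Callable[[int], int]) -> dict[int, list[tuple]]:
    parts: dict[int, list[tuple]] = {}
    for row in rows:  # row[0] is the prod_key
        parts.setdefault(func(row[0]), []).append(row)
    return parts


def join(left: list[tuple], right: list[tuple]) -> list[tuple]:
    return [(a, b) for a in left for b in right if a[0] == b[0]]


t1_layout = Distribution((1, 2), "odd_even", "prod_key")
t2_layout = Distribution((1, 2), "odd_even", "prod_key")
print(is_collocated(t1_layout, t2_layout, [("prod_key", "prod_key")]))  # True

# Joining each server's partitions locally reproduces the global join.
t1 = [(1, "a"), (2, "b"), (3, "c")]
t2 = [(2, "x"), (3, "y"), (4, "z")]
p1, p2 = partition(t1, odd_even), partition(t2, odd_even)
local = sorted(pair for s in t1_layout.servers for pair in join(p1.get(s, []), p2.get(s, [])))
print(local == sorted(join(t1, t2)))  # True
```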
  • The invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements.
  • In one embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • The invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
  • A computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
  • Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk.
  • Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W) and DVD.
  • A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
  • The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
  • Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

Abstract

Methods and apparatus, including computer program products, implementing and using techniques for processing a federated query in a federated database system. A federated query is received at a federated database server. A federated query execution plan is generated based on the received federated query. The federated query execution plan defines one or more source servers of the federated database and a unique subquery to be executed on each of the source servers. The subqueries are distributed to the source servers in accordance with the federated query execution plan. The respective subqueries are executed asynchronously at the source servers. The subquery results are passed to a first designated source server defined in the federated query execution plan. The subquery results are joined and aggregated at the first designated source server into a final query result. The final query result is returned to the federated database server.

Description

    BACKGROUND
  • This invention relates to query processing in federated database systems. A federated database system is a type of database management (DBMS) system which transparently integrates multiple autonomous database systems, referred to below as source servers, into a single federated database. The source servers are interconnected through a computer network and can be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is often a viable alternative to merging together several disparate databases.
  • Through data abstraction, federated database systems can provide a uniform front-end user interface, thereby enabling users and clients to store data in and retrieve data from multiple non-contiguous source servers with a single query, even if the constituent source servers are heterogeneous. The federated database management system receives a query from a user or client that references tables stored and managed by one or more source servers, optimizes the query into subqueries that can be executed by those source servers, and coordinates the execution of the received query by distributing the subqueries to the servers for execution and by combining the subquery results into a result for the received query that is returned to the querying user or client. Some common examples of source servers include the DB2 z Series and the Informix IDS series, both available from International Business Machines Corporation of Armonk, N.Y.
  • A problem with federated query processing is that data from different source servers must be combined on the federated server. The movement of data from the source servers to the federated server requires the federated database system to do a significant amount of processing, and hence use a commensurate amount of system resources, given that there is likely to be a large amount of data transferred from the source servers. Moreover, the computer network might become a bottleneck due to the large amount of data being moved across the network from the source servers to the federated server. Thus, there is a need for more efficient processing techniques for federated queries.
    SUMMARY
  • In general, in one aspect, the invention provides methods and apparatus, including computer program products, implementing and using techniques for processing a federated query in a federated database system. A federated query is received at a federated database server. A federated query execution plan is generated based on the received federated query. The federated query execution plan defines one or more source servers of the federated database and a unique subquery to be executed on each of the source servers. The subqueries are distributed to the source servers in accordance with the federated query execution plan. The respective subqueries are executed asynchronously at the source servers. The subquery results are passed to a first designated source server defined in the federated query execution plan. The subquery results are joined and aggregated at the first designated source server into a final query result. The final query result is returned to the federated database server.
  • The invention can be implemented to include one or more of the following advantages. The sideways data movement and distributed federated tables described herein can fully exploit the power of existing backend database servers, and can thus achieve orders of magnitude better performance compared to conventional federated database systems, with significantly more efficient use of resources. Only a few modifications need to be made to existing federated database systems, such as the installation of a general purpose messaging system, a stored procedure, and user-defined scalar functions on the source servers.
  • The invention also allows for “downwards” data movement, which in essence is a special case of the sideways data movement. For example, in some cases, the federated server may host a small table that needs to be joined with a large table on a data source, with the joined data then aggregated. Without the techniques described herein, the strategy would be to bring the large table to the federated server and perform the join and aggregation there. This would result in a large data movement over the network and leave the federated server doing the heavy processing. Downwards data transfer, on the other hand, in accordance with the various implementations of the invention (or, expressed differently, sideways data transfer where the federated server is one of the participants in the transfer) moves the small table from the federated server to the data source, where the join and aggregation with the large table can be performed. After the join and aggregation, only a small result set is moved back to the federated server, thereby saving valuable bandwidth and using the better processing power of the data source, as compared to the federated server.
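Downwards data movement can be illustrated with two in-memory SQLite databases standing in for the federated server and a data source. The table names, the data, and the use of a temporary table to receive the shipped rows are assumptions for illustration only, not the mechanism the patent prescribes. In this toy run, two small-table rows travel down and two aggregated rows travel back, whereas bringing the large table up would move all 1,000 rows.

```python
import sqlite3

# Two in-memory databases stand in for the federated server and one data source.
fed = sqlite3.connect(":memory:")
src = sqlite3.connect(":memory:")

fed.execute("CREATE TABLE promo (prodid INTEGER)")              # small, on the federated server
fed.executemany("INSERT INTO promo VALUES (?)", [(1,), (3,)])
src.execute("CREATE TABLE sales (prodid INTEGER, price REAL)")  # large, on the data source
src.executemany("INSERT INTO sales VALUES (?, ?)",
                [(i % 10, float(i)) for i in range(1000)])


def downwards_join(fed: sqlite3.Connection, src: sqlite3.Connection):
    # 1. Move the small table down to the data source.
    small = fed.execute("SELECT prodid FROM promo").fetchall()
    src.execute("CREATE TEMP TABLE promo_copy (prodid INTEGER)")
    src.executemany("INSERT INTO promo_copy VALUES (?)", small)
    # 2. Join and aggregate next to the large table.
    result = src.execute(
        "SELECT s.prodid, SUM(s.price) FROM sales s "
        "JOIN promo_copy p ON s.prodid = p.prodid "
        "GROUP BY s.prodid ORDER BY s.prodid").fetchall()
    # 3. Only the small aggregated result travels back up.
    return len(small), result


shipped_down, result = downwards_join(fed, src)
print(shipped_down, result)  # 2 [(1, 49600.0), (3, 49800.0)]
```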
  • The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features and advantages of the invention will be apparent from the description and drawings, and from the claims.
    DESCRIPTION OF DRAWINGS
  • FIG. 1 shows a schematic view of subqueries and data movement in a prior art federated database management system.
  • FIG. 2 shows the subqueries and data movement of FIG. 1, but using the sideways data movement, in accordance with one embodiment of the invention.
  • FIG. 3 shows the extensions to the sideways data movement of FIG. 2 in a scenario with distributed federated tables, in accordance with one embodiment of the invention.
  • Like reference symbols in the various drawings indicate like elements.
    DETAILED DESCRIPTION
  • The various embodiments of the invention described herein pertain to query processing in federated database systems. In particular, the various embodiments of the invention relate to the optimization and efficient execution of federated queries that access, join, aggregate and otherwise augment data that is distributed across multiple heterogeneous database management systems, herein referred to as source servers.
  • In accordance with one aspect of the invention, federated queries can be optimized into query execution plans that, when executed, can move data asynchronously and directly between source servers without going through the federated database management system. This asynchronous and direct transfer of data between source servers will be referred to hereinafter as “sideways data movement”.
  • In accordance with another aspect of the invention, federated queries can be processed in parallel through the optimization and execution of queries that refer to “distributed federated tables,” that is, database tables that are replicated, partitioned, or otherwise distributed across multiple source servers. One part of this aspect of the various implementations of the invention extends the sideways data movement to allow for data transfer to be directed or replicated to one of several alternative source servers, based upon distribution criteria received by a query optimizer component of the system. Another part of this second aspect of the various implementations of the invention allows for the declaration of metadata that defines distribution criteria of stored distributed federated tables to a query optimizer component of the system. This metadata is used to optimize queries involving these tables, and in particular, to build execution plans with sideways data movement wherein data transfer is dynamically directed or replicated, or otherwise distributed.
  • Various implementations of the invention will now be described by way of example and with reference to the drawings. In particular, the following description sets forth how the sideways data movement and distributed federated tables can be advantageously used in a scenario where a user wishes to aggregate product sales data. As the skilled reader will realize, product sales data is merely one example, and the underlying principles of the various embodiments of the invention as described herein can be used in a wide range of applications and for many different types of data.
    Sideways Data Movement
  • As was discussed above, a problem in existing federated query processing is that data from different source servers must be combined on the federated server. Consider, for example, the example query shown in Table 1 below, which seeks the aggregate sales of products sold in Fresno, Calif., that were also sold in Brussels, Belgium:
  • TABLE 1
    select sum(zs.price), zs.prodid
    from oracle.sales o, db2z.sales zs
    where zs.city = 'Fresno' and o.city = 'Brussels' and o.prodid = zs.prodid
    group by zs.prodid
  • FIG. 1 shows a schematic view of a prior art federated database system (100), in which the query of Table 1 can be conducted. As can be seen in FIG. 1, the information about sales in US cities is in a db2z.sales table managed by a DB2 for z series source server (102), whereas sales information for European cities is in an oracle.sales table managed by an Oracle source server (104). In order to obtain the query result, the federated database system (100) would typically need to: execute a first subquery on the Oracle source server (104) to obtain the sales information for Brussels; execute a second subquery on the zSeries source server (102) to obtain sales results for Fresno; execute a third subquery on the federation server (106) that joins and aggregates the results of the first two subqueries. As can be seen in FIG. 1, the Oracle source server (104) and the zSeries source server (102) cannot communicate directly with each other.
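The prior-art strategy of FIG. 1 can be mimicked with in-memory SQLite databases standing in for the Oracle and DB2 for z Series source servers, assuming a toy `sales(prodid, city, price)` schema and made-up rows. Both filtered subquery results are shipped to the federation server, which runs the join and aggregation of Table 1 itself; an `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3


def make_source(rows: list[tuple]) -> sqlite3.Connection:
    """An in-memory database standing in for one source server's sales table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (prodid INTEGER, city TEXT, price REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db


oracle = make_source([(1, "Brussels", 5.0), (2, "Brussels", 7.0),
                      (1, "Paris", 4.0), (3, "Brussels", 2.0)])
db2z = make_source([(1, "Fresno", 10.0), (1, "Fresno", 20.0), (2, "Fresno", 5.0),
                    (4, "Fresno", 8.0), (3, "Boston", 1.0)])

# Subqueries 1 and 2: each source server ships its filtered rows to the federation server.
brussels = oracle.execute("SELECT prodid FROM sales WHERE city = 'Brussels'").fetchall()
fresno = db2z.execute("SELECT prodid, price FROM sales WHERE city = 'Fresno'").fetchall()

# Subquery 3: the federation server joins and aggregates the shipped rows itself.
fed = sqlite3.connect(":memory:")
fed.execute("CREATE TABLE o (prodid INTEGER)")
fed.execute("CREATE TABLE zs (prodid INTEGER, price REAL)")
fed.executemany("INSERT INTO o VALUES (?)", brussels)
fed.executemany("INSERT INTO zs VALUES (?, ?)", fresno)
result = fed.execute(
    "SELECT SUM(zs.price), zs.prodid FROM o JOIN zs ON o.prodid = zs.prodid "
    "GROUP BY zs.prodid ORDER BY zs.prodid").fetchall()

rows_shipped_to_fed = len(brussels) + len(fresno)
print(result, rows_shipped_to_fed)  # [(30.0, 1), (5.0, 2)] 7
```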
  • A more efficient strategy for the federated query above is to move the subquery results for the Brussels sales directly from the Oracle source server (104) to the DB2 for z series source server (102), and to do the join and aggregation there, as will be discussed in further detail below. As a result, network traffic is reduced and the computing power of the highly tuned DB2 for z series source server (102) is fully exploited. This movement of data between source servers is referred to as a “sideways data movement.” Federated query execution strategies that exploit sideways data movement can result in orders of magnitude better performance, as well as significantly better use of existing system resources, thus allowing the federation server (106) to act more as a virtual database management system that is focused more on the optimization and coordination aspects of federated query processing and less on the heavy lifting.
  • In accordance with some implementations of the invention, federated queries can be optimized into query execution plans that exploit sideways data movement. That is, when executed, the query execution plans can move data asynchronously and directly between servers without going through the federation server (106). As will be discussed in further detail below, this approach to sideways data movement requires only the installation of a general purpose messaging system, a stored procedure, and a table function on the source servers.
  • Using the same example as above, the principles in accordance with one embodiment of the invention are illustrated in FIG. 2. As can be seen in FIG. 2, the federation server (106) will again execute a subquery on the Oracle source server (104) to obtain the sales information for Brussels. However, in this case, the results of the subquery will not be shipped back to the federation server (106). Instead, the results of the subquery will be shipped through a message queue (202) to the source server running DB2 for z Series (102); that is, a sideways data movement is performed.
  • The message queue (202) can be managed by any general purpose messaging system, such as the WebSphere MQ series, which is available from International Business Machines Corporation of Armonk, N.Y. In parallel, the federation server (106) executes a second subquery on the DB2 for z Series source server (102). This second subquery retrieves the results of the first subquery from the message queue (202), joins that data with the Fresno sales data of the DB2 for z Series source server (102), and finally aggregates the joined data to achieve the final query result, which is returned to the federation server (106).
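The work performed by the second subquery on the DB2 for z Series source server (102) can be illustrated with a minimal Python sketch of a hash join followed by a group-by. It is not the patent's implementation. The select list of Table 1 is not reproduced in this section, so the following are all assumptions made for illustration:

- the aggregate (a count of joined rows per PRODID);
- the dictionary row layout;
- the function name `join_and_aggregate`.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def join_and_aggregate(
    brussels_rows: Iterable[Mapping[str, object]],
    local_rows: Iterable[Mapping[str, object]],
) -> Dict[object, int]:
    """Hypothetical second subquery: join rows shipped sideways from Oracle with
    local Fresno rows on prodid, then group by prodid, counting joined rows."""
    # Build side: count local Fresno rows per prodid (applies the local predicate).
    fresno_per_prodid = Counter(
        row["prodid"] for row in local_rows if row["city"] == "Fresno"
    )
    # Probe side: rows arriving from the message queue were already filtered
    # to Brussels by the first subquery, so only the join predicate is applied.
    joined_per_prodid: Counter = Counter()
    for row in brussels_rows:
        matches = fresno_per_prodid[row["prodid"]]  # 0 if no Fresno match
        if matches:
            joined_per_prodid[row["prodid"]] += matches
    return dict(joined_per_prodid)


brussels = [{"prodid": 1}, {"prodid": 2}, {"prodid": 1}]
fresno = [
    {"prodid": 1, "city": "Fresno"},
    {"prodid": 3, "city": "Fresno"},
    {"prodid": 1, "city": "Sacramento"},
]
print(join_and_aggregate(brussels, fresno))  # {1: 2}
```

The probe side is consumed as a plain iterator. It could therefore be fed by a generator that yields rows as they are read from the message queue, rather than by a fully materialized list.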
  • As the skilled person realizes, this execution strategy allows for less network traffic as only aggregated data is returned to the federation server (106). Moreover, the execution strategy makes better use of existing system resources, as the heavy lifting is done by DB2 for z Series (102), which typically has more processing power than the federation server (106). It should also be noted that the subqueries are executed asynchronously and in parallel, which allows for an overall decrease in elapsed execution time.
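The traffic argument can be made concrete with a back-of-the-envelope row count. This is a simplified model, not a measurement: it counts rows rather than bytes and ignores messaging overhead. The parameter names and example numbers are hypothetical.

Let b be the number of Brussels rows, f the number of Fresno rows, and g the number of result groups. The plan of FIG. 1 pulls b + f rows into the federation server (106). The sideways plan of FIG. 2 instead moves b rows between source servers and returns only g rows, where g cannot exceed f.

```python
from typing import Dict


def rows_moved(brussels: int, fresno: int, groups: int) -> Dict[str, Dict[str, int]]:
    """Rows crossing the network under each plan (simplified row-count model)."""
    if groups > fresno:
        # Each result group is a prodid with at least one joined Fresno row.
        raise ValueError("cannot have more result groups than Fresno rows")
    return {
        # FIG. 1: both subquery results travel to the federation server.
        "via_federation_server": {"to_federation": brussels + fresno, "sideways": 0},
        # FIG. 2: Brussels rows move sideways; only aggregated rows come back.
        "sideways": {"to_federation": groups, "sideways": brussels},
    }


# Hypothetical sizes, for illustration only.
plan = rows_moved(brussels=50_000, fresno=80_000, groups=1_200)
print(plan["via_federation_server"]["to_federation"])  # 130000
print(plan["sideways"]["to_federation"])  # 1200
```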
  • In accordance with this implementation, besides the messaging system that manages the message queue (202), only a special stored procedure and a special table function need to be installed on each of the source servers. The special stored procedure, referred to in FIG. 2 as the SEND procedure (204), receives a query and the name of a message queue as input. When executed, the SEND procedure (204) creates the message queue (202), runs the subquery, and inserts the subquery results into the message queue (202). In some implementations, the insertion of subquery results into the message queue (202) is done in a pipelined fashion, that is, as rows are produced.
  • The special table function, referred to in FIG. 2 as the RECEIVE table function (206), takes a description of its output schema, and the name of a message queue (202), as input. When invoked during the process of subquery execution, the table function (206) receives input data from the specified message queue (202), formats that data into rows and columns of appropriate types as per the provided schema, and returns the formatted results as output.
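The SEND/RECEIVE pairing is essentially a producer/consumer pipeline. The following Python sketch imitates it in a single process. `queue.Queue` stands in for a message queue (202) of a general purpose messaging system, and a thread stands in for the source server that runs SEND.

The following are assumptions of this sketch:

- the text serialization format;
- the end-of-stream marker;
- representing the output schema as a list of Python conversion types.

Unlike the described SEND procedure, the caller here creates the queue. A real implementation would be a stored procedure and a table function written against each source server's own APIs.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List, Sequence

END_OF_STREAM = None  # hypothetical marker telling RECEIVE that SEND is finished


def send(run_subquery: Callable[[], Iterable[tuple]], out_queue: queue.Queue) -> None:
    """SEND-style producer: run the subquery and enqueue each row as it is produced."""
    for row in run_subquery():
        # Serialize columns to text as a stand-in for a messaging wire format.
        out_queue.put([str(value) for value in row])
    out_queue.put(END_OF_STREAM)


def receive(schema: Sequence[Callable[[str], object]], in_queue: queue.Queue) -> Iterator[tuple]:
    """RECEIVE-style table function: read messages and type each column per the schema."""
    while True:
        message = in_queue.get()  # blocks until the producer sends something
        if message is END_OF_STREAM:
            return
        yield tuple(convert(field) for convert, field in zip(schema, message))


def brussels_subquery() -> Iterator[tuple]:
    # Hypothetical stand-in for the first subquery on the Oracle source server.
    yield (1, 9.5)
    yield (2, 3.0)


mq: queue.Queue = queue.Queue()
producer = threading.Thread(target=send, args=(brussels_subquery, mq))
producer.start()
rows: List[tuple] = list(receive([int, float], mq))
producer.join()
print(rows)  # [(1, 9.5), (2, 3.0)]
```

Because `receive` is a generator, the consumer starts converting rows while the producer is still running. This mirrors the pipelined insertion described above.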
  • It should be noted that virtually all conventional major database management systems support table functions and stored procedures. Moreover, most enterprise software stacks typically include a general purpose messaging system. Consequently, the approach to sideways data movement described in the above implementation is both feasible and practical. A person skilled in the art of database query processing, and in particular in federated query processing, can readily appreciate the performance benefits of sideways data movement over the existing systems, as well as the elegance and feasibility of the solutions described in the above implementations.
  • Distributed Federated Tables
  • There are common business scenarios in which multiple tables residing on different source servers must be treated as the same logical table. These scenarios often occur as a result of acquisitions and mergers. Such a scenario will now be illustrated by extending the above example.
  • In this extended example, it is assumed that the company interested in obtaining the aggregate sales results acquired a company that sold products different from those of the parent company. The US sales data for the acquired company's products resides on a source server running Informix IDS. As a result, the US sales data for the merged companies is now effectively partitioned between the DB2 for z Series sales table and the new IDS sales table. The product identification number (PRODID) makes it possible to distinguish between products sold by the parent company and products sold by the acquired company. It is therefore possible to write a function that examines PRODID attribute values and determines whether a row belongs to the IDS partition (i.e., the product was sold by the newly acquired company) or to the DB2 for z Series partition (containing sales information for products of the parent company).
  • Furthermore, it is assumed that, because of the expense and complexity involved in merging the data in the DB2 for z Series and IDS source servers, the US division of the company has decided to leave the systems physically separated. The European division of the company, on the other hand, was able to move any European sales relating to products of the acquired company into the Oracle source server.
  • Thus, in a nutshell, in this example the sales information for European sales is all contained in a single table managed by the Oracle source server, whereas the sales information for US sales is distributed across tables managed by the IDS and DB2 for z Series source servers. However, as discussed above, it is possible to write a function that examines a value of the PRODID attribute and determines whether the corresponding product was sold by the parent company or the acquired company. The query is the same as in Table 1 above.
  • Revisiting the sideways data movement execution strategy illustrated in FIG. 2, it is now not only necessary to move rows pertaining to Brussels sales results from the Oracle source server (104) directly to the DB2 for z Series server (102). It may also be necessary to move rows to the IDS source server, depending on the value of the PRODID attribute of a given row. The aforementioned distribution function, which examines a PRODID attribute value and determines whether the corresponding product was sold by the acquired company or by the parent company, can also be used to determine to which source server a particular row should be directed.
  • Tables that are logically the same, but physically distributed across multiple federated source servers are referred to herein as “distributed federated tables.” FIG. 3 shows the scenario described above and illustrates the extensions to the sideways data movement execution strategy that are required to distribute data dynamically to the appropriate source server in accordance with one implementation of the invention.
  • As can be seen in FIG. 3, the Oracle source server (104) and the DB2 for z Series source server (102) are running essentially the same subqueries that were illustrated and discussed above with respect to FIG. 2. Just like before, the SEND stored procedure (204) and the RECEIVE table function (206) receive the subqueries to execute, the message queues (202, 302) where subquery results will be inserted and received, and any needed output schema information. In the scenario illustrated in FIG. 3, though, there is also an IDS source server (304) executing a copy of the same subquery that is executed by the DB2 for z Series source server (102). Moreover, the SEND stored procedure (204) on the Oracle source server (104) now receives an additional argument, called “pid_hash_func.” This argument specifies the name of the distribution function used to direct rows from the Oracle source server (104) to either the DB2 for z Series source server (102) or the IDS source server (304). The SEND stored procedure (204) uses this function to decide whether a given result row is inserted into a message queue (202) bound for the DB2 for z Series source server (102) or into a message queue (302) bound for the IDS source server (304). The names of the different outbound message queues are also provided as input to the SEND procedure (204).
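The routing step performed by the extended SEND procedure can be sketched as follows. The patent does not specify how the distribution function decides ownership. The rule in `pid_hash_func` below (PRODIDs of 500000 and above belong to the acquired company) is therefore purely hypothetical, as are the server names used as dictionary keys. `queue.Queue` again stands in for the outbound message queues (202, 302).

```python
import queue
from typing import Callable, Iterable, List, Mapping

END_OF_STREAM = None  # hypothetical marker sent to every consumer at the end


def pid_hash_func(prodid: int) -> str:
    """Hypothetical distribution function: acquired-company products use PRODID >= 500000."""
    return "IDS" if prodid >= 500_000 else "DB2Z"


def send_distributed(
    rows: Iterable[tuple],
    prodid_position: int,
    distribute: Callable[[int], str],
    outbound: Mapping[str, queue.Queue],
) -> None:
    """Insert each result row into the outbound queue of the server chosen for it."""
    for row in rows:
        outbound[distribute(row[prodid_position])].put(row)
    # Each receiving server needs its own end-of-stream marker.
    for out_queue in outbound.values():
        out_queue.put(END_OF_STREAM)


def drain(in_queue: queue.Queue) -> List[tuple]:
    """Collect everything a receiver would see, up to the end-of-stream marker."""
    received = []
    while (item := in_queue.get()) is not END_OF_STREAM:
        received.append(item)
    return received


queues = {"DB2Z": queue.Queue(), "IDS": queue.Queue()}
send_distributed([(17, "Brussels"), (500123, "Brussels"), (42, "Brussels")], 0, pid_hash_func, queues)
db2z_rows = drain(queues["DB2Z"])
ids_rows = drain(queues["IDS"])
print(db2z_rows)  # [(17, 'Brussels'), (42, 'Brussels')]
print(ids_rows)   # [(500123, 'Brussels')]
```

Each row goes to exactly one queue, so the receivers together see every row once. The end-of-stream marker, however, has to be broadcast to all of them.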
  • The query optimizer component of the system (not shown) uses additional metadata in order to generate such an execution strategy involving sideways data movement and distributed federated tables. This additional metadata includes “server groups”, “distribution functions”, and “partitioning keys”, which together define a distributed federated table to the optimizer component. A server group represents a set of source servers over which partitions, or replicas, of distributed federated tables reside. Table 2 below shows the declaration of the server group “sales_group”, which includes source servers named “DB2Z” and “IDS”.
  • TABLE 2
    CREATE SERVER GROUP sales_group ON SOURCE
    SERVERS (DB2Z, IDS)
  • A distributed federated table is declared by specifying a server group, a distribution function, and (if needed) partitioning keys. The distribution function and partitioning keys indicate how rows are distributed across the source servers identified by the server group. Table 3 below shows an example of the declaration of a distributed federated table that is partitioned across a server group called “sales_group”. It uses a distribution function “part-prod” applied to the partitioning key attribute “PRODID”. The distribution function part-prod is a “sourced function”. It is declared to the federated server in a separate step, not shown in the example of FIG. 3, so that the federated server knows how to find and invoke the function on each source server.
  • TABLE 3
    CREATE TABLE us-sales (PRODID INT, PRODNAME
    VARCHAR(1000),...) IN SERVER GROUP sales_group DISTRIBUTE
    BY part-prod PARTITIONING KEYS (PRODID);
  • Distributed federated tables might also be declared as replicated across source servers as illustrated in Table 4 below. Clearly no actual distribution function is needed on the source servers to implement dynamic replication. The SEND procedure (204) simply inserts a given row into all identified outbound message queues whenever replication is required.
  • TABLE 4
    CREATE TABLE us-sales (PRODID INT, PRODNAME
    VARCHAR(1000),...) IN SERVER GROUP sales_group DISTRIBUTE
    BY REPLICATION;
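The metadata in Tables 2 to 4 can be modeled as plain data to show what the optimizer and the SEND procedure derive from it. This Python sketch is an illustration under stated assumptions, not the federated server's catalog. The class and method names are hypothetical. Because the real part-prod function is not shown, `part_prod` reuses an invented ownership rule. A partitioned table sends each row to exactly one server in the group; a replicated table sends it to every server.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ServerGroup:
    name: str
    servers: Tuple[str, ...]  # e.g. ("DB2Z", "IDS") as declared in Table 2


@dataclass(frozen=True)
class DistributedFederatedTable:
    name: str
    group: ServerGroup
    # None models DISTRIBUTE BY REPLICATION (Table 4).
    distribution_function: Optional[Callable[..., str]] = None
    partitioning_keys: Tuple[str, ...] = ()

    def target_servers(self, row: Mapping[str, object]) -> Tuple[str, ...]:
        """Servers that must receive this row under the table's declaration."""
        if self.distribution_function is None:
            return self.group.servers  # replication: every server gets a copy
        key_values = [row[key] for key in self.partitioning_keys]
        return (self.distribution_function(*key_values),)


def part_prod(prodid: int) -> str:
    # Hypothetical stand-in for the sourced function part-prod.
    return "IDS" if prodid >= 500_000 else "DB2Z"


sales_group = ServerGroup("sales_group", ("DB2Z", "IDS"))
partitioned = DistributedFederatedTable("us-sales", sales_group, part_prod, ("PRODID",))
replicated = DistributedFederatedTable("us-sales", sales_group)

print(partitioned.target_servers({"PRODID": 42}))       # ('DB2Z',)
print(partitioned.target_servers({"PRODID": 600_000}))  # ('IDS',)
print(replicated.target_servers({"PRODID": 42}))        # ('DB2Z', 'IDS')
```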
  • The information received by the optimizer about replicas and distributed tables can be used in a variety of ways. For illustration purposes, one example is as follows. Assume that the source servers are numbered 1, 2, 3, and so on. Furthermore, assume that the optimizer knows, from received metadata, that one table, say table T1, is distributed on source servers 1 and 2 using a prod_key attribute. For example, odd prod_key values reside on source server 1 and even prod_key values reside on source server 2. Assume also that another table T2 is distributed the same way on the same source servers. Then any join between tables T1 and T2 that looks for matching prod_key values in the two tables can be ‘collocated’. This means that no data transfer needs to happen in order to perform this join operation. The reason is that all rows with odd prod_key values from both tables T1 and T2 can be found on source server 1, and all rows with even prod_key values can be found on source server 2. As the skilled person realizes, avoiding the data transfer altogether is even better than making the data transfer more efficient.
  • Another example shows how the optimizer uses the replicated nature of tables. Imagine that a table T is distributed on servers 1 and 2 using a prod_key attribute, and that a table R is replicated (that is, a full copy of R exists) on servers 1 and 2. Again, a join of T and R does not require any data transfer, since the rows of T on each server find a complete copy of R on that same server. These optimization strategies exist to deal with tables distributed on multiple nodes in a massively parallel processing (MPP) system. An MPP system is homogeneous: all nodes run exactly the same version and level of DB2. The same optimizations can also be extended and exploited to optimize tables that are distributed or replicated on heterogeneous source servers. It should be noted again that these are merely two examples, and that many variations of optimizations can be contemplated by people of ordinary skill in the art.
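The two optimizer examples reduce to a simple placement test. The following Python sketch is a hypothetical, simplified version of such a check, not DB2's optimizer logic. It summarizes each table's placement by three things:

- its set of servers;
- its partitioning column (None for a replicated table);
- the name of its distribution scheme.

It conservatively reports that data must move in any case it does not recognize.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Placement:
    servers: FrozenSet[int]
    partition_column: Optional[str]  # None means a full copy on every listed server
    scheme: Optional[str] = None     # name of the distribution rule, if partitioned


def join_is_local(left: Placement, right: Placement, left_col: str, right_col: str) -> bool:
    """True if an equi-join on left_col = right_col needs no data transfer."""
    left_part = left.partition_column is not None
    right_part = right.partition_column is not None
    if left_part and right_part:
        # T1/T2 case: same servers, same rule, and both partitioned on the join columns.
        return (
            left.servers == right.servers
            and left.scheme == right.scheme
            and left.partition_column == left_col
            and right.partition_column == right_col
        )
    if left_part and not right_part:
        # T/R case: the replicated side must exist wherever the partitioned side does.
        return left.servers <= right.servers
    if right_part and not left_part:
        return right.servers <= left.servers
    return False  # both replicated: not covered by the examples, assume transfer


odd_even = "odd prod_key -> server 1, even prod_key -> server 2"
t1 = Placement(frozenset({1, 2}), "prod_key", odd_even)
t2 = Placement(frozenset({1, 2}), "prod_key", odd_even)
r = Placement(frozenset({1, 2}), None)
t3 = Placement(frozenset({1, 2}), "cust_key", odd_even)

print(join_is_local(t1, t2, "prod_key", "prod_key"))  # True
print(join_is_local(t1, r, "prod_key", "prod_key"))   # True
print(join_is_local(t3, t2, "prod_key", "prod_key"))  # False
```

Comparing schemes by name assumes that two tables declared with the same distribution function map equal key values to the same server. That assumption also underlies the T1/T2 example.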
  • The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
  • Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
  • A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
  • Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
  • Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
  • A number of implementations of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, in the above examples, the joining and aggregating have typically been done on a particular source server in the federated database system. However, it should be noted that any and all the source servers may perform these operations and pass the data on to other source servers using the sideways mechanism described above. Thus, the capabilities of the systems and methods described herein are certainly not limited to performing these operations only on the types of source servers described above. Accordingly, other embodiments are within the scope of the following claims.

Claims (25)

1. A computer-implemented method for processing a federated query in a federated database system, the method comprising:
receiving a federated query at a federated database server;
generating a federated query execution plan, based on the received federated query, the federated query execution plan defining one or more source servers of the federated database and a unique subquery to be executed on each of the defined one or more source servers;
distributing the subqueries to the one or more source servers in accordance with the federated query execution plan;
executing the respective subqueries asynchronously at the one or more source servers;
passing the subquery results to a first designated source server defined in the federated query execution plan;
joining and aggregating the subquery results at the first designated source server into a final query result; and
returning the final query result to the federated database server.
2. The method of claim 1, wherein passing the results comprises:
connecting a source server to the first designated source server using a message queue; and
inserting the subquery results from the source server into the message queue.
3. The method of claim 2, wherein inserting the subquery results in the message queue includes inserting the subquery results into the message queue as rows are produced.
4. The method of claim 1, wherein joining and aggregating comprises:
receiving subquery results from a specified message queue; and
formatting the received subquery results into rows and columns of appropriate types to generate a final query result.
5. The method of claim 1, wherein a database table is physically distributed across the first designated source server and a second designated source server, further comprising:
passing, based on parameters specified in the federated query, a first subset of the subquery results to the first designated source server and a second subset of the subquery results to the second designated source server;
joining and aggregating the first subset of the subquery results at the first designated source server into a first final query result;
joining and aggregating the second subset of the subquery results at the second designated source server into a second final query result;
returning the first and second final query results to the federated database server; and
combining the first and second final query results at the federated database server.
6. The method of claim 5, wherein passing the first and second subsets of the query results comprises:
determining whether a subquery result is to be sent to the first designated source server or to the second designated source server;
connecting a source server to the first designated source server using a first message queue;
connecting a source server to the second designated source server using a second message queue;
inserting the subquery results from the source server into the first message queue in response to determining that the subquery results should be sent to the first designated source server; and
inserting the subquery results from the source server into the second message queue in response to determining that the subquery should be sent to the second designated source server.
7. The method of claim 6, wherein placing the subquery results in the first and second message queues, respectively, includes inserting the subquery results into either the first message queue or into the second message queue as rows are produced.
8. The method of claim 5, wherein joining and aggregating the first subset comprises:
receiving the first subset of subquery results from the first message queue; and
formatting the received first subset of subquery results into rows and columns of appropriate types to generate a first final query result.
9. The method of claim 5, wherein joining and aggregating the second subset comprises:
receiving the second subset of subquery results from the second message queue; and
formatting the received second subset of subquery results into rows and columns of appropriate types to generate a second final query result.
10. The method of claim 5, further comprising:
replicating the physically distributed database table onto one or more additional source servers in the federated database system; and
using the information about the replicas of the distributed database table in generating the federated query execution plan.
11. The method of claim 5, wherein generating a federated query execution plan includes:
optimizing the federated query based on metadata to generate an execution strategy involving sideways data movement.
12. The method of claim 11, wherein the metadata includes one or more of: a server group representing a set of source servers over which partitions or replicas of the distributed database table reside, and a distribution function and a partitioning key indicating how rows of the distributed database table are distributed across the source servers identified by the server group.
13. A computer program product comprising a computer useable medium including a computer readable program, wherein the computer readable program when executed on a computer causes the computer to:
receive a federated query at a federated database server;
generate a federated query execution plan, based on the received federated query, the federated query execution plan defining one or more source servers of the federated database and a unique subquery to be executed on each of the defined one or more source servers;
distribute the subqueries to the one or more source servers in accordance with the federated query execution plan;
execute the respective subqueries asynchronously at the one or more source servers;
pass the subquery results to a first designated source server defined in the federated query execution plan;
join and aggregate the subquery results at the first designated source server into a final query result; and
return the final query result to the federated database server.
14. The computer program product of claim 13, wherein passing the results comprises:
connecting a source server to the first designated source server using a message queue; and
inserting the subquery results from the source server into the message queue.
15. The computer program product of claim 14, wherein inserting the subquery results in the message queue includes inserting the subquery results into the message queue as rows are produced.
16. The computer program product of claim 13, wherein joining and aggregating comprises:
receiving subquery results from a specified message queue; and
formatting the received subquery results into rows and columns of appropriate types to generate a final query result.
17. The computer program product of claim 13, wherein a database table is physically distributed across the first designated source server and a second designated source server, wherein the computer readable program when executed on a computer further causes the computer to:
pass, based on parameters specified in the federated query, a first subset of the subquery results to the first designated source server and a second subset of the subquery results to the second designated source server;
join and aggregate the first subset of the subquery results at the first designated source server into a first final query result;
join and aggregate the second subset of the subquery results at the second designated source server into a second final query result;
return the first and second final query results to the federated database server; and
combine the first and second final query results at the federated database server.
18. The computer program product of claim 17, wherein passing the first and second subsets of the query results comprises:
determining whether a subquery result is to be sent to the first designated source server or to the second designated source server;
connecting a source server to the first designated source server using a first message queue;
connecting a source server to the second designated source server using a second message queue;
inserting the subquery results from the source server into the first message queue in response to determining that the subquery results should be sent to the first designated source server; and
inserting the subquery results from the source server into the second message queue in response to determining that the subquery should be sent to the second designated source server.
19. The computer program product of claim 18, wherein placing the subquery results in the first and second message queues, respectively, includes inserting the subquery results into either the first message queue or into the second message queue as rows are produced.
20. The computer program product of claim 17, wherein joining and aggregating the first subset comprises:
receiving the first subset of subquery results from the first message queue; and
formatting the received first subset of subquery results into rows and columns of appropriate types to generate a first final query result.
21. The computer program product of claim 17, wherein joining and aggregating the second subset comprises:
receiving the second subset of subquery results from the second message queue; and
formatting the received second subset of subquery results into rows and columns of appropriate types to generate a second final query result.
22. The computer program product of claim 17, wherein the computer readable program when executed on a computer further causes the computer to:
replicate the physically distributed database table onto one or more additional source servers in the federated database system; and
use the information about the replicas of the distributed database table in generating the federated query execution plan.
23. The computer program product of claim 17, wherein generating a federated query execution plan includes:
optimizing the federated query based on metadata to generate an execution strategy involving sideways data movement.
24. The computer program product of claim 23, wherein the metadata includes one or more of: a server group representing a set of source servers over which partitions or replicas of the distributed database table reside, and a distribution function and a partitioning key indicating how rows of the distributed database table are distributed across the source servers identified by the server group.
25. A federated database system for processing federated queries, comprising:
means for receiving a federated query at a federated database server;
means for generating a federated query execution plan, based on the received federated query, the federated query execution plan defining one or more source servers of the federated database and a unique subquery to be executed on each of the defined one or more source servers;
means for distributing the subqueries to the one or more source servers in accordance with the federated query execution plan;
means for executing the respective subqueries asynchronously at the one or more source servers;
means for passing the subquery results to a first designated source server defined in the federated query execution plan;
means for joining and aggregating the subquery results at the first designated source server into a final query result; and
means for returning the final query result to the federated database server.
US12/046,273 2008-03-11 2008-03-11 Efficient processing of queries in federated database systems Expired - Fee Related US8538985B2 (en)

Publications (2)

Publication Number Publication Date
US20090234799A1 true US20090234799A1 (en) 2009-09-17
US8538985B2 US8538985B2 (en) 2013-09-17

Family

ID=41064106

US10776355B1 (en) 2016-09-26 2020-09-15 Splunk Inc. Managing, storing, and caching query results and partial query results for combination with additional query results
US10795884B2 (en) 2016-09-26 2020-10-06 Splunk Inc. Dynamic resource allocation for common storage query
US10824637B2 (en) 2017-03-09 2020-11-03 Data.World, Inc. Matching subsets of tabular data arrangements to subsets of graphical data arrangements at ingestion into data driven collaborative datasets
WO2020227659A1 (en) * 2019-05-08 2020-11-12 Datameer, Inc. Recommendation model generation and use in a hybrid multi-cloud database environment
US10853376B2 (en) 2016-06-19 2020-12-01 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US10860653B2 (en) 2010-10-22 2020-12-08 Data.World, Inc. System for accessing a relational database using semantic queries
US10896182B2 (en) 2017-09-25 2021-01-19 Splunk Inc. Multi-partitioning determination for combination operations
US10922308B2 (en) 2018-03-20 2021-02-16 Data.World, Inc. Predictive determination of constraint data for application with linked data in graph-based datasets associated with a data-driven collaborative dataset platform
US10956415B2 (en) 2016-09-26 2021-03-23 Splunk Inc. Generating a subquery for an external data system using a configuration file
US10977260B2 (en) 2016-09-26 2021-04-13 Splunk Inc. Task distribution in an execution node of a distributed execution environment
US10984044B1 (en) 2016-09-26 2021-04-20 Splunk Inc. Identifying buckets for query execution using a catalog of buckets stored in a remote shared storage system
US10984008B2 (en) 2016-06-19 2021-04-20 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US11003714B1 (en) 2016-09-26 2021-05-11 Splunk Inc. Search node and bucket identification using a search node catalog and a data store catalog
US11010435B2 (en) 2016-09-26 2021-05-18 Splunk Inc. Search service for a data fabric system
US11016931B2 (en) 2016-06-19 2021-05-25 Data.World, Inc. Data ingestion to generate layered dataset interrelations to form a system of networked collaborative datasets
USD920353S1 (en) 2018-05-22 2021-05-25 Data.World, Inc. Display screen or portion thereof with graphical user interface
US11023463B2 (en) 2016-09-26 2021-06-01 Splunk Inc. Converting and modifying a subquery for an external data system
US11023104B2 (en) 2016-06-19 2021-06-01 data.world,Inc. Interactive interfaces as computerized tools to present summarization data of dataset attributes for collaborative datasets
US11036716B2 (en) 2016-06-19 2021-06-15 Data World, Inc. Layered data generation and data remediation to facilitate formation of interrelated data in a system of networked collaborative datasets
US11036697B2 (en) 2016-06-19 2021-06-15 Data.World, Inc. Transmuting data associations among data arrangements to facilitate data operations in a system of networked collaborative datasets
US11042537B2 (en) 2016-06-19 2021-06-22 Data.World, Inc. Link-formative auxiliary queries applied at data ingestion to facilitate data operations in a system of networked collaborative datasets
US11042548B2 (en) 2016-06-19 2021-06-22 Data World, Inc. Aggregation of ancillary data associated with source data in a system of networked collaborative datasets
US11042556B2 (en) 2016-06-19 2021-06-22 Data.World, Inc. Localized link formation to perform implicitly federated queries using extended computerized query language syntax
US11042560B2 (en) 2016-06-19 2021-06-22 data. world, Inc. Extended computerized query language syntax for analyzing multiple tabular data arrangements in data-driven collaborative projects
US11061927B2 (en) * 2019-04-03 2021-07-13 Sap Se Optimization of relocated queries in federated databases using cross database table replicas
US11068475B2 (en) 2016-06-19 2021-07-20 Data.World, Inc. Computerized tools to develop and manage data-driven projects collaboratively via a networked computing platform and collaborative datasets
US11068453B2 (en) 2017-03-09 2021-07-20 data.world, Inc Determining a degree of similarity of a subset of tabular data arrangements to subsets of graph data arrangements at ingestion into a data-driven collaborative dataset platform
US11068847B2 (en) 2016-06-19 2021-07-20 Data.World, Inc. Computerized tools to facilitate data project development via data access layering logic in a networked computing platform including collaborative datasets
US11086896B2 (en) 2016-06-19 2021-08-10 Data.World, Inc. Dynamic composite data dictionary to facilitate data operations via computerized tools configured to access collaborative datasets in a networked computing platform
US11106734B1 (en) 2016-09-26 2021-08-31 Splunk Inc. Query execution using containerized state-free search nodes in a containerized scalable environment
US11126632B2 (en) 2016-09-26 2021-09-21 Splunk Inc. Subquery generation based on search configuration data from an external data system
US11151137B2 (en) 2017-09-25 2021-10-19 Splunk Inc. Multi-partition operation in combination operations
US11163758B2 (en) 2016-09-26 2021-11-02 Splunk Inc. External dataset capability compensation
US11188532B2 (en) * 2019-11-19 2021-11-30 Vmware, Inc. Successive database record filtering on disparate database types
USD940169S1 (en) 2018-05-22 2022-01-04 Data.World, Inc. Display screen or portion thereof with a graphical user interface
USD940732S1 (en) 2018-05-22 2022-01-11 Data.World, Inc. Display screen or portion thereof with a graphical user interface
US11222066B1 (en) 2016-09-26 2022-01-11 Splunk Inc. Processing data using containerized state-free indexing nodes in a containerized scalable environment
US20220012238A1 (en) * 2020-07-07 2022-01-13 AtScale, Inc. Datacube access connectors
US11232100B2 (en) 2016-09-26 2022-01-25 Splunk Inc. Resource allocation for multiple datasets
US11238109B2 (en) 2017-03-09 2022-02-01 Data.World, Inc. Computerized tools configured to determine subsets of graph data arrangements for linking relevant data to enrich datasets associated with a data-driven collaborative dataset platform
US11243963B2 (en) 2016-09-26 2022-02-08 Splunk Inc. Distributing partial results to worker nodes from an external data system
US11243960B2 (en) 2018-03-20 2022-02-08 Data.World, Inc. Content addressable caching and federation in linked data projects in a data-driven collaborative dataset platform using disparate database architectures
US11250056B1 (en) 2016-09-26 2022-02-15 Splunk Inc. Updating a location marker of an ingestion buffer based on storing buckets in a shared storage system
US11269939B1 (en) 2016-09-26 2022-03-08 Splunk Inc. Iterative message-based data processing including streaming analytics
US11275760B2 (en) * 2014-10-28 2022-03-15 Microsoft Technology Licensing, Llc Online schema and data transformations
US11281706B2 (en) 2016-09-26 2022-03-22 Splunk Inc. Multi-layer partition allocation for query execution
US11294941B1 (en) 2016-09-26 2022-04-05 Splunk Inc. Message-based data ingestion to a data intake and query system
US11314753B2 (en) 2016-09-26 2022-04-26 Splunk Inc. Execution of a query received from a data intake and query system
US11321321B2 (en) 2016-09-26 2022-05-03 Splunk Inc. Record expansion and reduction based on a processing task in a data intake and query system
US11327991B2 (en) 2018-05-22 2022-05-10 Data.World, Inc. Auxiliary query commands to deploy predictive data models for queries in a networked computing platform
US11334543B1 (en) 2018-04-30 2022-05-17 Splunk Inc. Scalable bucket merging for a data intake and query system
US11334625B2 (en) 2016-06-19 2022-05-17 Data.World, Inc. Loading collaborative datasets into data stores for queries via distributed computer networks
US11354312B2 (en) 2019-08-29 2022-06-07 International Business Machines Corporation Access-plan-based querying for federated database-management systems
US11416528B2 (en) 2016-09-26 2022-08-16 Splunk Inc. Query acceleration data store
US11429610B2 (en) * 2020-04-01 2022-08-30 Sap Se Scaled-out query execution engine
US11442935B2 (en) 2016-09-26 2022-09-13 Splunk Inc. Determining a record generation estimate of a processing task
US11442988B2 (en) 2018-06-07 2022-09-13 Data.World, Inc. Method and system for editing and maintaining a graph schema
US11461334B2 (en) 2016-09-26 2022-10-04 Splunk Inc. Data conditioning for dataset destination
US11468049B2 (en) 2016-06-19 2022-10-11 Data.World, Inc. Data ingestion to generate layered dataset interrelations to form a system of networked collaborative datasets
US11494380B2 (en) 2019-10-18 2022-11-08 Splunk Inc. Management of distributed computing framework components in a data fabric service system
US11537990B2 (en) 2018-05-22 2022-12-27 Data.World, Inc. Computerized tools to collaboratively generate queries to access in-situ predictive data models in a networked computing platform
US11550847B1 (en) 2016-09-26 2023-01-10 Splunk Inc. Hashing bucket identifiers to identify search nodes for efficient query execution
US11562023B1 (en) 2016-09-26 2023-01-24 Splunk Inc. Merging buckets in a data intake and query system
US11567993B1 (en) 2016-09-26 2023-01-31 Splunk Inc. Copying buckets from a remote shared storage system to memory associated with a search node for query execution
US11580107B2 (en) 2016-09-26 2023-02-14 Splunk Inc. Bucket data distribution for exporting data to worker nodes
US11586627B2 (en) 2016-09-26 2023-02-21 Splunk Inc. Partitioning and reducing records at ingest of a worker node
US11586692B2 (en) 2016-09-26 2023-02-21 Splunk Inc. Streaming data processing
US11593377B2 (en) 2016-09-26 2023-02-28 Splunk Inc. Assigning processing tasks in a data intake and query system
US11599541B2 (en) 2016-09-26 2023-03-07 Splunk Inc. Determining records generated by a processing task of a query
US11604795B2 (en) 2016-09-26 2023-03-14 Splunk Inc. Distributing partial results from an external data system between worker nodes
US11615104B2 (en) 2016-09-26 2023-03-28 Splunk Inc. Subquery generation based on a data ingest estimate of an external data system
US11615087B2 (en) 2019-04-29 2023-03-28 Splunk Inc. Search time estimate in a data intake and query system
US11620336B1 (en) 2016-09-26 2023-04-04 Splunk Inc. Managing and storing buckets to a remote shared storage system based on a collective bucket size
US20230127572A1 (en) * 2016-06-19 2023-04-27 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US11663227B2 (en) 2016-09-26 2023-05-30 Splunk Inc. Generating a subquery for a distinct data intake and query system
US11675808B2 (en) 2016-06-19 2023-06-13 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US11704313B1 (en) 2020-10-19 2023-07-18 Splunk Inc. Parallel branch operation using intermediary nodes
US11715051B1 (en) 2019-04-30 2023-08-01 Splunk Inc. Service provider instance recommendations using machine-learned classifications and reconciliation
US11755602B2 (en) 2016-06-19 2023-09-12 Data.World, Inc. Correlating parallelized data from disparate data sources to aggregate graph data portions to predictively identify entity data
US11860940B1 (en) 2016-09-26 2024-01-02 Splunk Inc. Identifying buckets for query execution using a catalog of buckets
US11874691B1 (en) 2016-09-26 2024-01-16 Splunk Inc. Managing efficient query execution including mapping of buckets to search nodes
US11922222B1 (en) 2020-01-30 2024-03-05 Splunk Inc. Generating a modified component for a data intake and query system using an isolated execution environment image
US11921672B2 (en) 2017-07-31 2024-03-05 Splunk Inc. Query execution at a remote heterogeneous data store of a data fabric service
US11947554B2 (en) 2016-06-19 2024-04-02 Data.World, Inc. Loading collaborative datasets into data stores for queries via distributed computer networks
US11947600B2 (en) 2021-11-30 2024-04-02 Data.World, Inc. Content addressable caching and federation in linked data projects in a data-driven collaborative dataset platform using disparate database architectures
US11947529B2 (en) 2018-05-22 2024-04-02 Data.World, Inc. Generating and analyzing a data model to identify relevant data catalog data derived from graph-based data arrangements to perform an action

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101608495B1 (en) * 2009-12-11 2016-04-01 Samsung Electronics Co., Ltd. Apparatus and Method for processing data stream
US9665620B2 (en) 2010-01-15 2017-05-30 Ab Initio Technology Llc Managing data queries
US9116955B2 (en) 2011-05-02 2015-08-25 Ab Initio Technology Llc Managing data queries
US9317554B2 (en) * 2012-09-26 2016-04-19 Microsoft Technology Licensing, Llc SQL generation for assert, update and delete relational trees
JP6454706B2 (en) 2013-12-06 2019-01-16 Ab Initio Technology LLC Source code conversion
US9262476B2 (en) 2014-01-10 2016-02-16 Red Hat, Inc. System and method for batch query processing
US10114874B2 (en) 2014-02-24 2018-10-30 Red Hat, Inc. Source query caching as fault prevention for federated queries
US20160125098A1 (en) * 2014-10-30 2016-05-05 Lenovo (Singapore) Pte. Ltd. Aggregate service with search capabilities
US10437819B2 (en) 2014-11-14 2019-10-08 Ab Initio Technology Llc Processing queries containing a union-type operation
US9934275B2 (en) 2015-01-12 2018-04-03 Red Hat, Inc. Query union and split
US10417281B2 (en) 2015-02-18 2019-09-17 Ab Initio Technology Llc Querying a data source on a network
WO2016175858A1 (en) 2015-04-30 2016-11-03 Hewlett Packard Enterprise Development Lp Dynamic function invocation
US10432716B2 (en) 2016-02-29 2019-10-01 Bank Of America Corporation Metadata synchronization system
CN110008244A (en) * 2019-03-29 2019-07-12 National Computer Network and Information Security Management Center Data query method and data query device
US11093223B2 (en) 2019-07-18 2021-08-17 Ab Initio Technology Llc Automatically converting a program written in a procedural programming language into a dataflow graph and related systems and methods
US11727022B2 (en) 2021-03-19 2023-08-15 International Business Machines Corporation Generating a global delta in distributed databases
US11704327B2 (en) 2021-03-19 2023-07-18 International Business Machines Corporation Querying distributed databases

Citations (4)

Publication number Priority date Publication date Assignee Title
US20020035559A1 (en) * 2000-06-26 2002-03-21 Crowe William L. System and method for a decision engine and architecture for providing high-performance data querying operations
US20020124116A1 (en) * 2000-12-26 2002-09-05 Yaung Alan T. Messaging service in a federated content management system
US20020143755A1 (en) * 2000-11-28 2002-10-03 Siemens Technology-To-Business Center, Llc System and methods for highly distributed wide-area data management of a network of data sources through a database interface
US20090177647A1 (en) * 2008-01-07 2009-07-09 Knowledge Computing Corporation Method and Apparatus for Conducting Data Queries Using Consolidation Strings and Inter-Node Consolidation

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
US7113939B2 (en) 1999-09-21 2006-09-26 International Business Machines Corporation Architecture to enable search gateways as part of federated search
CA2397905A1 (en) 2002-08-13 2004-02-13 University Of Ottawa Differentiated transport services for enabling real-time distributed interactive virtual systems
US7243093B2 (en) 2002-11-27 2007-07-10 International Business Machines Corporation Federated query management
US7287048B2 (en) 2004-01-07 2007-10-23 International Business Machines Corporation Transparent archiving
US7315872B2 (en) 2004-08-31 2008-01-01 International Business Machines Corporation Dynamic and selective data source binding through a metawrapper
US7383247B2 (en) 2005-08-29 2008-06-03 International Business Machines Corporation Query routing of federated information systems for fast response time, load balance, availability, and reliability
US20070067274A1 (en) 2005-09-16 2007-03-22 International Business Machines Corporation Hybrid push-down/pull-up of unions with expensive operations in a federated query processor
US20070162425A1 (en) 2006-01-06 2007-07-12 International Business Machines Corporation System and method for performing advanced cost/benefit analysis of asynchronous operations
US20070203893A1 (en) 2006-02-27 2007-08-30 Business Objects, S.A. Apparatus and method for federated querying of unstructured data
US7877381B2 (en) 2006-03-24 2011-01-25 International Business Machines Corporation Progressive refinement of a federated query plan during query execution

Cited By (204)

Publication number Priority date Publication date Assignee Title
US20120284252A1 (en) * 2009-10-02 2012-11-08 David Drai System and Method For Search Engine Optimization
US10346483B2 (en) * 2009-10-02 2019-07-09 Akamai Technologies, Inc. System and method for search engine optimization
US20110106789A1 (en) * 2009-10-30 2011-05-05 International Business Machines Corporation Database system and method of optimizing cross database query
US8768915B2 (en) * 2009-10-30 2014-07-01 International Business Machines Corporation Database system and method of optimizing cross database query
EP2343658A1 (en) 2009-12-18 2011-07-13 Siemens IT Solutions and Services GmbH Federation as a process
US8645377B2 (en) * 2010-01-15 2014-02-04 Microsoft Corporation Aggregating data from a work queue
US20110179028A1 (en) * 2010-01-15 2011-07-21 Microsoft Corporation Aggregating data from a work queue
US8392399B2 (en) * 2010-09-16 2013-03-05 Microsoft Corporation Query processing algorithm for vertically partitioned federated database systems
US20120072414A1 (en) * 2010-09-16 2012-03-22 Microsoft Corporation Query processing algorithm for vertically partitioned federated database systems
US11409802B2 (en) 2010-10-22 2022-08-09 Data.World, Inc. System for accessing a relational database using semantic queries
US10860653B2 (en) 2010-10-22 2020-12-08 Data.World, Inc. System for accessing a relational database using semantic queries
US20160292215A1 (en) * 2010-10-28 2016-10-06 Microsoft Technology Licensing, Llc Partitioning online databases
US8583408B2 (en) 2011-03-17 2013-11-12 Bank Of America Corporation Standardized modeling suite
WO2012129149A3 (en) * 2011-03-23 2014-04-10 International Business Machines Corporation Aggregating search results based on associating data instances with knowledge base entities
US9959326B2 (en) 2011-03-23 2018-05-01 International Business Machines Corporation Annotating schema elements based on associating data instances with knowledge base entities
WO2012129149A2 (en) * 2011-03-23 2012-09-27 International Business Machines Corporation Aggregating search results based on associating data instances with knowledge base entities
US8694525B2 (en) 2011-06-24 2014-04-08 Sas Institute Inc. Systems and methods for performing index joins using auto generative queries
US10366115B2 (en) 2011-11-02 2019-07-30 Microsoft Technology Licensing, Llc Routing query results
US9558274B2 (en) * 2011-11-02 2017-01-31 Microsoft Technology Licensing, Llc Routing query results
US10409897B2 (en) 2011-11-02 2019-09-10 Microsoft Technology Licensing, Llc Inheritance of rules across hierarchical level
US9177022B2 (en) * 2011-11-02 2015-11-03 Microsoft Technology Licensing, Llc User pipeline configuration for rule-based query transformation, generation and result display
US9189563B2 (en) 2011-11-02 2015-11-17 Microsoft Technology Licensing, Llc Inheritance of rules across hierarchical levels
US20130110813A1 (en) * 2011-11-02 2013-05-02 Microsoft Corporation Routing Query Results
US9792264B2 (en) 2011-11-02 2017-10-17 Microsoft Technology Licensing, Llc Inheritance of rules across hierarchical levels
US20130110860A1 (en) * 2011-11-02 2013-05-02 Microsoft Corporation User pipeline configuration for rule-based query transformation, generation and result display
US20130138681A1 (en) * 2011-11-28 2013-05-30 Computer Associates Think, Inc. Method and system for metadata driven processing of federated data
US9460173B2 (en) * 2011-11-28 2016-10-04 Ca, Inc. Method and system for metadata driven processing of federated data
US20130311447A1 (en) * 2012-05-15 2013-11-21 Microsoft Corporation Scenario based insights into structure data
US10853361B2 (en) 2012-05-15 2020-12-01 Microsoft Technology Licensing, Llc Scenario based insights into structure data
US11550761B2 (en) 2012-07-12 2023-01-10 Open Text Sa Ulc Systems and methods for in-place records management and content lifecycle management
US10754828B2 (en) 2012-07-12 2020-08-25 Open Text Sa Ulc Systems and methods for in-place records management and content lifecycle management
US9798737B2 (en) * 2012-07-12 2017-10-24 Open Text Sa Ulc Systems and methods for in-place records management and content lifecycle management
US20140019426A1 (en) * 2012-07-12 2014-01-16 Open Text S.A. Systems and methods for in-place records management and content lifecycle management
US10528262B1 (en) * 2012-07-26 2020-01-07 EMC IP Holding Company LLC Replication-based federation of scalable data across multiple sites
US10860663B2 (en) 2012-08-24 2020-12-08 Microsoft Technology Licensing, Llc Online learning of click-through rates on federated search results
US9922120B2 (en) * 2012-08-24 2018-03-20 Microsoft Technology Licensing, Llc Online learning of click-through rates on federated search results
US20140059042A1 (en) * 2012-08-24 2014-02-27 Microsoft Corporation Online Learning of Click-through Rates on Federated Search Results
CN102982147A (en) * 2012-11-26 2013-03-20 深圳市华为技术软件有限公司 Method and device for increasing integration efficiency of data information
GB2516501A (en) * 2013-07-25 2015-01-28 Ibm Method and system for processing data in a parallel database environment
US9953067B2 (en) 2013-07-25 2018-04-24 International Business Machines Corporation Method and system for processing data in a parallel database environment
US10936606B2 (en) 2013-07-25 2021-03-02 International Business Machines Corporation Method and system for processing data in a parallel database environment
US20160063271A1 (en) * 2014-08-26 2016-03-03 International Business Machines Corporation Access control for unprotected data storage system endpoints
US20190057107A1 (en) * 2014-08-26 2019-02-21 International Business Machines Corporation Access control for unprotected data storage system endpoints
US10108628B2 (en) * 2014-08-26 2018-10-23 International Business Machines Corporation Access control for unprotected data storage system endpoints
US20160063017A1 (en) * 2014-08-26 2016-03-03 International Business Machines Corporation Access control for unprotected data storage system endpoints
US9690792B2 (en) * 2014-08-26 2017-06-27 International Business Machines Corporation Access control for unprotected data storage system endpoints
US10838916B2 (en) * 2014-08-26 2020-11-17 International Business Machines Corporation Access control for unprotected data storage system endpoints
US11275760B2 (en) * 2014-10-28 2022-03-15 Microsoft Technology Licensing, Llc Online schema and data transformations
US20180004817A1 (en) * 2014-11-21 2018-01-04 Red Hat, Inc. Federation optimization using ordered queues
US11709849B2 (en) 2014-11-21 2023-07-25 Red Hat, Inc. Federation optimization using ordered queues
US20160147888A1 (en) * 2014-11-21 2016-05-26 Red Hat, Inc. Federation optimization using ordered queues
US9767168B2 (en) * 2014-11-21 2017-09-19 Red Hat, Inc. Federation optimization using ordered queues
US11093633B2 (en) * 2016-06-19 2021-08-17 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US20230009198A1 (en) * 2016-06-19 2023-01-12 Data.World, Inc. Query generation for collaborative datasets
US11947554B2 (en) 2016-06-19 2024-04-02 Data.World, Inc. Loading collaborative datasets into data stores for queries via distributed computer networks
US10645548B2 (en) 2016-06-19 2020-05-05 Data.World, Inc. Computerized tool implementation of layered data files to discover, form, or analyze dataset interrelations of networked collaborative datasets
US11941140B2 (en) * 2016-06-19 2024-03-26 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US10691710B2 (en) 2016-06-19 2020-06-23 Data.World, Inc. Interactive interfaces as computerized tools to present summarization data of dataset attributes for collaborative datasets
US10699027B2 (en) 2016-06-19 2020-06-30 Data.World, Inc. Loading collaborative datasets into data stores for queries via distributed computer networks
US11928596B2 (en) * 2016-06-19 2024-03-12 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US10747774B2 (en) 2016-06-19 2020-08-18 Data.World, Inc. Interactive interfaces to present data arrangement overviews and summarized dataset attributes for collaborative datasets
US10452975B2 (en) 2016-06-19 2019-10-22 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US11816118B2 (en) 2016-06-19 2023-11-14 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US11755602B2 (en) 2016-06-19 2023-09-12 Data.World, Inc. Correlating parallelized data from disparate data sources to aggregate graph data portions to predictively identify entity data
US11734564B2 (en) * 2016-06-19 2023-08-22 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US10452677B2 (en) 2016-06-19 2019-10-22 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US10853376B2 (en) 2016-06-19 2020-12-01 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US10438013B2 (en) 2016-06-19 2019-10-08 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US10353911B2 (en) 2016-06-19 2019-07-16 Data.World, Inc. Computerized tools to discover, form, and analyze dataset interrelations among a system of networked collaborative datasets
US10860601B2 (en) 2016-06-19 2020-12-08 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US10860613B2 (en) 2016-06-19 2020-12-08 Data.World, Inc. Management of collaborative datasets via distributed computer networks
US10346429B2 (en) 2016-06-19 2019-07-09 Data.World, Inc. Management of collaborative datasets via distributed computer networks
US10860600B2 (en) 2016-06-19 2020-12-08 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US11726992B2 (en) * 2016-06-19 2023-08-15 Data.World, Inc. Query generation for collaborative datasets
WO2017222927A1 (en) * 2016-06-19 2017-12-28 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US10324925B2 (en) * 2016-06-19 2019-06-18 Data.World, Inc. Query generation for collaborative datasets
US10963486B2 (en) 2016-06-19 2021-03-30 Data.World, Inc. Management of collaborative datasets via distributed computer networks
US11675808B2 (en) 2016-06-19 2023-06-13 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US20230127572A1 (en) * 2016-06-19 2023-04-27 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US10984008B2 (en) 2016-06-19 2021-04-20 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US11609680B2 (en) 2016-06-19 2023-03-21 Data.World, Inc. Interactive interfaces as computerized tools to present summarization data of dataset attributes for collaborative datasets
US11210307B2 (en) 2016-06-19 2021-12-28 Data.World, Inc. Consolidator platform to implement collaborative datasets via distributed computer networks
US11016931B2 (en) 2016-06-19 2021-05-25 Data.World, Inc. Data ingestion to generate layered dataset interrelations to form a system of networked collaborative datasets
US20220366252A1 (en) * 2016-06-19 2022-11-17 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US10515085B2 (en) 2016-06-19 2019-12-24 Data.World, Inc. Consolidator platform to implement collaborative datasets via distributed computer networks
US11246018B2 (en) 2016-06-19 2022-02-08 Data.World, Inc. Computerized tool implementation of layered data files to discover, form, or analyze dataset interrelations of networked collaborative datasets
US10102258B2 (en) 2016-06-19 2018-10-16 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US20220351038A1 (en) * 2016-06-19 2022-11-03 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US11023104B2 (en) 2016-06-19 2021-06-01 Data.World, Inc. Interactive interfaces as computerized tools to present summarization data of dataset attributes for collaborative datasets
US11036716B2 (en) 2016-06-19 2021-06-15 Data.World, Inc. Layered data generation and data remediation to facilitate formation of interrelated data in a system of networked collaborative datasets
US11036697B2 (en) 2016-06-19 2021-06-15 Data.World, Inc. Transmuting data associations among data arrangements to facilitate data operations in a system of networked collaborative datasets
US11042537B2 (en) 2016-06-19 2021-06-22 Data.World, Inc. Link-formative auxiliary queries applied at data ingestion to facilitate data operations in a system of networked collaborative datasets
US11042548B2 (en) 2016-06-19 2021-06-22 Data.World, Inc. Aggregation of ancillary data associated with source data in a system of networked collaborative datasets
US11042556B2 (en) 2016-06-19 2021-06-22 Data.World, Inc. Localized link formation to perform implicitly federated queries using extended computerized query language syntax
US11042560B2 (en) 2016-06-19 2021-06-22 Data.World, Inc. Extended computerized query language syntax for analyzing multiple tabular data arrangements in data-driven collaborative projects
US11468049B2 (en) 2016-06-19 2022-10-11 Data.World, Inc. Data ingestion to generate layered dataset interrelations to form a system of networked collaborative datasets
US11068475B2 (en) 2016-06-19 2021-07-20 Data.World, Inc. Computerized tools to develop and manage data-driven projects collaboratively via a networked computing platform and collaborative datasets
US11277720B2 (en) 2016-06-19 2022-03-15 Data.World, Inc. Computerized tool implementation of layered data files to discover, form, or analyze dataset interrelations of networked collaborative datasets
US11314734B2 (en) * 2016-06-19 2022-04-26 Data.World, Inc. Query generation for collaborative datasets
US11068847B2 (en) 2016-06-19 2021-07-20 Data.World, Inc. Computerized tools to facilitate data project development via data access layering logic in a networked computing platform including collaborative datasets
US11423039B2 (en) 2016-06-19 2022-08-23 Data.World, Inc. Collaborative dataset consolidation via distributed computer networks
US11086896B2 (en) 2016-06-19 2021-08-10 Data.World, Inc. Dynamic composite data dictionary to facilitate data operations via computerized tools configured to access collaborative datasets in a networked computing platform
US11327996B2 (en) 2016-06-19 2022-05-10 Data.World, Inc. Interactive interfaces to present data arrangement overviews and summarized dataset attributes for collaborative datasets
US11210313B2 (en) 2016-06-19 2021-12-28 Data.World, Inc. Computerized tools to discover, form, and analyze dataset interrelations among a system of networked collaborative datasets
US11386218B2 (en) * 2016-06-19 2022-07-12 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US11373094B2 (en) * 2016-06-19 2022-06-28 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US11366824B2 (en) 2016-06-19 2022-06-21 Data.World, Inc. Dataset analysis and dataset attribute inferencing to form collaborative datasets
US11163755B2 (en) 2016-06-19 2021-11-02 Data.World, Inc. Query generation for collaborative datasets
US11176151B2 (en) 2016-06-19 2021-11-16 Data.World, Inc. Consolidator platform to implement collaborative datasets via distributed computer networks
US11334625B2 (en) 2016-06-19 2022-05-17 Data.World, Inc. Loading collaborative datasets into data stores for queries via distributed computer networks
US11334793B2 (en) * 2016-06-19 2022-05-17 Data.World, Inc. Platform management of integrated access of public and privately-accessible datasets utilizing federated query generation and query schema rewriting optimization
US11194830B2 (en) 2016-06-19 2021-12-07 Data.World, Inc. Computerized tools to discover, form, and analyze dataset interrelations among a system of networked collaborative datasets
US10942942B2 (en) 2016-06-23 2021-03-09 Schneider Electric USA, Inc. Transactional-unstructured data driven sequential federated query method for distributed systems
WO2017223464A1 (en) * 2016-06-23 2017-12-28 Schneider Electric USA, Inc. Contextual-characteristic data driven sequential federated query methods for distributed systems
CN109643311A (en) * 2016-06-23 2019-04-16 Schneider Electric USA, Inc. The sequence conjunctive query method that transactional unstructured data for distributed system drives
US11442956B2 (en) 2016-06-23 2022-09-13 Schneider Electric USA, Inc. Transactional-unstructured data driven sequential federated query method for distributed systems
WO2017223468A1 (en) * 2016-06-23 2017-12-28 Schneider Electric USA, Inc. Transactional-unstructured data driven sequential federated query method for distributed systems
US11693865B2 (en) 2016-06-23 2023-07-04 Schneider Electric USA, Inc. Contextual-characteristic data driven sequential federated query methods for distributed systems
US11222032B2 (en) 2016-06-23 2022-01-11 Schneider Electric USA, Inc. Contextual-characteristic data driven sequential federated query methods for distributed systems
CN117235162A (en) * 2016-06-23 2023-12-15 Schneider Electric USA, Inc. Transactional unstructured data-driven sequential joint query method for distributed system
US11250056B1 (en) 2016-09-26 2022-02-15 Splunk Inc. Updating a location marker of an ingestion buffer based on storing buckets in a shared storage system
US10977260B2 (en) 2016-09-26 2021-04-13 Splunk Inc. Task distribution in an execution node of a distributed execution environment
US10726009B2 (en) 2016-09-26 2020-07-28 Splunk Inc. Query processing using query-resource usage and node utilization data
US11232100B2 (en) 2016-09-26 2022-01-25 Splunk Inc. Resource allocation for multiple datasets
US11243963B2 (en) 2016-09-26 2022-02-08 Splunk Inc. Distributing partial results to worker nodes from an external data system
US11874691B1 (en) 2016-09-26 2024-01-16 Splunk Inc. Managing efficient query execution including mapping of buckets to search nodes
US11860940B1 (en) 2016-09-26 2024-01-02 Splunk Inc. Identifying buckets for query execution using a catalog of buckets
US11269939B1 (en) 2016-09-26 2022-03-08 Splunk Inc. Iterative message-based data processing including streaming analytics
US11222066B1 (en) 2016-09-26 2022-01-11 Splunk Inc. Processing data using containerized state-free indexing nodes in a containerized scalable environment
US10776355B1 (en) 2016-09-26 2020-09-15 Splunk Inc. Managing, storing, and caching query results and partial query results for combination with additional query results
US11281706B2 (en) 2016-09-26 2022-03-22 Splunk Inc. Multi-layer partition allocation for query execution
US11294941B1 (en) 2016-09-26 2022-04-05 Splunk Inc. Message-based data ingestion to a data intake and query system
US10795884B2 (en) 2016-09-26 2020-10-06 Splunk Inc. Dynamic resource allocation for common storage query
US11314753B2 (en) 2016-09-26 2022-04-26 Splunk Inc. Execution of a query received from a data intake and query system
US11321321B2 (en) 2016-09-26 2022-05-03 Splunk Inc. Record expansion and reduction based on a processing task in a data intake and query system
US11797618B2 (en) 2016-09-26 2023-10-24 Splunk Inc. Data fabric service system deployment
US10956415B2 (en) 2016-09-26 2021-03-23 Splunk Inc. Generating a subquery for an external data system using a configuration file
US11238112B2 (en) 2016-09-26 2022-02-01 Splunk Inc. Search service system monitoring
US11663227B2 (en) 2016-09-26 2023-05-30 Splunk Inc. Generating a subquery for a distinct data intake and query system
US11176208B2 (en) 2016-09-26 2021-11-16 Splunk Inc. Search functionality of a data intake and query system
US11341131B2 (en) 2016-09-26 2022-05-24 Splunk Inc. Query scheduling based on a query-resource allocation and resource availability
US10984044B1 (en) 2016-09-26 2021-04-20 Splunk Inc. Identifying buckets for query execution using a catalog of buckets stored in a remote shared storage system
US11163758B2 (en) 2016-09-26 2021-11-02 Splunk Inc. External dataset capability compensation
US11636105B2 (en) 2016-09-26 2023-04-25 Splunk Inc. Generating a subquery for an external data system using a configuration file
US11126632B2 (en) 2016-09-26 2021-09-21 Splunk Inc. Subquery generation based on search configuration data from an external data system
US11392654B2 (en) 2016-09-26 2022-07-19 Splunk Inc. Data fabric service system
US11106734B1 (en) 2016-09-26 2021-08-31 Splunk Inc. Query execution using containerized state-free search nodes in a containerized scalable environment
US11416528B2 (en) 2016-09-26 2022-08-16 Splunk Inc. Query acceleration data store
US11080345B2 (en) 2016-09-26 2021-08-03 Splunk Inc. Search functionality of worker nodes in a data fabric service system
US11620336B1 (en) 2016-09-26 2023-04-04 Splunk Inc. Managing and storing buckets to a remote shared storage system based on a collective bucket size
US11615104B2 (en) 2016-09-26 2023-03-28 Splunk Inc. Subquery generation based on a data ingest estimate of an external data system
US11442935B2 (en) 2016-09-26 2022-09-13 Splunk Inc. Determining a record generation estimate of a processing task
US11003714B1 (en) 2016-09-26 2021-05-11 Splunk Inc. Search node and bucket identification using a search node catalog and a data store catalog
US11604795B2 (en) 2016-09-26 2023-03-14 Splunk Inc. Distributing partial results from an external data system between worker nodes
US11461334B2 (en) 2016-09-26 2022-10-04 Splunk Inc. Data conditioning for dataset destination
US11599541B2 (en) 2016-09-26 2023-03-07 Splunk Inc. Determining records generated by a processing task of a query
US11023463B2 (en) 2016-09-26 2021-06-01 Splunk Inc. Converting and modifying a subquery for an external data system
US11593377B2 (en) 2016-09-26 2023-02-28 Splunk Inc. Assigning processing tasks in a data intake and query system
US11586692B2 (en) 2016-09-26 2023-02-21 Splunk Inc. Streaming data processing
US11023539B2 (en) 2016-09-26 2021-06-01 Splunk Inc. Data intake and query system search functionality in a data fabric service system
US11586627B2 (en) 2016-09-26 2023-02-21 Splunk Inc. Partitioning and reducing records at ingest of a worker node
US11580107B2 (en) 2016-09-26 2023-02-14 Splunk Inc. Bucket data distribution for exporting data to worker nodes
US11550847B1 (en) 2016-09-26 2023-01-10 Splunk Inc. Hashing bucket identifiers to identify search nodes for efficient query execution
US11010435B2 (en) 2016-09-26 2021-05-18 Splunk Inc. Search service for a data fabric system
US11562023B1 (en) 2016-09-26 2023-01-24 Splunk Inc. Merging buckets in a data intake and query system
US11567993B1 (en) 2016-09-26 2023-01-31 Splunk Inc. Copying buckets from a remote shared storage system to memory associated with a search node for query execution
US11068453B2 (en) 2017-03-09 2021-07-20 Data.World, Inc. Determining a degree of similarity of a subset of tabular data arrangements to subsets of graph data arrangements at ingestion into a data-driven collaborative dataset platform
US10824637B2 (en) 2017-03-09 2020-11-03 Data.World, Inc. Matching subsets of tabular data arrangements to subsets of graphical data arrangements at ingestion into data driven collaborative datasets
US11669540B2 (en) 2017-03-09 2023-06-06 Data.World, Inc. Matching subsets of tabular data arrangements to subsets of graphical data arrangements at ingestion into data-driven collaborative datasets
US11238109B2 (en) 2017-03-09 2022-02-01 Data.World, Inc. Computerized tools configured to determine subsets of graph data arrangements for linking relevant data to enrich datasets associated with a data-driven collaborative dataset platform
US11921672B2 (en) 2017-07-31 2024-03-05 Splunk Inc. Query execution at a remote heterogeneous data store of a data fabric service
US11860874B2 (en) 2017-09-25 2024-01-02 Splunk Inc. Multi-partitioning data for combination operations
US10896182B2 (en) 2017-09-25 2021-01-19 Splunk Inc. Multi-partitioning determination for combination operations
US11151137B2 (en) 2017-09-25 2021-10-19 Splunk Inc. Multi-partition operation in combination operations
US11500875B2 (en) 2017-09-25 2022-11-15 Splunk Inc. Multi-partitioning for combination operations
US11573948B2 (en) 2018-03-20 2023-02-07 Data.World, Inc. Predictive determination of constraint data for application with linked data in graph-based datasets associated with a data-driven collaborative dataset platform
US10922308B2 (en) 2018-03-20 2021-02-16 Data.World, Inc. Predictive determination of constraint data for application with linked data in graph-based datasets associated with a data-driven collaborative dataset platform
US11243960B2 (en) 2018-03-20 2022-02-08 Data.World, Inc. Content addressable caching and federation in linked data projects in a data-driven collaborative dataset platform using disparate database architectures
US11334543B1 (en) 2018-04-30 2022-05-17 Splunk Inc. Scalable bucket merging for a data intake and query system
US11720537B2 (en) 2018-04-30 2023-08-08 Splunk Inc. Bucket merging for a data intake and query system using size thresholds
US11327991B2 (en) 2018-05-22 2022-05-10 Data.World, Inc. Auxiliary query commands to deploy predictive data models for queries in a networked computing platform
US11537990B2 (en) 2018-05-22 2022-12-27 Data.World, Inc. Computerized tools to collaboratively generate queries to access in-situ predictive data models in a networked computing platform
US11947529B2 (en) 2018-05-22 2024-04-02 Data.World, Inc. Generating and analyzing a data model to identify relevant data catalog data derived from graph-based data arrangements to perform an action
USD940732S1 (en) 2018-05-22 2022-01-11 Data.World, Inc. Display screen or portion thereof with a graphical user interface
USD940169S1 (en) 2018-05-22 2022-01-04 Data.World, Inc. Display screen or portion thereof with a graphical user interface
USD920353S1 (en) 2018-05-22 2021-05-25 Data.World, Inc. Display screen or portion thereof with graphical user interface
US11657089B2 (en) 2018-06-07 2023-05-23 Data.World, Inc. Method and system for editing and maintaining a graph schema
US11442988B2 (en) 2018-06-07 2022-09-13 Data.World, Inc. Method and system for editing and maintaining a graph schema
WO2020027867A1 (en) * 2018-07-31 2020-02-06 Splunk Inc. Generating a subquery for a distinct data intake and query system
US20200175010A1 (en) * 2018-11-29 2020-06-04 Sap Se Distributed queries on legacy systems and micro-services
US11061927B2 (en) * 2019-04-03 2021-07-13 Sap Se Optimization of relocated queries in federated databases using cross database table replicas
US11615087B2 (en) 2019-04-29 2023-03-28 Splunk Inc. Search time estimate in a data intake and query system
US11715051B1 (en) 2019-04-30 2023-08-01 Splunk Inc. Service provider instance recommendations using machine-learned classifications and reconciliation
WO2020227659A1 (en) * 2019-05-08 2020-11-12 Datameer, Inc. Recommendation model generation and use in a hybrid multi-cloud database environment
US11216461B2 (en) 2019-05-08 2022-01-04 Datameer, Inc. Query transformations in a hybrid multi-cloud database environment per target query performance
US11449506B2 (en) * 2019-05-08 2022-09-20 Datameer, Inc. Recommendation model generation and use in a hybrid multi-cloud database environment
US11354312B2 (en) 2019-08-29 2022-06-07 International Business Machines Corporation Access-plan-based querying for federated database-management systems
US11494380B2 (en) 2019-10-18 2022-11-08 Splunk Inc. Management of distributed computing framework components in a data fabric service system
US11188532B2 (en) * 2019-11-19 2021-11-30 Vmware, Inc. Successive database record filtering on disparate database types
US11922222B1 (en) 2020-01-30 2024-03-05 Splunk Inc. Generating a modified component for a data intake and query system using an isolated execution environment image
US11429610B2 (en) * 2020-04-01 2022-08-30 Sap Se Scaled-out query execution engine
US20220012238A1 (en) * 2020-07-07 2022-01-13 AtScale, Inc. Datacube access connectors
US11704313B1 (en) 2020-10-19 2023-07-18 Splunk Inc. Parallel branch operation using intermediary nodes
US11947600B2 (en) 2021-11-30 2024-04-02 Data.World, Inc. Content addressable caching and federation in linked data projects in a data-driven collaborative dataset platform using disparate database architectures

Also Published As

Publication number Publication date
US8538985B2 (en) 2013-09-17

Similar Documents

Publication Title
US8538985B2 (en) Efficient processing of queries in federated database systems
US8935232B2 (en) Query execution systems and methods
US8738568B2 (en) User-defined parallelization in transactional replication of in-memory database
US8782075B2 (en) Query handling in databases with replicated data
US9081837B2 (en) Scoped database connections
US8543596B1 (en) Assigning blocks of a file of a distributed file system to processing units of a parallel database management system
US7054852B1 (en) Performance of join operations in parallel database systems
US10885031B2 (en) Parallelizing SQL user defined transformation functions
US20130110873A1 (en) Method and system for data storage and management
US20070067274A1 (en) Hybrid push-down/pull-up of unions with expensive operations in a federated query processor
US20120239612A1 (en) User defined functions for data loading
MXPA06009355A (en) Ultra-shared-nothing parallel database
US9760604B2 (en) System and method for adaptive filtering of data requests
JP4483034B2 (en) Heterogeneous data source integrated access method
Borkar et al. Have your data and query it too: From key-value caching to big data management
Samwel et al. F1 query: Declarative querying at scale
US8131711B2 (en) System, method, and computer-readable medium for partial redistribution, partial duplication of rows of parallel join operation on skewed data
US8150865B2 (en) Techniques for coalescing subqueries
Hasan et al. Data transformation from SQL to NoSQL MongoDB based on R programming language
Lawrence et al. The OLAP-enabled grid: Model and query processing algorithms
US7236971B1 (en) Method and system for deriving data through interpolation in a database system
US7620615B1 (en) Joins of relations in an object relational database system
Furtado Hierarchical aggregation in networked data management
Kurunji et al. Optimizing aggregate query processing in cloud data warehouses
Li et al. Query-driven frequent Co-occurring term computation over relational data using MapReduce

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BETAWADKAR-NORWOOD, ANJALI;PIRAHESH, HAMID;SIMMEN, DAVID EVERETT;REEL/FRAME:020640/0009;SIGNING DATES FROM 20070310 TO 20080306

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20210917