US20090006309A1 - Cluster processing of an aggregated dataset - Google Patents

Cluster processing of an aggregated dataset

Info

Publication number
US20090006309A1
Authority
US
United States
Prior art keywords
data
node
analytic
master
slave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US12/023,267
Inventor
Herbert Dennis Hunt
John Randall West
Marshall Ashby Gibbs
Bradley Michael Griglione
Gregory David Neil Hudson
Andrea Basilico
Arvid C. Johnson
Cheryl G. Bergeon
Craig Joseph Chapa
Alberto Agostinelli
Jay Alan Yusko
Trevor Mason
Ting Liu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SymphonyIRI Group Inc
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/021,263 (external priority, patent US20090006156A1)
Application filed by Individual
Priority to US12/023,267 (patent US20090006309A1)
Publication of US20090006309A1
Priority to US13/028,022 (patent US9466063B2)
Assigned to SYMPHONYIRI GROUP, INC. (assignment of assignors interest; see document for details). Assignors: JOHNSON, ARVID C., HUDSON, GREGORY DAVID NEIL, GIBBS, MARSHALL ASHBY, JR., BERGEON, CHERYL G., AGOSTINELLI, ALBERTO, BASILICO, ANDREA, CHAPA, CRAIG JOSEPH, GRIGLIONE, BRADLEY MICHAEL, LIU, TING, MASON, TREVOR, YUSKO, JAY ALAN
Assigned to SYMPHONYIRI GROUP, INC. (change of name; see document for details). Assignor: INFORMATION RESOURCES, INC.
Assigned to INFORMATION RESOURCES, INC. (assignment of assignors interest; see document for details). Assignors: HUNT, HERBERT D., WEST, JOHN R.
Legal status: Abandoned (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination

Definitions

  • This invention relates to methods and systems for analyzing data, and more particularly to methods and systems for aggregating, projecting, and releasing data.
  • Unified information sets, and results drawn from such information sets, can be released to third parties according to so-called “releasability” rules.
  • These releasability rules might apply to any and all of the data from which the unified information sets are drawn, the dimensions (or points or ranges along the dimensions), the third party (or members or sub-organizations of the third party), and so on.
  • Projection methodologies are known in the art, and still other projection methodologies are subjects of the present invention. In any case, different projection methodologies produce outputs that have different statistical qualities. Analysts want to specify the statistical qualities of the outputs at query time. In practice, however, those statistical qualities are driven by the universe of data and the projection methodologies applied to it. Existing methods allow an analyst to choose a projection methodology and thereby affect the statistical qualities of the output. This does not satisfy the analyst's desire to dictate those qualities directly.
  • Information systems are a significant bottleneck for market analysis activities.
  • The architecture of information systems is often not designed to provide on-demand flexible access, integration at a very granular level, or many other critical capabilities necessary to support growth.
  • Information systems are counterproductive to growth.
  • Hundreds of market and consumer databases make it very difficult to manage or integrate data.
  • Different market views and product hierarchies proliferate among manufacturers and retailers.
  • Restatements of data hierarchies waste precious time and are very expensive. Navigating among views of data, such as from global views to regional, neighborhood, and store views, is virtually impossible because different hierarchies are used to store global, regional, neighborhood, and store-level data.
  • Analyses and insights often take weeks or months, or they are never produced. Insights are often sub-optimal because of silo-driven, narrowly defined, ad hoc analysis projects. Reflecting the ad hoc nature of these analytic projects are the analytic tools and infrastructure developed to support them.
  • Systems for market analysis, business intelligence, and the like often use rigid data cubes that may include hundreds of databases that are impossible to integrate. These systems may include hundreds of views, hierarchies, clusters, and so forth, and each is associated with its own rigid data cube. This can make it almost impossible to navigate from global uses, such as developing overall company strategy, down to specific program implementation or customer-driven uses.
  • These ad hoc analytic tools and infrastructure are fragmented and disconnected.
  • Systems and methods may involve using a platform as disclosed herein for applications described herein. The systems and methods involve receiving an aggregated dataset, wherein the aggregated dataset includes data from a panel data source, a fact data source, and a dimension data source that have been associated with a standard population database.
  • The process may also involve storing the aggregated data in a partition within a partitioned database, wherein the partition is associated with a data characteristic.
  • The process may also involve associating a master processing node with a plurality of slave nodes, wherein each of the plurality of slave nodes is associated with a partition of the partitioned database.
  • The process may also involve submitting an analytic query to the master processing node.
  • The process may also involve assigning analytic processing to at least one of the plurality of slave nodes by the master processing node, wherein the assignment is based at least in part on the association of the partition with the data characteristic.
  • The process may also involve reading the aggregated data from the partitioned database by the assigned slave node.
  • The process may also involve analyzing the aggregated data by the assigned slave node, wherein the analysis produces a result at each slave node.
  • The process may also involve combining the results from each of the plurality of slave nodes by the master processing node into a master result 4120, and reporting the master result to a user interface.
  • FIG. 1 illustrates an analytic platform for performing data analysis.
  • FIG. 2 depicts cluster processing of an aggregated dataset.
  • An analytic platform 100 may support and include such improved methods and systems.
  • The analytic platform 100 may include, in certain embodiments, a range of hardware systems, software modules, data storage facilities, application programming interfaces, human-readable interfaces, and methodologies. It may also include a range of applications, solutions, products, and methods that use various outputs of the analytic platform 100, as more particularly detailed herein. Other embodiments would be understood by one of ordinary skill in the art and are encompassed herein.
  • The analytic platform 100 includes methods and systems for providing various representations of data and metadata, methodologies for acting on data and metadata, an analytic engine, and a data management facility. The data management facility is capable of handling disaggregated data and performing aggregation, calculations, functions, and real-time or quasi-real-time projections.
  • The methods and systems enable much more rapid and flexible manipulation of data sets, so that certain calculations and projections can be done in a fraction of the time required by older-generation systems.
  • Data compression and aggregation may be performed in conjunction with a user query, so that the aggregated dataset is generated in the form best suited to the calculations and projections the query requires.
  • Data compression and aggregation may also be done prior to, in anticipation of, and/or following a query.
  • The analytic platform 100 (described in more detail below) may calculate projections and other solutions dynamically and create hierarchical data structures with custom dimensions that facilitate the analysis.
  • Such methods and systems may be used to process point-of-sale (POS) data, retail information, geography information, causal information, survey information, census data and other forms of data and forms of assessments of past performance (e.g. estimating the past sales of a certain product within a certain geographical region over a certain period of time) or projections of future results (e.g. estimating the future or expected sales of a certain product within a certain geographical region over a certain period of time).
  • Various estimates and projections can serve various purposes of an enterprise, such as purchasing, supply chain management, inventory handling, pricing decisions, promotion planning, marketing plans, financial reporting, and many others.
  • An analytic platform 100 may be used to analyze and process data in a disaggregated or aggregated format. This includes, without limitation, dimension data defining the dimensions along which various items are measured, and factual data about the facts that are measured with respect to those dimensions.
  • Factual data may come from a wide variety of sources and be of a wide range of types, such as:
    ◦ traditional periodic point-of-sale (POS) data;
    ◦ causal data (such as data about activities of an enterprise, such as in-store promotions, that are posited to cause changes in factual data);
    ◦ household panel data and frequent shopper program information;
    ◦ daily, weekly, or real-time POS data;
    ◦ store database data, store list files, stubs, dictionary data, and product lists;
    ◦ custom and traditional audit data.
    Further extensions into transaction-level data, RFID data, and data from non-retail industries may also be processed according to the methods and systems described herein.
  • A data loading facility 108 may be used to extract data from available data sources and load them to or within the analytic platform 100 for further storage, manipulation, structuring, fusion, analysis, retrieval, querying, and other uses.
  • The data loading facility 108 may have a plurality of responsibilities. These may include eliminating data for non-releasable items, providing correct venue group flags for a venue group, feeding a core information matrix with relevant information (such as, without limitation, statistical metrics), or the like.
  • The data loading facility 108 may also eliminate non-related items.
  • Available data sources may include a plurality of fact data sources 102 and a plurality of dimension data sources 104.
  • Fact data sources 102 may include, for example, facts about sales volume, dollar sales, distribution, price, POS data, loyalty card transaction files, sales audit files, retailer sales data, and many other fact data sources 102 containing facts about the sales of the enterprise, as well as causal facts, such as facts about activities of the enterprise, in-store promotion audits, electronic pricing and/or promotion files, feature ad coding files, or others that tend to influence or cause changes in sales or other events, such as facts about in-store promotions, advertising, incentive programs, and the like.
  • Other fact data sources may include custom shelf audit files, shipment data files, media data files, explanatory data (e.g., data regarding weather), attitudinal data, or usage data.
  • Dimension data sources 104 may include information relating to any dimensions along which an enterprise wishes to collect data, such as:
    ◦ dimensions relating to products sold (e.g. attribute data relating to the types of products that are sold, such as data about UPC codes, product hierarchies, categories, brands, sub-brands, SKUs and the like);
    ◦ venue data (e.g. store, chain, region, country, etc.);
    ◦ time data (e.g. day, week, quad-week, quarter, 12-week, etc.);
    ◦ geographic data (including breakdowns of stores by city, state, region, country or other geographic groupings);
    ◦ consumer or customer data (e.g. household, individual, demographics, household groupings, etc.);
    ◦ other dimension data sources 104.
  • Although embodiments disclosed herein relate primarily to the collection of sales and marketing-related facts and the handling of dimensions related to the sales and marketing activities of an enterprise, the methods and systems disclosed herein may be applied to facts and dimensions of other types. Examples include those related to manufacturing, financial, information technology, media, supply chain management, accounting, political, contracting, and many other activities.
  • The analytic platform 100 comprises a combination of data, technologies, methods, and delivery mechanisms brought together by an analytic engine.
  • The analytic platform 100 may provide a novel approach to managing and integrating market and enterprise information and to enabling predictive analytics.
  • The analytic platform 100 may leverage approaches to representing and storing the base data so that it can be consumed and delivered in real time, with flexibility and open integration. This representation of the data, combined with the analytic methods and techniques and a delivery infrastructure, may minimize processing time and cost and maximize performance and value for the end user.
  • This technique may be applied to problems that require any of the following:
    ◦ access to integrated views across multiple data sources;
    ◦ rapid and accurate handling of dynamic dimensionality requests, with appropriate aggregations and projections, against a large multi-dimensional data repository;
    ◦ highly personalized and flexible real-time reporting 190, analysis 192, and forecasting;
    ◦ seamless, on-the-fly integration with other enterprise applications 184 via web services 194. An example is receiving a request with specific dimensionality, applying appropriate calculation methods, and performing and delivering an outcome (e.g. dataset, coefficient, etc.).
  • The analytic platform 100 may provide innovative solutions to application partners, including on-demand pricing insights, emerging category insights, product launch management, loyalty insights, daily data out-of-stock insights, assortment planning, on-demand audit groups, neighborhood insights, shopper insights, health and wellness insights, consumer tracking and targeting, and the like.
  • A decision framework may give application partners new revenue and competitive advantages through brand building, product innovation, consumer-centric retail execution, consumer and shopper relationship management, and the like.
  • Predictive planning and optimization solutions, automated analytics and insight solutions, and on-demand business performance reporting may be drawn from a plurality of sources, such as InfoScan, total C-scan, daily data, panel data, retailer direct data, SAP, consumer segmentation, consumer demographics, FSP/loyalty data, data provided directly for customers, or the like.
  • The analytic platform 100 may have advantages over more traditional federation/consolidation approaches, requiring fewer updates in a smaller portion of the process.
  • The analytic platform 100 may give users greater insight and provide them with more innovative applications.
  • The analytic platform 100 may provide a unified reporting and solutions framework. This framework offers on-demand and scheduled reports in a user dashboard with summary views and graphical dial indicators, as well as flexible formatting options.
  • Benefits and products of the analytic platform 100 may include non-additive measures for custom product groupings, elimination of restatements to save significant time and effort, cross-category visibility to spot emerging trends, a total market picture for faster competitor analysis, granular data on demand to view detailed retail performance, attribute-driven analysis for market insights, and the like.
  • The analytic capabilities of the present invention may provide for on-demand projection, on-demand aggregation, multi-source master data management, and the like.
  • On-demand projection may be derived directly for all possible geographies, store and demographic attributes, per geography or category, with built-in dynamic releasability controls, and the like.
  • On-demand aggregation may provide both additive and non-additive measures, provide custom groups, provide cross-category or geography analytics, and the like.
  • Multi-source master data management may provide management of dimension member catalogue and hierarchy attributes, processing of raw fact data that may reduce harmonization work to attribute matching, product and store attributes stored relationally, with data that may be extended independently of fact data, and used to create additional dimensions, and the like.
  • The analytic platform 100 may provide flexibility while maintaining a structured user approach. Flexibility may be realized through multiple hierarchies applied to the same database, the ability to create new custom hierarchies and views, rapid addition of new measures and dimensions, and the like.
  • The user may be given a structured approach through publishing and subscribing reports to a broader user base, enabling multiple user classes with different privileges, providing security access, and the like.
  • The user may also benefit from increased performance and ease of use through leading-edge hardware and software and a web application for integrated analysis.
  • The data available within a fact data source 102 and a dimension data source 104 may be linked, such as through the use of a key.
  • Key-based fusion of fact data 102 and dimension data 104 may use a key, such as the Abilitec Key software product offered by Acxiom, to fuse multiple sources of data.
  • For example, a key can relate the loyalty card data available for a single customer (e.g., Grocery Store 1, Grocery Store 2, and Convenience Store 1 loyalty cards). The fact data from these multiple sources can then be used as a fused data source for analysis along desirable dimensions.
  • An analyst might, for instance, wish to view time-series trends in the dollar sales the customer allotted to each store within a given product category.
  • The data loading facility may comprise any of a wide range of data loading facilities known to those of ordinary skill in the art. These include or use suitable connectors, bridges, adaptors, extraction engines, transformation engines, loading engines, data filtering facilities, data cleansing facilities, data integration facilities, or the like.
  • POS data may be automatically transmitted to the facts database after the sales information has been collected at the store's POS terminals.
  • The same store may also provide causal information, such as how it promoted certain products, the store itself, or the like. This data may be stored in another database. This causal information may provide insight into recent sales activities, so it may be used in later sales assessments or forecasts.
  • A manufacturer may load product attribute data into yet another database, and this data may also be accessible for sales assessment or projection analysis. For example, such an analysis may need to determine which categories of products or which brands sold well.
  • The causal store information may be aggregated with the POS data and with the dimension data for the products referred to in the POS data. With this aggregation of information, an analysis can be performed on any of the related data.
  • Data that is obtained by the data loading facility 108 may be transferred to a plurality of facilities within the analytic platform 100, including the data mart 114.
  • The data loading facility 108 may contain one or more interfaces 182. Through these interfaces, the data loaded by the data loading facility 108 may interact with or be used by other facilities within the platform 100 or external to the platform.
  • Interfaces to the data loading facility 108 may include any facilities suitable for allowing various other entities to interact with the data loading facility 108, such as:
    ◦ human-readable user interfaces;
    ◦ application programming interfaces (APIs);
    ◦ registries or similar facilities suitable for providing interfaces to services in a services-oriented architecture;
    ◦ connectors, bridges, adaptors, bindings, protocols, and message brokers;
    ◦ extraction facilities, transformation facilities, loading facilities, and other data integration facilities.
  • The interfaces 182 may support interactions with the data loading facility 108 by applications 184, solutions 188, reporting facilities 190, analyses facilities 192, services 194, or other entities, external to or internal to an enterprise. In embodiments these interfaces are associated with interfaces 182 to the platform 100. In other embodiments, direct interfaces to the data loading facility 108 may exist, either for other components of the platform 100 or for external entities.
  • The data mart facility 114 may be used to store data loaded from the data loading facility 108 and to make that data available, in a convenient format, to various other entities in or external to the platform 100.
  • Facilities may be present to further store, manipulate, structure, subset, merge, join, or fuse data, or to perform a wide range of other data structuring and manipulation activities.
  • The data mart facility 114 may also allow storage, manipulation, and retrieval of metadata, and perform activities on metadata similar to those disclosed with respect to data.
  • The data mart facility 114 may allow storage of data and metadata about facts (including sales facts, causal facts, and the like) and dimension data, as well as other relevant data and metadata.
  • The data mart facility 114 may compress the data and/or create summaries in order to facilitate faster processing by other applications 184 within the platform 100 (e.g. the analytic server 134).
  • The data mart facility 114 may include various methods, components, modules, systems, sub-systems, features, or facilities associated with data and metadata.
  • The data mart facility 114 may contain one or more interfaces 182 (not shown on FIG. 1). Through these interfaces, the data loaded by the data mart facility 114 may interact with or be used by other facilities within the platform 100 or external to the platform.
  • Interfaces to the data mart facility 114 may include any facilities suitable for allowing various other entities to interact with the data mart facility 114, such as:
    ◦ human-readable user interfaces;
    ◦ application programming interfaces (APIs);
    ◦ registries or similar facilities suitable for providing interfaces to services in a services-oriented architecture;
    ◦ connectors, bridges, adaptors, bindings, protocols, and message brokers;
    ◦ extraction facilities, transformation facilities, loading facilities, and other data integration facilities.
  • These interfaces may comprise interfaces 182 to the platform 100 as a whole. They may also be interfaces associated directly with the data mart facility 114 itself, such as for access from other components of the platform 100 or for direct access by external entities.
  • The interfaces 182 may support interactions with the data mart facility 114 by applications 184, solutions 188, reporting facilities 190, analyses facilities 192, services 194 (each of which is described in greater detail herein), or other entities, external to or internal to an enterprise.
  • The security facility 118 may be any hardware or software implementation, process, procedure, or protocol that may be used to block, limit, filter, or alter access to the data mart facility 114 and/or any of the facilities within it. Such access may come from a human operator, a group of operators, an organization, a software program, a bot, a virus, or some other entity or program.
  • The security facility 118 may include a firewall, an anti-virus facility, a facility for managing permission to store, manipulate, and/or retrieve data or metadata, a conditional access facility, a logging facility, a tracking facility, a reporting facility, an asset management facility, an intrusion-detection facility, an intrusion-prevention facility, or another suitable security facility.
  • The analytic platform 100 may include an analytic engine 134.
  • The analytic engine 134 may be used to build and deploy analytic applications or solutions, or to undertake analytic methods, based upon a plurality of data sources and data types.
  • The analytic engine 134 may perform a wide range of calculations and data manipulation steps needed to apply models, such as mathematical and economic models, to sets of data, including fact data, dimension data, and metadata.
  • The analytic engine 134 may be associated with an interface 182, such as any of the interfaces described herein.
  • The analytic engine 134 may interact with a model storage facility 148. This may be any facility for generating models used in the analysis of sets of data, such as economic models, econometric models, forecasting models, decision support models, estimation models, projection models, and many others.
  • Output from the analytic engine 134 may be used to condition or refine models in the model storage 148. There may therefore be a feedback loop between the two, in which calculations in the analytic engine 134 refine models managed by the model storage facility 148.
  • A security facility 138 of the analytic engine 134 may be the same as or similar to the security facility 118 associated with the data mart facility 114, as described herein.
  • The security facility 138 associated with the analytic engine 134 may have features and rules that are specifically designed to operate within the analytic engine 134.
  • The analytic platform 100 may contain a master data management hub 150 (MDMH).
  • The MDMH 150 may serve as a central facility for handling dimension data used within the analytic platform 100, such as data about products, stores, venues, geographies, time periods, and the like. It may also handle various other dimensions relating to or associated with the data and metadata types used by other parts of the system, including:
    ◦ the data sources 102, 104;
    ◦ the data loading facility 108 and the data mart facility 114;
    ◦ the analytic engine 134 and the model storage facility 148;
    ◦ various applications 184, solutions 188, reporting facilities 190, analytic facilities 192, or services 194 that interact with the analytic platform 100.
  • The MDMH 150 may in embodiments include a security facility 152, an interface 158, a data loader 160, a data manipulation and structuring facility 162, and one or more staging tables 164.
  • The data loader 160 may be used to receive data. Data may enter the MDMH 150 from various sources. One such source is the data mart 114, after it completes its intended processing of the information and data it received, as described herein. Data may also enter the MDMH 150 through a user interface 158, such as an API, a human user interface, a web browser, or some other interface of any of the types disclosed herein or in the documents incorporated by reference herein.
  • The user interface 158 may be deployed on a client device, such as a PDA, personal computer, laptop computer, cellular phone, or some other client device capable of handling data.
  • The staging tables 164 may be included in the MDMH 150.
  • A matching facility 180 may be associated with the MDMH 150.
  • the matching facility 180 may receive an input data hierarchy within the MDMH 150 and analyze the characteristics of the hierarchy and select a set of attributes that are salient to a particular analytic interest (e.g., product selection by a type of consumer, product sales by a type of venue, and so forth).
  • the matching facility 180 may select primary attributes, match attributes, associate attributes, block attributes and prioritize the attributes.
  • the matching facility 180 may associate each attribute with a weight and define a set of probabilistic weights.
  • the probabilistic weights may be the probability of a match or a non-match, or thresholds of a match or non-match that is associated with an analytic purpose (e.g., product purchase).
  • the probabilistic weights may then be used in an algorithm that is run within a probabilistic matching engine (e.g., IBM QualityStage).
  • the output of the matching engine may provide information on, for example, other products which are appropriate to include in a data hierarchy, the untapped market (i.e. other venues) in which a product is probabilistically more likely to sell well, and so forth.
  • the matching facility 180 may be used to generate projections of what types of products, people, customers, retailers, stores, store departments, etc. are similar in nature and therefore may be appropriate to combine in a projection or an assessment.
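  • The text does not say how the probabilistic weights are computed. As an assumption only, the sketch below uses the textbook Fellegi-Sunter scheme, which is not claimed to be what IBM QualityStage or the matching facility 180 implements: each attribute has an m-probability (agreement given a true match) and a u-probability (agreement given a non-match); agreement adds log2(m/u) to a score, disagreement adds log2((1-m)/(1-u)), and the total is compared with an upper and a lower threshold. All names and probabilities are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AttributeWeight:
    """Hypothetical per-attribute match parameters (not from the source)."""
    m: float  # P(attribute agrees | records truly match)
    u: float  # P(attribute agrees | records do not match)

    def agree(self) -> float:
        return math.log2(self.m / self.u)

    def disagree(self) -> float:
        return math.log2((1 - self.m) / (1 - self.u))


def match_score(a: dict, b: dict, weights: dict) -> float:
    """Sum agreement or disagreement weights over the compared attributes."""
    score = 0.0
    for attr, w in weights.items():
        score += w.agree() if a.get(attr) == b.get(attr) else w.disagree()
    return score


def classify(score: float, upper: float, lower: float) -> str:
    """Two-threshold decision: match, non-match, or left for review."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "review"
```

  • Scores falling between the two thresholds are left for review rather than forced into a decision, which is one way to express thresholds of a match or non-match.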
  • the analytic platform 100 may include a projection facility 178 .
  • a projection facility 178 may be used to produce projections, whereby a partial data set (such as data from a subset of stores of a chain) is projected to a universe (such as all of the stores in a chain), by applying appropriate weights to the data in the partial data set.
  • projection methodologies, such as those disclosed herein, can be used to generate projection factors. For any given projection, there is typically a tradeoff among the various statistical quality measurements associated with that type of projection.
  • the projection facility 178 takes dimension information from the MDMH 150 or from another source and provides a set of projection weightings along the applicable dimensions, typically reflected in a matrix of projection weights, which can be applied at the data mart facility 114 to a partial data set in order to render a projected data set.
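  • A minimal sketch of applying projection weights to a partial data set, assuming a simple ratio-expansion scheme: stores are grouped into strata along one dimension, and each sampled store is weighted by (universe stores in its stratum) / (sampled stores in its stratum). The source does not specify how the projection facility 178 derives its weights; the function names and the stratification are hypothetical.

```python
from collections import defaultdict


def expansion_weights(sample_stores, universe_stores, stratum_of):
    """Weight each sampled store by universe count / sample count in its stratum."""
    universe_count = defaultdict(int)
    sample_count = defaultdict(int)
    for store in universe_stores:
        universe_count[stratum_of[store]] += 1
    for store in sample_stores:
        sample_count[stratum_of[store]] += 1
    return {store: universe_count[stratum_of[store]] / sample_count[stratum_of[store]]
            for store in sample_stores}


def project(partial_sales, weights):
    """Project a partial data set to the universe: sum of weight * observed value."""
    return sum(weights[store] * value for store, value in partial_sales.items())
```

  • For example, with a universe of four stores split evenly into two strata and one sampled store per stratum, each sampled store gets weight 2.0, so observed sales of 100 and 40 project to 280.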
  • the projection facility 178 may have an interface 182 of any of the types disclosed herein.
  • an interface 182 may be included in the analytic platform 100 .
  • data may be transferred to the MDMH 150 of the platform 100 using a user interface 182 .
  • the interface 182 may be a web browser operating over the Internet or within an intranet or other network; it may also be an analytic engine 134 , an application plug-in, or some other user interface that is capable of handling data.
  • the interface 182 may be human readable or may consist of one or more application programming interfaces, or it may include various connectors, adaptors, bridges, services, transformation facilities, extraction facilities, loading facilities, bindings, couplings, or other data integration facilities, including any such facilities described herein or in documents incorporated by reference herein.
  • the platform 100 may interact with a variety of applications 184 , solutions 188 , reporting facilities 190 , analytic facilities 192 and services 194 , such as web services, or with other platforms or systems of an enterprise or external to an enterprise.
  • Any such applications 184 , solutions 188 , reporting facilities 190 , analytic facilities 192 and services 194 may interact with the platform 100 in a variety of ways, such as providing input to the platform 100 (such as data, metadata, dimension information, models, projections, or the like), taking output from the platform 100 (such as data, metadata, projection information, information about similarities, analytic output, output from calculations, or the like), modifying the platform 100 (including in a feedback or iterative loop), being modified by the platform 100 (again optionally in a feedback or iterative loop), or the like.
  • one or more applications 184 or solutions 188 may interact with the platform 100 via an interface 182 .
  • Applications 184 and solutions 188 may include applications and solutions (consisting of a combination of hardware, software and methods, among other components) that relate to planning the sales and marketing activities of an enterprise, decision support applications, financial reporting applications, applications relating to strategic planning, enterprise dashboard applications, supply chain management applications, inventory management and ordering applications, manufacturing applications, customer relationship management applications, information technology applications, applications relating to purchasing, applications relating to pricing, promotion, positioning, placement and products, and a wide range of other applications and solutions.
  • applications 184 and solutions 188 may include analytic output that is organized around a topic area.
  • the organizing principle of an application 184 or a solution 188 may be a new product introduction. Manufacturers may release thousands of new products each year. It may be useful for an analytic platform 100 to be able to group analysis around the topic area, such as new products, and organize a bundle of analyses and workflows that are presented as an application 184 or solution 188 .
  • Applications 184 and solutions 188 may incorporate planning information, forecasting information, “what if?” scenario capability, and other analytic features.
  • Applications 184 and solutions 188 may be associated with web services 194 that enable users within a client's organization to access and work with the applications 184 and solutions 188 .
  • the analytic platform 100 may facilitate delivering information to external applications 184 . This may include providing data or analytic results to certain classes of applications 184 .
  • an application may include enterprise resource planning/backbone applications 184 such as SAP, including those applications 184 focused on Marketing, Sales & Operations Planning and Supply Chain Management.
  • an application may include business intelligence applications 184 , including those applications 184 that may apply data mining techniques.
  • an application may include customer relationship management applications 184 , including customer sales force applications 184 .
  • an application may include specialty applications 184 such as a price or SKU optimization application.
  • the analytic platform 100 may facilitate supply chain efficiency applications 184 .
  • an application may include supply chain models based on sales out (POS/FSP) rather than sales in (Shipments).
  • an application may include RFID based supply chain management.
  • an application may include a retailer co-op to enable partnership with a distributor who may manage collective stock and distribution services.
  • the analytic platform 100 may be applied to industries characterized by large multi-dimensional data structures. This may include industries such as telecommunications, elections and polling, and the like.
  • the analytic platform 100 may be applied to opportunities to vend large amounts of data through a portal with the possibility to deliver highly customized views for individual users with effectively controlled user accessibility rights. This may include collaborative groups such as insurance brokers, real estate agents, and the like.
  • the analytic platform 100 may be applied to applications 184 requiring self monitoring of critical coefficients and parameters. Such applications 184 may rely on constant updating of statistical models, such as financial models, with real-time flows of data and ongoing re-calibration and optimization. The analytic platform 100 may be applied to applications 184 that require breaking apart and recombining geographies and territories at will.
  • a data field may be dynamically altered to conform to a bit size or some other desired format.
  • a record of the dynamic alteration may be tracked by the analytic platform 100 and stored in a database that may be accessed by other facilities of the analytic platform 100 .
  • a data field may relate to sales data.
  • the sales data field may be dynamically altered to conform to a desired bit size of, for example, 6 bits. Once this alteration is made, a record may be stored indicating that each sales datum in the sales field is a datum of 6 bits.
  • an analytic query may consult the stored record indicating that the sales data were dynamically altered to a 6-bit format.
  • the analytic query may process and analyze the sales data by reading the sales field in 6 bit units. This process may remove the need for the sales data to be associated with a header and/or footer indicating how the sales data is to be read and processed. As a result, processing speed may be increased.
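  • The sketch below packs a field at a fixed bit width and reads it back using only a stored format record, so no per-value header or footer is needed. `FieldFormat`, `pack`, and `unpack` are hypothetical names, and a Python int stands in for the packed storage. Note that a 6-bit field can only hold values 0 to 63, so in practice such a width would suit small codes or scaled values rather than raw sales amounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldFormat:
    """Hypothetical record of a dynamic alteration: each datum is `bits` wide."""
    name: str
    bits: int


def pack(values, fmt):
    """Pack unsigned integers end to end into one bit string (a Python int)."""
    limit = 1 << fmt.bits
    packed = 0
    for i, v in enumerate(values):
        if not 0 <= v < limit:
            raise ValueError(f"{v} does not fit in {fmt.bits} bits")
        packed |= v << (i * fmt.bits)
    return packed


def unpack(packed, count, fmt):
    """Read `count` values back in fixed `bits`-sized units using only the format record."""
    mask = (1 << fmt.bits) - 1
    return [(packed >> (i * fmt.bits)) & mask for i in range(count)]
```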
  • the MDMH 150 may be associated with a partitioned database.
  • the MDMH 150 may be further associated with a master cluster node that is, in turn, associated with a plurality of slave cluster nodes.
  • Each partition of the partitioned database may be associated with a slave cluster node or a plurality of slave cluster nodes.
  • Each slave cluster node may be associated with a mirror slave cluster node.
  • the mirror slave cluster node may be used in the event that the slave cluster node it is assigned to mirror fails.
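  • A minimal sketch of how a master node might route work for a partition to its slave cluster node and fall back to that node's mirror on failure. The routing table, the `alive` flag, and the class names are assumptions for illustration; a real cluster would detect failure through heartbeats or timeouts rather than a flag.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node with a liveness flag."""
    name: str
    alive: bool = True


@dataclass
class PartitionRouter:
    """Hypothetical master-side table: partition -> (primary slave, mirror slave)."""
    routes: dict = field(default_factory=dict)

    def assign(self, partition, primary, mirror):
        self.routes[partition] = (primary, mirror)

    def node_for(self, partition):
        primary, mirror = self.routes[partition]
        if primary.alive:
            return primary
        if mirror.alive:
            return mirror  # fail over to the mirror holding the same partition
        raise RuntimeError(f"no live node for partition {partition!r}")
```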
  • data such as sales data, may enter the analytic platform 100 using a data loading facility 108 .
  • the sales data may be loaded with the causal fact extractor 110 and processed into a data mart 114 which may store the sales data within a partitioned database.
  • the sales data in the data mart may be processed by the MDMH 150 , and the MDMH 150 may be used to create a partitioned sales database.
  • the partitioned sales database may have two partitions, Partition One and Partition Two, each associated with one of the two stores, Store One and Store Two, for which sales data are available.
  • Partition One may be associated with Slave Cluster Node One.
  • Partition Two may be associated with Slave Cluster Node Two.
  • Each slave cluster node may, in turn, be associated with a slave cluster node mirror that is associated with the same database partition as the slave cluster node to which it is a mirror.
  • the MDMH 150 and the master cluster node may store and/or have access to stored data indicating the associations among the database partitions and the slave cluster nodes.
  • the master cluster node may command the Slave Cluster Node One (which is associated with the Store One sales data that is stored in Partition One) to process Store One's sales data.
  • This command from the master cluster node may be associated with information relating to dynamic alterations that have been performed on the stored data (e.g., the bit size of each stored datum) to enable the slave node to accurately read the sales data during analysis.
  • the analysis may take place on a plurality of slave cluster nodes, each of which is associated with a database partition or plurality of database partitions.
  • the partitioned database may be updated as new data become available.
  • the update may be made on the fly, at a set interval, or according to some other criteria.
  • the cluster-based processing may be associated with bitmap compression techniques, including word-aligned hybrid (WAH) code compression.
  • WAH compression may be used to increase cluster processing speed by using run-length encoding for long sequences of identical bits and encoding/decoding bitmaps in word size groupings in order to reduce their computational complexity.
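  • The sketch below follows the published word-aligned hybrid layout with 32-bit words: a literal word has its most significant bit clear and carries 31 bitmap bits; a fill word has its most significant bit set, the next bit gives the repeated bit value, and the low 30 bits count how many 31-bit groups the run covers. It shows encoding and decoding only; WAH implementations also perform AND/OR directly on the compressed words, which is where much of the speed-up comes from. Names are hypothetical, and this is not claimed to be the platform's implementation.

```python
GROUP_BITS = 31                 # payload bits carried by each 32-bit word
FILL_FLAG = 1 << 31             # MSB set -> fill word; clear -> literal word
FILL_VALUE = 1 << 30            # in a fill word: the repeated bit value
RUN_MASK = (1 << 30) - 1        # in a fill word: run length, counted in groups
ALL_ONES = (1 << GROUP_BITS) - 1


def wah_encode(bits):
    """Encode a sequence of 0/1 values as a list of 32-bit WAH words.
    The last group is zero-padded; the caller keeps the true bit length."""
    words = []
    for start in range(0, len(bits), GROUP_BITS):
        chunk = list(bits[start:start + GROUP_BITS])
        chunk += [0] * (GROUP_BITS - len(chunk))
        group = 0
        for b in chunk:
            group = (group << 1) | b
        if group not in (0, ALL_ONES):
            words.append(group)             # literal word: MSB is 0
            continue
        fill = FILL_VALUE if group else 0
        last = words[-1] if words else 0
        same_run = ((last & FILL_FLAG) and (last & FILL_VALUE) == fill
                    and (last & RUN_MASK) < RUN_MASK)
        if same_run:
            words[-1] = last + 1            # extend the existing run by one group
        else:
            words.append(FILL_FLAG | fill | 1)
    return words


def wah_decode(words, length):
    """Expand WAH words back into `length` bits."""
    bits = []
    for w in words:
        if w & FILL_FLAG:
            bit = 1 if w & FILL_VALUE else 0
            bits.extend([bit] * (GROUP_BITS * (w & RUN_MASK)))
        else:
            bits.extend((w >> (GROUP_BITS - 1 - i)) & 1 for i in range(GROUP_BITS))
    return bits[:length]
```

  • For example, 93 zero bits followed by one mixed group and 62 one bits encode to three words: a zero-fill covering 3 groups, one literal, and a one-fill covering 2 groups.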
  • failover clusters may be implemented for the purpose of improving the availability of services which a cluster provides. Failover clusters may operate using redundant nodes, which may be used to provide service when system components fail. Failover cluster implementations may manage the redundancy inherent in a cluster to minimize the impact of single points of failure.
  • load-balancing clusters may operate by having all workload come through one or more load-balancing front ends, which then distribute it to a collection of back end servers. Such a cluster of computers is sometimes referred to as a server farm.
  • high-performance clusters may be implemented to provide increased performance by splitting a computational task across many different nodes in the cluster. Such clusters commonly run custom programs which have been designed to exploit the parallelism available on high-performance clusters. High-performance clusters are optimized for workloads which require jobs or processes happening on the separate cluster computer nodes to communicate actively during the computation. These include computations where intermediate results from one node's calculations will affect future calculations on other nodes.
  • MPI (Message Passing Interface) is an application programming interface (API) for passing messages between processes.
  • MPI has defined semantics and flexible interpretations; it does not define the protocol by which these operations are to be performed in the sense of sockets for TCP/IP or other layer-4 and below models in the ISO/OSI Reference Model. It is consequently a layer-5+ type set of interfaces, although implementations can cover most layers of the reference model, with sockets+TCP/IP as a common transport used inside the implementation.
  • MPI's goals are high performance, scalability, and portability. It expresses parallelism explicitly rather than implicitly.
  • MPI is a de facto standard for communication among the processes modeling a parallel program on a distributed memory system. Often these programs are mapped to clusters, actual distributed memory supercomputers, and to other environments. However, the principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept used in one portion of that set of extensions.
  • the analytic server may use ODBC to connect to a data server.
  • An ODBC library may use socket communication through the socket library to communicate with the data server.
  • the data server may be cluster-based in order to distribute the data server processing.
  • a socket communication library may reside on the data server.
  • the data server may pass information to a SQL parser module.
  • GNU Flex and/or Bison may be used to generate a lexer and parser.
  • a master node and multiple slave nodes may be used in a cluster framework.
  • a master node may obtain the SQL code over ODBC sockets and forward it to a parser to interpret the SQL sequence.
  • MPI may be used to distribute the server request to slave nodes for processing.
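  • The source names GNU Flex and Bison for lexing and parsing SQL on the master node. Purely to illustrate what the lexing stage produces, the sketch below uses a hand-written regular-expression tokenizer for a small, hypothetical SQL subset; it is not Flex output, and a Bison grammar (or an equivalent parser) would then consume the token stream.

```python
import re

# Hypothetical token classes for a small SQL subset. Order matters: keywords
# are tried before identifiers, and MISMATCH catches anything unrecognized.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:SELECT|SUM|COUNT|FROM|WHERE|AND|GROUP|BY)\b"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r"'[^']*'"),
    ("OP", r"[=<>]=?|[(),*]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(sql):
    """Split a SQL string into (kind, text) tokens for a downstream parser."""
    tokens = []
    for match in TOKEN_RE.finditer(sql):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        tokens.append((kind, text.upper() if kind == "KEYWORD" else text))
    return tokens
```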
  • a bitvector implementation may be used.
  • fast retrieval may be facilitated, at least in part, by representing the data as compactly as possible.
  • This efficiency may enable the data to be kept in memory as an in-memory database.
  • data structures may be used that are small enough that they may be stored in memory.
  • multiple record types may be used to minimize the data size so that the data may be kept in memory within a hardware implementation, which may also reduce the expense of the system.
  • the cluster system may fit modestly sized hardware nodes with modest amounts of memory. This may keep the data near the CPU, so that file-based I/O is not needed; data held in regular system memory may be accessed directly by the CPU.
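  • One common bitvector layout, sketched here as an assumption, is a bitmap index: for each distinct value of a column, a bitvector has bit i set when row i holds that value, so a filter across several dimensions becomes a bitwise AND. Python's arbitrary-precision ints stand in for the in-memory bitvectors; the function names are hypothetical.

```python
def build_bitmaps(column):
    """Bitmap index: one int per distinct value; bit i is set if row i has that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def rows_where(bitmap):
    """Yield the row numbers whose bit is set."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1
```

  • For example, ANDing the "Store One" bitmap with the "cola" bitmap leaves only the rows where Store One sold cola, without scanning the underlying records.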
  • a distribution hash key may be used to divide the data among the nodes.
  • the data may be partitioned by one dimension.
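  • A minimal sketch of hash-based distribution, assuming the distribution key is the venue name. `zlib.crc32` is used because Python's built-in `hash()` of a string is salted per process and would not give every node the same answer; the function names are hypothetical.

```python
import zlib


def node_for(key: str, node_count: int) -> int:
    """Map a distribution key (e.g., a venue) to a node index with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows, key_field, node_count):
    """Split rows into per-node lists; all rows sharing a key land on one node."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key_field], node_count)].append(row)
    return nodes
```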
  • an analyst may want to analyze a set of retail store data to see which products are selling, taking into account the revenue of the stores in which they are sold. Store One may have $10 M in revenue, Store Two $20 M, and Store Three $30 M.
  • the analytic goal is to determine how well a brand of cola is selling relative to the size of the stores in which it is sold. To accomplish this, one may analyze the total potential size and determine how well a product is selling relative to the whole. However, this may be difficult, because one may have to look across multiple time periods in which the product may sell many times, yet count each store only once. A distinct sum or count operator may be expensive, especially over millions of records.
  • this data may be partitioned by “venue” so that a venue exists on only one of the processing nodes. If all of a venue's data is processed on a single node, there is a reduced risk of double-counting, as the data reside in only one location. On the other hand, if the data are distributed by venue and some other key, data for the same venue might be located in multiple places. By partitioning by venue and associating each venue with an independent node, the per-venue results may simply be added on the master node.
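  • The sketch below illustrates why venue partitioning avoids a global distinct operator, assuming the metric is the share of total store revenue held by stores that sold the product at least once (often called ACV-weighted distribution; the source only says "relative to the size of the store"). Each node deduplicates its own venues locally, and because no venue spans two nodes, the master can simply add the partial sums. Names are hypothetical.

```python
def node_partial(rows, product, revenue_by_venue):
    """Runs on one node: revenue of local venues that sold `product` at least once.
    A local set suffices because every row for a venue lives on this node."""
    selling = {r["venue"] for r in rows if r["product"] == product and r["units"] > 0}
    return sum(revenue_by_venue[v] for v in selling)


def weighted_distribution(partitions, product, revenue_by_venue):
    """Runs on the master: partial sums cover disjoint venues, so they simply add."""
    selling_revenue = sum(node_partial(rows, product, revenue_by_venue)
                          for rows in partitions)
    return selling_revenue / sum(revenue_by_venue.values())
```

  • Using the $10 M, $20 M, and $30 M stores above, if cola sold in Store One in two weeks and in Store Three in one week, the result is (10 + 30) / 60 ≈ 0.667; Store One's revenue is counted once even though it has two selling rows.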
  • partitioning may be done within each node by certain dimensions in order to more efficiently access those data according to which data dimensions clients have used in the past. For example, data may be partitioned by venue and time, so that on any given processing node it is relatively easy to access particular sets of information based on venue and time dimensions.
  • partitioning may be used as an implicit indexing method. This may make it simpler to locate the data needed for an analysis without building an explicit index.
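  • A minimal sketch of partitioning as an implicit index, assuming rows on a node are bucketed by (venue, period): a query constrained on those dimensions reads only the matching buckets, with no separate index structure. `NodeStore` is a hypothetical name.

```python
from collections import defaultdict


class NodeStore:
    """Hypothetical in-node layout: rows bucketed by (venue, period)."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, row):
        self.buckets[(row["venue"], row["period"])].append(row)

    def scan(self, venues, periods):
        """Yield rows only from the buckets the query constrains; nothing else is read."""
        for v in venues:
            for p in periods:
                # .get() avoids creating empty buckets for absent keys
                yield from self.buckets.get((v, p), [])
```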
  • cluster processing may be dynamically configurable to accommodate increases and/or reductions in the number of nodes that are used.
  • cluster processing may have failover processes that may re-enable a cluster by having a node take on the function of another node that has failed.
  • a threading model may be used for inter-process communication between the nodes and the master.
  • POSIX threads may be used in combination with MPI.
  • multiple threads may run within one logical process, with separate physical processes running on different machines.
  • a thread model may form the backbone of communication between processing elements.
  • An inbound SQL request may come into the master node and be intercepted by a thread that is using a socket.
  • the thread may forward the request to a master thread running in each slave process. That thread creates worker threads that perform the actual analysis; the workers, in turn, send their results to a listener thread on the master, which passes the information to a collator thread on the master.
  • a new series of threads may be created for each newly arriving query.
  • the listener threads may be designed to look for information from a specific slave source. If a query comes into the system, a new collator thread may be created, a new worker thread created in each slave node, and information sent from each slave node to a listener on the master that passes information to the collator thread created for that query. The collator thread may then pass information back through the socket to the ODBC client.
  • this system may be scalable. For every slave that is created, the system may create a new listener thread for that slave.
  • inter-server communication may be done through MPI.
  • Data server and client communication may be conducted using regular sockets.
  • Each server may have data (its partition of information), so that each server knows which information it is responsible for.
  • the collator may collate the partial results into a final result set.
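  • The thread model described above can be simulated in a single Python process, as a sketch under loose assumptions: `threading` threads stand in for POSIX threads, and `queue.Queue` objects stand in for the MPI messages between slave and master processes. The master keeps one long-lived listener thread per slave, each query gets its own transient collator thread, and each slave starts a worker thread per query. All class and method names are hypothetical.

```python
import queue
import threading


class SlaveNode:
    """Hypothetical slave process: owns one partition and answers queries."""

    def __init__(self, rows):
        self.rows = rows
        self.outbox = queue.Queue()  # stands in for MPI sends to the master

    def handle(self, query_id, venue):
        """Start a per-query worker thread that computes a partial result."""
        def work():
            partial = sum(r["units"] for r in self.rows if r["venue"] == venue)
            self.outbox.put((query_id, partial))
        threading.Thread(target=work, daemon=True).start()


class MasterNode:
    def __init__(self, slaves):
        self.slaves = slaves
        self.collator_inboxes = {}  # query_id -> queue read by that query's collator
        self.lock = threading.Lock()
        for slave in slaves:  # one long-lived listener thread per slave
            threading.Thread(target=self._listen, args=(slave,), daemon=True).start()

    def _listen(self, slave):
        """Forward each partial result from one slave to the right collator."""
        while True:
            query_id, partial = slave.outbox.get()
            with self.lock:
                inbox = self.collator_inboxes[query_id]
            inbox.put(partial)

    def query(self, query_id, venue):
        """Create a transient collator, fan the query out, and return the total."""
        inbox = queue.Queue()
        with self.lock:
            self.collator_inboxes[query_id] = inbox  # register before dispatching
        result = {}

        def collate():
            result["total"] = sum(inbox.get() for _ in self.slaves)

        collator = threading.Thread(target=collate)
        collator.start()
        for slave in self.slaves:
            slave.handle(query_id, venue)
        collator.join()  # the collator thread ends once all partials arrive
        with self.lock:
            del self.collator_inboxes[query_id]
        return result["total"]
```

  • In this sketch a slave with no matching rows still reports a partial result of 0, so the collator knows how many results to wait for; an alternative is to dispatch the query only to nodes known to hold relevant data.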
  • the ODBC request may pass to a master node and to a master thread in the master node's process.
  • the SQL query may be translated into something the server can understand.
  • the master node may pass Query One to a thread on every node.
  • the first node may retrieve Store One data, add up a partial result, and create a data tuple that it communicates back to the listener for that slave node.
  • the second node may do the same with its Store One data and communicate with its listener. Nodes holding only Store Two data (and no Store One data) may do nothing.
  • the collator may add up the results from the two relevant listeners.
  • using socket communication, the collator may return the result to the client through ODBC. After that is accomplished, the collator thread and the worker threads that performed the retrieval may be terminated. In embodiments, these transient threads may be associated with and used for a particular query.
  • a normalization scheme may be used in order to minimize the size of internal data structures.
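  • The source does not describe the normalization scheme. One common approach, sketched here as an assumption, is dictionary encoding: each distinct string (such as a venue name) is stored once, and rows carry small integer codes held in a compact `array`. The class and function names are hypothetical.

```python
from array import array


class Dictionary:
    """Hypothetical dictionary encoder: each distinct value is stored once."""

    def __init__(self):
        self.code_of = {}
        self.values = []

    def encode(self, value):
        """Return the integer code for `value`, assigning the next code if it is new."""
        code = self.code_of.get(value)
        if code is None:
            code = len(self.values)
            self.code_of[value] = code
            self.values.append(value)
        return code

    def decode(self, code):
        return self.values[code]


def encode_column(values, dictionary):
    """Store a column as unsigned int codes (typically 4 bytes each) instead of strings."""
    return array("I", (dictionary.encode(v) for v in values))
```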
  • An aspect of the present invention relates to cluster processing of an aggregated dataset.
  • a logical process 4100 may be used for processing the aggregated dataset in clusters.
  • the present invention illustrates the processing of the aggregated data.
  • a fact data source 102 and a dimension data source 104 may be linked through a key.
  • fact data from multiple fact data sources 102 can be used as an aggregated data source for analysis along desired dimensions. For example, an analyst might wish to view time-series trends in the dollar sales allotted by the customer to each store within a given product category.
  • systems and methods may involve using a platform as disclosed herein for applications described herein where the systems and methods involve receiving an aggregated dataset, wherein the aggregated dataset includes data from a panel data source, a fact data source, and a dimension data source that have been associated with a standard population database 4102 .
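  • A minimal sketch of linking fact and dimension data through a key, assuming a `product_key` shared by the fact rows and a product dimension that carries the category. It totals dollar sales per (store, period) within one category, the kind of time-series view described above. Field names are hypothetical.

```python
from collections import defaultdict


def sales_by_store_and_period(facts, products, category):
    """Join facts to the product dimension on product_key, keep one category,
    and total dollar sales per (store, period)."""
    totals = defaultdict(float)
    for f in facts:
        product = products.get(f["product_key"])
        if product is not None and product["category"] == category:
            totals[(f["store"], f["period"])] += f["dollars"]
    return dict(totals)
```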
  • the process may also involve storing the aggregated data in a partition within a partitioned database, wherein the partition is associated with a data characteristic 4104 .
  • the process may also involve associating a master processing node with a plurality of slave nodes, wherein each of the plurality of slave nodes is associated with a partition of the partitioned database 4108 .
  • the process may also involve submitting an analytic query to the master processing node 4110 .
  • the process may also involve assigning analytic processing to at least one of the plurality of slave nodes by the master processing node, wherein the assignment is based at least in part on the association of the partition with the data characteristic 4112 .
  • the process may also involve reading the aggregated data from the partitioned database by the assigned slave node 4114 .
  • the process may also involve analyzing the aggregated data by the assigned slave node, wherein the analysis produces a result at each slave node 4118 .
  • the process may also involve combining the results from each of the plurality of slave nodes by the master processing node into a master result 4120 and reporting the master result to a user interface 4122 .
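  • The steps above can be traced in a single-process sketch; the reference numerals in the comments map each function to a step. Assumptions: the data characteristic is the venue, partitions are assigned to slaves round-robin, the analysis is a simple sum, and the slave work runs sequentially here where a cluster would run it in parallel. Reporting the master result (4122) is left to the caller. Function names are hypothetical.

```python
from collections import defaultdict


def partition_dataset(records, characteristic):
    """4104: store each aggregated record in the partition for its characteristic value."""
    partitions = defaultdict(list)
    for r in records:
        partitions[r[characteristic]].append(r)
    return dict(partitions)


def associate_slaves(partitions, slave_count):
    """4108: master-side map of partition key -> slave index (round-robin, assumed)."""
    return {key: i % slave_count for i, key in enumerate(sorted(partitions))}


def run_query(partitions, slave_of, wanted_keys, measure):
    """4110-4120: assign work only to slaves holding wanted partitions, let each
    produce a partial result, then combine the partials into a master result."""
    work = defaultdict(list)
    for key in wanted_keys:
        if key in slave_of:
            work[slave_of[key]].append(key)  # 4112: assignment by characteristic
    partials = {}
    for slave, keys in work.items():  # a cluster would run these in parallel
        rows = [r for k in keys for r in partitions[k]]  # 4114: read
        partials[slave] = sum(r[measure] for r in rows)  # 4118: analyze
    return sum(partials.values()), partials  # 4120: combine
```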
  • the methods or processes described above, and steps thereof, may be realized in hardware, software, or any combination of these suitable for a particular application.
  • the hardware may include a general-purpose computer and/or dedicated computing device.
  • the processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable device, along with internal and/or external memory.
  • the processes may also, or instead, be embodied in an application specific integrated circuit, a programmable gate array, programmable array logic, or any other device or combination of devices that may be configured to process electronic signals.
  • one or more of the processes may be realized as computer executable code created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software.
  • each method described above and combinations thereof may be embodied in computer executable code that, when executing on one or more computing devices, performs the steps thereof.
  • the methods may be embodied in systems that perform the steps thereof, and may be distributed across devices in a number of ways, or all of the functionality may be integrated into a dedicated, standalone device or other hardware.
  • means for performing the steps associated with the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of the present disclosure.

Abstract

Systems and methods are presented that may involve receiving an aggregated dataset, wherein the aggregated dataset includes data from a panel data source, a fact data source, and a dimension data source that have been associated with a standard population database. The process may also involve storing the aggregated data in a partition within a partitioned database, wherein the partition is associated with a data characteristic. The process may also involve associating a master processing node with a plurality of slave nodes, wherein each of the plurality of slave nodes is associated with a partition of the partitioned database. The process may also involve submitting an analytic query to the master processing node. The process may also involve assigning analytic processing to at least one of the plurality of slave nodes by the master processing node, wherein the assignment is based at least in part on the association of the partition with the data characteristic. The process may also involve reading the aggregated data from the partitioned database by the assigned slave node. The process may also involve analyzing the aggregated data by the assigned slave node, wherein the analysis produces a result at each slave node. The process may also involve combining the results from each of the plurality of slave nodes by the master processing node into a master result and reporting the master result to a user interface.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application claims the benefit of the following U.S. provisional applications: App. No. 60/887,573 filed on Jan. 31, 2007 and entitled “Analytic Platform,” App. No. 60/891,508 filed on Feb. 24, 2007 and entitled “Analytic Platform,” App. No. 60/891,936 filed on Feb. 27, 2007 and entitled “Analytic Platform,” App. No. 60/952,898 filed on Jul. 31, 2007 and entitled “Analytic Platform.”
  • This application is a continuation-in-part of U.S. application Ser. No. 12/021,263 filed on Jan. 28, 2008 and entitled “Associating a Granting Matrix with an Analytic Platform”, which claims the benefit of the following U.S. provisional applications: App. No. 60/886,798 filed on Jan. 26, 2007 and entitled “A Method of Aggregating Data,” App. No. 60/886,801 filed on Jan. 26, 2007 and entitled “Utilizing Aggregated Data.”
  • Each of the above applications is incorporated by reference herein in its entirety.
  • BACKGROUND
  • 1. Field
  • This invention relates to methods and systems for analyzing data, and more particularly to methods and systems for aggregating, projecting, and releasing data.
  • 2. Description of Related Art
  • Currently, there exists a large variety of data sources, such as census data or movement data received from point-of-sale terminals, sample data received from manual surveys, panel data obtained from the inputs of consumers who are members of panels, fact data relating to products, sales, and many other facts associated with the sales and marketing efforts of an enterprise, and dimension data relating to dimensions along which an enterprise wishes to understand data, such as in order to analyze consumer behaviors, to predict likely outcomes of decisions relating to an enterprise's activities, and to project from sample sets of data to a larger universe. Conventional methods of synthesizing, aggregating, and exploring such a universe of data comprise techniques such as OLAP, which fix aggregation points along the dimensions of the universe in order to reduce the size and complexity of unified information sets such as OLAP stars. Exploration of the unified information sets can involve run-time queries and query-time projections, both of which are constrained in current methods by a priori decisions that must be made to project and aggregate the universe of data. In practice, going back and changing the a priori decisions can lift these constraints, but this requires an arduous and computationally complex restructuring and reprocessing of data.
  • According to current business practices, unified information sets and results drawn from such information sets can be released to third parties according to so-called “releasability” rules. These rules might apply to any and all of the data from which the unified information sets are drawn, the dimensions (or points or ranges along the dimensions), the third party (or members or sub-organizations of the third party), and so on. Given this, there can be a complex interaction between the data, the dimensions, the third party, the releasability rules, the levels along the dimensions at which aggregations are performed, the information that is drawn from the unified information sets, and so on. In practice, configuring a system to apply the releasability rules is an error-prone process that requires extensive manual set up and results in a brittle mechanism that cannot adapt to on-the-fly changes in data, dimensions, third parties, rules, aggregations, projections, user queries, and so on.
  • Various projection methodologies are known in the art. Still other projection methodologies are subjects of the present invention. In any case, different projection methodologies provide outputs that have different statistical qualities. Analysts are interested in specifying the statistical qualities of the outputs at query-time. In practice, however, the universe of data and the projection methodologies that are applied to it are what drive the statistical qualities. Existing methods allow an analyst to choose a projection methodology and thereby affect the statistical qualities of the output, but this does not satisfy the analyst's desire to directly dictate the statistical qualities.
  • Information systems are a significant bottleneck for market analysis activities. The architecture of information systems is often not designed to provide on-demand flexible access, integration at a very granular level, or many other critical capabilities necessary to support growth. Thus, information systems are counter-productive to growth. Hundreds of market and consumer databases make it very difficult to manage or integrate data. For example, there may be a separate database for each data source, hierarchy, and other data characteristics relevant to market analysis. Different market views and product hierarchies proliferate among manufacturers and retailers. Restatements of data hierarchies waste precious time and are very expensive. Navigation among views of data, such as from global views to regional to neighborhood to store views, is virtually impossible, because different hierarchies are used to store global, regional, neighborhood, and store-level data. Analyses and insights often take weeks or months, or they are never produced. Insights are often sub-optimal because of silo-driven, narrowly defined, ad hoc analysis projects. Reflecting the ad hoc nature of these analytic projects are the analytic tools and infrastructure developed to support them. Currently, market analysis, business intelligence, and the like often use rigid data cubes that may include hundreds of databases that are impossible to integrate. These systems may include hundreds of views, hierarchies, clusters, and so forth, each of which is associated with its own rigid data cube. This may make it almost impossible to navigate from global uses, such as developing overall company strategy, down to specific program implementation or customer-driven uses. These ad hoc analytic tools and infrastructure are fragmented and disconnected.
  • In sum, there are many problems associated with the data used for market analysis, and there is a need for a flexible, extendable analytic platform whose architecture is designed to support a broad array of evolving market analysis needs. Furthermore, there is a need for better business intelligence in order to accelerate revenue growth, make business intelligence more customer-driven, and gain insights about markets in a more timely fashion, as well as a need for data projection and release methods and systems that provide improved dimensional flexibility, reduced query-time computational complexity, automatic selection and blending of projection methodologies, and flexibly applied releasability rules.
  • SUMMARY
  • In embodiments, systems and methods may involve using a platform as disclosed herein for applications described herein where the systems and methods involve receiving an aggregated dataset, wherein the aggregated dataset includes data from a panel data source, a fact data source, and a dimension data source that have been associated with a standard population database. The process may also involve storing the aggregated data in a partition within a partitioned database, wherein the partition is associated with a data characteristic. The process may also involve associating a master processing node with a plurality of slave nodes, wherein each of the plurality of slave nodes is associated with a partition of the partitioned database. The process may also involve submitting an analytic query to the master processing node. The process may also involve assigning analytic processing to at least one of the plurality of slave nodes by the master processing node, wherein the assignment is based at least in part on the association of the partition with the data characteristic. The process may also involve reading the aggregated data from the partitioned database by the assigned slave node. The process may also involve analyzing the aggregated data by the assigned slave node, wherein the analysis produces a result at each slave node. The process may also involve combining the results from each of the plurality of slave nodes by the master processing node into a master result 4120 and reporting the master result to a user interface.
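  • The assign/analyze/combine flow summarized above can be sketched as a scatter-gather query. This is a minimal single-process sketch under stated assumptions, not the claimed implementation: threads stand in for slave nodes, partitions are a hypothetical dict keyed by the data characteristic (here, a store), and the names `slave_analyze`, `master_query`, and `PartialResult` are illustrative. Each slave returns a sum and a count rather than an average, so that the master can combine the partial results correctly.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class PartialResult:
    """Hypothetical per-slave result; sum and count are kept separately so the
    master can compute a correct overall average (averages are not additive)."""
    total: float
    count: int


def slave_analyze(partition_rows):
    """Runs on one (simulated) slave node: read its partition, return a partial result."""
    return PartialResult(total=sum(r["sales"] for r in partition_rows),
                         count=len(partition_rows))


def master_query(partitions, characteristic_values):
    """Simulated master node: assign work only to partitions whose data
    characteristic matches the query, run the slaves in parallel, then combine."""
    assigned = [partitions[v] for v in characteristic_values if v in partitions]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(slave_analyze, assigned))
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    # The combined "master result" reported back to the user interface.
    return {"total_sales": total, "average_sale": total / count if count else 0.0}


# Hypothetical partitioned data: one partition per store.
partitions = {
    "store_1": [{"sales": 10.0}, {"sales": 20.0}],
    "store_2": [{"sales": 30.0}],
}
print(master_query(partitions, ["store_1", "store_2"]))
```

In a real cluster the slave function would run on the node holding the partition, so only the small partial results cross the network.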
  • These and other systems, methods, objects, features, and advantages of the present invention will be apparent to those skilled in the art from the following detailed description of the preferred embodiment and the drawings. Capitalized terms used herein (such as relating to titles of data objects, tables, or the like) should be understood to encompass other similar content or features performing similar functions, except where the context specifically limits such terms to the use herein.
  • BRIEF DESCRIPTION OF THE FIGURES
  • The invention and the following detailed description of certain embodiments thereof may be understood by reference to the following figures:
  • FIG. 1 illustrates an analytic platform for performing data analysis.
  • FIG. 2 depicts cluster processing of an aggregated dataset.
  • DETAILED DESCRIPTION
  • Referring to FIG. 1, the methods and systems disclosed herein are related to improved methods for handling and using data and metadata for the benefit of an enterprise. An analytic platform 100 may support and include such improved methods and systems. The analytic platform 100 may include, in certain embodiments, a range of hardware systems, software modules, data storage facilities, application programming interfaces, human-readable interfaces, and methodologies, as well as a range of applications, solutions, products, and methods that use various outputs of the analytic platform 100, as more particularly detailed herein, other embodiments of which would be understood by one of ordinary skill in the art and are encompassed herein. Among other components, the analytic platform 100 includes methods and systems for providing various representations of data and metadata, methodologies for acting on data and metadata, an analytic engine, and a data management facility that is capable of handling disaggregated data and performing aggregation, calculations, functions, and real-time or quasi-real-time projections. In certain embodiments, the methods and systems enable much more rapid and flexible manipulation of data sets, so that certain calculations and projections can be done in a fraction of the time as compared with older generation systems.
  • In embodiments, data compression and aggregations of data, such as fact data sources 102, and dimension data sources 104, may be performed in conjunction with a user query such that the aggregated dataset can be specifically generated in a form most applicable for generating calculations and projections based on the query. In embodiments, data compression and aggregations of data may be done prior to, in anticipation of, and/or following a query. In embodiments, an analytic platform 100 (described in more detail below) may calculate projections and other solutions dynamically and create hierarchical data structures with custom dimensions that facilitate the analysis. Such methods and systems may be used to process point-of-sale (POS) data, retail information, geography information, causal information, survey information, census data and other forms of data and forms of assessments of past performance (e.g. estimating the past sales of a certain product within a certain geographical region over a certain period of time) or projections of future results (e.g. estimating the future or expected sales of a certain product within a certain geographical region over a certain period of time). In turn, various estimates and projections can be used for various purposes of an enterprise, such as relating to purchasing, supply chain management, handling of inventory, pricing decisions, the planning of promotions, marketing plans, financial reporting, and many others.
  • Referring still to FIG. 1, an analytic platform 100 is illustrated that may be used to analyze and process data in a disaggregated or aggregated format, including, without limitation, dimension data defining the dimensions along which various items are measured and factual data about the facts that are measured with respect to the dimensions. Factual data may come from a wide variety of sources and be of a wide range of types, such as traditional periodic point-of-sale (POS) data, causal data (such as data about activities of an enterprise, such as in-store promotions, that are posited to cause changes in factual data), household panel data, frequent shopper program information, daily, weekly, or real time POS data, store database data, store list files, stubs, dictionary data, product lists, as well as custom and traditional audit data. Further extensions into transaction level data, RFID data and data from non-retail industries may also be processed according to the methods and systems described herein.
  • In embodiments, a data loading facility 108 may be used to extract data from available data sources and load them to or within the analytic platform 100 for further storage, manipulation, structuring, fusion, analysis, retrieval, querying and other uses. The data loading facility 108 may have a plurality of responsibilities, which may include eliminating data for non-releasable items, providing correct venue group flags for a venue group, feeding a core information matrix with relevant information (such as and without limitation statistical metrics), or the like. In an embodiment, the data loading facility 108 may eliminate non-related items. Available data sources may include a plurality of fact data sources 102 and a plurality of dimension data sources 104. Fact data sources 102 may include, for example, facts about sales volume, dollar sales, distribution, price, POS data, loyalty card transaction files, sales audit files, retailer sales data, and many other fact data sources 102 containing facts about the sales of the enterprise, as well as causal facts, such as facts about activities of the enterprise, in-store promotion audits, electronic pricing and/or promotion files, feature ad coding files, or others that tend to influence or cause changes in sales or other events, such as facts about in-store promotions, advertising, incentive programs, and the like. Other fact data sources may include custom shelf audit files, shipment data files, media data files, explanatory data (e.g., data regarding weather), attitudinal data, or usage data. Dimension data sources 104 may include information relating to any dimensions along which an enterprise wishes to collect data, such as dimensions relating to products sold (e.g. attribute data relating to the types of products that are sold, such as data about UPC codes, product hierarchies, categories, brands, sub-brands, SKUs and the like), venue data (e.g. store, chain, region, country, etc.), time data (e.g. day, week, quad-week, quarter, 12-week, etc.), geographic data (including breakdowns of stores by city, state, region, country or other geographic groupings), consumer or customer data (e.g. household, individual, demographics, household groupings, etc.), and other dimension data sources 104. While embodiments disclosed herein relate primarily to the collection of sales and marketing-related facts and the handling of dimensions related to the sales and marketing activities of an enterprise, it should be understood that the methods and systems disclosed herein may be applied to facts of other types and to the handling of dimensions of other types, such as facts and dimensions related to manufacturing activities, financial activities, information technology activities, media activities, supply chain management activities, accounting activities, political activities, contracting activities, and many others.
  • In an embodiment, the analytic platform 100 comprises a combination of data, technologies, methods, and delivery mechanisms brought together by an analytic engine. The analytic platform 100 may provide a novel approach to managing and integrating market and enterprise information and enabling predictive analytics. The analytic platform 100 may leverage approaches to representing and storing the base data so that it may be consumed and delivered in real-time, with flexibility and open integration. This representation of the data, when combined with the analytic methods and techniques, and a delivery infrastructure, may minimize the processing time and cost and maximize the performance and value for the end user. This technique may be applied to problems where there may be a need to access integrated views across multiple data sources, where there may be a large multi-dimensional data repository against which there may be a need to rapidly and accurately handle dynamic dimensionality requests, with appropriate aggregations and projections, where there may be highly personalized and flexible real-time reporting 190, analysis 192 and forecasting capabilities required, where there may be a need to tie seamlessly and on-the-fly with other enterprise applications 184 via web services 194 such as to receive a request with specific dimensionality, apply appropriate calculation methods, perform and deliver an outcome (e.g. dataset, coefficient, etc.), and the like.
  • The analytic platform 100 may provide innovative solutions to application partners, including on-demand pricing insights, emerging category insights, product launch management, loyalty insights, daily data out-of-stock insights, assortment planning, on-demand audit groups, neighborhood insights, shopper insights, health and wellness insights, consumer tracking and targeting, and the like.
  • A decision framework may provide new revenue and competitive advantages for application partners through brand building, product innovation, consumer-centric retail execution, consumer and shopper relationship management, and the like. Predictive planning and optimization solutions, automated analytics and insight solutions, and on-demand business performance reporting may be drawn from a plurality of sources, such as InfoScan, total C-scan, daily data, panel data, retailer direct data, SAP, consumer segmentation, consumer demographics, FSP/loyalty data, data provided directly for customers, or the like.
  • The analytic platform 100 may have advantages over more traditional federation/consolidation approaches, requiring fewer updates in a smaller portion of the process. The analytic platform 100 may give users greater insight and provide them with more innovative applications. The analytic platform 100 may provide a unified reporting and solutions framework, providing on-demand and scheduled reports in a user dashboard with summary views and graphical dial indicators, as well as flexible formatting options. Benefits and products of the analytic platform 100 may include non-additive measures for custom product groupings, elimination of restatements to save significant time and effort, cross-category visibility to spot emerging trends, a total market picture for faster competitor analysis, granular data on demand to view detailed retail performance, attribute-driven analysis for market insights, and the like.
  • The analytic capabilities of the present invention may provide for on-demand projection, on-demand aggregation, multi-source master data management, and the like. On-demand projection may be derived directly for all possible geographies, store and demographic attributes, per geography or category, with built-in dynamic releasability controls, and the like. On-demand aggregation may provide both additive and non-additive measures, provide custom groups, provide cross-category or geography analytics, and the like. Multi-source master data management may provide management of dimension member catalogue and hierarchy attributes, processing of raw fact data that may reduce harmonization work to attribute matching, product and store attributes stored relationally, with data that may be extended independently of fact data, and used to create additional dimensions, and the like.
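  • The distinction between additive and non-additive measures for custom groups can be illustrated with a small sketch. The transaction records, the custom group, and the function names below are hypothetical. Dollar sales for a custom group equal the sum of the per-product dollar sales, but the number of distinct buying households does not: a household that buys both products would be counted twice, so a non-additive measure has to be recomputed from the underlying disaggregated data.

```python
# Hypothetical transaction records: (household_id, product, dollars)
transactions = [
    ("h1", "cola_a", 3.0),
    ("h1", "cola_b", 2.5),
    ("h2", "cola_a", 3.0),
    ("h3", "cola_b", 2.5),
]


def aggregate(transactions, custom_group):
    """Compute an additive and a non-additive measure for a custom product group."""
    rows = [t for t in transactions if t[1] in custom_group]
    dollars = sum(t[2] for t in rows)        # additive: sum of per-product dollars
    households = len({t[0] for t in rows})   # non-additive: distinct buyers
    return dollars, households


def naive_household_sum(transactions, custom_group):
    """Incorrectly sums per-product buyer counts, double-counting shared households."""
    return sum(len({t[0] for t in transactions if t[1] == p}) for p in custom_group)


group = {"cola_a", "cola_b"}
print(aggregate(transactions, group))            # (11.0, 3)
print(naive_household_sum(transactions, group))  # 4
```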
  • In addition, the analytic platform 100 may provide flexibility while maintaining a structured user approach. Flexibility may be realized with multiple hierarchies applied to the same database, the ability to create new custom hierarchies and views, rapid addition of new measures and dimensions, and the like. The user may be provided a structured approach through publishing and subscribing reports to a broader user base, by enabling multiple user classes with different privileges, providing security access, and the like. The user may also be provided with increased performance and ease of use through leading-edge hardware and software and a web application for integrated analysis.
  • In embodiments, the data available within a fact data source 102 and a dimension data source 104 may be linked, such as through the use of a key. For example, key-based fusion of fact 102 and dimension data 104 may occur by using a key, such as using the Abilitec Key software product offered by Acxiom, in order to fuse multiple sources of data. For example, such a key can be used to relate loyalty card data (e.g., Grocery Store 1 loyalty card, Grocery Store 2 loyalty card, and Convenience Store 1 loyalty card) that are available for a single customer, so that the fact data from multiple sources can be used as a fused data source for analysis on desirable dimensions. For example, an analyst might wish to view time-series trends in the dollar sales allotted by the customer to each store within a given product category.
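  • Key-based fusion of loyalty card data can be sketched as follows. The sketch assumes that a key-linking step has already mapped each retailer's card ID to one persistent consumer key; the `card_to_key` table stands in for that output and is hypothetical, not the interface of any commercial product. Once the facts share a key, the dollar sales the customer allots to each store can be arranged as a time series within a category.

```python
from collections import defaultdict

# Hypothetical output of a key-linking step: (store, card_id) -> consumer key.
card_to_key = {
    ("grocery_1", "G1-0042"): "C-001",
    ("grocery_2", "G2-9917"): "C-001",
    ("convenience_1", "CV-555"): "C-001",
}

# Hypothetical loyalty-card facts: (store, card_id, week, category, dollars)
facts = [
    ("grocery_1", "G1-0042", 1, "coffee", 12.0),
    ("grocery_2", "G2-9917", 1, "coffee", 8.0),
    ("convenience_1", "CV-555", 2, "coffee", 4.0),
    ("grocery_1", "G1-0042", 2, "coffee", 6.0),
]


def fuse(facts, card_to_key, category):
    """Return {consumer_key: {(week, store): dollars}} for one category."""
    fused = defaultdict(lambda: defaultdict(float))
    for store, card, week, cat, dollars in facts:
        key = card_to_key.get((store, card))
        if key is None or cat != category:
            continue  # cards without a key cannot be fused across sources
        fused[key][(week, store)] += dollars
    return {k: dict(v) for k, v in fused.items()}


series = fuse(facts, card_to_key, "coffee")
for (week, store), dollars in sorted(series["C-001"].items()):
    print(week, store, dollars)
```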
  • In embodiments the data loading facility may comprise any of a wide range of data loading facilities, including or using suitable connectors, bridges, adaptors, extraction engines, transformation engines, loading engines, data filtering facilities, data cleansing facilities, data integration facilities, or the like, of the type known to those of ordinary skill in the art. In various embodiments, there are many situations where a store will provide POS data and causal information relating to its store. For example, the POS data may be automatically transmitted to the facts database after the sales information has been collected at the store's POS terminals. The same store may also provide information about how it promoted certain products, its store or the like. This data may be stored in another database; however, this causal information may provide insight into recent sales activities, so it may be used in later sales assessments or forecasts. Similarly, a manufacturer may load product attribute data into yet another database, and this data may also be accessible for sales assessment or projection analysis. For example, when making such an analysis one may be interested in knowing what categories of products sold well or what brand sold well. In this case, the causal store information may be aggregated with the POS data and dimension data corresponding to the products referred to in the POS data. With this aggregation of information, one can perform an analysis on any of the related data.
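  • Combining store POS data, store-supplied causal data, and manufacturer product attributes can be sketched as a simple keyed join followed by an aggregation. All record layouts and values below are hypothetical; the point is that once the three sources share keys (store, UPC, week), sales can be summarized by category, brand, and promotion condition.

```python
from collections import defaultdict

# Hypothetical POS facts: (store, upc, week, units, dollars)
pos = [
    ("s1", "0001", 10, 5, 10.0),
    ("s1", "0002", 10, 2, 6.0),
    ("s2", "0001", 10, 1, 2.0),
]
# Hypothetical causal facts from the store: (store, upc, week) -> promotion type
causal = {("s1", "0001", 10): "feature_ad"}
# Hypothetical manufacturer product attributes: upc -> (category, brand)
products = {"0001": ("snacks", "BrandA"), "0002": ("snacks", "BrandB")}


def sales_by_brand_and_promo(pos, causal, products):
    """Join POS rows to causal and product dimension data, then sum dollars."""
    out = defaultdict(float)
    for store, upc, week, units, dollars in pos:
        category, brand = products[upc]
        promo = causal.get((store, upc, week), "none")  # no causal record -> no promotion
        out[(category, brand, promo)] += dollars
    return dict(out)


print(sales_by_brand_and_promo(pos, causal, products))
```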
  • Referring still to FIG. 1, data that is obtained by the data loading facility 108 may be transferred to a plurality of facilities within the analytic platform 100, including the data mart 114. In embodiments the data loading facility 108 may contain one or more interfaces 182 by which the data loaded by the data loading facility 108 may interact with or be used by other facilities within the platform 100 or external to the platform. Interfaces to the data loading facility 108 may include human-readable user interfaces, application programming interfaces (APIs), registries or similar facilities suitable for providing interfaces to services in a services oriented architecture, connectors, bridges, adaptors, bindings, protocols, message brokers, extraction facilities, transformation facilities, loading facilities and other data integration facilities suitable for allowing various other entities to interact with the data loading facility 108. The interfaces 182 may support interactions with the data loading facility 108 by applications 184, solutions 188, reporting facilities 190, analyses facilities 192, services 194 or other entities, external to or internal to an enterprise. In embodiments these interfaces are associated with interfaces 182 to the platform 100, but in other embodiments direct interfaces may exist to the data loading facility 108, either by other components of the platform 100, or by external entities.
  • Referring still to FIG. 1, in embodiments the data mart facility 114 may be used to store data loaded from the data loading facility 108 and to make the data loaded from the data loading facility 108 available to various other entities in or external to the platform 100 in a convenient format. Within the data mart 114 facilities may be present to further store, manipulate, structure, subset, merge, join, fuse, or perform a wide range of data structuring and manipulation activities. The data mart facility 114 may also allow storage, manipulation and retrieval of metadata, and perform activities on metadata similar to those disclosed with respect to data. Thus, the data mart facility 114 may allow storage of data and metadata about facts (including sales facts, causal facts, and the like) and dimension data, as well as other relevant data and metadata. In embodiments, the data mart facility 114 may compress the data and/or create summaries in order to facilitate faster processing by other applications 184 within the platform 100 (e.g. the analytic server 134). In embodiments the data mart facility 114 may include various methods, components, modules, systems, sub-systems, features or facilities associated with data and metadata.
  • In certain embodiments the data mart facility 114 may contain one or more interfaces 182 (not shown on FIG. 1), by which the data loaded by the data mart facility 114 may interact with or be used by other facilities within the platform 100 or external to the platform. Interfaces to the data mart facility 114 may include human-readable user interfaces, application programming interfaces (APIs), registries or similar facilities suitable for providing interfaces to services in a services oriented architecture, connectors, bridges, adaptors, bindings, protocols, message brokers, extraction facilities, transformation facilities, loading facilities and other data integration facilities suitable for allowing various other entities to interact with the data mart facility 114. These interfaces may comprise interfaces 182 to the platform 100 as a whole, or may be interfaces associated directly with the data mart facility 114 itself, such as for access from other components of the platform 100 or for access by external entities directly to the data mart facility 114. The interfaces 182 may support interactions with the data mart facility 114 by applications 184, solutions 188, reporting facilities 190, analyses facilities 192, services 194 (each of which is described in greater detail herein) or other entities, external to or internal to an enterprise.
  • In certain optional embodiments, the security facility 118 may be any hardware or software implementation, process, procedure, or protocol that may be used to block, limit, filter or alter access to the data mart facility 114, and/or any of the facilities within the data mart facility 114, by a human operator, a group of operators, an organization, software program, bot, virus, or some other entity or program. The security facility 118 may include a firewall, an anti-virus facility, a facility for managing permission to store, manipulate and/or retrieve data or metadata, a conditional access facility, a logging facility, a tracking facility, a reporting facility, an asset management facility, an intrusion-detection facility, an intrusion-prevention facility or other suitable security facility.
  • Still referring to FIG. 1, the analytic platform 100 may include an analytic engine 134. The analytic engine 134 may be used to build and deploy analytic applications or solutions or undertake analytic methods based upon the use of a plurality of data sources and data types. Among other things, the analytic engine 134 may perform a wide range of calculations and data manipulation steps necessary to apply models, such as mathematical and economic models, to sets of data, including fact data, dimension data, and metadata. The analytic engine 134 may be associated with an interface 182, such as any of the interfaces described herein.
  • The analytic engine 134 may interact with a model storage facility 148, which may be any facility for generating models used in the analysis of sets of data, such as economic models, econometric models, forecasting models, decision support models, estimation models, projection models, and many others. In embodiments output from the analytic engine 134 may be used to condition or refine models in the model storage 148; thus, there may be a feedback loop between the two, where calculations in the analytic engine 134 are used to refine models managed by the model storage facility 148.
  • In embodiments, a security facility 138 of the analytic engine 134 may be the same or similar to the security facility 118 associated with the data mart facility 114, as described herein. Alternatively, the security facility 138 associated with the analytic engine 134 may have features and rules that are specifically designed to operate within the analytic engine 134.
  • As illustrated in FIG. 1, the analytic platform 100 may contain a master data management hub 150 (MDMH). In embodiments the MDMH 150 may serve as a central facility for handling dimension data used within the analytic platform 100, such as data about products, stores, venues, geographies, time periods and the like, as well as various other dimensions relating to or associated with the data and metadata types in the data sources 102, 104, the data loading facility 108, the data mart facility 114, the analytic engine 134, the model storage facility 148 or various applications 184, solutions 188, reporting facilities 190, analytic facilities 192 or services 194 that interact with the analytic platform 100. The MDMH 150 may in embodiments include a security facility 152, an interface 158, a data loader 160, a data manipulation and structuring facility 162, and one or more staging tables 164. The data loader 160 may be used to receive data. Data may enter the MDMH from various sources, such as from the data mart 114 after the data mart 114 completes its intended processing of the information and data that it received as described herein. Data may also enter the MDMH 150 through a user interface 158, such as an API or a human user interface, web browser or some other interface, of any of the types disclosed herein or in the documents incorporated by reference herein. The user interface 158 may be deployed on a client device, such as a PDA, personal computer, laptop computer, cellular phone, or some other client device capable of handling data. In embodiments, the staging tables 164 may be included in the MDMH 150.
  • In embodiments, a matching facility 180 may be associated with the MDMH 150. The matching facility 180 may receive an input data hierarchy within the MDMH 150 and analyze the characteristics of the hierarchy and select a set of attributes that are salient to a particular analytic interest (e.g., product selection by a type of consumer, product sales by a type of venue, and so forth). The matching facility 180 may select primary attributes, match attributes, associate attributes, block attributes and prioritize the attributes. The matching facility 180 may associate each attribute with a weight and define a set of probabilistic weights. The probabilistic weights may be the probability of a match or a non-match, or thresholds of a match or non-match that is associated with an analytic purpose (e.g., product purchase). The probabilistic weights may then be used in an algorithm that is run within a probabilistic matching engine (e.g., IBM QualityStage). The output of the matching engine may provide information on, for example, other products which are appropriate to include in a data hierarchy, the untapped market (i.e. other venues) in which a product is probabilistically more likely to sell well, and so forth. In embodiments, the matching facility 180 may be used to generate projections of what types of products, people, customers, retailers, stores, store departments, etc. are similar in nature and therefore they may be appropriate to combine in a projection or an assessment.
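  • The use of per-attribute probabilistic weights and match/non-match thresholds can be sketched with a Fellegi-Sunter style scoring rule. This is a swapped-in, textbook technique used for illustration only; it is not presented as the algorithm of any particular matching engine. The attribute names, the m and u probabilities, and the thresholds are all hypothetical. Agreement on an attribute adds log2(m/u) to the score, disagreement adds log2((1-m)/(1-u)), and the total is compared against an upper and a lower threshold.

```python
import math

# Hypothetical per-attribute probabilities:
#   m = P(attribute agrees | records truly match)
#   u = P(attribute agrees | records do not match)
ATTRIBUTE_PROBS = {
    "brand": (0.95, 0.10),
    "category": (0.90, 0.30),
    "size": (0.80, 0.20),
}


def match_score(rec_a, rec_b, probs=ATTRIBUTE_PROBS):
    """Sum of log-likelihood-ratio weights over the compared attributes."""
    score = 0.0
    for attr, (m, u) in probs.items():
        if rec_a.get(attr) == rec_b.get(attr):
            score += math.log2(m / u)              # agreement weight (positive)
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


def classify(score, upper=3.0, lower=0.0):
    """Hypothetical thresholds separating match, non-match, and a review band."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "clerical review"


a = {"brand": "BrandA", "category": "snacks", "size": "12oz"}
b = {"brand": "BrandA", "category": "beverages", "size": "12oz"}
print(classify(match_score(a, a)), classify(match_score(a, b)))
```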
  • As illustrated in FIG. 1, the analytic platform 100 may include a projection facility 178. A projection facility 178 may be used to produce projections, whereby a partial data set (such as data from a subset of stores of a chain) is projected to a universe (such as all of the stores in a chain), by applying appropriate weights to the data in the partial data set. A wide range of potential projection methodologies exist, including cell-based methodologies, store matrix methodologies, iterative proportional fitting methodologies, virtual census methodologies, and others. The methodologies can be used to generate projection factors. As to any given projection, there is typically a tradeoff among various statistical quality measurements associated with that type of projection. Some projections are more accurate than others, while some are more consistent, have less spillage, are more closely calibrated, or have other attributes that make them relatively more or less desirable depending on how the output of the projection is likely to be used. In embodiments of the platform 100, the projection facility 178 takes dimension information from the MDMH 150 or from another source and provides a set of projection weightings along the applicable dimensions, typically reflected in a matrix of projection weights, which can be applied at the data mart facility 114 to a partial data set in order to render a projected data set. The projection facility 178 may have an interface 182 of any of the types disclosed herein.
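  • A cell-based projection, one of the methodologies named above, can be sketched as follows under simplifying assumptions. Each store belongs to a hypothetical cell (here, a size class); each reporting store's weight is the number of universe stores in its cell divided by the number of sample stores in that cell; and the weighted partial data estimate the universe total. The store names, cells, and function names are illustrative. A production methodology would also handle cells with no sample stores (for example, by collapsing cells), which this sketch simply rejects.

```python
from collections import Counter


def cell_projection_weights(universe_cells, sample_cells):
    """Return one projection weight per sample store.

    universe_cells: store -> cell for every store in the universe
    sample_cells:   store -> cell for the stores that actually report data
    """
    universe_counts = Counter(universe_cells.values())
    sample_counts = Counter(sample_cells.values())
    missing = set(universe_counts) - set(sample_counts)
    if missing:
        # Simplification: refuse rather than collapse cells.
        raise ValueError(f"cells with no sample stores: {sorted(missing)}")
    return {store: universe_counts[cell] / sample_counts[cell]
            for store, cell in sample_cells.items()}


def project(sample_sales, weights):
    """Apply the weights to the partial data set to estimate the universe total."""
    return sum(sales * weights[store] for store, sales in sample_sales.items())


universe = {"s1": "large", "s2": "large", "s3": "large", "s4": "large",
            "s5": "small", "s6": "small"}
sample = {"s1": "large", "s5": "small"}
weights = cell_projection_weights(universe, sample)
print(weights)                                           # {'s1': 4.0, 's5': 2.0}
print(project({"s1": 100.0, "s5": 30.0}, weights))       # 460.0
```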
  • As shown in FIG. 1, an interface 182 may be included in the analytic platform 100. In embodiments, data may be transferred to the MDMH 150 of the platform 100 using a user interface 182. The interface 182 may be a web browser operating over the Internet or within an intranet or other network, it may be an analytic engine 134, an application plug-in, or some other user interface that is capable of handling data. The interface 182 may be human readable or may consist of one or more application programming interfaces, or it may include various connectors, adaptors, bridges, services, transformation facilities, extraction facilities, loading facilities, bindings, couplings, or other data integration facilities, including any such facilities described herein or in documents incorporated by reference herein.
  • As illustrated in FIG. 1, the platform 100 may interact with a variety of applications 184, solutions 188, reporting facilities 190, analytic facilities 192 and services 194, such as web services, or with other platforms or systems of an enterprise or external to an enterprise. Any such applications 184, solutions 188, reporting facilities 190, analytic facilities 192 and services 194 may interact with the platform 100 in a variety of ways, such as providing input to the platform 100 (such as data, metadata, dimension information, models, projections, or the like), taking output from the platform 100 (such as data, metadata, projection information, information about similarities, analytic output, output from calculations, or the like), modifying the platform 100 (including in a feedback or iterative loop), being modified by the platform 100 (again optionally in a feedback or iterative loop), or the like.
  • In embodiments one or more applications 184 or solutions 188 may interact with the platform 100 via an interface 182. Applications 184 and solutions 188 may include applications and solutions (consisting of a combination of hardware, software and methods, among other components) that relate to planning the sales and marketing activities of an enterprise, decision support applications, financial reporting applications, applications relating to strategic planning, enterprise dashboard applications, supply chain management applications, inventory management and ordering applications, manufacturing applications, customer relationship management applications, information technology applications, applications relating to purchasing, applications relating to pricing, promotion, positioning, placement and products, and a wide range of other applications and solutions.
  • In embodiments, applications 184 and solutions 188 may include analytic output that is organized around a topic area. For example, the organizing principle of an application 184 or a solution 188 may be a new product introduction. Manufacturers may release thousands of new products each year. It may be useful for an analytic platform 100 to be able to group analysis around the topic area, such as new products, and organize a bundle of analyses and workflows that are presented as an application 184 or solution 188. Applications 184 and solutions 188 may incorporate planning information, forecasting information, “what if?” scenario capability, and other analytic features. Applications 184 and solutions 188 may be associated with web services 194 that enable users within a client's organization to access and work with the applications 184 and solutions 188.
  • In embodiments, the analytic platform 100 may facilitate delivering information to external applications 184. This may include providing data or analytic results to certain classes of applications 184. For example and without limitation, an application may include enterprise resource planning/backbone applications 184 such as SAP, including those applications 184 focused on Marketing, Sales & Operations Planning and Supply Chain Management. In another example, an application may include business intelligence applications 184, including those applications 184 that may apply data mining techniques. In another example, an application may include customer relationship management applications 184, including customer sales force applications 184. In another example, an application may include specialty applications 184 such as a price or SKU optimization application. The analytic platform 100 may facilitate supply chain efficiency applications 184. For example and without limitation, an application may include supply chain models based on sales out (POS/FSP) rather than sales in (Shipments). In another example, an application may include RFID based supply chain management. In another example, an application may include a retailer co-op to enable partnership with a distributor who may manage collective stock and distribution services. The analytic platform 100 may be applied to industries characterized by large multi-dimensional data structures. This may include industries such as telecommunications, elections and polling, and the like. The analytic platform 100 may be applied to opportunities to vend large amounts of data through a portal with the possibility to deliver highly customized views for individual users with effectively controlled user accessibility rights. This may include collaborative groups such as insurance brokers, real estate agents, and the like. 
The analytic platform 100 may be applied to applications 184 requiring self-monitoring of critical coefficients and parameters. Such applications 184 may rely on constant updating of statistical models, such as financial models, with real-time flows of data and ongoing re-calibration and optimization. The analytic platform 100 may be applied to applications 184 that require breaking apart and recombining geographies and territories at will.
  • In embodiments, a data field may be dynamically altered to conform to a bit size or some other desired format. The analytic platform 100 may track a record of the dynamic alteration and store it in a database that other facilities of the analytic platform 100 can access. In an example, a data field may relate to sales data. To help reduce the processing time needed to use the sales data in an analysis, the sales data field may be dynamically altered to a desired bit size of, for example, 6 bits. Once this alteration is made, a record may be stored indicating that each sales datum in the sales field is 6 bits wide. When an analytic query involves the sales field (e.g., "compute average sales by store"), the query may consult the stored record of the alteration to the 6-bit format. With this information, the analytic query may process and analyze the sales data by reading the sales field in 6-bit units. This process may remove the need for the sales data to carry a header and/or footer indicating how the data are to be read and processed. As a result, processing speed may be increased.
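The 6-bit example above can be sketched as follows. This is a minimal illustration that assumes non-negative integer values. The `FieldFormat` record stands in for the stored record of the dynamic alteration. The names `FORMATS`, `pack`, `unpack`, and `average_of` are hypothetical and are not part of the disclosed platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldFormat:
    """Hypothetical record of a dynamic alteration: field name and bits per datum."""
    field: str
    bit_width: int


# Hypothetical registry of stored alteration records, readable by other facilities.
FORMATS = {"sales": FieldFormat("sales", 6)}


def pack(values, fmt):
    """Pack non-negative ints into bytes, fmt.bit_width bits each, with no per-value header."""
    limit = 1 << fmt.bit_width
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < limit:
            raise ValueError(f"{v} does not fit in {fmt.bit_width} bits")
        acc |= v << (i * fmt.bit_width)
    nbytes = (len(values) * fmt.bit_width + 7) // 8
    return acc.to_bytes(nbytes, "little")


def unpack(data, count, fmt):
    """Read `count` values back out of `data` in fmt.bit_width-bit units."""
    acc = int.from_bytes(data, "little")
    mask = (1 << fmt.bit_width) - 1
    return [(acc >> (i * fmt.bit_width)) & mask for i in range(count)]


def average_of(field, data, count):
    """The query consults the stored alteration record to learn how to read the field."""
    fmt = FORMATS[field]
    values = unpack(data, count, fmt)
    return sum(values) / count


sales = [12, 40, 63, 7]
packed = pack(sales, FORMATS["sales"])  # 4 values x 6 bits = 24 bits = 3 bytes
print(len(packed), average_of("sales", packed, len(sales)))
```

Every datum has the same width, so the reader needs only the stored bit width and a count, not a per-value header. The trade-off is that values outside the 0 to 63 range must be rejected, as here, or rescaled before packing.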
  • In embodiments, the MDMH 150 may be associated with a partitioned database. The MDMH 150 may be further associated with a master cluster node that is, in turn, associated with a plurality of slave cluster nodes. Each partition of the partitioned database may be associated with a slave cluster node or a plurality of slave cluster nodes. Each slave cluster node may be associated with a mirror slave cluster node, which may be used if the slave cluster node it mirrors fails. In an example, data, such as sales data, may enter the analytic platform 100 using a data loading facility 108. The sales data may be loaded with the causal fact extractor 110 and processed into a data mart 114, which may store the sales data within a partitioned database. In an alternate embodiment, the MDMH 150 may process the sales data mart and create a partitioned sales database. In this simplified example, the partitioned sales database may have two partitions, Partition One and Partition Two, each associated with one of the two stores for which sales data are available. Partition One may be associated with Slave Cluster Node One, and Partition Two with Slave Cluster Node Two. Each slave cluster node may, in turn, be associated with a mirror slave cluster node that is associated with the same database partition as the slave cluster node it mirrors. The MDMH 150 and the master cluster node may store and/or have access to stored data indicating the associations between the database partitions and the slave cluster nodes. In an example, upon receipt of an analytic query to summarize sales data for Store One, the master cluster node may command Slave Cluster Node One (which is associated with the Store One sales data stored in Partition One) to process Store One's sales data.
This command from the master cluster node may be associated with information relating to dynamic alterations that have been performed on the stored data (e.g., the bit size of each stored datum) to enable the slave node to accurately read the sales data during analysis. Similarly, the analysis may take place on a plurality of slave cluster nodes, each of which is associated with a database partition or plurality of database partitions.
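The routing and mirroring described above can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions. The `SlaveNode` and `MasterNode` classes, the partition ids `P1`/`P2`, and the use of `ConnectionError` to signal a failed node are all hypothetical. The alteration metadata that the master would send with each command is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class SlaveNode:
    """Hypothetical slave cluster node holding one or more database partitions."""
    name: str
    partitions: dict  # partition id -> list of sales values
    alive: bool = True

    def summarize(self, partition_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return sum(self.partitions[partition_id])


class MasterNode:
    """Keeps the partition -> (slave, mirror) associations and routes queries."""

    def __init__(self):
        self.assignments = {}

    def assign(self, partition_id, slave, mirror):
        self.assignments[partition_id] = (slave, mirror)

    def summarize(self, partition_id):
        slave, mirror = self.assignments[partition_id]
        try:
            return slave.name, slave.summarize(partition_id)
        except ConnectionError:
            # Node failure: the mirror holds the same partition, so it answers instead.
            return mirror.name, mirror.summarize(partition_id)


store_one = [120, 80, 95]  # Partition One: Store One sales
store_two = [200, 150]     # Partition Two: Store Two sales
master = MasterNode()
slave_one = SlaveNode("slave-1", {"P1": store_one})
master.assign("P1", slave_one, SlaveNode("slave-1-mirror", {"P1": list(store_one)}))
master.assign("P2", SlaveNode("slave-2", {"P2": store_two}),
              SlaveNode("slave-2-mirror", {"P2": list(store_two)}))

print(master.summarize("P1"))  # answered by slave-1
slave_one.alive = False
print(master.summarize("P1"))  # answered by slave-1-mirror
```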
  • In embodiments, the partitioned database may be updated as new data become available. The update may be made on the fly, at a set interval, or according to some other criteria.
  • In embodiments, the cluster-based processing may be associated with bitmap compression techniques, including word-aligned hybrid (WAH) code compression. In an example, WAH compression may be used to increase cluster processing speed in two ways. It applies run-length encoding to long sequences of identical bits, and it encodes and decodes bitmaps in word-size groups, so that operations on the compressed bitmaps can proceed a whole word at a time rather than bit by bit.
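The following is a minimal sketch of WAH encoding. It assumes 32-bit words, as in the usual formulation. The bitmap is cut into 31-bit groups. A group that is all 0s or all 1s becomes part of a fill word: top bit 1, next bit the fill value, low 30 bits a count of groups. Any other group is stored as a literal word: top bit 0, low 31 bits the group itself. The function names are illustrative. `wah_count` shows how a query can work on the compressed words without first decoding them.

```python
GROUP = 31                   # payload bits per 32-bit word
ALL_ONES = (1 << GROUP) - 1  # a 31-bit group of 1s
MAX_RUN = (1 << 30) - 1      # largest group count a fill word can hold


def is_fill(word):
    return (word >> 31) == 1


def wah_encode(bits):
    """Encode a 0/1 sequence into WAH words; the final group is padded with 0s."""
    words = []
    for start in range(0, len(bits), GROUP):
        chunk = list(bits[start:start + GROUP])
        chunk += [0] * (GROUP - len(chunk))
        value = 0
        for b in chunk:  # the first bit of the group ends up in bit 30
            value = (value << 1) | b
        if value in (0, ALL_ONES):
            fill_bit = 1 if value else 0
            last = words[-1] if words else None
            if (last is not None and is_fill(last)
                    and ((last >> 30) & 1) == fill_bit
                    and (last & MAX_RUN) < MAX_RUN):
                words[-1] += 1  # extend the current run by one group
            else:
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            words.append(value)  # literal word: top bit stays 0
    return words


def wah_decode(words, length):
    bits = []
    for w in words:
        if is_fill(w):
            bits.extend([(w >> 30) & 1] * (GROUP * (w & MAX_RUN)))
        else:
            bits.extend((w >> (GROUP - 1 - i)) & 1 for i in range(GROUP))
    return bits[:length]


def wah_count(words):
    """Count set bits directly on the compressed words, one word at a time."""
    total = 0
    for w in words:
        if is_fill(w):
            if (w >> 30) & 1:
                total += GROUP * (w & MAX_RUN)
        else:
            total += bin(w).count("1")
    return total


bits = [0] * (GROUP * 3) + [1] * (GROUP * 2) + [1, 0, 1]
words = wah_encode(bits)
print([hex(w) for w in words], wah_count(words))
```

The 158-bit example compresses to three words: one zero fill covering three groups, one one-fill covering two groups, and one literal.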
  • In embodiments, failover clusters may be implemented for the purpose of improving the availability of services which a cluster provides. Failover clusters may operate using redundant nodes, which may be used to provide service when system components fail. Failover cluster implementations may manage the redundancy inherent in a cluster to minimize the impact of single points of failure. In embodiments, load-balancing clusters may operate by having all workload come through one or more load-balancing front ends, which then distribute it to a collection of back end servers. Such a cluster of computers is sometimes referred to as a server farm. In embodiments, high-performance clusters may be implemented to provide increased performance by splitting a computational task across many different nodes in the cluster. Such clusters commonly run custom programs which have been designed to exploit the parallelism available on high-performance clusters. High-performance clusters are optimized for workloads which require jobs or processes happening on the separate cluster computer nodes to communicate actively during the computation. These include computations where intermediate results from one node's calculations will affect future calculations on other nodes.
  • Message passing interface (MPI) refers to a language-independent application programming interface (API) specification for message passing on a parallel computer. MPI has defined semantics and flexible interpretation. It does not define the protocol by which its operations are performed, in the sense of sockets over TCP/IP or other layer-4-and-below models in the ISO/OSI Reference Model. It is consequently a layer-5+ set of interfaces, although implementations can cover most layers of the reference model, and sockets over TCP/IP are a common transport inside an implementation. MPI's goals are high performance, scalability, and portability, and it expresses parallelism explicitly rather than implicitly. MPI is a de facto standard for communication among the processes that model a parallel program on a distributed-memory system. Such programs are often mapped to clusters, to distributed-memory supercomputers, and to other environments. However, the MPI-1 model has no shared-memory concept, and MPI-2 has only a limited distributed shared-memory concept, used in one portion of its extensions.
  • In embodiments, the analytic server may use ODBC to connect to a data server.
  • An ODBC library may use socket communication through the socket library to communicate with the data server. The data server may be cluster-based in order to distribute the data server processing. A socket communication library may reside on the data server. In an embodiment, the data server may pass information to a SQL parser module. In an embodiment, GNU Flex and/or Bison may be used to generate the lexer and parser.
  • In embodiments, a master node and multiple slave nodes may be used in a cluster framework. A master node may obtain the SQL code by ODBC sockets and forward it to a parser to interpret the SQL sequence. Once the server has received SQL as part of a query request, MPI may be used to distribute the server request to slave nodes for processing. In embodiments, a bitvector implementation may be used.
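The text says only that "a bitvector implementation may be used." One common form is a bitmap index, sketched minimally below. Each distinct value of a dimension gets a bitvector marking the rows that hold it, so an equality-only WHERE clause reduces to ANDing bitvectors. Python integers serve as arbitrary-length bitvectors here. The function names and sample rows are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to a bitvector (Python int) of matching row ids."""
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        index[row[column]] |= 1 << row_id
    return dict(index)


def row_ids(bitvector):
    """Yield the positions of the set bits, lowest first."""
    row_id = 0
    while bitvector:
        if bitvector & 1:
            yield row_id
        bitvector >>= 1
        row_id += 1


def filtered_sum(rows, indexes, filters, measure):
    """Evaluate an equality-only WHERE clause by ANDing bitvectors, then sum `measure`."""
    mask = (1 << len(rows)) - 1  # start with every row selected
    for column, value in filters.items():
        mask &= indexes[column].get(value, 0)
    return sum(rows[i][measure] for i in row_ids(mask))


rows = [
    {"store": "Store One", "product": "cola", "sales": 10},
    {"store": "Store One", "product": "chips", "sales": 4},
    {"store": "Store Two", "product": "cola", "sales": 7},
    {"store": "Store One", "product": "cola", "sales": 5},
]
indexes = {col: build_bitmap_index(rows, col) for col in ("store", "product")}
# Roughly: SELECT SUM(sales) WHERE store = 'Store One' AND product = 'cola'
print(filtered_sum(rows, indexes, {"store": "Store One", "product": "cola"}, "sales"))
```

In a cluster, each slave node could hold such an index over its own partition. The bitvectors could then be WAH-compressed as described earlier.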
  • In embodiments, retrieval may be facilitated based at least in part on representing the data as compactly as possible. This compactness may enable the data to be kept in memory as an in-memory database. To facilitate this, data structures may be used that are small enough to be stored in memory. In an example, unlike in a relational database, multiple record types may be used to minimize the data size so that the data may be kept in memory within a given hardware implementation. Fitting the data within such a hardware implementation may have the additional advantage of reducing the expense of the system. In embodiments, the cluster system may fit on modestly sized hardware nodes with modest amounts of memory. This may keep the data near the CPU, so that file-based I/O need not be used: data that is in regular system memory may be accessed directly by the CPU.
  • In embodiments, a distribution hash key may be used to divide the data among the nodes.
  • In embodiments, the data may be partitioned by one dimension. In an example, an analyst may want to analyze a set of retail store data to see which products are selling, taking into account the revenue of the stores in which they are sold. Store One may have $10 M in revenue, Store Two $20 M, and Store Three $30 M. In this example, the analytic goal is to determine how well a brand of cola is selling relative to the size of the stores in which it is sold. To accomplish this, one may compute the total potential size (the combined revenue of the stores selling the product) and determine how well the product is selling relative to that whole. This may be difficult, however. One may have to look across multiple time periods in which the product sells in the same store many times, yet each store must be counted only once. The use of a distinct sum or count operator may be expensive, especially over millions of records. Instead, the data may be partitioned by "venue" so that each venue exists on only one of the processing nodes. If all of a venue's data are processed on a single node, there is a reduced risk of double-counting, as the data reside in only one location. On the other hand, if the data are distributed by venue and some other key, data for the same venue might be located in multiple places. By partitioning by venue and associating each venue with a single node, the per-node results for the venues may simply be added on the master node.
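The venue-partitioning argument can be made concrete with a minimal sketch that also serves as an example of a distribution hash key. It makes several assumptions. The hash key is `zlib.crc32` of the venue name, chosen because Python's built-in `str` hash is salted per process. The helper names are hypothetical. The "total potential size" is taken to be the combined revenue of the distinct stores selling the product.

```python
import zlib

# Store revenues from the example above.
STORE_REVENUE = {"Store One": 10_000_000, "Store Two": 20_000_000, "Store Three": 30_000_000}


def node_for(venue, num_nodes):
    """Distribution hash key: crc32 is stable across runs, unlike Python's salted str hash."""
    return zlib.crc32(venue.encode("utf-8")) % num_nodes


def partition_by_venue(records, num_nodes):
    nodes = [[] for _ in range(num_nodes)]
    for rec in records:
        nodes[node_for(rec["venue"], num_nodes)].append(rec)
    return nodes


def node_partial(records, product):
    """Each venue lives on exactly one node, so a local set counts it once."""
    venues = {r["venue"] for r in records if r["product"] == product}
    return sum(STORE_REVENUE[v] for v in venues)


def master_total(nodes, product):
    """No cross-node de-duplication is needed: partial results are simply added."""
    return sum(node_partial(records, product) for records in nodes)


sales = [
    {"venue": "Store One", "week": 1, "product": "cola"},
    {"venue": "Store One", "week": 2, "product": "cola"},
    {"venue": "Store Three", "week": 1, "product": "cola"},
    {"venue": "Store Three", "week": 2, "product": "cola"},
    {"venue": "Store Three", "week": 3, "product": "cola"},
    {"venue": "Store Two", "week": 1, "product": "chips"},
]
naive = sum(STORE_REVENUE[r["venue"]] for r in sales if r["product"] == "cola")
print(naive, master_total(partition_by_venue(sales, 3), "cola"))
```

The naive sum counts each store once per week in which cola sold ($110 M). The partitioned version yields $40 M (Store One plus Store Three) for any number of nodes, because no venue is split across nodes.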
  • In embodiments, partitioning may be done within each node by certain dimensions in order to access the data more efficiently along the dimensions that clients have used in the past. For example, data may be partitioned by venue and time, so that on any given processing node it is relatively easy to access particular sets of information based on the venue and time dimensions. In embodiments, partitioning may be used as an implicit indexing method. This may simplify retrieving the desired data without having to build an actual index.
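Partitioning as an implicit index can be sketched minimally as follows, assuming a per-node store keyed by (venue, period). A query that names both dimensions becomes a single dictionary lookup. A query that names one dimension compares only partition keys and never reads rows in non-matching partitions. `NodeStore` and the period labels are hypothetical.

```python
from collections import defaultdict


class NodeStore:
    """Hypothetical per-node store, sub-partitioned by (venue, period)."""

    def __init__(self, records):
        self.parts = defaultdict(list)
        for rec in records:
            self.parts[(rec["venue"], rec["period"])].append(rec)

    def scan(self, venue=None, period=None):
        """Yield records matching the given venue and/or period."""
        if venue is not None and period is not None:
            # Both dimensions known: one dictionary lookup finds the sub-partition.
            yield from self.parts.get((venue, period), [])
            return
        for (v, p), rows in self.parts.items():
            # Only partition keys are compared; rows in other partitions are never read.
            if (venue is None or v == venue) and (period is None or p == period):
                yield from rows


store = NodeStore([
    {"venue": "Store One", "period": "2007-01", "sales": 10},
    {"venue": "Store One", "period": "2007-02", "sales": 12},
    {"venue": "Store Three", "period": "2007-01", "sales": 30},
    {"venue": "Store One", "period": "2007-01", "sales": 5},
])
print(sum(r["sales"] for r in store.scan(venue="Store One", period="2007-01")))
```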
  • In embodiments, cluster processing may be dynamically configurable to accommodate increases and/or reductions in the number of nodes that are used.
  • In embodiments, cluster processing may have failover processes that may re-enable a cluster by having a node take on the function of another node that has failed.
  • In embodiments, a threading model may be used for inter-process communication between the nodes and the master. POSIX threads may be used in combination with MPI. In embodiments, multiple threads may run within one logical process, with separate physical processes running on different machines. A thread model may form the backbone of communication between processing elements. In an example, if there is a master and two slaves, there may be one physical process on the master and one on each slave node. An inbound SQL request may come into the master node and be intercepted by a thread that is using a socket. That thread may transmit the request to a master thread running in each slave process. Each slave's master thread creates threads that do the actual analysis. These threads, in turn, communicate with a listener thread on the master, which passes information to a collator thread on the master. A new series of threads may be created for each arriving query. Each listener thread may be designed to look for information from a specific slave source. When a query comes into the system, a new collator thread may be created and a new worker thread may be created in each slave node. Each slave node then sends information to its listener on the master, which passes it to the collator thread created for that query. The collator thread may then pass the result back through the socket to the ODBC client. In embodiments, this system may be scalable: for every slave that is created, the system may create a new listener thread for that slave.
  • In embodiments, inter-server communication may be done through MPI, while data server and client communication may be conducted using regular sockets. Each server may hold data (its partition of the information), so that each server knows which information it is responsible for. The collator may collate the partial results into a final result set.
  • In an example, an ODBC request may pass to a master node and to a master thread in the master node's process. The SQL query may be translated into a form the server can understand. Next, the master node may dispatch Query One to all nodes. The first node may retrieve Store One data, add up a partial result, and create a data tuple that it communicates back to the listener for that slave node. The second node may do the same and communicate with its own listener. Nodes holding only Store Two data (as opposed to Store One data) may do nothing. At the master node, the collator may add up the results from the two relevant listeners. Then, through socket communication, it may return the result to the client over ODBC. Once this is accomplished, the collator thread and the worker threads that performed the retrieval may be terminated. In embodiments, these transient threads may be associated with and used for a single query.
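The listener/collator/worker thread model in the two examples above can be sketched on a single machine. Python `threading` and `queue.Queue` objects stand in for POSIX threads, MPI channels, and the ODBC socket. This is an assumption for illustration, not the disclosed implementation. The sketch also simplifies in a few ways. Worker threads are spawned directly instead of through a master thread in each slave process. Query ids are arbitrary. Nodes with no matching rows report 0 rather than doing nothing, so the collator knows when every slave has answered.

```python
import queue
import threading


class Master:
    """Single-machine sketch: queues stand in for MPI channels and the ODBC socket."""

    def __init__(self, partitions):
        self.partitions = partitions                         # one partition per slave
        self.channels = [queue.Queue() for _ in partitions]  # slave -> master links
        self.collators = {}                                  # query id -> collator inbox
        self.lock = threading.Lock()
        # One long-lived listener thread per slave.
        self.listeners = [threading.Thread(target=self._listen, args=(ch,), daemon=True)
                          for ch in self.channels]
        for t in self.listeners:
            t.start()

    def _listen(self, channel):
        while True:
            msg = channel.get()
            if msg is None:  # shutdown sentinel
                return
            query_id, partial = msg
            with self.lock:
                inbox = self.collators[query_id]
            inbox.put(partial)  # hand off to that query's collator

    @staticmethod
    def _work(partition, store, channel, query_id):
        # Transient worker on a slave; reports 0 if its partition has no matching rows.
        partial = sum(sales for venue, sales in partition if venue == store)
        channel.put((query_id, partial))

    @staticmethod
    def _collate(inbox, expected, reply):
        reply.put(sum(inbox.get() for _ in range(expected)))

    def query(self, query_id, store):
        inbox, reply = queue.Queue(), queue.Queue()
        with self.lock:
            self.collators[query_id] = inbox  # register before any worker starts
        collator = threading.Thread(target=self._collate,
                                    args=(inbox, len(self.partitions), reply))
        workers = [threading.Thread(target=self._work, args=(p, store, ch, query_id))
                   for p, ch in zip(self.partitions, self.channels)]
        for t in [collator, *workers]:
            t.start()
        for t in [collator, *workers]:
            t.join()  # transient threads end with the query
        with self.lock:
            del self.collators[query_id]
        return reply.get()  # would be written back over the ODBC socket

    def shutdown(self):
        for ch in self.channels:
            ch.put(None)
        for t in self.listeners:
            t.join()


master = Master([
    [("Store One", 10), ("Store One", 5)],
    [("Store One", 7), ("Store Two", 3)],
    [("Store Two", 8)],
])
print(master.query(1, "Store One"), master.query(2, "Store Two"))
master.shutdown()
```

The collator is registered before any worker starts, and it is removed only after it has received every partial result. A listener therefore never looks up a query that is missing or already finished.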
  • In embodiments, a normalization scheme may be used in order to minimize the size of internal data structures.
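The normalization scheme is not specified. One common way to shrink repeated dimension values is dictionary encoding, sketched minimally below. Each distinct value is stored once, and every row holds only a small integer code. `DictionaryColumn` is a hypothetical name, and the 16-bit code limit is an assumption chosen for illustration.

```python
from array import array


class DictionaryColumn:
    """Hypothetical normalized column: each distinct value stored once, rows hold codes."""

    def __init__(self):
        self.values = []        # code -> value
        self.codes = {}         # value -> code
        self.rows = array("H")  # unsigned short codes (at least 2 bytes each)

    def append(self, value):
        code = self.codes.get(value)
        if code is None:
            code = len(self.values)
            if code > 0xFFFF:
                raise OverflowError("more distinct values than a 16-bit code allows")
            self.codes[value] = code
            self.values.append(value)
        self.rows.append(code)

    def __getitem__(self, row_id):
        return self.values[self.rows[row_id]]

    def __len__(self):
        return len(self.rows)


venues = DictionaryColumn()
for v in ["Store One", "Store Two", "Store One", "Store One"]:
    venues.append(v)
print(len(venues), venues.values, list(venues.rows), venues[2])
```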
  • An aspect of the present invention relates to cluster processing of an aggregated dataset. As explained in more detail with reference to FIG. 2, a logical process 4100 may be used for processing the aggregated dataset in clusters.
  • The present invention illustrates the processing of aggregated data. In FIG. 1, a fact data source 102 and a dimension data source 104 may be linked through a key. Fact data from multiple data sources 102 may be used as an aggregated data source for analysis on the desired dimensions. For example, an analyst might wish to view time-series trends in the dollar sales allotted by the customer to each store within a given product category.
  • In embodiments, referring to FIG. 2, systems and methods may involve using a platform as disclosed herein for applications described herein where the systems and methods involve receiving an aggregated dataset, wherein the aggregated dataset includes data from a panel data source, a fact data source, and a dimension data source that have been associated with a standard population database 4102. The process may also involve storing the aggregated data in a partition within a partitioned database, wherein the partition is associated with a data characteristic 4104. The process may also involve associating a master processing node with a plurality of slave nodes, wherein each of the plurality of slave nodes is associated with a partition of the partitioned database 4108. The process may also involve submitting an analytic query to the master processing node 4110. The process may also involve assigning analytic processing to at least one of the plurality of slave nodes by the master processing node, wherein the assignment is based at least in part on the association of the partition with the data characteristic 4112. The process may also involve reading the aggregated data from the partitioned database by the assigned slave node 4114. The process may also involve analyzing the aggregated data by the assigned slave node, wherein the analysis produces a result at each slave node 4118. The process may also involve combining the results from each of the plurality of slave nodes by the master processing node into a master result 4120 and reporting the master result to a user interface 4122.
  • The elements depicted in flow charts and block diagrams throughout the figures imply logical boundaries between the elements. However, according to software or hardware engineering practices, the depicted elements and the functions thereof may be implemented as parts of a monolithic software structure, as standalone software modules, or as modules that employ external routines, code, services, and so forth, or any combination of these, and all such implementations are within the scope of the present disclosure. Thus, while the foregoing drawings and description set forth functional aspects of the disclosed systems, no particular arrangement of software for implementing these functional aspects should be inferred from these descriptions unless explicitly stated or otherwise clear from the context.
  • Similarly, it will be appreciated that the various steps identified and described above may be varied, and that the order of steps may be adapted to particular applications of the techniques disclosed herein. All such variations and modifications are intended to fall within the scope of this disclosure. As such, the depiction and/or description of an order for various steps should not be understood to require a particular order of execution for those steps, unless required by a particular application, or explicitly stated or otherwise clear from the context.
  • The methods or processes described above, and steps thereof, may be realized in hardware, software, or any combination of these suitable for a particular application. The hardware may include a general-purpose computer and/or dedicated computing device. The processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable device, along with internal and/or external memory. The processes may also, or instead, be embodied in an application specific integrated circuit, a programmable gate array, programmable array logic, or any other device or combination of devices that may be configured to process electronic signals. It will further be appreciated that one or more of the processes may be realized as computer executable code created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software.
  • Thus, in one aspect, each method described above and combinations thereof may be embodied in computer executable code that, when executing on one or more computing devices, performs the steps thereof. In another aspect, the methods may be embodied in systems that perform the steps thereof, and may be distributed across devices in a number of ways, or all of the functionality may be integrated into a dedicated, standalone device or other hardware. In another aspect, means for performing the steps associated with the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of the present disclosure.
  • While the invention has been disclosed in connection with the preferred embodiments shown and described in detail, various modifications and improvements thereon will become readily apparent to those skilled in the art. Accordingly, the spirit and scope of the present invention is not to be limited by the foregoing examples, but is to be understood in the broadest sense allowable by law.
  • All documents referenced herein are hereby incorporated by reference.

Claims (1)

1. A method comprising:
receiving an aggregated dataset, wherein the aggregated dataset includes data from a panel data source, a fact data source, and a dimension data source that have been associated with a standard population database;
storing the aggregated data in a partition within a partitioned database, wherein the partition is associated with a data characteristic;
associating a master processing node with a plurality of slave nodes, wherein each of the plurality of slave nodes is associated with a partition of the partitioned database;
submitting an analytic query to the master processing node;
assigning analytic processing to at least one of the plurality of slave nodes by the master processing node, wherein the assignment is based at least in part on the association of the partition with the data characteristic; and
reading the aggregated data from the partitioned database by the assigned slave node;
analyzing the aggregated data by the assigned slave node, wherein the analysis produces a result at each slave node;
combining the results from each of the plurality of slave nodes by the master processing node into a master result; and
reporting the master result to a user interface.
US12/023,267 2007-01-26 2008-01-31 Cluster processing of an aggregated dataset Abandoned US20090006309A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/023,267 US20090006309A1 (en) 2007-01-26 2008-01-31 Cluster processing of an aggregated dataset
US13/028,022 US9466063B2 (en) 2007-01-26 2011-02-15 Cluster processing of an aggregated dataset

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US88680107P 2007-01-26 2007-01-26
US88679807P 2007-01-26 2007-01-26
US88757307P 2007-01-31 2007-01-31
US89150807P 2007-02-24 2007-02-24
US89193607P 2007-02-27 2007-02-27
US95289807P 2007-07-31 2007-07-31
US12/021,263 US20090006156A1 (en) 2007-01-26 2008-01-28 Associating a granting matrix with an analytic platform
US12/023,267 US20090006309A1 (en) 2007-01-26 2008-01-31 Cluster processing of an aggregated dataset

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/021,263 Continuation-In-Part US20090006156A1 (en) 2004-02-20 2008-01-28 Associating a granting matrix with an analytic platform

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/028,022 Continuation US9466063B2 (en) 2007-01-26 2011-02-15 Cluster processing of an aggregated dataset

Publications (1)

Publication Number Publication Date
US20090006309A1 (en) 2009-01-01

Family

ID=46331842

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/023,267 Abandoned US20090006309A1 (en) 2007-01-26 2008-01-31 Cluster processing of an aggregated dataset
US13/028,022 Active 2030-02-18 US9466063B2 (en) 2007-01-26 2011-02-15 Cluster processing of an aggregated dataset

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/028,022 Active 2030-02-18 US9466063B2 (en) 2007-01-26 2011-02-15 Cluster processing of an aggregated dataset

Country Status (1)

Country Link
US (2) US20090006309A1 (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080294583A1 (en) * 2007-01-26 2008-11-27 Herbert Dennis Hunt Similarity matching of a competitor's products
US20100030796A1 (en) * 2008-07-31 2010-02-04 Microsoft Corporation Efficient column based data encoding for large-scale data storage
US20100053616A1 (en) * 2008-09-03 2010-03-04 Macronix International Co., Ltd. Alignment mark and method of getting position reference for wafer
CN101916261A (en) * 2010-07-28 2010-12-15 北京播思软件技术有限公司 Data partitioning method for distributed parallel database system
US20110137924A1 (en) * 2007-01-26 2011-06-09 Herbert Dennis Hunt Cluster processing of an aggregated dataset
US20110179028A1 (en) * 2010-01-15 2011-07-21 Microsoft Corporation Aggregating data from a work queue
US20120066224A1 (en) * 2010-09-15 2012-03-15 International Business Machines Corporation Clustering of analytic functions
US8402027B1 (en) * 2010-02-11 2013-03-19 Disney Enterprises, Inc. System and method for hybrid hierarchical segmentation
US8543523B1 (en) 2012-06-01 2013-09-24 Rentrak Corporation Systems and methods for calibrating user and consumer data
CN103473374A (en) * 2013-09-29 2013-12-25 方正国际软件有限公司 Patient data partitioning system and patient data partitioning method
US8719266B2 (en) 2007-01-26 2014-05-06 Information Resources, Inc. Data perturbation of non-unique values
US8918388B1 (en) * 2010-02-26 2014-12-23 Turn Inc. Custom data warehouse on top of mapreduce
US20150134401A1 (en) * 2013-11-09 2015-05-14 Carsten Heuer In-memory end-to-end process of predictive analytics
US9262503B2 (en) 2007-01-26 2016-02-16 Information Resources, Inc. Similarity matching of products based on multiple classification schemes
US20160239797A1 (en) * 2012-05-01 2016-08-18 Hand Held Products, Inc. Dynamic scan context determination for asset reconciliation background
US9747175B2 (en) 2015-09-30 2017-08-29 Bank Of America Corporation System for aggregation and transformation of real-time data
US9952932B2 (en) * 2015-11-02 2018-04-24 Chicago Mercantile Exchange Inc. Clustered fault tolerance systems and methods using load-based failover
US10069891B2 (en) 2015-09-30 2018-09-04 Bank Of America Corporation Channel accessible single function micro service data collection process for light analytics
US20180374010A1 (en) * 2017-06-26 2018-12-27 International Business Machines Corporation Predicting early warning signals in project delivery
CN109242048A (en) * 2018-11-07 2019-01-18 电子科技大学 Sensation target distributed clustering method based on time series
US10262331B1 (en) 2016-01-29 2019-04-16 Videomining Corporation Cross-channel in-store shopper behavior analysis
CN109783515A (en) * 2019-01-25 2019-05-21 上海创景信息科技有限公司 More relation data tracks retroactive method and system based on database
US10354262B1 (en) 2016-06-02 2019-07-16 Videomining Corporation Brand-switching analysis using longitudinal tracking of at-shelf shopper behavior
US10387896B1 (en) 2016-04-27 2019-08-20 Videomining Corporation At-shelf brand strength tracking and decision analytics
US10491663B1 (en) * 2013-10-28 2019-11-26 Amazon Technologies, Inc. Heterogeneous computations on homogeneous input data
US10592287B2 (en) 2015-11-20 2020-03-17 Red Hat, Inc. API and user interface for MapReduce jobs
US10664457B2 (en) 2015-09-30 2020-05-26 Bank Of America Corporation System for real-time data structuring and storage
CN111241162A (en) * 2020-01-16 2020-06-05 同济大学 Method for analyzing travel behaviors of passengers under high-speed railway network formation condition and storage medium
US10755344B2 (en) 2015-09-30 2020-08-25 Bank Of America Corporation System framework processor for channel contacts
US10963893B1 (en) 2016-02-23 2021-03-30 Videomining Corporation Personalized decision tree based on in-store behavior analysis
US11354683B1 (en) 2015-12-30 2022-06-07 Videomining Corporation Method and system for creating anonymous shopper panel using multi-modal sensor fusion

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8732004B1 (en) 2004-09-22 2014-05-20 Experian Information Solutions, Inc. Automated analysis of data to generate prospect notifications based on trigger events
US7711636B2 (en) 2006-03-10 2010-05-04 Experian Information Solutions, Inc. Systems and methods for analyzing data
US20080270363A1 (en) * 2007-01-26 2008-10-30 Herbert Dennis Hunt Cluster processing of a core information matrix
US20080294372A1 (en) * 2007-01-26 2008-11-27 Herbert Dennis Hunt Projection facility within an analytic platform
US9690820B1 (en) 2007-09-27 2017-06-27 Experian Information Solutions, Inc. Database system for triggering event notifications based on updates to database records
US20100174638A1 (en) 2009-01-06 2010-07-08 ConsumerInfo.com Report existence monitoring
US9558519B1 (en) 2011-04-29 2017-01-31 Consumerinfo.Com, Inc. Exposing reporting cycle information
US8745066B2 (en) 2012-08-13 2014-06-03 Visier Solutions, Inc. Apparatus, systems and methods for dynamic on-demand context sensitive cluster analysis
WO2016149286A1 (en) * 2015-03-17 2016-09-22 Wong Matthew E System and method of providing a platform for enabling drill-down analysis of tabular data
CN107430633B (en) * 2015-11-03 2021-05-14 慧与发展有限责任合伙企业 System and method for data storage and computer readable medium
US11410230B1 (en) 2015-11-17 2022-08-09 Consumerinfo.Com, Inc. Realtime access and control of secure regulated data
US10757154B1 (en) 2015-11-24 2020-08-25 Experian Information Solutions, Inc. Real-time event-based notification system
US11270376B1 (en) * 2017-04-14 2022-03-08 Vantagescore Solutions, Llc Method and system for enhancing modeling for credit risk scores
US10880313B2 (en) 2018-09-05 2020-12-29 Consumerinfo.Com, Inc. Database platform for realtime updating of user data from third party sources
US11922497B1 (en) 2022-10-27 2024-03-05 Vantagescore Solutions, Llc System, method and apparatus for generating credit scores

Citations (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020065707A1 (en) * 2000-11-30 2002-05-30 Glacier Advertising Ltd. Automobile customer information generation and transmission system
US20020186818A1 (en) * 2000-08-29 2002-12-12 Osteonet, Inc. System and method for building and manipulating a centralized measurement value database
US20030028424A1 (en) * 2001-06-05 2003-02-06 Catalina Marketing International, Inc. Method and system for the direct delivery of product samples
US6523025B1 (en) * 1998-03-10 2003-02-18 Fujitsu Limited Document processing system and recording medium
US20030088565A1 (en) * 2001-10-15 2003-05-08 Insightful Corporation Method and system for mining large data sets
US20030149586A1 (en) * 2001-11-07 2003-08-07 Enkata Technologies Method and system for root cause analysis of structured and unstructured data
US6662192B1 (en) * 2000-03-29 2003-12-09 Bizrate.Com System and method for data collection, evaluation, information generation, and presentation
US20030228541A1 (en) * 2002-06-10 2003-12-11 International Business Machines Corporation Hybrid electronic mask
US20040098390A1 (en) * 2002-11-14 2004-05-20 David Bayliss Method for sorting and distributing data among a plurality of nodes
US20050187977A1 (en) * 2004-02-21 2005-08-25 Datallegro, Inc. Ultra-shared-nothing parallel database
US20050197883A1 (en) * 2004-03-08 2005-09-08 Sap Aktiengesellschaft Method and system for classifying retail products and services using characteristic-based grouping structures
US20050240085A1 (en) * 2004-01-16 2005-10-27 Basf Aktiengesellschaft Balanced care product customization
US20050267889A1 (en) * 2004-02-09 2005-12-01 Coremetrics, Inc. System and method of managing software product-line customizations
US20060009935A1 (en) * 2004-07-09 2006-01-12 Uzarski Donald R Knowledge-based condition survey inspection (KBCSI) framework and procedure
US20060080141A1 (en) * 2004-10-08 2006-04-13 Sentillion, Inc. Method and apparatus for processing a context change request in a CCOW environment
US20060164257A1 (en) * 2003-07-17 2006-07-27 Paolo Giubbini Method and system for remote updates of meters for metering the consumption of electricity, water or gas
US7107254B1 (en) * 2001-05-07 2006-09-12 Microsoft Corporation Probablistic models and methods for combining multiple content classifiers
US20060212413A1 (en) * 1999-04-28 2006-09-21 Pal Rujan Classification method and apparatus
US20060259358A1 (en) * 2005-05-16 2006-11-16 Hometown Info, Inc. Grocery scoring
US20070028111A1 (en) * 2005-07-01 2007-02-01 Fred Covely Methods and apparatus for authentication of content delivery and playback applications
US7191183B1 (en) * 2001-04-10 2007-03-13 Rgi Informatics, Llc Analytics and data warehousing infrastructure and services
US20070160320A1 (en) * 2006-01-06 2007-07-12 The Procter & Gamble Company Merchandising systems, methods of mechandising, and point-of-sale devices comprising micro-optics technology
US20070174290A1 (en) * 2006-01-19 2007-07-26 International Business Machines Corporation System and architecture for enterprise-scale, parallel data mining
US20070276676A1 (en) * 2006-05-23 2007-11-29 Christopher Hoenig Social information system
US20070294583A1 (en) * 2004-02-09 2007-12-20 Continental Teves Ag & Co. Ohg Device and Method for Analyzing Embedded Systems for Safety-Critical Computer Systems in Motor Vehicles
US20080059489A1 (en) * 2006-08-30 2008-03-06 International Business Machines Corporation Method for parallel query processing with non-dedicated, heterogeneous computers that is resilient to load bursts and node failures
US20080077469A1 (en) * 2006-09-27 2008-03-27 Philport Joseph C Method and system for determining media exposure
US20080228797A1 (en) * 2007-03-16 2008-09-18 Expanse Networks, Inc. Creation of Attribute Combination Databases Using Expanded Attribute Profiles
US7430532B2 (en) * 2001-06-12 2008-09-30 Blackrock Financial Management, Inc. System and method for trade entry
US20080256275A1 (en) * 2001-03-22 2008-10-16 Harm Peter Hofstee Multi-Chip Module With Third Dimension Interconnect
US20080270363A1 (en) * 2007-01-26 2008-10-30 Herbert Dennis Hunt Cluster processing of a core information matrix
US20080276232A1 (en) * 2003-09-25 2008-11-06 International Business Machines Corporation Processor Dedicated Code Handling in a Multi-Processor Environment
US20080288209A1 (en) * 2007-01-26 2008-11-20 Herbert Dennis Hunt Flexible projection facility within an analytic platform
US20080294372A1 (en) * 2007-01-26 2008-11-27 Herbert Dennis Hunt Projection facility within an analytic platform
US7469246B1 (en) * 2001-05-18 2008-12-23 Stratify, Inc. Method and system for classifying or clustering one item into multiple categories
US20090006156A1 (en) * 2007-01-26 2009-01-01 Herbert Dennis Hunt Associating a granting matrix with an analytic platform
US20090012971A1 (en) * 2007-01-26 2009-01-08 Herbert Dennis Hunt Similarity matching of products based on multiple classification schemes
US20090070131A1 (en) * 2005-12-29 2009-03-12 Lin Chen Standardized urban product
US20090132541A1 (en) * 2007-11-19 2009-05-21 Eric Lawrence Barsness Managing database resources used for optimizing query execution on a parallel computer system
US20090132609A1 (en) * 2007-11-16 2009-05-21 Eric Lawrence Barsness Real time data replication for query execution in a massively parallel computer
US7672877B1 (en) * 2004-02-26 2010-03-02 Yahoo! Inc. Product data classification

Family Cites Families (109)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3660605A (en) * 1970-04-15 1972-05-02 Int Standard Electric Corp Pulse code modulation switching system utilizing tasi
US4047157A (en) * 1974-02-01 1977-09-06 Digital Equipment Corporation Secondary storage facility for data processing
FR2618624B1 (en) 1987-07-24 1992-04-30 Michel Servel HYBRID TIME MULTIPLEX SWITCHING SYSTEM WITH OPTIMIZED BUFFER MEMORY
US5596331A (en) * 1988-05-13 1997-01-21 Lockheed Martin Corporation Real-time control sequencer with state matrix logic
DE69031538T2 (en) * 1990-02-26 1998-05-07 Digital Equipment Corp System and method for collecting software application events
US7082426B2 (en) 1993-06-18 2006-07-25 Cnet Networks, Inc. Content aggregation method and apparatus for an on-line product catalog
US5915036A (en) * 1994-08-29 1999-06-22 Eskofot A/S Method of estimation
US5912887A (en) * 1996-06-27 1999-06-15 Mciworldcom, Inc. System and method for implementing user-to-user data transfer services
US5832509A (en) 1996-12-17 1998-11-03 Chrysler Corporation Apparatus and method for adjusting data sizes in database operations
US20060068903A1 (en) * 1996-12-30 2006-03-30 Walker Jay S Methods and apparatus for facilitating accelerated play of a flat rate play gaming session
US5845285A (en) 1997-01-07 1998-12-01 Klein; Laurence C. Computer system and method of data analysis
US5978788A (en) 1997-04-14 1999-11-02 International Business Machines Corporation System and method for generating multi-representations of a data cube
US5999924A (en) * 1997-07-25 1999-12-07 Amazon.Com, Inc. Method and apparatus for producing sequenced queries
US6098033A (en) * 1997-07-31 2000-08-01 Microsoft Corporation Determining similarity between words
US6430545B1 (en) * 1998-03-05 2002-08-06 American Management Systems, Inc. Use of online analytical processing (OLAP) in a rules based decision management system
US6167405A (en) 1998-04-27 2000-12-26 Bull Hn Information Systems Inc. Method and apparatus for automatically populating a data warehouse system
US6212524B1 (en) * 1998-05-06 2001-04-03 E.Piphany, Inc. Method and apparatus for creating and populating a datamart
DE69942339D1 (en) * 1998-08-24 2010-06-17 Microunity Systems Eng SYSTEM WITH WIDE OPERAND ARCHITECTURE AND METHOD
US6957201B2 (en) 1998-11-17 2005-10-18 Sofresud S.A. Controlled capacity modeling tool
US6282544B1 (en) * 1999-05-24 2001-08-28 Computer Associates Think, Inc. Method and apparatus for populating multiple data marts in a single aggregation process
US6163774A (en) 1999-05-24 2000-12-19 Platinum Technology Ip, Inc. Method and apparatus for simplified and flexible selection of aggregate and cross product levels for a data warehouse
ATE246824T1 (en) 1999-07-21 2003-08-15 Torben Bach Pedersen METHODS AND SYSTEMS TO MAKE OLAP HIERARCHICES SUMMERIZABLE
WO2001016850A2 (en) 1999-08-31 2001-03-08 Accenture Llp A system, method and article of manufacture for organizing and managing transaction-related tax information
WO2001042937A1 (en) * 1999-12-06 2001-06-14 Henry Milan Modular stackable component system including universal serial bus hub
US20010034679A1 (en) * 2000-01-21 2001-10-25 Wrigley Mark L. Platform independent and non-invasive financial report mark-up
US6760720B1 (en) * 2000-02-25 2004-07-06 Pedestrian Concepts, Inc. Search-on-the-fly/sort-on-the-fly search engine for searching databases
US20020029207A1 (en) * 2000-02-28 2002-03-07 Hyperroll, Inc. Data aggregation server for managing a multi-dimensional database and database management system having data aggregation server integrated therein
US20040230461A1 (en) 2000-03-30 2004-11-18 Talib Iqbal A. Methods and systems for enabling efficient retrieval of data from data collections
US6456997B1 (en) * 2000-04-12 2002-09-24 International Business Machines Corporation System and method for dynamically generating an invisible hierarchy in a planning system
US7376573B1 (en) * 2000-04-28 2008-05-20 Accenture Llp Claims data analysis toolkit
US20020004390A1 (en) * 2000-05-05 2002-01-10 Cutaia Rory Joseph Method and system for managing telecommunications services and network interconnections
AU2001260380A1 (en) * 2000-05-22 2001-12-03 Nero Research Oy Method in monitoring the condition of machines
US7130853B2 (en) * 2000-06-06 2006-10-31 Fair Isaac Corporation Datamart including routines for extraction, accessing, analyzing, transformation of data into standardized format modeled on star schema
US7117215B1 (en) * 2001-06-07 2006-10-03 Informatica Corporation Method and apparatus for transporting data for data warehousing applications that incorporates analytic data interface
US7133858B1 (en) 2000-06-30 2006-11-07 Microsoft Corporation Partial pre-aggregation in relational database queries
EP1182577A1 (en) * 2000-08-18 2002-02-27 SER Systeme AG Produkte und Anwendungen der Datenverarbeitung Associative memory
US20020111870A1 (en) 2000-09-26 2002-08-15 I2 Technologies, Inc. System and method for identifying a product
US20090018891A1 (en) 2003-12-30 2009-01-15 Jeff Scott Eder Market value matrix
US6687693B2 (en) * 2000-12-18 2004-02-03 Ncr Corporation Architecture for distributed relational data mining systems
US20020116213A1 (en) * 2001-01-30 2002-08-22 Manugistics, Inc. System and method for viewing supply chain network metrics
US6643635B2 (en) 2001-03-15 2003-11-04 Sagemetrics Corporation Methods for dynamically accessing, processing, and presenting data acquired from disparate data sources
US20020161520A1 (en) * 2001-04-27 2002-10-31 International Business Machines Corporation Method to display allowed parking areas in a vehicle
US7133876B2 (en) * 2001-06-12 2006-11-07 The University Of Maryland College Park Dwarf cube architecture for reducing storage sizes of multidimensional data
US7177855B2 (en) * 2001-06-20 2007-02-13 Oracle International Corporation Compile-time optimizations of queries with SQL spreadsheet
US8086643B1 (en) 2001-06-28 2011-12-27 Jda Software Group, Inc. Translation between product classification schemas
US7043492B1 (en) * 2001-07-05 2006-05-09 Requisite Technology, Inc. Automated classification of items using classification mappings
US7499045B2 (en) 2001-08-01 2009-03-03 International Business Machines Corporation Graphics image generation
TW503479B (en) * 2001-08-16 2002-09-21 Mosel Vitelic Inc Exposure time determination method of wafer photolithography process
US7437344B2 (en) 2001-10-01 2008-10-14 L'oreal S.A. Use of artificial intelligence in providing beauty advice
US20030177055A1 (en) * 2002-03-14 2003-09-18 The Procter & Gamble Company Virtual test market system and method
US7107285B2 (en) * 2002-03-16 2006-09-12 Questerra Corporation Method, system, and program for an improved enterprise spatial system
US20030200134A1 (en) 2002-03-29 2003-10-23 Leonard Michael James System and method for large-scale automatic forecasting
US7483945B2 (en) * 2002-04-19 2009-01-27 Akamai Technologies, Inc. Method of, and system for, webcasting with just-in-time resource provisioning, automated telephone signal acquisition and streaming, and fully-automated event archival
US7458034B2 (en) 2002-05-08 2008-11-25 Kabushiki Kaisha Toshiba Data organization support method and program product therefor
US20020194145A1 (en) 2002-05-28 2002-12-19 Boucher Thomas Charles Method and system for financing a renewable energy generating facility
US7275022B2 (en) * 2002-07-19 2007-09-25 Microsoft Corporation System and method for analytically modeling data organized according to non-referred attributes
US20040030593A1 (en) * 2002-08-05 2004-02-12 Webster Adam W. On demand aircraft charter and air taxi booking and dispatch system
US7716167B2 (en) * 2002-12-18 2010-05-11 International Business Machines Corporation System and method for automatically building an OLAP model in a relational database
US7606699B2 (en) 2003-03-25 2009-10-20 Siebel Systems Inc. Modeling of forecasting and production planning data
US7392239B2 (en) * 2003-04-14 2008-06-24 International Business Machines Corporation System and method for querying XML streams
US7870148B2 (en) 2003-04-18 2011-01-11 Unica Corporation Scalable computation of data
US7386549B2 (en) 2003-04-30 2008-06-10 International Business Machines Corporation Integration of business process and use of fields in a master database
US8019659B2 (en) 2003-05-02 2011-09-13 Cbs Interactive Inc. Catalog taxonomy for storing product information and system and method using same
US7516157B2 (en) 2003-05-08 2009-04-07 Microsoft Corporation Relational directory
US7239989B2 (en) * 2003-07-18 2007-07-03 Oracle International Corporation Within-distance query pruning in an R-tree index
US20050043097A1 (en) * 2003-08-21 2005-02-24 Spidermonk Entertainment, Llc Interrelated game and information portals provided within the context of an encompassing virtual world
US7756907B2 (en) * 2003-09-16 2010-07-13 The Board Of Trustees Of The Leland Stanford Jr. University Computer systems and methods for visualizing data
US7415405B2 (en) * 2003-09-18 2008-08-19 International Business Machines Corporation Database script translation tool
US7664795B2 (en) * 2003-09-26 2010-02-16 Microsoft Corporation Apparatus and method for database migration
US7349919B2 (en) 2003-11-21 2008-03-25 International Business Machines Corporation Computerized method, system and program product for generating a data mining model
WO2005077024A2 (en) 2004-02-06 2005-08-25 Test Advantage, Inc. Methods and apparatus for data analysis
US7870039B1 (en) * 2004-02-27 2011-01-11 Yahoo! Inc. Automatic product categorization
US20050216512A1 (en) * 2004-03-26 2005-09-29 Rahav Dor Method of accessing a work of art, a product, or other tangible or intangible objects without knowing the title or name thereof using fractional sampling of the work of art or object
US20050246307A1 (en) 2004-03-26 2005-11-03 Datamat Systems Research, Inc. Computerized modeling method and a computer program product employing a hybrid Bayesian decision tree for classification
US20050251513A1 (en) 2004-04-05 2005-11-10 Rene Tenazas Techniques for correlated searching through disparate data and content repositories
EP1769433A4 (en) * 2004-04-26 2009-05-06 Right90 Inc Forecasting data with real-time updates
US9684703B2 (en) 2004-04-29 2017-06-20 Precisionpoint Software Limited Method and apparatus for automatically creating a data warehouse and OLAP cube
US20050246324A1 (en) * 2004-04-30 2005-11-03 Nokia Inc. System and associated device, method, and computer program product for performing metadata-based searches
US7490009B2 (en) * 2004-08-03 2009-02-10 Fei Company Method and system for spectroscopic data analysis
US7698170B1 (en) 2004-08-05 2010-04-13 Versata Development Group, Inc. Retail recommendation domain model
US7360697B1 (en) * 2004-11-18 2008-04-22 Vendavo, Inc. Methods and systems for making pricing decisions in a price management system
US7800613B2 (en) * 2004-12-02 2010-09-21 Tableau Software, Inc. Computer systems and methods for visualizing data with generation of marks
US7587410B2 (en) * 2005-03-22 2009-09-08 Microsoft Corporation Dynamic cube services
US8176002B2 (en) * 2005-03-24 2012-05-08 Microsoft Corporation Method and system for user alteration of the configuration of a data warehouse
WO2006116573A2 (en) * 2005-04-23 2006-11-02 Musa John A Enhanced business and inventory management systems
FI20050779L (en) * 2005-07-22 2007-01-23 Analyse Solutions Finland Oy Information management method and system
US20070061185A1 (en) * 2005-09-09 2007-03-15 International Business Machines Corporation Method, system, and computer program product for implementing availability messaging services
WO2007034482A2 (en) 2005-09-20 2007-03-29 Sterna Technologies (2005) Ltd. A method and system for managing data and organizational constraints
US7966315B2 (en) * 2005-11-15 2011-06-21 Vmware, Inc. Multi-query optimization
GB0524017D0 (en) * 2005-11-24 2006-01-04 Ibm Generation of a categorisation scheme
US7870031B2 (en) * 2005-12-22 2011-01-11 Ebay Inc. Suggested item category systems and methods
US7885958B2 (en) * 2006-02-27 2011-02-08 International Business Machines Corporation Method, apparatus and computer program product for organizing hierarchical information
US8009673B2 (en) * 2006-03-13 2011-08-30 Freescale Semiconductor, Inc. Method and device for processing frames
US7418453B2 (en) 2006-06-15 2008-08-26 International Business Machines Corporation Updating a data warehouse schema based on changes in an observation model
US8671091B2 (en) * 2006-08-02 2014-03-11 Hewlett-Packard Development Company, L.P. Optimizing snowflake schema queries
US7620526B2 (en) * 2006-10-25 2009-11-17 Zeugma Systems Inc. Technique for accessing a database of serializable objects using field values corresponding to fields of an object marked with the same index value
US7698252B2 (en) * 2006-10-27 2010-04-13 Cerner Innovation, Inc. Query restriction for timely and efficient paging
US8112624B2 (en) * 2006-11-29 2012-02-07 Red Hat, Inc. Method and system for certificate revocation list compression
US8010410B2 (en) * 2006-12-29 2011-08-30 Ebay Inc. Method and system for listing categorization
US9390158B2 (en) 2007-01-26 2016-07-12 Information Resources, Inc. Dimensional compression using an analytic platform
US20090006309A1 (en) * 2007-01-26 2009-01-01 Herbert Dennis Hunt Cluster processing of an aggregated dataset
US20080288522A1 (en) 2007-01-26 2008-11-20 Herbert Dennis Hunt Creating and storing a data field alteration datum using an analytic platform
US20090006788A1 (en) * 2007-01-26 2009-01-01 Herbert Dennis Hunt Associating a flexible data hierarchy with an availability condition in a granting matrix
US8160984B2 (en) * 2007-01-26 2012-04-17 Symphonyiri Group, Inc. Similarity matching of a competitor's products
EP2111593A2 (en) * 2007-01-26 2009-10-28 Information Resources, Inc. Analytic platform
US20080294996A1 (en) 2007-01-31 2008-11-27 Herbert Dennis Hunt Customized retailer portal within an analytic platform
US8583613B2 (en) * 2007-08-21 2013-11-12 Oracle International Corporation On demand data conversion
US7941398B2 (en) * 2007-09-26 2011-05-10 Pentaho Corporation Autopropagation of business intelligence metadata
US8301583B2 (en) * 2008-10-09 2012-10-30 International Business Machines Corporation Automated data conversion and route tracking in distributed databases

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8160984B2 (en) 2007-01-26 2012-04-17 Symphonyiri Group, Inc. Similarity matching of a competitor's products
US9466063B2 (en) 2007-01-26 2016-10-11 Information Resources, Inc. Cluster processing of an aggregated dataset
US9262503B2 (en) 2007-01-26 2016-02-16 Information Resources, Inc. Similarity matching of products based on multiple classification schemes
US8719266B2 (en) 2007-01-26 2014-05-06 Information Resources, Inc. Data perturbation of non-unique values
US20110137924A1 (en) * 2007-01-26 2011-06-09 Herbert Dennis Hunt Cluster processing of an aggregated dataset
US8489532B2 (en) 2007-01-26 2013-07-16 Information Resources, Inc. Similarity matching of a competitor's products
US20080294583A1 (en) * 2007-01-26 2008-11-27 Herbert Dennis Hunt Similarity matching of a competitor's products
US8452737B2 (en) 2008-07-31 2013-05-28 Microsoft Corporation Efficient column based data encoding for large-scale data storage
US8108361B2 (en) * 2008-07-31 2012-01-31 Microsoft Corporation Efficient column based data encoding for large-scale data storage
US20100030796A1 (en) * 2008-07-31 2010-02-04 Microsoft Corporation Efficient column based data encoding for large-scale data storage
US20100053616A1 (en) * 2008-09-03 2010-03-04 Macronix International Co., Ltd. Alignment mark and method of getting position reference for wafer
US20110179028A1 (en) * 2010-01-15 2011-07-21 Microsoft Corporation Aggregating data from a work queue
US8645377B2 (en) * 2010-01-15 2014-02-04 Microsoft Corporation Aggregating data from a work queue
US8402027B1 (en) * 2010-02-11 2013-03-19 Disney Enterprises, Inc. System and method for hybrid hierarchical segmentation
US8918388B1 (en) * 2010-02-26 2014-12-23 Turn Inc. Custom data warehouse on top of mapreduce
WO2012012968A1 (en) * 2010-07-28 2012-02-02 北京播思软件技术有限公司 Data partitioning method for distributed parallel database system
CN101916261A (en) * 2010-07-28 2010-12-15 北京播思软件技术有限公司 Data partitioning method for distributed parallel database system
US20120066224A1 (en) * 2010-09-15 2012-03-15 International Business Machines Corporation Clustering of analytic functions
US8560544B2 (en) * 2010-09-15 2013-10-15 International Business Machines Corporation Clustering of analytic functions
US20160239797A1 (en) * 2012-05-01 2016-08-18 Hand Held Products, Inc. Dynamic scan context determination for asset reconciliation background
US9934486B2 (en) * 2012-05-01 2018-04-03 Hand Held Products, Inc. Dynamic scan context determination for asset reconciliation background
US11004094B2 (en) 2012-06-01 2021-05-11 Comscore, Inc. Systems and methods for calibrating user and consumer data
US9519910B2 (en) 2012-06-01 2016-12-13 Rentrak Corporation System and methods for calibrating user and consumer data
US8543523B1 (en) 2012-06-01 2013-09-24 Rentrak Corporation Systems and methods for calibrating user and consumer data
CN103473374A (en) * 2013-09-29 2013-12-25 方正国际软件有限公司 Patient data partitioning system and patient data partitioning method
US10491663B1 (en) * 2013-10-28 2019-11-26 Amazon Technologies, Inc. Heterogeneous computations on homogeneous input data
US20150134401A1 (en) * 2013-11-09 2015-05-14 Carsten Heuer In-memory end-to-end process of predictive analytics
US10664457B2 (en) 2015-09-30 2020-05-26 Bank Of America Corporation System for real-time data structuring and storage
US10069891B2 (en) 2015-09-30 2018-09-04 Bank Of America Corporation Channel accessible single function micro service data collection process for light analytics
US10755344B2 (en) 2015-09-30 2020-08-25 Bank Of America Corporation System framework processor for channel contacts
US9747175B2 (en) 2015-09-30 2017-08-29 Bank Of America Corporation System for aggregation and transformation of real-time data
US20180210791A1 (en) * 2015-11-02 2018-07-26 Chicago Mercantile Exchange Inc. Clustered fault tolerance systems and methods using load-based failover
US9952932B2 (en) * 2015-11-02 2018-04-24 Chicago Mercantile Exchange Inc. Clustered fault tolerance systems and methods using load-based failover
US10592345B2 (en) * 2015-11-02 2020-03-17 Chicago Mercantile Exchange Inc. Clustered fault tolerance systems and methods using load-based failover
US10592287B2 (en) 2015-11-20 2020-03-17 Red Hat, Inc. API and user interface for MapReduce jobs
US11354683B1 (en) 2015-12-30 2022-06-07 Videomining Corporation Method and system for creating anonymous shopper panel using multi-modal sensor fusion
US10262331B1 (en) 2016-01-29 2019-04-16 Videomining Corporation Cross-channel in-store shopper behavior analysis
US10963893B1 (en) 2016-02-23 2021-03-30 Videomining Corporation Personalized decision tree based on in-store behavior analysis
US10387896B1 (en) 2016-04-27 2019-08-20 Videomining Corporation At-shelf brand strength tracking and decision analytics
US10354262B1 (en) 2016-06-02 2019-07-16 Videomining Corporation Brand-switching analysis using longitudinal tracking of at-shelf shopper behavior
US20180374010A1 (en) * 2017-06-26 2018-12-27 International Business Machines Corporation Predicting early warning signals in project delivery
CN109242048A (en) * 2018-11-07 2019-01-18 电子科技大学 Sensation target distributed clustering method based on time series
CN109783515A (en) * 2019-01-25 2019-05-21 上海创景信息科技有限公司 More relation data tracks retroactive method and system based on database
CN111241162A (en) * 2020-01-16 2020-06-05 同济大学 Method for analyzing travel behaviors of passengers under high-speed railway network formation condition and storage medium

Also Published As

Publication number Publication date
US9466063B2 (en) 2016-10-11
US20110137924A1 (en) 2011-06-09

Similar Documents

Publication Publication Date Title
US9466063B2 (en) Cluster processing of an aggregated dataset
US20080288522A1 (en) Creating and storing a data field alteration datum using an analytic platform
US8489532B2 (en) Similarity matching of a competitor's products
US9262503B2 (en) Similarity matching of products based on multiple classification schemes
US7949639B2 (en) Attribute segments and data table bias reduction
US9390158B2 (en) Dimensional compression using an analytic platform
US20080288209A1 (en) Flexible projection facility within an analytic platform
US20090006788A1 (en) Associating a flexible data hierarchy with an availability condition in a granting matrix
US20080294372A1 (en) Projection facility within an analytic platform
Ponniah Data warehousing fundamentals for IT professionals
US8041760B2 (en) Service oriented architecture for a loading function in a data integration platform
US7814470B2 (en) Multiple service bindings for a real time data integration service
US8060553B2 (en) Service oriented architecture for a transformation function in a data integration platform
US7814142B2 (en) User interface service for a services oriented architecture in a data integration platform
US20080294996A1 (en) Customized retailer portal within an analytic platform
US20060235714A1 (en) Enabling flexible scalable delivery of on demand datasets
US20080270363A1 (en) Cluster processing of a core information matrix
US20060069717A1 (en) Security service for a services oriented architecture in a data integration platform
US20050240592A1 (en) Real time data integration for supply chain management
US20050262193A1 (en) Logging service for a services oriented architecture in a data integration platform
US20050223109A1 (en) Data integration through a services oriented architecture
US20050234969A1 (en) Services oriented architecture for handling metadata in a data integration platform
US20050235274A1 (en) Real time data integration for inventory management
US20050262189A1 (en) Server-side application programming interface for a real time data integration service
US20050222931A1 (en) Real time data integration services for financial information data integration

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: SYMPHONYIRI GROUP, INC., ILLINOIS

Free format text: CHANGE OF NAME;ASSIGNOR:INFORMATION RESOURCES, INC.;REEL/FRAME:026787/0573

Effective date: 20100525

Owner name: INFORMATION RESOURCES, INC., ILLINOIS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HUNT, HERBERT D.;WEST, JOHN R.;REEL/FRAME:026784/0657

Effective date: 20080702

Owner name: SYMPHONYIRI GROUP, INC., ILLINOIS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GIBBS, MARSHALL ASHBY, JR.;GRIGLIONE, BRADLEY MICHAEL;HUDSON, GREGORY DAVID NEIL;AND OTHERS;SIGNING DATES FROM 20110524 TO 20110709;REEL/FRAME:026784/0806