• Netezza Architecture

    The Netezza Performance Server system consists of a closely coupled server with many parallel Snippet Processing Units, each with its own disk and a streaming database logic chip that performs fast pattern matching.

    Brajesh

    The Netezza Performance Server (NPS) system's architecture, depicted in the diagram below, is a two-tiered system designed to handle very large queries from multiple users. The first tier is a high-performance Linux symmetric multiprocessing (SMP) host. The host compiles queries received from business intelligence (BI) applications and generates query execution plans. It then divides each query into a sequence of sub-tasks, or snippets, which can be executed in parallel, and distributes the snippets to the second tier for execution. The host returns the final results to the requesting application, so the system appears to applications as a traditional database server and keeps the programming advantages of one.

    Figure 1: Netezza Architecture

    The second tier consists of dozens to hundreds or thousands of Snippet Processing Units (SPUs) operating in parallel. Each SPU is an intelligent query processing and storage node, and consists of a powerful commodity processor, dedicated memory, a disk drive and a field-programmable disk controller with hard-wired logic to manage data flows and process queries at the disk level, as depicted in Figure 2. The massively parallel, shared-nothing SPU blades provide the performance advantages of massively parallel processors.

    Nearly all query processing is done at the SPU level, with each SPU operating on its portion of the database. All operations that easily lend themselves to parallel processing (including record operations, parsing, filtering, projecting, interlocking and logging) are performed by the SPU nodes, which significantly reduces the amount of data moved within the system. Operations on sets of intermediate results, such as sorts, joins and aggregates, are executed primarily on the SPUs, but can also be done on the host, depending on the processing cost of that operation.
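
    The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `spu_slices` data, the `run_snippet` and `host_execute` names, and the thread pool standing in for parallel SPUs are all hypothetical and do not reflect the actual NPS software.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the portion of a table held by one SPU.
spu_slices = [
    [{"region": "east", "amount": 120}, {"region": "west", "amount": 40}],
    [{"region": "east", "amount": 75}, {"region": "east", "amount": 300}],
    [{"region": "west", "amount": 210}],
]


def run_snippet(rows):
    """Snippet run on one SPU: filter locally, return a partial aggregate."""
    matching = [r["amount"] for r in rows if r["region"] == "east"]
    return sum(matching), len(matching)


def host_execute(slices):
    """Host side: send the snippet to every SPU, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(run_snippet, slices))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


total, count = host_execute(spu_slices)
print(f"east: {count} rows, total amount {total}")
```

    Only one small tuple per SPU travels back to the host in this sketch, which mirrors the point that parallel-friendly work stays on the SPUs and the host assembles final results.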

    Figure 2: Netezza SPU

    Intelligent Query Streaming is performed on each SPU by a Field-Programmable Gate Array (FPGA) chip that functions as the disk controller, but which is also capable of basic processing as data is read from the disk. The system is able to run critical database query functions such as parsing, filtering and projecting at full disk reading speed, while maintaining full ACID (Atomicity, Consistency, Isolation, and Durability) transactional operations of the database. Data flows from disk to memory in a single laminar stream, rather than as a series of disjointed steps that would require materializing partial results.
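
    The idea of filtering and projecting records while they stream off the disk, without materializing partial results, can be approximated in software with Python generators. This is a loose analogy only: the real work is done in FPGA hardware, and the `read_disk`, `restrict` and `project` names and the sample table are hypothetical.

```python
def read_disk(rows):
    """Stand-in for the disk: yields one record at a time."""
    for row in rows:
        yield row


def restrict(stream, predicate):
    """Pass through only the records that satisfy the predicate."""
    for row in stream:
        if predicate(row):
            yield row


def project(stream, columns):
    """Keep only the requested columns of each record."""
    for row in stream:
        yield {c: row[c] for c in columns}


# Hypothetical table contents.
table = [
    {"id": 1, "state": "CA", "sales": 500, "notes": "n/a"},
    {"id": 2, "state": "NY", "sales": 300, "notes": "n/a"},
    {"id": 3, "state": "CA", "sales": 50, "notes": "n/a"},
]

# Each record flows through all stages before the next one is read;
# no intermediate list is built between the stages.
pipeline = project(
    restrict(read_disk(table), lambda r: r["state"] == "CA"),
    ["id", "sales"],
)
result = list(pipeline)
print(result)
```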

    To achieve high performance, the storage interconnection, which is a bottleneck with traditional systems, is eliminated by directly attaching the disks so that data can stream straight into the FPGA for initial query filtering. Then, to further reduce the workload on the central server, the intermediate query tasks are performed in parallel on the SPUs. The internal network traffic has been reduced by approximately two orders of magnitude by only using the system-wide, gigabit Ethernet network to transmit intermediate results, rather than the entire collection of raw data. As a result, the I/O bus and memory bus on the host computer are only used for assembling final results.

    In general-purpose systems, I/O bottlenecks are commonplace as the system is scaled to accommodate complex queries. On the other hand, in this architecture, additional arrays of snippet processing units can be added to the NPS system without impacting I/O performance because query processing involves just a minute fraction of the data traffic associated with traditional systems, and because storage and processing are tightly coupled into a single unit. The autonomy of the SPUs creates further conditions for a highly scalable system, allowing SPUs to be added without worrying about coordination with other units. As a result, growing data volumes can be managed and accommodated with orderly, predictable investments.

    HOW IT WORKS

    As data is loaded into the Appliance, the system spreads the rows of each table across the 108 SPUs. The hard disk is typically the slowest part of a computer, so instead of relying on a single disk, the Appliance has 108 of them working at once, each loading a small piece of the table. This is how Netezza achieves a load rate of 500 gigabytes per hour.
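
    A minimal sketch of spreading rows across SPUs, together with the per-SPU share of the quoted load rate, is shown below. The text does not say how rows are assigned, so hashing a key with CRC32 is an assumption made purely for illustration, as are the `spu_for` helper and the sample rows.

```python
import zlib
from collections import Counter

NUM_SPUS = 108  # SPU count quoted in the text


def spu_for(key):
    """Map a key value to an SPU number (CRC32 is an assumed stand-in)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SPUS


# Hypothetical table: 10,000 rows keyed by an integer customer_id.
rows = [{"customer_id": i, "amount": i % 97} for i in range(10_000)]

# Count how many rows land on each SPU.
rows_per_spu = Counter(spu_for(row["customer_id"]) for row in rows)

# Back-of-the-envelope share of the quoted 500 GB/hour load rate per SPU.
LOAD_GB_PER_HOUR = 500
gb_per_hour_per_spu = LOAD_GB_PER_HOUR / NUM_SPUS          # about 4.6 GB/hour
mb_per_second_per_spu = gb_per_hour_per_spu * 1000 / 3600  # about 1.3 MB/s

print(f"SPUs holding rows: {len(rows_per_spu)} of {NUM_SPUS}")
print(f"Per-SPU load rate: {gb_per_hour_per_spu:.2f} GB/h "
      f"= {mb_per_second_per_spu:.2f} MB/s")
```

    Under these assumptions each SPU only needs to sustain roughly 4.6 GB per hour to reach the quoted aggregate rate, which is why spreading the load over many disks matters more than the speed of any one disk.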

    After a piece of the table is loaded and stored on each SPU (a computer on an integrated circuit card), each column is analyzed to gather descriptive statistics such as minimum and maximum values. These values are stored on each of the 108 SPUs and take the place of indexes, which take time to create and update and consume additional space. Imagine your environment without the need to create indexes.

    When it is time to query the data, a master computer inside the Appliance queries the SPUs to see which ones contain the required data. Only the SPUs that contain relevant data return information, so less data moves across the network to the Business Intelligence/Analytics Server.
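
    The way stored minimum and maximum values can replace an index lookup is sketched below. It assumes, for illustration, that each SPU's piece of the table happens to cover a distinct range of dates; the `SpuSlice` class, the `may_contain` check and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SpuSlice:
    spu_id: int
    order_dates: list  # dates as integers, e.g. 20240115

    def __post_init__(self):
        # Descriptive statistics gathered once, after the data is loaded.
        self.min_date = min(self.order_dates)
        self.max_date = max(self.order_dates)


def may_contain(piece, lo, hi):
    """True if the slice's [min, max] range overlaps the queried range."""
    return piece.max_date >= lo and piece.min_date <= hi


def query(slices, lo, hi):
    """Scan only the slices whose statistics say they might hold matches."""
    scanned, hits = [], []
    for piece in slices:
        if not may_contain(piece, lo, hi):
            continue  # skipped without reading any of its rows
        scanned.append(piece.spu_id)
        hits.extend(d for d in piece.order_dates if lo <= d <= hi)
    return scanned, hits


slices = [
    SpuSlice(0, [20240101, 20240115, 20240131]),
    SpuSlice(1, [20240201, 20240214]),
    SpuSlice(2, [20240301, 20240320]),
]

scanned, hits = query(slices, 20240210, 20240305)
print("SPUs scanned:", scanned)
print("matching dates:", hits)
```

    How much work this saves depends on how the values are clustered: if every slice spans the full range of dates, the minimum and maximum values rule nothing out.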

    For joining data, it gets even better. The Appliance distributes the data in multiple tables across multiple SPUs by a key, so each SPU contains partial data for multiple tables. Each SPU joins its parts of the tables locally and returns only the local result. All of the "local results" are assembled internally in the cabinet and then returned to the Business Intelligence/Analytics Server as a query result. This methodology also contributes to the speed story.
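
    A local join of this kind can be sketched as follows. The sketch assumes both tables are distributed on the same key with the same function, so matching rows always land on the same SPU; the CRC32 hash, the four-SPU layout, the helper names and the sample tables are all hypothetical.

```python
import zlib

NUM_SPUS = 4  # small hypothetical system for readability


def spu_for(key):
    """Assumed distribution function shared by both tables."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_SPUS


def distribute(rows, key_column):
    """Split a table into per-SPU slices by hashing the key column."""
    slices = [[] for _ in range(NUM_SPUS)]
    for row in rows:
        slices[spu_for(row[key_column])].append(row)
    return slices


def local_join(customers, orders):
    """Join performed on one SPU, using only the rows it holds."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {"name": by_id[o["customer_id"]]["name"], "total": o["total"]}
        for o in orders
        if o["customer_id"] in by_id
    ]


customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Linus"},
]
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 2, "total": 10},
    {"customer_id": 1, "total": 45},
    {"customer_id": 9, "total": 99},  # no matching customer
]

customer_slices = distribute(customers, "customer_id")
order_slices = distribute(orders, "customer_id")

# Each SPU joins its own slices; the local results are then concatenated.
result = []
for cust_part, order_part in zip(customer_slices, order_slices):
    result.extend(local_join(cust_part, order_part))

print(sorted((r["name"], r["total"]) for r in result))
```

    Because a given key hashes to the same SPU in both tables, no row has to cross the network to find its join partner in this sketch; only the joined rows are collected.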

    The key to all of this is "less movement of data across the network". The Appliance returns only the required data to the Business Intelligence/Analytics server across the organization's 100/1000 Mbps network. This is very different from traditional processing, where the Business Intelligence/Analytics software typically extracts most of the data from the database and processes it on its own server. With the Appliance, the database does the work of determining which data is needed and returns a smaller result set to the Business Intelligence/Analytics server.

    The Netezza Data Warehouse Appliance looks like a refrigerator that rolls into the Data Center. Each rack holds 12.5 Terabytes of storage. For more storage, just chain a bunch of these "refrigerator-looking" racks together. We believe it is simpler because all of the pieces are contained in a single rack. Many vendors on the market require a complement of software, hardware, database technology and networking to make their solutions work.