Row Store Vs Column Store
A column-oriented DBMS (or columnar database) is a database management system (DBMS) that stores data tables as columns rather than as rows. Both columnar and row databases use traditional database languages like SQL to load data and perform queries. Both row and columnar databases can become the backbone in a system to serve data for common ETL and data visualization tools. However, by storing data in columns rather than rows, the database can more precisely access the data it needs to answer a query rather than scanning and discarding unwanted data in rows. Query performance is often increased as a result, particularly in very large data sets.
Basic Understanding of Row Store Vs Column Store
A database table is conceptually a two dimensional structure stored in rows and columns
Because computer memory is structured linearly, there are two options for the sequences of cell values stored in contiguous memory locations. Row storage stores data in a sequence of rows while the column storage stores data in sequence of columns
Let us Consider the Below Table and see how a table is stored
Row Store Column Store
When the data is stored in row store the each row is stored in continuous memory locations in a perfect sequence. When I need to fetch data from third column, it should go though the entire table and fetch the record. It is very expensive if the table is too large.
Now let us consider the second case i.e. Column Store. Each column in the table acts as an individual table, and gets stored separately. When each column acts as an individual table, each of these individual mini-tables can be indexed (sorted) and compressed ( Duplicates will be merged). This makes sure that each of these tables only contains a unique entry.
To realize this, imagine a table with a million rows. Most of the columns would have only a few hundred or thousand at most of unique values. Compression makes sure you save disk space and indexing makes sure you find things faster.
-> In Case of OLTP Column Store is slow, since writing to a row requires seeking to each column for an insert or multi-column update. Whereas Write operations on a row-oriented storage involve writing to a contiguous portion of the storage so it is much faster.
Advantages of column-based tables:
Faster Data Access:
Only affected columns have to be read during the selection process of a query. Any of the columns can serve as an index.Better Compression:
Columnar data storage allows highly efficient compression because the majority of the columns contain only few distinct values (compared to number of rows).
Better parallel Processing:
In a column store, data is already vertically partitioned. This means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core.
Advantages and disadvantages of row-based tables:
Row based tables have advantages in the following circumstances:
- The application needs to only process a single record at one time (many selects and/or updates of single records).
- The application typically needs to access a complete record (or row).
- Neither aggregations nor fast searching are required.
- The table has a small number of rows (e. g. configuration tables, system tables).
Row based tables have dis-advantages in the following circumstances:
- In case of analytic applications where aggregation are used and fast search and processing is required. In row based tables all data in a row has to be read even though the requirement may be to access data from a few columns.
Comments
Post a Comment