Written by: Juan Carlos Olamendy Turruellas

Introduction

In this article, I want to talk about a nice feature that comes with Oracle Database 12c Release 1. This feature enables storing columns, tables, partitions and materialized views in memory in a columnar format, rather than the typical row-based format.

Today it is very common to find OLTP systems being adapted to run analytical workloads to support real-time decision-making, instead of having separate data warehouses and data marts. Designing a database to support both transactional and analytical workloads without degrading performance is a really hard challenge. This is the typical read/write performance dichotomy and trade-off: if we want to support a high-speed read workload, we need to create indexes (which slow down the write workload) and de-normalize (which creates inconsistencies in the transactional data), and inversely if we want to support a high-speed write workload.

At the end of the day, OLTP systems and RDBMSs in general are optimized for a consistent and efficient write workload, such as recording transactions and serving row-oriented data. A row-based format enables quick access to all of the columns in a record, since all of the data for a given row is kept together, either in memory in the database buffer cache or on the storage medium. In an analytical workload, where we work with aggregations of data, we need a different data model approach: one that reads few columns but spans a huge number of rows. That's why a columnar format is a better approach for dealing with analytical workloads.

The great advantage of the in-memory column store in Oracle 12c is that the same database has the ability to run analytical workloads together with the transactional workload, without any database schema or application change or re-design. So, both workloads can be served by the same Oracle database instance.

So, what's the In-Memory Column Store feature?
This is a new memory area in the System Global Area (SGA) that keeps a copy of the data in columnar format. It doesn't replace the buffer cache; instead, it's a complement, so the data is represented in memory in both row-based and columnar formats. The relation between the in-memory area and the SGA is depicted in figure 01.

Figure 01

The logic of the in-memory area is as follows: When data is requested by a transactional workload (row-oriented read/write operations), it's loaded from storage into the buffer cache and served from there. When data is requested by an analytical workload (read-only aggregation operations), it's loaded from storage into the in-memory area and served from there. When a transaction (a write: INSERT, UPDATE, DELETE) is committed, the changes are synchronized in both the buffer cache and the in-memory area.

Demo Time

We can control the size of the in-memory area using the initialization parameter INMEMORY_SIZE (default 0), regardless of whether we're using AMM (set via MEMORY_TARGET) or ASMM (set via SGA_TARGET). The current size of the in-memory area is visible in the V$SGA view. As a static pool, any change to the INMEMORY_SIZE parameter will not take effect until the database instance is restarted. Let's suppose we want to target 3GB for the SGA and reserve 2GB of it for the in-memory area; we can do it as shown in listing 01.

SQL> ALTER SYSTEM SET SGA_TARGET=3G SCOPE=SPFILE;
SQL> ALTER SYSTEM SET INMEMORY_SIZE=2G SCOPE=SPFILE;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP;
ORACLE instance started.

Total System Global Area 3221225472 bytes
Fixed Size                  2929552 bytes
Variable Size             419433584 bytes
Database Buffers          637534208 bytes
Redo Buffers               13844480 bytes
In-Memory Area           2147483648 bytes
Database mounted.
Database opened.
Listing 01

In this example, the bulk of the memory is assigned to the in-memory area, leaving just around 600MB for the buffer cache.
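After the restart, we can confirm the allocation from the data dictionary. The sketch below assumes a 12.1.0.2+ instance; the in-memory area appears as a row in V$SGA, and the V$INMEMORY_AREA view breaks it down into its internal pools (actual values and formatting will vary by environment):

```sql
-- Verify the size of the in-memory area after the restart
SQL> SELECT name, value/1024/1024 AS size_mb
     FROM   v$sga
     WHERE  name = 'In-Memory Area';

-- Break the area down into its internal pools: a 1MB pool for the
-- columnar data itself and a 64KB pool for segment metadata
SQL> SELECT pool, alloc_bytes, used_bytes, populate_status
     FROM   v$inmemory_area;
```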
One notable trade-off when using the In-Memory feature is balancing the use of the memory available in the system. Transactional workloads perform best with row-based storage, with data stored in the buffer cache, while analytical workloads perform best with a columnar format, with data stored in the in-memory area. We can see the current settings for the in-memory area as shown in listing 02.

SQL> SHOW PARAMETER INMEMORY

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default              string
inmemory_force                       string      DEFAULT
inmemory_max_populate_servers        integer     1
inmemory_query                       string      ENABLE
inmemory_size                        big integer 2G
inmemory_trickle_repopulate_servers_ integer     1
percent
optimizer_inmemory_aware             boolean     TRUE
Listing 02

Once the in-memory area is set up, we can indicate which database objects should be placed in it. We can mark a table or materialized view to be loaded into the in-memory area after its first full-table scan, as shown in listing 03.

SQL> ALTER TABLE my_aggregation_table INMEMORY;
SQL> SELECT segment_name, populate_status FROM v$im_segments;

no rows selected
Listing 03

We can instead mark a table or materialized view to be loaded into the in-memory area as soon as the instance starts up, as shown in listing 04.

SQL> ALTER TABLE my_aggregation_table INMEMORY PRIORITY CRITICAL;
SQL> SELECT segment_name, populate_status FROM v$im_segments;

SEGMENT_NAME                         POPULATE_
------------------------------------ -----------
MY_AGGREGATION_TABLE                 COMPLETED
Listing 04

Objects are populated into the in-memory area either from a prioritized list immediately after the database is opened, or after they are scanned (queried) for the first time. The order in which objects are populated is controlled by the keyword PRIORITY (as shown in listing 04), with five levels: CRITICAL, HIGH, MEDIUM, LOW and NONE.
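For a table left at the default priority, a full scan is what triggers population, and an object can later be evicted by reversing the clause. A minimal sketch, reusing the my_aggregation_table example from the listings above:

```sql
-- With the default PRIORITY NONE, population only starts after the
-- first full scan of the table; the hint forces that access path
SQL> SELECT /*+ FULL(t) */ COUNT(*) FROM my_aggregation_table t;

-- v$im_segments should now show the segment being (or already) populated
SQL> SELECT segment_name, populate_status FROM v$im_segments;

-- Remove the table from the in-memory area when it no longer needs
-- columnar representation; the row-based copy is unaffected
SQL> ALTER TABLE my_aggregation_table NO INMEMORY;
```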
The default PRIORITY is NONE, which means an object is populated only after it is scanned for the first time, as shown in listing 03. When populating the in-memory area, the IMCO background process initiates population of in-memory enabled objects with priority CRITICAL, HIGH, MEDIUM or LOW. Then slave processes (ora_wNNN_SID) are dynamically spawned to execute these tasks. The default number of slave processes is half the number of CPU cores. While population is in progress, the database remains available for running queries, but it's not possible to read data from the in-memory area until the population is completed. We can also indicate several compression options, as shown in listing 05.

SQL> ALTER TABLE my_aggregation_table INMEMORY MEMCOMPRESS FOR QUERY;         -- Default (QUERY LOW) compression
SQL> ALTER TABLE my_aggregation_table INMEMORY MEMCOMPRESS FOR CAPACITY HIGH; -- Capacity high compression
SQL> ALTER TABLE my_aggregation_table INMEMORY MEMCOMPRESS FOR CAPACITY LOW;  -- Capacity low compression
Listing 05

There are six levels of compression: NO MEMCOMPRESS. Data is populated to the in-memory area without compression. MEMCOMPRESS FOR DML. Mainly for DML performance, with minimal compression. MEMCOMPRESS FOR QUERY LOW (default). Optimized for query performance. MEMCOMPRESS FOR QUERY HIGH. Optimized for query performance with additional space saving. MEMCOMPRESS FOR CAPACITY LOW. Favors space saving over query performance. MEMCOMPRESS FOR CAPACITY HIGH. Optimized for space saving, at some cost in performance. It's worth noting that the compression ratio strongly depends on the nature of the data, although in general it's possible to store more data in memory using the in-memory area than using the buffer cache. We can set different in-memory options per column in a table, as shown below.
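To see how much a given compression level actually saves on real data, we can compare the on-disk size of a populated segment with its in-memory footprint. A sketch using the V$IM_SEGMENTS view (the exact ratio depends entirely on the data, and the query assumes the segment is already fully populated):

```sql
-- Compare on-disk size vs. in-memory size for populated segments;
-- the ratio approximates the effective compression achieved
SQL> SELECT segment_name,
            inmemory_compression,
            bytes                           AS disk_bytes,
            inmemory_size,
            ROUND(bytes / inmemory_size, 1) AS compression_ratio
     FROM   v$im_segments;
```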
SQL> CREATE TABLE sales (
       sales_amount NUMBER,
       items_amount NUMBER,
       region       VARCHAR2(16),
       description  VARCHAR2(64)
     )
     INMEMORY
     INMEMORY MEMCOMPRESS FOR QUERY HIGH (sales_amount, items_amount)
     INMEMORY MEMCOMPRESS FOR CAPACITY HIGH (region)
     NO INMEMORY (description);

SQL> SELECT segment_column_id, column_name, inmemory_compression
     FROM   v$im_column_level
     WHERE  table_name = 'SALES'
     ORDER  BY segment_column_id;

SEGMENT_COLUMN_ID COLUMN_NAME       INMEMORY_COMPRESSION
----------------- ----------------- ------------------------------
                1 SALES_AMOUNT      FOR QUERY HIGH
                2 ITEMS_AMOUNT      FOR QUERY HIGH
                3 REGION            FOR CAPACITY HIGH
                4 DESCRIPTION       NO INMEMORY

4 rows selected.
Listing 06

One of the best use cases of the In-Memory option is storing a materialized view in memory in columnar format. Remember that materialized views are mainly used in BI solutions to store physically aggregated data derived from transactional tables. Let's suppose we want to visualize the sales by region from the sales table created in listing 06, so we create a materialized view over the aggregated data and mark it to use the in-memory area for complex queries, as shown in listing 07.

SQL> CREATE MATERIALIZED VIEW mv_total_sales_by_region
     INMEMORY MEMCOMPRESS FOR CAPACITY HIGH PRIORITY HIGH
     AS
     SELECT SUM(sales_amount) total_sales, region
     FROM   sales
     GROUP  BY region;

SQL> SELECT total_sales, region FROM mv_total_sales_by_region;
-- A high-speed query when read directly from the in-memory area, compared to reading from the buffer cache
Listing 07

Note that we can monitor the in-memory area using the following views: V$IM_SEGMENTS, V$IM_USER_SEGMENTS and V$IM_COLUMN_LEVEL.

Conclusion

In this article, I've shown a nice feature that comes with Oracle 12c Release 1 that enables managing both transactional and analytical workloads from the same database instance, without having to redesign your database schema or deploy an independent data warehouse or data mart.
Now you can apply these concepts and real-world scripts to your own BI scenarios in Oracle database environments.