One of the things I really like about attending DB2 conferences is the face-to-face time I get with people who otherwise would be on the other side of e-mail exchanges. I get a whole lot more out of in-person communication versus the electronic variety. Case in point: at IBM's recent Information on Demand event in Las Vegas, I ran into a friend who is a DB2 for z/OS database engineering leader at a large financial services firm. He talked up a new mainframe DB2 data serving reference architecture recently implemented for one of his company's mission-critical applications, and did so with an enthusiasm that could not have been fully conveyed through a text message. I got pretty fired up listening to the story this DBA had to tell, in part because of the infectious excitement with which it was recounted, but also because the system described so closely matches a vision of a DB2 for z/OS data-serving architecture that I've had in mind -- and have advocated -- for years. To see that vision validated in the form of a real-world system that is delivering high performance and high availability in a demanding production environment really made my day. I am convinced that what the aforementioned financial services firm (hereinafter referred to as Company XYZ) is doing represents the future of mainframe DB2 as a world-class enterprise data-serving platform. Read on if you want to know more.

Three characteristics of the reference DB2 for z/OS data architecture (so called because it is seen as the go-forward model by the folks at Company XYZ) really stand out in my mind and make it an example to be emulated:

  1. It is built on a DB2 data sharing / parallel sysplex foundation, for maximum availability and scalability (not only that -- these folks have done data sharing Really Right, as I'll explain).
  2. It leverages Big Memory (aka 64-bit addressing) for enhanced performance.
  3. The software stack on the mainframe servers is pretty short -- these are database machines, plain and simple.

A little elaboration now on these three key aspects of Company XYZ's DB2 for z/OS reference architecture:

The robust foundation: a DB2 data sharing group on a parallel sysplex mainframe cluster. It's well known that a standalone System z server running z/OS and DB2 can be counted on to provide very high levels of availability and scalability for a data-serving workload. These core strengths of the mainframe platform are further magnified when concurrent read/write access to the database is shared by multiple DB2 members of a data sharing group, running in the multiple z/OS LPARs (logical partitions) and multiple System z servers of a parallel sysplex. You're not going to beat the uptime delivered by that configuration: formerly planned outages for maintenance purposes are virtually eliminated (service levels of of DB2, z/OS, and other software components can be updated, and server hardware can be upgraded, with no -- I mean zero -- interruption of application access to the database), and the impact of an unplanned failure of a DB2 member or a z/OS LPAR or a server is greatly diminished (only data pages and/or rows that were in the process of being changed by programs running on a failed DB2 subsystem are temporarily unavailable following the failure, and those retained locks will be usually be freed up within a couple of minutes via automatic restart of the failed member). And scalability? Up to 32 DB2 subsystems (which could be running on 32 different mainframe servers) can be configured in one data sharing group.

Now, you can set up a DB2 data sharing group the right way, or the Really Right way. Company XYZ did it Really Right. Here's what I mean:

  • More z/OS LPARs and DB2 members than mainframes in the sysplex. I like having more than one z/OS LPAR (and DB2 subsystem) per mainframe in a parallel sysplex, because 1) you can route work away from one of the LPARs for DB2 or z/OS maintenance purposes and still have access to that server's processing capacity, and 2) more DB2 members means fewer retained locks and quicker restart in the event of a DB2 subsystem failure.
  • Dynamic VIPA network addressing. Availability and operational flexibility are optimized when remote DRDA clients use a dynamic VIPA (virtual IP address) to connect to the DB2 data sharing group (as long as at least one member of the data sharing group is up, a connection request specifying the group's VIPA can be successfully processed). A sysplex software component called the Sysplex Distributor handles load balancing across DB2 members for initial connection requests from remote systems (these will often be application servers), while load balancing for subsequent requests is managed at the DB2 member level.
  • Internal coupling facilities. ICFs (basically, coupling facility control code running in an LPAR on a mainframe server) are less expensive than external coupling facilities, not only with respect to acquisition cost, but also in terms of environmental expenses (floor space, power, cooling). [It's true that if the mainframe containing the ICF holding the lock structure and the shared communications area (SCA) should fail, and if on that mainframe there is also a member of the DB2 data sharing group, the result will be a group-wide outage unless the lock structure and SCA are duplexed in the second ICF. Company XYZ went with system-managed duplexing of the lock structure and SCA (DB2 manages group buffer pool duplexing in a very low-overhead way). Some other organizations using ICFs exclusively (i.e., no external coupling facilities) decide not to pay the overhead of system-managed lock structure and SCA duplexing, on the ground that a) a mainframe server failure is exceedingly unlikely, b) the group-wide outage would only occur if a particular mainframe (the one with the ICF in which the lock structure and SCA are located) were to fail, and c) the group-restart following a group-wide outage should complete within a few minutes. The right way to go regarding the use or non-use of system-managed lock structure and SCA duplexing will vary according to the needs of a given organization.]


Taking advantage of 64-bit addressing. Each of the LPARs in the parallel sysplex on which Company XYZ's model DB2 for z/OS data-serving system is built has more than 20 GB of central storage, and each DB2 subsystem (there is one per LPAR) has a buffer pool configuration that exceeds 10 GB in size. In these days of Big Memory (versus the paltry 2 GB to which we were limited not long ago), I don't think of a production-environment DB2 buffer pool configuration as being large unless the aggregate size of all pools in the subsystem is at least 10 GB. The reduced level of disk I/O activity that generally comes with a large buffer pool configuration can have a significant and positive impact on both the elapsed time and CPU efficiency of data access operations.

Lean, mean, data-serving machines. A production instance of DB2 for Linux, UNIX, and Windows (LUW) usually runs on a machine that is a dedicated data server -- data access code executes there, and that's it. Business-logic programs? They run on application servers. Presentation-logic programs? They might run on yet another tier of servers. The DB2 for LUW server Just Does Data. When I started my IT career in the early 1980s, a mainframe-based application was almost always entirely mainframe-based, by which I mean that all application functionality -- data access logic, business logic, and presentation logic -- was implemented in programs that ran on a mainframe server. Nowadays, I believe that the unmatched availability, scalability, reliability, and security offered by the System z platform is put to most advantageous use in the servicing of data access requests. In other words, I feel that a DB2 for z/OS system should be thought of, in an architectural sense, as a dedicated data server, just as we tend to think of DB2 for LUW systems (and other database management systems that run on Linux, UNIX, and/or Windows platforms) as dedicated data servers.

That's how DB2 for z/OS functions in Company XYZ's reference architecture: it just does data. Consequently, the software stack on the data-serving mainframes is relatively short, consisting of z/OS, DB2, RACF (security management), a data replication tool, some system automation and management tools, some performance monitoring tools, and little else. Transaction management is handled on application servers. Database access requests come in via the DB2 Distributed Data Facility (DDF), and much of the access logic is packaged in stored procedures (the preference at Company XYZ is DB2 9 native SQL procedures, because they perform very well and -- when invoked through calls that come through DDF -- much of their processing can be handled by zIIP engines).

Does this system, which looks so good on paper, deliver the goods? Absolutely. Volume has been taken north of 1400 transactions per second with excellent response time, and my DBA friend is confident that crossing the 2000-trans-per-second threshold won't be a problem. On the availability side, Company XYZ is getting industry-leading uptime. The message: System z is more than just capable of functioning effectively as a dedicated data server -- it works exceptionally well when used in that way. This is a clean, modern architecture that leverages what mainframes do best -- scale, serve, protect, secure -- in a way that addresses a wide range of application design requirements.

Here's a good coda for you: still at IOD in Las Vegas, and shortly after my conversation with the DBA from Company XYZ, I encountered another friend -- a lead DB2 technical professional at another company. He told me about the new DB2 for z/OS reference architecture that had recently been approved by his organization's IT executive management. The pillars of that architecture are a DB2 data sharing / parallel sysplex mainframe cluster, z/OS systems functioning as dedicated data servers, data requests coming in via the DB2 DDF, and data access logic packaged in DB2 9 native SQL procedures. I told this friend that he and his colleagues are definitely on the right track. It's a track that more and more DB2 for z/OS-using companies are traveling, and it could well be the right one for your organization.

 

Leave a comment below. If you don't see the Add a Comment button, please log in or register.
357 Views 0 Comments 0 References Permalink Tags: db2, z/os, data_serving, information_on_demand, architecture, high_performance, high_availability, financial_services, data_sharing, data_architecture, parallel_sysplex, cluster, 64_bit_addressing, ddf

As some of you may know, I'm a big fan of the native SQL procedure functionality introduced with DB2 for z/OS Version 9 (I've written a number of blog entries on the subject, starting with one posted last year). Native SQL procedures offer a number of advantages versus external SQL procedures (formerly known simply as SQL procedures in pre-Version 9 DB2 environments), including (generally) better performance, zIIP engine eligibility when called from a remote client via DRDA, and simplified lifecycle processes (referring to development, deployment, and management). These advantages have plenty of folks looking to convert external SQL procedures to native SQL procedures, and that's fine, but some of these people are under the impression that the conversion process involves nothing more than dropping an external SQL procedure and re-issuing that routine's CREATE PROCEDURE statement, minus the EXTERNAL NAME and FENCED options (if either had been specified in creating the external SQL procedure). This may in fact do the trick for a very simple SQL procedure, but in many cases the external-to-native conversion will be a more involved process. In this post I'll provide some information as to why this is so, along with a link to a well-written "technote" on IBM's Web site that contains further details on the topic.

First, a little more on this drop-and-recreate-without-EXTERNAL-NAME-or-FENCED business. It is true that, in a DB2 9 New Function Mode system, a SQL procedure (i.e., a stored procedure for which the routine source is contained within the CREATE PROCEDURE statement) will be external if it is created with the EXTERNAL NAME and/or FENCED options specified, and native if created with neither EXTERNAL NAME nor FENCED specified; however, it is not necessarily the case that an external SQL procedure re-created sans EXTERNAL NAME and FENCED will behave as you want it to when executed as a native SQL procedure. Why is this so? Well, some of the reasons are kind of obvious when you think about it. Others are less so. On the obvious side, think about options that you'd specify for an external SQL procedure (which ends up becoming a C language program with embedded SQL) at precompile time (e.g., VERSION, DATE, DEC) and at bind time (e.g., QUALIFIER, CURRENTDATA, ISOLATION). For a native SQL procedure, there's nothing to precompile (as there is no associated external-to-DB2 program), and the package is generated as part of CREATE PROCEDURE execution (versus by way of a separate BIND PACKAGE step). That being the case, these options for a native SQL procedure are specified via CREATE PROCEDURE options (some of which have names that are slightly different from the corresponding precompile options, an example being the PACKAGE OWNER option of CREATE PROCEDURE, which corresponds to the OWNER option of the BIND PACKAGE command). Here's another reason to pay attention to these options of CREATE PROCEDURE when converting an external SQL procedure to a native SQL procedure: the default options for CURRENTDATA and ISOLATION changed to NO and CS, respectively, in the DB2 9 environment.

A less-obvious consideration when it comes to external-to-native conversion of SQL procedures has to do with condition handlers. These are statements in the SQL procedure that are executed in the event of an error or warning situation occurring. External SQL procedures do not allow for nested compound SQL statements (a compound SQL statement is a set of one or more statements, delimited by BEGIN and END, that is treated as a block of code); so, if you had within an external SQL procedure a compound SQL statement, and you wanted within that compound SQL statement a multi-statement condition handler, you couldn't do that by way of a nested compound statement. What people would often do instead in that case is code the condition handler in the form of an IF statement containing multiple SQL statements. In converting such an external SQL procedure to a native SQL procedure, the IF-coded condition handler should be changed to a nested compound SQL statement set off by BEGIN and END (in fact, it would be a good native SQL procedure coding practice to bracket even a single-statement condition handler with BEGIN and END). This change would be very much advised not just because nested compound statements are allowed in a native SQL procedure, but also because, in a native SQL procedure, an IF statement intended to ensure execution of multiple statements in an IF-based condition handler (e.g., IF 1=1 THEN...) would itself clear the diagnostics area, thereby preventing (most likely) the condition handler from functioning as desired (in an external SQL procedure, it so happens that a trivial IF statement such as IF 1=1 will not clear the diagnostics area).

Also in the not-so-obvious category of reasons to change the body of a SQL procedure when converting from external to internal: resolution of unqualified parameter, variable, and column names differs depending on whether a SQL procedure is external or native. Technically, there's nothing to prevent you from giving to a parameter or variable in a SQL procedure a name that's the same as one used for a column in a table that the SQL procedure references. If a statement in an external SQL procedure contains a name that could refer to a variable or a parameter or a column, DB2 will, in processing that statement, check to see if a variable of that name has been declared in the SQL procedure. If a matching variable name cannot be found, DB2 will check to see if the name is used for one of the procedure's parameters. If neither a matching variable nor a matching parameter name is found, DB2 will assume that the name refers to a column in a table referenced by the procedure. If the same statement is encountered in a native SQL procedure, DB2 will first check to see if the name is that of a column of a table referenced by the procedure. If a matching column name is not found, DB2 will then look for a matching variable name, and after that for a matching parameter name. If no match is found, DB2 will return an error if VALIDATE BIND was specified in the CREATE statement for the native SQL procedure (if VALIDATE RUN was specified, DB2 will assume that the name refers to a column, and will return an error if no such column is found at run time). Given this difference in parameter/variable/column name resolution, it would be a good idea to remove ambiguities with respect to these names in an external SQL procedure prior to converting the routine to a native SQL procedure. This could be done either through a naming convention that would distinguish variable and parameter names from column names (perhaps by prefixing variable and parameter names with v_ and p_, respectively) or by qualifying the names. Variable names are qualified by the label of the compound statement in which they are declared (so if you're going to go this route, put a label before the BEGIN and after the END that frame the compound statement), parameter names are qualified by the procedure name, and column names are qualified by the name of the associated table or view.

Then there's the matter of the package collection name that will be used when the SQL procedure is executed. For an external SQL procedure, this name can be specified via the COLLID option of the CREATE PROCEDURE statement. [If NO COLLID -- the default -- is specified, the name will be the same as the package collection of the calling program. If the calling program does not use a package, the SQL procedure's package will be resolved using the value of CURRENT PACKAGE PATH or CURRENT PACKAGESET, or the plan's PKLIST specification.] When a native SQL procedure is created, the name of the associated package's collection will be the same as the procedure's schema. In terms of external-to-native SQL procedure conversion, here's what that means:

  • If external SQL procedures were created with a COLLID value equal to the procedure's schema, it's smooth sailing ahead.
  • If external procedures were create with a COLLID value other than the procedure's schema, a relatively minor adjustment is in order. This adjustment could take one of two forms: a) go with the one "root" package for the native SQL procedure in the collection with the name matching the routine's schema, and ensure that this collection will be searched when the procedure is called, or b) add a SET CURRENT PACKAGESET statement to the body of the SQL procedure, specifying the collection name used for COLLID in creating the external SQL procedure, and (via BIND PACKAGE COPY) place a copy of the "root" package of the native SQL procedure in that collection.
  • If external SQL procedures were bound with NO COLLID, there could be a good bit of related work in converting those routines to native SQL procedures, especially if a number of "variants" of an external SQL procedure's "root" package were generated and placed in different collections. The external SQL procedure package variants will have to be identified, SET CURRENT PACKAGESET will be needed in the native SQL procedure to navigate to the desired collection at run time (perhaps using a value passed by the caller as a parameter), and variants of the native SQL procedure's "root" package (again, that being the one in the collection with the name matching the procedure's schema) will need to be copied into the collections in which the external SQL procedure package variants had been placed.


The information I've provided in this blog entry is not exhaustive with respect to external-to-native SQL procedure conversion -- my intent was to cover the major issues that should be taken into consideration in planning your conversion process. For more details, check out the excellent "technote" document written by Tom Miller, a senior member of IBM's DB2 for z/OS development team and an authority on SQL procedures (both external and native). To access this document, go to the DB2 for z/OS Support page on IBM's Web site, enter "external native" as your search terms, and click on the search button -- Tom's document should be at the top of the search result list.

Native SQL procedures are the way of the future, and I encourage you to develop a plan for converting external SQL procedures to native SQL procedures (if you have any of the former). It's very do-able, and for some external SQL procedures the conversion will in fact be very straightforward. For others, more conversion effort will be required, but the payoff should make that extra effort worthwhile.

 

Leave a comment below. If you don't see the Add a Comment button, please log in or register.
266 Views 0 Comments 0 References Permalink Tags: robert_catterall, db2_9, sql, native_sql_procedures, ziip, z/os

Apologies for the delay in getting this entry posted to my blog -- the time since IBM's 2009 Information On Demand conference concluded on October 29 has been very busy for me. Now I have a little downtime, so I can share with you what I picked up on day 4 of the conference.

"Not Your Father's Database System," indeed - Guy Lohman, of IBM's Almaden (California) Research Center, delivered a very interesting presentation on the Smart Analytics Optimizer, a just-around-the-corner product (meaning, not yet formally announced) about which you'll be hearing a lot in the weeks and months to come.
Developed jointly by the Almaden Research Center and IBM's Silicon Valley and Boeblingen (Germany) software labs, the IBM Smart Analytics Optimizer (ISAO) is a business intelligence query-acceleration system that network-attaches to a mainframe server running DB2 for z/OS. The way it works: using a GUI, a DBA copies a portion of a data warehouse (one or more star schemas -- fact tables and their associated dimension tables) to the ISAO (in effect, you set up a data mart on the ISAO). Thereafter, queries that are submitted to DB2 (the ISAO is transparent from a user perspective) will be routed by DB2 to the ISAO if 1) the queries reference tables that have been copied to the ISAO, and 2) DB2 determines that they will run faster if executed on the ISAO. Here's the interesting part: the longer a query would run if executed in the DB2 system, the greater the degree of acceleration you'll get if it runs on the ISAO.

When I say "acceleration," I mean big-time speed-up, as in ten to one hundred times improvement in query run times (the ISAO "sweet spot" is execution of queries that contain aggregation functions -- such as AVERAGE and SUM -- and a GROUP BY clause). How is this accomplished? The ISAO hardware is commodity stuff: multi-core microprocessors with a lot of server memory in a blade center configuration (and several of these blade centers can be tied together in one ISAO system). The query processing software that runs on the ISAO hardware is anything but commodity -- it's a built-from-the-ground-up application that implements a hybrid row-store/column-store in-memory data server. Want DBA ease-of-use? You've got it: there's no need to implement indexes or materialized query tables or any other physical database design extras in order to get great performance for otherwise long-running queries. This is so because the ISAO does a simple thing -- scan data in one or more tables -- in a very advanced, multi-threaded way to deliver consistently good response time (typically less than 10 seconds) for most any query sent its way by DB2. [Caveat: As the ISAO does no I/Os (all data that it accesses is always in memory), it runs its CPUs flat-out to get a single query done as quickly as possible before doing the same for the next query; thus, if
queries are sent to the ISAO by DB2 at a rate that exceeds the rate at which the ISAO can process the queries, response times could increase to some degree -- this is just basic queuing theory.]

The ISAO is what's known as disruptive technology. As previously mentioned, you'll soon be hearing a lot more about it (the IOD session I attended was a "technology preview"). I'll be watching that space for sure.

A DB2 for z/OS data warehouse tune-up - Nin Lei, who works at IBM's System z benchmark center in Poughkeepsie (New York), delivered a presentation on performance management of a data warehouse mixed query workload ("mixed" referring to a combination of short- and long-running queries). A couple of the points made in the course of the session:

  • You might want to cap the degree of query parallelization on the system - There is a DB2 for z/OS ZPARM parameter, PARAMDEG, that can be used to set an upper limit on the degree to which DB2 will split a query for parallelized execution. For some time now, I've advocated going with a PARAMDEG value of 0 (the default), which leaves the max degree of parallelization decision up to DB2. Nin made a good case for setting PARAMDEG to a value equal to twice the number of engines in the z/OS LPAR in which DB2 is running. I may rethink my PARAMDEG = 0 recommendation.
  • The WLM_SET_CLIENT_INFO stored procedure is available on the DB2 for z/OS platform, too - This stored procedure, previously available only on the DB2 for Linux/UNIX/Windows and DB2 for System i platforms, was added to mainframe DB2 V8 and V9 environments via the fix for APAR PK74330. WLM_SET_CLIENT_INFO can be used to change the value of the so-called client special registers on a DB2 for z/OS server (CURRENT CLIENT_ACCTNG, CURRENT CLIENT_USERID, CURRENT CLIENT_WRKSTNNAME, and CURRENT CLIENT_APPLNAME). This capability provides greater flexibility in resource management and monitoring with respect to a query workload.


For fans of Big Memory - Chris Crone, Distinguished Engineer and member of the DB2 for z/OS team at IBM's Silicon Valley Lab, gave a presentation on 64-bit addressing in the mainframe DB2 environment. He said that development of this feature was motivated by a recognition that memory had become the key DB2 for z/OS system resource constraint as System z engines became faster and more numerous (referring to the ability to configure more central processors in a single z/OS image). Big DB2 buffer pools are needed these days because even a really fast I/O operation (involving a disk subsystem cache hit versus a read from spinning disk) can be painfully slow when a single mainframe engine can execute almost 1000 million instructions per second.

Here are a few of the many interesting items of information provided in Chris's session:

  • You can currently get up to 1.5 TB of memory on a System z server. Expect memory sizes of 3 TB or more in the near future.
  • The largest buffer pool configuration (aggregate size of all active buffer pools in a subsystem) that Chris has seen at a DB2 for z/OS site is 40 GB.
  • It is expected that the default RID pool size will be 400 MB in the next release of DB2 for z/OS (the RID pool in the DB2 database services address space is used for RID sort operations related to things such as multi-index access, list prefetch, and hybrid join).
  • The maximum size of the EDM pool components (EDM pool, skeleton pool, DBD pool, and statement pool) is expected to be much larger in the next release of DB2 (commonly referred to as DB2 X -- we'll get the actual version number at announcement time).
  • In the DB2 X environment, it's expected that 80-90% of the virtual storage needed for DB2 threads will be above the 2 GB "bar" in the DB2 database services address space. As a result, the number of threads that can be concurrently active will go way up with DB2 X (expect an upper limit of 20,000 for a subsystem, versus 2000 today).
  • DB2 data sharing groups (which run in a parallel sysplex mainframe cluster) could get really big -- IBM is looking at upping the limit on the number of DB2 subsystems in a group (currently 32).
  • Solid state storage is going to be a big deal -- the DB2 development team is looking at how to best leverage this technology.


After Chris's session, it was off to the airport to catch the red-eye back to Atlanta. I had a great week at IOD, and I'm looking forward to another great conference next year.

 

Leave a comment below. If you don't see the Add a Comment button, please log in or register.
132 Views 0 Comments 0 References Permalink Tags: iod, robert_catterall, db2, ibm, smart_analytics_optimizer, z/os, data_sharing, db2_x, buffer_pool, system_z, memory, data_warehouse, query_parallelization

Following is some good stuff that I picked up during the course of day 3 of IBM's 2009 Information on Demand conference:

IBM Information Management software executives had some interesting things to say - IBM got some of us bloggers together with some software execs for a Q&A session. A few highlights:

  • Interest in DB2 pureScale, the recently announced shared-data cluster for the DB2/AIX/Power platform, is strong. Demo sessions at the conference this week were full-up.
  • It used to be that organizations asked IBM about products. These days, companies are increasingly likely to ask about capabilities. IBM is responding by packaging software (and sometimes hardware) products into integrated offerings designed to fulfill these capability requirements.
  • New products at the upper end of IBM's information transformation software stack are driving requirements at the foundational level of the stack (where you'll find the database engines such as DB2), and even into IBM's hardware platforms (such as the Power Systems server line).
  • Regarding software-as-a-service (SaaS) and cloud computing, IBM sees a "broadening of capabilities" with respect to software delivery and pricing models.
  • The IBM folks in the room were pretty keyed up about the Company's new Smart Archive offerings, which can - among other things - drive cost savings by using discovery and analytics capabilities to determine which information (structured and unstructured data) an organization has to retain and archive.
  • Jeff Jonas, one of IBM's top scientists, talked about the huge increase in the amount of data streaming into many companies' systems (much of it from various sensors that emit various signals). People may assume that their organization cannot manage this informational in-surge, but Jeff noted that the more data you get into your system, the faster things can go ("It's like a jigsaw puzzle: the more pieces you put together, the more quickly you can correctly place other pieces").
  • Jeff also spoke of "enterprise amnesia:" a firm has so much information with which to deal that it loses track of some of it. Consequently, a large retailer will sometimes hire a person who had previously been fired for stealing from that same company.


Let's hear it for audience participation - I enjoyed delivering my presentation on DB2 for z/OS data warehouse performance. As usual, I got some great questions and comments from session attendees. After I mentioned that I'm usually comfortable with having more indexes on tables in a data warehouse versus an OLTP data-serving environment (I wrote of this in blog entry posted last year), I was asked if that statement applied to data warehouses that are updated in near-real time relative to source data changes (something that more organizations are doing these days). My response: in a continuously-updated data warehouse (versus a data warehouse updated via an overnight extract/transform/load process), I'd probably be more conservative when it comes to indexing tables.

After I'd covered DB2 query parallelism, a session attendee suggested that in a CPU-constrained mainframe DB2 data warehouse system, adding one or more zIIP engines and turning on query parallelism (something that probably wouldn't be activated in a system with little in the way of CPU head room) could provide a double benefit: more cycles to enable beneficial utilization of DB2's query parallelism capability, and a workload (parallelized queries) that could drive utilization of the cost-effective zIIPs. Spot on - couldn't have said it better myself (I wrote about query parallelism and zIIP engines in a comment that I added to a blog entry that I posted last year).

Bernie Spang is a man on a mission - IBM's Director of Strategy and Marketing for InfoSphere and Information Management software wants companies to have trusted information. Too often, people confuse "trusted" with "secure." "Secure" is important, but "trusted," in this context, refers to data that is reliable, complete, and correct - the kind of data on which you could confidently base important decisions. Bernie is out to make IBM's InfoSphere portfolio the go-to solution for organizations wanting to get to a trusted-information environment. There's a lot there: data architecting, discovery, master data management, and data governance are just a few of the capabilities that can be delivered by way of various InfoSphere offerings. It's all about getting a handle on the state of your data assets, rationalizing inconsistencies and discrepancies, and providing an interface that leads to agreed-upon "true" values (and this has plenty to do with integrating formerly siloed data stores). If you want to get your data house in order, there's a way to get that done.

Chris Eaton wants mainframe DB2 people to be at ease with DB2 for LUW lingo - Chris, one of the technical leaders in the DB2 for Linux, UNIX, and Windows development organization at IBM's Toronto Lab, knows that there are some DB2 for LUW concepts and terminologies that are a little confusing to mainframe DB2 folks, and he wants to clear things up. SQL data manipulation language statements are virtually identical across DB2 platforms, but there are some differences in the DBA and systems programming views of things on the mainframe and LUW platforms (largely a reflection of significantly different operating system and file system architectures and interfaces). In a session on DB2 for LUW for mainframe DB2 people, Chris explained plenty. Some examples:

  • A copy of DB2 running on an LUW server is an instance.  A copy of DB2 running on a mainframe server is a subsystem.
  • A DB2 for z/OS subsystem has its own catalog. A DB2 for LUW database, several of which can be associated with a DB2 instance, has its own catalog (and its own transaction log - something else that's identified with a subsystem in a mainframe DB2 environment).
  • So-called installation parameter values are associated with a DB2 subsystem on a mainframe (most of these values are specified in a module known as ZPARM). The bulk of DB2 for LUW installation parameter values are specified at the database level.
  • A DB2 for z/OS thread is analogous to a DB for LUW agent, and a DB2 for LUW thread is analogous to a mainframe DB2 TCB or SRB (i.e., a dispatchable piece of work in the system).
  • A mainframe DB2 data set would be called a file in a DB2 for LUW environment, and a mainframe address space would be referred to as memory on an LUW server.
  • The DB2 for LUW lock list is what mainframe people would call the IRLM component of DB2.
  • The DB2 for LUW command FORCE APPLICATION is analogous to the -CANCEL THREAD command in a DB2 for z/OS environment.

Chris also passed on some hints and tips:

  • Self-tuning memory management (the ability for DB2 to automatically monitor and adjust amounts of memory used for things such as page buffering, package caching, and sorting) works very well on the LUW platform, and Chris recommends use of this feature.
  • Chris favors the use of DMS files (versus SMS) in a DB2 for LUW system, and the use of automatic-storage databases over DMS files for most objects in a DB2 database.
  • Chris is big on the use of administrative views as a means of easily obtaining DB2 for LUW performance and system information using SQL.


Tomorrow is the last day of the conference. More blogging to come.

 

Leave a comment below. If you don't see the Add a Comment button, please log in or register.

120 Views 0 Comments 0 References Permalink Tags: robert_catterall, db2, z/os, luw, iod, information_on_demand, purescale, saas, smart_archive, data_warehouse_performance, query_parallelism, ziip, db2_for_luw_lingo

Another day done at IBM's 2009 Information on Demand Conference - another day of learning more about DB2, and about technologies used at higher levels of the information transformation software stack. Some take-aways from today's sessions follow.

A good DB2 9 for z/OS  migration story- Maria McCoy of the UK Land Registry delivered a very good presentation on her organization's DB2 9 for z/OS migration experience. The Land Registry has one of the world's largest operational (versus decision support) databases, holding almost 40 TB of data. On top of that, the agency recently launched it's first public e-business application, a consequence being that downtime is even less well tolerated than before.

The Land Registry runs DB2 in data sharing mode on a parallel sysplex mainframe cluster. The number of DB2 subsystems across all of the Land Registry's environments (test, development, and production) is about 30.

The DB2 9 migration effort went off well, largely because the Land Registry stays pretty current on system maintenance, with quarterly upgrades of the DB2 service level (Maria confirmed what others have said, indicating that DB2 9 is very stable at the F906 maintenance level and beyond).

For the Land Registry, the primary DB2 9 migration drivers included:

  • XML support
  • Spatial data support (spatial awareness had historically been achieved by way of user-written code)
  • Extensions to online schema changes
  • Further exploitation of 64-bit addressing
  • Improved utility CPU efficiency
  • Indexes on column expressions
  • Real-time statistics (especially the capability of identifying indexes that have gone a long time without being used for data access)
  • Larger index page sizes (offering potentially reduced GETPAGE activity due to a  reduction in the  number of index levels)

 

An important part of the Land Registry's DB2 9 migration planning effort involved identification of third-party tools used with DB2. The agency identified 42 such products, among these being monitors, middleware, compilers, utilities, file management systems, and legacy software.

A dedicated test system proved to be very valuable. The LoadRunner tool was used to drive online transaction test scripts.

Following the migration to DB2 9, the Land Registry converted all existing simple tablespaces to segmented tablespaces (a good idea, as simple tablespaces can no longer be created in a DB2 9 environment). Maria and her colleagues thought that there were no simple tablespaces in their DB2 databases, but it turned out that 41 such tablespaces did exist.

Among the DB2 9 new features put to good use by the Land Registry are the following:

  • Indexes on column expressions (thus was achieved a HUGE decrease in CPU time for a batch job containing a query with a predicate involving a column in a substring function)
  • Clone tables (a table-data-change outage that formerly ran to 5 hours due to time needed to load new data and to inspect the newly loaded data for correctness went to 2 seconds)
  • Rename column
  • Rename index

The DB2 9 migration project went from beginning to end in about 12 months. The Land Registry ran with DB2 9 in Conversion Mode for about 2 months in each of their DB2 environments prior to moving to Enable New Function Mode and then to New Function Mode.

The current Information Management software scene - Arvind Krishna, General Manager of IBM's Information Management software business, spoke during a keynote presentation of the challenges faced by organizations dealing with explosive information growth (an estimated 15 petabytes of new data are generated daily - that's about 50 exabytes per year). He went on to talk about the benefits of "workload-optimized systems" being brought to market now by IBM - systems comprised of fully integrated hardware and software offerings that are optimized for specific workloads. An example of a workload-optimized system is IBM's Smart Analytics system, which provides hardware and a comprehensive software stack (with data management, warehousing, and analytics software) in one package that can be quickly and effectively deployed.

Ross Mauri, General Manager of IBM's Power Systems business (formerly called System p), provided information on the current state of the Power line (currently utilizing generation 6 of IBM's RISC-based microprocessor family, with generation 7 now in beta test mode). Ross said that "Power is everywhere," not only in IBM's Power servers but also in supercomputers, cars, all three of the major electronic game consoles, and the Mars Rover ("we have 100% market share on Mars"). From around 17% market share a few years ago, Power systems now has more than 40% of the market for RISC processor-based servers. Particular strengths of the server line include efficiency ("work per watt," as Ross put it), virtualization, management, and resiliency.

Arvind Krishna closed out the keynote session with remarks that spotlighted IBM's close partnership with SAP (the companies have joint development teams and tens of thousands of mutual customers).

DB2 9 for z/OS native SQL procedures are looking very good - Philip Czachorowski of Fidelity Investments presented information related to his company's early experiences with the native SQL procedures feature of DB2 9 for z/OS (I've blogged a number of times on this technology, beginning with an entry posted late last year). Philip talked about DDL extensions that help with the migration of native SQL procedures from development to test to production environments (statements such as ALTER PROCEDURE ADD VERSION and ALTER PROCEDURE ACTIVATE VERSION), and the new SET CURRENT ROUTINE VERSION statement that can facilitate the testing of a new native SQL procedure (Philip also stressed the importance of having a good naming convention for SQL procedure version identifiers, so you'll know what you're executing when running tests).

Performance data presented during the session was most interesting. Philip showed monitor data for one case in which total class 1 CPU time (from a DB2 monitor accounting report) for a native SQL procedure was only 4% greater than that of a comparable stored procedure written in COBOL.

Near the end of his presentation, Philip mentioned that the DB2_LINE_NUMBER clause of the GET DIAGNOSTICS statement could be very helpful in terms of resolving native SQL procedure code problems.

Stream analytics is way cool - Just before dinner, those of us participating in the IOD Blogger Program had an opportunity to spend an hour with IBMers who are working on the System S "stream analytics" technology on which IBM's InfoSphere Streams offering is based. This is cool stuff: stream analytics software, running under Linux on commodity hardware, can be used to analyze vast amounts of incoming data - often signal data produced by various sensors - to identify events or episodes as they occur, thereby enabling a very rapid response capability. The data could be structured or unstructured, and might consist of hydrophone-captured sounds (picking up, perhaps, the clicking of dolphins), radio astronomy signals, manufacturing data, vehicular traffic activity, weather data, telephone communications, or human-health indicators. Picking up on this latter stream category, a specialist in neonatology who has worked with the IBM System S team spoke of her work involving the monitoring of premature infants' vital signs. An electrocardiogram can generate 500 data signals per second, and there are other vital-sign streams that can be analyzed as well (e.g., blood flow data), and all this can be multiplied by several infants in one area being monitored concurrently (important, as an infection in one child could quickly spread to others). System S stream analytics technology is demonstrating the potential to save lives by taking anomaly detection time from 24 hours (using traditional monitoring methods) to seconds.

The IBM researchers then demonstrated the use of System S stream analytics software to analyze automobile traffic patterns in Stockholm, Sweden (500,000 pieces of GPS data per second).

The scalability of the System S technology is remarkable, the programming interface is surprisingly straightforward (people familiar with object-oriented programming languages tend to become proficient in a couple of weeks), and the GUI is pretty intuitive. Who knows how broadly applicable it might end up being (early adopters are largely in the government and health-care industries, but oil companies are also showing interest)? Watch this space, folks.

That's it for now. Tomorrow morning I'll deliver a presentation on DB2 for z/OS data warehouse performance, and tomorrow evening I'll try to post another blog entry.

 

Leave a comment below. If you don't see the Add a Comment button, please log in or register.

101 Views 0 Comments 0 References Permalink Tags: db2, system_s, stream, z/os, sql_procedures, performance, iod, information_on_demand, smart_analytics, db2_9, migration, robert_catterall

Greetings from Las Vegas. Day one of IBM's 2009 Information on Demand conference was a good one. In this post I'll share with you some of the more interesting items of information I picked up in today's sessions. I'll post at the end of days 2, 3, and 4, as well.

The Big Theme: "Information-Led Transformation" - Ambuj Goyal, General Manager for Business Analytics and Process Optimization in IBM's Software Group, kicked off the Grand Opening session with an overview of the Company's Information Management software strategy. He pointed out that IBM has spent $12 billion on its information-on-demand software stack over the past 4 years: $8 billion on acquisitions (such as Cognos and SPSS) and $4 billion on internal development and related activity. That's some serious money, and it reflects the confidence of IBM's executives that we are on the front end of a major change in the way that organizations manage and leverage their data assets. Ambuj professed that information-led transformation would be even bigger in scope and impact than the enterprise resource planning software wave that got started about 20 years ago.

Companies, said Ambuj, are transitioning from information-focused projects to the information-based enterprise - an operational model characterized by the use of rationalized and trusted data to make timely, effective, and predictive (versus reactive) decisions. Frank Kern, a Senior Vice President in IBM's Global Business Services division, joined Ambuj onstage and continued to underscore the importance of organizations developing a predictive decision-making capability. He described a new service line, Business Analytics and Optimization, that will be delivered by a 4000-strong team of consultants. He also talked about the irony of executives reporting a lack of information needed to make good decisions, even as their organizations are awash in data as never before.

During a panel discussion, several IT executives from IBM customer companies shared their experiences related to the use of advanced analytics software:

  • Shirley Lady of Blue Cross and Blue Shield, a health insurance company, said that "what if" analysis is more important to her organization than ever before, given the major market changes that could result from health care reform in the United States.
  • Nihad Aytaman, of clothing retailer Elie Tahari, talked about the importance of quick (as well as effective) decision making to his company's efforts to successfully "chase the business" in the very fluid world of fashion.
  • Debbie Oshman of Chevron, a global energy company, mentioned that her organization was pursuing information-driven enterprise transformation, after having used information well in a project-by-project way. Process optimization and risk mitigation were described as being two key analytics-driven initiatives underway at Chevron.


Following the panel discussion, Arvind Krishna, IBM's General Manager for Information Management software, talked about new developments in his part of the business: DB2 pureScale (about which I recently blogged), smart archiving, smart analytics, a master information hub, InfoSphere streams, Cognos content analytics, and two recent acquisitions: SPSS and ILOG.

An interesting press conference - I joined journalists, analysts, and fellow bloggers for a press conference featuring several senior IBM executives. Announced during the conference were new analytics applications, enhanced stream computing technology, and new Master Information Hub software (you can view the press release on IBM's Web site).

Steve Mills, Senior Vice President and IBM Software Group Executive, talked about a new information-related transformation in light of transformations past:

  • The PC transformation of the 1980s that enabled personal delivery of information.
  • The Worldwide Web transformation of the 1990s that made incredible levels of connectivity a reality.
  • The process-focused transformation of the past decade, which led to improvements in efficiency and effectiveness.


Going on now, said Mills, is an information-led (not just information-focused) transformation, through which organizations are seeking to understand not only processes, but the environment in which they operate. The urgency of this transformation is prompted by two questions: 1) can your organization move fast enough, and 2) can it move smart enough? Helping to make the transformation possible are historically low costs for units of compute power; human interface improvements, such as dashboards, that enable people to quickly absorb and act on information; and the ability to physically place information capture and analysis technology where it could not be placed before. Mills said that 35,000 IBM people are involved in building the Company's "portfolio of capability" regarding advanced analytics - a portfolio that includes software technology and the know-how to put that technology to work for organizations in all kinds of industries.

The press conference concluded with a question-and-answer session. In responding to questions asked by session attendees:

  • Arvind Krishna said that IBM's Information Management software business had grown at a 14% annual rate over the past three years - in a market that grew at a 6% rate.
  • It was mentioned that over 50 OEM vendors are delivering analytics capabilities via cloud computing systems using IBM's Cognos Express offering.
  • Steve Mills indicated that data governance is increasingly seen by organizations as being a mission-critical competence.
  • Ambuj Goyal said that even as IBM works to be a one-stop-shop provider of the information transformation software stack (software that manages, archives, cleanses, catalogs, integrates, and analyzes data), the Company designs its products to use open standards that make it easier for organizations to use a mix of IBM and third-party products in a stack.
  • It was explained that IBM is delivering software that can be used to analyze unstructured data on the Web (e.g., what customers are saying about your company's products), with an emphasis on combining that information with information generated using in-house data.


DB2 "X" is coming along just fine - The next release of DB2 for z/OS is mostly coded, with activity now focused mainly on testing. Jeff Josten, an Distinguished Engineer on the DB2 for z/OS development team at IBM's Silicon Valley Lab, provided a preview of this coming attraction. A few highlights (CAVEAT: this information is truly of a preview nature - it should not be considered as final until the product is generally available):

  • A further exploitation of 64-bit addressing should dramatically increase the number of threads that can be concurrently active in a DB2 subsystem.
  • DB2 X is expected to reduce the CPU consumption of a typical DB2 workload  by 5-10% as compared to a DB2 Version 9 environment.
  • Native SQL stored procedures might get a performance boost of up to 10-20% versus a Version 9 environment.
  • LOBs (large objects) that can fit onto a page will be in-lined in a base table versus being physically stored in a separate LOB tablespace.
  • Dynamic statement caching will be more effective for SQL statements that contain literal values versus host variables.
  • There will be a conversion path available for changing simple, segmented, and "classic" partitioned tablespaces to universal tablespaces.
  • RUNSTATS will provide an "auto stats" option.
  • Temporal data support will enable DB2 X to be significantly more useful in the management of data that has "effective" dates (e.g., a change to an insurance policy will become effective on such-and-such a date) and/or which is updated of deleted at some time following initial insert into a table (DB2 will maintain a history of such data changes).
  • Building of a tablespace compression dictionary will not require a utility execution.
  • DB2 X will enable data-masking to be specified at the column level.
  • Private protocol will go away (DRDA is much better anyway), and so will the ability to bind a DBRM directly into a plan (these should be bound into packages anyway).

 

Rick Bowers has his priorities in order - IBM's Director of DB2 for z/OS development stated repeatedly during his "trends and directions" presentation that "it's all about the customer." If you're a DB2 user, Rick's in your corner. A few of his comments during the session:

  • Enhancing the capabilities of DB2 for z/OS in data warehouse environments is a big priority for Rick's team.
  • Migration of DB2 for z/OS-using organizations to DB2 9 is proceeding apace.
  • 100% of the top 100 DB2 for z/OS-using organizations are using DB2 Version 8 or beyond, as are 99+% of the top 200.


Got mashups?  They're easier than ever now - IBM gave us blog-folk a preview of two new products that will be formally announced: Version 2 of the IBM Mashup Center, and Cognos 8 Mashup Service. The latter makes it very easy to use Cognos-generated report data in a mashup application, and the former very much simplifies creation, cataloging, discovery, and reuse of mashups (mashups provide a quick and convenient means of combining data from two or more sources, either external or internal to an organization - example sources could be a sales performance report and a CRM system). With these new products (and they don't have to be used together), if you have existing data sources (internal and/or external) you can combine data into useful new representations in very little time and at very little cost. The GUI interface of the Mashup Center is very intuitive (program development skill is not a prerequisite for productive use of the product), and the product's flexibility is impressive: sources can include MQ queues and RSS feeds (among other things - including, of course, Cognos 8 reports via the Cognos 8 Mashup Service), and you can implement security controls that will govern the use of various mashups. Cool!

That's the  wrap-up of my day-one experience at IOD. I'll post day two information tomorrow.

 

Leave a comment below. If you don't see the Add a Comment button, please log in or register.

131 Views 0 Comments 0 References Permalink Tags: ibm, iod, information_on_demand, mashups, db2, z/os, data_warehouse, db2_9, cognos, drda, db2_x, analytics, master_information_hub, robert_catterall