Project Description
Suggested universal solutions extends the capability of standard Structured Query Language (SQL) by adding the Aggregate Product function, portable to any SQL-backed databases.

Entire “pure SQL” solution is encapsulated into a single query, portable to any SQL backed databases, for example, Microsoft Access or SQL server. Several solutions, which differ in terms of their performance-universality metrics, are presented and discussed. (Note: refer to the working DEMO: Access Database file, attached to this project, containing sample data and queries, corresponding to the Listing 1...Listing 6)

1. Key concept and definitions

SQL_Product_Function.jpg
In modern data-driven applications central database typically hosts a great deal of business logic, written in SQL. Pure SQL solutions have many practical advantages: they are native to any Database and as such, they are highly optimized and extremely effective in dealing with regular data sets; they are portable between all major commercially-available databases and are totally independent of the OS/platform. At least but not last: SQL was in place for many decades, and as such, its security vulnerability issues were pretty much identified and properly addressed.

Alternative data-centric solutions, which involve in addition to SQL some other programming language (like C/C++, C#, VB, Java, etc.) could potentially raise the portability, performance, maintainability and application security issues; many of them are platform dependent, some of them are less efficient (especially those which do not produce the native compiled code) and each of them will inevitably add more security concerns and limitations because of their known/unknown security vulnerabilities. It’s also relevant to mention that the application maintenance and upgrade are less difficult when it’s core solution is mostly based on a single highly standardized and portable language such as SQL instead of being programmed with extended set of different (and in many cases – platform dependent) programming languages. Having all these considerations, SQL is in the core of any data-centric application.

Standard SQL contains several aggregate functions (Sum, Count, Min, Max, etc) with pretty noticeable absent of aggregate Product. Just as a reminder, a Product function ∏ of multiple arguments (X1, X2 ,…, XN) is defined as:

N
∏ (Xi) = X1 * X2 …. XN …………………………..…………………............………………….......…………..(1)
i=1

Database engine cannot perform the aggregate product calculation directly, but it can calculate sums. Simple mathematical transforms provide a workaround enabling to compute the product ∏ by using the standard built-in mathematical Log(), Exp() and SQL aggregated Sum() functions; the core technique is illustrated by mathematical formulas (2) and (3):


Log(X1 * X2 …. XN) = Log(X1 ) Log(X2) … + Log(XN) …….………........….………….……(2)

N N
∏(Xi )= Exp ( ∑ (Log(Xi))) ………………………….………………..................................…….…………….…(3)
i=1 i=1

The last formula (3) could be programmatically translated into the equivalent SQL statement in a rather straightforward manner, enabling the calculation of aggregate Product by means of standard built-in SQL features.

There is a certain practical limitation of using formulas (2) and (3): they provide an accurate solution applicable to any set of positive non-zero values (Xi), but this solution can not be applied directly to the negative values and zero (mathematical 0). The issue is rooted in the properties of a Log(Xi) function, which does not provide a real solution for negative numbers, plus, theoretically it should return a negative infinite value for zero input. While negative infinite is a quite common mathematical abstraction, it presents a programming challenge when translating into machine code. In order to extend formula (3) to cover the full range of positive, negative and zero-equal values, couple additional mathematical transforms must apply, namely:

a). All negatives must be converted to positive numbers: standard Abs(Xi) function can do the job;

b). The sign of the final expression (aggregate Product) must be defined. It could be done, for example, by counting of all negatives, then the “+” sign shall be assigned to the final expression for the even and “-“ for the odd count of negatives.

c). The solution should implement a special handling of zeros. Quite ironically, though this case has rather simple logical solution (no computation actually needed: resulting Product must be 0 regardless of other row values) it causes some programmatic overhead, added to the SQL statement. There are couple solutions to this issue, which are discussed in greater details later in the article: one, though rather simple will cause additional methodical error rooted in the difference between “true negative infinite” in a pure mathematical sense and its approximate machine representation. Second solution will provide more accurate result by using the aggregate Min() function to detect zeros and, in case there any – to assign a zero value to the Product function output.

2. Programming Technique: Math-to-SQL translation

Simplified Aggregate Product of all positive numbers: This simple yet practical example will demonstrate the SQL programming technique enabling to calculate the Product of all positive numbers {2, 4, 5, 7, 8} stored in a Microsoft Access Table1. Based on the precondition that there are no any negative values, simple SQL query can do the job of calculating Product (see its corresponding SQL statement shown in the Listing 1): Just run this query to get the accurate result of 2240 (∏ = 24578 = 2240).
LISTING 1
SELECT Exp(Sum(Log([Num]))) AS P FROM Table1

The statement shown in Listing 1 is a direct SQL equivalent of the formula (3); it does not include any exception handling caused by negatives or zeros, so any of such values entered in Table1 will cause the SQL procedure to fail with error message “Invalid procedure call” (the explanation for this issue was already given in a previous discussion dedicated to the property of Log() function). The statement could be modified with IIf() conditional operator added in order to handle zeros (see Listing 2):
LISTING 2
SELECT Exp(Sum(Log(IIf([Num]=0,10^-306,[Num])))) AS P FROM Table1

The logic behind this statement is as follows: any time zero value is found the system instead of trying to calculate the Log(0) will replace zero value with extremely small positive number, for e.g. 10^(-306), close to machine 0. This simple approach has certain deficiency due to methodical error; the aggregate Product output is not exactly 0, moreover the actual result depends on other values in the rows. In many practical cases this error is relatively small and could fit the acceptable error budget. Fore more accurate calculation there is another solution with improved zero-value handling achieved through the use of conditional IIf() operator (see the corresponding SQL statement shown in the Listing 3):
LISTING 3
SELECT Exp(Sum(IIf([Num]=0,0,Log([Num]))))*IIf(Min([Num])=0,0,1) AS P FROM Table1

Additional multiplier IIf(Min(Num)=0,0,1) is added to the SQL statement in order to accurately “nullify” (set to 0) the return value of aggregate Product function in case at least one zero value was detected in any row. The first conditional operator is also modified, excluding 0 values from further processing (otherwise they will be converted to a small number and then passed to a Log() function). This measure allows to increase the overall computational efficiency; performance boost could be quite noticeable in case there are multiple rows in the Table1 containing zeros.

Up to this point all discussed SQL solutions were able to calculate aggregate Product for positive numbers and zero. In the next section the universal solution will be introduced, extending the applicable range to cover negatives as well.

3 Real-life example: investment performance calculation

The problem and resolution. This simple real-life example related to the investment performance calculation will serve both practical and didactical purposes, demonstrating the application of the SQL aggregate Product programming technique. Based on the commonly accepted practice, the compound annual performance (ANNUAL_PFM) of the financial investment portfolio could be calculated on the basis of its monthly performance array PFMi (i=1…12) as following:

ANNUAL_PFM = (1+PFM1) *(1+ PFM2) *…*(1+PFM12) -1 .................................. (4)

Similar approach is applicable to the Quarterly Performance (Q_PFM), Semi-Annual, 3-, 5- and 10 years performance calculation, etc. Programmatically, all this type of compound investment performance calculations could use the same Aggregate Product function and differ only by CROUP BY clause. Following two examples will demonstrate this technique in regards to Annual and Quarterly performance calculation. Sample Table2 in the supplement database contains some hypothetical XYZ portfolio monthly performance data entered for two consequent years, 2004 and 2005 (please notice, that all data in this article and corresponding downloads is shown for demonstration purpose only and does not represent any actual financial information).

Investment performance is typically presented in a percent; here the numbers shown are stored as a relative values, provided that the conversion to percent could be done later on a different layers (mid-tier or presentation). In a business sense, positive numbers reflect monthly capital gain, negative numbers, correspondingly, capital losses. This example is chosen because of one rather important application specific feature: typically, monthly losses for the investment portfolios do not exceed 100%, which mathematically means that (PFMi)=>-1 and correspondingly: (1+PFMi)=>0. It is quite a distant probability, that monthly losses could reach 100%, which means a financial disaster (all invested capital is lost), but please be aware, that theoretically monthly losses could reach or even exceed 100%; for example, in case when investment portfolio contains short positions or some type of exotic high-risk derivates. Anyway, this type of discussion belongs to the rather different subject matter (finance and investment) and goes far beyond the boundary and original intention of this article, which should focused primarily on the programming technique of SQL aggregate Product function. Therefore, based on the pre-condition that (PFMi)=>-1 will have SQL statement as shown in Listing 4.
LISTING 4
SELECT Year, Exp(Sum(IIf([PFM]+1=0,0,Log([PFM]+1))))*IIf(Min([PFM]+1)=0,0,1)-1 AS Annual_PFM FROM Table2 GROUP BY Year;

4. SQL Aggregate Product function: Universal solution

Universal SQL aggregate Product function, capable of handling any positive, negative numbers and 0 (in other words, any real numbers) is shown in the Listing 6;
LISTING 6
SELECT Exp(Sum(IIf(Abs([Num])=0,0,Log(Abs([Num])))))*IIf(Min(Abs([Num]))=0,0,1)*
(1-2*(Sum(IIf([Num]>=0,0,1)) Mod 2)) AS P FROM Table1

This universal solution, encapsulated into this single SQL statement, contains three parts (multipliers). The first and the second are recognizable ones, which already have been discussed early in article (see Listing 3); they are capable of handling positive numbers and zeros. Third multiplier is added to take care of the sign by counting negative numbers, then dividing it by Modulo 2 and subtracting the doubled result value from 1. In simple words, it returns 1 for even count and -1 for odd count of all negatives, if any.

4.1. Special handling of Null

The solutions for aggregate Product function discussed above, including the universal one, do not implement the implicit Null-to-value conversion, so any row containing Null will cause the procedure to fail, displaying the error message in Access database as “Data type mismatch in criteria expression”. This issue could be easily resolved by adding Null handling procedure, but the resolution could be application specific. Variable scenarios are listed below.

a). Restrict rows from having Null values; for e.g., by adding constrains to the Table schema definition and/or Field level Validation object.

b). Add Null-to-Zero conversion: NZ(Num) function can do the job. In this case any row containing Null will cause the aggregate Product function to return 0 regardless of the value in other rows.

c). Add Null-to-1 conversion; for example, use the expression: IIF (Num Is Null, 1, Num). In this case the aggregate Product function will return the product of all Non-Null row values.

4.2. Performance-Universality consideration

In a context of this section performance means a speed of computation; not to be confused with investment performance discussed above. As the programmatic complexity of the universal solution (Listing 6) is higher than the simplified versions (Listing 1 and 3), then the performance of the universal SQL query will be lower comparing to that simplified ones. This is a classical example of performance-universality dilemma; decision should be made based on the compromise between these two competing metrics when choosing a solution to be adequate to the particular application.

Summary

Suggested approach extends the capability of standard SQL by adding aggregate Product function. The whole solution is encapsulated into a single query, portable to many other SQL backed RDMB, for example, Microsoft SQL server; some syntactical differences could apply. Three versions, which differ in terms of their performance-universality matrix, are presented and discussed in details.
OTHER ARTICLES BY AUTHOR:
  1. Embedded YouTube™ video player
  2. WebTV pilot project

SQL | T-SQL | Product Function | Product Function SQL | Microsoft Access | Database | Query | Aggregate Function | Group By | Log | Exp | SQL Server | RDBM | BI | Data Layer| Business Logic |

Last edited Feb 18, 2015 at 2:18 AM by DrABELL, version 5