CodePlexProject Hosting for Open Source Software

Suggested universal solutions extends the capability of standard Structured Query Language (SQL) by adding the Aggregate Product function, portable to any SQL-backed databases.

Entire “pure SQL” solution is encapsulated into a single query, portable to any SQL backed databases, for example, Microsoft Access or SQL server. Several solutions, which differ in terms of their performance-universality metrics, are presented and discussed. (Note:

In modern data-driven applications central database typically hosts a great deal of business logic, written in SQL. Pure SQL solutions have many practical advantages: they are native to any Database and as such, they are highly optimized and extremely effective in dealing with regular data sets; they are portable between all major commercially-available databases and are totally independent of the OS/platform. At least but not last: SQL was in place for many decades, and as such, its security vulnerability issues were pretty much identified and properly addressed.

Alternative data-centric solutions, which involve in addition to SQL some other programming language (like C/C++, C#, VB, Java, etc.) could potentially raise the portability, performance, maintainability and application security issues; many of them are platform dependent, some of them are less efficient (especially those which do not produce the native compiled code) and each of them will inevitably add more security concerns and limitations because of their known/unknown security vulnerabilities. It’s also relevant to mention that the application maintenance and upgrade are less difficult when it’s core solution is mostly based on a single highly standardized and portable language such as SQL instead of being programmed with extended set of different (and in many cases – platform dependent) programming languages. Having all these considerations, SQL is in the core of any data-centric application.

Standard SQL contains several aggregate functions (Sum, Count, Min, Max, etc) with pretty noticeable absent of aggregate Product. Just as a reminder, a Product function ∏ of multiple arguments (X1, X2 ,…, XN) is defined as:

N

∏ (Xi) = X1 * X2

i=1

Database engine cannot perform the aggregate product calculation directly, but it can calculate sums. Simple mathematical transforms provide a workaround enabling to compute the product ∏ by using the standard built-in mathematical Log(), Exp() and SQL aggregated Sum() functions; the core technique is illustrated by mathematical formulas (2) and (3):

Log(X1 * X2

N N

∏(Xi )= Exp ( ∑ (Log(Xi))) ………………………….………………..................................…….…………….…(3)

i=1 i=1

The last formula (3) could be programmatically translated into the equivalent SQL statement in a rather straightforward manner, enabling the calculation of aggregate Product by means of standard built-in SQL features.

There is a certain practical limitation of using formulas (2) and (3): they provide an accurate solution applicable to any set of positive non-zero values (Xi), but this solution can not be applied directly to the negative values and zero (mathematical 0). The issue is rooted in the properties of a Log(Xi) function, which does not provide a real solution for negative numbers, plus, theoretically it should return a negative infinite value for zero input. While negative infinite is a quite common mathematical abstraction, it presents a programming challenge when translating into machine code. In order to extend formula (3) to cover the full range of positive, negative and zero-equal values, couple additional mathematical transforms must apply, namely:

a). All negatives must be converted to positive numbers: standard Abs(Xi) function can do the job;

b). The sign of the final expression (aggregate Product) must be defined. It could be done, for example, by counting of all negatives, then the “+” sign shall be assigned to the final expression for the even and “-“ for the odd count of negatives.

c). The solution should implement a special handling of zeros. Quite ironically, though this case has rather simple logical solution (no computation actually needed: resulting Product must be 0 regardless of other row values) it causes some programmatic overhead, added to the SQL statement. There are couple solutions to this issue, which are discussed in greater details later in the article: one, though rather simple will cause additional methodical error rooted in the difference between “true negative infinite” in a pure mathematical sense and its approximate machine representation. Second solution will provide more accurate result by using the aggregate Min() function to detect zeros and, in case there any – to assign a zero value to the Product function output.

SELECT Exp(Sum(Log([Num]))) AS P FROM Table1

The statement shown in Listing 1 is a direct SQL equivalent of the formula (3); it does not include any exception handling caused by negatives or zeros, so any of such values entered in Table1 will cause the SQL procedure to fail with error message “Invalid procedure call” (the explanation for this issue was already given in a previous discussion dedicated to the property of Log() function). The statement could be modified with IIf() conditional operator added in order to handle zeros (see Listing 2):

SELECT Exp(Sum(Log(IIf([Num]=0,10^-306,[Num])))) AS P FROM Table1

The logic behind this statement is as follows: any time zero value is found the system instead of trying to calculate the Log(0) will replace zero value with extremely small positive number, for e.g. 10^(-306), close to machine 0. This simple approach has certain deficiency due to methodical error; the aggregate Product output is not exactly 0, moreover the actual result depends on other values in the rows. In many practical cases this error is relatively small and could fit the acceptable error budget. Fore more accurate calculation there is another solution with improved zero-value handling achieved through the use of conditional IIf() operator (see the corresponding SQL statement shown in the Listing 3):

SELECT Exp(Sum(IIf([Num]=0,0,Log([Num]))))*IIf(Min([Num])=0,0,1) AS P FROM Table1

Additional multiplier IIf(Min(Num)=0,0,1) is added to the SQL statement in order to accurately “nullify” (set to 0) the return value of aggregate Product function in case at least one zero value was detected in any row. The first conditional operator is also modified, excluding 0 values from further processing (otherwise they will be converted to a small number and then passed to a Log() function). This measure allows to increase the overall computational efficiency; performance boost could be quite noticeable in case there are multiple rows in the Table1 containing zeros.

Up to this point all discussed SQL solutions were able to calculate aggregate Product for positive numbers and zero. In the next section the universal solution will be introduced, extending the applicable range to cover negatives as well.

ANNUAL_PFM = (1+PFM1) *(1+ PFM2) *…*(1+PFM12) -1 .................................. (4)

Similar approach is applicable to the Quarterly Performance (Q_PFM), Semi-Annual, 3-, 5- and 10 years performance calculation, etc. Programmatically, all this type of compound investment performance calculations could use the same Aggregate Product function and differ only by CROUP BY clause. Following two examples will demonstrate this technique in regards to Annual and Quarterly performance calculation. Sample Table2 in the supplement database contains some hypothetical XYZ portfolio monthly performance data entered for two consequent years, 2004 and 2005 (please notice, that all data in this article and corresponding downloads is shown for demonstration purpose only and does not represent any actual financial information).

Investment performance is typically presented in a percent; here the numbers shown are stored as a relative values, provided that the conversion to percent could be done later on a different layers (mid-tier or presentation). In a business sense, positive numbers reflect monthly capital gain, negative numbers, correspondingly, capital losses. This example is chosen because of one rather important application specific feature: typically, monthly losses for the investment portfolios do not exceed 100%, which mathematically means that (PFMi)=>-1 and correspondingly: (1+PFMi)=>0. It is quite a distant probability, that monthly losses could reach 100%, which means a financial disaster (all invested capital is lost), but please be aware, that theoretically monthly losses could reach or even exceed 100%; for example, in case when investment portfolio contains short positions or some type of exotic high-risk derivates. Anyway, this type of discussion belongs to the rather different subject matter (finance and investment) and goes far beyond the boundary and original intention of this article, which should focused primarily on the programming technique of SQL aggregate Product function. Therefore, based on the pre-condition that (PFMi)=>-1 will have SQL statement as shown in Listing 4.

SELECT Year, Exp(Sum(IIf([PFM]+1=0,0,Log([PFM]+1))))*IIf(Min([PFM]+1)=0,0,1)-1 AS Annual_PFM FROM Table2 GROUP BY Year;

SELECT Exp(Sum(IIf(Abs([Num])=0,0,Log(Abs([Num])))))*IIf(Min(Abs([Num]))=0,0,1)* (1-2*(Sum(IIf([Num]>=0,0,1)) Mod 2)) AS P FROM Table1

This universal solution, encapsulated into this single SQL statement, contains three parts (multipliers). The first and the second are recognizable ones, which already have been discussed early in article (see Listing 3); they are capable of handling positive numbers and zeros. Third multiplier is added to take care of the sign by counting negative numbers, then dividing it by Modulo 2 and subtracting the doubled result value from 1. In simple words, it returns 1 for even count and -1 for odd count of all negatives, if any.

a). Restrict rows from having Null values; for e.g., by adding constrains to the Table schema definition and/or Field level Validation object.

b). Add Null-to-Zero conversion: NZ(Num) function can do the job. In this case any row containing Null will cause the aggregate Product function to return 0 regardless of the value in other rows.

c). Add Null-to-1 conversion; for example, use the expression: IIF (Num Is Null, 1, Num). In this case the aggregate Product function will return the product of all Non-Null row values.

SQL | T-SQL | Product Function | Product Function SQL | Microsoft Access | Database | Query | Aggregate Function | Group By | Log | Exp | SQL Server | RDBM | BI | Data Layer| Business Logic |

Last edited Feb 18, 2015 at 1:18 AM by DrABELL, version 5