July 2007 - Posts

Red color for negative numbers in Analysis Services 2005

Sometimes there is a simple way to solve an issue. For example, if you want to color all measures (including calculated measures) in red when the value is negative, you can simply write:

SCOPE( Measures.AllMembers );
    FORE_COLOR(THIS) = IIF( Measures.CurrentMember < 0, 255, 0);  // 255 = RED
END SCOPE;

 

MDX Script is so better than classical properties for each measure in these cases...




Cross-posted from SQLBlog! - http://www.sqlblog.com


The &amp;amp;amp;quot;by design&amp;amp;amp;quot; abuse

In the last months, I got the infamous "it's by design" answer for many bugs/issues/irrational behaviors I posted to Microsoft. For most of them I used the Connect web site, in a couple of cases I opened a formal incident to Microsoft support.

There are a lot of story (in and outside Microsoft) about the "it's by design" excuse and most of them are myths and urban legends. Unfortunately, today I received a notification about a bug closure with the "by design" comment that has the consequence to invalidate Microsoft documentation (ok, a small part of...).

The bug is related to SQL Server. One customer wrote me about a strange issue of the ODBC international date constant syntax {d 'yyyy-mm-dd'}. I never used that, since I prefer the yyyymmdd syntax, but Microsoft effectively suggests using this in the "Writing International Transact-SQL Statements" of SQL Server 2005 Books Online.

As a consultant, I was a little bit shamed because I never used it and I wasn't aware of the issues raised by changing the language settings (you know, this syntax should be used just because it's language neutral...). I double checked the whole thing (I always assume it's my fault before suspecting it's a product bug...) and then I posted this bug on the Connect web site. For the lazy of you, this is the repro code.


1:  USE tempdb
2:  GO
3:  
IF NOT EXISTS (SELECT * FROM sysobjects where name = 'ProblemDate') BEGIN CREATE TABLE [dbo].[ProblemDate](
4:  
[DateA] [datetime] NOT NULL ,
5:  
[DateB] [datetime] NOT NULL
6:  )
7:  
END
8:  
GO
9:  
IF NOT EXISTS (SELECT * FROM ProblemDate) BEGIN INSERT INTO [ProblemDate] ([DateA],[DateB])
10:  
VALUES ({ d '2004-09-15'} ,{ d '2004-03-30'}) END
11:  GO
12:  
SET language english
13:  go
14:  
-- The select run correctly
15:  
SELECT CASE WHEN ProblemDate.DateA > {d '2006-11-21'} THEN 'y' ELSE 'x' END
16:  FROM
ProblemDate
17:  
WHERE ProblemDate.DateB={d '2004-03-30'}
18:  GO
19:  
SET language italian
20:  GO

21:  
-- The same select, got an error
22:  
SELECT CASE WHEN ProblemDate.DateA > {d '2006-11-21'} THEN 'y' ELSE 'x' END
23:  FROM
ProblemDate
24:  
WHERE ProblemDate.DateB={d '2004-03-30'}
25:  GO

If you were able to say that line 22 will generate a date conversion error before running the code, I'd be ready to offer you a pizza everywhere you are in the world.

Now, it seems that I need to prepare a delivery of pizzas to Redmond, since the bug has been closed as "By Design". Unfortunately, there are no comments to explain such a decision.

I can make several hypotheses:

  • The bug has been fixed in a future release and someone pressed the wrong button closing the bug. That's the optimistic one.
  • The bug has been considered not a bug because I need to pay some pizzas to someone.
    • In that case, I will claim the pizzas from the documentation team, which has misguided those few people that RTFM :-)
  • The bug has been closed by someone that read one comment to the bug that pointed out a similar issue on the 'yyyy-mm-dd' syntax - but it is not a good explanation of the behavior that differs from documentation of the product.

I don't know. However, the issue I described is not the real problem; it's just an example to illustrate what happens on the Connect web site. In this case, the answer "by design" has been posted 3 month after the bug submission. Sometime I haven't seen an answer for many more months. However, it really depends on the team, there are other guys who typically answer in a few days (or some week in the worse case).

I really don't like writing this kind of posts, but I know that many people in Microsoft monitor blogs like mine and this is a message in a bottle for them. Guys, the whole Connect site isn't very useful if you will continue to answer with "by design" / "we don't have resource now" / "it's not on our plan for the next release (!)". The last one is my nightmare, because it means that this necessary feature/bug/whatever will be implemented/corrected in a production code that won't be released until 2011 (assuming that the 3 years gap between SQL Server releases will be maintained).

Posting on the Connect site has a time cost. To me (writing) and to Microsoft (reading and -sometime- answering). I typically post on Connect only after a search in newsgroups and web sites, only when I'm able to describe a repro of the issue. Until today, I submitted 44 comments (bugs and suggestions) for SQL Server 2005 - I'm not the first, but considering an average of 30 minute per post (and I'm not calculating the time to discover the bug of to recognize the need for a feature/change) it means 22 hours of work. Most of my posts are still active. Many don't have an answer, but I know that Katmai should have improvements covering some of my suggestions. A better update of these submissions would be appreciated...

I think that the Connect web site is a great resource, but it's not comparable to the kind of support that is offered by some ISV (I use other third-party tools related to SQL Server and .NET programming and they are years-light more responsive). I know, the small size of a company might help in being very efficient in customer support, but I can't believe that a company like Microsoft is not able to improve the Connect efficiency. In the last 7 days there were 48 new bugs and 31 new suggestions for the whole SQL Server platform (which includes SSIS, SSAS, SSRS, SSNS, SSBS). Are these numbers big? I don't know. I would say "no, they are not so big for a company like Microsoft".

Finally... I know, probably there are other priorities. But, in this case, please don't abuse the "by design" way to fix your bugs.




Cross-posted from SQLBlog! - http://www.sqlblog.com


Beware changing the Attribute Key

BI Developer Studio does a lot of automatic changes when you change something. Each dimension is stored in a separate file, but many of the dimension informations are copied into cube files. When you modify a dimension which is already used by one or more cubes in the same project, the editor automatically updates many of these informations.

Each automatic behavior has a cost. In this case, the issue is when you want to change the Attribute Key of a dimension. In reality it shouldn't be a common practice, but if you want to hide surrogate keys from the end users (see this previous post) then you might go in the wrong direction. After many tests, the most confortable way to do this is the following (we assume a Product dimension with an attribute key named Product):

  1. Rename the current Attribute Key, giving an attribute name which will be hidden to end user (i.e. ProductKey)
  2. Create the new Attribute with the desired application key (drag-and-drop it - suppose we have a ProductCode field).
  3. Rename the created Attribute (i.e. ProductCode -> Product)
  4. (optional) if the ProductKey attribute has a NameColumn defined, copy this definition in the new Product attribute (for the NameColumn property)
  5. (optional) set the NameColumn property of ProductKey attribute to None
  6. Move all the attribute relationships from ProductKey attribute to Product attribute, except the relationship between ProductKey and Product
  7. Change the Cardinality property of the Product relationship (under ProductKey attribute) to One (if your application key has the same cardinality as the surrogate key - this is not true in Slowly Changing Dimensions, for example)

At this point the attribute key is not changed - you only renamed it, but its internal key is the same and all references from the cube are still valid. If you try other routes (for example, I tried creating a new attribute for the surrogate key only, assigning to it the Attribute Key role in Usage property), related cubes might require a manual correction.

I'll be happy to get feedback on better ways to get this job done.




Cross-posted from SQLBlog! - http://www.sqlblog.com


DMVstats released

SQLCAT (Customer Advisory Team) has announced the release of DMVStats. It produces a database (calling it data warehouse is very ambitious) that contains snapshots caught through Dynamic Management Views (DMV) of SQL Server 2005. You can see database behavior from a new and useful perspective. It's an interesting idea and these kinds of applications are a good change to use many of the "real-time" features of Analysis Services 2005.

The next steps should be:

  1. An Analysis Services cube to analyze data
  2. A Data Mining analysis to analyze relationship between events

The second one might be an utopia today... but this is the first necessary step in that direction.




Cross-posted from SQLBlog! - http://www.sqlblog.com


Strange behavior with KEY0, KEY1, KEYn in SSAS

I just found that this query has different behavior (with breaking differences in result) with different builds of Analysis Services 2005.

This is the query (I used a bitmap because this query throws an error when I try to put it in text in Community Server blog - if someone want to check this other issue...):

With Analysis Services 2005 SP1 (build 9.00.2047) we get the expected result:

 

When we move to Analysis Services 2005 SP2 (I tested with builds 9.00.3050, 9.00.3054, 9.00.3152) you get a different number of columns: the KEY0 keyword returns one column for each field that is part of a composite key, instead of only the required part.

The good news is that the build 9.00.3175 (that you can obtain asking to PSS) returns to the right behavior of build 9.00.2047.

I looked for other information about this bug on the web but I haven't found anything. This feedback seems very similar but it still doesn't work with build 9.00.3175, thus it isn't the same.

UPDATE: this issue is described in KB938585 that is part of KB936305 (that refers to build 3175). Thanks to Andrea for pointing me to the existing articles.

Long life to TableDifference

It seems that Microsoft will not improve the slow SCD component that is part of SSIS (at least, this is what I understand from this Microsoft Connect answer). This means that TableDifference has a long life because it will be useful even in SQL Server 2008.

The versione 1.2 beta of TableDifference still needs some test because there is a latent bug on blob fields processing (in some multithreading scenarios).




Cross-posted from SQLBlog! - http://www.sqlblog.com


Surrogate key issues with Analysis Services

Usually I don't post something just to raise a question - but every rule has its exceptions.

As you know, using surrogate keys is a best practice for a lot of reasons. Everything works fine with SSAS when you use surrogate keys with a plain star schema. If you have an incremental process of the dimensions in the Data Warehouse, you are also granted that surrogate keys don't change their internal meaning during time. But, are you sure you want to use surrogate keys in end user reporting tool?

The problem arises when you have a daily reprocess of the whole OLAP database and of the whole relational data mart (or data warehouse, if you prefer). A complete reprocess is not so strange if you have less than 10 GB of data. I've seen the same technique applied to data mart with tens of millions of rows in the fact table and 20/30 GB of data (the process can be 3/4 hours each night).

Each dimension that makes the granularity attribute visible to end users, also exposes the surrogate key. Even if you change the Name property of the attribute, its Key still maintains the surrogate key. When the end user creates a pivot table with Excel 2007 or a report with ProClarity, he never see the surrogate key, but he save a document that contains that key and that will use this key to query the cube next time.

The user might open a query saved some days/months ago and... He doesn't see the same members, just because the same surrogate keys got a different meaning. If surrogate key are generated each night, this happens daily. Even with an incremental data mart you might have the need to reprocess (on the relational side) one or more dimensions in the future.

Now, the issue is both practical and philosophical. From the practical point of view, the user rants he lost his reports. From the philosophicalside, when you save a surrogate key in a report, you are giving a semantic value to something that probably shouldn't have one.

With SSAS and pivot tables this is a practical issue that may be approached in several ways. I'm studying what is the better balance between usability, maintainability and performance. But I'd like to get some comments about this issue that probably is very common. I did some search, but I haven't found a good discussion focused on the issues related to Analysis Services client tools.

Give me your feedback!




Cross-posted from SQLBlog! - http://www.sqlblog.com


The clustered index, the bulk insert and the sort operation

A few months ago I wrote about a SSIS setting you have to use just to get better performance when you (fast) load data into a table with a clustered key. I and Alberto met Stefano Stefani at SQL Server Conference 2007 (an Italian conference where we were all speakers) and we got the most direct Microsoft support service you can imagine, provided in our own native language, by one of the guy who writes the query optimizer.

The story is: when you have a clustered index, a bulk insert operation is really fast only when you insert data sorted with the same order as the clustered index expression. The same is true for fast load operations with SSIS (and DTS too). We noted that with an identity column used as clustered index, you don't get really fast insert. In this case, SQL Server doesn't realize that data are already sorted and it sorts them one more time.

Alberto explains very well this behavior in his new post. The bad news is that today you have to rely on some trick and/or workaround, as the dummy ORDER BY operation showed by Alberto. The good news is that a future CTP of Katmai will work without requiring these tricks.




Cross-posted from SQLBlog! - http://www.sqlblog.com