I had this issue at the client.
Imagine that you have to model a datwarehouse for an hotel group.
In the operationnal system, you get entries like this:
|1/01/2010||10/01/2010||Ralph Kimball||1800||9 days|
|3/06/2012||15/07/2012||Jose garcia||20976||44 days|
My first error was to store the data like in the operationnal system and when the client asked me, statistical on the room occupation by month, I was not able to found the correct value because I had only 2 links with the dimension date that were From_Date_ID and To_Date_ID.
The customer ‘Ralph kimball’ is not causing any problem for statistics by month but customer ‘Jose garcia’ is :
I have to calculate the occupation rate by month:
- For Customer Ralph Kimball there is no problem, it’s 9 for January
- For Customer José Garcia, I have to split the period like this : 30 days in June and 7 days in July! And as you see in the table bellow, I only have one value for the journey that is 44 and 2 dates linked to it (from and to). So If I Filter on from date for january, I will get 44 and same for To date!!!! and of course, that’s not what we want.
So As I don’t have the split by month, except by doing very complicated calulations, it’s not possible to get the correct result. It means that if the customer stays a year, the amount of days he stays will be summed to month (from–> begin of the period) or to the month (to –> end of the period).
The solution of this architecture issue is to change the granularity of your fact table and not to store periods in it but a split of each journey by day.
Then you will have as many rows in your data warehouse as days in each journey.
The query of the split will be something like this:
select From, Date, To, Name , Amount
from journey inner join
Date_dimension on From <= Date and Date >= To
This will be the result for the first journey of the person ‘Ralph Kimball’: