Friday, June 28, 2013

OLAP with Mondrian

If you ever wanted to do OLAP analysis, first thing that would needed to be done is selection of the tools. There are several good tools out there but I would like to write a little bit about Mondrian OLAP as it comes in form of open source community edition and commercially supported project.

Pentaho Analysis Services Community Edition also known as Mondrian, is an Online Analytical Processing (OLAP) server that enables business users to analyze large quantities of data in real-time. The need to analyze the data comes from data feeds that are detrimental for successful planning and strategies when making decisions about past, present and future sales/expenditures. A great introduction to Mondrian can be found on Pentaho's site so I will skip some of the explanation and get to the point of describing why I liked this product.

To begin with, Mondrian concept is based on ROLAP which means that it draws its data from relational database and it uses relational model for its operations. A good comparison between different concepts can be found here (MOLAP, ROLAP, And HOLAP). To draw on one segment of description, ROLAP methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.

A few other concepts that you might want to understand before using Mondrian are:

  1. You will use MDX query language to work with your data. Multidimensional Expressions (MDX) is a query language for OLAP databases, much like SQL is a query language for relational databases. It is also a calculation language, with syntax similar to spreadsheet formulas. You can find a few online tutorials to learn about it and if you have tools like JPivot or Pivot4J, you will only need a starting MDX query where olap4j and JPivot will build up queries based on your navigation through UI.
  2. olap4j is a common API for any OLAP server, so you can write an analytic application on one server and easily switch it to another. Built on that API, there is a growing collection of tools and components. If you ever wanted to write a custom API do work directly with Mondrian, you would use this library.
  3. Mondrian schema is a single XML file where you will create OLAP constructs like Cubes or Dimensions. You will load the schema when connecting to Mondrian and you will write your MDX queries to be executed against this schema.
  4. Mondrian comes with two tools, Mondrian Schema Workbench, that you will use to build schema, and Aggregation Designer, that you will use to create aggregate tables based on your MDX query.
  5. There are two front ends for Mondrian, JPivot (older, fast and more stable) and Pivot4J (newer, with JSF, but slower).
Edit: I did view Pivot4J and Analytics UI as one and that would not be completely fair to say that Pivot4J core is slower than JPivot as it is comprised from two parts. Anyway, many thanks to Xavier who is lead on the Pivot4J project for explaining the difference and for quickly addressing some of the issues that I had. I feel that Pivot4J will be great product considering that it started less than a year ago.

Schema Workbench JPivot Table Drill Through Table

Now that we understand some basic concepts, here are a couple of pointers on how to start building your OLAP solution.

First thing is that you will probably want to create a separate web project just for OLAP purpose as depending on your facts table size, you might need a lot of memory. Mixing Mondrian with your application can unnecessarily slow things down. Mondrian (used with Java 5 and up) has the ability to report high memory consumption and gracefully shut down the query before exception is thrown (well, exception is still thrown but not OutOfMemoryError exception).

You can create link back to the OLAP servers and integrate them using IFrames or popups, for example (making sure that you have SSO, of course). If you are using JPivot I would not worry too much about integration with JSF as it comes with WCF (Web Component Framework). I found it a bit difficult to integrate with JSF anyway and hence the need for new project Pivot4j with full JSF support.

Next item on the list is creating a solid fact table. Mondrian schema will use one fact table per Cube. Designing a good fact table means creating proper foreign keys and measure columns and assigning adequate indexes. It also means thinking about implementation and implementing, for example, Star or Snowflake schema. Generally, having too many branches of tables will be more complex to work with (two tables per dimension should be good enough, and if you have more, just create views as needed) as modeling in current workbench will require you to work directly in XML (more than one join). Fact table should have basic dimensions where you will then create hierarchies and levels. Mondrian schema is pretty flexible, so it can work with tables and views, and allows for many complex constructs, but for maintenance purpose, aggregating data to appropriate level will prove easier to design and faster to use. Aggregation Designer is used for that purpose. For OLAP, speed and flexibility are important concepts and we should try to optimize things as much as possible.

If we take FoodMart example, we can analyze a schema and potential usage of data. So, lets examine Sales Cube for example. If we open FoodMart.xml in Schema Workbench we can then expand the Cube Sales node.

We will find following sub-nodes:

  1. Fact Table - with several aggregate commands defined. This will shape the fact table to desired cube. 
  2. Dimension Usage - these are dimensions that you will share between cubes and this tag allows you to construct dimensions outside of the cube and reference it in the cube. This increases re-usability.
  3. Measures - like sales, cost etc. These are going to be numbers that you will work with. This will give you different perspective on performance using different hierarchies/dimensions combinations.
  4. Calculated Memebers - these do not exist in the fact table but are calculated from existing measures.

If we get into one of the dimensions (e.g. Promotions), we will find that each dimension has link to fact table using a foreign key column. Usually, dimension will have only one hierarchy unless we can link directly to two or more tables from the fact table foreign key columns. Please note that we can also create a dimension from the fact table if more data is stored in the fact table (other than measures and keys). Each hierarchy has a primary key that links this hierarchy with a dimension. Dimension foreign key matches hierarchy table primary key. Each hierarchy must have a table, view or a join. Each join can be nested e.g. (T-J(T-J(T-T))). Each hierarchy has to have one or more levels. Each level can have property (like employee id, has name which is property). Each level represents one column (unless it has properties which can expand this). You will need to specify a column type. When using joins, you can create aliases for easier navigation. For usage on many other XML elements, you can check this page.

Sample MDX:
select {[Measures].[Unit Sales], [Measures].[Store Cost], [Measures].[Store Sales]} ON COLUMNS,
  Hierarchize({([Promotion Media].[All Media], [Product].[All Products])}) ON ROWS
from [Sales]
where [Time].[1997]

When starting to work with a schema, you will be able to put dimensions on columns and rows in addition to the measures. Once you start drilling into dimensions, a different breakdowns will appear as a combination of different levels and hierarchies. With each level you can get different measures. You will be able to change positions on rows and columns until you are happy with the data presented.

Designing meaningful information for your business is a key to success. You should be able to use this information to better understand current sales and predict future trends. Once you are at acceptable level you can drill down into data where Mondrian will display appropriate detailed information based on selected levels and slices (time in this example). This would be a report level. To achieve this level of flexibility without an OLAP, you would need to build many different reports.

Mondrian also has the ability to cache the data and this cache is maintained per connection. There are many options that you can set using mondrian.properties file which can greatly improve the performance.

I hope that this gives you a brief overview of what you can accomplish with the Mondrian. If you are looking for a more detailed explanation, please reference Pentaho documentation as I found it very extensive and helpful.

If you have any suggestions, please send me a note.

Regards

Friday, April 19, 2013

Inserting data into DB2 from CSV file (using Ruby)

I recently had to import some data into DB2 and I came up with this little program to help me out. I had to search the Internet for a few hints and finally came up with this code that was useful to me. I hope that it will help anyone that needs the same thing. I am pretty sure that this can be accomplished even easier, so if you have a suggestion, please let me know.

require 'rubygems'
require 'mswin32/ibm_db'
require 'csv'

db = true
csv_text = IO.read("input.csv").encode('iso-8859-1','utf-16', :invalid=>:replace, :undef => :replace, :replace => '')
csv_text = csv_text.gsub(/'/,"''")
csv = CSV.parse(csv_text, :headers => true)
statements = []
csv.each do |row|
  map = row.to_hash
  k = map.keys.join(",")
  v = map.values.join("','")
  statements << "insert into sch.tab(#{k}) values('#{v}');".gsub(/''/,'null')
end

if db
  conn = IBM_DB.connect("DRIVER={IBM DB2 ODBC DRIVER};DATABASE=db2;\
                       HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;\
                       UID=user;PWD=password;", "", "")
                       if conn
                         puts "We're connected!"
                         data = ""
                         begin
                           statements.each do |row|
                             data = row
                             run = IBM_DB.prepare(conn,row)
                             IBM_DB.execute(run)
                           end
                           IBM_DB.commit(conn)
                         rescue Exception => msg  
                           puts "Inserting #{data} with error #{msg}"
                           IBM_DB.rollback(conn)
                         end  
                         IBM_DB.close(conn)
                         puts "Done!"
                       else
                         puts "There was an error in the connection: #{IBM_DB.conn_errormsg}"
                       end
else
  puts statements
end 

Before you use this you will need to install appropriate gems.

gem install activerecord
gem install ibm_db

If you are behind the proxy, do not forget to set it: SET HTTP_PROXY=http://user:password@ip-address:port

Thanks

Monday, March 25, 2013

IBM BPM 8


IBM has acquired Lombardi Software in January 2010 and integrated their excellent BPM into a new solution called IBM BPM. IBM BPM comes in three variants: Express, Standard and Advanced. They are different by licensing and products that are included into them. Express comes with limited users and deployment options with Lombardi part only, Standard comes with Lombardi part of BPM and Advanced are integrated with Lombardi and WPS that has advanced service integration, BPEL and ESB.

If the company already has services exposed directly or by already existing ESB, Standard version of the BPM should be sufficient.

BPM standard comes with WAS 8 and can be easily installed. BPM cannot be downloaded and has to be procured from IBM support even for the testing purposes. One additional item is that DB2 Express is needed to be installed and databases created before the installation. BPM needs three databases for start: BPMDB (Process Server database), CMNDB (Common database for Business Process Manager Advanced), PDWDB (Performance Data Warehouse database). You can rename them during the installation. Here is the link  with the instructions for setting this up quickly.

After the installation, user will be able to connect to different consoles. First item would be to connect to Process Center Console and download the Process Designer. This is Eclipse based tool that will help you to design your processes. Please note that when installing BPM, you need to specify the address that is not localhost as other developers will not be able to connect to the computer to download Processes Center and later connect to your process repository. Since you will pay for every component of BPM, this is important.

These are the consoles that would be available after the installation:

  1. Process Center Console (this console has similar layout as Designer but you cannot design a process here, it is used only to view and deploy process apps)
  2. Process Admin Console  (in this console you can set users, groups and use process monitoring)
  3. Process Portal Console (this is user related console and in here you can see yours or your team reports and process user tasks)
  4. REST API integrated test console (used for testing purposes and very useful during the application development)

Once connected, you will be able to start building your first process.


IBM BPM supports following constructs in Process Designer (not all of them are listed here):

  • Business Objects (You can use this item to pass your data around BPM and also use it in input/output variables)
  • BPD's (This is your business process definition)
  • Coach View (If you use Coaches for working with user tasks, this is where you will develop UI)
  • Decision Service (Used as a flow for the business rules)
  • Event Subscription (Used when registering to events in other repositories outside of BPM. You can then trigger a service in BPM environment and process application)
  • Exposed Process Value (Used when you need to expose values outside of you process and not only as input/output)
  • External implementation (Used to connect to external system in more generic way, but usually using WS calls)
  • General System Service (This will be used a lot as a bridge between different constructs listed in here. It is important to say that this is one of the ways to use UCAs)
  • Human Service (Used to define Coaches flow and link different screens created with Coach View. Please note that this can be done from Human Task from BPD using double click to automatically access default Service)
  • Integration Service (Used when you need to integrate external/internal services like WS, Java, Document repository or invoke UCAs)
  • Participant Group (Used to create groups of people/roles that can access certain resource. You will have only default groups defined when you start a new process, so you can use this to create other grouping structures for security or assigning to lanes when using BPD)
  • Service Level Agreement (Very useful construct. We can track execution of the process and act if we breach some SLA. We can then invoke a service, send an email or do something similar. It is very useful for reporting, too.)
  • Tracking Group (Used to define tracking parameters that can be later related to in BPD)
  • Undercover Agent (UCA) (Used for intermediate events or start process events)
  • Web Service (Used to expose General System Service, Integration Service or Service as a Web Service. Ajax is used and security can be enabled. Very customizable)

Note1: Every time you are exposing something outside of BPM, you will need to setup users/groups that can see/work with this resource. Do not forget this or otherwise, you will not be able to use exposed values.
Note2: I have left out a few of the other constructs like tracking, scenarios or file manipulation out of this list as I did not need them at this time, but you can find help for these and other items on IBM support site.

Depending if you choose that your solution becomes BPM centric or not, you may want to use Coaches to interact with the user, but in case that you already have a front end or you intend to build one, you can use the great support that IBM offers through Web Services (SOAP/REST). By using integrated REST API console, you can see the parameters for the process, and test it before you integrate it with your application.

Overall, my experience with IBM BPM was very satisfactory where BPM excelled in almost every aspect. IBM made a good decision when they acquired this product. Nevertheless, keep in mind that with a good product, comes a big price :).

I hope that this gives you some overview in how to start with IBM BPM 8. Here is also an eBook that might help you with first steps: BPM for Dummies.

Thanks

Wednesday, February 13, 2013

DB2 database setup

Did you ever have to setup DB2 on your local machine? You will find that if you have domain user, things are not as straight forward as they might seam. Here is a little help for DB2 Express 10.


I was using Windows 7 64 bit for this.

If you are logged in domain and what you found on IBM help site does not work, you might try this (You will need to run Command window - Administrator):

1. Create same username as domain username using local account.
2. Create new group DB2ADMNS
3. Add all usernames (domain,local and db2admin) to Administrators and DB2ADMNS
4. Run db2set DB2_GRP_LOOKUP=LOCAL,TOKENLOCAL
5. Run DB2STOP
6. Run DB2 UPDATE DBM CFG USING SYSADM_GROUP DB2ADMNS
7. Run DB2START
8. And now, the important step that I always forget, logoff and logon with your username. You should be able to do administration tasks now.

You can now create database:
attach to db2 user <username>;
create database <dbname>;
connect to <dbname>;
create bufferpool bpool8k size 20000 pagesize 8k;
create tablespace tbsp_8k pagesize 8k bufferpool bpool8k;

That's it. You can use your favorite tools now to create tables and run scripts.

Regards

Saturday, February 02, 2013

JDeveloper overview

Until recently, I was never a big fan of the visual tools like JDeveloper concerning Web UI development, but my recent exposure has changed my mind. Being a hard core developer was always something that had to go along with Java development or at least I though so. Great platform, but too much choice and somewhat looser restrictions in terms of what needs to be used and in which way (just remember trying to implement Web Services before JEE 5 with different tools...). JEE is only defined by its specification and it is a recommendation to be followed by many (see Wiki). Even then, some things are easily done but others are not.

JDeveloper (version 11.1.2.3.0) is, I have to say, a complete tool for the application development especially if you are developing for WebLogic server   (server is integrated into JDeveloper as test platform). You can of course use other vendors too and there are half a dozen of them. ADF Fusion comes certified for the WebLogic, WebSphere, Tomcat and JBoss (look here). We were building an ADF Fusion Application (so primary components like ADF Faces and ADF BC) and I have to say that I was impressed by the quality and the speed of the application that I could write in about 30 minutes. There is no way to achieve that kind of productivity with most other JSF related tools. There are some restrictions that you might not like (like overuse of XML and proprietary format of the JDeveloper for ADF part of the application) but overall, there is a lot of help available on the Oracle site. After a while you will get used to thinking XML and debugging XML (at first I though, how in the world can I debug XML...). You can still keep faces-config but moving to ADF implementation (adfc-config) offers much more choice and it is default for the Fusion framework.

In case that you are considering this tool, here are a few benefits that I could come up with:

  1. Full JEE support for the projects including EJB, WS, ADF BC, ADF Faces, JPA and other (and when I say other I mean that it has more than 170 types of items that you can create in different projects).
  2. One of the biggest advantages is that it offers a full RAD environment regarding JSF applications and it is pretty much the only tool to do so. This however comes with a price of needing to expose model as Data Control to fully utilize truly rapid creation of the visual components (and this is for the elements not coming from ADF BC).
  3. Even though I would consider overuse of XML something that is definitely not a benefit, exposing a ton of values through properties is one of the things that this approach has enabled. You can still expose Java methods for every control should you need it. Once you understand a bit on how exactly internals of the BC are working, it will not be to difficult to create effective controls.
  4. Drag and drop functionality for JSF (ADF Faces) works like a charm. Very precise, looks the same way in the design as it will be in the run-time and has a ton of options over JSF.
  5. Tool itself has a very good Database Navigator and can display UML diagrams, database table relationships, ADF flow and bunch of other productivity features.
  6. For ADF Faces, it has more than 150 components that you can use.
  7. Creating WS is super easy and exposing it through the controls is just a few clicks away.
  8. Productivity features for LOVs based on ADF BC are simply amazing. You can create drop down lists and search dialogs that are very usable with only few clicks and few lines of code.
  9. Help for the usage of the product is outstanding. I found that when trying to use property or already existing control, it is very easy to find some help by not even searching and only using F1.
  10. Adding new field for already existing database control is very easy and it takes only few minutes to update controls in the model and in the UI layer.
  11. It has a really nice support for mobile devices where you can create the application for desktop and mobile with just a few minor changes and a thoughtful design.
  12. A great control over task flows that you can visually develop (adfc-config). A lot of extensions over faces-config.
  13. JDeveloper site has Learn More tab where you can find a lot of examples in the PDF format or video.

Of course, no tool comes perfect, so here are few items that I did not like:

  1. Even though it is a free tool, to deploy ADF Fusion project you need to pay for the license if you do not already have one for WebLogic server.
  2. Tool uses a lot of memory to operate.
  3. Sometimes can be very slow (especially if you are running it on 32-bit machine with database deployed on the same machine).
  4. Debugging in XML is not something I am used to and in some cases it is a bit difficult, even though you have ADF Data and Structure tabs for debugging to see what ADF variables are and you have EL evaluator (it seems to me that too much data is proxied and used through the reflection).
  5. ADF Faces components are based on the Apache MyFaces Trinidad controls and that sometimes makes it a bit hard to extend (successful in the end, but it took some time). At least I found it to be very complex to understand models behind the controls. You can try for example, to create custom data-table that is based on the WS call and not database control (ADF BC) and that has ability to filter, sort and paginate, and you will understand what I mean.
  6. Pagination in the data table is something that I personally do not like as it implements scrollable set (Idea sounds cool, but when you try customizing it, it is a hell to synchronize).
  7. Help for creating custom controls is not so great. I had some difficulties finding good samples of the things that I tried to accomplish.

Overall, for a big projects and a lot of team members where streamlined development is a must, JDeveloper will come as a great recommendation. It is a balanced tool that is great if you want use things out of the box, but if you need to customize it a lot or use components from other vendors, I would not recommend it.

To help you further, I can recommend a few resources:

Oracle ADF Real World Developer's Guide from Jobinesh Purushothaman is a great book that is offering insight into inner workings of the Oracle ADF framework. This is not a beginners book and you need to know some basics to fully benefit from reading it. You can get the book here.


Quick Start Guide to Oracle Fusion Development: Oracle JDeveloper and Oracle ADF from Grant Ronald if great introductory book that you might want to read before you start working with ADF Fusion application. You can get the book here.





If you are looking for a videos or PDFs, JDeveloper Learn More tab will offer you plenty of resources. A great deal of videos are posted on YouTube by Shay Shmeltzer. These videos will give you a great overview of capabilities of the JDeveloper and ADF framework.

Please let me know if you have any other great resource that might help potential user to understand ADF and JDeveloper a bit better.

Thanks.

Books 1

If you love technothrillers like I do, I would like to recommend a few books for the reading while you are riding a train or waiting in the doctors office (e.g.). So here is my list:

[Little Brother] Cory Doctorow is outstanding writer that can create atmosphere and keep it to the very end. Little Brother is a great story that describes an unfortunate moment  in time of a life of a teenager Marcus. Some interesting technology and look into problems that Marcus will face will keep you reading this book until very end. You can buy this book here.

Marcus, a.k.a “w1n5t0n,” is only seventeen years old, but he figures he already knows how the system works–and how to work the system. Smart, fast, and wise to the ways of the networked world, he has no trouble outwitting his high school’s intrusive but clumsy surveillance systems.


[Daemon] Daniel Suarez is now one of my favorite writers! An amazing book, probably one of the best I have read. You simply cannot pass this title. Technology, story, characters and just about everything in this book is well though of. A bit slow in the beginning, but later on it picks up. You can buy this book here.

When a designer of computer games dies, he leaves behind a program that unravels the Internet's interconnected world. It corrupts, kills, and runs independent of human control. It's up to Detective Peter Sebeck to wrest the world from the malevolent virtual enemy before its ultimate purpose is realized: to destroy civilization... 


[Freedom] Sequel to Daemon, this book brings a conclusion to the story. Perhaps even better than it predecessor. If you liked the first one, you simply cannot pass reading this one. You can buy this book here.

In the opening chapters of Freedom(tm), the Daemon is firmly in control, using an expanded network of real-world, dispossessed darknet operatives to tear apart civilization and rebuild it anew. Soon civil war breaks out in the American Midwest, in a brutal wave of violence that becomes known as the Corn Rebellion. Former detective Pete Sebeck, now the Daemon's most powerful-though reluctant-operative, must lead a small band of enlightened humans toward a populist movement designed to protect the new world order. But the private armies of global business are preparing to crush the Daemon once and for all. In a world of conflicted loyalties, rapidly diminishing human power, and the possibility that anyone can be a spy, what's at stake is nothing less than human freedom's last hope to survive the technology revolution.


[Digital Fortress] Dan Brown has a way of creating a great story and keep it interesting to the very end. If you like cryptography, you will love this book. You can buy this book here.
When the NSA's invincible code-breaking machine encounters a mysterious code it cannot break, the agency calls its head cryptographer, Susan Fletcher, a brilliant, beautiful mathematician. What she uncovers sends shock waves through the corridors of power. The NSA is being held hostage--not by guns or bombs -- but by a code so complex that if released would cripple U.S. intelligence. Caught in an accelerating tempest of secrecy and lies, Fletcher battles to save the agency she believes in. Betrayed on all sides, she finds herself fighting not only for her country but for her life, and in the end, for the life of the man she loves.

[Deception Point] Same as previous books from Dan Brown, this one has plenty of technology, science,  twists and to the very end reader will wonder who is the person behind the conspiracy! You can get the book here.

When a NASA satellite discovers an astonishingly rare object buried deep in the Arctic ice, the floundering space agency proclaims a much-needed victory -- a victory with profound implications for NASA policy and the impending presidential election. To verify the authenticity of the find, the White House calls upon the skills of intelligence analyst Rachel Sexton. Accompanied by a team of experts, including the charismatic scholar Michael Tolland, Rachel travels to the Arctic and uncovers the unthinkable: evidence of scientific trickery -- a bold deception that threatens to plunge the world into controversy. But before she can warn the president, Rachel and Michael are ambushed by a team of assassins. Fleeing for their lives across a desolate and lethal landscape, their only hope for survival is to discover who is behind this masterful plot. The truth, they will learn, is the most shocking deception of all.


That is it for now, I have a few more titles, but I should probably leave it for the next post...

Cheers!