Wednesday, October 28, 2020

MSSQL to MongoDB



There has been a lot of discussion in recent years regarding NoSQL databases and when they would be preferable to SQL databases. There are a lot of articles written on this subject, but I wanted to give some insight into one of my past projects and provide my perspective on the matter.

When choosing the topology for a system we should take all of the factors into consideration and use the technology that is most suitable for the given task. IMHO, there is no technology that is optimal for every scenario in a complex enterprise system. I have heard arguments that if a database has limited support for something, e.g. NoSQL processing, it should still be considered a viable option simply because it is already present in the ecosystem and cuts down the costs of the initial deployment. I disagree with this, as I believe we need to expand our thinking and use technologies that are built specifically for the task at hand; the savings in speed and development further down the road should not be overlooked. The benefits of this approach are many and cannot be measured by initial deployment costs alone. There are also good articles presenting the somewhat opposing view [link1] if you would like an honest debate on this matter.

In one of our use cases, MSSQL was already present and deployed in the cloud, and the initial decision was to use it to store and manipulate NoSQL data. Even though MSSQL has support for NoSQL structures, JSON is stored as strings that have to be continuously converted to table format (or queried with special functions) to get the full range of capabilities, e.g. PSF (paging, sorting and filtering) and any serious and frequent data updates. (Here is the guide from Microsoft regarding SQL/NoSQL on Azure in that matter).

A better choice for JSON structures, IMHO, would be e.g. MongoDB or Cosmos DB, depending on what is available in your current infrastructure. MongoDB was our choice of database due to the development team's familiarity with it and the fact that we could deploy instances to both public and private cloud relatively easily with the open-source version of the database.

What we gained is that MongoDB is already optimized to deal with JSON structures, fully supports PSF at the driver level and is extremely easy to set up and maintain. We decided to start with SSL connections to a 3-node replica set. We also decided to save on the development environment and deploy all 3 nodes on the same server (for production this should be distributed).
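Just to illustrate what connecting to such a setup looks like, here is a minimal sketch with the MongoDB Java driver; the host names, replica set name and database name are hypothetical placeholders, not our actual configuration:

```java
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;

public class MongoConnection {
    public static void main(String[] args) {
        // Hypothetical hosts; in our dev setup all three nodes lived on one server,
        // while in production they should be on separate machines.
        ConnectionString uri = new ConnectionString(
                "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0&tls=true");

        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(uri)
                .build();

        try (MongoClient client = MongoClients.create(settings)) {
            MongoDatabase db = client.getDatabase("cache");
            System.out.println("Connected, first collection: " + db.listCollectionNames().first());
        }
    }
}
```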

In our case MongoDB was used as a cache database in front of a secondary layer of APIs backed by an Oracle database in the business API layer. Since we were looking for more flexibility and increased performance, this was a good choice. Data arriving from the business layer was already well structured as one JSON document, but due to its size and the GUI editing capabilities, we needed to break it down to offer more flexible usage based on the given business requirements for the GUI.

Our API for the Mongo layer was written using Spring Boot and was previously designed to work with Hibernate and MSSQL. A lot of business logic had been generated, and in many cases it was handled with Maps and Strings without explicit Java mappings. Yes, there were some places where certain objects were mapped using a JSON parser, but it was all done manually. To proceed, we needed to remove Hibernate, generate two additional sets of domain objects (VO -> Mongo -> Business API), write converters (e.g. with Orika) and enhance the business logic to avoid parsing into HashMaps and instead use the MongoDB drivers to map directly to Java objects. We also gained the ability to use projections, aggregations and MongoDB views.

There was a portion of data in MSSQL that was extensively designed around relations (and this was on top of the cached API data) that we needed to convert to NoSQL and integrate into new collections. Removing relations and designing collections proved to be the tricky part, as we did not want to change our Angular application extensively. The business requirement was that, other than the paging changes, no other visible functionality could change, and performance had to improve significantly. MongoDB 4 came with support for transactions across collections, and even though the ideal usage of NoSQL is to contain everything within one collection, we could not afford to do this for several reasons: it would mean changing the GUI extensively, we could not entirely lose the concept of relations that had been introduced, the payload would become too large if we kept everything under the same JSON, and parsing speed on the Angular side would become a performance issue.

The strength of SQL is speed across tables, correctness of data and data safety; the strength of NoSQL is ease of use and practicality. Setting up a NoSQL database engine is also a secondary benefit, as it is much easier to tune than an SQL one (at least for what we were doing). Lastly, scaling is much easier to accomplish with NoSQL.
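As a sketch of what "mapping directly to Java objects" means in practice, here is a minimal example using the driver's POJO codec support; the Task class, database and collection names are made up for illustration and are not our actual domain model (which was generated as separate VO / Mongo / Business API sets and converted with Orika):

```java
import static org.bson.codecs.configuration.CodecRegistries.fromProviders;
import static org.bson.codecs.configuration.CodecRegistries.fromRegistries;
import static com.mongodb.client.model.Filters.eq;

import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.codecs.configuration.CodecRegistry;
import org.bson.codecs.pojo.PojoCodecProvider;

public class PojoMappingExample {

    // Hypothetical cached document.
    public static class Task {
        private String id;
        private String status;

        public Task() {}
        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getStatus() { return status; }
        public void setStatus(String status) { this.status = status; }
    }

    public static void main(String[] args) {
        CodecRegistry pojoRegistry = fromRegistries(
                MongoClientSettings.getDefaultCodecRegistry(),
                fromProviders(PojoCodecProvider.builder().automatic(true).build()));

        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Task> tasks = client.getDatabase("cache")
                    .withCodecRegistry(pojoRegistry)
                    .getCollection("tasks", Task.class);

            // The driver maps BSON documents straight to Task instances,
            // so no manual HashMap / String parsing is needed.
            Task first = tasks.find(eq("status", "OPEN")).first();
            System.out.println(first != null ? first.getStatus() : "no open tasks");
        }
    }
}
```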


Creating collections, from SQL to NoSQL


One of the challenging aspects of moving from SQL to NoSQL is designing appropriate data storage while considering everything already implemented and respecting the best practices of the underlying technology. In SQL we have relations and normalization; NoSQL is quite the opposite, where we would ideally want to contain as many aspects of a request as possible in a single collection. We also need to consider how much code already relies on confirmed contracts and APIs. If we have an ESB layer or a gateway, we may use it to bridge some of the gaps, but for some APIs, to fully gain better performance, smaller corrections may be needed on both the server and the client side. In our case, the client was missing pagination and a consistent contract definition, and the sorting and filtering capabilities were inconsistent.

One of the first things we did was collaborate with the team to understand the benefits regarding performance and page navigation by looking into the information being queried and paginated. Since pagination can be done at the driver level, this was our initial goal. There is a secondary option of slicing arrays within one document, but this was not the preferred approach.

The next problem was dealing with a huge payload and frequent updates to the API. Older browsers had difficulties parsing this content continuously, and we have a huge and diverse user base with different technical capabilities. Payload delivery needed to be carefully calculated to provide business value with the already crafted GUI components, while keeping performance in mind. We absolutely could not count on customers always working with up-to-date computers and browsers, or on all of them having high-speed network access.
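For context, driver-level pagination essentially boils down to skip/limit (or range-based) queries combined with a sort. A minimal sketch with the Java driver, using hypothetical collection and field names:

```java
import static com.mongodb.client.model.Filters.eq;

import com.mongodb.client.FindIterable;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

public class PagingExample {

    // Returns one page of documents; skip/limit paging is the simplest form,
    // while a range query on an indexed field scales better for very deep pages.
    static FindIterable<Document> page(MongoCollection<Document> tasks,
                                       int pageNumber, int pageSize) {
        return tasks.find(eq("status", "OPEN"))
                .sort(Sorts.descending("createdDate"))
                .skip(pageNumber * pageSize)
                .limit(pageSize);
    }
}
```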

As I mentioned earlier, MongoDB 4 introduced transactions across collections, which significantly helped with our restructuring. We, of course, tried not to misuse this, as the NoSQL philosophy is not to build relational collections. Reference data, cached data and configuration data all found their way into new collections, and as they were accessed separately, this was not an issue. The main data got separated into three collections, keeping in mind the business flows and the GUI design. Looking back at the end goal, I believe it all worked out well.
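To show the shape of such a multi-document transaction, here is a hedged sketch with the Java driver; the database, collection and field names are invented, and we only reached for this where an update genuinely had to touch more than one of the main collections atomically:

```java
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Updates.set;

import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class TransactionExample {

    // Updates two related collections atomically; requires a replica set
    // (multi-document transactions are not available on a standalone mongod).
    static void updateTaskAndHeader(MongoClient client, String taskId, String status) {
        MongoCollection<Document> tasks = client.getDatabase("cache").getCollection("tasks");
        MongoCollection<Document> headers = client.getDatabase("cache").getCollection("taskHeaders");

        try (ClientSession session = client.startSession()) {
            session.withTransaction(() -> {
                tasks.updateOne(session, eq("taskId", taskId), set("status", status));
                headers.updateOne(session, eq("taskId", taskId), set("lastStatus", status));
                return null;
            });
        }
    }
}
```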

The last thing to do was to create indexes based on frequently accessed data elements, add a few aggregations for the different views and create several MongoDB views to serve data for future use.
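As an illustration only (the field, collection and view names are made up), creating an index and a view with the Java driver looks roughly like this:

```java
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;

import java.util.Arrays;

public class IndexAndViewExample {

    static void prepare(MongoDatabase db) {
        // Compound index on the fields the GUI filters and sorts on most often.
        db.getCollection("tasks")
          .createIndex(Indexes.compoundIndex(
                  Indexes.ascending("status"),
                  Indexes.descending("createdDate")));

        // Read-only view exposing just the open tasks for future consumers.
        db.createView("openTasks", "tasks",
                Arrays.asList(Aggregates.match(Filters.eq("status", "OPEN"))));
    }
}
```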


Changes to API contracts


Changes to the API contract were made to standardize API development and the exposure of data to the client, introduce paging, sorting and filtering in a consistent way and reorganize some of the APIs to better serve NoSQL data (all based on our changes to the collections). These changes were organized with client usage in mind. The question we continuously asked ourselves was: how will the client get an appropriate amount of data with good flexibility and the least amount of interaction with the APIs? Network quality varies across the world and our user base is quite large, and this all plays into the performance enhancements we were bringing to the solution. One of the main things was to restructure the models so that server-side models were separated from client-side models. We also introduced an abstraction layer in the business service layer to help with future changes.
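To give an idea of a standardized PSF contract, here is a hedged sketch of what such a paged endpoint could look like in Spring Boot with Spring Data MongoDB; the TaskDocument, TaskRepository, path and parameters are hypothetical placeholders, not the actual project API:

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical server-side model, separate from the client-side model.
@Document("tasks")
class TaskDocument {
    @Id
    private String id;
    private String status;
    private java.time.Instant createdDate;
    // getters/setters omitted for brevity
}

// Paging and sorting are pushed down to the database by the driver.
interface TaskRepository extends MongoRepository<TaskDocument, String> {
    Page<TaskDocument> findByStatus(String status, Pageable pageable);
}

@RestController
class TaskController {

    private final TaskRepository repository;

    TaskController(TaskRepository repository) {
        this.repository = repository;
    }

    // Consistent PSF parameters exposed to the Angular client.
    @GetMapping("/tasks")
    Page<TaskDocument> tasks(@RequestParam String status,
                             @RequestParam(defaultValue = "0") int page,
                             @RequestParam(defaultValue = "20") int size,
                             @RequestParam(defaultValue = "createdDate") String sortBy) {
        return repository.findByStatus(status,
                PageRequest.of(page, size, Sort.by(Sort.Direction.DESC, sortBy)));
    }
}
```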


Updating Angular side


Apart from the various performance changes, modifications for the API changes were relatively straightforward (and time consuming). The API was split into a generic configuration part and a task engine, and paged data was added using native MongoDB paging functionality. We also added sorting and filtering, as opposed to the SQL string store, and MongoDB was able to process this without any performance hits. Instead of one API call to the back-end service we chained two or more services, usually an un-paged one for header data and paged services for business data. This worked well, gave us much smaller objects to deal with and also improved performance by a long shot. Ultimately, the changes required on the Angular side were as follows (this includes the performance updates and the MongoDB updates for the API):

  • Introduction of modular child routes
  • Introduction of lazy-loaded modules
  • Introduction of PSF functionality based on the database's native support for such queries
  • Reduction of the payload size by remodeling the data as a cache and introducing the concept that we should process and load only the data the user can see, plus one page of buffered data
  • Moving cross-field validation logic to the API, since only validation for fields that are visible to the user should live on the client side (see the sketch after this list)
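For illustration only, here is a minimal sketch of what cross-field validation moved into the API layer might look like; the request object, field names and rules are invented and not taken from the actual project:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical request object used only for this example.
class TaskUpdateRequest {
    LocalDate startDate;
    LocalDate endDate;
    String status;
}

class TaskValidator {

    // Cross-field rules now live in the API, so the client only validates
    // the individual fields that are actually visible to the user.
    List<String> validate(TaskUpdateRequest request) {
        List<String> errors = new ArrayList<>();
        if (request.startDate != null && request.endDate != null
                && request.endDate.isBefore(request.startDate)) {
            errors.add("endDate must not be before startDate");
        }
        if ("CLOSED".equals(request.status) && request.endDate == null) {
            errors.add("endDate is required when status is CLOSED");
        }
        return errors;
    }
}
```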

The end result of all of these operations was vastly improved performance, simplified development and maintenance, and a solid platform for future work.

