
Wednesday, October 28, 2020

MSSQL to MongoDB



There has been a lot of discussion in recent years about NoSQL databases and when they would be preferable to SQL databases. A lot of articles have been written on this subject, but I wanted to give some insight into one of my past projects and share my perspective on the matter.

When choosing the topology for a system, we should take all of the factors into consideration and use the technology that is most suitable for the given task. IMHO, there is no technology that would be optimal for every scenario in a complex enterprise system. I have heard arguments that if a database already present in the ecosystem has even limited support for, e.g., NoSQL processing, it should be a viable option simply because it would cut down the cost of the initial deployment. I disagree with this: we need to expand our thinking and use technologies that are built specifically for the task at hand, as savings in speed and development further down the road should not be overlooked. The benefits of this approach are many and cannot be measured by initial deployment costs alone. There are also good articles presenting somewhat opposing views [link1] if you would like an honest debate on this matter.

In one of our use cases, MSSQL was already present and deployed in the cloud, and the initial decision was to use it to manipulate and store NoSQL data. Even though MSSQL has support for NoSQL structures, they are stored as strings that have to be continuously converted to a table format (or queried with special functions) to get the full range of capabilities, e.g. PSF (paging, sorting and filtering), and to support any serious and frequent data updates. (Here is the guide from Microsoft regarding SQL/NoSQL on Azure in that matter.)

A better choice, IMHO, for JSON structures would be e.g. MongoDB or Cosmos DB, depending on what is available in your current infrastructure. MongoDB was the database of choice due to the development team's familiarity with it and the fact that we could deploy our instances to both public and private clouds relatively easily with the open source version of the database.

What we gained is that MongoDB is already optimized to deal with JSON structures, fully supports PSF at the driver level and is extremely easy to set up and maintain. We decided to start with SSL connections on 3 replica set nodes. We also decided to save money on the development environment and deploy all 3 nodes on the same server (for production these should be distributed).

In our case, MongoDB was used as a cache database for a secondary layer of APIs that were backed by an Oracle database in the business API layer. Since we were looking for more flexibility and increased performance, this was a good choice. Data arriving from the business layer was already well structured as one JSON document, but due to its size and the GUI editing capabilities, we needed to break it down to offer more flexible usage based on the given business requirements for the sample GUI.

Our API for the Mongo layer was written in Spring Boot and had previously been designed to work with Hibernate and MSSQL. A lot of business logic had been built up, and in many cases it was handled with Maps and Strings without explicit Java mappings. There were some places where certain objects were mapped using a JSON parser, but it was all done manually. To proceed, we needed to remove Hibernate, generate 2 additional sets of domain objects (VO -> Mongo -> Business API), write converters (e.g. with Orika) and enhance the business logic to avoid parsing into HashMaps, using the MongoDB drivers to map directly to Java objects instead. We also gained the ability to use projections, aggregations and MongoDB views.

There was a portion of data in MSSQL that was extensively designed around relations (on top of the cached API data) that we needed to convert to NoSQL and integrate into the new collections. Removing relations and designing collections proved to be the tricky part, as we did not want to change our Angular application extensively. The business requirement was that, other than the paging changes, no other visible functionality could change, and performance had to improve significantly. MongoDB 4 came with support for transactions across collections, and even though the ideal use of NoSQL is to contain everything within one collection, we could not afford to do this for several reasons: it would have meant changing the GUI extensively, we still could not completely lose the concept of relations that had been introduced (to some extent), the payload would have been too large if we kept everything in the same JSON, and parsing speed would have caused performance issues on the Angular side.

The strengths of SQL are speed across related tables, correctness of data and data safety, while the strengths of NoSQL are ease of use and practicality. The setup of a NoSQL database engine is also a secondary benefit, as it is much easier to tune than a SQL one (at least from the aspect of what we were doing). Lastly, scaling is much easier to accomplish with NoSQL.
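To illustrate the direction we took, here is a minimal sketch of a typed MongoDB document plus an Orika-based converter replacing the manual HashMap/JSON handling. The class names, fields and collection name are illustrative assumptions, not the project's actual domain model:

```java
// Hypothetical sketch of the "VO -> Mongo" mapping step with Orika.
// TaskDocument/TaskVO and their fields are illustrative names only.
import ma.glasnost.orika.MapperFacade;
import ma.glasnost.orika.MapperFactory;
import ma.glasnost.orika.impl.DefaultMapperFactory;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "tasks")
class TaskDocument {
    @Id
    private String id;
    private String caseNumber;
    private String status;
    // getters and setters omitted for brevity
}

class TaskVO {
    private String id;
    private String caseNumber;
    private String status;
    // getters and setters omitted for brevity
}

class TaskConverter {

    private final MapperFacade mapper;

    TaskConverter() {
        // Orika maps fields by name, replacing the manual HashMap/JSON parsing
        MapperFactory factory = new DefaultMapperFactory.Builder().build();
        factory.classMap(TaskDocument.class, TaskVO.class).byDefault().register();
        this.mapper = factory.getMapperFacade();
    }

    TaskVO toVo(TaskDocument document) {
        return mapper.map(document, TaskVO.class);
    }
}
```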


Creating collections, from SQL to NoSQL


One of the challenging aspects of moving from SQL to NoSQL is designing appropriate data storage while considering everything already implemented and respecting the best practices of the underlying technology. In SQL we have relations and normalization, while NoSQL is quite the opposite: ideally we want to contain as many aspects of a request as possible in a single collection. We also need to consider how much code already relies on confirmed contracts and APIs. If we have an ESB layer or a gateway, we may use it to bridge some of the gaps, but for some APIs, to fully gain better performance, smaller corrections may be needed on both the server and the client side. In our case, the client was missing pagination and a consistent contract definition, and the sorting and filtering capabilities were inconsistent. One of the first things we did was collaborate with the team to understand the benefits regarding performance and page navigation by looking into the information being queried and paginated. Since pagination can be done at the driver level, this was our initial goal (a minimal sketch follows below). There is a secondary option of slicing arrays within one document, but this was not the preferred approach.

The next problem was dealing with huge payloads and frequent updates to the API. Older browsers had difficulties parsing this content continuously, and we had a huge and diverse user base with different technical capabilities. Payload delivery needed to be carefully calculated to provide business value with the already crafted GUI components, but also to keep performance in mind. We absolutely could not count on customers always working with up-to-date computers and browsers, or on them all having high-speed network access.
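As an example of what driver-level paging looks like, here is a small sketch using Spring Data MongoDB; the repository, document and field names are assumptions for illustration (TaskDocument is the illustrative type from the earlier sketch):

```java
// Minimal sketch of database-level paging with Spring Data MongoDB.
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.repository.MongoRepository;

interface TaskRepository extends MongoRepository<TaskDocument, String> {
    // Paging, sorting and filtering are executed by MongoDB, not in application memory
    Page<TaskDocument> findByStatus(String status, Pageable pageable);
}

class TaskQueryExample {

    Page<TaskDocument> firstPage(TaskRepository repository) {
        // Page 0, 25 rows, sorted by caseNumber descending - all pushed down to the database
        Pageable page = PageRequest.of(0, 25, Sort.by(Sort.Direction.DESC, "caseNumber"));
        return repository.findByStatus("OPEN", page);
    }
}
```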

As I mentioned earlier, MongoDB 4 introduced transactions across collections, and this significantly helped with our restructuring. We, of course, tried not to overuse this, as the NoSQL philosophy is not to build relational collections. Reference data, cached data and configuration data all found their way into new collections, and as they were accessed separately, this was not an issue. The main data was separated into three main collections, keeping in mind the business flows and the GUI design. Looking back at the end goal, I believe it all worked out well.
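For reference, below is a hedged sketch of how a multi-document transaction can be wired with Spring Data MongoDB. The bean names and collections are illustrative, AuditEntry is a hypothetical type, and older Spring Data versions expose MongoDbFactory instead of MongoDatabaseFactory:

```java
// Hedged sketch: a multi-document transaction across two collections (MongoDB 4+).
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDatabaseFactory;
import org.springframework.data.mongodb.MongoTransactionManager;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Configuration
class MongoTransactionConfig {

    @Bean
    MongoTransactionManager transactionManager(MongoDatabaseFactory dbFactory) {
        // Registering the transaction manager enables @Transactional for MongoDB operations
        return new MongoTransactionManager(dbFactory);
    }
}

@Service
class TaskService {

    private final MongoTemplate mongoTemplate;

    TaskService(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    @Transactional
    public void saveTaskWithAudit(TaskDocument task, AuditEntry audit) {
        // Both writes commit or roll back together, even though they target different collections
        mongoTemplate.save(task, "tasks");
        mongoTemplate.save(audit, "audit");
    }
}
```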

The last thing to do was to create indexes based on frequently accessed data elements, add a few aggregations for different views and create several MongoDB views to serve data for future use.
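A minimal sketch of creating an index and a simple aggregation with MongoTemplate follows; the collection and field names are assumptions:

```java
// Illustrative sketch of index creation and a GROUP BY-style aggregation.
import org.bson.Document;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import org.springframework.data.mongodb.core.index.Index;
import org.springframework.data.mongodb.core.query.Criteria;

class MongoSetupExample {

    void createIndexAndAggregate(MongoTemplate mongoTemplate) {
        // Index on a frequently filtered field
        mongoTemplate.indexOps("tasks")
                .ensureIndex(new Index().on("status", Sort.Direction.ASC));

        // Count tasks per status, similar to a SQL GROUP BY
        Aggregation aggregation = Aggregation.newAggregation(
                Aggregation.match(Criteria.where("status").exists(true)),
                Aggregation.group("status").count().as("total"));

        AggregationResults<Document> results =
                mongoTemplate.aggregate(aggregation, "tasks", Document.class);
        results.forEach(doc -> System.out.println(doc.toJson()));
    }
}
```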


Changes to API contracts


Changes to the API contract were made to standardize API development and the exposure of data to the client, to introduce paging, sorting and filtering in a consistent way, and to reorganize some of the APIs to better serve NoSQL data (all based on our changes to the collections). These changes were organized with client usage in mind. The question we continuously asked ourselves was: how will the client get the appropriate amount of data with good flexibility and the least amount of interaction with the APIs? Network quality varies around the world and our user base is quite large, so this all plays into the performance enhancements we were bringing to the solution. One of the main changes was to restructure the models so that server-side models were separated from client-side models. We also introduced an abstraction layer in the business service layer to help with future changes.
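As an illustration of a consistent PSF-style contract, here is a hedged sketch of a paged, sorted and filtered endpoint in Spring Boot; the path, parameters and repository (from the earlier sketches) are assumptions rather than the project's real API:

```java
// Hedged sketch of a standardized paging/sorting/filtering endpoint.
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Sort;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class TaskController {

    private final TaskRepository repository;

    TaskController(TaskRepository repository) {
        this.repository = repository;
    }

    // The client asks for a page, a sort field and a filter value in one consistent shape
    @GetMapping("/api/tasks")
    Page<TaskDocument> findTasks(@RequestParam(defaultValue = "0") int page,
                                 @RequestParam(defaultValue = "25") int size,
                                 @RequestParam(defaultValue = "caseNumber") String sortBy,
                                 @RequestParam(defaultValue = "OPEN") String status) {
        return repository.findByStatus(status, PageRequest.of(page, size, Sort.by(sortBy)));
    }
}
```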


Updating the Angular side


Apart from the various performance changes, the modifications for the API changes were relatively straightforward (though time consuming). The API was split into generic configuration and a task engine, and paged data was added using native MongoDB paging functionality. We also added sorting and filtering at the database level, as opposed to the SQL string store, and MongoDB was able to process this without any performance hits. Instead of one API call to the back-end service, we chained two or more services, usually one un-paged call for header data and paged services for business data. This worked well, gave us much smaller objects to deal with and significantly improved performance. Ultimately, the changes required on the Angular side were as follows (this includes performance updates and updates for the MongoDB-backed API):

  • Introduction of modular children routes
  • Introduction of lazy loading modules
  • Introduction of PSF functionality based on native database support to execute such queries
  • Reduction of the payload size by remodeling data as a cache and introducing the concept that we should process and load only the data the user can see, plus one page of buffered data
  • Moving cross-field validation logic to the API, since only validation logic for fields that are visible to the user should remain on the client side.

The end result of all of these operations was vastly improved performance, simplified development and maintenance, and a solid platform for future work.


Friday, April 05, 2019

Deploying Angular 6 + Spring Boot + Postgres to AWS

You have completed your Angular application and now you are looking for some deployment options... Among the many in existence today, you can find choices like AWS, GCP, Azure, PCF and a few others in the cloud. In this post, I will explain what I needed to do to deploy my services to AWS and keep the cost low (or non-existent with the AWS free tier). There is always the option to get the full 'by the book' services and pay for those, but it is better to understand what your options are when deploying the application and then weigh your revenue against your expenses. As it is, the My Open Invoice application is currently designed to operate for a single contractor; adding another company requires a new URL and setup. This can, of course, be changed with a few tweaks in the data model and the application itself, but let's leave that for some other blog. For now, let me introduce you to my simple architecture:
  1. The front end of the application is Angular 6, built on Google Material Design and capable of rendering on both desktop and mobile devices
  2. The middle tier is Spring Boot 2
  3. The database is Postgres (H2 as the test database)
  4. The application has a cache that is currently used only for the RSS feed (EhCache)
  5. Authentication and authorization are done through JWT with a Bearer token (a minimal validation filter is sketched below).
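For illustration, here is a minimal sketch of Bearer-token validation in a servlet filter. It assumes the jjwt library and a shared signing secret; the class and attribute names are hypothetical and the real application may wire this differently:

```java
// Minimal sketch of Bearer-token validation, assuming the jjwt library.
import java.io.IOException;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.springframework.web.filter.OncePerRequestFilter;

import io.jsonwebtoken.Claims;
import io.jsonwebtoken.Jwts;

class JwtAuthenticationFilter extends OncePerRequestFilter {

    private final String secret;

    JwtAuthenticationFilter(String secret) {
        this.secret = secret;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        String header = request.getHeader("Authorization");
        if (header != null && header.startsWith("Bearer ")) {
            // Parse and verify the token signature; an invalid token throws a JwtException
            Claims claims = Jwts.parser()
                    .setSigningKey(secret)
                    .parseClaimsJws(header.substring(7))
                    .getBody();
            // Expose the authenticated subject to downstream code (illustrative attribute name)
            request.setAttribute("username", claims.getSubject());
        }
        chain.doFilter(request, response);
    }
}
```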
Amazon offers several options for hosting your web application, like an EC2 instance or an S3 bucket. Then there is CloudFront, which is used to cache your content (the dist folder) and front it with HTTPS access. S3 covers durability and redundancy, Route53 has DNS covered and RDS has your database. Elastic Beanstalk is used for EC2 instance generation, auto scaling and load balancing setup. CloudWatch is used for log tracing, and there are several other options that you can turn on for multiple instances, load balancing, reporting, caching, etc.
My goal here is to create something that won't drain your money away but will still give you a decent application with the ability to have backups. I will also mention the options that would be needed for a more robust solution.

This is my current setup:


Let us start by explaining the components that I have used:

  • Angular 6
    • This is the client-facing application, using a template that is both desktop and mobile friendly. The app is compiled into the /dist directory with production settings and code optimization, and uploaded to the S3 bucket.
  • S3 Bucket
    • This is a standard AWS S3 bucket in one zone. You can create zone replication, but Amazon already guarantees 99.999999999% durability in a single zone in case you worry about your data. You could even go with S3-IA for lower cost, but the storage footprint is very low for hosting one small web app anyway. This bucket is going to host two items for us:
      • Angular 6 code (you can keep this private with access only to CloudFront)
      • Beanstalk repo for WAR files 
  • CloudFront
    • This is our fronting service; it enables routing to /index.html, content distribution closest to your customers' location, and HTTPS certificate termination and routing for our Angular app. There are a few important items here worth mentioning:
      • you need at least 400 and 404 error responses routed to /index.html with a 200 HTTP code, or your app will not work properly
      • you can provide your own certificate here if you have your own domain registered (you can use Amazon ACM to generate and provide a certificate)
  • CloudWatch
    • This service is enabled by default and will track and report usage for your AWS components. Max resolution is 1 minute. Depending on the configuration and logging volume, different charges may apply; modest logging should remain relatively cheap or free. Here is the pricing.
  • Elastic Beanstalk
    • This is your PaaS. You will use this as an entry point to create your configuration, environment, VPC, load balancing, auto scaling groups, RDS, etc. This can be done in an easy and convenient way. You can save hours of work by deploying this way. There are a few important items to consider:
      • I have created my database separately from this config. There is a difference between having Tomcat (for example) manage your DB connection and having your application manage it. There is also more flexibility in configuring RDS individually.
      • I am using Nginx as a reverse proxy and Tomcat 8 to serve content (WAR). One important item: since I am deploying a Spring Boot application, you need to pass in -D style properties to override the Spring ones, NOT environment ${} variables. It took me a good hour to figure out what 'environment' means in Beanstalk.
      • I did not turn on load balancing, as this costs money even in the free tier. You can alternatively load balance with Route53, but you will need to connect directly to the EC2 instances, and this limits the auto-scaling options.
      • If you want to increase the number of instances, I did not find an option to change auto scaling in Beanstalk other than enabling LB or Time-based Scaling. You can, however, log into the Auto Scaling group directly, increase Max instances to your desired number and configure a trigger for activation. This will not help a lot on its own, as you would need a load balancer in front of these instances. The only other option would be to have Elastic IP addresses on the EC2 instances and DNS-balance across those, but I honestly did not try this.
      • When deploying the WAR file, you need to create a custom nginx.conf for port 443 along with the uploaded certificates (I got mine from this SSL site for free). You will need .ebextensions in the WAR file with all configurations and certificates. The EC2 config is rebuilt on every restart, so you will lose the 443 port if you do not have this in place. This is ONLY needed if you do not have an LB; otherwise, the LB will take care of port 443 for you (after you configure it).
      • You need to open port 443 in your EC2 security group (one will be created by Elastic Beanstalk if you do not already have one). This needs to be accessible from 0.0.0.0/0, as your Angular app will connect directly to the servers using this rule.
  • RDS
    • My choice of RDS engine is Postgres for prod and H2 for dev. As a side note, I was amazed at how fast H2 is and how compatible it is with the SQL standard in terms of functions (I notice this every time :)). Postgres was the closest match for some custom query capabilities that I needed (compared with e.g. MySQL). RDS was created in only one zone with a minimal-sized instance, and the security group was opened from the EC2 group to this group by name for the desired port. RDS access was limited to EC2 only; if needed, direct access for DB management can be done through port forwarding.
  • Route 53
    • I registered the domain through Amazon and AWS created a Hosted Zone for me. Inside, I created ALIAS records pointing to Beanstalk for the www., api. and naked domain names. If you choose to point directly to EC2, you can do that, and you can always load balance using DNS (though this is not the primary balancing method on AWS).
  • Auto Scaling Group
    • This is created by Beanstalk and, in general, you have two options: one for a single instance and one for a load-balanced setup. Again, a load-balanced setup will cost you, so use it only if you need it.
This is just one way of setting up your environment, and for a small application it works for me. I have all the backups, so I am not too worried about downtime, if any.

As suggested, there are a few items to change if you need a more robust environment. A couple of them would be:
  1. Enable LB on Elastic Beanstalk and have your system balance across at least 2 zones.
  2. Have at least two-zone deployments; depending on your clientele, hosting in each major region would be beneficial (and then use geolocation/proximity routing to optimize content delivery).
  3. Right now, I have everything deployed in one VPC, but depending on your needs and layout, you may want more than one, and then decide which would be private and which public, where to use gateways, and how to connect the VPCs.
  4. An API Gateway is always an option for more complex environments, and HTTPS can also terminate there (with your certificate). This adds a layer of abstraction if you are consuming microservices from multiple points and with varied infrastructure.
  5. RDS should be deployed Multi-AZ with read replicas enabled. Backups are already enabled even for 1 instance. There is also the possibility of using a NoSQL database that is automatically deployed across multiple AZs (like DynamoDB).
There are many ways things can be configured on Amazon, so it is worthwhile to investigate and try them out. It is not expensive to test configurations to figure out which one is right for you, but it may be expensive to correct everything down the road once you realize it was not set up properly. Amazon offers the option to pay only for what you use, so why not try it?

So far, my costs were for registering my domain and a small Route53 charge ($0.50 per hosted zone).

If you have a better setup in mind, please let me know. I am always trying to learn new and better ways of optimizing data, infrastructure and cost.



Sunday, September 30, 2018

Spring Boot (1.5) OAuth2 Server in Enterprise environment





The problem


If we want to have an array of microservices and support user interaction through delegated authorization, this implementation would be one of the options to consider, or at least review. We first have to understand the differences between OAuth2 and, e.g., OIDC before we continue explaining how to achieve an OAuth2 implementation in Spring Boot in a way that is stateless and integrated, so that many microservices can rely on it. OAuth 2.0 is the industry-standard protocol for authorization, whereas OpenID Connect 1.0 is a simple identity layer on top of the OAuth 2.0 protocol. It allows Clients to verify the identity of the End-User based on the authentication performed by an Authorization Server, as well as to obtain basic profile information about the End-User in an interoperable and REST-like manner. This does not mean that OAuth2 cannot handle authentication; it is just that OIDC is better suited for that work.

We are going to focus on the OAuth2 implementation in this article. There are several good references you can read before venturing to create your own implementation (one, two, three, four, five, six). My reasons for doing this were several: not having an OAuth2 service available, the need for microservices in our architecture, the uncertain structure of the client-server architecture, and several others. Since we already worked with Spring Boot, implementing this solution was the next best thing we could do to move our project further toward our goals.

There are two ways we can handle tokens in OAuth2: plain tokens and JWT tokens. For our purposes we chose the plain token implementation. The difference is that a plain token needs to be verified by the OAuth2 service every time it is used, while a JWT can be stored at the resource and verified with the public keys provided. Either way we can have a working solution, but implementing plain tokens was the simpler and faster way to go.

Our requirements were to provide a stateless authentication/authorization solution for the web client; the server had to have a small footprint and be scalable; we had to be able to see the tokens/users we generated; we had to be able to revoke tokens; we had to provide an automated way to obtain a token for integration testing; microservices had to be able both to authenticate themselves and to accept user-authenticated tokens; we had to be able to connect to LDAP or a database; and we had to be able to later support SSO from a third-party provider. There is always an opportunity to implement different solutions, but this was something that could potentially play well into the future banking architecture and plans.


Server configuration


To start off, we chose Spring Boot OAuth2 and created a Spring Boot application. There were several configurations we needed to implement to make our server perform authorization and authentication.

  • WebSecurityConfigurerAdapter (to define /login, /logout, swagger, filters - we also can use @EnableOAuth2Client to additionally configure SSO client)
  • GlobalAuthenticationConfigurerAdapter (to define user details service, BCrypt password encoder and init method to distinguish between LDAP and database source). This adapter was needed as there are several filters to read users depending on the flow invoked. 
    • auth.ldapAuthentication() was starting point for LDAP
    • auth.userDetailsService(...) was starting point for user details service and password encoding
  • LdapAuthoritiesPopulator bean for LDAP custom authorities (used repository to load authorities based on user authentication)
  • AuthorizationServerConfigurerAdapter (to define the OAuth2 server infrastructure, including custom SQL queries, as we needed DB access for a stateless solution across servers). This included tables like oauth_access_token, oauth_refresh_token, oauth_code and oauth_client_details; the tables are used depending on the flow invoked. The work involved overriding TokenStore, ClientDetailsService, AuthorizationCodeServices, configure(AuthorizationServerEndpointsConfigurer endpoints), configure(AuthorizationServerSecurityConfigurer security), DefaultTokenServices and configure(ClientDetailsServiceConfigurer clients) - with @EnableAuthorizationServer.
  • ResourceServerConfigurerAdapter (to define the adapter that will serve as the entry point and configuration for any custom APIs) - with @EnableResourceServer.
  • We also needed to expose an API for user verification where we publish the Principal object (this will be used by the microservices to obtain user details). An abbreviated configuration sketch follows this list.
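Here is an abbreviated, hedged sketch of what the authorization server adapter can look like with a JDBC-backed token store; the wiring and bean names are illustrative, and the actual project configuration contained more overrides than shown here:

```java
// Abbreviated sketch of the authorization server adapter with JDBC-backed stores.
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.authentication.AuthenticationManager;
import org.springframework.security.oauth2.config.annotation.configurers.ClientDetailsServiceConfigurer;
import org.springframework.security.oauth2.config.annotation.web.configuration.AuthorizationServerConfigurerAdapter;
import org.springframework.security.oauth2.config.annotation.web.configuration.EnableAuthorizationServer;
import org.springframework.security.oauth2.config.annotation.web.configurers.AuthorizationServerEndpointsConfigurer;
import org.springframework.security.oauth2.provider.token.TokenStore;
import org.springframework.security.oauth2.provider.token.store.JdbcTokenStore;

@Configuration
@EnableAuthorizationServer
public class AuthServerConfig extends AuthorizationServerConfigurerAdapter {

    private final DataSource dataSource;
    private final AuthenticationManager authenticationManager;

    public AuthServerConfig(DataSource dataSource, AuthenticationManager authenticationManager) {
        this.dataSource = dataSource;
        this.authenticationManager = authenticationManager;
    }

    @Bean
    public TokenStore tokenStore() {
        // Tokens live in oauth_access_token / oauth_refresh_token, so any node can validate them
        return new JdbcTokenStore(dataSource);
    }

    @Override
    public void configure(ClientDetailsServiceConfigurer clients) throws Exception {
        // Client registrations are read from the oauth_client_details table
        clients.jdbc(dataSource);
    }

    @Override
    public void configure(AuthorizationServerEndpointsConfigurer endpoints) {
        endpoints.tokenStore(tokenStore())
                 .authenticationManager(authenticationManager);
    }
}
```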
It is very important to note that adapter ordering matters enormously, and you may lose a lot of time investigating why something is not working just because of this. The order (lowest to highest) should be Web (needed for authorization_code and implicit, and stateful because of the login page and authentication) -> OAuth2 (needed for all grants, but stateless for the password, refresh_token and client_credentials grants) -> Resource (needed for APIs).

We opted to use authorization_code without a secret and with a refresh token for user authentication, client_credentials for the servers and microservices that needed to authenticate themselves, and the password grant for the integration test cases (as this is the easiest way to obtain a token for a specific user; a sketch follows below). Our client_credentials setup added a default role for the client, and the other grants added a default role for the user. This way, every authenticated client/user has a default role to start with, which was a sure way to distinguish between a human user and a server API. The one problem we still needed to solve is the propagation of tokens through the layers of services: it is not good practice to propagate the same token between different horizontal layers, due to the loss of identification of the service performing the authorization.
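As an example of how an integration test can obtain a user token via the password grant, here is a sketch using Spring Security OAuth2's OAuth2RestTemplate; the token URL, client id and credentials are placeholders:

```java
// Sketch: obtaining a token for integration tests via the password grant.
import org.springframework.security.oauth2.client.OAuth2RestTemplate;
import org.springframework.security.oauth2.client.token.grant.password.ResourceOwnerPasswordResourceDetails;

class IntegrationTestTokenSupport {

    static String obtainUserToken() {
        ResourceOwnerPasswordResourceDetails details = new ResourceOwnerPasswordResourceDetails();
        details.setAccessTokenUri("https://auth.example.com/oauth/token"); // placeholder URL
        details.setClientId("integration-test-client");                    // placeholder client
        details.setClientSecret("secret");
        details.setUsername("test.user");                                   // placeholder test user
        details.setPassword("test-password");

        // The rest template fetches (and caches) the access token before any call it makes
        OAuth2RestTemplate restTemplate = new OAuth2RestTemplate(details);
        return restTemplate.getAccessToken().getValue();
    }
}
```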


Client configuration


Before we start with the microservices, it is important to say that all services are defined as resources (using the @EnableResourceServer annotation). This means that, first, we can identify a resource and enable its usage for the client in the OAuth2 configuration, and second, we can set up a verification URL for the token. In order for any microservice to identify itself, we have two options for the configuration: the first is in application.yml and the other is programmatic (in case we need to obtain the token ourselves). The first option is useful for any service that needs to implement verify_token; that is, whenever we receive a token, our API will send a request to validate it and populate the user details in the Spring SecurityContext. This is achieved with the security.oauth2.client and security.oauth2.resource entries, where we have to specify our given client id, secret, verify_token URL, resource id, user details URL and a few other parameters. The second option is to obtain the token programmatically, for example in the Spring Integration layer, where a declarative approach might be difficult, non-existent or dependent on extensive logic. In this case the approach is to create client code annotated with @EnableOAuth2Client; in our case this was done in the ClientHttpRequestFactory. Obtaining the token is achieved with OAuthClient and OAuthClientRequest against our authorization server, using the client_credentials grant.
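A hedged sketch of the programmatic option with Apache Oltu's OAuthClient/OAuthClientRequest is shown below; the token endpoint, client id and secret are placeholders:

```java
// Sketch: a microservice obtaining its own token with the client_credentials grant (Apache Oltu).
import org.apache.oltu.oauth2.client.OAuthClient;
import org.apache.oltu.oauth2.client.URLConnectionClient;
import org.apache.oltu.oauth2.client.request.OAuthClientRequest;
import org.apache.oltu.oauth2.client.response.OAuthJSONAccessTokenResponse;
import org.apache.oltu.oauth2.common.message.types.GrantType;

class ServiceTokenProvider {

    String fetchServiceToken() throws Exception {
        OAuthClientRequest request = OAuthClientRequest
                .tokenLocation("https://auth.example.com/oauth/token") // placeholder URL
                .setGrantType(GrantType.CLIENT_CREDENTIALS)
                .setClientId("billing-service")                        // placeholder client id
                .setClientSecret("secret")
                .buildBodyMessage();

        // The returned token is then sent as a Bearer header on calls made by this microservice
        OAuthClient oAuthClient = new OAuthClient(new URLConnectionClient());
        OAuthJSONAccessTokenResponse response =
                oAuthClient.accessToken(request, OAuthJSONAccessTokenResponse.class);
        return response.getAccessToken();
    }
}
```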


In the end, the goal we had was achieved and all flows are now functional and serving their purpose. The next thing we need to worry about is switching the framework to OIDC coupled with OAuth2. That will, perhaps, be a good topic for one of the next blogs.

Saturday, May 13, 2017

Spring Boot with JSF




I have recently open sourced an application that I created for my own time tracking and invoicing, and published it on GitHub under the Apache 2.0 license. When I started working on it, I thought about which technology I should use and decided on Spring, more precisely Spring Boot, because that is natural to me as I have used it for many years. But this would only serve the middle tier, and I still needed to think about what to use for the front end. The trend for Spring Boot is to use AngularJS or similar frameworks that tie into its capabilities and can be deployed on platforms like Cloud Foundry or similar cloud-based solutions. This is because these are stateless solutions, and their scalability and ease of deployment make them a good fit for Spring Boot.

However, once we start distinguishing between web applications and enterprise applications, depending on what they need to achieve, I believe that JSF still plays a big role due to its robustness and the number of out-of-the-box components we can use. The PrimeFaces framework is something I have used extensively in past years and it played well with older Spring implementations, but Spring Boot was not meant to work with it out of the box. I found a project on GitHub called JoinFaces to help me with this. JoinFaces incorporates best practices and frameworks from the JSF world and allows them to work together with Spring Boot. In an environment where we need scalability and multiple server deployments, we would still need a solution to share sessions or create sticky client connections, but for my purpose, this was ideal. So here is the stack that I used:
  • JoinFaces
  • Spring Boot
  • Hibernate (with QueryDSL)
  • H2 database
  • BIRT reporting engine
These are the basic libraries that allowed me to create an easily deployable solution for a local computer or a cloud-based application, depending on your need. This application still has only one deployable file, but we could easily abstract and create a mid tier with business services if we needed a microservices type of app. So why use these components? Let me address my thinking behind each one of them:

JoinFaces


This is the project that ties JSF to Spring Boot, and it can be found at this location. In their own words: This project enables JSF usage inside JAR packaged Spring Boot Application. It autoconfigures PrimeFaces, PrimeFaces Extensions, BootsFaces, ButterFaces, RichFaces, OmniFaces, AngularFaces, Mojarra and MyFaces libraries to run at embedded Tomcat, Jetty or Undertow servlet containers. It also aims to solve JSF and Spring Boot integration features. Current version includes JSF and CDI annotations support and Spring Security JSF Facelet Tag support.

Since it includes PrimeFaces, which is my favourite JSF engine, it was perfect for what I was trying to do. It is also up to date and well maintained.
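To give an idea of how this looks in practice, here is a minimal sketch of a Spring Boot entry point and a JSF backing bean managed as a Spring component under JoinFaces; the bean name, scope and fields are illustrative assumptions, not the actual application code:

```java
// Minimal sketch: Spring Boot entry point plus a Spring-managed JSF backing bean.
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.stereotype.Component;
import org.springframework.web.context.annotation.SessionScope;

@SpringBootApplication
public class InvoiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(InvoiceApplication.class, args);
    }
}

// Referenced from a PrimeFaces page as #{invoiceBean.company}
@Component("invoiceBean")
@SessionScope
class InvoiceBean {

    private String company = "My Open Invoice"; // illustrative field

    public String getCompany() {
        return company;
    }

    public void setCompany(String company) {
        this.company = company;
    }
}
```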

Spring Boot


Spring Boot is the new flavor and trend in development if you prefer this framework; it is what Maven was to Ant when it came out. The goal here is to pre-configure as many parameters as possible and autodetect dependencies. The whole of Cloud Foundry was created to make development and deployment as easy as possible in the enterprise environment. In their own words: Spring Boot makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". We take an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration.

Spring in general was a revelation when it came out; it made developing in Java something you can actually like. Using it as a wiring framework for these libraries is a really good fit, in my opinion.

Hibernate (with QueryDSL)


Hibernate has always been one of my favourite ORM frameworks. In a world where good programmers are difficult to find and where Java developers do not always know how to write proper SQL, Hibernate helps bridge this gap. Another good thing is that it will adapt to any supported database you use. The downside is that you may not be able to use database-specific features (e.g. hierarchical queries), and it can struggle with complex and demanding queries; Hibernate may also not be a good option for optimising such queries. If your existing database model is not 'by the book', you may experience additional problems. In their own words: Hibernate ORM enables developers to more easily write applications whose data outlives the application process. As an Object/Relational Mapping (ORM) framework, Hibernate is concerned with data persistence as it applies to relational databases (via JDBC).

You can fall back to JDBC from Hibernate in case you need to write database-specific or native queries.

QueryDSL is a good addition to JPA, as it enables easy filtering and query structuring where this is required; I found it to be very helpful. In their own words: Querydsl is a framework which enables the construction of type-safe SQL-like queries for multiple backends including JPA, MongoDB and SQL in Java. Instead of writing queries as inline strings or externalizing them into XML files they are constructed via a fluent API.
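For illustration, here is a small QueryDSL sketch; the Invoice entity and the generated QInvoice class are assumptions standing in for the application's real model:

```java
// Illustrative QueryDSL usage with the JPA backend.
import java.math.BigDecimal;
import java.util.List;

import javax.persistence.EntityManager;

import com.querydsl.jpa.impl.JPAQueryFactory;

class InvoiceQueries {

    private final JPAQueryFactory queryFactory;

    InvoiceQueries(EntityManager entityManager) {
        this.queryFactory = new JPAQueryFactory(entityManager);
    }

    List<Invoice> openInvoicesAbove(BigDecimal minimumTotal) {
        // QInvoice is the class QueryDSL generates from the hypothetical Invoice entity
        QInvoice invoice = QInvoice.invoice;

        // Type-safe, fluent filtering instead of string-concatenated JPQL
        return queryFactory.selectFrom(invoice)
                .where(invoice.status.eq("OPEN")
                        .and(invoice.total.gt(minimumTotal)))
                .orderBy(invoice.invoiceDate.desc())
                .fetch();
    }
}
```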


H2 database


I went through several databases (Derby, H2 and HyperSQL) for this project, and H2 was for me the easiest to set up, back up and run in shared mode, while still supporting a lot of the SQL standard. It is very fast to initialise and super easy to understand. It integrated well into the application and so far I have not seen any issues with it. In their own words: Welcome to H2, the Java SQL database. The main features of H2 are: Very fast, open source, JDBC API, Embedded and server modes; in-memory databases, Browser based Console application, Small footprint: around 1.5 MB jar file size.


BIRT reporting engine


For the invoices, I required a reporting engine that was open source and free. I looked into two: JasperReports and BIRT. I started with Jasper as it seemed easier to integrate, but things got 'interesting' once I started developing the subqueries. They both have a good UI for designing reports, but I found BIRT to be easier and faster to work with. The main difference is that Jasper enables absolute positioning of elements and BIRT does not; with some planning, this really is not relevant. In their own words: BIRT is an open source technology platform used to create data visualizations and reports that can be embedded into rich client and web applications.
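Here is a hedged sketch of rendering a BIRT design to PDF through the report engine API; the design file, parameter and output path are placeholders, not the application's actual report:

```java
// Hedged sketch: running a BIRT report design and rendering it to PDF.
import org.eclipse.birt.core.framework.Platform;
import org.eclipse.birt.report.engine.api.EngineConfig;
import org.eclipse.birt.report.engine.api.IReportEngine;
import org.eclipse.birt.report.engine.api.IReportEngineFactory;
import org.eclipse.birt.report.engine.api.IReportRunnable;
import org.eclipse.birt.report.engine.api.IRunAndRenderTask;
import org.eclipse.birt.report.engine.api.PDFRenderOption;

class InvoicePdfRenderer {

    void render() throws Exception {
        EngineConfig config = new EngineConfig();
        Platform.startup(config);
        IReportEngineFactory factory = (IReportEngineFactory) Platform
                .createFactoryObject(IReportEngineFactory.EXTENSION_REPORT_ENGINE_FACTORY);
        IReportEngine engine = factory.createReportEngine(config);

        // Load the report design and run it with PDF output
        IReportRunnable design = engine.openReportDesign("invoice.rptdesign"); // placeholder design
        IRunAndRenderTask task = engine.createRunAndRenderTask(design);
        task.setParameterValue("invoiceId", 42); // placeholder report parameter

        PDFRenderOption options = new PDFRenderOption();
        options.setOutputFormat("pdf");
        options.setOutputFileName("invoice-42.pdf"); // placeholder output path
        task.setRenderOption(options);

        task.run();
        task.close();
        engine.destroy();
        Platform.shutdown();
    }
}
```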


In the next few blogs, I will try to explain what I did, how and why. You may find the application that I called My Open Invoice on GitHub.

Thanks