Monday, December 11, 2023

Mountebank testing


 

For some of my testing I recently switched from WireMock to Mountebank. I found that Mountebank offers far more flexibility and simplicity when building the complex data setups needed to support test cases. I ran both on Docker and Kubernetes and they work perfectly fine in either environment.

Sample Mountebank Docker run:

docker run --name mountebank -p 2525:2525 -p 11000-11010:11000-11010 \
  -v /home/user/mountebank-data/:/mb bbyars/mountebank --configfile /mb/templates/imposters.ejs \
  --allowInjection --ipWhitelist "::ffff:172.17.0.1|172.17.0.1"
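
Imposters can be loaded from the EJS config file mounted above, or posted to the admin port at runtime. Here is a minimal sketch of the latter; the port, path and payload are made-up examples rather than my actual setup:

# create an HTTP imposter on port 11000 (inside the published range above)
curl -s -X POST http://localhost:2525/imposters \
  -H 'Content-Type: application/json' \
  -d '{
    "port": 11000,
    "protocol": "http",
    "stubs": [{
      "predicates": [{ "equals": { "method": "GET", "path": "/customers/123" } }],
      "responses": [{ "is": { "statusCode": 200,
                              "headers": { "Content-Type": "application/json" },
                              "body": "{\"id\":123,\"name\":\"Test\"}" } }]
    }]
  }'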

Sample WireMock run:

docker run --name wiremock \
  -p 9999:8080 \
  -v /home/bcavlin/IdeaProjects/moleculer/mock:/home/wiremock \
  wiremock/wiremock --verbose --global-response-templating --max-template-cache-entries 0
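
For comparison, an equivalent WireMock stub can be registered through its admin API; again, the URL and body below are only an illustrative sketch:

# register a stub mapping on the running container (admin API mapped to port 9999)
curl -s -X POST http://localhost:9999/__admin/mappings \
  -H 'Content-Type: application/json' \
  -d '{
    "request":  { "method": "GET", "url": "/customers/123" },
    "response": { "status": 200,
                  "headers": { "Content-Type": "application/json" },
                  "body": "{\"id\":123,\"name\":\"Test\"}" }
  }'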
 

Starting MEGA instances


 

MEGA is proving to be a solid provider for encrypted data storage. If you want to start multiple instances on Linux (e.g. one for work and one for personal use), here is the script I picked up from the web and modified a bit:

Sample:
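
As a rough sketch of the approach (my own assumption of how such a script typically looks, not necessarily the original), each instance is launched with its own HOME directory so MEGAsync keeps a separate configuration and login:

#!/bin/bash
# Launch separate MEGAsync instances by giving each one its own HOME.
# The directory names are placeholders - adjust them to your own layout.
BASE="$HOME/mega-instances"
mkdir -p "$BASE/work" "$BASE/personal"

# each instance stores its own config and cache under the alternate HOME
HOME="$BASE/work"     megasync &
HOME="$BASE/personal" megasync &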


Sunday, April 11, 2021

Turning the keyboard light on in Ubuntu after sleep

 

One of the things I like about the Mac I use for work is that everything seems well integrated, although the keyboard is pretty bad in my opinion, at least on the model I use. Nevertheless, the back-light feature works really nicely, and I wanted to replicate it partially on Ubuntu. What was missing on my Ubuntu laptop is that if I turn the keyboard back-light off, it stays off even if I resume working after dark, and if it was on during the night and I resume during the day, it stays on and consumes a lot of battery. That seemed like a shortfall on the Ubuntu side, so I decided to write a little program that runs after sleep and checks whether the back-light is needed. It is based on my night mode settings and the hours entered there. The idea is that when the laptop wakes from sleep during the night hours, it automatically lights my keyboard; otherwise it shuts the back-light off during the day. This only runs after sleep, and it gives a nice feeling when starting work. The script was placed in

/usr/lib/systemd/system-sleep/

Since it runs as the root user, you will need to replace <USER> with the user for whom night mode is configured.
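
A minimal sketch of such a hook is shown below. It assumes GNOME's Night Light schedule (read via gsettings) and a keyboard back-light exposed under /sys/class/leds; the LED device name will differ per machine, and my actual script may differ in the details.

#!/bin/bash
# /usr/lib/systemd/system-sleep/keyboard-backlight.sh (illustrative sketch)
# systemd invokes sleep hooks with $1 = pre|post and $2 = suspend|hibernate|...

[ "$1" = "post" ] || exit 0          # only act when resuming from sleep

USER_NAME="<USER>"                   # user whose night mode settings should be read
USER_ID=$(id -u "$USER_NAME")
BUS="unix:path=/run/user/$USER_ID/bus"

gs() {  # read a GNOME Night Light setting as that user (this hook runs as root)
  sudo -u "$USER_NAME" env DBUS_SESSION_BUS_ADDRESS="$BUS" \
    gsettings get org.gnome.settings-daemon.plugins.color "$1"
}

FROM=$(gs night-light-schedule-from)   # e.g. 20.0 -> 20:00
TO=$(gs night-light-schedule-to)       # e.g. 6.0  -> 06:00
HOUR=$(date +%H)

# 1 if the current hour falls inside the night window (the window may cross midnight)
NIGHT=$(awk -v h="$HOUR" -v f="$FROM" -v t="$TO" \
  'BEGIN { h+=0; f+=0; t+=0; n = (f>t) ? (h>=f || h<t) : (h>=f && h<t); print n }')

# keyboard back-light LED - the device name varies (e.g. dell::kbd_backlight)
for led in /sys/class/leds/*kbd_backlight*/brightness; do
  [ -e "$led" ] || continue
  if [ "$NIGHT" -eq 1 ]; then
    cat "${led%brightness}max_brightness" > "$led"   # night: turn the back-light on
  else
    echo 0 > "$led"                                  # day: turn the back-light off
  fi
done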


Wednesday, November 04, 2020

Interviews from the perspective of a tech guy

Hi,

For the last 12 years I have had the opportunity to interview quite a few people for various positions, mostly related to Node.js, Java, Spring, Cloud and SQL processing. I would like to take this opportunity to write about what I look for in candidates' resumes, about relationships with recruiters, and about some common mistakes people make during interviews or when writing resumes. Please note that this is strictly my opinion and you can take it with a grain of salt.

Pre-interview, gathering requirements

On the several projects where I have worked or am still working, most job openings come from the business side, that is, from financing the needs of a project where there is a specific business requirement to be fulfilled. These requirements arise from an ad-hoc need or opportunity, or from strategic planning. 
In general, I have observed two forms of financing: one where we are looking to initialize a platform or a novel way of processing data, and one where we are utilizing or expanding an existing platform for a business need or competitive advantage. The difference is that the first develops supporting technology and the second develops the business. This is where the first ideas about what kind of experience and profiles we require from candidates come to mind. The other important factor is company and team culture. Some teams are conservative, some more open; some follow a strict structure while others are more free-form. These are all important factors for the success of the team. Remember, most of the time the goal is to build an effective team, not to collect exceptional individuals.


Thoughts about candidates

One of the things with agencies is that they mostly use automated search engines when looking for candidates, and some people tend to add keywords or job-description items that they have perhaps only heard of or barely understand. We get candidates of different backgrounds, both cultural and technical, and our goal should always be to understand how they can fit into our team and business culture, and whether they will bring something valuable in terms of their personal and professional skills and their knowledge of technology. One of the most important things for a candidate is to show enthusiasm and professionalism during this process, regardless of how they actually feel about the process or the interviewer. This proves a couple of points:
  1. That they are capable of desired behavior when needed (professional and courteous)
  2. That they understand the rules of engagement and what is expected of them
  3. That they are not shy to enter into the new situation and do their best to fit in
Anything else leaves the employer at the mercy of the moment and facing unnecessary training, when we should be focusing on bringing candidates up to speed on our business rather than on a basic understanding of general work ethics and professional skills.

Unfortunately, candidates often come to the interview unprepared or uninterested. There is a difference between someone being a bit scared or overwhelmed and someone being unprepared. It is fully expected that a candidate will know their resume in detail (what is written in it). This is why writing resumes longer than two pages is, in my opinion, a mistake. I would usually not go past the second page for several reasons:
  1. It is out of date (I am only interested in the last 5 years at most)
  2. The technology used may be outdated
  3. The candidate may not have relevant knowledge to apply in the current scenario
  4. I have many resumes to review and only limited time to do it
  5. Candidates often say they do not remember implementation details past 5 years, and this does not help in a line of business where technology changes all the time 
It is, of course, good that a candidate can reference their past experience, what they learned from it and how they evolved professionally, but the details of the discussion should stay with their most recent experience. There are many ifs and buts here; however, I need to make sure we establish a reasonable expectation of what is needed to pass the interview within a one-hour process. That is very limited time to get to know someone you have met for the first time and form both an objective and a subjective opinion of the candidate. 
 

Reviewing resumes

As I mentioned before, it is well advised to contact one of the agencies that professionally tailor resumes to the job. We all have varied experience, and it does not always apply to every company or every job. The point of a resume is to get invited to the interview in the most focused and honest way; it is not the place to write your life story. Professional agents, and the tools they use, have very limited time to go over resumes, especially for jobs where hundreds of applicants apply. You must point out in your resume why you are the right person for the job in question, even though in many cases you will not be a 100% match. You need to single out the points from your career and experience that put you ahead of other candidates, or in other words, that make you better than the next best candidate. Once you get an interview, the resume is not that important and serves only as a reference for the conversation. As far as cover letters go, I honestly do not remember when I last read one, so I would generally drop them. 

When reviewing a resume, it always depends on whether I am reviewing for a full-time position or a contractor. The difference is that the first will have time to build a career and experience, while the latter needs to perform immediately. The criteria may differ a bit between the two, but it ultimately depends on the hiring manager and the company that is hiring.  

Unless you graduated recently, I generally put very little emphasis on education beyond it being a passing requirement for the job, and more on experience and enthusiasm to do the job. I would prefer to see more resumes where the candidate has experience in the open source community or has completed some recent (relevant) courses. In my opinion this gives a candidate an edge over the rest. Continuous education and participation in your own or an open source project is extremely important in this line of work. If a candidate has a blog or an example of their work, even better. Those are the points that put candidates well above the rest who do not have them, provided that what is displayed demonstrates capability and is in line with the resume and the job requirements.


The interview

On the interview day, whether it is online or in person, please do not be late. This is not a cliche; it shows responsibility and respect towards the people involved. If you are running late for objective reasons, please pick up the phone and give the interviewer a call; it will be appreciated. This also shows that the candidate understands the importance of not being late to the many meetings we may have. 
During interviews, sometimes I interview alone and sometimes a hiring manager or another senior resource is present, depending on the role we are interviewing for and our availability. In any case, there will be different sets of questions designed to:
  1. Validate candidate resume and expertise
  2. Establish effective communication
  3. Validate behavior under stress
  4. Establish problem solving capacity
When the interview starts, we always give candidates a chance to introduce themselves. This can take anywhere from 5 to 10 minutes, after we give a short introduction to our environment and the job requirements. The expectation is that the candidate already understands something about the company and the tools and processes we are using; how else would they have gotten the interview in the first place, right? This also shows that the candidate is interested in the company and the position and did some research prior to the interview. 
All my interviews are technical, which means we will talk about technologies and processes the candidate has used and that are relevant to us. I certainly cannot know and discuss every item on everyone's resume, but based on my previous experience there will be plenty where we can find common ground for discussion. 
Regardless of the candidate, I always go through basic questions to understand whether the candidate has thought about the technologies they use and not just copy-pasted from the Internet. This involves simple questions like: why do you think Java has both abstract classes and interfaces, or tell me the difference between Java and JEE? You would be amazed how many people struggle with these questions, even though they are the very foundation of how design patterns and Java coding work in general. After we cover the basics, just to get an idea of where the candidate stands relative to what is written in the resume, we move on to the relevant technologies listed as most recent or marked as expert or experienced. I would establish a scenario and ask about it, with the expectation that the candidate will:
  1. Demonstrate expertise in the area
  2. Establish communication and problem solving capability
  3. Follow a given direction when we provide a suggestion
Depending on the candidate and their previous responses, I may provide false leads or a conflicting narrative and ask the candidate to proceed that way, to gauge the possibility of a future conflict or behavior problem. This is all done in a respectful way, of course, with the expectation that the candidate will politely and with good arguments point out the error. In several cases this proved to be an impossible obstacle for a few candidates... In general, and depending on the job, there are three major layers in an enterprise application: front end, middle tier and database. There are hundreds of technologies that can go in between, like caching, routing, messaging, ETL, etc., but the important thing is that every candidate needs to be aware of these differences and of how applications perform in general. When we discuss a problem, awareness of what is being applied as a solution needs to be present at all times. 
One important suggestion for the interview: when giving examples, do not start on a subject you do not fully understand. It will lead to more in-depth questions and a possible dead end. If a question pops up that you are not familiar with, try to substitute the closest example from your experience. That puts you on familiar ground where we can have a constructive conversation. A good exchange of ideas makes for a good interview, in my opinion. And remember, an interview is not an interrogation; look at it more as a focused conversation.

Post-interview thoughts

Once the interview has been completed, we would normally gather to discuss the candidates and their performance. I would generally present a written or verbal report and provide my opinion, but as a general rule, the decision on hiring candidates usually rests with the Director, VP or whoever owns the budget for the project.  

In any case, I have been in various interviews in my life, some good and others not so much. Not everything always turns out as you expect, but in the end you only need one good interview, and with some luck, knowledge and a positive attitude you will get your next job, or at least a valuable experience, be it positive or negative.

As always, please share your thoughts and opinions about this matter. All the best!

Wednesday, October 28, 2020

Proxy in Microsoft proprietary world


In a world of corporate proxy servers requiring NTLM authentication, running Mac/Linux may prove to be a difficult choice if all traffic needs to be routed through said proxy. Think of Brew, Git, Node, etc.; all of them will require authenticated access. When you try to use these tools, you may get a 407 error saying that authentication is required. In this case we need to authenticate using NTLM, and this can easily be done with cntlm. We can leave the proxy running and point most applications to its default port 3128 on localhost, be it through exported http/https proxy environment variables or configuration in e.g. the gitconfig file. To start, we need to generate a hash for our password:

cntlm -u myusername -d mydomain -H

After this, you will get a password hash that you need to copy into the cntlm.conf file in /etc; this will be used to start your server and authenticate you. The server is started with the cntlm command, specifying the configuration file. A good resource to look at is the cntlm documentation, and I found that this post also helps.
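
A minimal sketch of the relevant pieces, assuming the default /etc/cntlm.conf location and port 3128 (your proxy host, domain and hash will obviously differ):

# /etc/cntlm.conf (excerpt) - paste the PassNTLMv2 line produced by cntlm -H
#   Username    myusername
#   Domain      mydomain
#   PassNTLMv2  AB12...                             # hash from the -H output
#   Proxy       corporate-proxy.example.com:8080
#   Listen      3128

# start cntlm with that config
cntlm -c /etc/cntlm.conf

# point tools at the local listener
export http_proxy=http://127.0.0.1:3128
export https_proxy=http://127.0.0.1:3128
git config --global http.proxy http://127.0.0.1:3128
npm config set proxy http://127.0.0.1:3128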

Thanks


MSSQL to MongoDB



There has been a lot of discussion in recent years regarding NoSQL databases and when they are preferable to SQL databases. A lot of articles have been written on this subject, but I wanted to give some insight into one of the past projects I have been part of and provide my perspective.

When choosing a topology for the system, we should take all factors into consideration and use the technology that is most suitable for the given task. IMHO, there is no technology that would be optimal for all scenarios in a complex enterprise system. I have heard arguments that if one database has even limited support for, e.g., NoSQL processing, it should be a viable option to consider simply because it is already present in the ecosystem and cuts down the cost of the initial deployment. I would rather disagree: I believe we need to expand our thinking and use technologies built specifically for the task at hand, as cost savings in speed and development further down the road should not be overlooked. The benefits of this approach are many and cannot be measured only by initial deployment costs. There are also good articles presenting somewhat opposing views [link1] if you would like an honest debate on this matter.

In one of our use cases, MSSQL was already present and deployed in the cloud, and the initial decision was to use it to store and manipulate NoSQL data. Even though MSSQL has support for NoSQL structures, they are stored as strings that have to be continuously converted to table format (or handled with special functions) to get the full range of capabilities, e.g. PSF (paging, sorting and filtering) and any serious, frequent data updates. (Here is the guide from Microsoft regarding SQL/NoSQL on Azure in that matter.)

A better choice for JSON structures, IMHO, would be e.g. MongoDB or Cosmos DB, depending on what is available in your current infrastructure. MongoDB was our choice due to greater familiarity within the development team and the fact that we could deploy instances to both public and private clouds relatively easily with the open source version of the database.

What we gained is that MongoDB is already optimized to deal with JSON structures, fully supports PSF at the driver level, and is extremely easy to set up and maintain. We decided to start with SSL connections on a 3-node replica set. We also decided to save on the development environment and deploy the 3 nodes on the same server (for prod this should be distributed).
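
A rough sketch of what a single-server, 3-node replica set looks like (ports, paths and the replica set name are placeholders, and the SSL configuration we used is omitted for brevity; with MongoDB 4 the legacy mongo shell takes the same initiate command as mongosh):

# start three mongod processes on one box, each with its own port and data directory
for port in 27017 27018 27019; do
  mkdir -p /data/rs-$port
  mongod --replSet rs0 --port $port --dbpath /data/rs-$port --fork \
         --logpath /data/rs-$port/mongod.log
done

# initiate the replica set from the first node
mongosh --port 27017 --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019" }
  ]
})'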

In our case, MongoDB was used as a cache database for a secondary layer of APIs that were backed by an Oracle database in the business API layer. Since we were looking for more flexibility and increased performance, this was a good choice. Data arriving from the business layer was already well structured as one JSON document, but due to its size and the GUI editing capabilities, we needed to break it down to offer more flexible usage based on the given business requirements for the sample GUI.

Our API for the Mongo layer was written in Spring Boot and was previously designed to work with Hibernate and MSSQL. There was a lot of business logic, and in many cases it was handled with Maps and Strings without explicit Java mappings. Yes, certain objects were mapped using a JSON parser, but it was all done manually. To proceed, we needed to remove Hibernate, generate two additional sets of domain objects (VO -> Mongo -> Business API), write converters (e.g. Orika) and enhance the business logic to avoid parsing into HashMaps by using the MongoDB driver to map directly to Java objects. We also gained the ability to use projections, aggregations and MongoDB views. There was a portion of data in MSSQL that was extensively designed around relations (on top of the cached API data) that we needed to convert to NoSQL and integrate into the new collections.

Removing relations and designing collections proved to be the tricky part, as we did not want to change our Angular application extensively. The business requirement was that, other than paging changes, no other visible functionality could change, and performance had to improve significantly. MongoDB 4 came with support for transactions across collections, and even though the ideal usage of NoSQL is to contain everything within one collection, we could not afford to do this for several reasons: one was changing the GUI extensively, the second was that we still could not lose the concept of relations that had been introduced (to some extent), the third was the size of the payload if we kept everything in the same JSON, and the last was performance issues on the Angular side due to parsing speed. The perspective of SQL is speed across tables, correctness of data and data safety; the perspective of NoSQL is ease of use and practicality. Setting up a NoSQL vs. SQL database engine is also a secondary benefit of NoSQL, as it is much easier to tune (at least for what we were doing). Lastly, scaling is much easier to accomplish with NoSQL.


Creating collections, from SQL to NoSQL


One of the challenging aspects of moving from SQL to NoSQL is designing appropriate data storage while considering everything already implemented and respecting the best practices of the underlying technology. In SQL we have relations and normalization; NoSQL is quite the opposite, where we would ideally want to contain as many aspects of a request as possible in a single collection. The thing is, we also need to consider how much code already relies on confirmed contracts and APIs. If we have an ESB layer or a gateway, we may use it to bridge some of the gaps, but for some APIs, to fully gain better performance, smaller corrections may be needed on both the server and the client side. In our case, the client was missing pagination, the contract definition was inconsistent, and sorting and filtering capabilities were inconsistent as well. One of the first things we did was collaborate with the team to understand the benefits in performance and page navigation by looking into the information being queried and paginated. Since pagination can be done at the driver level, this was our initial goal; there is a secondary option of slicing arrays within one document, but this was not the preferred approach.

The next problem was dealing with a huge payload and frequent updates to the API. Older browsers had difficulties parsing this content continuously, and we had a huge and diverse user base with different technical capabilities. Payload delivery needed to be carefully calculated to provide business value with the already crafted GUI components while keeping performance in mind. We absolutely could not count on customers always working with up-to-date computers and browsers, or all having high-speed network access. 
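
To illustrate what pagination at the driver level means, here is a hedged sketch in the Mongo shell; the collection and field names are made up, and in Java the same thing is typically expressed through the driver or Spring Data paging support rather than the shell:

# page 3 of 25 items, sorted by last update, filtered by status
mongosh mydb --quiet --eval '
  db.tasks.find({ status: "OPEN" })
          .sort({ updatedAt: -1 })
          .skip(2 * 25)                // (page - 1) * pageSize
          .limit(25)
          .forEach(doc => printjson(doc))
'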

As I mentioned earlier, MongoDB 4 introduced transactions across collections, which significantly helped with our restructuring. We of course tried not to misuse this, as the NoSQL philosophy is not to build relational collections. Reference data, cached data and configuration data all found their way into new collections, and since they were accessed separately, this was not an issue. The main data was separated into three main collections, keeping business flows and GUI design in mind. Looking back at the end goal, I believe it all worked out well. 

The last thing to do was to create indexes based on frequently accessed data elements, add a few aggregations for different views, and create several views to serve data for future use.
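
A hedged sketch of that last step in the Mongo shell (collection, field and view names are illustrative only, not our real model):

mongosh mydb --quiet --eval '
  // index on the elements used most often for filtering and sorting
  db.tasks.createIndex({ status: 1, updatedAt: -1 });

  // a small aggregation published as a view for one of the GUI screens
  db.createView("openTasksByAssignee", "tasks", [
    { $match: { status: "OPEN" } },
    { $group: { _id: "$assignee", count: { $sum: 1 } } },
    { $sort:  { count: -1 } }
  ]);
'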


Changes to API contracts


Changes to the API contract were made to standardize API development and the exposure of data to the client, to introduce paging, sorting and filtering in a consistent way, and to reorganize some of the APIs to better serve NoSQL data (all based on our changes to the collections). These changes were organized with client usage in mind. The question we continuously asked ourselves was: how will the client get the appropriate amount of data with good flexibility and the least amount of interaction with the APIs? Network quality varies around the world and our user base is quite large, which all plays into the performance enhancements we were bringing to the solution. One of the main things was to restructure the models so that server-side models were separated from client-side models. We also introduced an abstraction layer in the business service layer to help with future changes.
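
As an illustration of the kind of contract we converged on (the endpoint and parameter names here are hypothetical, not our actual API):

# one page of business data: paging, sorting and filtering handled consistently by the server
curl -s 'https://api.example.com/v1/tasks?page=0&size=25&sort=updatedAt,desc&status=OPEN' \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer <token>'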


Updating Angular side


Apart from the various performance changes, the modifications for the API changes were relatively straightforward (and time consuming). The API was split into generic configuration and a task engine, and paged data was added where we used native MongoDB paging functionality. We also added sorting and filtering, as opposed to the SQL string store, and MongoDB was able to process this without any performance hit. Instead of one API call to the back-end service, we chained two or more services, usually one un-paged call for header data and paged calls for business data. This worked well, gave us much smaller objects to deal with, and improved performance by a long shot. Ultimately, the changes required on the Angular side were as listed below (this includes performance updates and MongoDB-related updates to the API):

  • Introduction of modular children routes
  • Introduction of lazy loading modules
  • Introduction of PSF functionality based on native database support to execute such queries
  • Reduction of the payload size by remodeling the data as a cache and introducing the concept that we should process and load only the data the user can see, plus one page of buffered data
  • Moving cross-field validation logic to the API, since client-side validation should apply only to fields that are visible to the user.

The end result of all these operations was vastly improved performance, simplified development and maintenance, and a solid platform for future work.


Friday, April 05, 2019

Deploying Angular 6 + Spring Boot + Postgres to AWS

You have completed your Angular application and now you are looking for deployment options... Among the many in existence today you can find choices like AWS, GCP, Azure, PCF, etc., and a few others in the cloud. In this post, I will explain what I needed to do to deploy my services to AWS and keep the cost low (or non-existent with the AWS free tier). There is always the option to get full 'by the book' services and pay for them, but it is better to understand what your options are when deploying the application, and then understand how your revenue will measure up against your expenses. As it is, the My Open Invoice application is designed to operate for a single contractor; to add another company, a new URL and setup are needed. This can, of course, be upgraded with a few tweaks in the data model and the application itself, but we can leave that for another blog post. For now, let me introduce you to my simple architecture:
  1. Front end of the application is Angular 6, built with Google Material Design and capable of rendering on both desktop and mobile devices
  2. Middle tier is Spring Boot 2
  3. Database is Postgres (H2 as test database)
  4. Application has cache that is currently used only for RSS feed (EH Cache)
  5. Authentication and authorization are done through JWT with Bearer token.
Amazon offers several options for hosting your web application, like EC2 instances and S3 buckets. Then there is CloudFront, which caches your content (the dist folder) and fronts it with HTTPS access. S3 provides durability and redundancy, Route 53 has DNS covered and RDS hosts your database. Elastic Beanstalk is used for EC2 instance generation, auto scaling and load balancing setup. CloudWatch is used for log tracing, and there are several other options you can turn on for multiple instances, load balancing, reporting, caching, etc. 
My goal here is to create something that won't drain your money but still gives you a decent application with backups. I will also mention the options that would be needed for a more robust solution.

This is my current setup:


Let us start by explaining components that I have used:

  • Angular 6
    • This is the client-facing application, using a template that is both desktop and mobile friendly. The app is compiled into the /dist directory with production settings and code optimization, and uploaded to the S3 bucket (see the deployment sketch after this list).
  • S3 Bucket
    • This is a standard AWS S3 bucket in one zone. You can set up replication, but Amazon already guarantees 99.999999999% durability in case you worry about your data. You could even go to S3-IA for lower cost, but the storage cost for hosting one small web app is very low anyway. It is going to host two items for us:
      • Angular 6 code (you can keep this private with access only to CloudFront)
      • Beanstalk repo for WAR files 
  • CloudFront
    • This is our fronting service: it enables routing to /index.html, content distribution closest to your customers' location, HTTPS certificate termination and routing for our Angular app. There are a few important items worth mentioning:
      • you need (at least) 400 and 404 errors routed to /index.html with a 200 HTTP response code, or your app will not work properly
      • you can provide your certificate here if you have your own domain registered (you can use Amazon ACM to generate and provide certificate here)
  • CloudWatch
    • This service is enabled by default and will track and report usage for your AWS components. Max resolution is 1 minute. Depending on the configuration and the amount of logging, different charges may apply; modest logging should remain relatively cheap or free. Here is the pricing.
  • Elastic Beanstalk
    • This is your PaaS. You will use this as an entry point to create your configuration, environment, VPC, load balancing, auto scaling groups, RDS, etc. This can be done in an easy and convenient way. You can save hours of work by deploying this way. There are a few important items to consider:
      • I created my database separately from this config. It makes a difference whether you want Tomcat (for example) or your application to manage the DB connection, and there is more flexibility when configuring RDS individually. 
      • I am using Nginx as a reverse proxy and Tomcat 8 to serve the content (WAR). One important item: since this is a Spring Boot application, you need to pass in -D style system properties to override the Spring ones, NOT ${} environment variables. It took me a good hour to figure out what 'environment' means in Beanstalk.
      • I did not turn on load balancing as it costs money even in the free tier. You can alternatively load balance with Route 53, but then you need to connect directly to the EC2 instances, which limits the auto-scaling options.
      • If you want to increase the instance count, I did not find an option to change auto scaling in Beanstalk other than enabling the LB or Time-based Scaling. You can, however, go into the Auto Scaling group directly, increase Max instances to your desired number and configure a trigger for activation. This will not help a lot on its own, though, as you would need a load balancer to route traffic to those instances. The only other option would be to put Elastic IP addresses on the EC2 instances and DNS-balance across them, but I honestly did not try this.
      • When deploying the WAR file, you need to create a custom nginx.conf for port 443 along with the uploaded certificates (I got mine from this SSL site for free). You will need .ebextensions in the WAR file with all the configuration and certificates. The EC2 config is rebuilt on every restart, so you will lose port 443 if you do not have this in place. This is ONLY needed if you do not have an LB; otherwise the LB will take care of port 443 for you (after you configure it).
      • You need to open port 443 in your EC2 security group (one will be created by Elastic Beanstalk if you do not already have one). It needs to be accessible from 0.0.0.0/0, since your Angular app will connect directly to the servers using this rule.
  • RDS
    • My choice of RDS engine is Postgres for prod and H2 for dev. On a side note, I was amazed how fast H2 is and how compatible it is with SQL standards in terms of functions (I notice this every time :)). Postgres was the closest match for some of the custom query capability I needed (compared with e.g. MySQL). The RDS instance was created in only one zone with a minimally sized instance, and a security group rule was opened from the EC2 group to this group by name for the desired port. RDS access was limited to EC2 only; if needed, direct access for DB management can be done through port forwarding.
  • Route 53
    • I registered my domain through Amazon and AWS created a Hosted Zone for me. Inside it, I created ALIAS records pointing to Beanstalk for the www., api. and naked domain names. If you choose to point directly to EC2 you can do that, and you can always do load balancing using DNS (though this is not the primary balancing method on AWS).
  • Auto Scaling Group
    • This is created by Beanstalk and, in general, you have two options: one for a single instance and one for a load balanced setup. Again, the load balanced setup will cost you, so use it only if you need it.
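
To tie the pieces together, this is roughly what a deployment looks like from the command line. The bucket name, distribution id and environment name are placeholders, and I am assuming the AWS CLI and EB CLI are installed and configured:

# build and publish the Angular app to the S3 bucket behind CloudFront
ng build --prod
aws s3 sync dist/ s3://my-app-bucket --delete

# invalidate cached content so CloudFront picks up the new build
aws cloudfront create-invalidation --distribution-id E1234567890ABC --paths "/*"

# deploy the Spring Boot WAR to the Elastic Beanstalk environment
eb deploy my-app-env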
This is just one way of setting up your environment, and for a small application it works for me. I have all the backups, so I am not too worried about downtime, if any.

As suggested, there are a few items to change if you need a more robust environment. A couple of them would be:
  1. Enable LB on Elastic Beanstalk and have your system balance across at least 2 zones.
  2. Have deployments in at least 2 zones; depending on your clientele, hosting in each major region could be beneficial (and then use geolocation/proximity routing to optimize content delivery)
  3. Right now I have everything deployed in one VPC, but depending on your needs and layout you may want more than one, and then you need to decide which would be private and which public, where to place gateways, and how to connect the VPCs.
  4. API Gateway is always an option for more complex environments, where HTTPS can also terminate (with your certificate). It adds a layer of abstraction if you are using microservices from multiple points and with varying infrastructure.
  5. RDS should be deployed Multi-AZ with read replicas enabled. Backup is already enabled even for a single instance. Another possibility is to use a NoSQL database that is automatically deployed across multiple AZs (like DynamoDB).
There are many ways things can be configured on Amazon, so it is worthwhile to investigate and try them out. It is not expensive to test configurations to figure out which one is right for you, but it may be expensive to correct everything down the road once you realize it was not set up properly. Amazon offers the option to pay only for what you use, so why not try it?

So far, my only costs have been registering my domain and a small Route 53 charge ($0.50 per hosted zone).

If you have a better setup in mind, please let me know. I am always trying to learn new and better ways of optimizing data, infrastructure and cost.