Architecture of Information Products – Explained for non-tech folks
In this post, we describe the basic architecture of information products. The article is written for non-technical folks who want to understand the overall picture. If you are a technical person, you will find caveats that depend on the scenario. Here we are looking only at the big picture. With that disclaimer out of the way, let’s get started.
What is an information product?
Most applications on the internet are information products, including Uber, Google, on-demand delivery, etc. How come they are information products? Don’t they deal with a lot of physical infrastructure? Yes, they might. But the primary value of these apps is some sort of information organization or information arbitrage. Cabs were there before. You could hail a cab earlier as well. Uber made it possible for you to find a driver who has no intention of driving past you but is still less than half a kilometer away. That is purely an information problem. The driver did not know that you wanted a car and were close by. You did not know there was a driver half a kilometer away.
There is software written to solve other problems. But if you look at most internet companies, they are solving information problems. Making information clear, and making available information that was otherwise out of reach, is what makes the difference to the customer.
In that sense, Freshdesk, Slack, Buffer, Twitter, Instagram, Uber, or almost any internet company you can name, is mostly an information product. They juggle and manipulate large volumes of information and make it meaningful for you.
Most information products are best built as CRUD apps. CRUD stands for Create, Read, Update, Delete. CRUD is a way of building applications. It’s also a way of thinking. Think of it as a framework or an approach to doing things. What CRUD says is: identify each kind of object in the information you are trying to capture, and work with each kind separately. Each object can be created, read, updated or deleted.
If you don’t think in the CRUD way, you are likely to create an endpoint that creates one object and deletes another at the same time. Theoretically, there is nothing wrong with this, but it becomes hard for you as the developer or product owner to remember which endpoint does what. It also becomes a maintenance nightmare, and change management becomes an issue. CRUD is like LEGO: building blocks. If you want to delete an object of type A and create an object of type B, call the two corresponding endpoints with the right data, in parallel or in series depending on your business logic.
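To make the building-block idea concrete, here is a minimal sketch in Python. It is not taken from any real product: the `CrudStore` class and the `ride_requests` and `trips` object types are hypothetical names chosen for illustration, and the data lives in memory rather than in a database.

```python
from itertools import count
from typing import Dict, Optional


class CrudStore:
    """A tiny in-memory store that exposes exactly four operations."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}
        self._ids = count(1)  # hands out 1, 2, 3, ... as object ids

    def create(self, data: dict) -> int:
        new_id = next(self._ids)
        self._rows[new_id] = dict(data)
        return new_id

    def read(self, obj_id: int) -> Optional[dict]:
        row = self._rows.get(obj_id)
        return dict(row) if row is not None else None

    def update(self, obj_id: int, changes: dict) -> bool:
        if obj_id not in self._rows:
            return False
        self._rows[obj_id].update(changes)
        return True

    def delete(self, obj_id: int) -> bool:
        return self._rows.pop(obj_id, None) is not None


# Two hypothetical object types, each with its own CRUD operations.
ride_requests = CrudStore()
trips = CrudStore()

# "Delete an A and create a B" is composed from two separate blocks,
# run here in series, instead of one endpoint that does both.
request_id = ride_requests.create({"rider": "asha", "pickup": "MG Road"})
request = ride_requests.read(request_id)
trip_id = trips.create({"rider": request["rider"], "driver": "ravi"})
ride_requests.delete(request_id)
```

Because each operation touches one object type only, anyone reading the last four lines can tell what happened to ride requests and what happened to trips without opening the store's internals.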
A simple 3-tier architecture
Having established that most internet companies are information products, and that most information products are best built as CRUD apps and remain that way for most of their life, let’s look at what a good architecture for them looks like.
A brief description of each of the three tiers (data, application, and presentation):
There are different kinds of data: analytics data and application data. The data layer that I am referring to holds only the application data. Analytics data is a different layer altogether (see the section on other layers in your architecture). Application data can be state information (the current properties of something) or log information (a record that something happened). Application data is mostly state information (roughly 90% state and 10% log, though this varies by type of application).
This is where you get to pick your database engine: a relational database or a NoSQL database. If most of your data is state information, you are better off picking a relational database. There are some cases where a NoSQL database might fit better, but if you are looking for a default answer, pick an RDBMS (relational database management system). I usually pick PostgreSQL.
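The difference between state information and log information can be sketched with two tables in a relational database. The example below is only an illustration: it uses SQLite from Python's standard library as a stand-in for a server database such as PostgreSQL, and the `drivers` and `driver_events` tables and the `set_status` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- State: one row per driver, overwritten as the driver's status changes.
    CREATE TABLE drivers (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        status TEXT NOT NULL
    );
    -- Log: one new row per event, never overwritten.
    CREATE TABLE driver_events (
        id          INTEGER PRIMARY KEY,
        driver_id   INTEGER NOT NULL REFERENCES drivers(id),
        event       TEXT NOT NULL,
        happened_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    """
)


def set_status(driver_id: int, status: str) -> None:
    """Update the state row and append a log row in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE drivers SET status = ? WHERE id = ?", (status, driver_id)
        )
        conn.execute(
            "INSERT INTO driver_events (driver_id, event) VALUES (?, ?)",
            (driver_id, f"status changed to {status}"),
        )


with conn:
    conn.execute(
        "INSERT INTO drivers (id, name, status) VALUES (1, 'ravi', 'offline')"
    )
set_status(1, "available")
set_status(1, "on_trip")

# State answers "what is true now?"; the log answers "what happened?".
current_status = conn.execute(
    "SELECT status FROM drivers WHERE id = 1"
).fetchone()[0]
event_count = conn.execute(
    "SELECT COUNT(*) FROM driver_events WHERE driver_id = 1"
).fetchone()[0]
```

After the two status changes there is still a single row in `drivers`, while `driver_events` has grown by one row per change.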
The application layer is where most of your application logic resides. Some logic can live in your presentation layer as well, but the most important logic stays here. In particular, the logic that decides who has permission to see what data always stays here. This is typically your Node.js server, Ruby on Rails server, Python Django server, etc.
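As an illustration of permission logic living on the server, here is a minimal sketch in plain Python. The `User` and `Ticket` types, the "admin" role, and the `get_ticket` function are all hypothetical; a real application would use the equivalent mechanism of its framework.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class User:
    id: int
    role: str  # hypothetical roles: "agent" or "admin"


@dataclass(frozen=True)
class Ticket:
    id: int
    owner_id: int
    subject: str


class PermissionDenied(Exception):
    """Raised when a user asks for data they are not allowed to see."""


def can_view(user: User, ticket: Ticket) -> bool:
    # Assumed rule for this sketch: admins see everything,
    # everyone else sees only the tickets they own.
    return user.role == "admin" or ticket.owner_id == user.id


def get_ticket(user: User, ticket_id: int, tickets: Dict[int, Ticket]) -> Ticket:
    """The server checks permission before any data leaves it."""
    ticket = tickets[ticket_id]
    if not can_view(user, ticket):
        raise PermissionDenied(f"user {user.id} cannot view ticket {ticket_id}")
    return ticket


tickets = {1: Ticket(id=1, owner_id=10, subject="Refund request")}
owner = User(id=10, role="agent")
other_agent = User(id=11, role="agent")
admin = User(id=1, role="admin")
```

The check runs on the server because anything in the browser or the mobile app can be modified by the person using it; the presentation layer can hide a button, but only the application layer can actually refuse the request.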
Most of the time, your presentation layer is what is visible in the browser. In a traditional web application, this is the HTML, CSS, etc. In a single-page app, this is the HTML, React/Vue code, CSS, etc. On mobile, this is your mobile app. You can write client-side logic that executes inside the browser or the app.
Some possible design choices:
Single page apps:
Mobile and web solution:
Other layers in your architecture:
You will need to know exactly what is happening on your servers, both to identify anomalies in your application code and to understand how your customers behave. Server logs on a CRUD application can give you detailed insight into what is going on. Each of your application servers writes to a file on its local machine. The logging infrastructure then pipes those files to different locations for storage and analytics.
We typically use a combination of Elasticsearch, S3, and Redshift for analyzing logs. Having multiple log stores also gives you redundancy.
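The first step, an application server writing a log file on its own machine, can be sketched with Python's standard `logging` module. The file location, the logger name, and the field names in each log line are assumptions made for this example; shipping the file onward to storage and analytics tools is left to separate infrastructure and is not shown.

```python
import json
import logging
import os
import tempfile
import time

# Hypothetical location for the log file on the local machine.
LOG_PATH = os.path.join(tempfile.gettempdir(), "app_requests.log")

logger = logging.getLogger("app.requests")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(LOG_PATH, mode="w", encoding="utf-8")
handler.setFormatter(logging.Formatter("%(message)s"))  # one JSON object per line
logger.addHandler(handler)
logger.propagate = False  # keep these lines out of the root logger


def log_request(method: str, path: str, user_id: int, status: int) -> None:
    """Append one structured line describing a handled request."""
    logger.info(
        json.dumps(
            {
                "ts": time.time(),
                "method": method,
                "path": path,
                "user_id": user_id,
                "status": status,
            }
        )
    )


# In a CRUD app, each log line maps to one operation on one object type.
log_request("POST", "/trips", 42, 201)
log_request("DELETE", "/ride_requests/7", 42, 204)
handler.flush()
```

Writing one JSON object per line is a common convention rather than a requirement; it keeps each line easy for downstream tools to parse.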
Once you have data, either as state in your microservices or as log data from your servers, you will want to dissect it, view it, process it, transform it, etc. There are a bunch of analytics tools on the market. At small scale, I have had decent success working with just Metabase, Grafana, and Kibana. As the volume and nature of your data change, you might find that different tools suit your requirements better. In the early stages, Metabase, Grafana, and Kibana will do just fine.
Microservice architecture vs Monolithic architecture
Microservices are all the rage these days, and have been for a while now. Companies like Netflix have driven the concept forward because they found it easier for many developers to improve the system without stepping on each other’s toes. Microservices also provide the flexibility to use the best stack for each problem, and they give the team the ability to scale one part of the system without having to scale the others.
Too much of anything can be bad for you. The same applies to microservices. If you don’t have enough scale, you could end up spreading yourself too thin by having too many microservices. Also, microservices were born out of a performance optimization problem. In the early stages, when you are working with a small team and have a lot of business uncertainty (you are not sure what features the customer wants, etc.), you are better off leaning towards a monolithic architecture. When you scale up, you can slowly start splitting things out. You can also split out a service when you have a clearly distinguishable business function.
Should you use this vs that?
For an engineer, these questions are valid, but as a business owner, they should usually not worry you too much. You just need to think through the fundamental requirement each tool serves rather than the subtle differences between tools.
To compare it to woodworking tools, it’s like knowing the purpose of a drill vs a jigsaw, and then knowing the difference between an impact drill and a regular drill. A drill and a jigsaw do 2 different things: drilling and cutting. You can’t use a jigsaw as a drilling tool or vice versa. If you have to drill and cut, you will need both of them. An impact drill and a regular drill are both meant for drilling, but for different kinds of drilling. An impact drill performs better on walls and concrete surfaces. That said, to some extent you can swap one drill for the other and you will be fine.
Should you have 5 kinds of drills (battery operated, impact, regular, etc.) and 5 kinds of cutting tools (jigsaw, band saw, woodcutter, table saw, etc.)? Of course, if you have that volume of work to do, you will find the compromise of using one tool for everything eating into you. Then you will want to do certain kinds of cutting with a band saw and certain other kinds with a table saw or jigsaw.
The same logic works for software tools as well. To start with, you will be fine with one database and one application server that you are familiar with. Later, you will find that you want a specialized database for a particular kind of data storage. By all means, go get it. But feel the pain before you buy the painkiller.
You are better off using a tool that you are most familiar with.
There is always an argument about the absolute best tool for a particular purpose. But, usually for an individual or a team, the best tool is something that they are very familiar with.
The tradeoff between optionality and effective resource utilization
Effective resource utilization is where the company spends effort on performance optimization to squeeze the most juice out of the lemon. This is often the most logical thing to do. You will hire a bunch of engineers with a specific skill set to optimize things further. One thing to be conscious of is that the more you optimize, the more you lose optionality, that is, the flexibility to move. Flexibility to move is what you want when the business environment shifts, and it surely will. The largest companies of 50 years ago are not the biggest companies today.
Obviously, you have to optimize, but be aware of your loss in flexibility.