Monday, September 7, 2020

Cloud Computing

What is cloud computing?

It may sound complicated and foreign
But it’s really just many technologies coming together
It allows for the outsourcing or borrowing of computers
Computations and applications can be done on computers that are not physically ours or located near us

Origins

The Internet was often drawn as a “nebulous cloud”
Its primary function was getting information from one location to another
On the Internet, getting something up and running is relatively easy
- There are many languages and databases to choose from
The problems arise when we’re successful
- With millions of customers, we have several logistical issues that come up
  - Our computers only have so much RAM, so much storage, and so many CPUs
  - With too many customers, or users, or requests, our technology can very quickly be overwhelmed

Cloud Solutions

Eventually we reach that upper limit, where too many users are trying to use our product
These users may experience error messages, or no working product at all
What if we added a second server, with the exact same technology?
Now, we probably don’t want to arbitrarily change our URL, a la www1.example.com or www13.example.com
What if we put another server in front of our two servers, and our URL brought users there?
Sometimes, this middleman gives users the IP address of our original server, and sometimes it gives those users the IP address of our second server
We might call this middleman, which balances the traffic or load on our servers, a load-balancer
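One simple way for such a middleman to split traffic is to take turns between servers, often called round-robin. The following is a minimal sketch in Python, not a production load-balancer; the class name `LoadBalancer`, the method `pick_server`, and the two private IP addresses are assumptions made up for illustration.

```python
from itertools import cycle


class LoadBalancer:
    """Hands out backend server addresses in round-robin order."""

    def __init__(self, servers):
        # cycle() repeats the list of servers forever: 1, 2, 1, 2, ...
        self._servers = cycle(servers)

    def pick_server(self):
        """Return the address the next user should be sent to."""
        return next(self._servers)


# Hypothetical addresses for our two identical servers.
balancer = LoadBalancer(["10.0.0.1", "10.0.0.2"])
picks = [balancer.pick_server() for _ in range(4)]
print(picks)  # ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']
```

Real load-balancers may also weigh how busy each server is, but taking turns is enough to show the idea.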

Caching

IP addresses don’t change that often, though they can change
If we remember, or cache, an IP address for one of the servers behind our load-balancer, what happens if that server goes down or incurs downtime?
Maybe we don’t give our two servers unique IP addresses
Instead, our domain name leads users to the load-balancer
And somehow, the load-balancer routes traffic
If DNS returns the IP address of the load-balancer, then we don’t have this problem of having users unable to access a given server
To make sure that the load-balancer is keeping track of which servers are online, we can use something called a heartbeat
This just means having servers behind the load-balancer “check in” periodically with the load-balancer
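The heartbeat idea can be sketched as a table of "last seen" times. This is only an illustration under assumptions: the class `HeartbeatMonitor`, the 5-second timeout, and the server names are hypothetical, and time is passed in explicitly as a number of seconds so the example is easy to follow.

```python
class HeartbeatMonitor:
    """Tracks when each server last checked in with the load-balancer."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # server name -> time of its last heartbeat

    def check_in(self, server, now):
        """Record a heartbeat from a server at time `now` (in seconds)."""
        self.last_seen[server] = now

    def online_servers(self, now):
        """Servers whose last heartbeat is recent enough to be trusted."""
        return sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen <= self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=5)
monitor.check_in("server-1", now=0)
monitor.check_in("server-2", now=0)
monitor.check_in("server-1", now=4)   # server-2 has gone silent
print(monitor.online_servers(now=8))  # ['server-1']
```

The load-balancer would then route traffic only to the servers this list returns.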

SPOF - Single Point of Failure

Now, our network can tolerate server one or two going down, but not the load-balancer
We assume that our load-balancer can handle at least twice as much traffic as an individual one of our servers
However, it can still be overwhelmed, and what happens if it is?
Previously, we solved our problem by adding hardware, so maybe we could add another load-balancer
Now, we have two load-balancers and two servers
We could configure the load balancers to send heartbeats to each other
If one of the load-balancers is down, then the other one takes on the responsibility of traffic
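A rough sketch of that takeover, assuming each load-balancer's heartbeat status is already known: the function `choose_active` and the names `lb-primary` and `lb-backup` are hypothetical, and real systems handle this at the network level rather than in application code.

```python
def choose_active(balancers, heartbeat_ok):
    """Return the first load-balancer, in priority order, that is still alive."""
    for name in balancers:
        if heartbeat_ok.get(name, False):
            return name
    raise RuntimeError("no load-balancer is available")


balancers = ["lb-primary", "lb-backup"]
heartbeat_ok = {"lb-primary": True, "lb-backup": True}
before = choose_active(balancers, heartbeat_ok)

# The primary stops sending heartbeats, so the backup takes over.
heartbeat_ok["lb-primary"] = False
after = choose_active(balancers, heartbeat_ok)
print(before, after)  # lb-primary lb-backup
```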

Costs

This hardware has both a financial and logistical cost
There are now wires that need to be connected, and housing needs to be bought for the servers
Maybe we don’t have enough physical space and cooling
MTBF (Mean Time Between Failures) can very quickly drop
Maintaining this can quickly become quite difficult

IaaS - Infrastructure as a Service

This is one of the appeals of Cloud Computing - we can offload all of these logistical costs to the cloud
- Amazon, Google, Microsoft, etc
There is software that mimics the behavior of each of the things we’ve seen so far - load-balancers and servers - and more
Now, we’re able to buy the infrastructure we need, but not worry about make and model of server, cooling, wiring, and other logistical costs
Another added benefit is auto-scaling
If our company is more successful than anticipated - maybe it’s the holiday season - we need more technology
IaaS can give us the ability to scale up our hardware - maybe by turning on more servers
This can happen passively, without us knowing
The reverse is also true - the service automatically scales down when not being used
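At its simplest, an auto-scaling rule is a small calculation: how many servers does the current traffic need? The sketch below assumes a made-up function `servers_needed` and a hypothetical capacity of 100 requests per second per server; actual cloud providers expose their own, more elaborate scaling policies.

```python
import math


def servers_needed(requests_per_second, capacity_per_server, minimum=1):
    """How many servers to keep turned on for the current traffic."""
    needed = math.ceil(requests_per_second / capacity_per_server)
    # Never scale below the minimum, even when traffic drops to zero.
    return max(minimum, needed)


print(servers_needed(250, 100))   # 3  (an ordinary day)
print(servers_needed(5000, 100))  # 50 (holiday rush: scale up)
print(servers_needed(0, 100))     # 1  (quiet night: scale down)
```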
Over the years, we’ve been able to pack more computing power into the same space (a la Moore’s Law)
Humans, however, have been checking their email at around the same speed
This means that we can put multiple simulated machines on the same hardware

Virtual Machines

Virtual machines are software that allow us to simulate running several computers on the same hardware
This allows us to oversell to customers, where we only have so much memory on our servers, but we assume some customers will not have very successful businesses
This means that those customers will not need as much memory or as many CPU cycles
For the customer though, this means that your website can be hosted on the same machine as another customer whose business is hugely successful
This translates to slower runtimes and services for reasons outside of your control
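The arithmetic behind overselling can be sketched with a few numbers. Everything here is hypothetical: a host with 64 GB of memory, three customers named `shop-a` through `shop-c`, and the amounts they were promised and actually use.

```python
HOST_MEMORY_GB = 64  # hypothetical physical memory on one server

promised_gb = {"shop-a": 32, "shop-b": 32, "shop-c": 32}
typical_use_gb = {"shop-a": 4, "shop-b": 6, "shop-c": 10}
busy_use_gb = {"shop-a": 4, "shop-b": 6, "shop-c": 60}  # shop-c takes off


def fits_on_host(usage_gb):
    """True if everyone's actual usage fits in the physical memory."""
    return sum(usage_gb.values()) <= HOST_MEMORY_GB


oversold = sum(promised_gb.values()) > HOST_MEMORY_GB
print(oversold)                      # True: 96 GB promised on a 64 GB host
print(fits_on_host(typical_use_gb))  # True: only 20 GB actually in use
print(fits_on_host(busy_use_gb))     # False: one busy neighbor hurts everyone
```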
Virtualization looks very much like splitting your computer into several smaller, virtual, computers
Virtualization can be inefficient though - what if all customers are using the same guest OS or virtual operating system as the host OS, or operating system actually installed on the server?
This is what has given rise to containerization, like in Docker
Containerization allows us to use the host OS, sharing operating system resources, while still separating the other virtual computers
There is still some overhead, since we need some engine, such as Docker (much as virtual machines need a hypervisor), but there aren’t copies of the same operating system
This provides the same separation, but with less waste of resources

PaaS, SaaS - Platform as a Service, Software as a Service

Even if you don’t know it, you’ve likely already used SaaS, as with Gmail or Outlook.com
We don’t need to worry about where our emails are, or how they get from place to place
IaaS is where you still need to understand network topology, and how to put things together
Between these two is Platform as a Service, like Heroku
- Often, these services run on Amazon, Google, or Microsoft infrastructure, but provide a sort of middle ground
- This middle layer of abstraction makes it easier for us to run applications
- All we have to say is: host this as a server, without concerning me with how it communicates, scales, or balances load
- The downside of course is not necessarily knowing how to solve a given problem, if that problem occurs within the platform provider

Databases

Beyond the application, we likely also need to store data somewhere
Now, it’s very common for a web server to be very specialized - a server to send emails, a server to respond to HTTP requests, etc
Our infrastructure likely has at least one database
Databases are generally drawn as cylinders, and are connected to all of our servers
All of our servers can then save and read all their data from the same place
Vertical scaling - pay more money for bigger and better hardware
Even the best database server can only handle so many connections, reads, and writes at any one time
Maybe we can go back to the engineering solution - scaling horizontally
If we add another database - we probably can’t just use a load balancer, because then some data will end up in one database and not in the other
We could also shard the databases, where we put similar users in databases that only contain those users
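One possible sharding rule, as a sketch: group users by the first letter of their username. The rule itself, the shard names, and the helper functions `shard_for`, `save_user`, and `load_user` are all assumptions for illustration, and plain dictionaries stand in for real databases.

```python
def shard_for(username):
    """Pick a shard from the first letter of the username (A-M or everything else)."""
    first_letter = username[0].upper()
    return "shard-a-to-m" if "A" <= first_letter <= "M" else "shard-n-to-z"


# Two dictionaries stand in for two separate database servers.
shards = {"shard-a-to-m": {}, "shard-n-to-z": {}}


def save_user(username, record):
    shards[shard_for(username)][username] = record


def load_user(username):
    # The same rule tells us where to look, so we only ask one database.
    return shards[shard_for(username)].get(username)


save_user("alice", {"orders": 3})
save_user("zamyla", {"orders": 1})
print(sorted(shards["shard-a-to-m"]))  # ['alice']
print(sorted(shards["shard-n-to-z"]))  # ['zamyla']
print(load_user("zamyla"))             # {'orders': 1}
```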
This doesn’t help us back up our data though
We could go back to something like our first model, with one large master database
In something called replication, we could add smaller or equivalent databases that copy their data from that main database and can only be read from
In another step up, maybe we have two master databases, and even if data is only written to one of them, they communicate with each other to make sure the data gets replicated
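The single-main-database form of replication can be sketched as follows: every write goes to the main database and is copied to the read-only replicas, while reads are spread across the replicas. The class `ReplicatedDatabase` is hypothetical, dictionaries stand in for databases, and the copy happens instantly here, which real systems cannot always guarantee.

```python
class ReplicatedDatabase:
    """One main database that accepts writes, plus read-only copies."""

    def __init__(self, replica_count):
        self.main = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = 0

    def write(self, key, value):
        # Writes only ever go to the main database...
        self.main[key] = value
        # ...which then copies the change to every replica.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads take turns between replicas, sparing the main database.
        replica = self.replicas[self._next_replica]
        self._next_replica = (self._next_replica + 1) % len(self.replicas)
        return replica.get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("zamyla", {"orders": 1})
print(db.read("zamyla"))  # {'orders': 1} (served by the first replica)
print(db.read("zamyla"))  # {'orders': 1} (served by the second replica)
```

Because every replica holds a full copy, this arrangement also gives us backups of the data.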

Summary

As difficult or foreign as these things can sometimes sound, we’re really just stacking layer upon layer of increasingly complicated ideas
We started with binary and ASCII, and have gotten all the way to cloud computing, building layer by layer
At each step, the differences are really just about abstracting away common problems to make our code as accessible and efficient as possible
With all of this in mind, the idea is that we can use each of these layers and puzzle pieces to build better solutions to the challenges that arise in building our own site

Technology Stacks

Front End

We’ve talked about HTML, CSS, and JavaScript
These allow us to build some webpages
However, what if there’s a common task people want to code?
We have a way to distribute that to others, through something called libraries and frameworks
- Some common frameworks for JavaScript are Angular, Ember, and React
The different choices between frameworks are often based on varying factors:
- How familiar are we with a given framework?
- What parts of the framework’s stylistic conventions do we like?
- How does the framework mesh with our project?
- Do we have a high cost to learning that might slow down our work?
- How well-designed is the framework?
  - Are there optimizations for having more customers in the future?
- The choices and tradeoffs are almost endless, and they keep changing over time
- In fact, we do not necessarily need to use the most popular framework in order to minimize regret

Back End

Here too are many choices: languages and runtimes like Java, Go, .NET, PHP, Python, Ruby, Scala, and Node.js, and frameworks like Django, Flask, Laravel, and Rails
Within these choices are several frameworks and libraries
These exist to make common tasks or items accessible
- For example, maybe there’s a common functionality (authenticating users) or an asset (a menu design) that people like
- These libraries and frameworks can make this sort of thing easily accessible across coders and engineers
The choices here can be informed by what engineers know, what the technology being used supports, and so on
Some of these frameworks and languages were designed with specific problems in mind
Many of these options are evolving rapidly, and so different versions within a choice have varying tradeoffs
As a result, there are many varying reasons to choose one over another
Some of the “right questions” that can come up when deciding a framework or language to use involve:
- What are people able to learn - will new hires need to be experts on an obscure language to work here?
- Do the current engineers have extensive background in the technology?
- Does the technology solve a current problem and leave room to solve future ones as they arise?
- And these are just a few of the reasons to consider when looking at a language or framework to use

Databases

These store data from users
Again, many choices - SQL databases like MariaDB, and NoSQL databases like MongoDB and HBase
And again, many of these choices are informed by:
- Past experience
- Financial cost
- Infrastructure support

SQL and NoSQL - Structured Query Language

CRUD
- Create
- Read
- Update
- Delete
In SQL, there are four main commands or keywords that line up well with this acronym
- INSERT (the SQL keyword CREATE, by contrast, defines tables rather than adding rows)
- SELECT
- UPDATE
- DELETE
In both cases, these are all operations done on and with data
Specifically, user data
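To make the CRUD operations concrete, here is a small sketch using Python's built-in `sqlite3` module with a throwaway in-memory database. The `users` table, its columns, and the email addresses are assumptions for illustration. Note that in SQL a row is created with INSERT, while CREATE TABLE defines the table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives in memory
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Create: add a new row.
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("Zamyla", "zamyla@example.com"),
)

# Read: fetch rows back out.
after_insert = conn.execute("SELECT id, name, email FROM users").fetchall()
print(after_insert)  # [(1, 'Zamyla', 'zamyla@example.com')]

# Update: change an existing row.
conn.execute("UPDATE users SET email = ? WHERE id = ?", ("z@example.com", 1))
after_update = conn.execute("SELECT email FROM users WHERE id = 1").fetchone()
print(after_update)  # ('z@example.com',)

# Delete: remove the row.
conn.execute("DELETE FROM users WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(remaining)  # 0
```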
What is a SQL (or more generally, relational) database?
- Generally, a piece of software that allows you to lay out data, with relationships between the data
- Think back to the last time you used a table of data, maybe in a context like Excel
  - Data is not thrown around at random
  - Instead, we have rows and columns, and even different sheets of data to indicate varying meanings (dates, names, emails, etc)
A benefit of using SQL database software is that operations on data are meant to be quick
We don’t want to search a database by “brute force” or searching through the whole thing
We, being humans, can help the databases store data properly
- For example, we often know what kind or type of data will be going into a given database
  - Integers, characters (char), etc
  - Some of the specific data types that are commonly used are:
    - CHAR, VARCHAR
    - SMALLINT, INTEGER, BIGINT
    - FLOAT, REAL, DOUBLE PRECISION, DECIMAL
    - DATE, TIME, TIMESTAMP
    - etc.
The more specific we can be when creating a database, the more helpful it can be in storing and retrieving data
It might take us some time to think through how we’re going to be storing things
But this upfront cost will continue to save us time over possibly millions of entries
If we over-allocate space for things, then we’re being wasteful, so it’s necessary to know or anticipate how big our data will be
A real problem in computer languages is how to represent floating-point numbers, or numbers that aren’t whole integers
We only have a finite amount of space, but an infinite number of numbers
We do have the ability to be more precise, with different data types, but potentially at the cost of space
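The floating-point problem is easy to see in Python, where ordinary decimals are stored as binary floating-point numbers. This sketch uses the standard `decimal` module as a stand-in for a more precise data type; the variable names are made up for the example.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so a tiny error creeps into the sum.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Decimal stores the digits exactly as written, at the cost of more space and time.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```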
Within a relational database, we have other options too:
- PRIMARY KEY, FOREIGN KEY, UNIQUE
- If we know that we will have a unique identifier for each row, we can call that a PRIMARY KEY, and make it easier to find
- If we don’t want any data to be duplicated, we could tell the database to keep a certain COLUMN UNIQUE - emails, phone numbers, usernames, etc
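As a sketch of these options, the following uses Python's built-in `sqlite3` module to declare a PRIMARY KEY and a UNIQUE column, then tries to store the same email twice. The `users` table and the email address are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "  id INTEGER PRIMARY KEY,"     # unique identifier for each row
    "  email VARCHAR(128) UNIQUE"   # no two rows may share an email
    ")"
)
conn.execute("INSERT INTO users (email) VALUES (?)", ("zamyla@example.com",))

try:
    # A second row with the same email violates the UNIQUE rule.
    conn.execute("INSERT INTO users (email) VALUES (?)", ("zamyla@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(row_count)  # 1
```

The database itself refuses the duplicate, so our application code does not have to check for it first.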
No matter how we get the data, such as by an HTML form, it will likely end up stored in a database

Storing Information

Imagine someone, Zamyla, bought a single widget for $9.99
And another customer, Robert Bowden, bought 2 widgets at the same address, for $19.98
Maybe we should have columns - for Product, Quantity, Name, Street, City, State, Zip Code, Country, and Total
What data type should each of these columns have?
- Product -> If we say CHAR(8), this cannot contain any product of more than 8 characters
  - If we use CHAR(16), then we’re wasting a good number of spaces
  - If we use VARCHAR(16), then it makes it more difficult to search through this column
  - When we’re talking about millions of entries, this will add up
- Quantity - likely going to be INTEGER
- Name - We aren’t sure how big someone’s name will be, but maybe VARCHAR(128)
- And same for City and Street
- For State, at least in the US, we know that we can optimize this via the two letter code, CHAR(2)
- Zip Code calls for a design decision - do we want to use an INTEGER or a CHAR?
- For country, we have some design choices to take into account, depending on who we want to sell to, and how their country codes work
  - Likely some form of CHAR
- For Total, we don’t want to be accidentally creating or losing pennies with transactions, so maybe we want the accuracy of DECIMAL
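One consideration in the Zip Code decision, sketched in Python: some US zip codes begin with a zero, and a number cannot remember a leading zero. The zip code `02138` is used here only as an example value.

```python
zip_as_text = "02138"            # what a CHAR(5) column would hold
zip_as_number = int(zip_as_text)  # what an INTEGER column would hold

print(zip_as_number)            # 2138 (the leading zero is gone)
print(len(str(zip_as_number)))  # 4

# We would have to remember to pad it back every time we display it.
print(str(zip_as_number).zfill(5))  # 02138
```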
As we get more and more transactions, and the same people order multiple times, we’ll end up with many Robert and Zamyla entries
Now we could create a new table for customers, Customers
- Within this table, we could store a customer ID, their name, and their address
- Now, storing purchases in our first table is easier
  - We only need to store the ID of the customer and their corresponding purchase
We can go even further and create a table for storing information about each of our products
- This way each product has an ID, along with all of the information stored in it
Our original table for purchases is now what we call normalized, or made more optimal for being used as a database
In our Orders table, the customer and product IDs are foreign keys
- They uniquely identify customers and products in their respective tables
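The normalized design can be sketched with Python's built-in `sqlite3` module. Table and column names are assumptions, address columns are left out for brevity, and prices are stored as whole cents in a hypothetical `price_cents` column because SQLite does not offer an exact DECIMAL type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(128)
    );
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        name        VARCHAR(16),
        price_cents INTEGER
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),  -- foreign key
        product_id  INTEGER REFERENCES products(id),   -- foreign key
        quantity    INTEGER
    );
""")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Zamyla"), (2, "Robert Bowden")],
)
conn.execute("INSERT INTO products (id, name, price_cents) VALUES (1, 'widget', 999)")
conn.executemany(
    "INSERT INTO orders (customer_id, product_id, quantity) VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 2)],
)

# Each order stores only IDs; a JOIN reassembles the full picture.
rows = conn.execute("""
    SELECT customers.name, products.name, orders.quantity,
           orders.quantity * products.price_cents
    FROM orders
    JOIN customers ON customers.id = orders.customer_id
    JOIN products  ON products.id  = orders.product_id
    ORDER BY orders.id
""").fetchall()
print(rows)  # [('Zamyla', 'widget', 1, 999), ('Robert Bowden', 'widget', 2, 1998)]
```

Each customer's name now lives in exactly one row, no matter how many orders they place.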

NoSQL

Our previous work was pretty hefty - a lot of thinking and time
That work will pay off over time, but there is another paradigm for databases
This idea has objects and doesn’t use tables, columns, and rows
A popular NoSQL database is MongoDB - which stores things as documents made up of key-value pairs
Often, the data objects are stored such that within a purchase object there is a customer object, and inside that is probably information about that customer
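The nesting described above looks much like a Python dictionary or a JSON document. This sketch uses a plain dictionary rather than any real database API; the field names and the country value are hypothetical.

```python
import json

# One purchase, with the customer stored inside it instead of in a separate table.
purchase = {
    "product": {"name": "widget", "price": "9.99"},
    "quantity": 2,
    "total": "19.98",
    "customer": {
        "name": "Robert Bowden",
        "address": {"country": "US"},  # hypothetical value for illustration
    },
}

# Reaching into the nested objects takes no JOIN, just keys.
print(purchase["customer"]["name"])  # Robert Bowden

# The whole object can be saved or sent as one JSON document.
document = json.dumps(purchase, indent=2)
print(document)
```

The tradeoff is that the same customer's details are repeated in every purchase they make.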

Mobile

When designing for mobile devices, Android generally uses Java, and iPhones generally use Objective-C or Swift
There is plenty of room to make design decisions here though
- For example, you could design an application to use an embedded web browser and design the app with JavaScript and HTML and CSS
If we want to develop in an intermediate language, like JavaScript, and then convert to Java or Swift, there’s a cost
Maybe we want to only target one customer base, and so only develop in Swift or Java (Apple or Android)
But if we want to develop for both, there’s a cost
- We either need a developer who is comfortable in both languages, or two developers
- In both cases there’s an increased cost: financially, talent-wise, and so on

Summary

These technology stacks are really just lists of options
When navigating this world, it’s important to have discussions about the varying tradeoffs to using any one technology
From there, we can make informed decisions about how we want to move forward

Computer Science for Business Professionals
