Monday, September 7, 2020

Cloud Computing

 



What is cloud computing?

  • It may sound complicated and foreign
  • But it’s really just many technologies coming together
  • It allows for the outsourcing or borrowing of computers
  • Computations and applications can be done on computers that are not physically ours or located near us

Origins

  • The Internet was often drawn as a “nebulous cloud”
  • Its primary function was getting information from one location to another
  • On the Internet, getting something up and running is relatively easy
    • There are many languages and databases to choose from
  • The problems arise when we’re successful
    • With millions of customers, several logistical issues come up
      • Our computers only have so much RAM, so much storage, and so many CPUs
      • With too many customers, or users, or requests, our technology can very quickly be overwhelmed

Cloud Solutions

  • Eventually we reach that upper limit, where too many users are trying to use our product
  • These users may experience error messages, or no working product at all
  • What if we added a second server, with the exact same technology?
  • Now, we probably don’t want to arbitrarily change our URL, à la www1.example.com or www13.example.com
  • What if we put a server in front of our two servers, and our URL brought users there?
  • Sometimes, this middleman gives users the IP address of our original server, and sometimes it gives those users the IP address of our second server
  • We might call this middleman, which balances the traffic or load on our servers, a load-balancer
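The notes don’t say how the middleman decides which server a user gets. One common and simple strategy is round robin: hand out the servers in turn. The sketch below assumes that strategy; the class name and the private IP addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """A toy load-balancer that hands out backend servers in turn."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one backend server")
        # cycle() repeats the list forever: a, b, a, b, ...
        self._rotation = cycle(servers)

    def next_server(self):
        """Return the address the next user should be sent to."""
        return next(self._rotation)


# Hypothetical private addresses for our original server and our second server
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
picks = [balancer.next_server() for _ in range(4)]
print(picks)  # ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']
```

Real load-balancers can also weigh servers by capacity or current load; round robin is just the easiest policy to picture.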



Caching

  • IP addresses don’t change that often, though they can change
  • If we remember, or cache an IP address for one of the servers behind our load-balancer, what happens if that server goes down or incurs downtime?
  • Maybe we don’t give our two servers unique IP addresses
  • Instead, our domain name leads users to the load-balancer
  • The load-balancer then routes each request to one of the servers behind it
  • If DNS returns the IP address of the load-balancer, then we don’t have this problem of having users unable to access a given server
  • To make sure that the load-balancer is keeping track of which servers are online, we can use something called a heartbeat
  • This just means having servers behind the load-balancer “check in” periodically with the load-balancer
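A heartbeat check can be sketched as a table of "last time each server checked in" plus a timeout. This is a minimal illustration, assuming a 10-second timeout; the class and method names are hypothetical, and the clock is injectable so the example runs instantly.

```python
import time


class HeartbeatMonitor:
    """Tracks which servers behind the load-balancer have checked in recently."""

    def __init__(self, timeout_seconds=10.0, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock          # injectable so the example is deterministic
        self.last_seen = {}         # server address -> time of its last heartbeat

    def heartbeat(self, server):
        """Record that `server` just checked in."""
        self.last_seen[server] = self.clock()

    def healthy_servers(self):
        """Servers that checked in within the timeout, i.e. presumed online."""
        now = self.clock()
        return sorted(s for s, seen in self.last_seen.items()
                      if now - seen <= self.timeout)


# Simulated clock so the example does not depend on real time passing
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=10.0, clock=lambda: fake_now[0])
monitor.heartbeat("10.0.0.1")
monitor.heartbeat("10.0.0.2")
fake_now[0] = 8.0
monitor.heartbeat("10.0.0.1")     # server 1 checks in again; server 2 stays silent
fake_now[0] = 15.0
print(monitor.healthy_servers())  # ['10.0.0.1']
```

The load-balancer would then only route traffic to the servers this list reports as healthy.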

SPOF - Single Point of Failure

  • Now, our network can tolerate server one or two going down, but not the load-balancer
  • We assume that our load-balancer can handle at least twice as much traffic as any individual server
  • However, it can still be overwhelmed, and what happens if it is?
  • Previously, we solved this by adding hardware; maybe we could add another load-balancer
  • Now, we have two load-balancers and two servers
  • We could configure the load-balancers to send heartbeats to each other
  • If one of the load-balancers is down, then the other one takes on the responsibility of traffic
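One way to pair two load-balancers is an active/standby arrangement: the standby listens for the primary’s heartbeats and takes over when they stop. The sketch below assumes that arrangement and a 3-second timeout; both the function name and the timeout are hypothetical.

```python
def active_balancer(primary_last_heartbeat, now, timeout_seconds=3.0):
    """Decide which of two load-balancers should accept traffic.

    The standby listens for the primary's heartbeats; if none has arrived
    within `timeout_seconds`, it assumes the primary is down and takes over.
    """
    if now - primary_last_heartbeat <= timeout_seconds:
        return "primary"
    return "standby"


print(active_balancer(primary_last_heartbeat=100.0, now=101.5))  # primary
print(active_balancer(primary_last_heartbeat=100.0, now=110.0))  # standby
```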

Costs

  • This hardware has both a financial and logistical cost
  • There are now wires that need to be connected, and housing needs to be bought for the servers
  • Maybe we don’t have enough physical space and cooling
  • MTBF - Mean Time Between Failures - can very quickly drop: the more hardware we have, the more often some piece of it fails
  • Maintaining this can quickly become quite difficult
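To see why MTBF drops, a standard back-of-the-envelope model assumes each of n components fails independently at a constant rate. The rate at which *something* fails is then the sum of the individual rates:

```latex
\lambda_{\text{system}} = \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} \frac{1}{\mathrm{MTBF}_i},
\qquad
\mathrm{MTBF}_{\text{system}} = \frac{1}{\lambda_{\text{system}}} = \frac{\mathrm{MTBF}}{n}
\quad \text{(for $n$ identical components)}
```

For example, with hypothetical numbers: ten servers that each average 100,000 hours between failures mean that, on average, one of them fails every 10,000 hours.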

IaaS - Infrastructure as a Service

  • This is one of the appeals of Cloud Computing - we can offload all of these logistical costs to the cloud
    • Amazon, Google, Microsoft, etc.
  • There is software that mimics the behavior of each of the things we’ve seen so far - load-balancers and servers - and more
  • Now, we’re able to buy the infrastructure we need without worrying about the make and model of servers, cooling, wiring, and other logistical costs
  • Another added benefit is auto-scaling
  • If our company is more successful than anticipated - maybe it’s the holiday season - we need more technology
  • IaaS can give us the ability to scale up our hardware - maybe by turning on more servers
  • This can happen automatically, without us having to intervene
  • The reverse is also true - the service automatically scales down when not being used
  • Over the years, we’ve been able to pack more computing power into the same space (à la Moore’s Law)
  • Humans, however, have been checking their email at around the same speed
  • This means that we can put multiple simulated machines on the same hardware
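The core of auto-scaling is a rule that turns current load into a server count. The sketch below assumes a simple capacity-based rule with a floor (for redundancy) and a ceiling (to cap cost); the function name, parameters, and numbers are hypothetical, not any provider’s actual API.

```python
import math


def desired_server_count(requests_per_second, capacity_per_server,
                         min_servers=2, max_servers=20):
    """Return how many servers to run for the current load.

    Scales up when traffic grows and back down when it shrinks, but never
    below min_servers (for redundancy) or above max_servers (to cap cost).
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(requests_per_second / capacity_per_server)
    return max(min_servers, min(max_servers, needed))


print(desired_server_count(450, 100))   # 5  (holiday rush)
print(desired_server_count(30, 100))    # 2  (quiet night; floor of 2)
print(desired_server_count(5000, 100))  # 20 (capped at max_servers)
```

A cloud provider would run a rule like this periodically and turn servers on or off to match the result.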

Virtual Machines

  • Virtual machines are software that allow us to simulate running several computers on the same hardware
  • This allows hosting providers to oversell: a server only has so much memory, but the provider assumes some customers will not have very successful businesses
  • This means those customers will not need as much memory or as many CPU cycles
  • For the customer though, this means that your website can be hosted on the same machine as another customer whose business is hugely successful
  • This translates to slower runtimes and services for reasons outside of your control
  • Virtualization looks very much like splitting your computer into several smaller, virtual, computers
  • Virtualization can be inefficient though - what if all customers are using the same guest OS or virtual operating system as the host OS, or operating system actually installed on the server?
  • This is what has given rise to containerization, like in Docker
  • Containerization allows us to use the host OS, sharing operating system resources, while still separating the other virtual computers
  • There is still some overhead - we need an engine such as Docker, much as virtual machines need a hypervisor - but there aren’t multiple copies of the same operating system
  • This provides the same separation, but with less waste of resources
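Overselling is simple arithmetic: the memory promised to all virtual machines can exceed what the host physically has, as long as actual usage stays lower. The sketch below uses a hypothetical 64 GB host with six customers promised 16 GB each; the function names are illustrative.

```python
def overcommit_ratio(allocated_gb, physical_gb):
    """Memory promised to all VMs divided by memory the host actually has."""
    return sum(allocated_gb) / physical_gb


def fits_in_memory(usage_gb, physical_gb):
    """True if the VMs' combined *actual* usage fits in physical memory."""
    return sum(usage_gb) <= physical_gb


# Hypothetical host: 64 GB of RAM, six customers each promised 16 GB
allocated = [16] * 6
print(overcommit_ratio(allocated, 64))          # 1.5
# Most customers are quiet, so real usage fits comfortably...
print(fits_in_memory([2, 3, 1, 4, 2, 5], 64))   # True
# ...but if every customer used their full allocation at once, it would not
print(fits_in_memory(allocated, 64))            # False
```

That last case is when one customer’s success starts slowing down everyone else on the same machine.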

PaaS, SaaS - Platform as a Service, Software as a Service

  • Even if you don’t know it, you’ve likely already used SaaS, as with Gmail or Outlook.com
  • We don’t need to worry about where our emails are, or how they get from place to place
  • With IaaS, you still need to understand network topology and how to put things together
  • Between these two is Platform as a Service (PaaS), like Heroku
    • Often, these services run on Amazon, Google, or Microsoft infrastructure, but provide a sort of middle ground
    • This middle layer of abstraction makes it easier for us to run applications
    • All we have to say is “host this as a server,” without concerning ourselves with how it communicates, scales, or is load-balanced
    • The downside, of course, is that we may not know how to solve a given problem if that problem occurs within the platform provider

Databases

  • Beyond the application, we likely also need to store data somewhere
  • It’s also very common for servers to be specialized - a server to send emails, a server to respond to HTTP requests, etc.
  • Our infrastructure likely has at least one database
  • Databases are generally drawn as cylinders, and are connected to all of our servers
  • All of our servers can then save and read all their data from the same place
  • One option is vertical scaling - paying more money for bigger and better hardware
  • Even the best database server can only handle so many connections, reads, and writes at any one time
  • Maybe we can go back to the engineering solution - scaling horizontally
  • If we add another database - we probably can’t just use a load balancer, because then some data will end up in one database and not in the other
  • We could also shard the databases, putting similar users in databases that only contain those users (for example, last names A-M in one database and N-Z in another)
  • Sharding doesn’t help us back up our data, though
  • We could go back to something like our first model, with one large master database
  • With replication, we can add smaller or equivalent databases that copy the main database’s data and can only be read from
  • Going a step further, we might have two master databases; even if data is written to only one of them, they communicate with each other to make sure the data gets replicated
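Both ideas come down to deciding which database a query goes to. The sketch below assumes a hypothetical alphabetical sharding rule (last names A-M vs. N-Z) and a single master with read-only replicas, where reads rotate across replicas and writes go to the master. All names are illustrative, and classifying a statement by whether it starts with SELECT is a deliberate simplification.

```python
import itertools


def shard_for(last_name):
    """Hypothetical sharding rule: last names A-M in one database, N-Z in another."""
    return "users-a-m" if last_name[:1].upper() <= "M" else "users-n-z"


class ReplicatedDatabase:
    """Sends writes to the master and spreads reads across read-only replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas, every query falls back to the master
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql):
        """Return the name of the database that should run this statement."""
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)   # reads: any replica will do
        return self.master                # writes: the master replicates them out


print(shard_for("Bowden"), shard_for("Smith"))   # users-a-m users-n-z

db = ReplicatedDatabase("master", ["replica-1", "replica-2"])
print(db.route("SELECT * FROM users"))           # replica-1
print(db.route("INSERT INTO users VALUES (1)"))  # master
print(db.route("select name FROM users"))        # replica-2
```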

Summary

  • As difficult or foreign as these things can sometimes sound, we’re really just adding layer upon layer, each building on increasingly complicated ideas
  • We started with binary and ASCII, and have gotten all the way to cloud computing, building layer by layer
  • At each step, the difference is really just abstracting away common problems to make our code as accessible and efficient as possible
  • With all of this in mind, the idea is that we can use each of these layers and puzzle pieces to build better solutions to the challenges that arise in building our own site

Technology Stacks





Front End

  • We’ve talked about HTML, CSS, and JavaScript
  • These allow us to build some webpages
  • However, what if there’s a common task people want to code?
  • We can distribute that code to others through libraries and frameworks
    • Some common frameworks for JavaScript are Angular, Ember, and React
  • The different choices between frameworks are often based on varying factors:
    • How familiar are we with a given framework?
    • Which of the framework’s stylistic conventions do we like?
    • How does the framework mesh with our project?
    • Is there a high learning cost that might slow down our work?
    • How well-designed is the framework?
      • Is it optimized for having more customers in the future?
  • The choices and tradeoffs are almost endless, and the landscape is constantly changing
  • In fact, we do not necessarily need to use the most popular framework in order to minimize regret

Back End

  • Here, too, there are many choices - languages like Java, Go, PHP, Python, Ruby, and Scala, and platforms and frameworks like .NET, Node.js, Django, Flask, Laravel, and Rails
  • Each language has several frameworks and libraries of its own
  • These exist to make common tasks or items accessible
    • For example, maybe there’s a common functionality (authenticating users) or an asset (a menu design) that people like
    • These libraries and frameworks can make this sort of thing easily accessible across coders and engineers
  • The choices here can be informed by what engineers know, what the technology being used supports, and so on
  • Some of these frameworks and languages were designed with specific problems in mind
  • Many of these options are evolving rapidly, so different versions of the same choice have varying tradeoffs and reasons for use
  • Some of the “right questions” that can come up when deciding a framework or language to use involve:
    • What are people able to learn - will new hires need to be experts on an obscure language to work here?
    • Do the current engineers have extensive background in the technology?
    • Does the technology solve a current problem and leave room to solve future ones as they arise?
    • And these are just a few of the reasons to consider when looking at a language or framework to use

Databases

  • These store data from users
  • Again, there are many choices - relational (SQL) databases like MariaDB, and NoSQL databases like MongoDB and HBase
  • And again, many of these choices are informed by:
    • Past experience
    • Financial cost
    • Infrastructure support


SQL and NoSQL

  • SQL stands for Structured Query Language

  • CRUD
    • Create
    • Read
    • Update
    • Delete
  • In SQL, there are four main keywords that line up well with this acronym
    • INSERT (CREATE is used to create the tables themselves)
    • SELECT
    • UPDATE
    • DELETE
  • In both cases, these are all operations done on and with data
  • Specifically, user data
  • What is a SQL (or more generally, relational) database?
    • Generally, a piece of software that allows you to lay out data, with relationships between the data
    • Think back to the last time you used a table of data, maybe in a context like Excel
      • Data is not thrown around at random
      • Instead, we have rows and columns, and even different sheets of data to indicate varying meanings (dates, names, emails, etc.)
  • One benefit of using a SQL database is that operations on data are designed to be fast
  • We don’t want to search a database by “brute force,” looking through the whole thing
  • We, being humans, can help the databases store data properly
    • For example, we often know what kind or type of data will be going into a given column
      • Integers, characters (char), etc
      • Some of the specific data types that are commonly used are:
        • CHAR, VARCHAR
        • SMALLINT, INTEGER, BIGINT
        • FLOAT, REAL, DOUBLE PRECISION, DECIMAL
        • DATE, TIME, TIMESTAMP
        • etc.
  • The more specific we can be when creating a database, the more helpful it can be in storing and retrieving data
  • It might take us some time to think through how we’re going to store things
  • But this up-front cost will continue to save us time over possibly millions of entries
  • If we over-allocate space for things, then we’re being wasteful, so it’s necessary to know or anticipate how big our data will be
  • A real problem in computer languages is how to represent numbers that aren’t integers, typically as floating-point numbers
  • We only have a finite amount of space, but there are infinitely many numbers
  • We can be more precise with different data types, but potentially at the cost of space
  • Within a relational database, we have other options too:
    • PRIMARY KEY, FOREIGN KEY, UNIQUE
    • If we know that we will have a unique identifier for each row, we can call that a PRIMARY KEY, and make it easier to find
    • If we don’t want any data to be duplicated, we can tell the database to keep a certain column UNIQUE - emails, phone numbers, usernames, etc.
  • No matter how we get the data, such as through an HTML form, it will likely end up stored in a database
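Here is a minimal sketch of a typed table with PRIMARY KEY and UNIQUE constraints, plus the four CRUD operations, using Python’s built-in `sqlite3` module. The `users` table and the email addresses are hypothetical. Note that SQLite accepts a declaration like VARCHAR(128) but does not enforce the length the way many other databases do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database that lives in RAM

# Declare a type for every column, plus PRIMARY KEY and UNIQUE constraints
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  VARCHAR(128) NOT NULL,
        email VARCHAR(255) NOT NULL UNIQUE
    )
""")

# Create: INSERT a row (the ? placeholders keep data separate from SQL)
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)",
             ("Zamyla", "zamyla@example.com"))

# Read: SELECT it back; SQLite filled in id = 1 automatically
row = conn.execute("SELECT id, name, email FROM users").fetchone()
print(row)  # (1, 'Zamyla', 'zamyla@example.com')

# Update: change one column of an existing row
conn.execute("UPDATE users SET email = ? WHERE id = ?",
             ("zamyla@example.org", 1))

# UNIQUE in action: a second user with the same email is rejected
try:
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)",
                 ("Someone Else", "zamyla@example.org"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True

# Delete: remove the row, leaving an empty table
conn.execute("DELETE FROM users WHERE id = ?", (1,))
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 0
```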


Storing Information

  • Imagine someone, Zamyla, bought a single widget for $9.99
  • And another customer, Robert Bowden, bought 2 widgets at the same address, for $19.98
  • Maybe we should have columns - for Product, Quantity, Name, Street, City, State, Zip Code, Country, and Total
  • What data type should each of these columns have?
    • Product - if we use CHAR(8), this column cannot hold any product name longer than 8 characters
      • If we use CHAR(16), then we’re wasting space on every name shorter than 16 characters
      • If we use VARCHAR(16), then it makes it more difficult to search through this column
      • When we’re talking about millions of entries, this will add up
    • Quantity - likely going to be INTEGER
    • Name - We aren’t sure how big someone’s name will be, but maybe VARCHAR(128)
    • And same for City and Street
    • For State, at least in the US, we know that we can optimize this via the two letter code, CHAR(2)
    • Zip Code may beg for some design decision making - do we want to use an INTEGER, a CHAR?
    • For country, we have some design choices to take into account, depending on who we want to sell to, and how their country codes work
      • Likely some form of CHAR
    • For Total, we don’t want to be accidentally creating or losing pennies with transactions, so maybe we want the accuracy of DECIMAL
  • As we get more and more transactions, and the same people order multiple times, we’ll end up with many Robert and Zamyla entries
  • Now we could create a new table for customers, Customers
    • Within this table, we could store a customer ID, their name, and their address
    • Now, storing purchases in our first table is easier
      • We only need to store the ID of the customer and their corresponding purchase
  • We can go even further and create a table for storing information about each of our products
    • This way each product has an ID, along with all of the information stored in it
  • Our original table for purchases is now what we call normalized: repeated information has been moved into its own tables
  • In our Orders table, the customer and product IDs are foreign keys
    • They refer to the IDs that uniquely identify customers and products in their respective tables
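The normalized design can be sketched with Python’s built-in `sqlite3` module: Customers, Products, and Orders tables, with foreign keys from Orders to the other two, and a JOIN that rebuilds the original one-row-per-purchase view. One deliberate swap: SQLite does not store DECIMAL values exactly, so this sketch keeps prices as integer cents and uses Python’s `decimal` module for display. Address columns are omitted for brevity, and the table and column names are hypothetical.

```python
import sqlite3
from decimal import Decimal

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(128) NOT NULL      -- address columns omitted for brevity
    );
    CREATE TABLE products (
        id          INTEGER PRIMARY KEY,
        name        VARCHAR(64) NOT NULL,
        price_cents INTEGER NOT NULL    -- exact whole cents instead of DECIMAL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
        product_id  INTEGER NOT NULL REFERENCES products(id),   -- foreign key
        quantity    INTEGER NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Zamyla'), (2, 'Robert Bowden');
    INSERT INTO products (id, name, price_cents) VALUES (1, 'widget', 999);
    INSERT INTO orders (customer_id, product_id, quantity) VALUES (1, 1, 1), (2, 1, 2);
""")

# A JOIN reassembles the original "one wide row per purchase" view
rows = conn.execute("""
    SELECT customers.name, products.name, orders.quantity,
           orders.quantity * products.price_cents AS total_cents
    FROM orders
    JOIN customers ON orders.customer_id = customers.id
    JOIN products  ON orders.product_id  = products.id
    ORDER BY orders.id
""").fetchall()

for name, product, quantity, total_cents in rows:
    # Decimal keeps the dollar amount exact: 1998 cents -> 19.98
    print(name, product, quantity, Decimal(total_cents) / 100)

# The foreign key refuses an order for a customer who does not exist
try:
    conn.execute("INSERT INTO orders (customer_id, product_id, quantity) VALUES (99, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Floats, by contrast, cannot represent most decimal fractions exactly
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

The loop prints one line per purchase, Zamyla’s widget at 9.99 and Robert Bowden’s two widgets at 19.98, without either customer’s details being stored more than once.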


NoSQL

  • Our previous work was pretty hefty - a lot of thinking and time
  • That work will pay off over time, but there is another paradigm for databases
  • This paradigm stores objects rather than using tables, columns, and rows
  • A popular NoSQL database is MongoDB, which stores data as documents made up of key-value pairs
  • Often, the data objects are stored such that within a purchase object there is a customer object, and inside that is probably information about that customer
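To picture the document model, here are plain Python dictionaries standing in for documents, with each purchase embedding its customer. This is only an illustration of the data shape using the standard `json` module, not MongoDB’s driver API; the field names are hypothetical.

```python
import json

# Hypothetical purchase documents: each embeds its customer directly,
# instead of pointing at a row in a separate Customers table.
# Totals are kept as strings here to avoid float rounding.
purchases = [
    {"product": "widget", "quantity": 1, "total": "9.99",
     "customer": {"name": "Zamyla"}},
    {"product": "widget", "quantity": 2, "total": "19.98",
     "customer": {"name": "Robert Bowden"}},
]

# Reading nested data is a plain lookup; no JOIN is needed
print(purchases[1]["customer"]["name"])  # Robert Bowden

# Each document serializes as a self-contained JSON object
print(json.dumps(purchases[0]))
```

The tradeoff is the reverse of normalization: reads are simple, but because each purchase carries its own copy of the customer, changing a customer’s details means updating every purchase that embeds them.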


Mobile

  • When designing for mobile devices, Android apps generally use Java, and iPhone apps generally use Objective-C or Swift
  • There is plenty of room to make design decisions here though
    • For example, you could design an application to use an embedded web browser and design the app with JavaScript and HTML and CSS
  • If we want to develop in an intermediate language like JavaScript and then convert to Java or Swift, there’s a cost
  • Maybe we want to only target one customer base, and so only develop in Swift or Java (Apple or Android)
  • But if we want to develop for both, there’s a cost
    • We either need a developer who is comfortable in both languages, or two developers
    • In both cases there’s an increased cost: financially, talent-wise, and so on

Summary

  • These technology stacks are really just lists of options
  • When navigating this world, it’s important to have discussions about the varying tradeoffs to using any one technology
  • From there, we can make informed decisions about how we want to move forward
