Difference between revisions of "CNM Farms"

From CNM Wiki
(Mail server)
 
(91 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[CNM Servers]] (hereinafter, the ''Servers'') is the combination of [[computer server]]s that serve [[CNM Cloud]].
+
[[CNM Farms]] (formerly known as [[CNM Servers]], sometimes referred to as [[CNM Platform]]; hereinafter, the ''Farms'') is the combination of [[computing server]]s and [[container engine]]s that host [[Opplet]], as well as those parts of [[CNM Lab Farm]] that are provisioned to run [[serverless]] systems. The ''Farms'' provide both [[Worldopp Middleware]] and [[CNM app]]s with functionality.
  
  
==Web servers==
+
==Application servers==
 +
Currently, each of the four existing [[application server]]s that support [[Opplet]] is built on one [[DigitalOcean]] droplet. Each of these ''Platform'' servers features 2 GB of memory, a 50 GB disk, and Ubuntu 18.04.2 x64.
 +
 
 +
All of those ''Platform'' servers are used for compute. However, a need for one or more testing servers, tentatively called ''fellow server''s, has been identified. Control servers may also be considered.
 +
 
 +
===Campus Farm===
 +
:''Main wikipage: [[CNM Campus Farm]]''
 +
:The [[CNM Campus Farm]] supports...
 +
 
 +
===Fed Farm===
 +
:''Main wikipage: [[CNM Fed Farm]]''
 +
:The [[CNM Fed Farm]] supports one instance of [[WorldOpp Middleware]] ([[Opplet]]; shall be located at https://cabin.friendsofcnm.org).
  
==Application servers==
+
===Bureau Farm===
 +
:''Main wikipage: [[CNM Bureau Farm]]''
 +
 
 +
:The [[CNM Bureau Farm]] is a production server that shall support:
 +
:#One instance of [[Educaship Moodle]] ([[Moodle]]; currently, located at https://cert.friendsofcnm.org).
 +
:#One instance of [[CNM Mailware]] (currently, [[Roundcube]]; currently, not located at any [[URL]]).
 +
:#One instance of [[Educaship GitLab]] ([[Redmine]] and [[Apache Subversion|SVN]] linked to [[Bitbucket]]'s file storage; currently, located at https://lab.friendsofcnm.org).
 +
:#One instance of [[CNM Linkupware]] ([[SuiteCRM]]; currently, located at https://linkup.friendsofcnm.org).
 +
:#Two instances of [[Educaship MediaWiki]] setup for two languages ([[MediaWiki]]; currently, located at https://wiki.friendsofcnm.org).
 +
:#Several instances of [[Educaship WordPress]] ([[WordPress]]; currently, located at https://worldopp.org).
 +
:#One instance of [[Educaship AVideo]] ([[YouPHPTube]]; currently, located at https://tube.friendsofcnm.org).
 +
:#One instance of [[Educaship HumHub]] ([[HumHub]]; shall be located at https://social.friendsofcnm.org).
 +
 
 +
===Lab Farm===
 +
:''Main wikipage: [[CNM Lab Farm]]''
 +
 
 +
:The [[CNM Lab Farm]] is used for learning and testing. It shall support all the applications installed at the [[#EndUser Farm|EndUser Farm]] and, in addition, one [[Humhub]] instance.
  
 
==Support servers==
 
==Support servers==
Line 11: Line 38:
  
 
===File server===
 
===File server===
:No [[file server]] is currently deployed. [[CNM Labs]] utilizes [[Bitbucket]] to satisfy its file storage needs.
+
:No [[file server]] is currently deployed. [[CNM Lab]] utilizes [[Bitbucket]] to satisfy its file storage needs.
  
 
===Mail server===
 
===Mail server===
:Several [[CNM app]]s deploy [[sendmail]] as their [[mail server]]s.
+
:Several [[CNM app]]s currently deploy [[Sendmail]] as their [[mail server]]s.
 +
 
 +
===Web server===
 +
:All [[web server]]s of [[Opplet]] are currently built on [[Apache HTTP Server]]s.
 +
 
 +
==Development==
 +
The development of the ''Platform'' can be divided into two parts: the historical endeavors and further projects.
 +
 
 +
===Historical endeavors===
 +
:''Main wikipage: [[CNM Cloud Project]]''
 +
:Historically, the [[WorldOpp Fellow Staff]] has undertaken the development of the ''Platform'' under the [[CNM Cloud Project]] and, as of July 2019, Romanof has completed the overwhelming majority of that work.
 +
 
 +
===Further projects===
 +
 
 +
:The promising cloud service model of [[Opplet]] shall offer the services of its ecosystem of servers and of its cloud OS, which is OpenStack. This infrastructure enables the operation of [[CNM app]]s.
 +
 
 +
==See also==
 +
 
 +
===Related lectures===
 +
:*[[What CNM Farms Are]].
 +
 
 +
a draft for the promising [[CNM Farms]] that both:
 +
*Enable [[Opplet]] of the [[Opplet]]; consequently, [[Opplet]] enables the [[end-user application]]s of [[CNMCyber]]; and
 +
*Utilize a bundle of servers and [[OpenStack]] as the [[cloud operating system]].
 +
 
 +
 
 +
==Architecture==
 +
The ''Infrastructure'' supports cloud functions as follows:
 +
#Development and production. If there is more than one, these cloud servers shall be located in the same data center.
 +
#Backup
 +
#[[Testing]]
 +
#Demonstration.
 +
 
 +
The architectures of the separate servers have not been determined yet. If we cannot think of anything better, we can take https://ru.hetzner.com/hosting/produkte_vserver/private-cloud
 +
 
 +
==Requirements -- first draft==
 +
The ''Infrastructure'':
 +
#Utilizes [[OpenStack]] as the [[cloud operating system]];
 +
#Utilizes [[OpenStack Keystone]] based on [[LDAP]] as [[identity provider]];
 +
#Shall enable both [[Opplet]] and [[CNM app]]s;
 +
#Most likely, will use [[hetzner.de]] dedicated servers at the beginning.
 +
 
 +
===Backup and recovery===
 +
::The backup work model and the ability to restore an operational state, including backups for all of its own data and development, are core requirements. Topics to be addressed are:
 +
::*'''Restarts''':
 +
::*#What parts and why shall be restarted;
 +
::*#How -- by request and automatically -- restarts shall be initiated
 +
::*#How often?
 +
::*#What are the dependencies of a restart?
 +
::*#What are the prerequisites of a restart?
 +
::*#What are the potential challenges or limitations faced when initiating a restart?
 +
::*#What is the plan if a restart fails?
 +
::*#What is the duration of a restart?
 +
::*#What is the outage?
 +
::*'''Outside docker-registries with the appropriate repositories''':
 +
::*#Where is the docker-registry located?
 +
::*#Why was the docker registry created?
 +
::*#Explain what is included in the repositories.
 +
::*#Why are the repositories necessary?
 +
::*'''Push launches''':
 +
::*#A push procedure shall be planned with a corresponding increase in the number of subversions.
 +
::*#Why is a monthly frequency desirable?
 +
::*#Why is there a corresponding increase in the number of subversions per push?
 +
::*#What are the expected challenges faced (if any)?
 +
::*#What is done in the initial process to alleviate these challenges?
 +
::*#What are the unexpected challenges faced?
 +
::*#How are the unexpected challenges (if any) resolved?
 +
::*#What would you do differently if you know what the challenges are going to be?
 +
::*'''Tested recovery''':
 +
::*#Why is recovery testing necessary?
 +
::*#When is recovery tested?
 +
::*#How do you test recovery?
 +
::*#List the steps that shall be taken when testing recovery.
 +
::*#List the steps for initiating the backup.
 +
::*#List expected or unexpected results.
 +
::*#How will lessons learned be managed?
 +
::*'''Backup'''
 +
::*#What to back up, and how?
 +
::*#Where to store the backup data?
 +
::*#What frequency is desired? Why is the selected backup frequency optimal?
 +
::*#What are the steps for initiating a non-planned backup?
 +
 
 +
===Restarts===
 +
::#It shall be possible to restart every part of the cluster on request. The detailed instructions for these restarts shall be a part of the ''Infrastructure''.
 +
::#Automatic restarts shall be executed if the service gets stuck.
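The automatic-restart requirement above can be illustrated with a minimal watchdog sketch. It assumes that a health probe and a restart action are available as callables; <code>restart_if_stuck</code>, <code>is_healthy</code>, <code>restart</code>, <code>max_failures</code>, and <code>delay_seconds</code> are hypothetical names chosen for illustration and are not part of the ''Infrastructure'' or of OpenStack.

```python
import time
from typing import Callable


def restart_if_stuck(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    max_failures: int = 3,
    delay_seconds: float = 5.0,
) -> bool:
    """Probe a service and restart it only if every probe fails.

    Returns True when a restart was triggered, False otherwise.
    All names here are illustrative assumptions.
    """
    for attempt in range(max_failures):
        if is_healthy():
            return False  # the service answered, so it is not stuck
        if attempt < max_failures - 1:
            time.sleep(delay_seconds)  # wait before probing again
    restart()
    return True


if __name__ == "__main__":
    # Simulated service that never answers; no real service is touched.
    triggered = restart_if_stuck(
        is_healthy=lambda: False,
        restart=lambda: print("restart requested"),
        max_failures=2,
        delay_seconds=0,
    )
    print(triggered)
```

Requiring several consecutive failed probes before restarting is one possible way to avoid restarting a service that is merely slow; the actual threshold would have to be chosen per service.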
 +
 
 +
-> What are the dependencies of a restart?
 +
If we are using the provider network, then we can restart the controller and the compute node individually or both at once, but the virtual machines running on the host will be restarted. In the case of self-service networks, the communication between the virtual machines can be interrupted.
 +
 
 +
-> What are the prerequisites of a restart?
 +
 
 +
If you need frequent restarts, then you can set up OpenStack in high-availability mode so that the VMs won't go down. We can do that using a storage server such as Ceph.
 +
 
 +
-> What are the potential challenges or limitations faced when initiating a restart?
 +
  An improper restart can damage services such as the database or the network.
 +
 
 +
-> What is the plan if a restart fails?
 +
  If we have set up HA, then the loss will be negligible. Otherwise, we have to evacuate the virtual machines from the server.
 +
 
 +
-> What is the duration of a restart?
 +
Restart of server can take 5 minutes min. Restart of server can take 1 minute.
 +
 
 +
-> What is the outage?
 +
An outage can be a power, network, or resource outage. That can make the resources
 +
=============================
 +
 
 +
=> Outside docker-registries with the appropriate repositories:
 +
=============================
 +
-> Where is the docker-registry located?
 +
We can use a public or a private registry.
 +
 
 +
-> Why was the docker registry created?
 +
It is a repository that contains Docker images, so that we can pull them on request.
 +
 
 +
-> Explain what is included in the repositories.
 +
Docker repositories contain only Docker images. It is a web-based panel where we can request a Docker image.
 +
 
 +
-> Why are the repositories necessary?
 +
Public repositories serve to collect all the images so that they can be shared with everyone, and anyone can use them according to their requirements.
 +
Private repositories are for private use; they are only available within the organization.
 +
=============================
 +
 
 +
 
 +
=> Push launches:
 +
=============================
 +
A push procedure shall be planned with a corresponding increase in the number of subversions.
 +
-> What is a monthly frequency desirable?
 +
->  Why is there a corresponding increase in the number of subversions per push?
 +
->  What are the expected challenges faced (if any)?
 +
->  What is done in the initial process to alleviate these challenges?
 +
->  What are the unexpected challenges faced?
 +
-> How are the unexpected challenges (if any) resolved?
 +
-> What would you do differently if you know what the challenges are going to be?
 +
 
 +
----------------------------------
 +
I am sorry, I don't have much experience with git or SVN. Usually, though, changes need to be pushed after the code is final, even if it is just a subpart.
 +
----------------------------------
 +
 
 +
================================
 +
 
 +
 
 +
=> Tested recovery:
 +
===================================
 +
-> Why is recovery testing necessary?
 +
  Recovery testing is necessary to ensure that we are taking proper, complete, and working backups of the data or instance.
 +
 
 +
-> When is recovery tested?
 +
It can only be tested after the setup is complete and the data or application is running on the backup.
 +
 
 +
-> How do you test recovery?
 +
In the case of OpenStack, we can use a snapshot. In the case of a website, we can test it with the web data and the database of the website.
 +
 
 +
 
 +
-> List the steps to initiating the backup.
 +
There is a Cinder backup service, which keeps a copy of a Cinder drive. We can use image snapshots to create backups and integrate them with a shell script, which I have done recently.
 +
 
 +
-> List expected or unexpected results.
 +
 
 +
 
 +
-> How lessons learned will be managed?
 +
 
 +
=========================================
 +
 
 +
=> Backup
 +
===================================
 +
-> What and how to backup?
 +
There is a Cinder backup service, which keeps a copy of a Cinder drive. We can use image snapshots to create backups and integrate them with a shell script, which I have done recently.
 +
 
 +
-> Where to store the backup data?
 +
We can store images on a separate drive by mounting it on /var/lib/glance/images, so that we have separate storage.
 +
 
 +
-> What is a frequency desired? Why is the backup frequency selected as optimal?
 +
Creating a whole backup of a Cinder drive can increase the storage cost, whereas using image snapshots for backup might reduce the storage needed, but the drives attached separately will not be backed up. The frequency of backups generally depends on the environment. In most cases, the backup policies are set to 7 days, so that we can go back up to 7 days and revert the changes.
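The 7-day policy mentioned above can be sketched as a small pruning rule. This is only an illustration under stated assumptions: snapshots are identified by their creation date, anything older than the retention window may be deleted, and <code>snapshots_to_delete</code> and <code>retention_days</code> are hypothetical names, not an OpenStack API.

```python
from datetime import date, timedelta
from typing import Iterable, List


def snapshots_to_delete(
    snapshot_dates: Iterable[date], today: date, retention_days: int = 7
) -> List[date]:
    """Return the snapshot dates that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    # A snapshot taken exactly on the cutoff day is still kept.
    return sorted(d for d in snapshot_dates if d < cutoff)


if __name__ == "__main__":
    today = date(2024, 4, 14)  # arbitrary example date
    daily = [today - timedelta(days=i) for i in range(10)]
    for expired in snapshots_to_delete(daily, today):
        print(expired.isoformat())
```

With ten daily snapshots and a 7-day window, only the two oldest snapshots are reported as expired; a longer window trades storage cost for a longer rollback horizon.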
 +
 
 +
-> What are the steps to initiating a non-planned backup?
 +
We can make a manual backup of the instances using snapshots, or we can manually copy the files or take manual database dumps.
 +
 
 +
If you want to run Docker with orchestration tools or applications such as Kubernetes, I would strongly recommend against it, as it can make the OpenStack network slow. Keeping them separate would be a good choice.
 +
 
 +
I will give you detailed server specifications. While I prepare them, I just wanted to know whether you want separate storage servers for Ceph. That will give you the chance to increase your storage without any issue and will provide a high-availability feature, but it will require a faster switch.
 +
 
 +
We can use the OpenStack Magnum service to deploy containers.
 +
 
 +
===Server specs===
 +
If you are using Ceph storage, then the specs are as follows:
 +
==================
 +
For controller
 +
==================
 +
8 GB RAM, 4 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB
 +
 
 +
==================
 +
For compute
 +
==================
 +
Min. 32 GB RAM and 12 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB (for DEV)
 +
Min. 32 GB RAM and 12 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB, and 1 additional SSD or NVMe for the Ceph cache (for PROD)
 +
 
 +
You can increase or decrease the number of servers by replicating the same specs.
 +
 
 +
==================
 +
For storage server
 +
==================
 +
Min. 16 GB RAM and 12 CPUs, RAID 1 with 2 x minimum-[[capacity]] hard drives or SSDs (only for the OS), 1 NVMe or SSD for the Ceph journals, and 5 x 2 TB hard drives for data storage.
 +
For Ceph, we need at least 2 servers; for HA, we need 3 (recommended).
 +
 
 +
For a storage server, the RAM requirement is 1 GB of RAM per 1 TB of Ceph storage, plus 1 CPU per Ceph OSD.
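The rule of thumb above can be turned into a small sizing helper. This is a sketch under one labeled assumption: each data drive runs exactly one Ceph OSD. <code>size_storage_node</code> and <code>StorageNodeSizing</code> are hypothetical names used only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StorageNodeSizing:
    ram_gb: int
    cpu_cores: int


def size_storage_node(data_drives_tb: Sequence[float]) -> StorageNodeSizing:
    """Apply 1 GB RAM per 1 TB of storage and 1 CPU per OSD.

    Assumption: one OSD per data drive.
    """
    total_tb = sum(data_drives_tb)
    return StorageNodeSizing(
        ram_gb=math.ceil(total_tb),     # 1 GB of RAM per TB, rounded up
        cpu_cores=len(data_drives_tb),  # 1 CPU per OSD
    )


if __name__ == "__main__":
    # The 5 x 2 TB data drives from the storage server spec above.
    print(size_storage_node([2.0] * 5))
```

For five 2 TB data drives the rule yields 10 GB of RAM and 5 CPUs, which stays below the stated minimum of 16 GB RAM and 12 CPUs, so the quoted minimum spec leaves headroom under this assumption.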
 +
 
 +
================================================================================================================================================
 +
As for the network, we need at least 10 Gbps LAN cards: 2 cards for the storage server, and two 10 Gbps LAN cards plus one 1 Gbps card for the compute and controller servers.
 +
 
 +
==Executive roles==
 +
===System Administrator===
 +
::Duties
 +
::#Configuring servers, fault-tolerant solutions, infrastructure elements;
 +
::#Installing servers and services and upgrading existing ones;
 +
::#Monitoring system performance
 +
::#Creating file systems
 +
::#Create a backup and restore policy
 +
::#Organization of remote access;
 +
::#Monitoring of network communication
 +
::#Update the system as soon as a new version of the OS and application software is released
 +
::#Implementing policies to use a computer system and network
 +
::#Information Security;
 +
::#Administration of users (setting up and maintaining an account). Monitoring networks to ensure security and availability for specific users.
 +
::#User support;
 +
::#Troubleshoot problems reported by users.
 +
::#Configuring user security policies.
 +
::#Managing passwords and identity
 +
::#Documentation in the form of an internal wiki
 +
::#Routing protocols and configuration of the routing table.
 +
::#Configurations for authentication and authorization of directory services.
 +
::#Writing server software;
 +
::#The system administrator sets tasks for writing the necessary modules to programmers, and introduces rules for working with software for the whole company.
 +
 
 +
===Administrator of particular application===
 +
::#Develop rule policy inside application
 +
::#Maintain application, version control, notify System Administrator about stable updates, issues of application
 +
 
 +
===Admin Tasks===
 +
 
 +
Ongoing tasks for which a monthly fee will apply:
 +
- Monitor the servers for any problem and quickly respond to fix it.
 +
- Make sure that the periodic backups are done and in complete health.
 +
- Maintain the health of all the services that run on the servers (e.g., Apache, Postfix, MySQL).
 +
- Keep the servers fully updated with the latest patches and updates to prevent security problems and maintain performance
 +
- Keep track of performance and improve it with the latest tweaks.
 +
- If needed suggest hardware/resource upgrades.
 +
- Maintain the firewall rules
 +
 
 +
Additional tasks that will be billed by hour:
 +
- Development team requests
 +
- Installation of new software / scripts
 +
- Installation or configuration of new servers
 +
- Weekly meeting
 +
 
 +
===Tasks ideas===
 +
daily routine checkups on logs -- update the server and upgrade it manually -- make manual backup if necessary -- check firewall setting -- set a restriction for supposedly required ports only
 +
set restriction on ssh for allowed users only -- install anti malware script if necessary -- delete logs after reviewing mostly if it eats a lot of hard disk space
 +
 
 +
#If it is a newly created server,
 +
#*create a sudo user although I have access to the root password;
 +
#*install the dependencies and libraries that the server requires for its intended use;
 +
#*for example, whether it is a mail server or a web server; before any extension, dependency, or even the main script or program is installed, the server should be updated, as always.
 +
#If it is an existing server,
 +
#* check the backup of the server.
 +
#* after root access has been given to me, create a sudo user and log in using that username.
 +
#* check which processes are running on the server.
 +
#* check the installed packages on the server.
 +
#* Update the server and check which packages need to be upgraded.
 +
#* check and monitor the logs from ssh and http.
 +
#* check the disk allocation.
 +
#* Delete old auth and http logs.
 +
#* check if the ufw is installed and active.
 +
#* check if the ports are filtered.
 +
#* check the ssh config to see whether the allowed users are defined.
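The last check in the list above, whether allowed users are defined in the ssh config, can be sketched as a small parser. This is a simplified illustration: it reads the text of an <code>sshd_config</code> file, collects the names listed after the <code>AllowUsers</code> keyword, and ignores <code>Match</code> blocks and included files. <code>allowed_ssh_users</code> and the sample text are hypothetical.

```python
from typing import List


def allowed_ssh_users(sshd_config_text: str) -> List[str]:
    """Collect the user patterns listed on AllowUsers lines.

    Simplification: Match blocks and Include files are not handled.
    """
    users: List[str] = []
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        # sshd_config keywords are case-insensitive.
        if parts[0].lower() == "allowusers":
            users.extend(parts[1:])
    return users


# Hypothetical config fragment, used only for this example.
SAMPLE_CONFIG = """\
# example fragment
PermitRootLogin no
AllowUsers deploy admin
"""

if __name__ == "__main__":
    found = allowed_ssh_users(SAMPLE_CONFIG)
    print(found if found else "no AllowUsers restriction defined")
```

An empty result means that no <code>AllowUsers</code> restriction was found in the given text, which is the condition the checklist item asks the administrator to look for.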
 +
 
 +
Optimize the servers to run faster and more smoothly by tuning nginx, Apache, PHP, and MariaDB. Use a multiserver plugin for the video site to speed up the website's performance.
 +
 
 +
==Requirements -- second draft==
 +
 
 +
These requirements are a set of detailed requirements for the ''Infrastructure''. This document has been drafted to help cloud developers come up with a detailed implementation of an [[enterprise private cloud]] whose objective is to host the [[BigBlueButton]], [[Moodle]], [[MediaWiki]], [[Odoo]], and [[Redmine]] [[end-user software application]]s. Each of these applications shall serve more than 100 users with a peak load [[capacity]] of 20 simultaneous users. The requirements have been drafted keeping in mind that users of the above applications will require uninterrupted, very high availability.
 +
 
 +
The target audience for this document is [[cloud architect]]s, [[DevOps engineer]]s and [[system administrator]]s.
 +
 
 +
===Block storage===
 +
====Block Storage Objectives====
 +
#[[Capacity]] requirement of 2 TB
 +
#A peak speed of 110[[Mbps]] of data transfer
 +
#A peak speed of 15-20[[Mbps]] of data transfer rate with about 20 parallel transfers
 +
#3x Data redundancy
 +
#Super scalability
 +
#Uniform load distribution
 +
#Shall be consumed by [[VM]]s and [[Container]]s
 +
#Shall be consumed by applications
 +
#Shall be consumed by [[OpenStack]]
 +
#An acceptable latency
 +
 
 +
====Storage Requirements====
 +
#Distributed striped [[GlusterFS]] volume shall be implemented for data redundancy, load balancing and scalability
 +
#[[GlusterFS]] shall be implemented for serving block storage
 +
#System should be implemented with at least three stripes.
 +
#A minimum of three bare metal servers shall be used to implement [[GlusterFS]].
 +
#Each node shall have at least 3 [[SSD]] drives: two with a [[capacity]] of 1 TB each and one of 125 GB.
 +
#Hardware configuration for each node shall be, a minimum of 4GB [[RAM]], 2 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
 +
 
 +
===Object storage===
 +
====Object Storage Objectives====
 +
#[[Capacity]] requirement of 3 TB
 +
#A peak speed of 110[[Mbps]] of data transfer
 +
#A peak speed of 15-20[[Mbps]] of data transfer rate with about 20 parallel transfers
 +
#2x data redundancy
 +
#Super scalability
 +
#Shall host [[Storage image]]s, [[Data backup]]s and [[System snapshot]]s
 +
#Shall host [[Docker Registry]]
 +
 
 +
====Object Storage Requirements====
 +
#Distributed [[OpenStack]] swift object volume shall be implemented for data redundancy, load balancing and scalability
 +
#[[Swift]] object shall be implemented for serving [[Block storage]]
 +
#A minimum of three bare metal servers shall be used to implement [[Swift]].
 +
#Each node shall have at least 3 [[SATA]]/[[SCSI]] drives: two with a [[capacity]] of 1 TB each and one of 125 GB.
 +
#Hardware configuration for each node shall be, a minimum of 4GB [[RAM]], 2 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
 +
#The management network shall be separate from data for replications
 +
#The [[OpenStack]] controller cluster shall be used to authenticate and authorise the usage of object storage.
 +
#Possibility of keeping the data and management networks separate
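The capacity objectives for block and object storage can be cross-checked against the minimum hardware with simple arithmetic. The sketch below assumes that "3x" and "2x" redundancy mean every byte is stored three or two times, that only the two 1 TB drives per node hold data (the 125 GB drive being left for the system), and that exactly the minimum of three nodes is deployed. <code>usable_tb</code> is a hypothetical helper.

```python
def usable_tb(nodes: int, data_drives_per_node: int, drive_tb: float, replicas: int) -> float:
    """Usable capacity = raw capacity divided by the number of copies kept."""
    if replicas <= 0:
        raise ValueError("replicas must be positive")
    raw_tb = nodes * data_drives_per_node * drive_tb
    return raw_tb / replicas


if __name__ == "__main__":
    # Block storage: 3 nodes, 2 x 1 TB data drives each, 3x redundancy.
    print(usable_tb(nodes=3, data_drives_per_node=2, drive_tb=1.0, replicas=3))
    # Object storage: 3 nodes, 2 x 1 TB data drives each, 2x redundancy.
    print(usable_tb(nodes=3, data_drives_per_node=2, drive_tb=1.0, replicas=2))
```

Under these assumptions the minimum hardware yields 2 TB of usable block storage and 3 TB of usable object storage, which matches the stated capacity objectives with no spare room; any growth would require additional nodes or larger drives.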
 +
 
 +
===BigBlueButton===
 +
====BigBlueButton Specific Objectives====
 +
#This cluster should be built on bare-metal servers.
 +
#The streaming data that comes in and goes out should directly terminate on this application
 +
#A minimum bandwidth of 80[[Mbps]] internet data streaming [[capacity]] is required.
 +
 
 +
====BigBlueButton Requirements====
 +
#A two node cluster shall be implemented to host [[BigBlueButton]]
 +
#Hardware configuration for each node shall be, a minimum of 8GB [[RAM]], 4 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
 +
#BigBlueButton should be implemented specifically on the [[Ubuntu]] 16.04 64-bit [[OS]]
 +
#A minimum of 500 GB of [[GlusterFS]] storage shall be made available to the cluster
 +
#80[[Mbps]] symmetrical data transfer rate shall be made available to the cluster.
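As a rough sanity check of the 80 Mbps figure, the bandwidth can be divided by the peak load of 20 simultaneous users stated in the introduction to these requirements. The sketch assumes, purely for illustration, that all simultaneous users share the link evenly; <code>per_user_mbps</code> is a hypothetical helper.

```python
def per_user_mbps(total_mbps: float, simultaneous_users: int) -> float:
    """Evenly split the available bandwidth across simultaneous users."""
    if simultaneous_users <= 0:
        raise ValueError("simultaneous_users must be positive")
    return total_mbps / simultaneous_users


if __name__ == "__main__":
    # 80 Mbps shared by the peak of 20 simultaneous users.
    print(per_user_mbps(80, 20))
```

Under this even-split assumption each user gets 4 Mbps in each direction; whether that is sufficient depends on the audio and video settings actually used.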
 +
 
 +
===Container/Magnum Cluster Requirements===
 +
#A two node cluster shall be implemented to host [[Docker]]s
 +
#Hardware configuration for each node shall be, a minimum of 32GB [[RAM]], 4 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
 +
#A best possible [[OS]] shall be chosen to implement this cluster, to accommodate maximum number of applications.
 +
#A minimum of ten applications shall run simultaneously.
 +
#A minimum of 300 GB of [[GlusterFS]] storage shall be reserved for use by [[Container]]s
 +
#A minimum of 300 GB of [[Swift]] object storage shall be reserved for use by the Docker registry
 +
#All applications other than [[BigBlueButton]] shall be hosted using these [[Container]]s
 +
 
 +
===OpenStack Objectives===
 +
#Should be able to act as single point infrastructure management console.
 +
#A single authentication, authorisation agent for all the applications in the cloud.
 +
#Should be able to make all the clusters and nodes work in concert.
 +
 
 +
====Controller Cluster====
 +
#A two node cluster shall be implemented to host [[OpenStack]] controller.
 +
#Hardware configuration for each node shall be, a minimum of 8GB [[RAM]], 2 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
 +
#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs
 +
#This cluster shall host [[OpenStack]] dashboard, neutron and authentication agent and any other dependencies.
 +
 
 +
====Compute Cluster====
 +
#A two node cluster shall be implemented to host [[OpenStack Nova]].
 +
#Hardware configuration for each node shall be, a minimum of 32GB [[RAM]], 4 or more [[CPU]] Cores, 1+ [[GBE]] x 2 [[Network adapter]]s.
 +
#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs
 +
#This cluster shall host [[OpenStack Nova]] and Ironic ([[Bitfrost]] ??) services along with any other necessary dependencies.
 +
 
 +
====Cinder Cluster====
 +
#A two node cluster shall be implemented to host [[OpenStack Cinder]].
 +
#Hardware configuration for each node shall be, a minimum of 4GB [[RAM]], 2 or more [[CPU]] Cores, 1+ GBE x 2 [[Network adapter]]s.
 +
#A local storage of 256 GB [[SSD]] shall be used for [[OS]] and logs
 +
#This cluster shall host [[OpenStack Cinder]] and any other necessary dependencies.
 +
 
 +
====Senlin Cluster====
 +
#TBD
 +
 
 +
===Security Requirements===
 +
#TBD
 +
 
 +
===Internet and Floating IPs===
 +
#A pack of at least 10-12 real [[IPv4]] addresses will be required
 +
#Each user-facing application shall use 1-2 [[IP]] addresses as per application and design requirements.
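The address budget above can be checked with a short calculation. The sketch assumes that the user-facing applications are the five named in the introduction to these requirements and that each takes between one and two addresses; <code>floating_ip_range</code> and <code>APPLICATIONS</code> are hypothetical names.

```python
from typing import Tuple

# Applications named in the introduction to these requirements.
APPLICATIONS = ["BigBlueButton", "Moodle", "MediaWiki", "Odoo", "Redmine"]


def floating_ip_range(app_count: int, per_app_min: int = 1, per_app_max: int = 2) -> Tuple[int, int]:
    """Return the (minimum, maximum) number of addresses the applications need."""
    return app_count * per_app_min, app_count * per_app_max


if __name__ == "__main__":
    low, high = floating_ip_range(len(APPLICATIONS))
    print(low, high)
    print("a pack of 10 covers the maximum:", 10 >= high)
```

With five applications the need ranges from 5 to 10 addresses, so a pack of 10-12 covers the worst case under this assumption, leaving at most two addresses for anything else.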
 +
 
 +
===Networking Requirements===
 +
#[[GlusterFS]] and [[Swift]] cluster shall have two separate networks
 +
#[[VLAN]] tagging if necessary
 +
#Number of networking
 +
#TBD
 +
 
 +
===Monitoring and trouble Requirements===
 +
#What monitoring tools
 +
#Where to install
 +
#TBD
 +
 
 +
==Development==
 +
The [[RFB]] has been posted and the following responses are collected so far:
 +
#I will try to develop a [[proof of concept]] (PoC) based on the available and known requirements. This will help to understand the other requirements and then to develop a full scale private cloud.
 +
#We would document a clear set of functional and non functional requirements clearly capturing the essence of what is needed. After that, we will document the architecture and design, implement and test followed lastly by acceptance and handover to you, the customer. However, in order to develop an accurate schedule and budget, we would need to run a [[due diligence]] exercise. This due diligence would include detailed analysis and documentation of your functional and non-functional requirements for the private cloud project. We have material and a methodology we use. The due diligence would take approximately 3 days and we would then hand over to you the following deliverables:
 +
#*A detailed requirements specification
 +
#*A project plan
 +
#*A work package breakdown with effort and roles required
 +
#*A risk register and mitigation plan.
 +
#What are the performance and storage expectations for the applications that you want to run?
 +
#On a very high level, these are the steps I'd follow based on the current information.
 +
#*meeting with you to understand your specific needs in detail
 +
#*assuming this is a private cloud deployment, we need to understand the size of the applications you'll need deployed in that cloud
 +
#*with this you can determine the amount of physical servers needed depending on the number of instances, and virtual resources needed (calculation based on physical hardware resources)
 +
#*analyze scalability and performance needs
 +
#*decompose in stories/requirements that can be estimated and implemented by the team of your choice.
 +
#Putting requirements together for your private cloud would require following inputs:
 +
#*Are you a startup, or do you want to migrate to the private cloud?
 +
#*What is the objective that you would like to achieve?
 +
#*What is the size of the private cloud that you envisage?
  
==Further development==
+
==See also==
:''Main wikipage: [[CNM Servers (development)]]''
+
*[[Cloud lexicon]], a listing of links about computer terms
  
[[CNM Servers (development)]] is the cloud service model of [[CNM Cloud]] that offers services of its ecosystem of servers and cloud OS, which is OpenStack. This infrastructure enables operations of [[Opplet]] and [[CNM app]]s.
+
[[Category: CNM Cyber Orientation]][[Category: Articles]][[Category:CNM Cloud products]]

Latest revision as of 23:27, 14 April 2024

CNM Farms (formerly known as CNM Servers, sometimes referred to as CNM Platform; hereinafter, the Farms) is the combination of computing servers and container engines that host Opplet, as well as those parts of CNM Lab Farm that are provisioned to run serverless systems. The Farms provide both Worldopp Middleware and CNM apps with functionality.



Application servers

Currently, each of the four existing application servers that support Opplet is built on one DigitalOcean droplet. Each of these Platform servers features 2 GB of memory, a 50 GB disk, and Ubuntu 18.04.2 x64.

All of those Platform servers are used for compute. However, a need for one or more testing servers, tentatively called fellow servers, has been identified. Control servers may also be considered.

Campus Farm

Main wikipage: CNM Campus Farm
The CNM Campus Farm supports...

Fed Farm

Main wikipage: CNM Fed Farm
The CNM Fed Farm supports one instance of WorldOpp Middleware (Opplet; shall be located at https://cabin.friendsofcnm.org).

Bureau Farm

Main wikipage: CNM Bureau Farm
The CNM Bureau Farm is a production server that shall support:
  1. One instance of Educaship Moodle (Moodle; currently, located at https://cert.friendsofcnm.org).
  2. One instance of CNM Mailware (currently, Roundcube; currently, not located at any URL).
  3. One instance of Educaship GitLab (Redmine and SVN linked to Bitbucket's file storage; currently, located at https://lab.friendsofcnm.org).
  4. One instance of CNM Linkupware (SuiteCRM; currently, located at https://linkup.friendsofcnm.org).
  5. Two instances of Educaship MediaWiki setup for two languages (MediaWiki; currently, located at https://wiki.friendsofcnm.org).
  6. Several instances of Educaship WordPress (WordPress; currently, located at https://worldopp.org).
  7. One instance of Educaship AVideo (YouPHPTube; currently, located at https://tube.friendsofcnm.org).
  8. One instance of Educaship HumHub (HumHub; shall be located at https://social.friendsofcnm.org).

Lab Farm

Main wikipage: CNM Lab Farm
The CNM Lab Farm is used for learning and testing. It shall support all the applications installed at the EndUser Farm and, in addition, one Humhub instance.

Support servers

Database server

No database server is currently deployed.

File server

No file server is currently deployed. CNM Lab utilizes Bitbucket to satisfy its file storage needs.

Mail server

Several CNM apps currently deploy Sendmail as their mail servers.

Web server

All web servers of Opplet are currently built on Apache HTTP Servers.

Development

The development of the Platform can be divided into two parts: the historical endeavors and further projects.

Historical endeavors

Main wikipage: CNM Cloud Project
Historically, the WorldOpp Fellow Staff has undertaken the development of the Platform under the CNM Cloud Project and, as of July 2019, Romanof has completed the overwhelming majority of that work.

Further projects

The promising cloud service model of Opplet shall offer the services of its ecosystem of servers and of its cloud OS, which is OpenStack. This infrastructure enables the operation of CNM apps.

See also

Related lectures

a draft for the promising CNM Farms that both:


Architecture

The Infrastructure supports cloud functions as follows:

  1. Development and production. If there is more than one, these cloud servers shall be located in the same data center.
  2. Backup
  3. Testing
  4. Demonstration.

The architectures of the separate servers have not been determined yet. If we cannot think of anything better, we can take https://ru.hetzner.com/hosting/produkte_vserver/private-cloud

Requirements -- first draft

The Infrastructure:

  1. Utilizes OpenStack as the cloud operating system;
  2. Utilizes OpenStack Keystone based on LDAP as identity provider;
  3. Shall enable both Opplet and CNM apps;
  4. Most likely, will use hetzner.de dedicated servers at the beginning.

Backup and recovery

The backup work model and the ability to restore an operational state, including backups for all of its own data and development, are core requirements. Topics to be addressed are:
  • Restarts:
    1. What parts shall be restarted, and why?
    2. How shall restarts be initiated, both by request and automatically?
    3. How often?
    4. What are the dependencies of a restart?
    5. What are the prerequisites of a restart?
    6. What are the potential challenges or limitations faced when initiating a restart?
    7. What is the plan if a restart fails?
    8. What is the duration of a restart?
    9. What is the outage?
  • Outside docker-registries with the appropriate repositories:
    1. Where is the docker-registry located?
    2. Why was the docker registry created?
    3. Explain what is included in the repositories.
    4. Why are the repositories necessary?
  • Push launches:
    1. A push procedure shall be planned with a corresponding increase in the number of subversions.
    2. Is a monthly frequency desirable?
    3. Why is there a corresponding increase in the number of subversions per push?
    4. What are the expected challenges faced (if any)?
    5. What is done in the initial process to alleviate these challenges?
    6. What are the unexpected challenges faced?
    7. How are the unexpected challenges (if any) resolved?
    8. What would you do differently if you know what the challenges are going to be?
  • Tested recovery:
    1. Why is recovery testing necessary?
    2. When is recovery tested?
    3. How do you test recovery?
    4. List the steps that shall be taken when testing recovery.
    5. List the steps for initiating the backup.
    6. List expected or unexpected results.
    7. How will lessons learned be managed?
  • Backup
    1. What to back up, and how?
    2. Where to store the backup data?
    3. What frequency is desired? Why is the backup frequency selected as optimal?
    4. What are the steps for initiating a non-planned backup?

Restarts

  1. It shall be possible to restart every part of the cluster on request. The detailed instructions for these restarts shall be a part of the Infrastructure.
  2. Automatic restarts shall be executed if a service gets stuck.
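
The automatic-restart requirement above could be sketched as a small watchdog. This is only an illustration under stated assumptions: the services are assumed to be systemd units, `ensure_running` and its parameters are hypothetical names, and `systemctl is-active` is only a coarse health check (a service that is stuck may still be reported as active, so a real probe would be passed in as `check`).

```python
import subprocess
import time
from typing import Callable


def is_active(service: str) -> bool:
    """Return True if systemd reports the unit as active."""
    # `systemctl is-active --quiet` exits with status 0 only for an active unit.
    result = subprocess.run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0


def restart(service: str) -> None:
    """Ask systemd to restart the unit; raises CalledProcessError on failure."""
    subprocess.run(["systemctl", "restart", service], check=True)


def ensure_running(
    service: str,
    check: Callable[[str], bool] = is_active,
    do_restart: Callable[[str], None] = restart,
    max_attempts: int = 3,
    wait_seconds: float = 5.0,
) -> bool:
    """Restart an unhealthy service up to max_attempts times.

    Returns True if the service is healthy at the end, False otherwise.
    """
    for _ in range(max_attempts):
        if check(service):
            return True
        do_restart(service)
        time.sleep(wait_seconds)  # give the service time to come up
    return check(service)
```

A `False` result is the point at which the "plan if a restart fails" would have to take over, for example by alerting an administrator.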

-> What are the dependencies of a restart?

If we are using the provider network, then we can restart the controller and the compute node individually or both at once, but the virtual machines running on the host will be restarted. In the case of self-service networks, the communication between the virtual machines can be interrupted.

-> What are the prerequisites of a restart?

If you need frequent restarts, then you can set up OpenStack in high-availability mode so that the VMs won't go down. We can do that using a storage server like Ceph.

-> What are the potential challenges or limitations faced when initiating a restart?

An improper restart can damage services like the database or the network.

-> What is the plan if a restart fails?

If we have set up HA, then the loss will be negligible. Otherwise, we have to evacuate the virtual machines from the server.

-> What is the duration of a restart?

Restart of server can take 5 minutes min. Restart of server can take 1 minute.

-> What is the outage?

An outage can be a power, network, or resource outage. That can make the resources
=================

=> Outside docker-registries with the appropriate repositories:

=================

-> Where is the docker-registry located?

We can use a public or a private registry.

-> Why was the docker registry created?

It is a repository that contains Docker images, so that we can pull them on request.

-> Explain what is included in the repositories.

Docker repositories contain only Docker images. It is a web-based panel where we can request the Docker image.

-> Why are the repositories necessary?

Public repositories exist to collect images so that they can be shared with everyone and anyone can use them according to their requirements. Private repositories are for private use; they are available only within the organization.

=================

=> Push launches:

=================

A push procedure shall be planned with a corresponding increase in the number of subversions.

-> Is a monthly frequency desirable?

-> Why is there a corresponding increase in the number of subversions per push?

-> What are the expected challenges faced (if any)?

-> What is done in the initial process to alleviate these challenges?

-> What are the unexpected challenges faced?

-> How are the unexpected challenges (if any) resolved?

-> What would you do differently if you know what the challenges are going to be?


I am sorry, I don't have much experience with git or SVN. Usually, though, the changes need to be pushed once the code is final, even if it is just a subpart.


====================

=> Tested recovery:

=======================

-> Why is recovery testing necessary?

Recovery testing is necessary to ensure that we are taking proper, complete, and working backups of the data or instance.

-> When is recovery tested?

It can only be tested after the setup is complete and the data/application is running on the backup.

-> How do you test recovery?

In the case of OpenStack, we can use the snapshot. In the case of a website, we can test it with the web data and the database of the website.


-> List the steps for initiating the backup.

There is the Cinder backup service, which will keep a copy of a Cinder drive. We can also use image snapshots to create backups and integrate them with a shell script, which I have done recently.

-> List expected or unexpected results.


-> How will lessons learned be managed?

=============================

=> Backup

=======================

-> What to back up, and how?

There is the Cinder backup service, which will keep a copy of a Cinder drive. We can also use image snapshots to create backups and integrate them with a shell script, which I have done recently.
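
The scripted combination of instance snapshots and Cinder backups mentioned above could look roughly like the sketch below. It only builds the `openstack` client commands and does not run them; the naming scheme, the function names, and the server and volume names in the usage note are assumptions made for illustration, not the script referred to in the answer.

```python
from datetime import date
from typing import List


def snapshot_command(server: str, day: date) -> List[str]:
    """Build the CLI call that snapshots an instance into an image."""
    name = f"{server}-snap-{day.isoformat()}"  # hypothetical naming scheme
    return ["openstack", "server", "image", "create", "--name", name, server]


def volume_backup_command(volume: str, day: date) -> List[str]:
    """Build the CLI call that asks the Cinder backup service to copy a volume."""
    name = f"{volume}-backup-{day.isoformat()}"  # hypothetical naming scheme
    # --force allows backing up a volume that is attached to an instance.
    return ["openstack", "volume", "backup", "create",
            "--name", name, "--force", volume]


def backup_plan(servers: List[str], volumes: List[str], day: date) -> List[List[str]]:
    """Return every command of one backup run, snapshots first, then volumes."""
    plan = [snapshot_command(server, day) for server in servers]
    plan += [volume_backup_command(volume, day) for volume in volumes]
    return plan
```

Each command in the plan could then be executed with `subprocess.run(cmd, check=True)` from a scheduled job, for example `backup_plan(["web01"], ["web01-data"], date.today())`.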

-> Where to store the backup data?

We can store images on a separate drive by mounting it on /var/lib/glance/images, so that we have separate storage.

-> What frequency is desired? Why is the backup frequency selected as optimal?

Creating a whole backup of a Cinder drive can increase the storage cost, whereas using image snapshots for backup might reduce the storage needed, but the drives attached separately will not be backed up. The frequency of backups generally depends on the environment. In most cases, the backup policies are set to 7 days, so that we can go back and revert the changes made within the last 7 days.

-> What are the steps for initiating a non-planned backup?

We can do a manual backup of the instances using snapshots, or we can manually copy the files or take manual database dumps.
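
A 7-day backup policy like the one mentioned above implies a pruning step. The following minimal sketch shows one way to pick the backups that fall outside the retention window; the function name and the choice to identify backups by their date are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Iterable, List


def expired_backups(backup_days: Iterable[date], today: date,
                    keep_days: int = 7) -> List[date]:
    """Return the backup dates older than the retention window, oldest first."""
    cutoff = today - timedelta(days=keep_days)
    # A backup taken exactly keep_days ago is still kept.
    return sorted(day for day in backup_days if day < cutoff)
```

Keeping the backup taken exactly 7 days ago is a design choice: it guarantees that a state from a full week back can always be restored.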

If you want to run Docker together with orchestration tools/applications like Kubernetes, I would strongly recommend against it, as it can make the OpenStack network slow. Keeping them separate would be a good choice.

I will give you detailed server specifications. I will prepare them; I just wanted to know if you want separate storage servers for Ceph. It will give you the chance to increase your storage without any issue and will provide a high-availability feature, but it will require a faster switch.

We can use the OpenStack Magnum service to deploy containers.

Server specs

If you are using Ceph storage, then the specs are these:

======

For controller

======

8 GB RAM, 4 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB

======

For compute

======

Min 32 GB RAM and 12 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB (for DEV)

Min 32 GB RAM and 12 CPUs, RAID 1 with 2 x 500 GB SSD or 2 x 250 GB, and 1 additional SSD or NVMe for Ceph cache (for PROD)

You can increase or decrease the number of servers by replicating the same specs.

======

For storage server

======

Min 16 GB RAM and 12 CPUs, RAID 1 with 2 x minimum-capacity hard drives or SSDs (only for the OS), 1 NVMe or SSD for Ceph journals, and 5 x 2 TB hard drives for data storage. For Ceph we need at least 2 servers, and for HA we need 3 (recommended).

For a storage server, the RAM requirement is 1 GB of RAM per 1 TB of Ceph storage and 1 CPU per Ceph OSD.
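
The sizing rule above (1 GB of RAM per 1 TB of Ceph storage, 1 CPU per OSD) can be checked against the stated minimums with a short calculation. This is a sketch under assumptions: one OSD per data drive, the 16 GB / 12 CPU minimums treated as floors, and all names hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageNodeSizing:
    raw_tb: float
    min_ram_gb: int
    min_cpus: int


def size_storage_node(drive_tb: float, drive_count: int,
                      base_ram_gb: int = 16, base_cpus: int = 12) -> StorageNodeSizing:
    """Apply the rule of thumb: 1 GB RAM per TB of storage, 1 CPU per OSD."""
    raw_tb = drive_tb * drive_count
    ram_gb = max(base_ram_gb, math.ceil(raw_tb))  # never below the stated minimum
    cpus = max(base_cpus, drive_count)            # assumes one OSD per data drive
    return StorageNodeSizing(raw_tb, ram_gb, cpus)
```

For the configuration quoted above, `size_storage_node(2, 5)` gives 10 TB of raw storage, which the 16 GB RAM and 12 CPU minimums already cover; the rule only starts to raise the requirement for larger nodes.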

====================================================================================================================================

As for the network, we need at least 10 Gbps LAN cards: 2 cards for each storage server, and 2 10 Gbps LAN cards plus a 1 Gbps card for each compute and controller server.

Executive roles

System Administrator

Duties
  1. Configuring servers, fault-tolerant solutions, infrastructure elements;
  2. Install servers / services, upgrade existing ones;
  3. Monitoring system performance
  4. Creating file systems
  5. Create a backup and restore policy
  6. Organization of remote access;
  7. Monitoring of network communication
  8. Update the system as soon as a new version of the OS and application software is released
  9. Implementing policies to use a computer system and network
  10. Information Security;
  11. Administration of users (setting up and maintaining an account). Monitoring networks to ensure security and availability for specific users.
  12. User support;
  13. Troubleshoot problems reported by users.
  14. Configuring user security policies.
  15. Managing passwords and identity
  16. Documentation in the form of an internal wiki
  17. Routing protocols and configuration of the routing table.
  18. Configurations for authentication and authorization of directory services.
  19. Writing server software;
  20. The system administrator sets tasks for writing the necessary modules to programmers, and introduces rules for working with software for the whole company.

Administrator of particular application

  1. Develop rule policy inside application
  2. Maintain application, version control, notify System Administrator about stable updates, issues of application

Admin Tasks

Ongoing tasks for which a monthly fee will apply:
  • Monitor the servers for any problem and quickly respond to fix it.
  • Make sure that the periodic backups are done and in complete health.
  • Maintain the health of all the services that run on the servers (e.g. apache, postfix, mysql).
  • Keep the servers fully updated with the latest patches and updates to prevent security problems and maintain performance.
  • Keep track of performance and improve it with the latest tweaks.
  • If needed, suggest hardware/resource upgrades.
  • Maintain the firewall rules.

Additional tasks that will be billed by the hour:
  • Development team requests
  • Installation of new software / scripts
  • Installation or configuration of new servers
  • Weekly meeting

Tasks ideas

  • daily routine checkups on logs
  • update the server and upgrade it manually
  • make a manual backup if necessary
  • check the firewall settings
  • restrict open ports to the required ones only
  • restrict ssh to allowed users only
  • install an anti-malware script if necessary
  • delete logs after reviewing them, especially if they take up a lot of hard disk space

  1. If it is a newly created server:
    • create a sudo user, even though I have access to the root password;
    • install the dependencies and libraries that the server requires for its intended use, for example as a mail server or a web server;
    • update the server before installing any extension, dependency, or the main script or program.
  2. If it is an existing server:
    • check the backup of the server.
    • after root access has been given to me, create a sudo user and log in using that username.
    • check which processes are running on the server.
    • check the installed packages on the server.
    • update the server and check which packages need to be upgraded.
    • check and monitor the ssh and http logs.
    • check the disk allocation.
    • delete old auth and http logs.
    • check if ufw is installed and active.
    • check if the ports are filtered.
    • check the ssh config to see if the allowed users are defined.
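
Two of the checks in the list above, the allowed ssh users and the disk allocation, lend themselves to a small script. The sketch below is illustrative only: the function names are hypothetical, and the parser handles just the plain `AllowUsers name ...` form of `sshd_config`, not `Match` blocks or included files.

```python
import shutil
from typing import List


def allowed_ssh_users(sshd_config_text: str) -> List[str]:
    """Collect the user patterns listed in AllowUsers directives."""
    users: List[str] = []
    for line in sshd_config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split()
        if parts[0].lower() == "allowusers":  # keywords are case-insensitive
            users.extend(parts[1:])
    return users


def disk_used_percent(path: str = "/") -> float:
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

An empty result from `allowed_ssh_users` means no restriction is defined in the parsed text, which is the situation the checklist asks the administrator to look for.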

Optimize the servers to run faster and more smoothly by tuning nginx, Apache, PHP, and MariaDB. Use a multiserver plugin for the video site to speed up the website's performance.

Requirements -- second draft

These requirements are a set of detailed requirements for the Infrastructure. This document has been drafted to assist cloud developers in coming up with a detailed implementation of an enterprise private cloud with the objective of hosting the BigBlueButton, Moodle, MediaWiki, Odoo, and Redmine end-user software applications. Each of these applications shall serve more than 100 users with a peak load capacity of 20 simultaneous users. The requirements have been drafted keeping in mind that users of the above applications would require uninterrupted, very high availability.

The target audience for this document is cloud architects, DevOps engineers and system administrators.

Block storage

Block Storage Objectives

  1. A capacity requirement of 2 TB
  2. A peak speed of 110Mbps of data transfer
  3. A peak speed of 15-20Mbps of data transfer rate with about 20 parallel transfers
  4. 3x Data redundancy
  5. Super scalability
  6. Uniform load distribution
  7. Shall be consumed by VMs and Containers
  8. Shall be consumed by applications
  9. Shall be consumed by OpenStack
  10. An acceptable latency

Storage Requirements

  1. Distributed striped GlusterFS volume shall be implemented for data redundancy, load balancing and scalability
  2. GlusterFS shall be implemented for serving block storage
  3. The system should be implemented with at least three stripes.
  4. A minimum of three bare metal servers shall be used to implement GlusterFS.
  5. Each node shall have at least 3 SSD drives: two with a capacity of 1 TB each and one with a capacity of 125 GB.
  6. The hardware configuration for each node shall be a minimum of 4GB RAM, 2 or more CPU Cores, 1+ GBE x 2 Network adapters.

Object storage

Object Storage Objectives

  1. A capacity requirement of 3 TB
  2. A peak speed of 110Mbps of data transfer
  3. A peak speed of 15-20Mbps of data transfer rate with about 20 parallel transfers
  4. 2x data redundancy
  5. Super scalability
  6. Shall host Storage images, Data backups and System snapshots
  7. Shall host Docker Registry

Object Storage Requirements

  1. Distributed OpenStack swift object volume shall be implemented for data redundancy, load balancing and scalability
  2. Swift shall be implemented for serving object storage
  3. A minimum of three bare metal servers shall be used to implement Swift.
  4. Each node shall have at least 3 SATA/SCSI drives: two with a capacity of 1 TB each and one with a capacity of 125 GB.
  5. The hardware configuration for each node shall be a minimum of 4GB RAM, 2 or more CPU Cores, 1+ GBE x 2 Network adapters.
  6. The management network shall be separate from the data network used for replication
  7. The OpenStack controller cluster shall be used to authenticate and authorise the usage of object storage.
  8. Possibility of keeping the data and management networks separate
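
The capacity and redundancy objectives for block and object storage can be cross-checked against the proposed hardware with simple arithmetic. The sketch below assumes that "3x" and "2x" redundancy mean full copies of the data, that only the two 1 TB drives per node hold data (the 125 GB drive being reserved for the OS), and that filesystem overhead is ignored; the function names are hypothetical.

```python
def raw_capacity_tb(usable_tb: float, replicas: int) -> float:
    """Raw disk space needed to hold `usable_tb` with `replicas` full copies."""
    return usable_tb * replicas


def cluster_raw_tb(nodes: int, data_drives_per_node: int, drive_tb: float) -> float:
    """Raw data capacity offered by the whole cluster."""
    return nodes * data_drives_per_node * drive_tb


def meets_target(usable_tb: float, replicas: int, nodes: int,
                 data_drives_per_node: int, drive_tb: float) -> bool:
    """True if the cluster's raw capacity covers the replicated target."""
    available = cluster_raw_tb(nodes, data_drives_per_node, drive_tb)
    return available >= raw_capacity_tb(usable_tb, replicas)
```

Under these assumptions, both targets need 6 TB of raw space (2 TB x 3 for block storage, 3 TB x 2 for object storage), which three nodes with two 1 TB data drives each provide exactly, leaving no headroom for growth.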

BigBlueButton

BigBlueButton Specific Objectives

  1. This cluster should be built using bare metals.
  2. The streaming data that comes in and goes out should directly terminate on this application
  3. A minimum bandwidth of 80Mbps internet data streaming capacity is required.

BigBlueButton Requirements

  1. A two node cluster shall be implemented to host BigBlueButton
  2. Hardware configuration for each node shall be, a minimum of 8GB RAM, 4 or more CPU Cores, 1+ GBE x 2 Network adapters.
  3. BigBlueButton should be specifically implemented on the Ubuntu 16.04 64-bit OS
  4. A minimum of 500G of GlusterFS storage shall be made available to the cluster
  5. 80Mbps symmetrical data transfer rate shall be made available to the cluster.

Container/Magnum Cluster Requirements

  1. A two node cluster shall be implemented to host Docker containers
  2. Hardware configuration for each node shall be, a minimum of 32GB RAM, 4 or more CPU Cores, 1+ GBE x 2 Network adapters.
  3. The best possible OS shall be chosen to implement this cluster, to accommodate the maximum number of applications.
  4. A minimum of ten applications shall be able to run simultaneously.
  5. A minimum of 300G of GlusterFS storage shall be reserved for use by Containers
  6. A minimum of 300G of Swift object storage shall be reserved for use by docker registry
  7. All applications other than BigBlueButton shall be hosted using these Containers

OpenStack Objectives

  1. Should be able to act as single point infrastructure management console.
  2. A single authentication, authorisation agent for all the applications in the cloud.
  3. Should be able to make all the clusters and nodes work in concert.

Controller Cluster

  1. A two node cluster shall be implemented to host OpenStack controller.
  2. Hardware configuration for each node shall be, a minimum of 8GB RAM, 2 or more CPU Cores, 1+ GBE x 2 Network adapters.
  3. A local storage of 256 GB SSD shall be used for OS and logs
  4. This cluster shall host OpenStack dashboard, neutron and authentication agent and any other dependencies.

Compute Cluster

  1. A two node cluster shall be implemented to host OpenStack Nova.
  2. Hardware configuration for each node shall be, a minimum of 32GB RAM, 4 or more CPU Cores, 1+ GBE x 2 Network adapters.
  3. A local storage of 256 GB SSD shall be used for OS and logs
  4. This cluster shall host OpenStack Nova and Ironic (Bifrost ??) services along with any other necessary dependencies.

Cinder Cluster

  1. A two node cluster shall be implemented to host OpenStack Cinder.
  2. Hardware configuration for each node shall be, a minimum of 4GB RAM, 2 or more CPU Cores, 1+ GBE x 2 Network adapters.
  3. A local storage of 256 GB SSD shall be used for OS and logs
  4. This cluster shall host OpenStack Cinder and any other necessary dependencies.

Senlin Cluster

  1. TBD

Security Requirements

  1. TBD

Internet and Floating IPs

  1. A pack of at least 10-12 real IPv4 addresses would be required
  2. Each user-facing application shall use 1-2 IP addresses as per application and design requirements.
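
As a rough plausibility check, the address pack can be compared with the five applications named in this draft. The helper below is a hypothetical sketch; it counts only user-facing applications at 1-2 addresses each and ignores any addresses needed for infrastructure such as the OpenStack dashboard.

```python
from typing import Sequence, Tuple


def ip_budget(applications: Sequence[str],
              per_app_min: int = 1, per_app_max: int = 2) -> Tuple[int, int]:
    """Return the (lowest, highest) number of addresses the applications need."""
    count = len(applications)
    return count * per_app_min, count * per_app_max


# The end-user applications listed in this draft.
APPLICATIONS = ["BigBlueButton", "Moodle", "MediaWiki", "Odoo", "Redmine"]
```

With these five applications the budget is 5 to 10 addresses, so a pack of 10 covers the worst case and a pack of 12 leaves two spare addresses.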

Networking Requirements

  1. GlusterFS and Swift cluster shall have two separate networks
  2. VLAN tagging if necessary
  3. Number of networking
  4. TBD

Monitoring and trouble Requirements

  1. What monitoring tools
  2. Where to install
  3. TBD

Development

The RFB has been posted and the following responses are collected so far:

  1. I will try to develop a proof of concept (PoC) based on the available and known requirements. This will help to understand the other requirements and then to develop a full scale private cloud.
  2. We would document a clear set of functional and non-functional requirements, clearly capturing the essence of what is needed. After that, we will document the architecture and design, implement and test, followed lastly by acceptance and handover to you, the customer. However, in order to develop an accurate schedule and budget, we would need to run a due diligence exercise. This due diligence would include detailed analysis and documentation of your functional and non-functional requirements for the private cloud project. We have material and a methodology we use. The due diligence would take approximately 3 days and we would then hand over to you the following deliverables:
    • A detailed requirements specification
    • A project plan
    • A work package breakdown with effort and roles required
    • A risk register and mitigation plan.
  3. What are the performance and storage expectations of the applications that you want to run?
  4. On a very high level, these are the steps I'd follow based on the current information.
    • meeting with you to understand your specific needs in detail
    • assuming this is a private cloud deployment, we need to understand the size of the applications you'll need deployed in that cloud
    • with this you can determine the amount of physical servers needed depending on the number of instances, and virtual resources needed (calculation based on physical hardware resources)
    • analyze scalability and performance needs
    • decompose in stories/requirements that can be estimated and implemented by the team of your choice.
  5. Putting requirements together for your private cloud would require following inputs:
    • Are you a startup, or do you want to migrate to the private cloud?
    • What is the objective that you would like to achieve?
    • What is the size of the private cloud that you envisage?

See also