Cloud Integrated Services (CIS) on OES
Best Practice Guide
CIS is a new service that became available with the release of OES 2018 SP1, I believe sometime in 2019. I started working with it shortly after its release because of a large-scale customer implementation. Since then I have worked with the product extensively as it has been developed and improved, up to the present (currently OES 2018 SP2, with the SP3 release right around the corner).
This handy guide helps you through my entire recommended process from start to finish.
Step 1: Don't Do It.
CIS is a difficult product that is not ready for production. I consider it an early beta product at best. Besides all the bugs and problems, I have found the following to be true:
- In its current state it is literally impossible to install the product successfully without getting the developers involved.
- Development resources are extremely limited and mostly outsourced to a team located in India. This creates language barriers, time zone challenges, and extremely poor quality phone lines when you're trying to have a call with them.
- The OES support team does not know the product or how to work with it. They rely almost completely on developer resources.
- The Documentation sucks and does not provide the level of detail necessary to properly implement the product.
- Micro Focus has not published any useful TIDs related to overcoming the many technical challenges you will face.
- Regardless of what problem you have, you will likely have to have a dial-in with developers that will involve 2-3 hours worth of time. They will likely delete and recreate all of the certificates used by the solution. When that doesn't work, they will then tell you they will have to get back to you with a solution. You will then wait 2-3 weeks.
- You'll find out that the entire solution includes a bunch of applications and components you've never seen or used before (Apache Kafka, Apache ZooKeeper, Elasticsearch, and a database) that aren't used by any other OES or Micro Focus service. You'll also find that there is no useful information out there about how to manage those components from an OES/CIS perspective.
- It's possible, and likely, that every problem you experience will result in a "defect" being created which will then result in a patch being developed specifically for your issue. Expect this process to take 3-4 weeks.
- In many cases, your project won't be able to progress until each defect is resolved. You won't know what defects lie ahead until you get the current defect resolved. This is because many of the steps require that you complete them in a certain order before moving on to the next step.
Step 2: If you're dumb and still want to install CIS
Maybe you still want to do a project. Whatever. Keep reading for some guidelines if you're hell-bent on torturing yourself.
Step 3: Don't install a Single Server "pilot project"
This is critical and could save you months of time.
It's common for a project of this magnitude to go through a test phase or pilot. While it seems logical to build a single server for this purpose, you will find that this is an absolute waste of time. You're better off just building a production environment and doing all of your testing with that. Here are my reasons why:
- The single server environment would appear on the surface to be simple, but it's not. You will experience a great number of challenges, and nobody needs this kind of stress. There is literally nothing learned in the Single Server environment that will help you in the production environment.
- If you set up a single server environment, and then you want to implement the project the recommended way (minimum of 6 servers), you will be beating your head against the wall for both the pilot project and the production environment.
- You won't be able to have a Single Server environment and a Production Environment in the same tree working side by side. It's plausible for you to think that you can keep your pilot project running while building a simultaneous production environment, but it won't work. This is due to the way NSS volumes are used by CIS, and also the way the CIS object configuration is stored in eDirectory.
- Somewhere along the way, I was told that a production environment should not be run on a single server. Therefore, if you want support (assuming you can get it), you need to use a multi-server environment.
Step 4: Design a Production Environment right from the start
When building your production environment, here are some things you'll want to do that will help. Note that this won't make it perfect, but will go a long way.
- Build each server from scratch and use the latest available OES 2018 install media.
- Register the server and apply all available updates before you insert the OES server into the tree or install the CIS components. This ensures that you are running the newest code before attempting any of the CIS configurations.
- Knowing that you need 6 or more servers, name your servers in a way that makes sense and according to their function.
- Ensure that the servers are registered with your DNS services and are able to resolve by name.
- Do not rename servers after they have had CIS installed on them. There are too many things that go wrong if you rename a server, especially one with CIS on it.
- Read the entire documentation forward and backward several times before installing. You will find that there are system requirements strewn throughout the documentation and it is not all located in the "System requirements" section. If you aren't aware of this, you could build a server with an unsupported file system and not realize it until it's too late.
- Document every detail of every component on every single server. IP Addresses, names, port numbers, services, paths, etc. Don't assume that you'll be able to figure it out later. You likely won't.
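The DNS bullet above is worth checking mechanically before you touch the CIS install. Here is a minimal sketch, assuming "getent" is on the box. The function names are mine, and whatever server names you feed it are yours; nothing here ships with CIS.

```shell
#!/bin/sh
# Forward-lookup check for a list of servers. The function names are
# illustrative, and the host names passed in are whatever you named your servers.

# Succeeds if the name resolves through the system resolver.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Print one OK/FAIL line per host; return non-zero if any host failed.
check_hosts() {
    rc=0
    for host in "$@"; do
        if resolves "$host"; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            rc=1
        fi
    done
    return $rc
}
```

Call it as "check_hosts" followed by the names of all six servers. Keep in mind that "getent hosts" also consults /etc/hosts, so a pass here doesn't prove the name is actually registered in DNS. Follow up with nslookup if that distinction matters to you.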
General Server Inventory / What You Need
During the process of installing a production environment you will need a minimum of the following OES servers:
- Three (3) Infrastructure Servers
- One (1) Data Scale Server
- One (1) Database Server
- One (1) Main CIS server (Where the management occurs)
There are suggestions in the documentation to use OES Clustering for even more confusion. Personally I have not been a fan of Novell Clustering for some time. I won't go into details here, just know that I don't even touch on OES Clustering in this document.
Helpful Troubleshooting Tips
Here are some tips that can help you when you're troubleshooting CIS.
- The command "cishealth -verbose" will confirm what you already believe to be true: that everything is broken.
- The command "docker node ls" will tell you what nodes are broken in your "Infrastructure Services" clusters.
- The command "docker ps" will tell you what docker processes are running. It likely won't mean anything to you, but it's helpful to know that on the Infrastructure Services server, you should have several processes running.
- The command "rcdocker restart" will restart the docker services on your Infrastructure Service systems.
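If you end up running the first three of those commands over and over (you will), it's less painful to wrap them. This is only a sketch: "run_check" and "cis_quick_check" are names I made up, and the commands inside are just the ones listed above. Anything that isn't installed on the server you're sitting on gets skipped instead of erroring out.

```shell
#!/bin/sh
# Run one diagnostic command if its binary exists on this server;
# otherwise print a note that it was skipped.
run_check() {
    label=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $label =="
        "$@"
    else
        echo "== $label == skipped: $1 not found"
    fi
}

# The read-only checks from the list above, in one go.
cis_quick_check() {
    run_check "CIS health" cishealth -verbose
    run_check "Swarm nodes" docker node ls
    run_check "Containers" docker ps
}
```

I left "rcdocker restart" out on purpose. It changes things, so it doesn't belong in something you run just to look around.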
WTF is up with the Certificates
Some of this is just a brain dump until I can organize and explain it.
A major problem that I have experienced numerous times is with certificates. The common theme seems to be that certificates as installed by OES are not valid (or not good enough) for CIS. But it could vary depending on how the server got installed. Here are some things I can tell you about certificates:
- When OES is installed, the certificates are exported to the file system and used by some of the standard OES services. However, CIS does not use them in those default locations.
- When CIS is installed, the certificates are copied to CIS specific path locations rather than just referencing the original certificates. This means that if you ever recreate or update the OES certificates, they have to be manually copied to the CIS location.
- /etc/opt/novell/cis/certs seems to be one of the path names used by CIS.
- However, it's possible, depending on the server type, that the path is at /media/nss/DATA/etc/opt/novell/cis/certs (Some NSS volume that you've created and configured for use by CIS, probably the main CIS service system).
- "nslookup xx.xx.xx.xx" should return the FQDN of your server.
- Apparently the certificate needs both the server name and the IP address in the Subject Alternative Name.
- To repair the certificates:
- /etc/ssl/servercerts/servercert.pem needs to have a Subject Alternative Name that matches the server name, and apparently it doesn't by default.
- "ndsconfig upgrade -j" recreates the cert and forces it out to the services that use it.
- "openssl x509 -in /etc/ssl/servercerts/servercert.pem -noout -text" should show the server name in the Subject Alternative Name in FQDN format.
- The server name probably should be in FQDN format in the Subject Alternative Name. This is set manually when customizing the cert.
- Don't forget to run "namconfig -k" after recreating the server certs.
- Copy the certificates from /etc/ssl/servercerts to /etc/opt/novell/cis/certs.
- The Kafka keystore needs to be updated on all Docker swarm nodes.
- On the main CIS server, after recreating the certs:
- systemctl restart oes-cis-configuration.service
- systemctl restart oes-cis-server.target
- Run the "kafka_keystore_update.sh" script on the Docker swarm servers (Infrastructure Services), then restart Docker with "rcdocker restart".
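Eyeballing the openssl text dump for the Subject Alternative Name gets old fast, so here is a rough sketch of how I'd script that one check. The function names are hypothetical. The only real command involved is the same "openssl x509 ... -noout -text" call from the list above, and the matching assumes openssl prints the entries the usual way ("DNS:name" and "IP Address:x.x.x.x", comma separated, on the line after the SAN header).

```shell
#!/bin/sh
# Reads "openssl x509 -noout -text" output on stdin and succeeds if the
# Subject Alternative Name line contains exactly the entry given as $1,
# e.g. "DNS:server.example.com" or "IP Address:10.0.0.5".
san_has_entry() {
    grep -A1 'Subject Alternative Name' |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//' |
        grep -Fxq -- "$1"
}

# Convenience wrapper: check a certificate file on disk.
cert_has_san() {
    openssl x509 -in "$1" -noout -text | san_has_entry "$2"
}
```

For example, "cert_has_san /etc/ssl/servercerts/servercert.pem "DNS:$(hostname -f)"" returns success only if the FQDN is listed. Run it against the copy under /etc/opt/novell/cis/certs too, since that copy is the one CIS actually reads.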
The database also had a problem and needed to be updated (on the main server) after the certs were updated.
- The developers supplied a resetdbcred.zip file containing a script for something related to the database.
- resetdbcred -zk_url oescisis-int1:2282 -db_pass DJDSFLSDKF (root password)
- In /etc/opt/novell/cis/.creds, .encCISCreds and .encCISKey were updated to today's date.
- systemctl restart oes-cis-configuration.service oes-cis-fluentbit.service
- systemctl restart oes-cis-server.target
NOW I'm able to access the main CIS Management page.
Next, the S3 target cert needs fixing. Make sure it is in PEM format; the .crt needs to be converted to .pem.
- systemctl restart oes-cis-data.service
Then connect to the S3 target.
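For the .crt to .pem conversion, here is a small sketch of what I mean. It assumes a .crt that isn't already PEM text is binary DER, which is the usual case but not guaranteed. "is_pem" and "to_pem" are names I made up, and the file names you pass in are your own.

```shell
#!/bin/sh
# Succeeds if the file already contains a PEM certificate header.
is_pem() {
    grep -q -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Write a PEM copy of $1 to $2. If the input is not PEM already, assume
# it is DER (binary) and let openssl convert it.
to_pem() {
    if is_pem "$1"; then
        cp "$1" "$2"
    else
        openssl x509 -inform DER -in "$1" -out "$2"
    fi
}
```

If openssl complains during the conversion, the file is probably neither PEM nor DER (a PKCS#7 bundle, for instance) and needs different handling.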
Restart the agents on each OES host:
- systemctl restart oes-cis-agent.service oes-cis-recall-agent.service oes-cis-scanner.service
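With more than a couple of OES hosts, logging in to each one to restart the agents gets tedious. Here is a hedged sketch of a loop, assuming you can ssh to each host as root (that's my assumption, adjust it to however you actually get in). The unit names are the ones from the command above; "restart_agents" and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Unit names are taken from the restart command above.
AGENT_UNITS="oes-cis-agent.service oes-cis-recall-agent.service oes-cis-scanner.service"

# Restart the CIS agents on each host over ssh. With DRY_RUN=1 the ssh
# commands are only printed, which is handy for checking the host list first.
restart_agents() {
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh root@$host systemctl restart $AGENT_UNITS"
        else
            ssh "root@$host" "systemctl restart $AGENT_UNITS" ||
                echo "FAILED on $host" >&2
        fi
    done
}
```

Run it once as "DRY_RUN=1 restart_agents host1 host2" to see what it would do, then again without DRY_RUN to do it for real.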
nds.conf is where the agent name comes from.
For pool resources, the name comes from VFS calls.
Select Key type as SSL or TLS and Extended key usage as Server authentication and User authentication, then click Next.