Authored by Mike Guerin, TeamCain
When it comes to maintenance planning, you need to account for every layer of your E1 technology across all of your infrastructure. Think about your network tier (routers, switches, etc.), infrastructure tier (server hardware, operating systems), your database, the runtime code on the server (BSFN, UBE) and the overall presentation/interface of your system.
Ensuring that you have identified all infrastructure and software components at each layer of technology is the key to building a proper maintenance and sustainment plan. For each infrastructure component, you’ll want to have a strategy for log maintenance, user access ID maintenance, security and access maintenance, patches and updates maintenance, volume testing and health checks.
Log Maintenance
Many of today’s software and hardware devices include settings for log cleanup and maintenance, but it’s still important to have a schedule for cleaning up logs, whether manual or automated. You can configure targeted logs that are written only when certain events occur (errors only, for example) in order to control access to the logs and their growth. When you build a strategy to maintain your logs, decide which logs to delete after a certain amount of time, which logs to cycle when they reach a certain size, and whether to keep “x” number of logs of “yy” MB in size to save space.
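As a rough illustration of an age-and-count retention policy, here is a minimal Python sketch. It is not an E1 utility: the function names, the `*.log` pattern and the 30-day and 10-file defaults are all assumptions made for the example, so adjust them to whatever your own policy says.

```python
import time
from pathlib import Path


def select_logs_to_delete(log_dir, max_age_days=30, max_files=10, now=None):
    """Return log files that fall outside the retention policy.

    A file is selected if it is older than max_age_days, or if it is not
    among the max_files most recently modified logs. The defaults are
    illustrative assumptions, not recommended values.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    # Newest first, so everything at index max_files or later is surplus.
    logs = sorted(Path(log_dir).glob("*.log"),
                  key=lambda p: p.stat().st_mtime, reverse=True)
    selected = []
    for index, path in enumerate(logs):
        too_old = path.stat().st_mtime < cutoff
        surplus = index >= max_files
        if too_old or surplus:
            selected.append(path)
    return selected


def purge_logs(log_dir, **policy):
    """Delete the files chosen by select_logs_to_delete; return their names."""
    removed = []
    for path in select_logs_to_delete(log_dir, **policy):
        path.unlink()
        removed.append(path.name)
    return removed
```

Running `select_logs_to_delete` on its own first gives you a dry run of what a scheduled cleanup would remove. For logs written by your own Python scripts, the standard library’s `logging.handlers.RotatingFileHandler` accepts `maxBytes` and `backupCount`, which covers the “x logs of yy MB” case directly.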
User Access ID Maintenance
The first step in developing a strategy for maintaining your users/IDs is to check which IDs have access to the equipment and the software. Instead of deleting IDs that are no longer active, just disable them. A disabled ID can still be matched against logs and history, and you can delete it later. This check covers only the user access IDs themselves, not the security rights assigned to each ID. ID checks should be done more frequently than security rights checks. To maintain user access, align this task with internal audits so it stays up to date and organized, and you always have a clear picture of who can access the software and hardware.
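The disable-first, delete-later approach can be sketched in a few lines of Python. This is only an illustration: the `Account` record, the field names and the 90-day and 365-day thresholds are hypothetical, and in practice the data would come from whatever user list your system exports.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Account:
    user_id: str
    last_login: date
    enabled: bool = True


def review_accounts(accounts, today, disable_after_days=90, delete_after_days=365):
    """Sort accounts into 'disable' and 'delete' candidate lists.

    Enabled IDs idle longer than disable_after_days are flagged for disabling.
    Already-disabled IDs idle longer than delete_after_days are flagged for
    deletion. Both thresholds are assumptions for the example.
    """
    to_disable, to_delete = [], []
    for acct in accounts:
        idle_days = (today - acct.last_login).days
        if acct.enabled and idle_days > disable_after_days:
            to_disable.append(acct.user_id)
        elif not acct.enabled and idle_days > delete_after_days:
            to_delete.append(acct.user_id)
    return to_disable, to_delete
```

The function only produces the two lists. Acting on them is left to a person, which fits well with doing this review alongside an internal audit.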
Security & Access Maintenance
Similar to user access IDs, you’ll want to verify that the access levels assigned to each ID are correct for the user’s job role. Access level control ranges from network equipment to more complex areas like E1 Object Security, so making sure the levels are correct can save you from confusion later. Verifying each ID’s total level of access on a regular basis is critical for compliance. The most common complication with E1 security is a user whose job function spans multiple roles, but the verification still has to be done to maintain compliance.
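One way to picture the “total level of access” check, including users with multiple roles, is as a set comparison. The sketch below is a generic Python illustration and does not read E1 security tables. The three input dictionaries and their names are hypothetical.

```python
def find_excess_access(user_roles, role_permissions, granted):
    """Return the permissions each user holds beyond what their roles allow.

    user_roles:       user ID -> list of role names (a user may have several)
    role_permissions: role name -> list of permissions that role should have
    granted:          user ID -> list of permissions actually assigned
    """
    excess = {}
    for user, perms in granted.items():
        allowed = set()
        # Combine the permissions of every role the user holds.
        for role in user_roles.get(user, []):
            allowed |= set(role_permissions.get(role, []))
        extra = set(perms) - allowed
        if extra:
            excess[user] = sorted(extra)
    return excess
```

A user with two roles is allowed the combined permissions of both, so only access outside every assigned role is reported.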
Patches & Updates Maintenance
Before thinking about your maintenance strategy for patches and updates, remember that every company is different, so design a patch and update strategy for E1 that fits yours. Wherever possible, apply updates to a non-production environment first, test with a predesigned test, then apply to production. Updates should be applied to all layers of technology on a schedule based on several factors. The first is the criticality of the fixes at that layer. Next, review the vendor’s schedule of patch releases, as well as the availability of outage windows for that layer of technology so the updates can be applied. Finally, you’ll need to stay in compliance with the patch levels being introduced higher in the layers of technology.
Volume Testing
Volume testing without a specific service level agreement as guidance for requirements is like building a rocket for space flight with no final destination identified. It just doesn’t make sense to not have a goal in mind. So you’ll want to ensure the following when developing your maintenance strategy:
- Is the server sizing and tuning for E1 appropriate for the anticipated volume of users and interfacing product transactions?
- How many more users and how much more volume can be added to the existing servers with their current tuning before performance degrades?
Plan to review your volume testing once a year. It usually involves a high water mark test and a stress test. With the high water mark, you steadily increase the load traffic until you reach an unacceptable level of performance. This provides data about how far you can go beyond the anticipated go-live volume levels before more sizing and tuning must be done. The stress test ensures that runtimes meet the required service level agreements in all areas. If the system is running slowly, you will already be getting lots of feedback, but the stress test is still valuable when you want to make tuning adjustments. Maintaining your volume testing is one of the most daunting and difficult steps in the maintenance process, but if you take it one step at a time you’ll see more benefits in the long run. Remember to always involve your users in testing, because with a mix of new and old users you’re more likely to see a variety of situations occur.
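The high water mark idea, stepping the load up until performance becomes unacceptable, can be sketched as a simple loop. This Python example is a toy under stated assumptions: `measure_response_ms` stands for whatever your load-testing tool reports, and `simulated_response_ms` is a made-up placeholder so the sketch can run on its own.

```python
def find_high_water_mark(measure_response_ms, sla_ms,
                         start_users=10, step=10, max_users=1000):
    """Raise the simulated load step by step until the SLA is breached.

    Returns the largest user count that still met the SLA, or None if even
    the starting load failed. measure_response_ms(users) must return the
    observed response time in milliseconds at that load.
    """
    last_good = None
    users = start_users
    while users <= max_users:
        if measure_response_ms(users) > sla_ms:
            break  # unacceptable performance reached
        last_good = users
        users += step
    return last_good


def simulated_response_ms(users):
    """Placeholder model: 500 ms base plus 10 ms per user. Not real data."""
    return 500 + 10 * users
```

Calling `find_high_water_mark(simulated_response_ms, sla_ms=2000)` walks the placeholder model upward until it exceeds the 2000 ms limit. With a real test harness, the gap between that result and your anticipated go-live volume is the headroom the high water mark is meant to reveal.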
Health Checks
Health checks are a series of regularly run verifications of your E1 system, interfacing products and underlying technologies to find errors. Errors are like icebergs. You only see 10% of them and you won’t see what’s under the water until it’s too late. Issues are usually unknown to users because they don’t produce an automatic message telling the user that something is wrong. Running health checks allows faster identification and resolution of issues. Checking all levels of technology on a regular schedule is one of the most commonly overlooked maintenance areas. When you do regular maintenance, you can quickly and easily check all the layers of technology and spot any difficulties that might become a concern in the future.
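A health check routine can be as simple as a named list of small tests that runs on a schedule and reports only what failed. The Python sketch below shows the pattern. The function names are hypothetical, and the disk space check is just one example of a check you might register.

```python
import shutil


def run_health_checks(checks):
    """Run each named check and collect the failures.

    `checks` maps a label to a zero-argument callable that returns True when
    healthy. A check that raises is recorded as a failure, so one broken
    check does not stop the rest from running.
    """
    failures = {}
    for name, check in checks.items():
        try:
            if not check():
                failures[name] = "check returned False"
        except Exception as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def disk_has_free_space(path, min_free_bytes):
    """Example check: True if the volume holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes
```

You might register one check per layer (network, server, database, runtime, interface) and send the returned failures to whoever owns that layer. An empty result means every check passed.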