Client Monitoring at Machinery Manufacturer Festo
User satisfaction depends on how the system performs on the client side. Automation technology specialist Festo has taken this into account.
The server team gives the green light. The network engineers report: no issues. But the user on the other side of the globe sees red. Festo AG & Co. KG, headquartered in Esslingen-Berkheim, has also experienced this from time to time. From there, Festo IT centrally provides its services to international users. And from there, a network is operated that also includes locations in China and Brazil, 10,000 kilometers away.
What the project aimed to achieve
Time differences and linguistic diversity don’t make IT support any easier. Cultural differences also come into play: A German user in the office next door will provide quick and direct feedback to support if something isn’t working. “The threshold for filing a complaint is significantly higher in other cultures,” notes Matthias Schmidt, Head of Information Management Workplace Services at Festo. And then there is the phenomenon described at the beginning: Users complain about poor performance, low bandwidth, or client malfunctions, even though all services in the data center and network area are provided in accordance with service level agreements. “Just because a service is fully available doesn’t mean it can be accessed optimally,” explains Helmut Claß, Head of User Services at Festo.
While the status of backend systems can be determined with virtually complete accuracy, conducting a comprehensive health check of the client environment proves to be difficult, if not impossible. Tools provide a wealth of information about the hardware and software installed on a workstation. However, when it comes to behavior, performance, and stability, the client remains a major unknown. The task, therefore, was to develop new methods and solutions to detect performance issues and behavioral anomalies in the client landscape at an early stage—ideally, before the user even notices the problems.
Which approaches don’t work
Of course, you can manually gather information from various log and configuration files or ask users for details. But that’s cumbersome and time-consuming—especially when the issue doesn’t occur consistently; in such cases, users might simply refer to “performance issues,” connection drops, or blue screens. “The information needed to understand the problem is often insufficient,” says Alexander Mack, a member of the project team at Festo: “This makes it nearly impossible to investigate the root cause.”
It is also possible to build a data warehouse that assists in a comprehensive analysis of information from various systems (network monitoring, virus scanners, software management, etc.). This, too, is complex and time-consuming. Programming robots to perform automated standard checks on the client was also out of the question. For such routine tests, there must at least be a suspected cause. “A change in perspective was necessary,” says Mack: Client and network availability should be viewed from the user’s perspective, and this required a solution that continuously analyzed activity data for irregularities, similar to the monitoring solutions used in the server environment.
What the solution entailed
The Festo IT team worked with the consulting firm Consulting4IT from Waldbronn to find such a solution. This brought the client monitoring solution from Swiss software provider Nexthink into play: Its collector, a passive driver requiring 500 KB of memory, analyzes IT services directly at the user’s end, identifies connections and destinations, and monitors all key events in real time—including resource usage, crashes, bandwidth, errors, and more. “We were skeptical,” admits Workplace Manager Schmidt: “Client monitoring in general was still in its infancy. Moreover, Nexthink was largely unknown in the German market. Without a contact person nearby, we would not have agreed to proceed further.”
How the project unfolded
In January 2013, Consulting4IT presented the solution. Festo decided to test its performance in specific areas—starting with China due to the urgency of the situation. The project kicked off in March. Within two days, a preconfigured appliance was implemented to continuously monitor 3,500 computers across Festo’s Asian subsidiaries. After the collector was deployed via software distribution, it collected data for four weeks. Thresholds were defined, and the system triggered an alarm whenever these thresholds were exceeded.
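The threshold-based alarm described above can be illustrated with a minimal Python sketch. It is not Nexthink's implementation or API: the metric names, threshold values, and the `Sample` and `find_alarms` names are assumptions made up for this example, since the article does not state which thresholds Festo defined.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measurement reported by a client (hypothetical structure)."""
    device: str
    metric: str
    value: float


# Hypothetical thresholds; the article does not give the values Festo used.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "app_crashes_per_day": 3.0,
}


def find_alarms(samples, thresholds=THRESHOLDS):
    """Return the samples whose value exceeds the threshold for their metric."""
    return [
        s for s in samples
        if s.metric in thresholds and s.value > thresholds[s.metric]
    ]


samples = [
    Sample("pc-001", "cpu_percent", 97.5),
    Sample("pc-002", "cpu_percent", 41.0),
    Sample("pc-003", "app_crashes_per_day", 5.0),
]

alarms = find_alarms(samples)
for alarm in alarms:
    print(f"ALARM {alarm.device}: {alarm.metric} = {alarm.value}")
```

In this sketch only `pc-001` and `pc-003` raise an alarm, because `pc-002` stays below its threshold. Metrics without a defined threshold are ignored rather than reported.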
The subsequent presentation of the test data revealed a few surprises: First, the number of active devices per Wi-Fi access point was higher than expected; in some cases, up to 20 workstations were connected to a single access point. Second, users frequently had specialized requirements that their devices were not originally designed to meet. Overall, the analysis clearly demonstrated that the on-site infrastructure was undersized in some areas. Based on these results, Festo decided to invest in Nexthink licenses to enable real-time ad hoc analyses at any time and comprehensive evaluations over longer periods.
When things got critical
Transparency on the client side is not universally welcomed. After all, it suggests that information about user behavior could also be collected and analyzed.
“We were aware of the implications,” Schmidt reports, “which is why the works council and the data protection officer were involved in all discussions from the outset.” To protect user privacy, Nexthink allows data to be anonymized.
The next steps
Following its rollout in Asia, Nexthink is now being deployed across approximately 11,500 clients in Europe and the Americas. The next step will focus on first- and second-level support: If a support agent knows when an error occurred and can view all concurrent events in its context, they will be able to quickly identify the cause. In the long term, client monitoring will also be used in security management. “Unlike with traditional security solutions, I don’t need to know the attacker’s footprint to identify them,” explains Schmidt. “Suspicious behavior—such as uploading large amounts of data in combination with running an unknown EXE file—is enough to trigger an alert.”
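The behavior-based rule Schmidt describes, a large upload combined with an unknown executable, might be sketched as follows. This is an illustrative assumption, not how Nexthink or Festo implement it: the upload limit, the list of known executables, and the `ClientActivity` and `is_suspicious` names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ClientActivity:
    """Activity observed on one client in a time window (hypothetical structure)."""
    device: str
    uploaded_mb: float
    executables: set = field(default_factory=set)


# Hypothetical values; the article names neither a limit nor an allowlist.
KNOWN_EXECUTABLES = {"outlook.exe", "excel.exe"}
UPLOAD_LIMIT_MB = 500.0


def is_suspicious(activity, known=KNOWN_EXECUTABLES, limit=UPLOAD_LIMIT_MB):
    """Flag a client only if BOTH conditions hold: a large upload
    and at least one executable that is not on the known list."""
    unknown = activity.executables - known
    return activity.uploaded_mb > limit and bool(unknown)


normal = ClientActivity("pc-010", 800.0, {"outlook.exe"})
odd = ClientActivity("pc-011", 800.0, {"outlook.exe", "tool123.exe"})

print(is_suspicious(normal))
print(is_suspicious(odd))
```

The rule needs no signature of a specific attack. A large upload alone is not flagged, and neither is an unknown executable alone; only the combination triggers the alert.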