Initial 4-6 month contract.
Responsible for maturing the ASRS’ monitoring, alerting and reporting capabilities. Presently, the IT ops team relies on a mix of tools and applications’ built-in features to see when systems and services need attention and to receive notifications about them. The ideal candidate will be capable of inventorying all ASRS IT services, assessing and documenting the current monitoring/alerting/reporting condition of each (as-is state), and designing, developing, executing and documenting a plan to create a comprehensive, cohesive and consistent operational monitoring/alerting/reporting program. Tools are in place but require a focused and engaged professional to optimize them and make the suite useful and usable for the operations team. Additionally, the candidate will apply an understanding of, and experience with, service level agreements when improving the monitoring/alerting/reporting program.
ESSENTIAL FUNCTIONS
• Work with the Network/Systems/InfoSec/Applications teams and SolarWinds Support to troubleshoot and understand system faults and application performance issues.
• Inventory and assess critical applications and ensure monitoring and alerting are optimized and proactively notify/escalate to the appropriate personnel for follow-up actions.
• Inventory and assess existing system/application alerting and develop/apply standards so information is clear, consistent and provokes action based on priority.
• Develop, support and manage complex performance, health and inventory reports (proficiency with SQL queries a plus).
• Interface with appropriate personnel to ensure proper escalation during outages or periods of degraded system performance.
• Oversee daily SolarWinds execution and maintain an expert view of the current state of SolarWinds operations.
• Build and enhance the operations team’s service-oriented NOC view for quick and easy identification of looming and/or active events and upstream/downstream dependencies.
• Develop sound solutions to complex Tier 3-4 problems, produce system-level reports and report status to senior-level management.
• Apply automation techniques to lessen the burden of day-to-day changes requiring updates to system monitoring.
• Design and implement Groups and Dependencies to streamline monitoring and alerting.
• Carry out other SolarWinds-related improvements, upgrades and optimizations as identified or requested by management.
• Monitor, analyze and pinpoint subsystem performance issues on Windows and Linux operating systems (perfmon, PAL, etc.).
• Troubleshoot and pinpoint the root cause of application performance and availability issues.
BASIC QUALIFICATIONS
• Bachelor's degree in Engineering, Computer Science or a related field and 4+ years of relevant experience are required. Related experience, training and/or certifications may be accepted in lieu of a degree.
• Subject matter expert on SolarWinds NPM, SAM, WPM and IPAM, with experience growing, maintaining and improving the SolarWinds application infrastructure.
• Thorough understanding of monitoring protocols, including SNMP, WMI and ICMP.
• Ability to read and write SQL to facilitate advanced monitoring, alerting and reporting.
• Experience maintaining the integrity and security of servers and systems.
• Understanding of web application architecture and ability to troubleshoot issues up and down the stack (n-tier apps, e.g., LAMP, MEAN).
• Ability to monitor, analyze and pinpoint subsystem performance issues on Windows and Linux operating systems.
• Ability to script for automation and information gathering using PowerShell.
• Experience with ITIL/Service Management (especially incident, problem, configuration and change management).
• Understanding of container technologies (such as Docker).
• Excellent communication skills.
• Problem solving and critical thinking.
• Customer service focus.
• Excellent time management.