Asm Health Checker Found 1 New Failures: Updated

The message "ASM Health Checker found 1 new failures updated" is not a death knell for your database. Instead, it is an early warning system that Oracle ASM has detected a single, specific anomaly in your storage infrastructure. By methodically examining the ASM alert log, querying the dynamic performance views, and investigating the OS/storage layer, you can quickly identify whether the issue is a transient path failure, an offline disk, or a more serious metadata corruption.

Remember: One new failure means you have time to react, but you should react immediately. Ignoring the alert can lead to a cascade of failures, especially in lower redundancy configurations. With the diagnostic steps and remediation strategies provided in this guide, you can confidently resolve the alert and restore your ASM environment to full health.

Stay proactive, monitor your disk groups regularly, and your ASM health checker will reward you with silence—the best alert of all.



The message "ASM Health Checker found 1 new failures updated" is a critical alert typically found in the Oracle Automatic Storage Management (ASM) alert logs. It indicates that the Oracle Fault Diagnosability Infrastructure has detected an issue, such as metadata corruption or disk accessibility problems, and has created an "incident" for further investigation.

What This Failure Means

When this message appears, it usually follows a specific event like adding a disk, a rebalance operation, or a diskgroup dismount. The "failure" refers to an entry in the Automatic Diagnostic Repository (ADR), which tracks critical errors that could impact data availability.

Incident Detection: The health checker identifies a specific occurrence (incident) of a broader problem, such as a lost disk or corrupted block.

Automatic Analysis: Upon detection, the infrastructure often runs deeper health checks to look for data block, undo, or redo corruption.

Resource State: You may notice your diskgroup resources in an INTERMEDIATE or OFFLINE state when this occurs.

Common Causes

ASM Health Checker Found 1 New Failures Updated: What It Means and How to Resolve It

Automatic Storage Management (ASM) is a vital component of Oracle databases, responsible for managing storage resources and providing a layer of abstraction between the database and the underlying storage devices. The ASM health checker is a built-in tool that monitors the health and performance of ASM instances, alerting administrators to potential issues before they become critical problems.

If you've received a notification that the "ASM health checker found 1 new failures updated," it's essential to understand what this message means and take prompt action to resolve the issue. In this article, we'll delve into the details of ASM health checking, explore the possible causes of this error, and provide step-by-step guidance on how to troubleshoot and fix the problem.

Understanding ASM Health Checking

The ASM health checker is a continuous monitoring process that checks the health and performance of ASM instances. It collects data on various aspects of ASM operations, including:

The health checker uses this data to identify potential issues, such as disk failures, performance bottlenecks, or configuration problems. When an issue is detected, the health checker updates the ASM alert log with a failure message, indicating the type and severity of the problem.

What Does "ASM Health Checker Found 1 New Failures Updated" Mean?

When you receive a notification that the "ASM health checker found 1 new failures updated," it means that the ASM health checker has detected a new issue with the ASM instance or one of its associated disks. The failure message is updated in the ASM alert log, indicating that a new problem has been identified.

The failure message may indicate a variety of issues, including:

Causes of ASM Health Checker Failures

There are several possible causes for ASM health checker failures, including:

How to Troubleshoot and Resolve ASM Health Checker Failures

To troubleshoot and resolve ASM health checker failures, follow these steps:

Step-by-Step Troubleshooting Guide

Here's a more detailed, step-by-step guide to troubleshooting ASM health checker failures:

Step 1: Check the ASM Alert Log

Step 2: Verify ASM Disk Status

Step 3: Investigate Disk Performance

  • Identify potential bottlenecks or issues with disk performance.

Step 4: Review ASM Configuration

    Step 5: Check Database and Storage Connections

    Resolving ASM Health Checker Failures

    Once you've identified the root cause of the ASM health checker failure, take corrective action to resolve the issue. This may involve:

    By following these steps, you can troubleshoot and resolve ASM health checker failures, ensuring the stability and performance of your Oracle database and ASM environment.

    Conclusion

    The "ASM health checker found 1 new failures updated" message indicates a potential issue with the ASM instance or one of its associated disks. By understanding the causes of ASM health checker failures and following a step-by-step troubleshooting guide, you can identify and resolve issues before they become critical problems. Regular monitoring and maintenance of ASM instances and disks can help prevent health checker failures and ensure optimal performance and stability of your Oracle database and storage environment.

    Troubleshooting "ASM Health Checker Found 1 New Failures Updated"

    If you are an Oracle Database Administrator, seeing the alert "ASM Health Checker found 1 new failures updated" in your logs or monitoring dashboard (like Enterprise Manager) can be a bit jarring. This message is the Oracle Automatic Storage Management (ASM) framework’s way of telling you that its internal diagnostic engine has detected an issue that could compromise the health of your storage layer.

    Here is a deep dive into what this error means, why it happens, and how to resolve it.

    What is the ASM Health Checker?

    The ASM Health Checker is a proactive diagnostic utility that runs within the Oracle Grid Infrastructure. It constantly monitors the state of ASM disk groups, metadata consistency, and background processes.

    When it detects a discrepancy, such as a corrupted metadata block, a disk timeout, or an offline disk, it logs a "failure." The "Updated" status usually means the health check engine has refreshed its findings and confirmed that the issue is persistent and requires administrator intervention.

    Common Causes for This Alert

    While the message itself is a general notification, the "1 new failure" usually stems from one of the following:

    Disk Connectivity Issues: A physical disk or LUN has become unreachable or is experiencing intermittent latency.

    Metadata Corruption: Inconsistency in the ASM Allocation Units (AU) or disk headers.

    Disk Group Imbalance: A rebalance operation failed or was interrupted, leaving the disk group in a "degraded" state.

    Offline Disks: A disk was dropped or taken offline due to I/O errors, but the redundancy (if using Normal or High redundancy) kept the database running.

    Step-by-Step Resolution Guide

    1. Identify the Specific Failure

    The alert message is just the "headline." You need to find the specific error code (like ORA-15032 or ORA-15078).

    Check the Alert Log: Navigate to your ASM diagnostic trace folder and check the alert_+ASM.log.

    Use ADRCI: Run the command adrci and use show alert to see the most recent incidents and their specific impact.

    2. Query the ASM Views

    Log into your ASM instance via SQL*Plus (sqlplus / as sysasm) and run the following to see the status of your disks:

    SELECT group_number, name, state, type FROM v$asm_diskgroup;
    SELECT path, header_status, mode_status, state FROM v$asm_disk;

    Look for any disks where the header_status is CANDIDATE (instead of MEMBER) or mode_status is OFFLINE.

    3. Check for Ongoing Rebalances

    Sometimes the health checker flags a failure if a rebalance is stuck.

    SELECT * FROM v$asm_operation;

    If an operation is hanging, you may need to investigate the underlying I/O subsystem.

    4. Run a Manual Check (The "Check" Command)

    You can force ASM to verify the consistency of a disk group to see if it clears the error or provides more detail. The statement needs a disk group name; DATA is used here only as a placeholder, so substitute your own:

    ALTER DISKGROUP DATA CHECK ALL;

    Proactive Tips to Prevent Future Failures

    Monitor I/O Latency: Often, the health checker finds a "failure" simply because a storage array is too slow. Monitor your OS-level tools like iostat or sar.

    Update Grid Infrastructure: Ensure you are on the latest RU (Release Update), as Oracle frequently releases patches for ASM Health Checker "false positives."

    Verify Redundancy: Always ensure your critical disk groups are at least on "Normal" redundancy to allow the health checker to find and fix issues without taking the database offline.

    The "ASM Health Checker found 1 new failures updated" alert is a call to action. It usually indicates a physical storage hiccup or a metadata inconsistency. By checking the ASM alert logs and querying v$asm_disk, you can usually pinpoint the culprit disk and bring it back online or replace it before a total outage occurs.
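The alert-log review from step 1 can be roughed out in a few lines of Python. This is a minimal sketch, not an Oracle tool: the function name, the 50-line lookback window, and the sample lines are assumptions made for illustration, and real alert-log entries may be formatted differently.

```python
import re

# Matches Oracle error codes such as ORA-15032.
ORA_CODE = re.compile(r"\bORA-\d{5}\b")
MARKER = "asm health checker found"


def errors_before_health_check(lines, lookback=50):
    """For each health checker message, collect the distinct ORA- codes
    seen in the `lookback` lines that precede it (in order of appearance)."""
    findings = []
    for i, line in enumerate(lines):
        if MARKER in line.lower():
            codes = []
            for prev in lines[max(0, i - lookback):i]:
                for code in ORA_CODE.findall(prev):
                    if code not in codes:
                        codes.append(code)
            findings.append((line.strip(), codes))
    return findings


# Illustrative sample lines only; not copied from a real alert log.
sample_lines = [
    'ORA-15130: diskgroup "DATA" is being dismounted',
    "ORA-15032: not all alterations performed",
    "ASM Health Checker found 1 new failures",
]

for message, codes in errors_before_health_check(sample_lines):
    print(message, "->", ", ".join(codes) or "no ORA- codes nearby")
```

In practice you would pass in the lines of your own ASM alert log (for example via `open(path).read().splitlines()`), then look up each reported code before touching the disk group.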

    The message "ASM Health Checker found 1 new failures" is a critical alert typically generated by Oracle Automatic Storage Management (ASM). It indicates that the background health monitor has detected a significant issue within the storage layer that could impact database availability.

    Immediate Diagnostic Steps

    To identify the specific cause, you should immediately examine the ASM alert log and current disk status:

    Check the Alert Log: Look for ORA- errors (like ORA-15130 or ORA-15063) in the trace file directory:

    Path: /u01/app/oracle/diag/asm/+asm//trace/alert_.log.

    Verify Diskgroup Status: Run the following command in the ASM instance to see which group is affected:

    SQL> SELECT name, state, offline_disks FROM v$asm_diskgroup;

    Check Individual Disk Health: Identify if a specific disk has dropped or is hung:

    SQL> SELECT path, header_status, mode_status FROM v$asm_disk;

    Common Causes & Solutions


    In multipath environments (e.g., DM-Multipath on Linux, PowerPath on AIX), a loss of one path to a disk does not immediately offline the disk. However, the ASM Health Checker detects increased I/O latency or path errors and reports a new failure, even if the disk remains online.

    Rarely, the "ASM Health Checker found 1 new failures updated" message appears without any actual hardware or storage issue. This can happen due to:

    If you cannot find any underlying failure, cross-check with Oracle Support (My Oracle Support) for known bugs in your ASM version. Apply the latest Grid Infrastructure patchset if needed.


    If you are an Oracle Database Administrator (DBA) managing an Oracle Real Application Clusters (RAC) environment, you have likely encountered a cryptic but critical message in your alert logs or monitoring console: "ASM Health Checker found 1 new failures updated."

    At first glance, this message can induce panic. Does it mean data loss? Is your disk group about to crash? Will your production database go offline? Fortunately, in most cases, this alert is a proactive warning from Oracle’s Automatic Storage Management (ASM) diagnostics framework. However, ignoring it can lead to severe performance degradation or service interruption.

    This comprehensive guide will dissect every aspect of this error message. We will explore what the ASM Health Checker is, why it triggers this alert, how to diagnose the specific failure, and step-by-step remediation strategies.


    Introduction
    The message "ASM health checker found 1 new failures updated" signals that a monitoring component (an ASM health checker) has detected and recorded a newly identified failure in a system. This brief notification encapsulates operational realities—detection, state change, and the need for response—and invites examination of its technical meaning, potential causes, implications, and recommended actions.

    What the message means

    Possible contexts and specific interpretations

    Likely root causes (examples)

    Operational impacts

    Recommended immediate steps (triage checklist)

    Longer-term remediation and prevention

    Communicating about the incident

    Conclusion
    The single-line notice "ASM health checker found 1 new failures updated" is a prompt to investigate. While one new failure may be harmless in a fault-tolerant system, it can also be the first sign of worsening conditions. Rapid, evidence-based triage followed by durable fixes and improved monitoring reduces risk and operational burden.

    The message "ASM Health Checker found 1 new failures" typically appears in the Oracle Automatic Storage Management (ASM) alert log when a critical issue, such as a disk failure or a forced diskgroup dismount, is detected. This is part of Oracle's fault diagnosability infrastructure designed to capture diagnostic data at the first sign of trouble.

    Immediate Actions to Take

    If you see this message, follow these steps to identify and resolve the failure:

    Check the ASM Alert Log: Review the alert log (often located in /u01/app/grid/diag/asm/+asm/+ASM/trace/alert_+ASM.log) for errors preceding the health checker message, such as ORA-15130 (diskgroup being dismounted) or ORA-15032.

    Run ADRCI: Use the ADR Command Interpreter (ADRCI) to view the specific "incident" or "problem" that was logged. Command: adrci> show problem or adrci> show incident

    Verify Diskgroup Status: Log into the ASM instance and check if any diskgroups are offline or if disks have been dropped.

    SQL> select name, state from v$asm_diskgroup;

    SQL> select name, header_status, mode_status from v$asm_disk;

    Investigate I/O Failures: Look for hardware-level issues, such as storage path failures, SAN/NFS connectivity problems, or OS-level permission changes that might have caused the disk to go offline.

    Common Causes

    Disk Path Failure: The OS can no longer see the physical storage device.

    Forced Dismount: ASM may force a dismount if too many disks in a failure group are lost, exceeding the redundancy limit.

    Communication Issues: In a RAC environment, network or heartbeat failures between nodes can trigger ASM health alerts.
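The forced-dismount cause comes down to simple arithmetic on how many failure groups have been lost. The Python sketch below is a simplified model under stated assumptions: it treats external, normal, and high redundancy as tolerating the loss of 0, 1, and 2 failure groups respectively, and it ignores quorum failure groups, disk partnering, and flex or extended disk groups. The function and dictionary names are hypothetical.

```python
# Simplified assumption: failure groups that can be lost before data becomes
# unavailable, keyed by the TYPE column of v$asm_diskgroup.
TOLERATED_FAILGROUP_LOSSES = {"EXTERN": 0, "NORMAL": 1, "HIGH": 2}


def dismount_risk(redundancy_type, failgroups_with_offline_disks):
    """Classify a disk group by how many failure groups currently have offline disks."""
    tolerated = TOLERATED_FAILGROUP_LOSSES[redundancy_type.upper()]
    if failgroups_with_offline_disks == 0:
        return "healthy"
    if failgroups_with_offline_disks <= tolerated:
        # Still mounted, but with reduced (possibly zero) remaining margin.
        return "degraded"
    return "forced dismount likely"


for dg_type, lost in [("NORMAL", 1), ("NORMAL", 2), ("EXTERN", 1), ("HIGH", 2)]:
    print(dg_type, lost, "->", dismount_risk(dg_type, lost))
```

The point of the model is the asymmetry: a single lost failure group is survivable under normal redundancy but leaves no margin, which is why one new failure still deserves prompt attention.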

    For automated assistance, you can use tools like Oracle ORAchk to run a comprehensive health check on your entire Oracle stack.

    It sounds like you're referencing a log or output from an ASM health check (likely Oracle Automatic Storage Management). A useful review would typically include:

  • Comparison to previous check: what was the previous failure count? Were existing failures resolved?
  • Suggested actions: repair steps, commands (e.g., ALTER DISKGROUP ... CHECK, REPAIR, DROP/ADD disk), or need for manual intervention.
  • Impact assessment: risk to database availability or redundancy (normal/high redundancy).

    An ASM Health Checker alert reading "found 1 new failures updated" typically indicates that the BIG-IP system's internal monitoring has detected a specific resource or service failure within the Application Security Manager (ASM). This is often triggered when a monitored resource crosses a predefined threshold or a critical daemon stops responding.

    Immediate Review Checklist

    To review and resolve this failure, follow these steps:

    Identify the Failure Source: Navigate to Security > Reporting > Settings > ASM Alerts in the Configuration utility. This screen displays which specific health alert was triggered (e.g., CPU usage, memory limits, or database connectivity).

    Check Daemon Health: Verify if critical ASM processes like asm_config_server are running. You can check this via the command line using tmsh show /sys service.

    Investigate Recent Changes: Review the audit logs for recent maintenance activities, such as software upgrades, re-licensing, or configuration loads, which are common triggers for ASM health failures.

    Examine MySQL Database Status: ASM relies heavily on an internal MySQL database. Check for database corruption or space issues by running tmsh load sys config verify or reviewing /var/log/asm for SQL-related errors.

    Utilize iHealth Diagnostics: Generate a file and upload it to the F5 iHealth portal. This will automatically compare your system state against known bugs and best practices to pinpoint the exact failure.

    Common Root Causes


    "ASM Health Checker found 1 new failures updated" typically indicates that an automated diagnostic system has detected a potential issue within an ASM environment. Depending on your specific infrastructure, this usually refers to either F5 BIG-IP Application Security Manager (ASM) or Oracle Automatic Storage Management (ASM).

    1. F5 BIG-IP ASM (Application Security Manager)

    In the context of F5, this message likely stems from the BIG-IP system health monitoring. It means the internal health checker has identified a failure in a service or a violation that requires attention.

    Common Causes

    Service Instability: Critical daemons (like asm_config_server) might have hung or crashed.

    Resource Exhaustion: The disk partition for logs may be full, preventing new security events from being recorded.

    Configuration Mismatches: A recent policy update or "Check for Updates" for attack signatures might have failed.

    Recommended Actions

    Check Daemons: Run tmsh show sys service asm to ensure all core services are running.

    Review Logs: Check /var/log/asm and /var/log/ltm for specific error codes.

    Restart Services: If services are hung, use pkill -f asm_config_server (restarting these generally does not impact live traffic).

    2. Oracle ASM (Automatic Storage Management)

    If you are running an Oracle database, this alert typically comes from the Fault Diagnosability Infrastructure Health Monitor, which detects corruption or hardware failures.

    Instant Fix for Oracle ASM Disk Failures

    The message ASM Health Checker found 1 new failures means the Oracle ASM background diagnostic engine detected a serious disk, disk group, or storage accessibility issue. When this error appears in the ASM alert log, it is usually preceded by underlying I/O dropouts or timeout warnings. This requires immediate DBA intervention to prevent data loss or complete cluster eviction.

    🛠️ Root Causes of the ASM Failure Alert

    The Oracle Automatic Storage Management (ASM) Health Checker periodically polls the storage environment's overall health. Below are the most common scenarios that trigger this alert:

    Storage Path & Multipath Failures: Intermittent loss of connectivity to the SAN/LUNs causes heartbeat timeout warnings (e.g., Waited 15 secs for write IO).

    Partner Status Table (PST) Corruption: Too many offline disks in the PST disable the read quorum, triggering a forced dismount.

    I/O Timeouts: Slow response times from the storage subsystem cause the Oracle ASM instance to drop the impacted disks.

    Storage Configuration Drift: Re-scans, OS reboots, or sector size changes (ORA-15085) on the SAN break the shared storage layer.

    📋 Comprehensive Troubleshooting Guide

    When your ASM instance registers a failure, use this sequence of administrative tasks to evaluate and fix the problem.

    1. Locate the Relevant Trace Files

    Before making any changes, retrieve the trace file that corresponds to the background error. Look for lines right above the alert in your ASM alert log to identify the specific RBAL or GMON background trace file.

    # Locate your ASM Alert log using the ADRCI tool
    adrci> show alert -p "message_text like '%ASM Health Checker%'"

    2. Verify Your Current Disk Group Status

    Run the following SQL query within the SQL*Plus environment of the affected ASM instance to identify the disk group's operational mode:

    SELECT group_number, name, state, type, total_mb, free_mb FROM v$asm_diskgroup;

    MOUNTED: The disk group is normal; the issue might be confined to a single disk.

    DISMOUNTED: The disk group has dropped offline. This indicates a loss of disk quorum.

    3. Check for Ongoing Rebalance Operations

    A manual or automatic rebalance may clear the problem if the disk group maintains redundancy. Check the background work status:


    $ asmcmd health check
    ...
    FAILURE: Disk group DATA – Disk DATA_0002 is offline
    ...
    

    Connect to your ASM instance using sqlplus / as sysasm and run the following diagnostic queries:

    A. Check disk group overall health:

    SELECT name, state, type, total_mb, free_mb, offline_disks 
    FROM v$asm_diskgroup;
    

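    If offline_disks > 0, at least one disk in that disk group is offline. The cause may be a physical disk failure, but a lost path or an I/O timeout produces the same symptom, so confirm it at the storage layer.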

    B. Identify failing disks:

    SELECT group_number, disk_number, name, path, state, mode_status, failgroup 
    FROM v$asm_disk 
    WHERE state != 'NORMAL';
    

    Disks in FORCING state (being removed from the disk group without their data being offloaded) or HUNG state, and disks whose mode_status is OFFLINE, are the culprits.
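Once rows from v$asm_disk have been fetched, the triage rule can be applied mechanically. The Python sketch below assumes each row has already been loaded into a dictionary keyed by lowercase column name; the function name and the sample device paths are hypothetical, and how you fetch the rows (driver, spool file, and so on) is left out.

```python
def suspect_disks(rows):
    """Return (path, reasons) for every disk that looks unhealthy.

    A disk is flagged when its state is not NORMAL or its mode_status is OFFLINE.
    """
    suspects = []
    for row in rows:
        reasons = []
        if row.get("state") != "NORMAL":
            reasons.append(f"state={row.get('state')}")
        if row.get("mode_status") == "OFFLINE":
            reasons.append("mode_status=OFFLINE")
        if reasons:
            suspects.append((row.get("path"), reasons))
    return suspects


# Hypothetical rows for illustration; not real query output.
rows = [
    {"path": "/dev/mapper/data1", "state": "NORMAL", "mode_status": "ONLINE"},
    {"path": "/dev/mapper/data2", "state": "NORMAL", "mode_status": "OFFLINE"},
    {"path": "/dev/mapper/data3", "state": "FORCING", "mode_status": "ONLINE"},
]

for path, reasons in suspect_disks(rows):
    print(path, "->", ", ".join(reasons))
```

Note that a disk can be OFFLINE in mode_status while its state is still NORMAL, which is why the sketch checks both columns rather than filtering on state alone.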

    C. Check for I/O errors (recent history):

    SELECT * FROM v$asm_disk_iostat 
    WHERE read_errs > 0 OR write_errs > 0;
    

    D. Examine ASM operations:

    SELECT * FROM v$asm_operation;
    

    Look for active rebalancing or recovery operations that may have been triggered by the failure.
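To judge whether an operation is progressing or stuck, the rows from v$asm_operation can be summarized as below. This is a hedged Python sketch: it assumes rows are dictionaries keyed by lowercase column name (operation, state, sofar, est_work, error_code), that progress can be approximated as sofar divided by est_work, and that a state of ERR or HALT or a populated error_code deserves attention. The function name is hypothetical, and column semantics should be checked against the reference for your release.

```python
def summarize_operation(row):
    """Summarize one v$asm_operation row as progress plus an attention flag."""
    est_work = row.get("est_work") or 0
    sofar = row.get("sofar") or 0
    # Progress is undefined until ASM has produced a work estimate.
    percent_done = round(100.0 * sofar / est_work, 1) if est_work else None
    needs_attention = row.get("state") in ("ERR", "HALT") or bool(row.get("error_code"))
    return {
        "operation": row.get("operation"),
        "percent_done": percent_done,
        "needs_attention": needs_attention,
    }


# Hypothetical rows for illustration; not real query output.
running = {"operation": "REBAL", "state": "RUN", "sofar": 250, "est_work": 1000, "error_code": None}
failed = {"operation": "REBAL", "state": "ERR", "sofar": 0, "est_work": 0, "error_code": "ORA-15041"}

print(summarize_operation(running))
print(summarize_operation(failed))
```

Sampling the same row a few minutes apart and comparing percent_done is a simple way to tell a slow rebalance from a hung one.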