2021-06-08

Common types of man-made faults in the operation and maintenance of UPS power systems in data centers


Huabao Golden Warrior UPS power supply

  

   The reliability of the data center power system is critical. No matter how sophisticated the IT equipment, how capable the system, or how reliable it is, none of it works once the power goes out. Equipment maintenance during operation therefore cannot be neglected, and maintenance personnel carry a heavy responsibility.

   

   To keep the power supply system running reliably, many sites have put sound measures in place. Even so, many loopholes remain. The reliability of the equipment itself is fixed once it leaves the factory, and some equipment has inherent defects. For example, some output isolation transformers are wound with enameled aluminum wire instead of enameled copper wire, and in such cases accidents will happen when running at full load... Statistics show, however, that failures caused by quality problems in the equipment itself account for less than 30%; the other 70% are acquired, that is, man-made failures. They take the following forms:

  

  1 Failure caused by improper selection

  

  (1) Unclear basic concepts make users easy for manufacturers to mislead. For example, a highway project's tender for UPS power supplies required that the UPS continue to supply power without discharging the battery after one or two input phases are lost. This requirement came from some manufacturers' advertising: after one input phase is lost, the battery does not discharge and the UPS still has 50% of its power supply capacity; after two input phases are lost, the battery still does not discharge and the UPS still has 25% of its capacity, which extends the battery's service life. Users think this is good performance, but a little thought reveals the drawback: to benefit from it, you must purchase a UPS rated at 4 times the load capacity; otherwise the existing load can no longer be carried once a phase is lost. And what if two of the wires behind the UPS input switch are broken? Should they be repaired or not? When? Can they only be repaired after the power is completely cut off? How is this whole series of problems to be handled? If the user actually buys such a UPS sized to the real load, it is a huge hidden danger, and one that operation and maintenance cannot resolve.
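  The fourfold figure follows from simple arithmetic: if only a fraction of the rated capacity remains after a phase loss, the rating must be the load divided by that fraction. The sketch below illustrates this using the 50% and 25% figures quoted above; the function name `required_rating_kva` and the 100 kVA example load are assumptions made for illustration, not values from any manufacturer's data sheet.

```python
def required_rating_kva(load_kva: float, remaining_fraction: float) -> float:
    """Return the UPS rating needed so the load is still carried
    when only `remaining_fraction` of the rated capacity is available.

    Hypothetical helper for illustration only.
    """
    if not 0 < remaining_fraction <= 1:
        raise ValueError("remaining_fraction must be in (0, 1]")
    return load_kva / remaining_fraction


load = 100  # kVA, assumed example load

# One input phase lost: 50% of capacity remains (figure from the text).
print(required_rating_kva(load, 0.50))

# Two input phases lost: 25% of capacity remains (figure from the text).
print(required_rating_kva(load, 0.25))
```

  Under these assumptions, riding through a two-phase loss requires a UPS rated at four times the load, which is the oversizing the paragraph above warns about.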

  

  (2) Reasons that are awkward to state. For example, some users have been using a certain brand of machine since the last century. At that time, for objective reasons, there was no practical alternative despite its low input power factor, low efficiency, large size, high power consumption and high price. New models far superior to the original ones have since come out. For example, a new high-frequency online UPS saves 50,000 kilowatt-hours of electricity per 100 kilowatts per year compared with the original power-frequency UPS, so a computer room with multi-megawatt capacity can save millions of kilowatt-hours every year. Yet, for some reason, the energy-hungry machine was still specified in the bid document instead of the energy-saving equipment, and, for fear that even this was not safe enough, the structural characteristics of that machine were written into the bid document as well. This not only increases the investment in and floor space for air-conditioning equipment, but also undoubtedly plants hidden dangers for future operation. This is another problem that operation and maintenance cannot resolve.
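  The savings claim can be checked with a short calculation. The sketch below scales the quoted figure of 50,000 kWh per 100 kW per year linearly with installed capacity; linear scaling and the 2 MW example capacity are assumptions made for illustration, and `annual_saving_kwh` is a hypothetical helper name.

```python
# Figure quoted in the text: 50,000 kWh saved per 100 kW per year.
SAVING_KWH_PER_100KW_YEAR = 50_000


def annual_saving_kwh(capacity_kw: float) -> float:
    """Estimate yearly energy saved, assuming savings scale
    linearly with installed UPS capacity (an assumption)."""
    return capacity_kw / 100 * SAVING_KWH_PER_100KW_YEAR


# Assumed example: a 2 MW (2000 kW) computer room.
print(annual_saving_kwh(2000))
```

  With these assumptions, a 2 MW room comes to about one million kilowatt-hours per year, consistent with the "millions of kilowatt-hours" stated for multi-megawatt rooms.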

  

  (3) Chasing low prices. Some users think that all UPS power supplies are the same, so they buy on price alone, which leads to failures. For example, a highway headquarters went for the cheapest option, and the machine installed on the first day caught fire on the second. A life insurance company bought a low-priced machine, and in less than half a year a UPS failure burned out almost all the input circuits of the IT equipment and paralyzed the system. In another case, a megawatt-class data center had multiple UPSs connected in parallel; within a few months of installation, an inverter power transistor in one of the UPSs broke down and all the UPSs tripped...

  

  2 Failure caused by an unsuitable operating environment

  

   Some users do not install the machine in accordance with the environmental requirements in the manual, and some even put the UPS in corridors that people walk through or in dripping basements. For example, several 200 kVA UPS power supplies were placed in a bungalow roofed with only a single layer of prefabricated panels and cooled by just two 5P comfort-type air conditioners. In another case, a glass factory placed its UPS power supplies in a dust-filled workshop. Conditions like these cause frequent failures.

  

  3 Failure caused by incomplete rules and regulations

  

   For example, some personnel on duty casually plug electric stoves, rice cookers and vacuum cleaners into the UPS power supply, causing overload and tripping; in other cases, food left by staff attracts rats, which get into the machine and cause a fire...

  

  4 Handover failure

  

   This type of failure is mainly caused by management personnel not working as a single team, or by poor cooperation between them. For example, in a railway station ticketing system, the staff on the earlier shift disconnected the external battery pack of the UPS power supply while relocating the machine and did not tell the incoming staff afterwards. As a result, when the mains failed, the UPS output was lost at the same time...

  

  5 Failure caused by misapplied experience

  

  Experience is indispensable and a rare asset. But experience is relative: what was learned on one UPS power supply may not fully apply to another, and applying it blindly leads to failure. A telecommunications bureau started a machine of another brand using the method it was used to, without reading the manual, and the inverter burned out.

  

  6 Oversight failure

  

  Some components age or fail early during operation, and a failure will result if they are not checked in time. These conditions cannot be detected by automatic monitoring. Examples include a fuse that has started to bend with age, loosened structural screws on the battery, and micro-cracks in the battery case after a long discharge. If such defects are not discovered in time, or not dealt with promptly once discovered, they can cause a failure.

  

  7 Failure caused by rushing in unprepared

  

   Never be impatient when doing maintenance; think the job through before starting. An engineer from one company was to overhaul a UPS power supply that a user had in operation. According to the regulations, the UPS should be isolated by means of the maintenance bypass switch before being overhauled. The procedure, however, requires that the automatic bypass be activated first, and only then the maintenance bypass switch be closed. Perhaps the engineer had other urgent matters to attend to: after entering the computer room, he closed the maintenance bypass switch without thinking, which caused the inverter power transistor to explode.
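   The ordering rule described above (automatic bypass first, maintenance bypass second) can be expressed as a simple interlock check. The following is a minimal sketch of that idea only; the class and method names (`UpsInterlock`, `transfer_to_automatic_bypass`, `close_maintenance_bypass`) are hypothetical and do not correspond to any real UPS controller or vendor procedure.

```python
from enum import Enum


class Mode(Enum):
    INVERTER = "inverter"                  # load fed by the inverter
    AUTOMATIC_BYPASS = "automatic_bypass"  # load fed via the automatic bypass
    MAINTENANCE_BYPASS = "maintenance_bypass"


class UnsafeOperation(Exception):
    """Raised when a switching step is attempted out of order."""


class UpsInterlock:
    """Hypothetical model of the switching order described in the text."""

    def __init__(self) -> None:
        self.mode = Mode.INVERTER

    def transfer_to_automatic_bypass(self) -> None:
        self.mode = Mode.AUTOMATIC_BYPASS

    def close_maintenance_bypass(self) -> None:
        # Refuse to close the maintenance bypass while the inverter
        # is still feeding the load.
        if self.mode is not Mode.AUTOMATIC_BYPASS:
            raise UnsafeOperation(
                "activate the automatic bypass before closing "
                "the maintenance bypass switch"
            )
        self.mode = Mode.MAINTENANCE_BYPASS


ups = UpsInterlock()
ups.transfer_to_automatic_bypass()  # step 1
ups.close_maintenance_bypass()      # step 2 is only allowed after step 1
print(ups.mode.value)
```

   A written checklist that enforces the same order serves the same purpose; the point is that the second step is refused unless the first has been completed.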

  

  8 Secondary failure caused by improper maintenance

  

   Regular maintenance of the UPS power supply is necessary, but it should follow a strict set of management procedures. Irresponsibility, and failure to carry out scheduled and unscheduled maintenance as required, are important causes of machine failure. Maintenance itself can also cause faults. For example, when measuring the potential on a circuit board with a multimeter, the probe may short-circuit two points and cause a fault. One user removed the battery from the UPS power supply to discharge it; when the battery was connected back after the discharge, something went wrong with the reconnection and the resulting current caused an explosion. In another case, an engineer replacing a centrifugal fan let the adjustable wrench slip, and it hit the control board. He paid no attention at the time, but after the fan was replaced the machine would not start. Inspection found that a component lead had been broken...

  

  9 Failure caused by static electricity

  

   A computer room was shut down for maintenance as usual, but the machine could not be turned on afterwards. Inspection found that a component had suffered a voltage breakdown. Looking back over the maintenance process, it turned out that the control board had been dusted with a plastic toothbrush. Rubbing plastic against the surface of dry components can generate an electrostatic voltage of several thousand volts. The small-signal circuits in the machine use MOS devices, which have a low withstand voltage and are highly vulnerable to static electricity. Measurements show that an ordinary plastic bag rubbed against a circuit board can generate an electrostatic voltage of 3000 V. It is therefore best to wear a grounded wrist strap when inspecting these circuit boards.

  

  10 Failure caused by overconfidence

  

   Self-confidence is the foundation of success, but overconfidence can lead to mistakes. For example, an international bank was due to replace its UPS power supply after 5 years of operation, and the manufacturer reminded it repeatedly. Because the UPS had rarely had problems in those 5 years, the person in charge on the user side repeatedly answered "No need to update". A few months later, the UPS failed due to aging and stopped supplying power for two hours, interrupting the bank's global business for two hours and causing a great loss.

  

  According to international statistics, batteries with a nominal service life of 5 years actually last no more than 3 years, and without maintenance they usually need to be replaced within 2 years. The battery in an airport terminal was originally sized for 4 hours of backup, and it had still not been replaced after 3 years. When the external power grid failed, the UPS provided only 4 hours of backup time, and the power outage caused losses...
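  The replacement intervals quoted above lend themselves to a simple due-date check. The sketch below is an illustration only: it takes the 3-year and 2-year figures from the text as thresholds, approximates a year as 365.25 days, and uses a hypothetical function name `replacement_due` and assumed example dates.

```python
from datetime import date

# Thresholds taken from the figures quoted in the text.
MAX_YEARS_MAINTAINED = 3    # nominal 5-year battery, properly maintained
MAX_YEARS_UNMAINTAINED = 2  # same battery without maintenance


def replacement_due(installed: date, today: date, maintained: bool) -> bool:
    """Return True once the battery has reached its practical service life.

    Hypothetical helper; a year is approximated as 365.25 days.
    """
    limit = MAX_YEARS_MAINTAINED if maintained else MAX_YEARS_UNMAINTAINED
    age_years = (today - installed).days / 365.25
    return age_years >= limit


# Assumed example: battery installed at the start of 2019,
# checked in June 2021 (about 2.4 years old).
installed = date(2019, 1, 1)
today = date(2021, 6, 8)
print(replacement_due(installed, today, maintained=True))
print(replacement_due(installed, today, maintained=False))
```

  A check like this only flags age; it does not replace the physical inspection and capacity testing that the earlier sections call for.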

  

   There are many similar man-made failures, too many to list one by one.

  

   In the final analysis, equipment selection is the first checkpoint for the power supply system, and failing to control it plants the seeds of hidden dangers from the start. The handover, operation and maintenance, and operating environment of the power supply system are all just as important. Even with good equipment, hidden dangers will remain if operation and maintenance are poor.