1 TDTWG Report to RMS SCR 745 ERCOT Unplanned System Outages Wednesday, July 13th.
1
TDTWG Report to RMS
SCR 745
ERCOT Unplanned
System Outages
Wednesday, July 13th
2
Motion
SCR 745 includes:
(1) a system evaluation and
(2) a recommended solution based on a review of the evaluation.
SCR 745 will be sent to the TAC and Board for consideration and possible approval.
3
SCR 745 Analysis Approach
• SCR 745 requested that ERCOT perform an in-depth analysis to determine the root causes of unplanned system outages.
• ERCOT's in-depth analysis indicates that the current architecture supporting the Retail Market contains multiple single points of failure.
• While it is not possible to eliminate every possibility of an ERCOT system outage, it is possible to implement solutions that drastically reduce unplanned system outages by removing these single points of failure.
• This presentation includes the solutions identified.
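The reasoning behind removing single points of failure can be illustrated with a small availability calculation. This is a sketch only: the 0.99 per-component availability and the four-component chain are hypothetical values chosen for illustration, not ERCOT measurements, and the model assumes component failures are independent.

```python
from math import prod


def series_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)


def redundant_availability(a, copies=2):
    """Availability of `copies` independent replicas when one is enough."""
    return 1 - (1 - a) ** copies


# Hypothetical per-component availability (assumption, not an ERCOT figure).
A = 0.99
# Hypothetical chain, e.g. proxy, application, file server, database server.
chain = [A, A, A, A]

before = series_availability(chain)
after = series_availability([redundant_availability(a) for a in chain])

print(f"single instances: {before:.4f}")
print(f"redundant pairs:  {after:.4f}")
```

In a chain of single instances the availabilities multiply, so each added single point of failure lowers the overall figure. Duplicating each component raises its individual availability before the multiplication takes place.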
4
[Diagram: current Retail architecture. Components: Market Participant, Internet, firewall, DMZ, switch, inbound and outbound proxies, NAESB, TCH-EAI, Paperfree Process Servers, Paperfree File Server, Siebel, and a single Retail Database Server hosting multiple Oracle databases (TCH, Siebel, NAESB, Paperfree). Platform labels: Solaris, W2K, HP. Key: bi-directional, outbound and inbound data flows; single point of failure.]
Retail Systems:
• NAESB
• PaperFree
• TCH-EAI (Transaction Clearing House)
• All Retail (Database Server)
5
System Options Included
The following options are being presented to assist RMS in reviewing and eventually approving the best solutions for resolving unplanned ERCOT system outages.
The 4 options include:
• 1 of 2 options for NAESB Proxy Server improvements
• 1 of 3 options for NAESB Application (dependent on NAESB Proxy Server option)
• 1 of 2 options for PaperFree improvements
• 1 of 3 options for Database Server for All Retail
6
Current NAESB Architecture
[Diagram: current NAESB architecture. Components: Internet, firewall, DMZ, switch, inbound and outbound proxies, NAESB, and the Retail Database Server hosting the Oracle databases (TCH, Siebel, NAESB, Paperfree). Platform labels: Solaris, W2K, HP. Key: bi-directional, outbound and inbound data flows; single point of failure.]
The Retail Transaction communication system uses the North American Energy Standards Board Electronic Delivery Mechanism (NAESB EDM) v1.6, an Internet-based protocol.
The current NAESB architecture includes 2 NAESB Proxy servers in Taylor and 2 NAESB Proxy servers in Austin (to be used for disaster recovery only).
Due to the large quantity of data and critical timing for that data, the current NAESB architecture is insufficient for supporting the Texas Retail Market.
7
NAESB Proxy Server Options
Option 1 – Fully Clustered* V880 Solution – 4 V880 NAESB Proxy Servers
Summary – Maximum reliability solution. This option will provide a fully clustered, fault-tolerant solution and an opportunity to consolidate the current 18 production proxy servers, including the servers identified in Option 2.
This option virtually eliminates the potential for NAESB proxy outages, unplanned or planned.
This option will provide 99.99% availability for the NAESB proxy servers.
*Cluster: A group of servers that are typically on different physical machines and have the same applications configured within them, but operate as a single logical server.
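To put the 99.99% availability figure in concrete terms, the short sketch below converts an availability target into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration; the slide does not state the period over which availability is measured.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return (1 - availability) * period_minutes


for target in (0.99, 0.999, 0.9999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.2%} availability -> {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% availability corresponds to roughly 52.6 minutes of total downtime per year.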
8
NAESB Proxy Server Options
Option 2 – 4 V120 NAESB Proxy Servers.
Summary – Minimum reliability solution.
This option will provide redundancy to address the single point of failure. Two servers will be located in Taylor and two servers will be located in Austin.
This will not be a clustered solution; it will be a load-balanced solution. V120 servers cannot cluster.
This solution will reduce the frequency and duration of proxy outages. It is less costly than Option 1 but also less robust.
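The difference between load balancing and clustering can be sketched in code. In the load-balanced arrangement described for Option 2, each proxy is an independent server and something in front of them chooses a healthy one for each request. The example below is a minimal round-robin sketch, not ERCOT's implementation: the server names are hypothetical, the health marking is manual, and it assumes all four servers take traffic, which the slide does not state.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in turn, skipping any marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy proxy available")


# Hypothetical names for two Taylor and two Austin proxies.
proxies = ["taylor-1", "taylor-2", "austin-1", "austin-2"]
lb = RoundRobinBalancer(proxies)
lb.mark_down("taylor-1")  # simulate one proxy failing
picks = [lb.next_server() for _ in range(6)]
print(picks)
```

A failed server is skipped and the remaining servers keep handling requests, which removes the single point of failure. Unlike a cluster, the servers do not operate as one logical server, so work in progress on the failed machine is not taken over automatically.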
9
NAESB Application Options
Option 3 - Separate Application Server Cluster
This option moves peripheral NAESB processes (data encryption, decryption) to the PaperFree cluster and separates inbound and outbound transmissions to disconnected clusters.
10
NAESB Application Options
Option 4 – Hybrid Application Cluster
This option creates an application cluster for inbound transactions and moves outbound transaction processing to the PaperFree system in order to utilize PaperFree’s load balancing and high availability capabilities.
11
NAESB Application Options
Option 5 – Combined Application Cluster
This option combines inbound and outbound transaction processing into a single application cluster.
12
Summary of NAESB Application Cost
• Option 1 – V880 Server Cluster – $370,000
• Option 2 – V120 Server Redundancy – $97,000
• Option 3 – Separate Application Server Cluster – $175,000
• Option 4 – Hybrid Application Cluster – $165,000
• Option 5 – Combined Application Cluster – $235,000
One of Option 1 or Option 2 must be chosen, together with one of Option 3, Option 4 or Option 5.
An additional cost of $66,105 has been identified for Training, Business Process and Monitoring.
Blue highlighting identifies recommended solution
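Because one proxy option must be paired with one application option, there are six possible combinations. The sketch below enumerates them using the figures from this slide. Adding the $66,105 for Training, Business Process and Monitoring to every combination is an assumption; the slide does not say whether that cost varies by option.

```python
from itertools import product

# Figures from the cost summary slide (US dollars).
PROXY_OPTIONS = {"Option 1": 370_000, "Option 2": 97_000}
APPLICATION_OPTIONS = {"Option 3": 175_000, "Option 4": 165_000, "Option 5": 235_000}
ADDITIONAL_COST = 66_105  # Training, Business Process and Monitoring


def combined_costs(include_additional=True):
    """Total cost of every valid proxy + application pairing."""
    extra = ADDITIONAL_COST if include_additional else 0
    return {
        (proxy, app): PROXY_OPTIONS[proxy] + APPLICATION_OPTIONS[app] + extra
        for proxy, app in product(PROXY_OPTIONS, APPLICATION_OPTIONS)
    }


for (proxy, app), total in sorted(combined_costs().items(), key=lambda kv: kv[1]):
    print(f"{proxy} + {app}: ${total:,}")
```

Under that assumption the totals range from $328,105 (Option 2 with Option 4) to $671,105 (Option 1 with Option 5).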
13
PaperFree
[Diagram: current PaperFree architecture. Components: Paperfree Process Servers, Paperfree File Server, PAPERFREE, and the Retail Database Server hosting the Oracle databases (TCH, Siebel, NAESB, Paperfree). Platform labels: W2K, HP. Key: bi-directional, outbound and inbound data flows; single point of failure.]
PaperFree includes the data validation and transformation system. The current architecture contains a single disk share used by multiple load-balanced application servers. This disk share is the single point of failure for this system.
14
PaperFree Options
Option 1 – Clustered File System Server solution
This option represents the maximum availability solution.
[Diagram: PaperFree (Option 1). Components: Paperfree Process Servers, file server cluster virtual IP, Paperfree File Server Cluster, PAPERFREE, and the Retail Database Server hosting the Oracle databases (TCH, Siebel, NAESB, Paperfree). Platform labels: W2K, HP. Key: bi-directional, outbound and inbound data flows; single point of failure.]
15
PaperFree Options
Option 2 – Local File System Solution
– This option supports the load-balancing applications.
– The system will remain active after a single server failure; however, a server interruption may delay the processing of persistent data on the server experiencing the interruption.
[Diagram: PaperFree (Option 2). Components: Paperfree Process Servers, PAPERFREE, and the Retail Database Server hosting the Oracle databases (TCH, NAESB, Paperfree). Platform label: HP. Key: bi-directional, outbound and inbound data flows; single point of failure.]
16
Summary of PaperFree Costs
• Option 1 – Clustered File System Server solution – $75,000
• Option 2 – Local File System Solution – $105,000
Blue highlighting identifies recommended solution
17
All Retail System
[Diagram: current Retail architecture with the All Retail (Database Server) highlighted. Components: Internet, firewall, DMZ, switch, inbound and outbound proxies, NAESB, TCH-EAI, Paperfree Process Servers, Paperfree File Server, Siebel, and a single Retail Database Server hosting multiple Oracle databases (TCH, Siebel, NAESB, Paperfree). Platform labels: Solaris, W2K, HP. Key: bi-directional, outbound and inbound data flows; single point of failure.]
18
All Retail System
[Diagram: Retail Database Server (HP) hosting the Oracle databases (Siebel, NAESB, Paperfree). Key: bi-directional, outbound and inbound data flows; single point of failure.]
The All Retail System is the database server that houses each system's database (NAESB, PaperFree, Siebel and TCH-EAI). This database server is a single point of failure for multiple Retail Systems.
All Retail System Goal: Provide high availability for all databases that support the Retail Applications, including NAESB, PaperFree, Siebel and TCH-EAI. This will allow processing of data to continue in the event of a database server failure.
19
Database Server High Availability Options
Option 1 - All HP-UX Oracle Real Application Cluster (RAC)
Option 2 - All Linux Oracle Real Application Cluster (RAC)
For Options 1 and 2:
• Provides active redundancy for database connectivity for all retail databases
• Complex to implement
• Removes single point of failure at the database server level
20
Database Server High Availability Options
Option 3:
– NAESB Linux Oracle RAC and a different standby/cluster solution for the rest of the Retail databases
• Provides active redundancy for database connectivity for the NAESB database
• Less complex to implement, as the NAESB database is small and easier to migrate
• Provides the option to migrate PaperFree and Siebel into this RAC
• Removes single point of failure at the database server level
– Veritas cluster, Oracle Standby or Oracle RAC for the other databases on HP-UX or Linux, according to their availability requirements
• Phased implementation: NAESB first and other databases next
• Removes single point of failure at the database server level
21
Database Server High Availability Options
• Summary
– All three options provide the highest-availability architecture for the NAESB database.
– Options 1 and 2 provide the highest-availability architecture for all databases; however, they are the most expensive and complex to implement and manage.
– Option 3 provides the highest-availability option for the NAESB database and will provide appropriate high-availability solutions for the rest of the retail databases in subsequent phases. It is easier to implement in a phased manner, addressing acute availability needs first.
22
Summary of Database Server High Availability Costs
Cost
– Options 1 & 2: Oracle RAC
• Hardware – $450,000
• Cluster SW – $400,000
• Oracle RAC SW – $400,000
• Cluster Ext Service – $100,000
• Oracle RAC Ext Service – $100,000
• Internal project cost (FTE) – $180,000
• Total: $1,630,000
– Option 3: Partial Oracle RAC + alternate solution for remaining databases
• Hardware – $400,000 - $600,000
• Cluster SW – $100,000 - $400,000
• Oracle RAC SW – $0 - $400,000
• Cluster Ext Service – $0 - $120,000
• Oracle RAC Ext Service – $120,000 - $180,000
• Internal project cost (FTE) – $120,000 - $180,000
• Total: $890,000 - $1,650,000
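The line items can be cross-checked with a short calculation. The sketch below sums the figures listed on this slide. For Options 1 and 2 the sum matches the stated total. For Option 3, adding all the low ends and all the high ends gives a wider range than the stated total. The slide does not explain the difference; one possible reading, offered only as an assumption, is that the line items are not independent (for example, a lower Cluster SW cost may go with a higher Oracle RAC SW cost).

```python
# Figures copied from the cost summary slide (US dollars).
OPTIONS_1_2 = {
    "Hardware": 450_000,
    "Cluster SW": 400_000,
    "Oracle RAC SW": 400_000,
    "Cluster Ext Service": 100_000,
    "Oracle RAC Ext Service": 100_000,
    "Internal project cost (FTE)": 180_000,
}

# (low, high) bounds per line item for Option 3.
OPTION_3 = {
    "Hardware": (400_000, 600_000),
    "Cluster SW": (100_000, 400_000),
    "Oracle RAC SW": (0, 400_000),
    "Cluster Ext Service": (0, 120_000),
    "Oracle RAC Ext Service": (120_000, 180_000),
    "Internal project cost (FTE)": (120_000, 180_000),
}

total_1_2 = sum(OPTIONS_1_2.values())
low_3 = sum(low for low, _ in OPTION_3.values())
high_3 = sum(high for _, high in OPTION_3.values())

print(f"Options 1 & 2 total: ${total_1_2:,}")
print(f"Option 3 sum of per-item bounds: ${low_3:,} - ${high_3:,}")
print("Option 3 total as stated on the slide: $890,000 - $1,650,000")
```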
23
Next Steps
If recommended by RMS today, TDTWG will facilitate a technical workshop to be held before the next RMS meeting.
This workshop is intended to help RMS members and interested Market Participants review the in-depth system evaluation in order to select recommended solution(s) for approval at the August RMS meeting.
24
Questions