## Variation of CAISO's Nodal Prices in the Day-Ahead Market

Sept. 1, 2021
Nodal Pricing Duration Curve Standard Deviation Plotly PostgreSQL Python

## Domain

#### Introduction

This blog post is about one of the most fundamental characteristics of the wholesale electricity markets in the US that is distinct from other electricity markets, such as those in Europe. It is about electricity markets with nodal pricing at the example of CAISO's nodal pricing system. In a nodal pricing system, unique prices are determined at a particular time for all the supplying or generating resource locations. Whereas within a wholesale market that applies a zonal model there is only one market price for the market area at a particular time applicable to all the resources.

#### Nodal Pricing

Nodal prices are prices for electricity consumed or generated at a particular location at a given time. The nodal price is referred to as Locational Marginal Price (LMP). The LMP is the price for serving an additional increment of demand or supply at that location. CAISOs' LMPs are composed of three distinct price components: The marginal cost of energy (MCE), the marginal cost of losses (MCL), and the marginal cost of congestion (MCC). The MCE reflects the marginal cost of meeting the system-wide demand for electricity. Therefore, it is also referred to as the system marginal energy cost (SMEC). While the MCE/SMEC is the same for all the individual nodes at a given time, the marginal cost of losses and congestion can vary significantly between the different locations. In a theoretical system without any transmission losses and any congestion, the MCL and MCC would equal zero, and hence all the individual LMPs would only consist of the MCE and essentially converge to a uniform zonal price. But since losses naturally occur in an electrical system, all the LMPs typically have an MCL component different than zero. (On a side note: ERCOT uses LMPs, but does not include MCL in their LMPs.) Building the electrical system one has to decide which capacities of the grid are necessary and economically feasible. Therefore, you generally won't encounter a grid with unlimited transmission capacities to all of its nodes. Because of these physical limitations to transmit electricity to an arbitrary location, situations arise when the system's transmission capacities are not sufficient to serve all the demands least expensively. Although the system was able to meet the system's total demand, there would have been a generator available to supply electricity for a lower price, but was not able to because there was not enough transmission capacity for their electricity. This additional component to the electricity price due to congestion is captured by the MCC. (In a zonal model transmission limitations are disregarded at the time of the market runs, and resources are re-dispatched when congestion occurs.) So in a nodal pricing system, the MCL and MCC are essentially the price tags for additional demand or supply and their effect on the system's losses and congestion respectively. It is important to point out that the MCL, as well as the MCC, may be positive or negative. An incremental demand could also reduce the system's losses with the effect of a negative MCL component of their LMP.

#### Price Variance

As mentioned, price differences between individual nodes may be significant, depending on the location and the interconnected resource's marginal effect on the system's losses and congestions. Thus, from a generator's point of view, the location poses an inherent risk. If a generator was to make locational decisions for new generation solely on wholesale market prices, the nodal model would incentivize the generator to choose a location benevolent to the market. The decision of course depends on various other factors as well, that could outweigh the impact of the location on the LMPs. Yet, it illustrates the significance of LMPs on the operation of a resource. To get a better insight into how much LMPs generally vary between different locations, this post explores and quantifies actual LMPs variance in CAISO's Day-Ahead market (DA). Before moving on to the parts describing how this is quantified methodically and technologically, let's first take a high-level look at CAISO's node definitions, and which of the nodes were taken into account for the analysis.

#### Pricing Nodes

The CAISO employs a detailed model representation of the physical power system network called the Full Network Model (FNM) to determine LMPs. The physical model represents the transmission lines including information about interconnected generation and load. Every generating resource and load receives a node identification at the interconnection with the network. These physical nodes are called CNodes. A node for which the CAISO calculates LMPs is called a Pricing Node (PNode). On top of that, individual PNodes may be grouped into Aggregated Pricing Nodes (APNodes). LMPs for APNodes are calculated as weighted averages of the LMPs of their individual PNodes. This analysis will only explore the LMPs of APNodes to limit the total data requirements that would be necessary if the analysis was calculated based on the PNodes of which more than eight times as many exist compared to APNodes. Of all the APNodes, about 70 % are associated with generating resources, while the remaining 30 % are associated with other locations such as loads, or other types like trading hubs or interties. Because this analysis attempts to explore price variations for generating resources, all LMPs of other APNodes are excluded from the analysis.

## Method

A good and relatively easy way to describe the variance of the LMPs is by calculating their standard deviations for every hour of the year. The standard deviation of the LMPs can be interpreted by how much the LMPs fluctuate on average between different locations for the hour that they are calculated for. For visualization purposes, the resulting deviations are then ordered decreasingly from highest to lowest deviations. This approach is adopted from load duration curves (LDC). The method was found in this research paper , that analyzes nodal prices if Germany abandoned its zonal pricing model and introduced a nodal pricing model instead.

LDCs are typically used by utilities to describe the yearly load behavior of individual resources or loads at aggregated levels, such as a transformer substation for instance. The LDC is created by simply plotting the yearly load data in descending order for one year, and helps to quickly extract the information for how long a load remains above or below a particular threshold during one year. A steeper curve would typically result from a rather dynamic load behavior, while a static load behavior would present a relatively flat LDC. But keep in mind, that because of sorting the data, the time-dependency of the time-series data is lost. Hence, the LDCs of two distinct resources could look identical, even if one of the resource's loads fluctuates on an hourly basis while the load of the other resource remains relatively flat between individual hours but experiences a steady decrease over a year.

By applying this method to the calculated standard deviations, it is possible to quickly analyze the variance of the underlying market prices. For instance, the value for the hour 4380 of the ordered standard deviation reveals that for 50 % of the year, the LMPs differ more than the obtained value.

## Technology

In this section, the programming aspects of conducting the analysis are presented. This includes the data retrieval from CAISO, processing, and analysis of the LMP, and visualization of the results.

#### Data Retrieval

The CAISO operates the open-access data portal called OASIS . The data portal provides a lot of information regarding the transmission system, the various resources operating in the market, and also the electricity prices of the different markets that the CAISO operates. For this analysis, we are interested in the LMPs of the Day-Ahead market stage (DA) with data for at least one entire year. If you open OASIS in a browser, you can find find the LMPs by navigating to PRICES, and select Locational Marginal Prices in the drop-down menu. Afterward, you can select a date range, DAM for the DA market stage, and options to select data for all or a particular subset of nodes.

But downloading this much data manually would be very tedious. So I was looking to implement a solution to automate the data retrieval process. Fortunately, OASIS provides a publicly available API, which stands for Application Programming Interface, and allows programs to communicate with each other. The API is available at the location http://oasis.caiso.com/oasisapi, and is accessible by sending valid HTTP requests to that URL. To request a report, the report's name, as well as additional report specifications must be defined by passing key/value pairs to the URL before sending the request. For instance to request a report for the DA market LMP's, you add the key/value pair '?queryname=PRC_LMP' to the URL, where 'queryname' is the key, and 'PRC_LMP' is the value. When OASIS receives this request it parses the provided key/value pairs, and if everything was transmitted as the API expects, it returns an HTTP response and attaches the requested report. By default, OASIS returns a report in form of an XML file, but it is also optional to request is returned in form of a CSV file. To do so, you simply add the key/value pair '?resultformat=6' to the URL. To retrieve the data, I developed the API client that defines valid URLs, sends the HTTP-requests to the API in Python by utilizing the package requests , and processes OASIS's response afterward. The two API services that were accessed for the analysis were the service 'ATL_APNODE' to retrieve information about the nodes, together with the aforementioned service 'PRC_LMP' to request the market price data. With these two services all the necessary information about the nodes and market prices dating back until Apr. 2018 could be retrieved.

#### Data Processing

When the request concludes successfully, the API response includes the requested report compressed in a ZIP file. But what happens when the request does not conclude successfully? This would occur if the URL is not valid, for instance when required parameters are not included in the URL. In that case, the API returns a ZIP file, that consists of an XML file named 'INVALID_REQUEST.xml'. Depending on the cause for the request to fail, the XML file provides additional information such as an error code and an error description.

In both scenarios, several steps are necessary to extract the relevant data from the HTTP response. The following outlines the steps that worked for me:

1. Check if the returned HTTP status code is '200', and raise an 'HTTPError'- exception if not.
2. With the content from the response, create a Buffered Reader with the class 'BytesIO' from the io -package object, and with the Buffered Reader object, create a 'ZipFile'-object from the identically named package zipfile .
3. Extract the file from the ZipFile object.
4. Check if the extracted file's name is 'INVALID_REQUEST.XML', and raise an InvalidURL- exception if yes. If not, the extracted file is returned to the invoking procedure in form of a bytes-object.
A truncated version of the client class essentially looks like below. Keep in mind that some additional checks and try-except-blocks are excluded for reasons of simplicity:

        
import requests
import io
import zipfile as zf

class OasisClient(object):
def __init__(self):
# ...

def get_report(self, url):
response = requests.get(url)

if response.status_code != 200:  # (Nr. 1)
raise requests.exceptions.HTTPError

response = zf.ZipFile(io.BytesIO(response.content))  # (Nr. 2)

response_file = response.namelist()[0]

response = response.open(response_file).read()  # (Nr. 3)

if response_file == "INVALID_REQUEST.xml":  # (Nr. 4)
raise requests.exceptions.InvalidURL

return response


Once the 'get_report'-function of the client class concludes without an exception, the actual data is parsed and stored in a Postgres database. We will skip the parsing and storing of the data, and jump right to the analysis. At that point, we assume the node information and market prices are successfully stored in two separate tables called 'node' and 'prc_lmp_dam' in the database.

#### Data Storage and Analysis

Since PostgreSQL provides the functionality to calculate statistical values, it was possible to run the analysis directly in Postgres. The analysis is a two-step process:

1. First, two separate views with price data for each year are created. (A view is basically just a stored query that is executed by the database each time the view is referenced. For instance when running a select-statement on the view.) The view definition implements additional filtering on the data, such as:
1. Filter nodes with the APNode-Type 'AG'.
2. Filter nodes that are active the entire year.
3. Filter nodes with LMP data for the entire year.
4. Filter LMPs based on the filtered nodes, and for the respective year.
Before filtering, there are about 32.2M data points with LMP data for the two years. After filtering, the data points are reduced to about 25.5M, with 881 relevant APNodes for the year 2019, and 950 APNodes for 2020.
        
CREATE VIEW apnode_price_2020 AS

WITH good_node AS (
SELECT
t_main.p_id AS node_id
FROM (
SELECT
prcd_node_node_id AS p_id
, prcd_timestamp AS p_ts
, prcd_lmp AS p_p

FROM prc_lmp_dam

WHERE
-- (Nr. 3):
prcd_timestamp > '2020-01-01 00:00:00'::TIMESTAMP AT TIME ZONE 'PDT'
AND prcd_timestamp <= '2021-01-01 00:00:00'::TIMESTAMP AT TIME ZONE 'PDT'
) t_main INNER JOIN node
ON t_main.p_id = node.int_node_id
WHERE
-- (Nr. 2):
node.node_start <= '2020-01-01 00:00:00'::TIMESTAMP AT TIME ZONE 'PDT'
AND node.node_end >= '2021-01-01 00:00:00'::TIMESTAMP AT TIME ZONE 'PDT'
AND node.node_apnode_type = 'AG'  -- (Nr. 1)

GROUP BY t_main.p_id, node.node_apnode_type
HAVING COUNT(t_main.p_p) >= 8760  -- (Nr. 3)
ORDER BY SUM(t_main.p_p) ASC
)

SELECT
*
FROM prc_lmp_dam INNER JOIN good_node
ON prc_lmp_dam.prcd_node_node_id = good_node.node_id  -- (Nr. 4)
WHERE
-- (Nr. 4):
prcd_timestamp > '2020-01-01 00:00:00'::TIMESTAMP AT TIME ZONE 'PDT'
AND prcd_timestamp <= '2021-01-01 00:00:00'::TIMESTAMP AT TIME ZONE 'PDT';

2. Second, the standard deviation of LMPs is calculated with the PostgreSQL aggregate function 'stddev_samp' for each node, and the result is stored in a materialized view. Contrary to a regular view, a materialized view physically stores the output of the view definition to the database. Once the materialized view with the analysis result is created, it is possible to directly query the outcome without instructing Postgres to rerun the analysis.

#### Data Visualization

Eventually, the results are plotted with the free and open-source library plotly. As you are going to notice, plotly charts are generally interactive and provide additional functions like zooming into a particular area of the chart, easily switching back to the default axes configuration, or functionality to export the chart. All of that comes as a standard feature with plotly without requiring any additional programming.

## Conclusion

Below the analysis results are plotted for the years 2019 and 2020. The dashed lines with percentages provide the information for how many hours of the year the standard deviations are higher than the value where the dashed lines intersect. For 80 % of the time of the year, the standard deviation is higher than $0.949 per MWh (2019) and$ 0.853 per MWh (2020). For 20 % of the time, it is higher than $4.799 per MWh (2019) and$ 5.494 per MWh (2020).