Solar energy time series analysis via Markov chains
Marianne Bechara Elabras da Motta Veiga (1), Gabriel Kelab Sigaud (2), Fernando Luiz Cyrino Oliveira (3), Gustavo de Andrade Melo (4)
1.- mbemveiga@gmail.com
2.- gabrielsigaud@gmail.com
3.- Industrial Engineering Department, Pontifical Catholic University of Rio de Janeiro, cyrino@puc-rio.br, ORCID: 0000-0003-1870-9440
4.- Industrial Engineering Department, Pontifical Catholic University of Rio de Janeiro, gustavo.melo.rio@gmail.com
Received: 18/11/2024; Accepted: 04/2/2025
Abstract
Brazil, given a global scenario of concern about climate change, has been increasing its use of renewable energy, especially solar energy, in recent years. With the growth in its share, the characteristics of solar energy, such as intermittency and random fluctuations, have been affecting the operation planning of the Brazilian Electricity System (BES). Such factors can be studied with time series modeling, supporting the planning of power plants and of the BES. To contribute to the analysis of these factors, the objective of this research is to analyze the characteristics of photovoltaic energy generation in the meteorological seasons of the year in two regions of Brazil with different solar incidences. For this, a methodology based on Markov Chain concepts is applied to two stationary time series. The work stands out for the subdivision of the time series by climatic season, for the use of data not yet studied, and for the detailed presentation of the methodology and results. The objective of the research was achieved, making evident the differences in the solar energy generation models between the meteorological seasons and between the two regions studied.
KEYWORDS: Renewable Energy Sources, Variable Energy Sources, Solar Energy, Climatic Seasons, Markov Chains, K-means
1. INTRODUCTION
Faced with a scenario of concern about climate change, countries are carrying out the energy transition, moving away from fossil energy sources and increasing the use of renewable sources (Malar, 2022). According to the International Renewable Energy Agency (2023), global renewable energy capacity grew by 13% in 2022 compared to the previous year. Renewable energies are considered inexhaustible, as they can always be renewed by nature, and generate considerably lower environmental impacts than non-renewable energies (EPE, 2022).
Brazil has been following this transformation in the world’s energy matrix. According to the 2023 National Energy Balance, 47.4% of Brazil’s domestic energy supply in 2022 came from renewable sources. In 2013, this percentage was 40.6%; that is, in 9 years, the share rose by 6.8 percentage points, a relative increase of approximately 17% (EPE, 2023).
In this context, solar energy is a source that deserves to be highlighted. In 2022, it accounted for 3.6% of the domestic energy supply in Brazil. In addition, between 2021 and 2022, its installed capacity grew by 82.4%, the fastest growth among energy sources in the country (EPE, 2023).
Solar energy is generated from solar radiation captured by photovoltaic panels. In addition to being renewable, it has the advantages of being silent, requiring little maintenance, and being quick to install (Imhoff, 2007). With the increase in its use in Brazil, its characteristics, such as intermittency and random fluctuations, will increasingly affect the country’s energy generation. Considering this scenario, the use of time series modeling and simulation methods to study this impact is important for the planning of the plants and the BES.
In order to contribute to this theme, the objective of this work is to analyze the characteristics of photovoltaic energy generation in different climatic seasons (summer, autumn, winter, and spring) in two regions of Brazil with different solar incidences. For this, the time series were discretized for Markov Chain modeling, a methodology already widely used in the literature for the analysis of electric energy time series. Furthermore, the subdivision by climatic season differs from other studies because it is based on a natural phenomenon, as opposed to, for example, the more frequently used monthly subdivisions.
It is worth noting that this study offers relevant contributions to the literature. First, to the authors’ knowledge, it uses data that have not yet been studied. In addition, these data come from two plants located in regions with considerably different characteristics and were divided by the climatic seasons of the year, which allowed both geographical and temporal comparisons.
The analysis presented in the study was carried out using two daily photovoltaic energy generation databases from ONS (National Electric System Operator): the Nova Olinda Complex, located in Piauí (PI) and inaugurated in 2017 (G1, 2017); and the Guaimbê Complex, located in the state of São Paulo (SP) and inaugurated in 2019 (G1, 2019). According to Gadelha de Lima (2020), the state of Piauí has different meteorological characteristics depending on the quarter of the year, which could justify a division into four seasons.
Figure 1 shows the location of the two plants on the Brazilian solarimetric map. This map is an adaptation of the one presented in the Brazilian Atlas of Solar Energy (Pereira et al., 2017) and shows the annual average of the total daily direct normal irradiation over Brazil. The difference in average direct irradiation between the two plant locations is visible: it is greater at the Nova Olinda Complex (Ribeira do Piauí – PI) than at the Guaimbê Complex (Guaimbê – SP).
The applied methodology is exploratory and can be divided into three main phases. The first relates to data pre-processing, including data collection, analysis, and treatment. In the second phase, data processing is performed, involving modeling via Markov Chains and obtaining results such as stationary distribution, recurrence time, and first passage time. In the last phase, data post-processing, the results obtained are analyzed for comparison between the climatic seasons and between the plants.
Figure 1 - Brazilian Solarimetric Map - Average annual normal direct irradiation.
Source: Adapted from Pereira et al. (2017).
2. THEORETICAL FRAMEWORK
In the literature, there are several renewable energy modeling studies that apply the concept of Markov Chains in their methodologies. Sigauke and Chikobvu (2017) analyzed daily peaks of electricity demand through Markov Chains, seeking to find the stationary distribution (the distribution of states in which the chain will stabilize). To do this, the authors used demand data from South Africa from 2000 to 2011. Two-state models, distinguishing positive from negative day-to-day variations, and three-state models, which further distinguish small from large positive variations, were considered.
Maçaira et al. (2019), faced with a scenario of increased wind energy use in Brazil, showed that the dispatch model used in the period of their research did not consider the stochastic behavior of this energy source. The model, which sought to optimize long-term energy planning, only evaluated the future behavior of hydro and thermal sources. In view of this, the work proposed a wind-hydrothermal dispatch model, which incorporated wind power generation using the MCMC (Markov-Chain Monte Carlo) method to simulate energy scenarios.
Ma et al. (2020) proposed a methodology for aggregating solar photovoltaic time series data through clustering via k-means, Markov Chains, and Monte Carlo simulation. For the authors, Markovian processes efficiently represent the transitions of photovoltaic power generation time series. In the proposed k-means-MCMC methodology, the power generation data are first grouped according to the optimal number of clusters, and then the transition matrix is assembled. Finally, from this matrix, energy scenarios are generated via simulation.
Melo (2022) sought to show the spatial and temporal complementarity between variable renewable energies through the joint stochastic modeling and simulation of solar and wind energy. To this end, the author used two methodologies and performed three applications, using databases of plants located in the Northeast of Brazil. Both methodologies use Markov Chain modeling, Monte Carlo simulation to obtain scenarios, and the k-means technique to perform data clustering.
3. METHODOLOGY
A methodology based on Markov Chains was applied to model the time series of photovoltaic solar power generation. Figure 2 shows the flowchart with the main stages of the methodology, divided into the data pre-processing, processing, and post-processing phases.
Figure 2 - Main steps of the methodology.
3.1. Pre-processing
The pre-processing phase consists of obtaining, analyzing, and treating data. The time series of daily photovoltaic energy generation of the Nova Olinda (Piauí) and Guaimbê (São Paulo) complexes were obtained from the National Electric System Operator (ONS, 2022) for a period of four years, from 06/21/2018 to 06/20/2022, with a total of 1,461 observations for each complex. The only two variables used were date and energy generation. According to Ma et al. (2020), due to the characteristics of photovoltaic power generation data, the optimal time scale for fragmenting scenarios is daily. The methodology is applied first to the Nova Olinda Complex and then to the Guaimbê Complex, so the two series are modeled separately.
A preliminary analysis of the energy generation data for the period was performed. First, to test the stationarity of the time series over the four years, Augmented Dickey-Fuller (ADF) unit root tests were carried out. The null hypothesis of the ADF test is that the time series has a unit root and is therefore not stationary (Dickey and Fuller, 1979). The stationarity test is essential for the application of the Markov Chain concepts, because a non-stationary series depends on time, whereas in Markovian processes the probabilities of transition to the next state depend only on the current state (Norris, 1998). Furthermore, non-stationarity could indicate a change in the installed capacity of the plants.
To complete the pre-processing phase, the databases were treated so that the time series could be modeled as Markov Chains. First, null or missing values were replaced by the average of the corresponding month and year, which is an adequate estimate of generation in the period given the seasonality. Then, so that the time series could be analyzed by climatic season, they were subdivided into four subsets: Summer, Autumn, Winter, and Spring.
3.2. Processing
3.2.1 Series discretization via k-means
In order to group the observations with greater similarities, the subsets of the solar energy generation time series, divided by climatic season, were discretized into Markovian states independently. The clustering method used was k-means (MacQueen, 1967), as it is easy to program and computationally inexpensive.
In the k-means method, a number k of clusters is pre-specified, and k initial centroids (the average value of each cluster) are chosen at random. Then, the following steps are performed: (1) each observation is assigned to the cluster of the nearest centroid by calculating the distance from the observation to each centroid; (2) k new centroids are calculated as the average of the observations within each cluster; (3) steps 1 and 2 are repeated until the centroid values no longer change. The method can be summarized by the objective function (1).
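The three steps above can be sketched in a few lines. In its usual form, the objective that k-means seeks to minimize is the within-cluster sum of squared distances, J = Σ_j Σ_{x ∈ C_j} (x − μ_j)², where C_j is cluster j and μ_j its centroid; that standard form is assumed here. The sketch is written in Python rather than the R reported for the study, and the function name `kmeans_1d`, the seeding scheme, and the final relabeling (state 0 = lowest centroid) are illustrative assumptions, not the authors' code.

```python
import random

def kmeans_1d(values, k, max_iter=100, seed=0):
    """Lloyd-style k-means for scalar data; returns (sorted centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct observed values picked at random.
    centroids = rng.sample(sorted(set(values)), k)

    def assign(cents):
        # Step 1: each observation goes to the cluster of its nearest centroid.
        return [min(range(k), key=lambda j: abs(x - cents[j])) for x in values]

    for _ in range(max_iter):
        labels = assign(centroids)
        # Step 2: each centroid becomes the mean of the observations in its cluster.
        updated = []
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        # Step 3: stop when the centroids no longer change.
        if updated == centroids:
            break
        centroids = updated

    centroids = sorted(centroids)  # state 0 = lowest generation level
    return centroids, assign(centroids)
```

Applied to one seasonal subset of daily generation values, the returned labels would play the role of the Markovian states and the centroids the role of the representative generation level of each state.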
However, to apply the k-means method, it is necessary to pre-define the number k of clusters. According to Fritz et al. (2020), choosing the wrong value for k can lead to poor results, and the elbow method, first discussed by Thorndike (1953), is commonly used to choose the ideal number of clusters. As the number of clusters increases, the sum of the squared distances between the observations and the centroids tends to decrease (Thorndike, 1953). Hence, the elbow method helps to avoid very high values of k, for which adding a new cluster brings no relevant benefit. The elbow method can be used in conjunction with the k-means method to find the optimal number of clusters (Fritz et al., 2020).
To apply the elbow method using k-means, it is first necessary to perform the k-means steps for each value of k up to a chosen maximum number. Then, the Within-Cluster Sum of Squared Errors (WSS) is calculated for each clustering obtained. The WSS is the sum of the squared Euclidean distances from each observation to the centroid of the cluster to which it belongs.
Consequently, a graph can be created that presents the WSS for each value of k. It is then possible to observe the point k at which the curve presents a “fold”, like an elbow, and to infer that moving from k to k+1 clusters would not reduce the WSS enough to provide substantial gains to the clustering.
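A numerical version of the elbow reading can be sketched as follows. The WSS function follows the definition above; the selection rule uses the ratio WSS(k+1)/WSS(k) with the 0.90 threshold reported later in the results. The function names and the choice of returning the smallest k that meets the rule are assumptions made for illustration (in Python, not the R used in the study).

```python
def wss(values, centroids, labels):
    """Within-cluster sum of squared distances to the assigned centroid."""
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(values, labels))

def elbow_k(wss_by_k, threshold=0.90):
    """Pick k from a list where wss_by_k[i] is the WSS for k = i + 1 clusters.

    Returns the smallest k for which WSS(k+1) / WSS(k) exceeds the threshold,
    i.e. adding one more cluster would cut the WSS by less than 1 - threshold.
    """
    for i in range(len(wss_by_k) - 1):
        # A zero WSS cannot be improved, so stop there as well.
        if wss_by_k[i] == 0 or wss_by_k[i + 1] / wss_by_k[i] > threshold:
            return i + 1
    return len(wss_by_k)
```

For example, with WSS values of 100, 40, 15, 14 and 13.5 for k = 1 to 5, the ratios are 0.40, 0.375 and about 0.93, so the rule stops at k = 3.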
3.2.2 Creating State Transition Matrices
The next step is to create the daily state transition matrices, P. Transition matrices are composed of the transition probabilities p_ij from a state i in period n to a state j in period n+1 (Chung, 1960). The transition probabilities and transition matrices are represented by (2) and (3), respectively.
In this step, based on Melo (2022) and Ma et al. (2020), the transition probabilities are calculated as the ratio between the number of occurrences of transitions from state i to state j and the total number of occurrences of transitions from state i, as represented by (4).
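The frequency estimate described above amounts to counting consecutive state pairs and normalizing each row of counts, p_ij = n_ij / Σ_k n_ik. A minimal Python sketch follows; the function name is hypothetical, states are assumed to be integers from 0 to n_states − 1, and the sketch simply counts consecutive pairs in whatever sequence it receives, since the text does not state how the gap between one year's season and the next was handled.

```python
def transition_matrix(states, n_states):
    """Estimate P from a sequence of integer states by relative frequencies."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Count every observed one-step transition i -> j.
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows of states that were never left are returned as zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

For the sequence 0, 0, 1, 0, 1, 1, 0 there are three transitions out of each state, giving rows (1/3, 2/3) and (2/3, 1/3).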
3.2.3 Obtaining the results
To analyze the properties of the transition matrices, three measures of interest were calculated: Stationary distribution (π) - the distribution of states in which the chain will stabilize, satisfying equations (5) and (6); Recurrence time (m_ii) - the expected number of periods for a system in state i to return to that state again, as in equation (7); First passage time (m_ij) - the expected number of periods for a system in state i to pass through state j for the first time, as in equation (8) (Chung, 1960).
Interpreting the above concepts, the measures presented help to analyze the behavior of the Markov Chain model when the process stabilizes. With the stationary distribution, it is possible to identify the most frequent states of the system, where the process is most likely to be in the future. The recurrence time allows us to understand, for example, the average time to return to a state of maximum or minimum energy generation, while the first passage time indicates the average transition time between these two states.
3.3. Post-processing
Finally, in the post-processing phase, the results obtained in the previous phase were analyzed and evaluated, with the objective of characterizing the generation of the two plants in the four climatic seasons and in regions of Brazil with different solar incidences. In this phase, the main purposes were: to identify the most frequent states of each season; to compare the recurrence times of the most extreme power generation states; and to compare the first passage times between the states of highest and lowest power generation of each climatic season.
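In standard Markov chain notation, the relations referred to as (5) to (8) for the stationary distribution, recurrence time, and first passage time can be written as below. This is a reconstruction from the verbal definitions in the text, not the authors' typeset equations, and the symbol S for the set of states is introduced here for illustration.

```latex
\begin{align}
\pi &= \pi P \tag{5} \\
\sum_{i \in S} \pi_i &= 1 \tag{6} \\
m_{ii} &= \frac{1}{\pi_i} \tag{7} \\
m_{ij} &= 1 + \sum_{k \neq j} p_{ik}\, m_{kj} \tag{8}
\end{align}
```

Relation (7) ties the first two measures together: a stationary probability of 2.27%, for instance, corresponds to a recurrence time of 1/0.0227 ≈ 44 periods, which matches the value reported for the lowest summer state of the Guaimbê Complex.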
4. DISCUSSION AND PRESENTATION OF RESULTS
In this chapter, the results of the methodology’s application are presented for the two plants individually, starting with the Nova Olinda Complex (PI) and then addressing the Guaimbê Complex (SP). Finally, the results of the two plants are compared. All the computational steps in this chapter were performed in the R programming language (R Development Core Team, 2009).
4.1. Nova Olinda Complex (Piauí)
4.1.1 Pre-processing
4.1.1.1 Collection, analysis and treatment of data
When testing the stationarity of the time series of the Nova Olinda Complex in the analyzed period, the result obtained was a p-value lower than 0.01, i.e., the null hypothesis that the time series is not stationary is rejected. Thus, it is concluded that the time series is stationary and, therefore, that the installed capacity is constant, which is fundamental for the Markov Chain modeling performed in this work. The stationarity of the time series in the period can be seen in Figure 3, which shows the average daily generation per month. In addition, the series presents considerable volatility and annual seasonality, with higher energy generation in July, August, and September and lower generation in December, January, February, and March, while the other months assume intermediate values. Greater similarities can be noticed among the data of months in the same climatic season. This observation suggests modeling the time series by subdividing it into four subsets, one for each climatic season, for a better representation of the data in each period.
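The data treatment and seasonal subdivision described in Section 3.1 can be sketched as follows, in Python rather than the R used for the study. The function names are hypothetical. The season boundaries are an assumption: the text does not list them, so approximate Southern Hemisphere dates are used, chosen so that winter begins on 21 June, consistent with the series starting on 06/21/2018 and ending on 06/20/2022.

```python
from collections import defaultdict
from datetime import date

def impute_monthly_mean(records):
    """Replace null (zero) or missing (None) values in a list of (date, value)
    pairs by the mean of the valid values of the same month and year."""
    sums = defaultdict(lambda: [0.0, 0])
    for d, v in records:
        if v:  # skips None and 0
            sums[(d.year, d.month)][0] += v
            sums[(d.year, d.month)][1] += 1
    filled = []
    for d, v in records:
        if not v:
            total, n = sums[(d.year, d.month)]
            v = total / n if n else None  # no valid value in that month
        filled.append((d, v))
    return filled

def season(d):
    """Southern Hemisphere season of a date (assumed fixed boundaries)."""
    md = (d.month, d.day)
    if (6, 21) <= md <= (9, 22):
        return "Winter"
    if (9, 23) <= md <= (12, 20):
        return "Spring"
    if (3, 21) <= md <= (6, 20):
        return "Autumn"
    return "Summer"

def split_by_season(records):
    """Group the values of (date, value) pairs into the four seasonal subsets."""
    subsets = defaultdict(list)
    for d, v in records:
        subsets[season(d)].append(v)
    return dict(subsets)
```

Each seasonal subset produced this way would then be discretized and modeled independently, as the processing phase describes.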
Figure 3 - Average daily generation - Nova Olinda Complex.
Source: Based on data from ONS (2022).
Table 1 shows that winter has the highest daily average of energy generation in the Nova Olinda Complex in Piauí, with 1,572.87 MWh/day. Meanwhile, summer has a daily average 32% lower than that of winter, with 1,066.39 MWh/day, probably due to a higher number of cloudy days in this period of the year, which reduces the average daily solar radiation in the region of the plant. Furthermore, winter has the lowest standard deviation, while spring, the season with the second-highest average energy generation, has the highest standard deviation and, therefore, a greater dispersion of data.
Table 1: Measures of daily energy generation - Nova Olinda Complex.
4.1.2 Processing
4.1.2.1 Discretization of the series via k-means
The discretization of the photovoltaic time series was performed individually for each climatic season, so that the number of clusters and the centroid values were suited specifically to each of the subsets.
The first step in the execution was to create a function that would calculate the k-means for values of k from 1 to 20. The maximum number of 20 clusters was chosen because it was verified to be sufficient to represent the data. The second step was to create a function that returned the WSS for each of the 20 values of k. The third step was to apply the previously created functions to each of the subsets. The fourth step was the application of the elbow method. With the results of the WSS calculation, a list was created containing the ratio between the WSS for k+1 clusters and the WSS for k clusters, for k=1 to k=19. Then, for each of the subsets, the value of k at which this ratio was greater than 0.90 was identified, i.e., the number of clusters beyond which including a new cluster would reduce the sum of the intra-cluster squared error by less than 10%, which would not justify the addition.
Thus, the k-means result for each of the subsets provided the ideal number of clusters (Table 2) and the centroid values (Table 3).
Table 2: Ideal number of clusters - Nova Olinda Complex.
Table 3: Centroids of the states - Nova Olinda Complex.
4.1.2.2 Creating State Transition Matrices
In this step, the transition matrices of the Nova Olinda Complex (Figure 4) were constructed from the transition frequencies between the states for each subset.
Figure 4 - Transition matrices - Nova Olinda Complex.
4.1.2.3 Obtaining the results
All the transition matrices created were classified as irreducible and ergodic, important properties for the Markov Chain to have a stationary distribution. Then, stationary distributions (Table 4), recurrence times (Table 5), and first passage times (Figure 5) were calculated.
Table 4: Stationary distribution - Nova Olinda Complex.
Table 5: Recurrence time - Nova Olinda Complex.
Figure 5 - First passage time - Nova Olinda Complex.
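For an irreducible transition matrix, the three measures of Section 3.2.3 can be obtained by solving small linear systems. The sketch below is one possible way to do so in Python; the text does not state how the values were computed in R, so the linear-system approach and all function names are assumptions made for illustration. The matrix `P` is a list of rows, each summing to 1.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1."""
    n = len(P)
    # Row i encodes sum_j pi_j * P[j][i] - pi_i = 0.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n  # replace one redundant equation by the normalization
    return solve(A, [0.0] * (n - 1) + [1.0])

def recurrence_times(P):
    """Mean recurrence time of each state, m_ii = 1 / pi_i."""
    return [1.0 / p for p in stationary_distribution(P)]

def first_passage_times(P, target):
    """Mean first passage times m_i,target for every state i != target,
    from m_ij = 1 + sum over k != j of p_ik * m_kj."""
    others = [i for i in range(len(P)) if i != target]
    A = [[(1.0 if i == k else 0.0) - P[i][k] for k in others] for i in others]
    return dict(zip(others, solve(A, [1.0] * len(others))))
```

As a check on a two-state chain with rows (0.5, 0.5) and (0.25, 0.75), the stationary distribution is (1/3, 2/3), the recurrence times are 3 and 1.5 periods, and the first passage times are 2 periods from state 0 to state 1 and 4 periods in the opposite direction.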
4.1.3 Post-processing
4.1.3.1 Evaluation and interpretation of results
After obtaining the model’s results, it becomes possible to analyze and interpret the generated values and better understand the behavior of the daily photovoltaic power generation time series, mainly from the stationary distributions, recurrence times, and first passage times.
From the stationary distribution, in Table 4, it is observed that the system presents higher probabilities for the states of intermediate generation values in the summer and spring seasons. Also, the probabilities decay gradually and similarly toward the lower and upper extreme states. The same behavior can be seen in the recurrence times of the states in Table 5. In both seasons, the recurrence times of the extreme states are significantly higher than those of the central states and very close to each other. Analyzing the extreme states, in the summer, states 1 (289 MWh) and 11 (1,819 MWh) have recurrence times of 25 and 29 days, respectively, while states 1 (309 MWh) and 8 (1,969 MWh) in spring have recurrence times of 18 and 19 days, respectively. Consequently, the system tends to remain in medium-generation states, and the extremes are rarer, with lower expectations of very low or very high generation in a day, especially in the summer, whose recurrence times for the extreme states are even longer.
Autumn, on the other hand, has a higher probability of being in the central and upper states, with comparatively lower probabilities in the states of lower energy generation. In this season, the two states with the highest power generation, states 11 (1,582 MWh) and 12 (1,709 MWh), have recurrence times of 9 and 13 days, respectively. Meanwhile, the recurrence times of the two lowest-generation states, states 1 (307 MWh) and 2 (624 MWh), are 37 and 22 days, respectively. In addition, the recurrence time of state 1 in autumn is the longest among all states of all seasons, i.e., autumn presents the longest average period for the system to return to its lowest level of power generation. Hence, the system tends towards higher states, with a lower risk of low generation.
Furthermore, the first passage times of summer and spring show that the time to leave the state of lowest energy generation and reach the state of highest generation for the first time is longer than the reverse. For example, the first passage time from state 1 (309 MWh) to state 8 (1,969 MWh) in the spring is 51 days, while the time from state 8 to state 1 is 30 days, approximately 40% shorter. Therefore, although the probabilities of the system being at each extreme are close, once the system is in a low-generation state, it will take longer to reach the higher-generation states in both seasons.
Meanwhile, in winter, the first passage times between the two most extreme states, lower and upper, are close: 32 days from state 1 (875 MWh) to state 8 (1,945 MWh) and 30 days from state 8 to state 1. Their recurrence times, however, are quite different (20 days for state 1 and 11 days for state 8). It is interesting to note that the first passage times between states with more distant generation levels may be shorter than between states with closer generation levels. For example, the first passage time from state 2 (1,209 MWh) to state 3 (1,407 MWh) is 11 days, while the time from state 2 to state 6 (1,720 MWh) is 8 days.
4.2. Guaimbê Complex (São Paulo)
4.2.1 Pre-processing
4.2.1.1 Collection, analysis and treatment of data
By testing the stationarity of the time series of the Guaimbê Complex in the analyzed period, it was concluded that the time series is stationary and, therefore, that the installed capacity is constant. The stationarity of the time series in the period can be seen in Figure 6, which shows the average daily generation per month. In addition, the series has considerably lower volatility than that of the Nova Olinda Complex, with small variations in energy generation over the months.
Table 6 shows that spring has the highest daily average of energy generation in the Guaimbê Complex, with 752.92 MWh/day. Meanwhile, summer has a daily average 5% lower than that of spring, with 717.33 MWh/day, the lowest average for the plant. Consequently, the low variability of energy generation between the seasons of the year is evident, with all values considerably close to the general average. The standard deviations of the seasons also assume close values.
Table 6: Measures of daily energy generation - Guaimbê Complex.
4.2.2 Processing
4.2.2.1 Discretization of the series via k-means
The discretization of the photovoltaic energy time series for the Guaimbê Complex was performed with the same method as for the Nova Olinda Complex, but entirely independently, using the k-means technique and the elbow method. The ideal number of clusters and the centroid values are shown in Tables 7 and 8, respectively.
Table 7: Ideal number of clusters - Guaimbê Complex.
Table 8: Centroids of the states - Guaimbê Complex.
4.2.2.2 Creating State Transition Matrices
Thus, the state transition matrices of the Guaimbê Complex were created, represented in Figure 7.
Figure 7 - Transition matrices - Guaimbê Complex.
4.2.2.3 Obtaining the results
Then, stationary distributions (Table 9), recurrence times (Table 10), and first passage times (Figure 8) were calculated.
Table 9: Stationary distribution - Guaimbê Complex.
Table 10: Recurrence time - Guaimbê Complex.
Figure 8 - First passage time - Guaimbê Complex.
4.2.3 Post-processing
4.2.3.1 Evaluation and interpretation of results
With the measures of interest obtained for the Guaimbê Complex, the next step is to analyze the model’s characteristics for each of the seasons.
In summer, the stationary probability of the system being in the state of lowest power generation is the lowest (2.27%), resulting in a recurrence time of 44 days; that is, the occurrence of this low-generation state is rare, and, once the system is in this state, many transitions are expected before it returns to it. Furthermore, state 2 (386 MWh) has the second-lowest stationary probability at 7.47%, followed by state 8 (1,018 MWh) at 10.11%. The states with the highest probabilities are those of higher power generation, and the state with the highest stationary probability is state 7 (909 MWh), with 19.86%.
In autumn, the stationary probabilities are highest in the upper central states of power generation, gradually decreasing toward the lower and upper extreme states. The two states with the longest recurrence times are the extremes, states 1 (244 MWh) and 9 (992 MWh), with 15 and 23 days, respectively. Winter in the Guaimbê Complex, on the other hand, has higher stationary probabilities for the upper states and very low probabilities for the four states with the lowest energy generation. An interesting case is that the recurrence time of state 2 (324 MWh) is 38 days, approximately 40% longer than the 27 days of state 1 (182 MWh). In this way, the risk of the system being in low-generation states is lower, and higher energy generation is expected, comparatively.
Spring has more balanced stationary probabilities among its eight states, with the exception of state 1 (269 MWh), the state of lowest power generation, which has a probability of only 5%.
Analyzing the first passage times, it can be seen that, in the summer of the Guaimbê Complex, the time to go from the state of highest generation to the state of lowest generation is more than double the reverse. The first passage time from state 8 (1,018 MWh) to state 1 (234 MWh) is 47 days, while from state 1 to state 8 it is 20 days. This characteristic is favorable to generation because the average time to reach low generation from high generation is long.
However, autumn and winter have first passage times with the reverse logic: it takes longer to move from a state of lower generation to one of higher generation. This analysis is important because, in low-generation situations, the expected time to return to high generation is longer. In the autumn, the first passage time from state 1 (244 MWh) to state 9 (992 MWh) is 38 days, and the reverse is 23 days. Meanwhile, in winter, the time from state 1 (182 MWh) to state 11 (983 MWh) is 62 days, and the reverse is 30 days.
4.3 Comparison of results
Analyzing the time series of the Nova Olinda and Guaimbê complexes, the differences in the variability of the average photovoltaic energy generation throughout the year are evident: Nova Olinda presents seasonality, with higher average generation in winter and lower in summer, which is the wet period, while the averages of Guaimbê are closer in all climatic seasons. In the case of Nova Olinda, the reason for subdividing the series by climatic season to perform the modeling is more evident. However, although the Guaimbê Complex presents more homogeneous monthly averages, its stationary distributions, recurrence times, and first passage times were significantly different in each season, as previously analyzed. Thus, the subdivision by climatic season proved to be relevant for both plants.
Another interesting fact is that the climatic seasons affect each region differently, with similarities between different seasons in the two regions. For example, the concentration of stationary probabilities in the upper central states occurs in summer and spring in the Nova Olinda Complex, but it also occurs in autumn in the Guaimbê Complex. On the other hand, the autumn of Nova Olinda is similar to the winter of Guaimbê because the states of lower generation have significantly lower stationary probabilities than the others, with higher probabilities in the higher states. Meanwhile, Nova Olinda’s winter and Guaimbê’s spring are the seasons with the most balanced stationary probabilities between the states.
In addition, analyzing the first passage times, other similarities were found. The cases in which the first passage time from the state with the highest generation to the state with the lowest was longer than the reverse were autumn in Nova Olinda and summer in Guaimbê. The opposite happened in summer and spring in Nova Olinda and in autumn and winter in Guaimbê. On the other hand, the winter of Nova Olinda and the spring of Guaimbê had the closest first passage times when comparing the most extreme states.
Finally, BES can use this analysis to assist in
the country’s energy planning by calculating the
probability of possible scenarios of low or high
photovoltaic generation by region and climatic
season. The detailed study of the characteristics
of renewable sources brings greater security to
meeting energy demand in the country.
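One way to turn a seasonal transition matrix into scenario probabilities of this kind is to propagate a known starting state forward a few days. The sketch below assumes a hypothetical 3-state matrix `P` and a hypothetical set `LOW_STATES` marking which states count as low generation; neither comes from the study's data.

```python
def distribution_after(p, start, steps):
    """Distribution over states after `steps` transitions, starting in state `start`."""
    n = len(p)
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(steps):
        # One step of the chain: new_dist[j] = sum_i dist[i] * p[i][j].
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical matrix: state 0 = low, 1 = medium, 2 = high generation.
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
LOW_STATES = {0}  # assumption: only state 0 counts as "low generation"

for horizon in (1, 3, 7):
    dist = distribution_after(P, start=2, steps=horizon)
    p_low = sum(dist[s] for s in LOW_STATES)
    print(f"P(low generation in {horizon} day(s) | high today) = {p_low:.3f}")
```

As the horizon grows, these probabilities approach the stationary probabilities of the season, so the starting state matters mostly for short-term planning.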
5. CONCLUSION
Brazil has been going through a process of
changing its energy matrix and increasing the use
of non-dispatchable renewable energies. In this
context, photovoltaic solar energy has stood out
due to the significant growth of its share in the
country. Hence, its characteristics of intermittency
and random fluctuations have a greater impact on
the national energy supply scenario. Therefore,
the study of photovoltaic generation through
modeling methods is relevant, and the present
work found an opportunity to contribute to the
literature.
This work studies the generation characteristics
of two photovoltaic solar power plants located
in regions with solar incidences of different
magnitudes and seasonalities. The methodology
used was based on Markov Chains. The time series
were subdivided among the climatic seasons
of the year. Then, the state transition matrices
were created, and the results of the measures of
interest, such as stationary distribution, recurrence
time, and first passage time, were investigated.
Consequently, it was possible to analyze the
differences between the photovoltaic energy
generation in the different seasons and regions. In
this way, the objective of the work was achieved.
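The step of creating a state transition matrix can be illustrated with a short sketch. It assumes the daily generation values of one season have already been discretized into integer state labels (the discretization itself is not shown), and it uses a hypothetical label sequence `daily_states`; the row-normalized transition counts are the usual maximum-likelihood estimate, which may differ in detail from the procedure used in the study.

```python
def estimate_transition_matrix(states, n_states):
    """Estimate P[i][j] as the share of visits to state i that are followed by state j."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption of this sketch: a state never observed as a
            # starting point gets a uniform row instead of a division by zero.
            matrix.append([1.0 / n_states] * n_states)
    return matrix


# Hypothetical daily state labels for one season (0 = low, 1 = medium, 2 = high).
daily_states = [0, 0, 1, 2, 2, 1, 0, 1, 2, 2, 2, 1]
P_hat = estimate_transition_matrix(daily_states, n_states=3)

for i, row in enumerate(P_hat):
    print(f"from state {i}: " + ", ".join(f"{prob:.2f}" for prob in row))
```

When a season spans several years, the sequences of each year would need to be counted separately so that the last day of one year's season is not treated as preceding the first day of the next.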
Conrming the initial hypothesis, the results
showed signicant dierences in solar energy
generation between the regions and between the
climatic seasons, which evidenced the relevance
of the comparative study carried out. By analyzing
and better understanding the specicities of each
location and season, power plants and the Brazilian
Electric System can plan more eciently about
energy generation, analyzing the probabilities of
the occurrence of states of dierent generation
values.
6. REFERENCES
Chung, K. L. (1960). Markov Chains with Stationary Transition Probabilities. Springer-Verlag.
Dickey, D. & Fuller, W. (1979). Distribution of the Estimators for Autoregressive Time Series With a Unit Root.
Journal of the American Statistical Association, v. 74, p. 427-431.
EPE (2022). Fontes de Energia. https://www.epe.gov.br/pt/abcdenergia/fontes-de-energia
EPE (2023). Balanço Energético Nacional 2023. Ministério de Minas e Energia.
Fritz, M., Behringer, M. & Schwarz, H. (2020). LOG-Means: eciently estimating the number of clusters in large
datasets. Proceedings of the VLDB Endowment, v. 13, n. 12.
G1 (2019). Complexos de energia solar são inaugurados em duas cidades do interior de SP. https://g1.globo.
com/sp/bauru-marilia/noticia/2019/08/15/complexos-de-energia-solar-serao-inaugurados-em-duas-cidades-
do-interior-de-sp.ghtml
Gadelha, M. L. (2020). Climas do Piauí: interações com o ambiente. Edufpi.
IEA (2023). Renewable Energy Market Update - June 2023. https://www.iea.org/reports/renewable-energy-
market-update-june-2023
Imho, J. (2007). Desenvolvimento de Conversores Estáticos para Sistemas Fotovoltaicos Autônomos. Master’s
thesis presented to the School of Electrical Engineering of Universidade Federal de Santa Maria.
Ma, M., Ye, L., Li, J., Li, P., Song, R. & Zhuang, H. (2020). Photovoltaic Time Series Aggregation Method Based
on K-means and MCMC Algorithm. Asia-Pacic Power and Energy Engineering Conference, v. 2020-September,
n. 9220338.
Maçaira, P., Cyrillo, Y. M., Cyrino, F. & Souza, R. C. (2019). Including wind power generation in Brazil’s long-term
optimization model for energy planning. Energies, v. 12, n. 826.
Macqueen, J. (1967). Some methods for classication and analysis of multivariate observations. In Proceedings of
the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, p. 14.
Malar, J. P. (2022). Conheça os tipos de energia renovável e quais são usados no Brasil. CNN Brasil. https://www.
cnnbrasil.com.br/business/conheca-os-tipos-de-energia-renovavel-e-quais-sao-usados-no-brasil
Melo, G., Cyrino, F. & Maçaira, P. (2022). Simulação estocástica conjunta de energias renováveis. Master’s thesis
– Departamento de Engenharia Industrial, Pontifícia Universidade Católica do Rio de Janeiro.
Nascimento, A. & Araújo, T. (2017). Maior parque solar da América Latina é inaugurado no Piauí. https://g1.globo.
com/pi/piaui/noticia/maior-parque-solar-da-america-latina-e-inaugurado-no-piaui.ghtml
Norris, J. R. (1998). Markov chains. Cambridge university press.
ONS (2022). http://www.ons.org.br
Pereira, E. B., Martins, F. R., Gonçalves, A. R., Costa, R. S., Lima, F. L., Rüther, R., Abreu, S. L., Tiepolo, G. M.,
Pereira, S. V. & Souza, J. G. Atlas brasileiro de energia solar. 2.ed. São José dos Campos: INPE.
R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org
Sigauke, C. & Chikobvu, D. (2017). Estimation of extreme inter-day changes to peak electricity demand using
Markov chain analysis: A comparative analysis with extreme value theory. Journal of Energy in Southern Africa, v.
28, n. 4.
Thorndike, R. L. (1953). Psychometrika, v. 18, p. 266-267.