I Introduction
To account for the temporal-spatial correlation of multiple wind farms’ output (MWO) in probabilistic wind power forecasting, one can first construct the Gaussian mixture model (GMM)-based joint PDF of MWO at different time periods, and then directly build the conditional PDF of the output of each wind farm (WF) in the next period given the observations of MWO during the current periods [1].
Constructing the joint and conditional PDFs requires complete observations, each of which gathers all the corresponding MWO data at different time periods. Since every WF can only observe its own outputs at different time periods, the complete observations are vertically partitioned among the WFs (vertical partitioning: the attributes are divided across sites, and the sites must be joined to obtain complete information on any entity [2]). However, to protect data privacy, WFs owned by different stakeholders may refuse to share the raw data needed to compose the complete observations for constructing the PDF. A privacy-preserving distributed method is a feasible way to resolve this privacy issue.
For constructing the GMM-based PDF, the expectation-maximization (EM) algorithm is commonly used [3]. Nevertheless, existing research on privacy-preserving distributed EM algorithms mainly focuses on horizontally partitioned data (horizontal partitioning: each entity is represented entirely at a single site [2]). To the best of our knowledge, little literature has addressed building a GMM from vertically partitioned data. Therefore, based on secure multiparty computation (SMC) methods [4], this letter proposes a privacy-preserving method to build the GMM-based joint and conditional PDF.
II Notations
We first define domain for WFs, for periods (normally ) and for observations. Let denote the random variable of the output for the mth WF at the tth period, where and . Then we aim to construct the joint PDF of . The observations of are represented by (). To obtain a complete , the corresponding observations of all WFs must be gathered together.
We utilize GMM to build the joint PDF. GMM is a parametric model represented by a convex combination of multivariate Gaussian distribution functions. We define domain , then the parameter set of GMM is defined as . The GMM-based joint PDF of is given as follows:
(1)
where is the weight coefficient, and is the jth multivariate Gaussian distribution function with mean vector and covariance matrix . The precision matrix is defined as . The elements of are represented by (). The diagonal elements of or are represented by or (), and the non-diagonal elements by or ().
III Construction of the Joint PDF
To obtain the joint PDF in (1), the key lies in estimating the parameter set of the GMM. We utilize the EM algorithm for this estimation. The algorithm consists of an E-step and an M-step [3]. For the kth iteration of the jth Gaussian component, the E-step is given in (2) and the M-step in (3).
(2)
(3a)  
(3b)  
(3c) 
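As a point of reference, the textbook EM updates for a GMM are sketched below in generic notation. The symbols are assumptions made for illustration and may differ from the letter's own: $\boldsymbol{x}_i$ is the $i$th complete observation ($i = 1,\dots,I$), $\omega_j$, $\boldsymbol{\mu}_j$ and $\boldsymbol{\Sigma}_j$ are the weight, mean vector and covariance matrix of component $j$ ($j = 1,\dots,J$), $\gamma_{j,i}$ is the responsibility computed in the E-step, and the superscript $(k)$ marks the iteration.

```latex
\begin{align*}
\text{E-step:}\quad
\gamma_{j,i}^{(k)} &=
  \frac{\omega_j^{(k-1)}\,
        \mathcal{N}\!\left(\boldsymbol{x}_i;\boldsymbol{\mu}_j^{(k-1)},\boldsymbol{\Sigma}_j^{(k-1)}\right)}
       {\sum_{l=1}^{J}\omega_l^{(k-1)}\,
        \mathcal{N}\!\left(\boldsymbol{x}_i;\boldsymbol{\mu}_l^{(k-1)},\boldsymbol{\Sigma}_l^{(k-1)}\right)} \\[4pt]
\text{M-step:}\quad
\omega_j^{(k)} &= \frac{1}{I}\sum_{i=1}^{I}\gamma_{j,i}^{(k)} \\
\boldsymbol{\mu}_j^{(k)} &=
  \frac{\sum_{i=1}^{I}\gamma_{j,i}^{(k)}\,\boldsymbol{x}_i}
       {\sum_{i=1}^{I}\gamma_{j,i}^{(k)}} \\
\boldsymbol{\Sigma}_j^{(k)} &=
  \frac{\sum_{i=1}^{I}\gamma_{j,i}^{(k)}
        \left(\boldsymbol{x}_i-\boldsymbol{\mu}_j^{(k)}\right)
        \left(\boldsymbol{x}_i-\boldsymbol{\mu}_j^{(k)}\right)^{\top}}
       {\sum_{i=1}^{I}\gamma_{j,i}^{(k)}}
\end{align*}
```

In this form, the weight update needs only the responsibilities, whereas the mean and covariance updates involve the observations themselves.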
Both steps require the complete observations for calculation. To protect data privacy, we propose a privacy-preserving distributed EM (PDEM) algorithm. Privacy preservation is defined as follows: the data communicated between WFs must not divulge the raw data.
III-A Private E-step
In the E-step, we assume that all WFs have acquired the parameter set updated in the (k−1)th iteration. The aim of the private E-step is to ensure that every WF can calculate (2) without revealing raw data. The essence of (2) lies in the calculation of the Gaussian component:
(4) 
where the raw data are required only in the exponential term. We further reorganize this term into (5):
(5a)  
(5b)  
(5c)  
(5d)  
(5e) 
where (5d) and (5e) can be calculated by each WF on its own. To calculate (5c), each WF has to gather the results of (5d) computed by the other WFs. Since the results of (5d) do not reveal the raw data, these values can be shared. Thereafter, (5b) can be obtained by each WF. For (5a), each WF also has to gather the results of (5b) from all WFs. Similarly, the result of (5b) does not reveal any raw data, so it can also be shared among the WFs to calculate (5a). The Gaussian component in (4) is then obtainable by every WF. Finally, each WF can accurately complete the E-step in (2) using the value of the Gaussian component in (4), without revealing any raw data.
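The gathering pattern described here can be illustrated with the quadratic form in a Gaussian exponent. The following is a minimal single-process sketch under stated assumptions, not the letter's exact equations: the precision matrix is taken to be known to every WF, `owner` assigns each variable to a WF, and the function names are hypothetical. It only checks that summing per-WF terms reproduces the centralized value; whether the exchanged partial results are safe to share rests on the argument in the text.

```python
def messages_from(precision, dev_b):
    """Partial results one WF can compute from its own deviations only.

    dev_b maps each variable index owned by the WF to its deviation
    (observation minus mean). The result maps every row index r to
    sum over the WF's own columns c of precision[r][c] * dev_b[c].
    """
    n = len(precision)
    return {r: sum(precision[r][c] * d for c, d in dev_b.items())
            for r in range(n)}


def local_term(dev_a, inbox):
    """Term of one WF: its own deviations times the gathered partial results."""
    return sum(d * sum(msg[r] for msg in inbox) for r, d in dev_a.items())


def distributed_quadratic_form(precision, owner, dev):
    """Simulate all WFs in one process; returns dev^T * precision * dev."""
    wfs = sorted(set(owner))
    # Each WF keeps only the deviations of the variables it owns.
    local = {w: {i: dev[i] for i in range(len(dev)) if owner[i] == w}
             for w in wfs}
    # Every WF publishes its partial results; every WF gathers all of them.
    msgs = [messages_from(precision, local[w]) for w in wfs]
    # The per-WF scalar terms are shared and summed.
    return sum(local_term(local[w], msgs) for w in wfs)


def centralized_quadratic_form(precision, dev):
    """Reference value computed with all data in one place."""
    n = len(dev)
    return sum(dev[r] * precision[r][c] * dev[c]
               for r in range(n) for c in range(n))


# Hypothetical toy data: variables 0 and 1 belong to WF 0, variable 2 to WF 1.
precision = [[2.0, -0.5, 0.1], [-0.5, 1.5, 0.3], [0.1, 0.3, 1.0]]
owner = [0, 0, 1]
dev = [0.4, -1.2, 0.7]
```

Calling `distributed_quadratic_form(precision, owner, dev)` and `centralized_quadratic_form(precision, dev)` on the toy data gives the same value up to floating-point rounding.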
III-B Private M-step
After the private E-step, each WF possesses the value of (). Therefore, every WF can compute (3a) directly. However, is required in (3b) and (3c). To avoid revealing raw data, we further reorganize these equations by rearranging the elements of and into (6) and (7), where , and ,.
Equations (6) and (7a) for all time periods can be evaluated by each WF, and no WF needs to reveal raw data. Thus, the values obtained from (6) and (7a) can be shared among the WFs to compose a complete mean vector and all diagonal elements of the covariance matrix.
For (7b), the raw data of the th WF at the th time period and the th WF at the th time period are needed to calculate a scalar product in (8).
(6) 
(7a)  
(7b) 
(8)  
Since all WFs possess the value of (), knowing both and means knowing all the raw data. To protect data privacy, we utilize the secure scalar product (SSP) technique, which can securely compute the scalar product of two vectors, to calculate (8). The calculation process of the SSP technique is summarized as follows [4]:

Both the th and th WF choose the same random matrix .

The th WF generates a random vector , and sends to the th WF.

The th WF calculates the scalar product , and also calculates . Then the th WF sends and to the th WF.

The th WF finally calculates the scalar product through , and then sends it to the th WF.
Through the SSP technique, both the th and th WF can acquire the scalar product without revealing any raw data. Then (7b) can be computed by the th and th WF (,). Eventually, by sharing (6) and (7a) and utilizing the SSP technique, every WF can accurately calculate the M-step while protecting data privacy.
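The four SSP steps can be sketched as follows, in the spirit of the scalar product protocol described in [4]. This is a minimal single-process illustration under assumptions: the function name, the shape of the shared random matrix (n rows, about n/2 columns), and the range of the random numbers are all hypothetical choices, and both parties are simulated in one function rather than over a network.

```python
import random


def secure_scalar_product(x, y, rng):
    """Return the scalar product of x (held by party A) and y (held by party B).

    Only masked quantities cross the party boundary in this sketch:
    A sends x + C*R, and B sends back a scalar and the projection C^T*y.
    """
    n = len(x)
    r = max(1, n // 2)

    # Step 1: both parties agree on the same random matrix C (n x r).
    C = [[rng.uniform(-1.0, 1.0) for _ in range(r)] for _ in range(n)]

    # Step 2: A draws a random vector R and masks its data with C*R.
    R = [rng.uniform(-1.0, 1.0) for _ in range(r)]
    x_masked = [x[i] + sum(C[i][k] * R[k] for k in range(r)) for i in range(n)]

    # Step 3: B computes the masked scalar product and the projection C^T*y.
    s_masked = sum(xm * yi for xm, yi in zip(x_masked, y))
    y_proj = [sum(C[i][k] * y[i] for i in range(n)) for k in range(r)]

    # Step 4: A removes the mask, since (C*R).y equals R.(C^T*y),
    # and would then send the result to B.
    return s_masked - sum(yp * rk for yp, rk in zip(y_proj, R))
```

For example, `secure_scalar_product([1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.25], random.Random(0))` recovers the plain scalar product 5.5 up to floating-point rounding, regardless of the random draws.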
IV Construction of the Conditional PDF
Our aim is to construct the conditional PDF of for the given current outputs of all WFs. Let denote the index of the current time period; then the current outputs are represented by . If , the conditional PDF of can be viewed as the predictive PDF of the th WF’s output at the next period based on the current outputs of all WFs.
Once the joint PDF in (1) is built via the PDEM algorithm, the conditional PDF can be constructed:
(9) 
where the parameters of the conditional PDF can be specified via (10):
(10a)  
(10b)  
(10c) 
where , and are given as follows:
Each WF can compute (10c) directly from the parameter set of the joint PDF. However, calculating (10a) and (10b) requires , which consists of raw data. To avoid compromising data privacy, we further reorganize (10a) into (11) and (10b) into (12). Note that the calculation of (10a) is similar to that of (2), so its reorganization is similar as well. Due to limited space, we only detail the parts of (10a) whose computation raises privacy concerns, which are defined as () and ().
(11a)  
(11b)  
(11c) 
(12)  
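For orientation, conditioning a GMM on a subset of its variables has a standard closed form, sketched below in generic notation. The symbols are assumptions for illustration and may differ from the letter's own: $y$ is the scalar output to be predicted, $\boldsymbol{x}_c$ is the observed vector of current outputs, and each component's mean vector and covariance matrix are partitioned accordingly into $\mu_{j,y}$, $\boldsymbol{\mu}_{j,c}$, $\sigma_{j,yy}$, $\boldsymbol{\Sigma}_{j,yc}$, $\boldsymbol{\Sigma}_{j,cy}$ and $\boldsymbol{\Sigma}_{j,cc}$.

```latex
\begin{align*}
f(y \mid \boldsymbol{x}_c) &= \sum_{j=1}^{J} \omega'_j\,
  \mathcal{N}\!\left(y;\ \mu'_j,\ \sigma'^{2}_j\right) \\
\omega'_j &=
  \frac{\omega_j\,\mathcal{N}\!\left(\boldsymbol{x}_c;\boldsymbol{\mu}_{j,c},\boldsymbol{\Sigma}_{j,cc}\right)}
       {\sum_{l=1}^{J}\omega_l\,\mathcal{N}\!\left(\boldsymbol{x}_c;\boldsymbol{\mu}_{l,c},\boldsymbol{\Sigma}_{l,cc}\right)} \\
\mu'_j &= \mu_{j,y} + \boldsymbol{\Sigma}_{j,yc}\,\boldsymbol{\Sigma}_{j,cc}^{-1}
          \left(\boldsymbol{x}_c - \boldsymbol{\mu}_{j,c}\right) \\
\sigma'^{2}_j &= \sigma_{j,yy} - \boldsymbol{\Sigma}_{j,yc}\,\boldsymbol{\Sigma}_{j,cc}^{-1}\,\boldsymbol{\Sigma}_{j,cy}
\end{align*}
```

In this form, the conditional variance depends only on the GMM parameters, while the conditional weights and means depend on the observed outputs, which is where privacy-preserving computation is needed.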
Raw data are involved in the weighted sums in (11a) and (11c) and in the last term of (12). To avoid revealing raw data, we utilize the secure sum (SS) technique, which can securely compute a weighted sum without sacrificing data privacy. Taking (11a) as an example, the SS technique is summarized as follows [4]:

Assume that the sum of (11a) lies in the range [0, N). N can be set as the sum of the capacities of all the WFs.

The 1st WF generates a random number , which is uniformly chosen from [0, N). Then the 1st WF sends to the 2nd WF.

For the remaining WFs (), the th WF sends to the ()th WF.

When the 1st WF receives , it can finally compute . The value of is then shared among the WFs.
With the SS technique, the weighted sums in (11a), (11c) and (12) can be computed without revealing any raw data. The parameters of the conditional PDF in (10) are then obtainable, and so is the conditional PDF.
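The SS steps can be sketched as a ring pass with modular arithmetic, following the secure sum idea in [4]. This is a minimal single-process illustration under assumptions: the function name is hypothetical, all WFs are simulated in one loop, and the local terms are taken to be non-negative integers (for instance, real values scaled to fixed point) so that the modular arithmetic is exact.

```python
import random


def secure_sum(local_values, modulus, rng):
    """Sum the private terms of all WFs; the true sum must lie in [0, modulus).

    local_values[0] belongs to the 1st WF, which also owns the random mask.
    """
    # The 1st WF draws a uniform mask from [0, modulus) and starts the ring.
    mask = rng.randrange(modulus)
    running = (mask + local_values[0]) % modulus

    # Each remaining WF adds its own term to what it received and passes it
    # on. Because of the unknown mask, the received value looks uniform.
    for value in local_values[1:]:
        running = (running + value) % modulus

    # The 1st WF receives the final value and removes its mask.
    return (running - mask) % modulus
```

For example, `secure_sum([12, 30, 7], 100, random.Random(1))` returns 49 for any random mask, since the mask cancels in the last step.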
It is worth noting that the th WF does not participate in the calculation of the last term in (12). The value of this term is calculated by the remaining WFs and is only useful to the th WF. Through this design, we ensure that each WF can only obtain its own conditional PDF, without knowing the conditional PDFs of the others.
V Discussion
We define the centralized method as the calculation method that gathers the raw data of all the WFs for constructing the PDF. Since both the SSP and SS techniques calculate the scalar product and the weighted sum accurately and securely, without any approximation, the proposed method and the centralized method are mathematically equivalent. The PDFs constructed by the two methods are therefore exactly the same.
The cost of preserving privacy is an increase in communication traffic. Set , and ; then, over the entire calculation process of the two methods, the total upstream and downstream communication traffic of a WF is given in Table I. Since communication occurs in every iteration of the PDEM algorithm for every observation, the communication traffic of the proposed method increases significantly compared with the centralized method. However, the total communication traffic is still very small and can easily be accommodated under current bandwidth conditions.
Table I
                     Centralized Method   Proposed Method
Upstream Traffic     0.08 Mb              8.93 Mb
Downstream Traffic   Mb                   27.82 Mb
The entire process of the proposed method does not require the WFs to exchange raw data, so data privacy is protected. Meanwhile, the proposed method and the centralized method are mathematically equivalent. The communication traffic of the proposed method is higher, but the total traffic is still very small and can be accommodated.
References
[1] Z. Wang, C. Shen, Y. Xu, F. Liu, X. Wu, and C. C. Liu, “Risk-limiting load restoration for resilience enhancement with intermittent energy resources,” IEEE Transactions on Smart Grid, pp. 1–1, 2018.
[2] X. Lin, C. Clifton, and M. Zhu, “Privacy-preserving clustering with distributed EM mixture modeling,” Knowledge and Information Systems, vol. 8, no. 1, pp. 68–81, Jul. 2005. [Online]. Available: https://doi.org/10.1007/s1011500401487
[3] R. Singh, B. C. Pal, and R. A. Jabr, “Statistical representation of distribution system loads using Gaussian mixture model,” IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 29–37, Feb. 2010.
[4] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu, “Tools for privacy preserving distributed data mining,” SIGKDD Explor. Newsl., vol. 4, no. 2, pp. 28–34, Dec. 2002. [Online]. Available: http://doi.acm.org/10.1145/772862.772867