In this tutorial, we’ll show how to root a tree.

Typically, **trees are stored as hierarchical structures with one node serving as the root:**

The root acts as the entry point through which we can reach all other nodes by following parent-child edges.

However, trees can sometimes be stored as graphs, meaning they lack a designated root and are instead given as an adjacency list or matrix.

Our goal is then to **convert such a graph into a tree.**

**To make node *r* the root, we use depth-first search (DFS) to visit all the nodes starting from *r*:**

**The stack contains the candidate leaves of the tree**. These nodes are the ones whose children we know nothing about yet. In each iteration, we check the deepest candidate to determine if it has any children. If it does, we add them to the stack. Otherwise, we confirm the node as a leaf. In either case, we remove the node from the stack because it’s no longer a candidate.

Once we traverse all the vertices, we’ll have a tree spanning the entire input.
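As a sketch, the stack-based procedure above can be written in Python (the adjacency-list input format and the function and variable names here are our own choices, not a fixed API):

```python
def root_tree(adj, root):
    """Convert an undirected graph given as adjacency lists into a rooted tree.

    adj  - dict mapping each node to a list of its neighbors
    root - the node chosen as the root

    Returns a dict mapping each node (except the root) to its parent.
    The visited set also guards against cycles in malformed input.
    """
    parent = {}
    visited = {root}
    stack = [root]          # candidate leaves: nodes whose children are unknown
    while stack:
        node = stack.pop()  # check the deepest candidate
        for neighbor in adj[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                parent[neighbor] = node  # add the child to the tree
                stack.append(neighbor)
    return parent

# Rooting a small graph at node 5, as in the example below:
parent = root_tree({5: [2, 6, 8], 2: [5, 1], 6: [5], 8: [5], 1: [2]}, 5)
print(parent)  # {2: 5, 6: 5, 8: 5, 1: 2}
```

Marking nodes in `visited` the first time we add them to the tree is what keeps the traversal from looping when the input contains cycles.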

Let’s root the following graph at node 5:

At first, the stack contains only the root, node 5. Then, we get its children, nodes 2, 6, and 8, add them to the tree, and push them onto the stack. Afterward, we pop node 2, visit its child, add it to the tree, and push it onto the stack. Continuing this process, we get the following:

The algorithm stops when the stack becomes empty.

Let *b* be the maximal number of children a node can have and *n* the total number of nodes. **The time and space complexity depend on the way the nodes are stored initially.**

**If we use adjacency lists, we’ll traverse each edge twice and visit each node at most *b + 1* times.** That’s because the function iterates only over the neighbors of the given node. If *b* is constant with respect to *n*, the time complexity is *O(n)* since a tree with *n* nodes has *n − 1* edges. The space complexity will also be *O(n)*.

However, if we use the adjacency matrix, we’ll take *O(n²)* memory to store the nodes. The function will then traverse all the *n* bits in the corresponding row to find the neighbors of the given node. So, the time complexity will also be *O(n²)*.

Our algorithm assumes the root is already chosen. **It’s often a good idea to choose the center of the tree as its root,** as that minimizes the resulting tree’s height.

**DFS is effective only if the input nodes form a tree**. If there are cycles, DFS gets stuck in a loop. To avoid that, we can mark each node the first time we add it to the tree.

Another issue is that the nodes might be disconnected. In this case, the resulting tree will span only the nodes in the same component as the chosen root node.

In this article, we showed how to root a tree using depth-first traversal with a time complexity of *O(n)*. However, if the input nodes don’t form a tree, this approach won’t work without modifications.

The post How to Root a Tree? first appeared on Baeldung on Computer Science.

In this tutorial, we’ll explain what independent component analysis (ICA) is. ICA is a powerful statistical technique that we can use in signal processing and machine learning to separate signals. Besides explaining the concept, we’ll show a simple example of the problem that ICA solves.

The simplest way to understand the ICA technique and its applications is to explain one problem called the “cocktail party problem”. In its simplest form, let’s imagine that two people are having a conversation at a cocktail party. Let’s assume that there are two microphones near them. The microphones record both people as they talk, but at different volumes because of the distances involved. In addition, the microphones pick up all the noise of the crowded party. The question arises: how can we separate the two voices from the noisy recordings, and is it even possible?

One technique that can solve the cocktail party problem is ICA. **Independent component analysis (ICA) is a statistical method for separating a multivariate signal into additive subcomponents. It converts a set of vectors into a maximally independent set.**

Following the image above, we can define each measured signal as a linear combination of the sources:

(1)   *xᵢ = aᵢ₁s₁ + aᵢ₂s₂ + … + aᵢₙsₙ*

where *sⱼ* are the independent components, or sources, and *aᵢⱼ* are some weights. Similarly, we can express the sources as a linear combination of the signals *xᵢ*:

(2)   *sᵢ = wᵢ₁x₁ + wᵢ₂x₂ + … + wᵢₙxₙ*

where *wᵢⱼ* are weights.

Using matrix notation, the source signals would be equal to *S = WX*, where *W* is a weight matrix and *X* are the measured signals. **The values of *X* are something we already have, and the goal is to find a matrix *W* such that the source signals are maximally independent.** Maximal independence means that we need to:

- Minimize mutual information between independent components or
- Maximize non-Gaussianity between independent components

To successfully apply ICA, we need to make three assumptions:

- Each measured signal is a linear combination of the sources
- The source signals are statistically independent of each other
- The values in each source signal have non-Gaussian distribution

Two signals *y₁* and *y₂* are statistically independent of each other if their joint distribution *p(y₁, y₂)* is equal to the product of their individual probability distributions *p(y₁)* and *p(y₂)*:

(3)   *p(y₁, y₂) = p(y₁)p(y₂)*

From the central limit theorem, a linear combination between two random variables will be more Gaussian than either individual variable. If our source signals are Gaussian, their linear combination will be even more Gaussian. The Gaussian distribution is rotationally symmetric, and we wouldn’t have enough information to recover the directions corresponding to original sources. Hence, we need the assumption that the source signal has non-Gaussian distribution:

To estimate one of the source signals, we’ll consider a linear combination of the signals *xᵢ*. Let’s denote that estimation with *y*:

(4)   *y = wᵀx*

where *w* is a weight vector. Next, if we define *z = Aᵀw*, we have that:

(5)   *y = wᵀx = wᵀAs = zᵀs*

From the central limit theorem, *y = zᵀs* is more Gaussian than any of the *sᵢ*, and it’s least Gaussian when it equals one of the *sᵢ*. **It means that maximizing the non-Gaussianity of *wᵀx* will give us one of the independent components.**

One measurement of non-Gaussianity can be kurtosis. Kurtosis measures a distribution’s “peakedness” or “flatness” relative to a Gaussian distribution. When kurtosis is equal to zero, the distribution is Gaussian. For positive kurtosis, the distribution is “spiky” and for negative, the distribution is “flat”.

To maximize the non-Gaussianity of *wᵀx*, we can maximize the absolute value of its kurtosis:

(6)   *kurt(y) = E[y⁴] − 3(E[y²])²*
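As a quick illustration, we can estimate the excess kurtosis of samples drawn from different distributions (a minimal sketch using NumPy; the function name is our own):

```python
import numpy as np

def excess_kurtosis(y):
    """Sample excess kurtosis: E[y^4] - 3 for a zero-mean, unit-variance y."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

rng = np.random.default_rng(0)
n = 100_000
print(excess_kurtosis(rng.normal(size=n)))   # ~0  (Gaussian)
print(excess_kurtosis(rng.laplace(size=n)))  # > 0 ("spiky", super-Gaussian)
print(excess_kurtosis(rng.uniform(size=n)))  # < 0 ("flat", sub-Gaussian)
```

For unit-variance data this matches formula (6), since *3(E[y²])² = 3*.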

To do that, we can use the FastICA algorithm. **FastICA is an iterative algorithm that uses a non-linear optimization technique to find the independent components.** Before applying this algorithm, we need to center and whiten the input data. This ensures that the mixed signals have zero mean and that their covariance matrix is close to the identity matrix.
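To see the whole pipeline in miniature, the sketch below mixes two uniform (non-Gaussian) sources, centers and whitens the mixtures, and then runs a kurtosis-based fixed-point iteration for a single weight vector. This is a bare-bones illustration of the idea behind FastICA, not the production algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Two independent, non-Gaussian (uniform) sources and a mixing matrix.
S = rng.uniform(-1, 1, size=(2, n))
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = A @ S                      # measured (mixed) signals

# Centering and whitening: zero mean, identity covariance.
X = X - X.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(X))
Z = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ X

# Kurtosis-based fixed-point iteration for one weight vector w.
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(200):
    y = w @ Z
    w = (Z * y ** 3).mean(axis=1) - 3 * w  # fixed-point kurtosis update
    w /= np.linalg.norm(w)

y = w @ Z  # estimated source, recovered up to sign and scale
```

The recovered `y` should be strongly correlated with one of the original sources; which one, and with which sign, is inherently ambiguous in ICA.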

There are several other algorithms for solving ICA:

- Infomax – maximizes the mutual information between the mixed signals and the estimated independent components
- The joint approximated diagonalization of eigenmatrices (JADE) – separates mixed signals into source signals by exploiting fourth-order moments
- Particle swarm optimization (PSO) – heuristic optimization algorithm that searches the mixing matrix that separates the mixed signals into independent components

ICA has a wide range of applications in various fields, including:

- Signal processing for speech, audio, or image separation. We can use it to separate signals from different sources that are mixed together
- Neuroscience – to separate neural signals into independent components that correspond to different sources of activity in the brain
- Finance – with ICA, it’s possible to identify hidden features in financial time series that might be useful for forecasting
- Data mining – it’s possible to find patterns and correlations in large datasets

In this article, we described the ICA technique by providing a simple example of the problem it solves. We also presented a mathematical definition and explained important terms that we used. ICA is a powerful technique that might be very useful in signal analysis and can uncover hidden patterns.

The post What is Independent Component Analysis (ICA)? first appeared on Baeldung on Computer Science.

**There are several possible implementations of the stack data structure, based on fixed-size arrays, dynamic arrays, and linked lists**.

In this tutorial, we’ll implement the stack using the fixed-size array representation.

In this representation, the stack effectively consists of the following data:

- *arr* – the fixed-size array for keeping the stack elements
- *top* – an integer counter holding the index of the stack’s top element
- *size* – an integer value that defines the size of the array

In our discussion, we assume the array to be 1-indexed. So, the first inserted element of the stack has index 1. Similarly, the last inserted element has an index equal to *top*.

Let’s go over the process of stack modification so we understand how the implementation operates. Assume we have an empty stack of integers with a fixed *size*:

Let’s add integer 11 to the stack:

Next, we add two more integers, 7 and 25, to the stack. **Note how *top* changes according to the elements added**:

Finally, we remove the top element of the stack and get the following stack. **Note that we do nothing with the element previously pointed to by *top* (25 in this case)**. We only update the counter, and the previous top element will simply get overwritten when new elements are added to the stack:

We ignore adding new elements to the stack when it’s full. Similarly, we do nothing when there’s a request to remove an element and the stack is empty. Practically, it’s worth adding error-handling logic in these situations.

Now, we’ll show how to implement the stack operations using the fixed-size array representation. The pseudocodes are short and easy to understand. **Additionally, it’s straightforward to translate the pseudocodes to a programming language implementation**.

Below we can find the pseudocode of the *push* operation. The operation first checks if there’s a space for the new element. Then, it updates the *top* counter and adds the new element on top of the stack:

The *pop* operation’s logic is the opposite of *push*. First, it checks whether there are elements in the stack. Then, it updates the *top* counter to point to the new top element of the stack:

*Top* simply returns the top element of the stack:

To check if the stack contains any elements, the *empty* operation is used:

The *full* operation checks another boundary condition – whether the stack is full and can’t accept new elements:
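Translating the pseudocodes above into Python might look as follows (the class and method names are our own; `top` counts the stored elements, so the 1-indexed convention from above maps to index `top - 1` in Python’s 0-indexed list):

```python
class ArrayStack:
    """Fixed-size array stack; `top` holds the index of the top element (1-indexed)."""

    def __init__(self, size):
        self.arr = [None] * size   # fixed-size storage
        self.top = 0               # 0 means the stack is empty
        self.size = size

    def full(self):
        return self.top == self.size

    def empty(self):
        return self.top == 0

    def push(self, value):
        if self.full():            # ignore pushes onto a full stack
            return
        self.top += 1
        self.arr[self.top - 1] = value

    def pop(self):
        if self.empty():           # ignore pops from an empty stack
            return
        self.top -= 1              # the old top will simply be overwritten later

    def peek(self):
        return self.arr[self.top - 1]

# The walk-through from above: push 11, 7, 25, then pop.
s = ArrayStack(5)
for value in (11, 7, 25):
    s.push(value)
s.pop()
print(s.top, s.peek())  # 2 7
```

In production code, we’d likely raise an exception instead of silently ignoring an overflowing `push` or an underflowing `pop`.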

The immediate benefit of the implementation is its easiness – it’s quite straightforward to understand and implement. Another benefit is the restriction on the number of elements that can be put into the stack. **Consequently, the implementation naturally fits applications where the stack size needs to be limited**.

**On the other hand, the same feature of the limited size introduces a weakness – the implementation is hard to use when the element count is variable and isn’t known beforehand**. In this case, reserving a big chunk of memory may be wasteful. Conversely, reserving a small chunk of memory may cause the stack to run out of memory, and we may have to use multiple stacks. **Dynamic array and linked list implementations don’t have this limitation as they grow their capacity as needed**.

In this topic, we’ve presented the fixed-size array representation of stacks. Additionally, we’ve provided the pseudocodes for the operations of the stack based on the representation. Finally, we’ve pointed out the strengths and weaknesses of the implementation.

The post Array Implementation of Stacks first appeared on Baeldung on Computer Science.

In this tutorial, we’ll discuss secure computation and some of the practical challenges that it tries to address.

Further, we’ll discuss key cryptographic constructions that have been developed to address these challenges. In the process, we’ll understand their practical applications.

Cryptography has been in existence for a reasonably long period now; it even predates the advent of computers. Of course, modern cryptography kicked off with the development of computers. But the **focus was mostly on securing data in transit and data in storage**.

Over the last few decades, more and more sensitive data has been generated, processed, transferred, and stored in cloud environments. Moreover, distributed and decentralized computing environments have created **new challenges for cryptographic systems**.

This is where secure computation plays an important role. Secure computation **refers to computations where the data remains secure throughout**. The emphasis here is on the security of data while in use. There are several practical applications of secure computation.

A data application typically fetches encrypted data from some form of storage on a secure network. Then, it has to decrypt the data to perform computations on them. Further, it may encrypt the outcome and send it to the storage over the secure network:

While it serves the purpose for a large part, it still **leaves the data vulnerable during computation**. It may not be desirable for some use cases dealing with sensitive data. What if it were possible to perform computations more securely, like without decrypting them?

Distributed data computations typically involve two or more parties. Each party may be holding their data securely through encryption. In a typical scenario, one of the parties plays a central role by fetching data from others, decrypting them, and then performing the computation:

This implicitly **requires trust between the central party and all other parties**. However, there can be situations where each party would like to keep their data secret from the other while still participating in the computation. Is this even possible?

The challenges we’ve discussed in secure computing have been the key areas of research in cryptography for a while now. Several cryptographic constructs have been created to address some of these challenges. We’ll discuss two key cryptographic constructs in this tutorial.

Homomorphic encryption is a type of encryption that **allows us to perform computations on the encrypted data without decrypting them first**. The result of such computation remains in the encrypted form:

Interestingly, if we decrypt the result, it’ll be identical to the output we’d have obtained by performing the same computation on the unencrypted data in the first place.

Homomorphic encryption **enables a wide range of outsourced services for sensitive data**. For instance, we can send medical data to a third party for predictive analysis without worrying about data privacy concerns.

The word homomorphism in algebra is a structure-preserving map between two algebraic structures of the same type. Hence, it describes the transformation of one data set into another while preserving relationships between elements in both sets:

Now, extending from this definition, we can think of the **encryption and decryption functions as homomorphism between plaintext and ciphertext spaces**. This is the fundamental working principle behind homomorphic encryption.
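A classic, if insecure, illustration of this principle is textbook RSA, which is multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. The toy parameters below are far too small for any real use:

```python
# Toy (insecure) textbook RSA, shown only to illustrate that encryption
# can be a homomorphism: Enc(a) * Enc(b) mod n == Enc(a * b mod n).
p, q = 61, 53
n = p * q                  # modulus: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse of e)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 42, 7
product_cipher = (enc(a) * enc(b)) % n       # multiply ciphertexts only
assert dec(product_cipher) == (a * b) % n    # decrypts to 294
```

The identity holds because *Enc(a)·Enc(b) = aᵉbᵉ = (ab)ᵉ (mod n)*; real homomorphic schemes are built on much harder problems, but the structure-preserving idea is the same.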

In practice, homomorphic encryption works best with data represented as integers while using operations like addition and multiplication. It requires a few interactions and uses arithmetic circuits focusing on additions and multiplications.

Every operation is a mathematical function, and **every function can be converted to an equivalent circuit**. Hence, for every function *f*, there is some circuit *C* that gives the same output as *f*. The circuit model of computation makes the construction of an encryption scheme simpler.

Homomorphic encryption includes several encryption schemes that allow us to perform different classes of computations over encrypted data:

- **Partially Homomorphic Encryption (PHE)**: These include the schemes in which we can perform one defined operation, like multiplication or addition, on the ciphertext an infinite number of times. These are relatively simpler algorithms to design.
- **Somewhat Homomorphic Encryption (SHE)**: These include the schemes that allow us to perform a limited number of operations, like addition and multiplication, on the ciphertext. These algorithms are relatively more difficult to design.
- **Fully Homomorphic Encryption (FHE)**: Finally, these are the schemes in which we can perform an infinite number of operations, like addition and multiplication, on the ciphertext. These are the most elusive homomorphic algorithms to design.

Fully homomorphic encryption is often referred to as the holy grail of cryptography. It’s justifiable, considering the effort cryptographers have put into this for years. The idea of fully homomorphic encryption dates back to 1978, shortly after the RSA scheme was published.

After this, for almost three decades, **researchers came up with different algorithms with partial results**. For instance, the ElGamal cryptosystem offers an unbounded number of modular multiplications, and the Goldwasser-Micali cryptosystem offers an unbounded number of exclusive-or operations.

It wasn’t until 2009 that the first plausible construction of FHE was proposed by Craig Gentry. His idea was based on lattice-based cryptography:

It was soon followed by another construction by Marten van Dijk et al. in 2010 inspired by the earlier works of Craig Gentry, Levieil-Naccache, and Bram Cohen.

The key idea in the first-generation cryptosystems was to construct a somewhat homomorphic cryptosystem and then convert it to a fully homomorphic cryptosystem using bootstrapping. However, as more operations are performed on the ciphertext, the noise it contains keeps increasing.

The **second-generation cryptosystems were able to provide a much slower growth of noise** during homomorphic computations. These included the Brakerski-Gentry-Vaikuntanathan (BGV) scheme proposed in 2011 and the Brakerski-Fan-Vercauteren (BFV) scheme proposed in 2012, amongst others.

The third-generation cryptosystem was proposed by Craig Gentry, Amit Sahai, and Brent Waters (GSW) in 2013. The critical aspect of this scheme is that it avoids the expensive re-linearization step in homomorphic multiplication. It was later improved by FHEW in 2014 and TFHE in 2016.

Soon the fourth-generation cryptosystem was proposed by Cheon, Kim, Kim, and Song (CKKS) in 2016. It’s an approximate homomorphic cryptosystem that supports block floating point arithmetic. The search for the holy grail continues and will continue in the near future!

Several open-source **libraries offer implementations of the key fully homomorphic encryption schemes from different generations**, like BGV/BFV (second generation), FHEW/TFHE (third generation), and CKKS (fourth generation).

Shai Halevi and Victor Shoup developed the Homomorphic Encryption library (HElib) at IBM. It’s written in C++ and implements the Brakerski-Gentry-Vaikuntanathan (BGV) scheme and the approximate-number scheme of Cheon-Kim-Kim-Song (CKKS).

Simple Encrypted Arithmetic Library (SEAL) is another library, developed by Microsoft Research. It’s also written in C++ and comes with implementations of the BGV/BFV schemes and the CKKS scheme. Microsoft SEAL has no required dependencies as such.

OpenFHE is an open-source project that provides implementations of the leading FHE schemes. It succeeds earlier libraries like PALISADE (2010) and SIPHER (2014). OpenFHE was released in 2022 with several additional features like post-quantum public-key encryption.

With all the promises that fully homomorphic encryption offers, it still has very limited use. This is primarily because **all fully homomorphic schemes carry a large computational overhead**. This is defined by the computation time in the encrypted version vs the clear version.

Reportedly, homomorphic operations using the first version of HElib ran a trillion times slower than the corresponding plaintext operations. Of course, that has improved significantly since then, and the research to improve the performance continues.

There are other operational issues that applications employing fully homomorphic encryption tend to face. For instance, it **may not be generally efficient to run very large and complex algorithms** homomorphically, even when using a very efficient scheme.

Then there are complexities related to supporting multiple users in a system employing fully homomorphic encryption. The system may rely on an internal database. Now, for securing each user’s data, the system may have to support a separate database for every user.

Secure multi-party computation is a subfield of cryptography that **deals with methods for parties to jointly compute a function over their input while keeping those inputs private**:

In traditional cryptography, the objective is to secure communication or storage from an outside adversary. However, the objective of secure multi-party computation is to protect the participants’ privacy from each other.

Secure multi-party computation allows data scientists and analysts to **compliantly, securely, and privately compute on distributed data without ever exposing it**. It also has a wide range of practical applications, like electronic auctions and electronic voting.

The basic formulation of a multi-party computation involves a given number of participants (*P₁*, *P₂*, …, *Pₙ*). Each participant *Pᵢ* has their private data *dᵢ*. The objective is to compute the value of a public function *F(d₁, d₂, …, dₙ)*:

For instance, here, the objective is to compute the average salary while keeping the salary of each participant secret from other participants.

Now, a **secure multi-party computation protocol should exhibit some basic properties** for it to be practically useful:

- **Input Privacy**: This implies that no information about the private data held by the participants can be inferred from the messages sent during the execution of the protocol.
- **Correctness**: This implies that any subset of adversarial parties willing to deviate from the protocol should not be able to force the honest parties to output an incorrect result.

Security is a key consideration in any multi-party computation protocol. Unlike traditional cryptographic applications, **the adversary in a multi-party protocol is one of the protocol’s participants**. The protocol changes significantly based on the type and number of such adversaries in a system.

Let’s assume there are *n* participants in such a protocol, of which *t* can be adversarial. Now, the solutions and protocols are very different if we can assume an honest majority, defined by the condition *t < n/2*.

Also, we can categorize adversaries faced by such protocols based on how willing they are to deviate from the protocol. This gives rise to different forms of security:

- **Semi-honest (Passive) Security**: In this model, the adversaries merely cooperate to gather information from the protocol while not deviating from the protocol specification. Protocols achieving security in this model are quite efficient.
- **Malicious (Active) Security**: In this model, the adversaries may arbitrarily deviate from the protocol execution in an attempt to cheat. Protocols that achieve security in this model provide a very high security guarantee, although at the cost of efficiency.

Although special-purpose protocols started to appear in the 1970s, secure computation was formally introduced as secure two-party computation in 1982 by Andrew Yao. This was specifically created to solve the famous Millionaire’s Problem.

Later, in 1986 Andrew Yao came up with the general protocol for evaluating any feasible computation. These two papers by Yao later inspired the development of what became known as Yao’s Garbled Circuit Protocol.

**Yao’s protocol is secure against semi-honest adversaries and is extremely efficient** regarding the number of rounds. The function here is conceived as a Boolean circuit which is a collection of gates connected with three different types of wires:

Basically, Yao’s protocol suggested how to garble a circuit so that two parties could learn the output of the circuit and nothing else. Here, the sender prepares the garbled circuit and sends it to the receiver, who obliviously evaluates the circuit:

The receiver learns the encodings corresponding to both his and the sender’s output and sends the sender’s encoding back. The sender decodes his part of the output. He then sends the mapping from the receiver’s output encodings to bits to the receiver, allowing the receiver to obtain their output.

The two-party computation protocols were later generalized for multi-party by Goldreich, Micali, and Wigderson (GMW). The computation here is **based on the secret sharing of all the inputs and zero-knowledge proofs for a potentially malicious case**.

In methods based on secret sharing, the data associated with each wire is shared amongst the parties, and a protocol is then used to evaluate each gate. The function here is defined as a circuit over a finite field, also known as an arithmetic circuit.

Two types of secret sharing are generally used: Shamir secret sharing and additive secret sharing. In both cases, it’s possible to distribute a secret among several parties by handing each party a share, where the shares are random elements of a finite field:

Secret sharing schemes can tolerate an adversary controlling up to *t* parties out of *n* total parties, where *t* varies based on the scheme and the type of adversary. For instance, the Shamir secret sharing scheme is secure against a passive adversary when *t < n/2* and an active adversary when *t < n/3*.

The **additive secret sharing scheme can tolerate the adversary controlling all but one party**, that is, *t = n − 1*, while maintaining security against both passive and active adversaries. SPDZ is a popular system that implements MPC with additive secret shares and is secure against active adversaries.
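The additive scheme is simple enough to sketch in a few lines of Python. Here, three hypothetical participants secret-share their salaries so that only the total (and hence the average) is ever revealed, never the individual inputs. The sketch ignores the communication layer and uses a non-cryptographic RNG, so it’s purely illustrative:

```python
import random

P = 2_147_483_647  # a prime modulus; shares are elements of GF(P)

def share(secret, parties):
    """Split `secret` into `parties` additive shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Three participants secret-share their salaries...
salaries = [55_000, 72_000, 64_000]
all_shares = [share(s, 3) for s in salaries]

# ...each party locally adds up the shares it received (one per salary)...
local_sums = [sum(col) % P for col in zip(*all_shares)]

# ...and only the combined result, the total salary, is ever revealed.
total = reconstruct(local_sums)
print(total, total // 3)  # 191000 63666
```

Any *n − 1* shares of a secret are uniformly random, which is why the scheme tolerates *t = n − 1* colluding parties.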

Over the years, several practical implementations of multi-party protocols have been developed. The Fairplay system was one of the first systems offering two-party computation based on Yao’s protocol and multi-party computation based on BMR protocol.

Later many **improvements came in the form of efficiency and techniques for active security**. For instance, the application of the garbling technique and “cut-and-choose” paradigm for achieving active security. This was proposed by Lindell and Pinkas and later implemented by Pinkas et al. in 2009.

The more recent improvements include focusing on highly parallel implementations based on garbled circuits designed to be run on CPUs with many cores. For instance, Kreuter et al. described an implementation running on 521 cores of a powerful cluster computer.

Multiple demonstrations of the production capability of SMPC systems have been done in the recent past. For instance, Boston University prepared the Boston Women’s Workforce Council Report in 2016, where the custodians of the original data were Boston-area employers.

However, large-scale practical applications of SMPC remain elusive. There have been many efforts in this regard lately. For instance, Security through Private Information Aggregation (SEPIA) and the Secure Computation API (SCAPI) intend to provide practical implementations of SMPC protocols.

As we’ve seen, SMPC protocols generally use one of the secret sharing schemes. Now, secret sharing involves communication and connectivity between all parties. This obviously leads to **higher communication costs as compared to plaintext computation**.

As the nature of these protocols is generally complex, they tend to have computational overhead. For instance, random numbers are generated in these protocols in order to ensure the security of the computation. This adds to **the computational overhead and slows down the run time**.

Lastly, as we’ve seen earlier, SMPC protocols are vulnerable to attacks from colluding parties. Now, there are always stricter protocols providing better security against such adversaries. But, these come at the cost of efficiency and impede the practical use of the protocols.

Secure computation is an old yet very active field of research within cryptography. We went through homomorphic encryption and secure multi-party computation in this tutorial. These technologies have **matured to a level where we can use them at scale in production** systems to a certain extent.

In their basic forms, these technologies address different aspects of secure computation. But they do provide very similar functionalities. In fact, there have been **attempts to adapt homomorphic encryption to use multiple keys that enable multi-party computation**.

There are also cases where these technologies have been combined to serve a greater purpose. For instance, there is a **construction mixing multi-party computation with homomorphic encryption**. Here, homomorphic encryption is used as a subroutine to produce correlated randomness.

Their application is also guided by the performance of these technologies. The **performance here depends on the relative cost of computation** and bandwidth. For high-bandwidth settings, multi-party computation tends to outperform homomorphic encryption.

Together these technologies are **finding widespread use in industries like finance and healthcare,** where data privacy has always been a concern for outsourcing data analysis jobs. Moreover, innovative and secure ways are coming up for distributed voting, private bidding, and auctions.

Homomorphic encryption and multi-party computation are not the only technologies under the umbrella of secure computation. Several others serve more specific purposes. Let’s have a very quick look through some of the key technologies here:

- **Functional Encryption**: Functional encryption is a generalization of public-key encryption in which possessing a secret key allows one to learn a function of what the ciphertext is encrypting. Hence, it allows a party to evaluate a specific function on an encrypted input during its decryption.
- **Zero-knowledge Proofs**: A zero-knowledge proof is a method by which one party (the prover) can prove to another party (the verifier) that a given statement is true, while avoiding conveying any additional information apart from the fact that the statement is indeed true.
- **Differential Privacy**: Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
- **Oblivious Transfer**: Oblivious transfer is a method in which a sender transfers one of potentially many pieces of information to a receiver but remains oblivious as to which piece (if any) has been transferred.

In this tutorial, we discussed secure computation and the challenges that are present in this field of study. Further, we covered homomorphic encryption and secure multi-party computation in little detail. We went through several protocols and their implementations that are available for our use.

The post Introduction to Secure Computation first appeared on Baeldung on Computer Science.

In this tutorial, we’ll explain five concepts from graph theory: eccentricity, radius, diameter, center, and periphery.

We’ll begin by defining the shortest path distance since the definitions of these concepts depend on it.

Let’s say we have a graph *G = (V, E, w)*, where *V* denotes the nodes, *E* the edges, and *w* their weights (or lengths).

**The length *d(u, v)* of the shortest simple path between any two nodes *u* and *v* is a distance metric** defined on the graph *G*. Thus, the triangle inequality holds:

*d(u, v) ≤ d(u, z) + d(z, v)*

That’s because *d(u, v)* is the length of the shortest path from *u* to *v*, so any path between them is at least as long.

**The eccentricity *e(v)* of a node *v* is the maximum of its distances to the other nodes:**

*e(v) = maxᵤ d(v, u)*

For example, the eccentricity of the node in the following graph is 8:

This is the length of the path .

**The diameter of a graph is the maximum eccentricity of its nodes:**

**We define the radius as the minimum eccentricity:**

r(G) = min { e(v) : v ∈ V }

It’s worth noting that these two terms have multiple meanings. A diameter can also denote a longest shortest path in a graph, just as a radius can denote any shortest path whose length is equal to the graph’s minimum eccentricity.

In our example, the diameter is , and the radius is :

We can prove that **the diameter is bounded from above by twice the radius:**

diam(G) ≤ 2 · r(G)

This follows from the triangle inequality. Let u and v be two distinct nodes whose distance is equal to the diameter:

d(u, v) = diam(G)

Let c be a central node of G. The triangle inequality tells us that:

d(u, v) ≤ d(u, c) + d(c, v)

Since c is a central node, it holds that d(u, c) ≤ r(G) and d(c, v) ≤ r(G). Therefore:

diam(G) = d(u, v) ≤ d(u, c) + d(c, v) ≤ 2 · r(G)

**A node is peripheral if its eccentricity is equal to the graph’s diameter.** All the nodes with that property constitute the graph’s periphery:

P(G) = { v ∈ V : e(v) = diam(G) }

On the other hand, **the center of a graph comprises the nodes whose eccentricity is equal to the graph’s radius:**

C(G) = { v ∈ V : e(v) = r(G) }

In our example, the periphery contains , , and , while the only central node is :

So, **the peripheral nodes are at the opposite ends of the graph**, while the central nodes are in between.
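For an unweighted, connected graph, we can compute all five concepts with a breadth-first search from every node. Here’s a minimal Python sketch; the adjacency-list format and the small example graph are our own choices for illustration:

```python
from collections import deque

def eccentricities(adj):
    """BFS from every node of an unweighted, connected graph
    given as {node: [neighbors]}; returns {node: eccentricity}."""
    ecc = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        ecc[s] = max(dist.values())  # farthest node from s
    return ecc

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}  # the path graph 1-2-3-4
ecc = eccentricities(adj)
diameter = max(ecc.values())                              # 3
radius = min(ecc.values())                                # 2
center = [v for v, e in ecc.items() if e == radius]       # [2, 3]
periphery = [v for v, e in ecc.items() if e == diameter]  # [1, 4]
```

Each BFS takes O(V + E) time, so computing all eccentricities this way runs in O(V(V + E)).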

An example of when these concepts may be useful is determining the optimal placement of facilities on a graph. If we model a city map as a graph, it makes sense to build a facility at a central node because that will minimize the longest distance a citizen will have to travel to reach the facility.

Conversely, if the goal is to maximize the distance between two objects, we’d place them on the locations of two peripheral nodes on opposite sides of the town.

In this article, we explained several concepts from graph theory: eccentricity, radius, diameter, center, and periphery.

The diameter is never more than twice the radius, and the center minimizes the worst-case distances to other nodes in a graph.

The post Eccentricity, Radius, Diameter, Center, and Periphery first appeared on Baeldung on Computer Science.

In computer science, a graph is a mathematical structure of objects that represent the relationships between them. **Thus, a graph includes nodes (objects) with edges (connections) that denote the relationships between them**. Moreover, the edges of a graph can be directed or undirected.

In this tutorial, we’ll introduce the bridges in a graph, discuss their importance and discuss algorithms to detect them.

Suppose a graph G = (V, E), where V and E denote the vertices and edges, respectively. A bridge is then defined as an edge (u, v) ∈ E such that no other path between the nodes u and v exists. **So, if this edge is removed from the graph, the vertices u and v will no longer be reachable from one another, and thus two sub-graphs are generated**.

There are several kinds of bridges, each with its own characteristics. The most common ones are the edges of a tree graph structure. **In this particular graph form, all links are bridges. Thus, removing any single edge disconnects the graph**. Moreover, in a DFS tree of a graph, only tree edges can be bridges: a tree edge links a node to one of its children in the DFS tree structure and, if it’s a bridge, removing it increases the number of connected components.

As we can see, the edge is a bridge because removing it disconnects the graph and creates two sub-graphs:

The two sub-graphs include the nodes , , , and , , . Now there is no path from one sub-graph to the other, and the nodes , , can’t be reached in any way from the nodes , , , and vice versa.

Note that if we removed, for example, the edge , paths would remain for every node to reach everyone else, and therefore we understand that the edge isn’t a bridge in this graph.

Moreover, bridges can be useful in various situations, such as determining the minimum spanning tree, identifying loops in a graph, and even addressing computational geometry issues.

Understanding bridges in a graph has the potential to have far-reaching significance in industries such as telecommunications, computer science, and transportation.

Several algorithms can be used to detect bridges in a graph. All of these algorithms show certain benefits and limitations and depend on the specific problem and the characteristics of the graph. **The most common and straightforward algorithms are DFS (Depth-First Search) and BFS (Breadth-First Search), which are used to find all the articulation points or bridges in a graph**.

DFS traverses the graph depth-first and keeps track of each vertex’s discovery time and low-link value, i.e., the earliest discovered vertex reachable from its subtree through at most one back edge. For a tree edge (u, v), if the low-link value of v is greater than the discovery time of u, then the edge (u, v) is a bridge. The DFS algorithm has O(V + E) time complexity and O(V) space complexity.

BFS follows a similar idea. It performs a breadth-first traversal of the graph, keeps track of the level of each vertex, and compares it with the levels of the previously checked vertices. So, if the level of the current vertex is greater than the level of the adjacent vertex, then this edge is probably a bridge. The BFS algorithm has O(V + E) time complexity and O(V) space complexity.

Another common method to detect bridges in a graph is Tarjan’s algorithm.
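To illustrate the low-link idea behind DFS-based bridge detection, here’s a minimal Python sketch; the adjacency-list format and the example graph are our own choices for illustration:

```python
def find_bridges(adj):
    """Find all bridges of an undirected graph given as
    {node: [neighbors]} using DFS discovery times and low-link values."""
    disc, low, bridges = {}, {}, []
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:                 # back edge to an already visited node
                low[u] = min(low[u], disc[v])
            else:                         # tree edge: recurse, then propagate low-link
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:      # nothing below v reaches u or above
                    bridges.append((u, v))

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return bridges

# Two triangles joined by the single edge (3, 4):
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
print(find_bridges(adj))  # [(3, 4)]
```

This is the standard low-link criterion: a tree edge (u, v) is a bridge exactly when low[v] > disc[u].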

Note that it is not necessary for a graph to contain a bridge. However, large graphs of real-world networks usually contain a certain number of bridges to ensure the graph’s connectivity.

In this article, we introduced bridges in a graph. In particular, we talked about algorithms for finding bridges, showed a basic example of a bridge, and highlighted the importance of finding bridges in graphs.

Bridges help to understand the main structure of a graph along with its most significant components and are critical in analyzing the graph’s connectivity design and robustness.

The post What Are Bridges in a Graph? first appeared on Baeldung on Computer Science.

We can often observe the concepts of threat, vulnerability, and risk in our everyday lives. For example, in the real world, we have malicious people, dangerous animals, and infrastructure problems as threats. Being distracted or forgetting to lock our house before going to another place are, in turn, vulnerabilities. And the combination between threats and vulnerabilities is a risk, for instance, getting injured or stolen.

**These concepts also have a particular place in the digital world.** In such a context, they have specific characteristics and objectives.

In this tutorial, we’ll investigate how threats, vulnerabilities, and risks occur in a computing scenario. First, we’ll examine why these concepts exist in the digital world while understanding what assets are. Thus, we’ll explore some technical definitions of threat, vulnerability, and risk. Then, we’ll discuss some protective measures to avoid dangerous scenarios in computing. Finally, we’ll outline and create relations between these concepts in a systematic summary.

**In recent decades, computing has become crucial for human lives.** Furthermore, beyond offline computing, the rise of computer networks enabled people from different places to communicate and exchange data efficiently. This phenomenon increased the popularity of the digital world even more.

Due to their efficiency, computer programs became responsible for processing sensitive data and tasks, such as bank transactions, geolocalization references, and medical records. In this way, digital data, and even the processing routines applied to them, turned into assets for the people and organizations that own them.

It is relevant to highlight that, in our context, we define an asset as anything that represents any kind of value for its owner. So, a digital asset is any value-added data or process that is executed or stored digitally.

**Thus, the existence of digital assets is the reason for the existence of digital threats.** These threats aim to exploit vulnerabilities, which creates digital risk scenarios.

Let’s understand, in detail, the definition and relation of threats, vulnerabilities, and risks from the digital world in the following sections.

A computing threat embraces several actions and events, typically enabled by a vulnerability, that may cause undesired effects on particular software or hardware. **In general, we have two categories of threats: intentional and non-intentional.**

Intentional threats represent malicious entities, hardware, or software that execute actions to deliberately harm a computing system, usually trying to get some advantage by doing that. **Hackers and crackers, for example, are intentional threats since they aim to create and execute attacks, such as phishing, SQL injection, and denial of service.**

But, beyond the minds behind an attack, the resources developed that enable or automatically execute a malicious action are also intentional threats. Examples of such resources are any kind of malware (such as viruses, worms, and trojans) or adware.

Finally, we also have non-intentional threats which are related to accidental events. For instance, making an online service unavailable due to a broken fiber during a storm is a non-intentional threat: no one moved efforts to deny the service, but the circumstances made it unavailable anyway.

**A vulnerability is any weakness of a computing system or shortcomings in a security system.** So, threats see vulnerabilities as an easy way to achieve their objectives, turning them into exploits for their malicious actions.

Vulnerabilities can arise in computing systems for several reasons. Some of them are:

- **Complexity**: the more complex a system, the more challenging it is to make it holistically secure; thus, the higher the probability that it has an unidentified vulnerability
- **Connectivity**: the greater the number of applications communicating with other entities, the higher the probability of a system facing threats coming from the network; even with a secure system, the applications executing may represent vulnerabilities
- **Maintenance**: the lack of maintenance or running outdated software may create vulnerabilities in computing systems. Many patches include vulnerability fixes, so applying them is essential

Furthermore, we can analyze vulnerabilities regarding which aspect of a computing system they relate to. Some examples are shown next:

- **Hardware**: vulnerabilities regarding the physical components of a computing system, such as over-heating, wet or dusty places, and outdated equipment usage
- **Software**: vulnerabilities in terms of software programs running in a computing system. For example, lack of tests, insecure coding, and software engineering problems
- **Network**: vulnerabilities that emerge from networked communications. Some examples are the usage of non-encrypted protocols and adopting an insecure network architecture
- **Staff**: vulnerabilities related to computing systems’ operators, administrators, and developers. They commonly emerge as inadequate recruiting processes and improper security awareness

It is relevant to highlight that avoiding vulnerabilities is as important as protecting our systems from threats. A computing system with few vulnerabilities gives little room for threats acting on them.

In short, we can understand risks as the potential problems caused due to a threat exploiting a vulnerability, such as the destruction, loss, or theft of an asset of a computing system.

**Thus, we can see risks as the product of three factors: the existence of assets, exploitable vulnerabilities, and threats that can exploit the available vulnerabilities.** The following Venn diagram shows risks at the intersection of these three factors:

**We should note, however, that being at risk does not mean being attacked.** A computing system can be at constant risk and never have a problem or suffer with a concrete consequence, although the probability of it happening is high.

Protective measures for computing systems aim to eliminate at least one of the three risk factors described in the previous section. However, we usually cannot remove the assets from a computing system since the system itself is an asset (consider botnets, for example, which turn the machines themselves into resources for attackers).

Furthermore, threats will exist regardless of our actions. Hackers create malicious programs and attack strategies every day. In this case, the best we can do is try to detect when some threat is allocated or acting in our computing systems and remove it. Executing frequent scans with antivirus software in the computing systems is an efficient way to do that.

So, the best we can do as operators or administrators is to focus our efforts on detecting and removing vulnerabilities in our computing systems. There are multiple ways to do that. Among them, we can cite keeping the software running in our systems updated, following the best practices for deploying security solutions (such as firewalls and antiviruses), and installing only auditable software (or software from trusted sources).

Moreover, we can use tools that scan computing systems to find vulnerabilities. These tools are great for getting an overview of our systems. But, they require expert judgment since they typically detect several false-positive vulnerabilities.

**In the modern world, data and services in computing systems have become valuable assets.** Due to this fact, malicious entities emerged, generating digital threats that aim to capture assets. These threats, however, take advantage of computing systems’ vulnerabilities, accessing them and executing undesired processes there.

The simple existence of assets, threats, and vulnerabilities puts a computing system at risk. Specifically, risks regard the possibility of a system being attacked once all the necessary factors for it exist.

The following table summarizes the concepts and definitions of threats, vulnerabilities, and risks:

| Concept | Definition |
| --- | --- |
| Threat | An action or event, intentional or not, that may cause undesired effects on a computing system |
| Vulnerability | A weakness of a computing system or a shortcoming in its security that threats can exploit |
| Risk | The potential problem caused by a threat exploiting a vulnerability, such as the loss or theft of an asset |

**In this tutorial, we studied threats, vulnerabilities, and risks.** First, we examined the primary conditions and motivations for the existence of threats, vulnerabilities, and risks. In the following sections, we investigated each of these concepts in depth. Then, we saw some protective measures we can adopt for our computing systems. Finally, we reviewed all the studied concepts in a systematic summary.

We can conclude that threats will always exist in a connected digital world. Thus, our responsibility as systems operators and administrators is to reduce the number of vulnerabilities to the minimum possible, thus minimizing the risks of being attacked and losing valuable assets.

The post Threat vs. Vulnerability vs. Risk first appeared on Baeldung on Computer Science.

In this tutorial, we’ll show how to arrange a numeric array so that its negative elements come before its positive ones.

In doing so, we’ll require the solution to keep the relative order of the same-sign elements.

Let’s say we have a numeric array with n elements. Our goal is to rearrange it so that all the negative elements come first and all the positive ones, if any, come after them.

Simultaneously, **we want to keep the relative order in the negative and positive sub-arrays.** So, if two elements have the same sign, the one that was to the left in the original array should also be to the left in the rearranged one.

For example, if , the correct rearrangement is .

Since zero is not negative, we’ll include any element equal to zero in the positive sub-array.

**We can start with two empty queues**: one for the positive and one for the negative elements of the input array. **We populate them by iterating over the array and placing its elements into the queues based on their signs.**

After that, we append the positive queue to the negative one to get the rearranged array.

Since we iterate over the input array once and concatenate two queues whose sizes sum to n, **the time complexity is O(n)**. The only requirement is that storing a number in a queue be a constant-time operation.

**The space complexity is also linear** because we need O(n) space for the input array and O(n) memory for the two queues.
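The two-queue approach can be sketched in a few lines of Python using `collections.deque` (the function name is our own):

```python
from collections import deque

def rearrange_with_queues(a):
    """Stable partition: negatives first, then zeros and positives,
    each group keeping its original relative order."""
    negatives, positives = deque(), deque()
    for x in a:
        (negatives if x < 0 else positives).append(x)
    return list(negatives) + list(positives)

print(rearrange_with_queues([3, -1, 0, -7, 5, -2]))  # [-1, -7, -2, 3, 0, 5]
```

Appending to a `deque` is a constant-time operation, so the whole function runs in O(n).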

If we don’t want to take extra memory, **we can rearrange the array using an algorithm similar to MergeSort.**

First, we split the array into pairs of consecutive elements:

If n is odd, we’ll let the last element form a sub-array on its own. For simplicity, we’ll assume n is even since that won’t affect the correctness of this approach but will make it easier to explain.

In the next step, we swap the elements in each sub-array if the first one is positive and the second negative. That way, we ensure that within each pair, the negative element comes first.

Afterward, **we merge and arrange consecutive pairs of previously processed sub-arrays.** That way, we arrange sub-arrays with four elements:

Processing larger and larger sub-arrays, we eventually get to rearrange the entire input array.

Let’s focus on two consecutive sub-arrays of the array in an iteration of the rearrangement process:

Let A⁻ be the left sub-array’s negative part, and let A⁺ contain its positive elements. The same goes for the right sub-array and its parts B⁻ and B⁺.

To merge the two sub-arrays in line with our requirements, **we need to swap A⁺ with B⁻ while preserving the relative order in both:**

A⁻ A⁺ B⁻ B⁺ → A⁻ B⁻ A⁺ B⁺

**First, we reverse A⁺ and B⁻ separately. Then, we reverse them together.** To see why that works, let A⁺ = [x_1, …, x_p] and B⁻ = [y_1, …, y_q].

So, we start with this:

[x_1, …, x_p, y_1, …, y_q]

If we first reverse the sub-arrays separately, we get:

[x_p, …, x_1, y_q, …, y_1]

Reversing them together gives us:

[y_1, …, y_q, x_1, …, x_p]

which is what we want to achieve.

Let a[i..j] (i ≤ j) be the sub-array containing the elements of the array starting with the i-th and ending with the j-th (if j > n, it stops at the n-th element). Here’s the pseudocode of our iterative MergeSort-like approach, which we call MergeArrange:

**We start with sub-arrays of length 1 to sort the consecutive pairs.** Then, **we double the length iteratively until covering the entire array.**

In each iteration, if w is the current sub-array length, the left sub-arrays for merging start at indices 2kw + 1 for integer k ≥ 0 since two consecutive sub-arrays together contain 2w elements. The corresponding right sub-arrays start w elements to the right.
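Putting the pieces together, here’s a Python sketch of the MergeArrange idea: bottom-up merging where each merge swaps the left block’s positive part with the right block’s negative part via three reversals (the function and variable names are our own):

```python
def merge_arrange(a):
    """Stable in-place rearrangement: negatives before non-negatives,
    in O(n log n) time and O(1) extra space."""
    n = len(a)

    def reverse(lo, hi):  # reverse a[lo:hi] in place
        while lo < hi - 1:
            a[lo], a[hi - 1] = a[hi - 1], a[lo]
            lo, hi = lo + 1, hi - 1

    width = 1
    while width < n:
        for left in range(0, n, 2 * width):
            mid = min(left + width, n)
            right = min(left + 2 * width, n)
            # Locate the positive part of the left block...
            i = left
            while i < mid and a[i] < 0:
                i += 1
            # ...and the negative part of the right block.
            j = mid
            while j < right and a[j] < 0:
                j += 1
            # Swap a[i:mid] with a[mid:j] using three reversals.
            reverse(i, mid)
            reverse(mid, j)
            reverse(i, j)
        width *= 2
    return a

print(merge_arrange([3, -1, 0, -7, 5, -2]))  # [-1, -7, -2, 3, 0, 5]
```

By induction, every block of the current width is already partitioned before it’s merged, so the three reversals produce a partitioned block of twice the width without disturbing the relative order of same-sign elements.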

Let’s use the example from the start:

In the first iteration, the sub-array length is 1, so we have this split:

After merging, we get:

In the second iteration, the length is 2, so we merge and arrange the consecutive pairs:

In the third iteration, the length is 4, so we process the remaining two sub-arrays of length 4:

Now, the length doubles to 8 and covers the whole input. That means there’s only one sub-array, i.e., the entire array. Therefore, we stop since it’s in the order we want it to be.

There are at most ⌈log₂ n⌉ iterations, each of which makes at most O(n) swaps. So, **the time complexity is O(n log n).**

**We use only a constant number of auxiliary variables for looping and swapping, so the extra space complexity is O(1).** However, **the total space complexity is still O(n)** since we need memory to store the initial array.

So, our **MergeArrange algorithm has the same complexity as MergeSort.** That isn’t a surprise because it works like bottom-up MergeSort, sorting its input array using the elements’ signs instead of their actual values.

In fact, any stable sorting algorithm that uses the signs as keys to sort the elements will do the job.

In this article, we showed two ways to rearrange a numeric array so that its negative elements come before its positive ones and those equal to zero, keeping the relative order of the elements within each same-sign sub-array.

Our first solution uses two queues and runs in O(n) time and space. The algorithm we derived from MergeSort uses O(1) additional memory but runs in O(n log n) time.

The post Rearrange an Array to Put the Negative Elements Before the Positive Ones first appeared on Baeldung on Computer Science.

In this tutorial, we’ll learn about the Agile methodology in software engineering.

**Agile programming is characterized by iterative and incremental development. It focuses on building a minimum viable product (or MVP for short) as quickly as possible, getting continuous feedback from customers, and responding to changes in requirements or technology.**

Moreover, Agile emphasizes flexibility and collaboration when developing software. It is based on the Agile Manifesto, a document that specifies the four values and 12 principles of agile software development.

Some examples of the most well-known agile frameworks are Scrum, Lean, Kanban, and Extreme Programming (XP).

According to the Agile Manifesto, the four agile values are:

- **Individuals and interactions over processes and tools**: even though having the right tools is important when developing software, having the right people who can communicate and interact with each other is more valuable when solving problems and developing software
- **Working software over comprehensive documentation**: agile prioritizes delivering working software to customers and gathering feedback from them over creating extensive documentation before developing software
- **Customer collaboration over contract negotiation**: when we communicate and get continuous feedback from customers, we make sure that the product is being developed according to their requirements so that the final product satisfies their needs
- **Responding to change over following a plan**: agile values having a dynamic and flexible plan when developing software so that when changes occur in the real world and new information emerges, the strategy changes accordingly

According to the Agile Manifesto, the 12 agile principles are:

- Satisfy the customer through early and continuous delivery of valuable software
- Welcome changing requirements, even late in development
- Frequently deliver working software
- Business people and developers must work together daily throughout the project
- Build projects around motivated individuals, give them the right environment and support, and trust them to do the job
- The best way to communicate information within the development team is through face-to-face conversation
- Working software is the primary measure of progress
- Constant and sustainable development
- Attention to design and technical quality will increase agility
- Attention to simplicity
- Self-organizing teams generate the best architectures
- Teams should self-reflect regularly and change their actions accordingly

The waterfall methodology consists of sequential phases, and each phase depends on the result of the previous one:

Agile is not necessarily better than waterfall, and each one has its own use cases:

- We use Agile when we want to deliver to market quickly and then frequently update the product based on feedback
- We use waterfall when we have a clear set of project requirements that are not going to change later on

In this article, we learned about the Agile approach, its values and principles, and the differences between Agile and waterfall models.

The post What Is Agile Programming? first appeared on Baeldung on Computer Science.

In this tutorial, **we’ll discuss multiobjective algorithms and Pareto frontiers with examples.** We’ll present the general steps in a multiobjective algorithm and explain the importance of the Pareto frontier.

Finally, we’ll explore some advantages and disadvantages of multiobjective algorithms.

**Multiobjective algorithms are optimization techniques used to find optimal solutions with respect to multiple objectives or goals.** We can use these algorithms to solve problems across various fields, including engineering, robotics, design, and public policy.

We utilize multiobjective algorithms to solve multi-faceted problems that require trade-offs and prioritization. Additionally, these algorithms are particularly useful for solving complex problems with no single best solution but rather a set of solutions based on different objectives.

Let’s discuss an example. A company wants to find the optimal production level for multiple products while minimizing costs and maximizing profits. Therefore, the decision variables include the production levels of each product. Additionally, the objective function consists of the total cost and total profit.

Furthermore, the constraints include production capacity and market demand for each product. The solution to this problem would be a set of production levels that balances the trade-off between minimizing costs and maximizing profits.

**Some examples of multiobjective algorithms are evolutionary algorithms, genetic algorithms, particle swarm optimization, tabu search, and simulated annealing.**

A Pareto frontier is a graph that plots the best trade-off solutions across the objectives. The name honors the economist Vilfredo Pareto, who is also the namesake of the 80-20 principle, which states that 80% of the outcomes come from 20% of the causes. A Pareto graph shows the best solutions for a given problem and provides insight into how they are distributed among the objectives.

In a multiobjective optimization problem, there may be multiple solutions that are considered optimal, as each solution may optimize one objective function at the expense of the others. **The Pareto frontier is a set of solutions that represents the best trade-off between all the objective functions.** A solution not dominated by any other solution in the feasible solution space is considered to be on the Pareto frontier.

Generally, we plot the Pareto frontier of a two-objective problem on a two-dimensional graph, with each axis representing one of the objectives. This can help illustrate the trade-offs between them.

Let’s talk about an example. A company produces multiple products and wants to find the best design for each product. The company aims to minimize the cost of production and maximize the product’s performance.

The decision variables in this problem would be the design characteristics of each product, such as materials and manufacturing processes. The objective functions include the total cost of production and the performance of each product. The constraints contain manufacturing capabilities and market demand for each product.

The Pareto frontier for this problem would be a set of product designs representing the best trade-off between minimizing production costs and maximizing performance. **A design on the Pareto frontier would be one that can’t be improved in terms of performance without increasing the cost of production and vice versa.**
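To make the non-domination idea concrete, here’s a tiny Python sketch that filters a made-up set of (cost, performance) pairs down to its Pareto frontier, minimizing cost while maximizing performance (the data and function name are ours, purely for illustration):

```python
def pareto_frontier(solutions):
    """Keep the non-dominated (cost, performance) pairs:
    we minimize cost and maximize performance."""
    def dominates(a, b):
        # a dominates b if it's at least as good on both objectives and differs
        return a[0] <= b[0] and a[1] >= b[1] and a != b

    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions)]

# Hypothetical product designs as (production cost, performance) pairs:
designs = [(10, 5), (8, 4), (12, 9), (8, 6), (15, 9)]
print(pareto_frontier(designs))  # [(12, 9), (8, 6)]
```

Design (10, 5), for instance, is dropped because (8, 6) is both cheaper and better-performing, while (12, 9) and (8, 6) each win on one objective and so neither dominates the other.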

Now let’s talk about the general steps in a multiobjective algorithm:

The first step is to define the problem and objectives. In it, we express the problem as a set of decision variables and constraints, and we define the objectives. The objectives should be quantifiable so that we can evaluate the performance of the solutions.

The second step is to initialize the population randomly. In this step, we generate a random population whose size depends on the problem’s complexity. The next step is to evaluate the generated solutions according to the objectives. This is an important step, as it allows the algorithm to rank the solutions based on their performance. The evaluation process can be computationally expensive, so it’s crucial to choose efficient evaluation methods.

**In order to identify the optimal solution, we sort the solutions according to the Pareto ranking.** A Pareto ranking compares the solutions based on their performance on each objective. We call a solution Pareto-optimal if no other solution in the feasible solution space dominates it. In other words, it’s a solution that we can’t improve in one or more objectives without worsening other objectives.

Based on the Pareto ranking, we identify the Pareto frontier solutions. Additionally, **we identify** **the Pareto frontier solutions by removing dominated solutions.** Furthermore, the remaining solutions form the Pareto frontier, representing the set of non-dominated solutions considered the best trade-off solutions. The next step is to select the non-dominated solutions for reproduction. In order to implement the reproduction step and generate new solutions, we apply genetic operators such as crossover and mutation.

Finally, we add the new solutions to the population. At this point, we check if the solutions we generated meet the stopping condition. If they satisfy it, we return the Pareto-optimal solutions. Otherwise, we go back to the evaluation step and continue with the new set of solutions. We repeat this process until the stopping criterion is met.
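The loop above can be sketched as a minimal random-search-style multiobjective optimizer. This isn’t any particular named algorithm, just an illustration of the initialize-evaluate-rank-reproduce cycle; all the names and the toy objectives are our own, and all objectives are minimized:

```python
import random

def evolve(evaluate, mutate, init, generations=100, pop_size=30, seed=0):
    """Minimal multiobjective loop: initialize -> evaluate -> keep the
    non-dominated solutions -> refill the population with mutants -> repeat.
    All objectives are minimized."""
    rng = random.Random(seed)
    pop = [init(rng) for _ in range(pop_size)]
    front = pop
    for _ in range(generations):
        scored = [(x, evaluate(x)) for x in pop]
        # A solution survives if no other scored solution dominates it,
        # i.e., is at least as good everywhere and strictly better somewhere.
        front = [x for x, fx in scored
                 if not any(all(b <= a for a, b in zip(fx, fy)) and fy != fx
                            for _, fy in scored)]
        pop = front + [mutate(rng.choice(front), rng)
                       for _ in range(pop_size - len(front))]
    return front

# Toy problem: minimize both x^2 and (x - 2)^2.
front = evolve(
    evaluate=lambda x: (x * x, (x - 2) ** 2),
    mutate=lambda x, rng: x + rng.gauss(0, 0.1),
    init=lambda rng: rng.uniform(-5, 5),
)
# The survivors approximate the Pareto-optimal range x in [0, 2].
```

By construction, no returned solution dominates another, which is exactly the Pareto-frontier property described above.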

Multiobjective algorithms have several advantages over traditional optimization methods. They’re best suited to situations where we must prioritize different objectives.

**Multiobjective algorithms can help us understand how changes to one objective affect other objectives.** Additionally, we can find the set of Pareto-optimal solutions representing the best trade-off between all objectives, which allows decision-makers to choose the solution that best fits their needs. Furthermore, these algorithms assist in decision-making across many fields, such as engineering, robotics, design, and public policy.

These algorithms can handle large-scale, complex problems with a large number of decision variables and objectives. Additionally, multiobjective algorithms are robust to noise in the objective functions, making them more reliable for real-world problems.

**Multiobjective optimization algorithms can provide a better decision-making process** by providing a set of alternatives for the decision-maker to choose from instead of a single solution.

Now let’s talk about the disadvantages. **Multiobjective algorithms are typically more complex than traditional algorithms, making them difficult to implement and understand.** Additionally, these algorithms often require making trade-offs between different objectives. In practice, trade-offs between different objectives are challenging to perform.

Due to the high computational expense, **multiobjective algorithms suffer from scalability issues.** Finally, it’s hard to visualize the solutions to multiobjective problems.

In this tutorial, we discussed multiobjective algorithms and Pareto frontiers with examples. We explored the general steps in a multiobjective algorithm and the importance of the Pareto frontier. Finally, we presented some advantages and disadvantages of multiobjective algorithms.

The post Defining Multiobjective Algorithms and Pareto Frontiers first appeared on Baeldung on Computer Science.