Explore the differences between high-level and low-level designs in software engineering.

The post High-Level Design vs. Low-Level Design first appeared on Baeldung on Computer Science.

In this tutorial, we’ll discuss low-level and high-level design types in software engineering.

An aerial view of a building shows its overall structure, residential floors, commercial spaces, and amenities and how they connect seamlessly. Similarly, a high-level design (HLD) is like the aerial view of a software system. It defines the overall architecture, major components, and how they interact. We can think of it as the blueprint that lays the foundation for our software architecture.

So, HLD is the outcome of an initial phase of the design process where architects, system designers, or project managers discuss the structure and components of a system at a high level. **HLD shows the overall architecture and functionality of the system without delving into implementation details.** To learn more about system architecture, check out our article on layered architectures.

Let’s dive into some key characteristics of HLD:

- **It specifies database design, systems, services, platforms, and relationships among modules**
- It contains a description of hardware and software interfaces, as well as user interfaces
- **A solution architect creates it**
- It details the user’s typical process workflow, along with performance specifications
- It describes significant limitations and trade-offs of the system

Just like architects focus on the budgets and timeline for their building along with a high-level design, in this phase, we weigh the costs and benefits of architectural design choices for our software. Though focusing on functionality and scalability is important, neglecting the financial and resource implications can result in inefficiencies and budget overruns later in the development cycle. For teams to make informed decisions that align with project goals and constraints, they must also consider these factors. This involves development time, maintenance complexity, infrastructure requirements, and potential scalability issues.

Moving inside the building, we dive into the exact layouts, materials, plumbing, electrical wiring, and more. Similarly, a low-level design (LLD) fleshes out the implementation details of every component specified in an HLD, such as class diagrams, algorithms, database schemas, and error-handling mechanisms.

In simple words, **LLD is the next stage in the design process, where designers and developers translate the high-level design into detailed technical specifications.**

Some key characteristics of LLD are:

- **The development manager and designers create it**
- It involves the actual logic of all the system components
- **It includes a step-by-step plan for the system’s implementation**
- It works as a guide for developers during coding, ensuring that system functionalities are accurate and efficient

We can say that LLD delves into the technical specifics, translating the high-level design into detailed implementation plans.

So, we have two design types: high-level and low-level design.

Let’s summarize their differences:

| Aspect | Low-Level Design | High-Level Design |
|---|---|---|
| Focus | Detailed implementation specifics | Overall system architecture and design |
| Components covered | Individual components/modules | Major system components and their interactions |
| Design artifacts | Class diagrams, data structures, algorithms, database schemas, error handling | Architecture diagrams, component diagrams, interface definitions, data flow diagrams |
| Input | High-Level Design (HLD) | Business requirements, constraints, and goals |
| Output | Code-level implementation | High-Level Design (HLD) plan |
| Time of implementation | After HLD, during the coding phase | Early in the software development lifecycle |
| Example decisions | Data structure choices, algorithm design, database normalization | Architectural style (e.g., microservices, monolith), technology stack, integration points |
| Also known as | Micro-level design | Macro-level design |
| Outcome | Program specification and unit test plan | Review record, function design, and database design |

HLD sets the software system’s overarching architecture and functionality, providing a roadmap for development to fulfill the requirements, while LLD deals with the implementation aspects. Our article on requirements contains details on how to differentiate between functional and non-functional ones.

In this article, we explained low-level and high-level software designs. HLD provides the vision, and LLD gives the detailed blueprint for execution. Knowing how to create both plans is essential for successful software development.

NaNs can occur during the training of ML models and derail it. In this article, we review the common causes and the fixes we can apply.

The post Common Causes of NaNs During Training first appeared on Baeldung on Computer Science.

While training neural networks, we can sometimes get NaNs (Not-a-Number) as output.

**In this tutorial, we’ll review the common causes of NaNs in training neural networks and discuss how to prevent them**.

Training a neural network means adjusting weights to optimize performance, usually through error backpropagation. Training depends on several factors, including the network architecture parameters and the hyperparameters, such as the loss function, optimizer, and learning rates.

If these different factors are not configured correctly, they can often cause erroneous outputs, such as NaNs. The appearance of NaNs usually means that the neural network has encountered an error during training.

There are several causes of NaNs, which we discuss in detail here.

Real-world data often contains erroneous, missing, and inconsistent values due to human errors, data-collection issues, and other factors. These errors can sometimes manifest as NaNs in the input data.

During neural network training, a neuron takes values from the input data, weights, and bias and applies a mathematical operation that yields an output. The mathematical operation yields a NaN output if a NaN value is passed as input to a neuron. Let’s consider a dataset of food prices in different cities, expressed in USD:

| City | Eggs | Milk |
|---|---|---|
| London | 2.5 | 7.9 |
| Venice | NaN | 8.9 |
| Prague | 4.6 | NaN |

The second and third rows have NaN values for one of the two price columns. Performing a mathematical operation on them will yield NaNs as output.
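As a minimal illustration, any arithmetic a neuron performs on a NaN input yields NaN (the weight and bias below are made-up values):

```python
import math

price_eggs = float("nan")  # the missing Venice value from the table above
weight, bias = 0.8, 0.1

# Any arithmetic involving NaN yields NaN, so the neuron's output is NaN
output = price_eggs * weight + bias
print(math.isnan(output))  # True
```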

**In machine learning, the learning rate is a parameter that determines the magnitude of the step at each iteration of gradient descent**. The goal is minimizing the loss function, hence reducing the output error.

Setting the optimal learning rate is crucial for performance. If the learning rate is too high, it can lead to very large parameter updates during gradient descent, resulting in unusually large values and potentially causing what is known as ‘exploding gradients.’ Exploding gradients occur when the updates become significantly large during training, leading to unstable training and causing NaNs.
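We can sketch this effect with plain gradient descent on the toy loss f(x) = x², whose gradient is 2x (the learning rates here are illustrative): a small rate converges, while an oversized one makes the updates grow without bound until they overflow and turn into NaN.

```python
import math

def descend(lr, steps=400):
    # Gradient descent on f(x) = x^2, whose gradient is 2x
    x = 1.0
    for _ in range(steps):
        grad = 2 * x
        x = x - lr * grad
        if math.isnan(x):
            break  # the update exploded: overflow to inf, then inf - inf = NaN
    return x

print(descend(0.1))   # converges toward 0
print(descend(10.0))  # diverges and ends up as NaN
```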

**The activation function transforms an input into an output through a mathematical operation during the forward pass**. Let’s consider the softmax activation as an example:

Softmax takes input values and outputs a set of probabilities. However, studies have shown that the softmax equation can sometimes result in a division by 0 in the denominator. This can lead to NaNs during training.

**The loss function computes the magnitude of the error between the model’s predicted output and the actual output**. Similar to the activation function, it can result in numerical errors, which can cause NaNs.

For example, the error calculated by the loss function could be so large that it exceeds the expected values of the loss function, resulting in NaNs.

Avoiding NaNs requires addressing their underlying causes. Let’s explore how we can avoid NaNs based on the causes we looked at:

Data preprocessing is a crucial step before training that involves any techniques applied to data to transform it from its raw state to a usable format. This includes handling categorical values, standardizing values, removing outliers, and handling missing and inconsistent values to prepare the data before training.

**To address errors such as NaNs in the input data,** we can employ a preprocessing imputation approach that replaces the NaNs with zeroes:

| City | Eggs | Milk |
|---|---|---|
| London | 2.5 | 7.9 |
| Venice | 0 | 8.9 |
| Prague | 4.6 | 0 |

This ensures the input data has no occurrence of NaNs before being passed to the model for training. Instead of zeros, we can use column means, medians, or any value considered neutral.
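A minimal stdlib-only sketch of both strategies (zero-fill and mean imputation) on the price table above:

```python
import math
import statistics

prices = {"Eggs": [2.5, float("nan"), 4.6], "Milk": [7.9, 8.9, float("nan")]}

def impute(column, strategy="zero"):
    # Replace NaNs with zero or with the mean of the observed values
    observed = [v for v in column if not math.isnan(v)]
    fill = 0.0 if strategy == "zero" else statistics.mean(observed)
    return [fill if math.isnan(v) else v for v in column]

print(impute(prices["Eggs"]))          # [2.5, 0.0, 4.6]
print(impute(prices["Milk"], "mean"))  # the NaN becomes the mean of 7.9 and 8.9
```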

Hyperparameter optimization or tuning refers to finding the optimal hyperparameters for a neural network model that minimizes the loss function. These hyperparameters include parameters like learning rate and batch size. This is often an iterative process that involves testing different combinations of hyperparameters to find values that yield the best performance from the neural network.

In cases where NaNs result from the learning rate, **extensive hyperparameter tuning can help set the optimal learning rate for the model’s training process**. An adaptive learning rate or a learning rate scheduling technique can automatically adjust the learning rate during training. Learning rate schedulers automatically modify the learning rate according to a predefined schedule, while adaptive techniques automatically adjust the learning rate to optimize the loss function.
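As a toy example of scheduling, a step-decay scheduler can shrink the learning rate on a fixed timetable (the drop factor and interval below are arbitrary choices, not recommendations):

```python
def step_decay(initial_lr, step, drop=0.5, every=10):
    # Halve the learning rate every `every` training steps
    return initial_lr * (drop ** (step // every))

print(step_decay(0.1, 0))   # 0.1
print(step_decay(0.1, 10))  # 0.05
print(step_decay(0.1, 25))  # 0.025
```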

**To handle NaNs caused by activation functions, we can implement activation functions whose computations are robust enough to handle numerical errors**.

In the case of division-by-zero errors, implementing error-handling mechanisms can help to ensure that NaNs are not propagated through the network. For example, we can modify the computation of softmax by adding a small value to the denominator to prevent division by zero.
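A sketch of such a stabilized softmax: it combines the small-epsilon denominator mentioned above with the standard shift-by-max trick, so the exponentials can neither overflow nor all underflow to zero (the epsilon value is an arbitrary choice):

```python
import math

def stable_softmax(values, eps=1e-12):
    # Subtract the max before exponentiating so exp() can't overflow,
    # and add a tiny eps to the denominator to rule out division by zero
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps) + eps
    return [e / total for e in exps]

probs = stable_softmax([1000.0, 1000.0, 999.0])
print(any(math.isnan(p) for p in probs))  # False
print(round(sum(probs), 6))               # 1.0
```

Without the shift, `math.exp(1000)` alone would already fail with an overflow, and in array libraries the resulting inf values would turn into NaNs.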

**To address loss function instability, we can employ numerically stable loss functions.** Like the activation functions discussed previously, we can modify the mathematical operations within loss functions to ensure they do not yield NaNs. This reduces the risk of NaNs being propagated throughout the network.

Additionally, we can scale loss values that approach extremely large or small magnitudes, as these can manifest as NaNs. This involves setting a predefined range for the loss function’s outputs.

**Gradient clipping is a technique to limit the value of the gradients within a predefined range**. For example, it is a common approach in gradient clipping to set a threshold or range for computed gradients during training. Values beyond the predefined threshold are scaled down to be within the predefined range. This approach helps to prevent very large values that manifest as NaNs during training.
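The two common variants can be sketched as follows: clipping each component to a fixed range, and rescaling the whole gradient vector when its L2 norm exceeds a threshold (the thresholds below are illustrative):

```python
import math

def clip_by_value(grads, limit):
    # Clamp each gradient component into [-limit, limit]
    return [max(-limit, min(limit, g)) for g in grads]

def clip_by_norm(grads, max_norm):
    # Rescale the whole gradient vector if its L2 norm exceeds the threshold
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        return [g * max_norm / norm for g in grads]
    return grads

print(clip_by_value([5.0, -7.0, 0.5], 1.0))  # [1.0, -1.0, 0.5]
print(clip_by_norm([3.0, 4.0], 1.0))         # rescaled to unit norm
```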

In the previous sections, we reviewed the different causes of NaNs in training and the measures that can be taken to prevent NaNs. Now, let’s link the causes with the measures we can take to address them:

| Causes | Actions to take |
|---|---|
| Input data errors | Extensive data preprocessing |
| Learning rate | Hyperparameter tuning and gradient clipping |
| Activation function | Implement robust activation functions |
| Loss function | Stabilize the loss function |

By understanding the causes of NaNs and the appropriate measures that can be applied, we can minimize the occurrence of NaNs and effectively train neural networks.

In this article, we provided an overview of the common causes of NaNs during the training of neural networks. These sources include data errors, learning-rate issues, activation function abnormalities, and problems with loss function computations.

To avoid NaNs, we can clean data, tune learning rates, clip the gradients, and use robust activation and loss functions that aren’t susceptible to NaNs.

A quick tutorial to determine if a binary number is divisible by 3.

The post Check if a Binary Number Is Divisible by 3 first appeared on Baeldung on Computer Science.

In this tutorial, we’ll explore several ways to check whether a binary number is divisible by three.

First, we’ll show how to accomplish it by converting binary into decimal numbers. Then, we’ll see how to check by counting odd and even digits. Finally, we’ll explore how to create our own Deterministic Finite Automaton to check if a binary number is divisible by three.

One way to determine whether a binary number is divisible by three is to first convert it to a decimal number (base 10) and then check if such a number is divisible by three.

For example, let’s take the binary number 1100101101 and convert it to a decimal number:

```
1100101101 = (1 * 2^0) + (0 * 2^1) + (1 * 2^2) + (1 * 2^3) + (0 * 2^4) + (1 * 2^5) + (0 * 2^6)
+ (0 * 2^7) + (1 * 2^8) + (1 * 2^9)
1100101101 = 1 + 0 + 4 + 8 + 0 + 32 + 0 + 0 + 256 + 512
1100101101 = 813
```

Let’s go over the conversion steps. Starting with the rightmost digit, we multiplied each digit by its position’s power of two, starting with the exponent zero and increasing it by one for each subsequent digit. To get the final value, we summed up the multiplication results. Accordingly, 813 corresponds to the binary number 1100101101.

Going further, we’ll check whether the given number is divisible by three using the divisibility rule. **The rule says that a decimal number is divisible by three if the sum of its digits is also divisible by three.**

Thus, we can add the digits of 813 as 8 + 1 + 3, which equals 12. Since 12 is divisible by three, so is 813, giving 271 as the quotient.
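The steps above translate into a short Python sketch: convert the binary string to a decimal value and test it with the modulo operator.

```python
def binary_to_decimal(bits):
    # Multiply each digit by its positional power of two, starting from the right
    return sum(int(b) * 2 ** i for i, b in enumerate(reversed(bits)))

def divisible_by_three(bits):
    return binary_to_decimal(bits) % 3 == 0

print(binary_to_decimal("1100101101"))   # 813
print(divisible_by_three("1100101101"))  # True
```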

Another way we can check if a binary number is divisible by three is by applying a rule similar to the one for checking whether the decimal number is divisible by 11. **The rule states that a decimal number is divisible by 11 if the alternating difference and sum of its digits, processed from left to right, is divisible by 11**. This same pattern applies to binary numbers for testing divisibility by three.

Next, let’s apply and test the rule using the binary representation of the number 46, which is 00101110. We’ll alternate between subtracting and adding the digits:

```
x <- 0 - 0 + 1 - 0 + 1 - 1 + 1 - 0
x <- 2
```

As a result, we obtained the number two, which is not divisible by three. Therefore, the number 00101110 isn’t divisible by three.

**Moreover, we can rewrite the formula to specifically count the ones in the odd and even positions in the number**. If the difference between the counts is zero or divisible by three, then the number is divisible by three.

Next, let’s check whether 00101110 is divisible by three by counting the ones in odd and even positions:

```
count the ones in odd positions:
x <- 1 + 1 + 1
x <- 3
count the ones in even positions:
y <- 1
subtract the counts:
z <- x - y
z <- 3 - 1
z <- 2
```

Again, we got a number that isn’t divisible by three, which means 00101110 isn’t divisible by three.
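The counting rule can be sketched in Python; positions are 1-indexed from the left, as in the example above:

```python
def divisible_by_three_by_counts(bits):
    # Count ones at odd and even positions (1-indexed from the left);
    # the number is divisible by 3 if the difference is 0 or a multiple of 3
    odd = sum(1 for i, b in enumerate(bits, start=1) if b == "1" and i % 2 == 1)
    even = sum(1 for i, b in enumerate(bits, start=1) if b == "1" and i % 2 == 0)
    return (odd - even) % 3 == 0

print(divisible_by_three_by_counts("00101110"))  # False: 46 isn't divisible by 3
print(divisible_by_three_by_counts("1001"))      # True: 9 is divisible by 3
```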

Lastly, let’s see how to achieve the same functionality using a Deterministic Finite Automaton, or DFA for short. At a high level, **a DFA is an abstract machine that accepts or rejects a given string of symbols by running through a state sequence determined by the string**.

It consists of:

- A finite set of states (Q)
- A finite set of input symbols (Σ)
- A transition function (δ)
- An initial state (q_{0})
- A set of final (accepting) states (F)

When given an input string, a DFA traverses through its states based on the transitions dictated by the input symbols. The transition function tells us where to move next based on the current state and the input symbol. **If the DFA ends up in an accepting state after processing the entire input string, the string is accepted; otherwise, it’s rejected**.

Let’s construct a DFA to recognize binary numbers divisible by three.

It’ll have three states, q_{0}, q_{1}, and q_{2}, each representing one of the possible remainders modulo three: 0, 1, or 2.

Since we’re dealing with binary numbers, the input symbols will be 0 and 1.

One way to represent the transition function is by using a transition table. To fill the transition table, let’s calculate the next state for each state based on the input symbol.

If the current state is q (the current remainder) and the input symbol is zero, we’ll move to state (2q) mod 3. This is because **when we append zero to a binary string, the value doubles**:

```
x <- 2x
// example
11 (decimal 3) -> 110 (decimal 6)
```

Similarly, if the input symbol is one, we’ll move to the state (2q + 1) mod 3. **When we concatenate one to a binary string, the value doubles and increases by one**:

```
x <- 2x + 1
// example
11 (decimal 3) -> 111 (decimal 7)
```

Based on these facts, to calculate the next state, we’ll use the transition function δ(q, x) = (2q + x) mod 3.

Let’s explore its components:

- q represents the current state
- x represents the input symbol
- δ(q, x) represents the next state

Now, let’s see how to determine the next state if the current state equals zero and the input symbol is one:

```
δ(0, 1) <- (2 * 0 + 1) mod 3
δ(0, 1) <- 1
```

The calculated result implies that, given the current state (q = 0) and the input (x = 1), we need to move to state one (q_{1}).

Using the same method, we can populate the entire table:

| Current State | Next State for Input 0 (x=0) | Next State for Input 1 (x=1) |
|---|---|---|
| q_{0} (0) | q_{0} | q_{1} |
| q_{1} (1) | q_{2} | q_{0} |
| q_{2} (2) | q_{1} | q_{2} |

Next, based on the calculated values, let’s create a state diagram:

Each node represents one of the states. Arrows illustrate transitions from one state to another based on the input value.

**Our starting point will be q_{0}, regardless of the digit we’re beginning with.** Moreover, q_{0} is also our only accepting state, since it corresponds to a remainder of zero.

Now that we’ve explored the theory, let’s check whether our DFA works correctly. Let’s use the binary number 1001 as an example. It represents the number nine in a decimal system. Additionally, the automaton reads the input from left to right.

We’ll start the process with *q_{0}*, which represents the initial state. The first digit is one, so we’ll move to *q_{1}*.

Our next digit is zero, which requires us to shift from *q_{1}* to *q_{2}*.

Then, because the next digit is zero, we need to go back to *q_{1}*.

Finally, the last digit is one, moving us back to the initial position, *q_{0}*, which is also our accepting state. Therefore, the DFA accepts 1001, confirming that nine is divisible by three.

We can create a DFA to test divisibility by other numbers in a similar manner. The only difference would be in the number of states the DFA would have.
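The whole automaton reduces to a few lines of Python, since each state is just the remainder so far and the transition function is δ(q, x) = (2q + x) mod n:

```python
def dfa_divisible(bits, n=3):
    # States are the remainders modulo n; reading bit b moves state q
    # to (2q + b) mod n, and the string is accepted if we end in q0
    state = 0
    for b in bits:
        state = (2 * state + int(b)) % n
    return state == 0

print(dfa_divisible("1001"))       # True: 9 is divisible by 3
print(dfa_divisible("00101110"))   # False: 46 isn't
print(dfa_divisible("1010", n=5))  # True: 10 is divisible by 5
```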

In this article, we learned how to use different ways to determine whether a binary number is divisible by three.

To sum up, one way to achieve this is to convert the binary number into a decimal number and then use the divisibility by three rule. It’s also possible to achieve the same functionality by counting the odd and even ones in the number and comparing their differences. Finally, creating the DFA is another way we can use to check whether the binary number is divisible by three.

Learn about p-hacking in statistics and how to avoid it.

The post What Is P-Value Hacking? first appeared on Baeldung on Computer Science.

In this tutorial, we’ll explain the concept of p-value hacking in research and provide some examples.

P-value hacking or p-hacking refers to research malpractice whose goal is getting statistically significant results.

Usually, we assess our research hypotheses through significance tests. We collect samples, choose an adequate statistical test, compute its statistic, and calculate the statistic’s p-value. **The result is statistically significant if the corresponding p-value is lower than the chosen significance threshold (0.01 or 0.05).** Significant results mean rejecting the null in favor of the alternative hypothesis.

The null hypothesis typically states there’s no effect, e.g.:

- the drug being tested does not improve patients’ condition or
- there’s no relationship between age and political preferences

The alternative hypothesis states there’s an effect, e.g., that the drug is effective or that two factors, such as age and party membership, are correlated. So, most of the time, **the alternative hypothesis is what we want to prove.**

**Journals prefer significant results because they constitute evidence for scientific hypotheses** (for instance, that a new drug is more efficient than the old therapy). In contrast, non-significant results mean that we failed to find evidence. Given the pressure to “publish or perish,” several techniques have emerged in the scientific communities as “boosters” of significance that increase a manuscript’s chance of publication. They’re known as p-hacking and are regarded as research malpractice, although not all instances of it are intentional. Let’s check out two such methods.

Here, we’re simultaneously testing our data as we’re collecting them. **After each sample or batch, we perform a statistical test and calculate the p-value.** If it’s lower than 0.01 or 0.05, we stop data acquisition. Otherwise, we collect more data and re-evaluate the test. This is repeated until we either get a significant result or can’t collect more samples.

This is considered malpractice because we’re adapting our data to fit the hypothesis we want to prove and stopping just when we think we’ve succeeded. Given that p-values are uniformly distributed over [0, 1] under the null hypothesis, **enlarging the dataset will eventually result in a significant p-value and an incorrect rejection of the null even if the null holds.**

Let’s say we’re testing a drug for insomnia. In the experiment, patients reported how many hours they had slept the night they took the drug and the day before. If the drug works, we expect them to sleep more the night they took the medicine. So, we conduct a related-samples t-test with the following null and alternative hypotheses:

- Null: the mean sleep duration is the same in both groups
- Alternative: the mean is greater for the night when the medicine was taken

We’ll simulate the two groups by drawing samples from the same distribution, the normal with a mean of 5 and a standard deviation of 1. Let’s check what happens if we keep adding patients one by one and calculating the p-values after each new test subject.

We got the first significant result after including the third additional patient. However, if we keep adding more patients, the p-value won’t stay under the 0.05 threshold. It rises above and falls below it several times. So, just because the p-value got under the significance threshold once, it doesn’t mean it won’t go up. In fact, in our example, the p-value is larger than 0.05 in most cases. There’s no way to justify using the first p-value lower than the significance threshold and ignoring the rest.
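The simulation can be sketched with the standard library alone. Note two labeled simplifications: the p-value uses a normal approximation instead of the t distribution, and the seed is arbitrary, so the exact p-value trajectory is illustrative only.

```python
import math
import random

def one_sided_p(diffs):
    # t statistic for "mean difference > 0", with a normal approximation
    # for the p-value (a real analysis would use the t distribution)
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    t = mean / math.sqrt(var / n)
    return 0.5 * math.erfc(t / math.sqrt(2))

random.seed(7)  # arbitrary seed for reproducibility
# Both "nights" are drawn from the same N(5, 1): the drug truly has no effect
diffs, p_values = [], []
for _ in range(60):
    diffs.append(random.gauss(5, 1) - random.gauss(5, 1))
    if len(diffs) >= 3:
        p_values.append(one_sided_p(diffs))

# Optional stopping declares "significance" at the first interim p below 0.05,
# even though the interim p-values keep drifting above and below the threshold
print(min(p_values), any(p < 0.05 for p in p_values))
```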

**Outliers are extreme values in a sample that are too small or too large compared to the rest of the data.** We can justify their exclusion by saying they’re too different from the typical values and that by disregarding them, we focus our analysis on the subjects (measurements) we’re most likely to encounter in practice.

However, there’s no universal definition of “too large” and “too small.” As a result, **there are many ways to define outliers,** e.g.:

- measurements that are more than three standard deviations larger or smaller than the sample mean
- top 5% and bottom 5% of the data
- the values that are greater than 95% of the sample maximum or lower than 105% of its minimum

**This flexibility allows researchers to choose the outlier definition that makes the results significant.**

Let’s say we tested a sleeping drug on 100 patients with insomnia and recorded their sleep durations the day before and the night they took the drug. Then, we calculated the differences and decided to use the one-sample t-test to check if the differences were equal to zero or greater than it. In this simulation, we drew the values from a normal distribution with a mean of 8 and a standard deviation of 1. The results turned statistically insignificant, with a p-value of 0.075.

That’s pretty close to the usual threshold of 0.05, so we check what happens if we exclude outliers. We can define them as the differences related to the patients whose sleep durations on either night are not within three standard deviations from the means (for the respective night). The 3-sigma rule is not uncommon in statistics, so we feel comfortable using it, and voilà! We exclude one test subject and get a p-value of 0.027.

However, we can define outliers in other ways. For instance, we can say that sleep durations shorter than one hour and longer than twelve hours are unusual and can be disregarded. This outlier definition results in excluding a different test subject, which yields a p-value of 0.105.

The 3-sigma rule excluded one subject who slept 5 hours less having taken the medicine, which gave us a significant result. The second rule kept the subject in the sample but excluded another one, yielding a p-value greater than 0.1. Both outlier definitions make sense but lead to different conclusions. Using the one that’s convenient for us isn’t scientifically warranted (unless there’s another reason why the more convenient outlier definition is preferred).
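We can sketch how two defensible outlier rules flag different observations; the data below are made up for illustration (29 typical nights plus two extremes):

```python
import statistics

def exclude_3sigma(sample):
    # Drop values farther than three standard deviations from the sample mean
    m, s = statistics.mean(sample), statistics.stdev(sample)
    return [x for x in sample if abs(x - m) <= 3 * s]

def exclude_implausible(sample, low=1.0, high=12.0):
    # Drop sleep durations outside a "plausible" range: a different,
    # equally defensible rule
    return [x for x in sample if low <= x <= high]

durations = [8.0] * 29 + [0.9, 12.5]

print(exclude_3sigma(durations))       # drops 0.9 but keeps 12.5
print(exclude_implausible(durations))  # drops both 0.9 and 12.5
```

Because each rule removes a different set of points, any downstream test statistic changes with the choice of rule, which is exactly the degree of freedom p-hacking exploits.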

**To avoid p-hacking, we can preregister our experiment’s design and analysis plans** on a public website. This includes all the decisions related to data acquisition and statistical techniques with which we’ll analyze data:

- From where and how will we collect our sample?
- How large a sample will we have?
- What will be our dependent variables, and which variables will be independent?
- Which statistical tests will we use?
- Which assumptions do we make about the data and statistical models?

Preregistration should stop us from resorting to p-hacking even if we’re tempted to do it.

We can also use Bayesian statistics to avoid p-values and the associated problems. However, there’s a related problem in Bayesian methods: B-hacking, i.e., boosting the Bayes factor to confirm the hypothesis we want to prove.

In this article, we defined p-value hacking in statistics and explained it through examples. It refers to **tailoring data acquisition and analysis to get statistically significant results** (supporting the hypothesis we want to prove). Preregistering data collection and analysis plans can reduce the chance of intentionally or unintentionally p-hacking.

Explore the differences between horizontal and vertical partitioning in databases.

The post Horizontal and Vertical Partitioning in Databases first appeared on Baeldung on Computer Science.

Databases play a crucial role in modern applications, storing and managing vast amounts of data efficiently. In the same context, horizontal and vertical partitioning are two common techniques used for managing data in databases.

**In this tutorial, we’ll delve into the differences between these two approaches.**

Partitioning is indispensable for modern databases, particularly in the face of big data, cloud computing, and real-time applications. As data volumes burgeon exponentially, partitioning offers a crucial solution by distributing data across multiple servers or nodes, thereby facilitating horizontal scalability and alleviating performance bottlenecks.

Besides, this approach not only optimizes query performance by spreading the workload but also enhances resource utilization and cost-effectiveness in cloud environments. Partitioning also ensures high availability and responsiveness for real-time applications, mitigating the risk of single points of failure and enabling parallel data processing.

**In essence, partitioning stands as a cornerstone strategy for modern database management, empowering organizations to effectively manage data growth, achieve scalability, and meet the demanding requirements of today’s data-driven landscape.**

Also referred to as sharding, this technique involves dividing a database table into multiple partitions based on rows, with each partition containing a subset of the rows from the original table.

Let’s suppose we have a bustling e-commerce platform that manages a vast array of products.

In this approach, the product inventory data is split into shards based on the product key. Imagine organizing the products into neat little groups alphabetically. Moreover, each shard takes care of a specific chunk of products, like those from A to G or H to Z.

**This smart sharding technique spreads the workload across multiple machines, easing the strain and significantly boosting performance.**
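A minimal sketch of this routing logic, assuming the hypothetical two-shard A–G/H–Z split described above:

```python
def shard_for(product_name):
    # Hypothetical range-based shard key: products A-G on shard 0, H-Z on shard 1
    return 0 if product_name[0].upper() <= "G" else 1

shards = {0: [], 1: []}
for product in ["Apple", "Banana", "Monitor", "Grapes", "Zucchini"]:
    shards[shard_for(product)].append(product)

print(shards)  # {0: ['Apple', 'Banana', 'Grapes'], 1: ['Monitor', 'Zucchini']}
```

In a real system, the same shard key decides which database server receives each row's reads and writes.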

Vertical partitioning involves splitting a database table into multiple partitions based on columns. Besides, each partition contains a subset of the columns from the original table.

Let’s switch gears and consider employing vertical partitioning for our product inventory scenario. This method entails dividing our database table into multiple partitions, not by rows this time but by columns.

In this example, different properties of an item are stored in different partitions. One partition holds data that is accessed more frequently, including product name, description, and price. Another partition holds inventory data: the stock count and last-ordered date.

**This technique is useful for optimizing query performance by reducing the amount of data retrieved for each query.**
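A toy sketch of the column split described above; the column names and the choice of "hot" columns are illustrative assumptions:

```python
# One product row, split column-wise into a "hot" partition (frequently read)
# and a "cold" partition holding the inventory details
product = {
    "product_name": "Mechanical keyboard",
    "description": "87-key, hot-swappable switches",
    "price": 89.99,
    "stock_count": 112,
    "last_ordered": "2024-03-01",
}

HOT_COLUMNS = {"product_name", "description", "price"}

hot_partition = {k: v for k, v in product.items() if k in HOT_COLUMNS}
cold_partition = {k: v for k, v in product.items() if k not in HOT_COLUMNS}

print(sorted(hot_partition))   # ['description', 'price', 'product_name']
print(sorted(cold_partition))  # ['last_ordered', 'stock_count']
```

A query that only needs the name and price now touches the smaller hot partition instead of the full row.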

The following table provides a clear comparison between horizontal and vertical partitioning in terms of data distribution, scalability, query performance, and their respective use cases:

| Key Differences | Horizontal Partitioning | Vertical Partitioning |
|---|---|---|
| Data Distribution | Distributes data across multiple partitions based on rows | Divides data based on columns, grouping related columns together |
| Scalability | Facilitates scalability by distributing data across multiple servers or nodes | Can improve scalability by reducing the size of each partition, making queries more efficient |
| Query Performance | Can improve query performance by distributing the workload across multiple nodes | Improves query performance by reducing the amount of data retrieved for each query |
| Use Cases | Suitable for applications with large datasets that need to be distributed across multiple servers | Beneficial for optimizing query performance by reducing the number of columns retrieved |

Various tools and frameworks support partitioning in database management systems (DBMS). Proprietary solutions like Oracle and SQL Server offer robust features for partitioning, allowing users to partition tables based on criteria such as range, list, or hash.

Moreover, open-source databases like MySQL and PostgreSQL provide partitioning capabilities through features like table inheritance and declarative partitioning. NoSQL databases, such as MongoDB, offer sharding mechanisms for horizontal partitioning across clusters. Additionally, cloud-based platforms like Amazon Aurora and Google Cloud Spanner integrate partitioning into their scalable architectures.

**These options cater to diverse needs, enabling efficient management and scalability of databases.**

In summary, horizontal partitioning distributes data by rows, while vertical partitioning divides it by columns. By employing the appropriate strategy, organizations can effectively manage and scale their databases for modern data-driven applications.

Explore the details of implementing the elitism concept in evolutionary algorithms using Python.

The post Implementation of Elitism in Evolutionary Algorithms first appeared on Baeldung on Computer Science.

In evolutionary algorithms, we use the concept of elitism to select the best-performing individuals from one generation and transfer them directly to the next generation without applying mutation or crossover operations.

Moreover, the primary purpose of the elitism method is to preserve the best solutions and maintain diversity in the population. As a result, the elitism method helps to improve the convergence speed and robustness of evolutionary algorithms.

In this tutorial, **we’ll discuss how to implement the elitism method in evolutionary algorithms in detail using Python**.

We start an evolutionary algorithm by generating an initial population. Furthermore, **the initial population consists of individuals representing the solution space of an optimization problem**. Typically, we model each individual as a string of 0s and 1s, but the representation can vary depending on the given problem.

To generate the initial population in Python, we first import the *random* module. Furthermore, we specify the size of the population and the length of the binary string representing each individual:

```
import random
population_s = 15
individual_s = 6
```

**Therefore, the initial population contains 15 individuals, each a binary string of length 6**.

Now, we define a function in Python that generates the population with 15 individual binary strings:

```
def generate_p(population_s, individual_s):
    p = []
    for _ in range(population_s):
        indi = ''.join([random.choice('01') for _ in range(individual_s)])
        p.append(indi)
    return p
```

Moreover, we add a *print* statement to display the initial population:

`print("The generated Initial population is:", generate_p(15, 6))`

Finally, let’s take a look at the output:

```
The generated Initial population is: ['000101', '101000', '111101', '000101', '010000',
'000010', '110000', '110001', '001000', '011010', '111000', '100010', '001000', '110101', '111100']
```

Thus, based on the given problem, we can change the initial population size and the length of the individuals.

After generating the initial population, the next step is to define a fitness function. In particular, **the fitness function plays a crucial role in evolutionary algorithms, as it measures the quality of each individual by assigning it a numerical score**. The score reflects the individual’s suitability as a solution to the given problem.

Let’s assume we have a problem where the more 1s an individual contains, the more suitable it is as a solution. With this in mind, let’s implement a simple fitness function that counts the number of 1s in each individual:

```
def fitness_function(indi):
    ft = indi.count('1')
    return ft
```

Furthermore, we implement a function that applies the fitness function we defined and calculates the fitness score of each individual in the initial population:

```
def calculate_fitness(p):
    ft_scores = []
    for indi in p:
        ft_scores.append(fitness_function(indi))
    return ft_scores

p = generate_p(15, 6)
ft_scores = calculate_fitness(p)
print("Fitness scores are:", ft_scores)
```

Finally, let’s see the fitness scores of the individuals:

`Fitness scores are: [2, 2, 5, 2, 1, 1, 2, 3, 1, 3, 3, 2, 1, 4, 4]`

Based on the fitness scores, we can select a particular individual or a set of individuals to proceed to the next step.

After calculating the fitness scores, **we apply the elitism approach to select top individuals from the initial population**. Here, we pick the top 20% of the individuals with the highest fitness score:

```
def selection_elitism(p, ft_scores):
    sorted_ft_scores = sorted(zip(p, ft_scores), key=lambda x: x[1], reverse=True)
    t1 = int(len(sorted_ft_scores) * 0.2)
    elite_indi = [indi for indi, score in sorted_ft_scores[:t1]]
    return elite_indi
```

Furthermore, based on the given problem and requirements, we can vary the percentage of individuals we pick from the initial population. Moreover, we add a short snippet to display the elite individuals:

```
sorted_ft_scores = sorted(zip(p, ft_scores), key=lambda x: x[1], reverse=True)
t1 = int(len(sorted_ft_scores) * 0.2)
print("The elite individuals and their fitness scores:")
for i, (indi, score) in enumerate(sorted_ft_scores[:t1]):
    print(f"Individual {i+1} ({indi}): Fitness score {score}")
```

Thus, let’s display the top 20% of elite individuals from the initial population:

```
The elite individuals and their fitness scores:
Individual 3 (111101): Fitness score 5
Individual 14 (110101): Fitness score 4
Individual 15 (111100): Fitness score 4
```

According to the elitism approach, the top 20% of the individuals we pick from the initial population will directly move to the next generation of the population without any change.

After we select the top elite individuals, the rest go through some operations to generate the next population generation. Therefore, to increase the quality of individuals for the next generation, we use two operators: crossover and mutation.

In the crossover operation, first, we select two parent chromosomes and a crossover point. Furthermore, **based on this selection, we exchange genetic information or chromosomes to generate two new individuals who inherit traits from both parents**:

```
def crossover_operation(pr1, pr2):
    cross_p = random.randint(1, len(pr1) - 1)
    new_indi1 = pr1[:cross_p] + pr2[cross_p:]
    new_indi2 = pr2[:cross_p] + pr1[cross_p:]
    return new_indi1, new_indi2

pr1 = [0, 0, 0, 1, 0, 1]
pr2 = [1, 0, 1, 0, 0, 0]
new_indi1, new_indi2 = crossover_operation(pr1, pr2)
print("New Individual 1:", new_indi1)
print("New Individual 2:", new_indi2)
```

In this case, we randomly selected the crossover point: *cross_p*. Hence, after we complete the crossover process, we get two new individuals for the next generation of the population:

```
New Individual 1: [0, 0, 1, 0, 0, 0]
New Individual 2: [1, 0, 0, 1, 0, 1]
```

Similarly, we can select a different pair of individuals from the initial population and generate a pair of new individuals using the crossover operation.

Moving forward, we can also use the mutation operation to generate a new population from the initial population. Furthermore, **in the mutation process, we randomly alter some values from the individual of the initial population**. Thus, it introduces diversity and helps evolutionary algorithms find optimal solutions faster.

To implement the mutation operation, we need to define a mutation probability. In this implementation, **we set the mutation probability to 10%**. Then, for each bit of the individual bitstring, we generate a random number between 0 and 1 using the *random.random()* function. If the random number is less than the mutation probability, we flip the bit:

```
def mutation_operation(indi, mutation_p):
    mutated_indi = indi.copy()
    for i in range(len(mutated_indi)):
        if random.random() < mutation_p:
            mutated_indi[i] = 1 - mutated_indi[i]
    return mutated_indi

indi = [1, 0, 1, 0, 1, 0]
mutation_p = 0.1
mutated_indi = mutation_operation(indi, mutation_p)
print("Original individual bitstring:", indi)
print("Mutated individual bitstring:", mutated_indi)
```

After the completion of the mutation process, we get the new mutated individual for the next generation of the population:

```
Original individual bitstring: [1, 0, 1, 0, 1, 0]
Mutated individual bitstring: [1, 0, 0, 1, 1, 0]
```

Finally, we can apply the mutation operation to all the individuals in the initial population to generate new individuals for the new population.

At this point, we have picked the top individuals from the initial population by implementing the concept of elitism. Furthermore, we explored how to apply the crossover and mutation operation to generate new individuals for the new population. Now, let’s see an example of how the elitism concept works in practice:
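Here’s one possible sketch of a full generation step, combining the selection, crossover, and mutation pieces above. The parent-selection scheme, elite fraction, and mutation rate used here are our own illustrative choices:

```python
import random

def fitness(indi):
    # Count of 1s, matching the fitness function above
    return indi.count('1')

def next_generation(p, elite_fraction=0.2, mutation_p=0.1):
    ranked = sorted(p, key=fitness, reverse=True)
    n_elite = max(1, int(len(p) * elite_fraction))
    new_p = ranked[:n_elite]  # elites pass through unchanged
    while len(new_p) < len(p):
        pr1, pr2 = random.sample(ranked, 2)   # simple random parent choice
        cross_p = random.randint(1, len(pr1) - 1)
        child = pr1[:cross_p] + pr2[cross_p:]  # one-point crossover
        child = ''.join(
            ('1' if b == '0' else '0') if random.random() < mutation_p else b
            for b in child
        )  # bit-flip mutation
        new_p.append(child)
    return new_p

random.seed(42)
p = [''.join(random.choice('01') for _ in range(6)) for _ in range(15)]
best_start = max(fitness(i) for i in p)
for _ in range(10):
    p = next_generation(p)
best_end = max(fitness(i) for i in p)
print("Best fitness:", best_start, "->", best_end)
```

Because the elites are copied over untouched, the best fitness in the population can never decrease from one generation to the next.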

**Hence, in this implementation, we picked three individuals from the initial population based on their fitness scores**. The rest of the individuals go through the crossover and mutation process. Via the crossover and mutation process, we modify the bits and exchange bits between individuals to boost the probability of finding optimal solutions.

However, as we can see, we add the elite individuals from the initial population to the new population without changing any bits.

In this article, we discussed the details of implementing the elitism concept in evolutionary algorithms using Python. Additionally, we explored the intermediate steps and presented the implementation details of the selection, crossover, and mutation operation.

The post Implementation of Elitism in Evolutionary Algorithms first appeared on Baeldung on Computer Science. ]]>Learn when to use fold-left and fold-right in functional programming.

The post When to Use Fold-Left and Fold-Right? first appeared on Baeldung on Computer Science.]]>In this tutorial, we’re going to look at the Fold Left and Fold Right operations for collections. We’ll explore the difference between them and the cases when we should use each.

**In functional programming, folding is a standard operation that can be used to collapse collections down to a single result**. It works by going over the collection and applying the same accumulator function to combine the currently accumulated result with the next one in the collection.

A common example is summing a collection of numbers. In this case, the accumulator function that we want to apply to the numbers in the collection is *plus*. The desired result is:

`result <- 1 + 2 + 3;`

We can describe this as folding the collection with the *plus* function:

```
plus <- (acc, next) => acc + next;
result <- [1, 2, 3].fold(plus);
```

This equates to the following:

`result <- plus(plus(1, 2), 3);`

We can clearly see that the outcome is the same as our desired result.

Notably, **we’ll sometimes see the term Reduce used instead of Fold**. Depending on the context, this might be nothing more than an alternative word for the same operation. However, in other contexts, it can mean a fold operation with a provided initial value.

Let’s see an example:

`result <- [1, 2, 3].reduce(plus, 4);`

This is the same as:

`result <- plus(plus(plus(4, 1), 2), 3);`

**The big advantage here is that we no longer need to have the output of the accumulator function be of the same type as the collection entries**. In our previous Fold example, the accumulator function is required to have two parameters and a return value that are all the same type. These also need to be of the same type as the entries in the collection we’re working with.

With a Reduce operation, the output of the accumulator function can now be of a different type compared to the collection entries, as long as it’s the same as the provided initial value and the first parameter to our accumulator function:

```
concat <- (acc, next) => acc + next.toString();
result <- [1, 2, 3].reduce(concat, "");
```

This equates to the following:

`result <- concat(concat(concat("", 1.toString()), 2.toString()), 3.toString());`

Here, *result* will be a string, whereas the collection entries are integers.

When talking about folding operations, we’ll often see them referred to as Fold Left or Fold Right. **This determines whether we apply our accumulator function starting from the left or the right end of our collection**.

If no direction is specified, this typically means Fold Left. In this case, we want to start combining values from the left-hand side of our collection.

This is exactly what we’ve already seen:

```
result <- [1, 2, 3, 4].foldLeft(plus);
result <- plus(plus(plus(1, 2), 3), 4);
```

Here, we combine the first two entries in the collection with our accumulator function. Next, we combine this result with the next entry in the collection. This continues until we reach the end of the collection.

**Fold Right means that we start combining values from the right-hand side of our collection**:

```
result <- [1, 2, 3, 4].foldRight(plus);
result <- plus(1, plus(2, plus(3, 4)));
```

In this case, we combine the first entry in the collection with the result of folding the rest of the collection. We then repeat this for every step until we reach the end of the collection.

Fold Left and Fold Right seem very similar to each other, so what’s the difference? **The most significant difference is the order in which the operations are applied**. In some cases, this makes no difference at all. However, in other cases, it can be critical.

If our accumulator function is associative, then Fold Left and Fold Right will produce exactly the same result. We can see this directly from the definition of the associative property:

`(x + y) + z = x + (y + z) for all x, y, z in S`

The left-hand side of this equation corresponds to how Fold Left works, whereas the right-hand side corresponds to how Fold Right works. Therefore, if the operation is associative, then Fold Left and Fold Right will produce the same result.

However, **if the operation isn’t associative, then we’ll get different results**. For example, instead of addition, let’s look at the *subtract* function:

```
leftResult <- [1, 2, 3].foldLeft(subtract); // (1 - 2) - 3 = -4
rightResult <- [1, 2, 3].foldRight(subtract); // 1 - (2 - 3) = 2
```
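The same contrast can be sketched in Python with minimal iterative *fold_left* and *fold_right* helpers (our own illustrative implementations, not a standard library API):

```python
def fold_left(xs, fn):
    # Combine from the left: fn(fn(xs[0], xs[1]), xs[2]) ...
    acc = xs[0]
    for x in xs[1:]:
        acc = fn(acc, x)
    return acc

def fold_right(xs, fn):
    # Combine from the right: fn(xs[0], fn(xs[1], xs[2])) ...
    acc = xs[-1]
    for x in reversed(xs[:-1]):
        acc = fn(x, acc)
    return acc

sub = lambda a, b: a - b
print(fold_left([1, 2, 3], sub))   # (1 - 2) - 3 = -4
print(fold_right([1, 2, 3], sub))  # 1 - (2 - 3) = 2
```

With an associative function like addition, both directions produce the same result; with subtraction, they diverge exactly as shown in the pseudocode.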

Consequently, if we know that our accumulator function isn’t associative, then it becomes important that we select the correct direction to fold in. However, if our accumulator function is associative, then we’ll need to explore other reasons for selecting between folding left or right.

**Another factor to consider is how efficient the implementations are**. The traditional definitions of Fold Left and Fold Right are recursive:

```
algorithm FoldLeft(collection, initial, fn):
    if collection.empty:
        return initial
    head <- collection[0]
    tail <- collection.slice(1)
    return FoldLeft(tail, fn(initial, head), fn)

algorithm FoldRight(collection, initial, fn):
    if collection.empty:
        return initial
    head <- collection[0]
    tail <- collection.slice(1)
    return fn(head, FoldRight(tail, initial, fn))
```

Notably, these examples are being provided with an initial value, but this is only to make them easier to follow. It’s equally possible to implement both without using initial values.

The two algorithms are very similar, with the only difference being how the recursive call is made. However, this difference matters for efficiency. In particular, **Fold Left is tail recursive, whereas Fold Right isn’t**. This means **we can easily rewrite Fold Left to be iterative**, **which is more efficient in both time and memory usage**:

```
algorithm FoldLeft(collection, initial, fn):
    result <- initial
    for next in collection:
        result <- fn(result, next)
    return result
```

Rewriting Fold Right to be iterative is much harder, and in some cases, it may be impossible – it would require the collection to be reverse-iterable, which not all collections are.

This then means that Fold Left is likely to be more efficient than Fold Right for the same collection. However, this assumes that the operation is associative, as we saw earlier. Additionally, the efficiency gain is only likely to matter with very large collections.

In this article, we’ve had a look at what folding is in the context of functional programming. We’ve explored some of the differences between Fold Left and Fold Right, and some of the ways to help select between them.

The post When to Use Fold-Left and Fold-Right? first appeared on Baeldung on Computer Science. ]]>Learn how to use the proportion of variance to choose the number of components in PCA.

The post What Is Proportion of Variance? first appeared on Baeldung on Computer Science.]]>In this tutorial, we’ll show how to use the proportion of variance to set the number of principal components in Principal Component Analysis (PCA) and its main theoretical aspects. Moreover, we’ll present a numerical example aiming to highlight its importance.

A key challenge we face in Principal Component Analysis (PCA) is to define the number of principal components (PCs). If we select a large number of PCs, we lose the benefit of dimensionality reduction. Or even worse, we increase computational complexity and encounter difficulties visualizing high-dimensional spaces. On the other hand, with a small number of PCs, we might miss an important aspect of the data. Therefore, this can lead us to wrong conclusions in subsequent analyses.

To avoid this problem, we can compute the proportion of our data’s variance. This shows the dispersion, or how spread the data points are around the mean. But why is this important when it comes to PCA? The principal components that **PCA computes are ordered by the variance they explain. **Additionally, these components are uncorrelated and, therefore, orthogonal to each other. This ensures that each component captures variance in the data that no other component captures. So, if we have the proportion of variance for each component, we can choose the first few components with the greatest cumulative proportion of explained variance.

Here is the general workflow of how things work:

As an illustration, let’s consider an example with three variables. In this case, each column of the matrix X corresponds to a variable, and each row corresponds to a data point:

The corresponding covariance matrix for this data is:

**We can find the variances in the diagonal of the covariance matrix.** Their sum is equal to 3.3392, which represents the overall variability, while the off-diagonal elements represent the covariances between the three variables.

Next, we perform a PCA on our data X with three components (n=3) and get 2.7706, 0.4657, and 0.1028 as the explained variances. **Notably, these are the eigenvalues of the covariance matrix.**

But how can we interpret these results? Since the first eigenvalue is 2.7706, the first PC explains 2.7706/3.3392 = 82.97% of the dataset’s overall variability. So, this principal component captures a lot of the information in the data. The second component accounts for 0.4657/3.3392 = 13.94% of the variability, and the last principal component explains 0.1028/3.3392 = 3.08% of the variance.

This means that if we drop the third PC, we’ll still explain 82.97% + 13.94% = 96.91% of the dataset’s overall variability. So, the first two components are sufficient to explain our data.
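We can sketch this computation in Python with NumPy, using a small randomly generated dataset (not the matrix X from the example above) in which two of the three variables are correlated:

```python
import numpy as np

# Hypothetical dataset: three variables, two strongly correlated
rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# The eigenvalues of the covariance matrix are the explained variances
cov = np.cov(X, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order

# Proportion of variance per PC and its cumulative sum
proportion = eigvals / eigvals.sum()
cumulative = np.cumsum(proportion)
print(proportion, cumulative)
```

Because the first two variables are redundant, the first principal component ends up carrying most of the variance, mirroring the worked example.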

There’s no universal threshold of cumulative proportion after which we can stop increasing the number of PCs. However, there are some guidelines and rules of thumb.

First, the threshold depends on the objective of our analysis and the constraints involved. If we aim to draw critical conclusions that provide secure and meaningful insights, we should aim for a proportion of around 95%. **In general applications, a good value ranges from 80% to 95%.** But again, this choice is somewhat arbitrary and should be stated explicitly when reporting and presenting any analysis. In some critical applications, the threshold can be even higher.

However, for highly correlated variables, having a low variance proportion isn’t a problem. In that case, there is a lot of redundancy across the variables. For that reason, a small proportion of explained variance is usually enough to represent the variability of the dataset. Additionally, we may have constraints in our model, which means we simply can’t have as many PCs as we want. This might also lead us to accept a lower proportion of variance.

Another tool that we can use is a scree plot. **In this graph, we see the cumulative explained variance for each principal component alongside the individual explained variances.** Here is an example:

These are the cumulative explained variances for the seven components: 0.39, 0.59, 0.79, 0.93, 0.98, 0.99, 1. This means that the first PC alone accounts for 39% of the total variability. We can easily see that with four PCs, we represent 93% of the variability of the data. At this point, there’s an abrupt change in the slope of the curve in red, characterizing an “elbow”. This is the point at which we can stop adding PCs, since further components don’t contribute significantly to the total explained variance. Adding the last three PCs would increase the explained variance by only 7%.
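A minimal sketch of reading a threshold off such a curve, assuming individual proportions consistent with the cumulative values quoted above (hypothetical data):

```python
import numpy as np

# Individual explained-variance proportions whose cumulative sums
# match the values quoted in the text
individual = np.array([0.39, 0.20, 0.20, 0.14, 0.05, 0.01, 0.01])
cumulative = np.cumsum(individual)

# Smallest number of PCs whose cumulative proportion reaches 90%
n_pcs = int(np.searchsorted(cumulative, 0.90) + 1)
print(cumulative, n_pcs)
```

With these numbers, four components are enough to cross a 90% threshold, matching the elbow in the plot.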

In this article, we discussed and illustrated the proportion of variance in the context of PCA. **This indicator shows us how much variability of the dataset the principal components explain.**

**We use the proportion to choose the components that, together, explain most of the variance. This rule of thumb implicitly determines the number of PCs in PCA** and reduces the dimensionality of the dataset while maintaining fidelity to the original data.

Learn how to implement the recursive top-down and iterative bottom-up versions of Merge Sort in Python.

The post Python Implementations of Top-Down and Bottom-Up Merge Sort first appeared on Baeldung on Computer Science.]]>When implementing Merge sort, we can follow the top-down or bottom-up approach.

In this tutorial, we’ll delve into the Python implementation of both approaches.

**Top-down Merge Sort recursively divides a list into smaller halves, sorts each half, and then merges the sorted halves back together** to form a sorted list:

```
def merge_sort_top_down(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left_half = arr[:mid]
    right_half = arr[mid:]
    sorted_left = merge_sort_top_down(left_half)
    sorted_right = merge_sort_top_down(right_half)
    return merge(sorted_left, sorted_right)
```

Top-down Merge Sort creates a new list during each merge, which keeps the process simple and ensures the sort is stable. This way, we don’t need to worry about overwrites or errors that come with in-place merging. It fits well with the recursive, divide-and-conquer nature of the algorithm. It takes up extra space but brings clarity and reliability.

First, the function checks if the list has no elements or has exactly one. This is the base case of the recursion: such lists are already sorted.

Next comes the core part. The first thing to do is find the list’s middle index. To do this, we integer-divide the length by 2. Then, we split the list into two equal parts. The left half will have elements from the beginning of the list up to the middle, but it will not include the middle element. The right half will have elements from the list’s middle to end.

**We sort the two halves recursively using *merge_sort_top_down*.** This implements the divide-and-conquer strategy.

The next and final step is to use the external *merge* function to merge the sorted portions into one final sorted portion for the entire input list. The purpose is to combine two sorted lists to form one merged and sorted list.

**The merge function is a helper to the Merge Sort algorithm. It combines two sorted sublists (left, right) into a single sorted output list (merged):**

```
def merge(left, right):
    merged = []
    left_index = 0
    right_index = 0
    while left_index < len(left) and right_index < len(right):
        if left[left_index] <= right[right_index]:
            merged.append(left[left_index])
            left_index += 1
        else:
            merged.append(right[right_index])
            right_index += 1
    merged.extend(left[left_index:])
    merged.extend(right[right_index:])
    return merged
```

To start, *left_index* and *right_index* both equal zero at the start of the function, indicating which elements of the left and right lists are next to merge. At this point, we initialize another empty list, *merged*. It will hold the single merged list in sorted order.

Now begins the *while* loop, which continues until we exhaust one of the input lists. We iterate over the lists and insert the smaller of *left[left_index]* and *right[right_index]* into *merged*. Upon placing an element into the *merged* list, we increment the index corresponding to the input list from which it came.

After the loop ends, there may be leftover elements in one of the input lists because either *left_index = len(left)* or *right_index = len(right)*. The *left[left_index:]* is an empty list in the former case. In the latter, *right[right_index:]* is empty. We append all leftover elements using the *extend* method.
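Putting the two functions together (repeated here in condensed form so the snippet runs on its own), we can sanity-check the top-down sort:

```python
def merge(left, right):
    # Combine two sorted lists into one sorted list
    merged = []
    left_index, right_index = 0, 0
    while left_index < len(left) and right_index < len(right):
        if left[left_index] <= right[right_index]:
            merged.append(left[left_index])
            left_index += 1
        else:
            merged.append(right[right_index])
            right_index += 1
    merged.extend(left[left_index:])
    merged.extend(right[right_index:])
    return merged

def merge_sort_top_down(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    return merge(merge_sort_top_down(arr[:mid]), merge_sort_top_down(arr[mid:]))

print(merge_sort_top_down([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```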

**The Bottom-up Merge Sort **treats each element in the list as a sorted sublist of size one. Then, it iteratively merges the consecutive pairs of the sublists, effectively doubling their sizes in each iteration. This process continues iteratively until the entire list is sorted:

```
def merge_sort_bottom_up(arr):
    n = len(arr)
    auxiliary_array = [None] * n
    width = 1
    while width < n:
        for i in range(0, n, 2 * width):
            left = i
            mid = min(i + width, n)
            right = min(i + 2 * width, n)
            merge(arr, auxiliary_array, left, mid, right)
        for i in range(n):
            arr[i] = auxiliary_array[i]
        width *= 2
    return arr
```

This method eliminates the need for the recursive stack calls required in the top-down approach. As a result, **the bottom-up Merge Sort can be more memory-efficient in some scenarios.**

We start by finding the list’s length and initializing an *auxiliary_array* of the same length, with all of its values set to *None*. This *auxiliary_array* serves as a temporary swap area for merging sublists. Then, we initialize the variable *width* with the value 1; it denotes the size of the sublists to be merged. Now, the function is ready to merge sublists while iteratively doubling their sizes, starting with the smallest possible sublists of size 1.

The function combines increasingly larger sorted sublists within the master list. It starts with the tiniest possible components—single elements—and gradually doubles the merging range in every iteration of the loop. The indices *left*, *mid*, and *right* denote the endpoints of the consecutive sublists to merge.

The *merge* function combines all the sublists defined by *left*, *mid*, and *right* indices into the *auxiliary_array*. **Only after all the sublists of the current width have been processed and merged are the elements from the auxiliary_array copied back to arr so that arr contains sorted sublists. After that, the algorithm doubles the width.**

Eventually, the merged sublists form a single sorted list.

Here’s the merge function we use in the bottom-up approach:

```
def merge(arr, auxiliary_array, left, mid, right):
    i, j, k = left, mid, left
    while i < mid and j < right:
        if arr[i] <= arr[j]:
            auxiliary_array[k] = arr[i]
            i += 1
        else:
            auxiliary_array[k] = arr[j]
            j += 1
        k += 1
    while i < mid:
        auxiliary_array[k] = arr[i]
        i += 1
        k += 1
    while j < right:
        auxiliary_array[k] = arr[j]
        j += 1
        k += 1
```

The process starts by setting three index variables (*i*, *j*, and *k*) to track the positions of the passed lists.

Then, we enter a loop comparing two elements from the two sublists. If the current element from the first sublist is smaller than or equal to the current element from the second sublist, we place this element in the output and increment the *i* index. Otherwise, we do the same for the second sublist but increment the *j* index.

After processing both sublists, data may remain in either the first or second sublists. The function handles this by extending the *auxiliary_array* with any leftover elements.
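Assembled into one standalone snippet (condensed from the two functions above), the bottom-up sort can be sanity-checked as well:

```python
def merge(arr, aux, left, mid, right):
    # Merge arr[left:mid] and arr[mid:right] into aux[left:right]
    i, j, k = left, mid, left
    while i < mid and j < right:
        if arr[i] <= arr[j]:
            aux[k] = arr[i]
            i += 1
        else:
            aux[k] = arr[j]
            j += 1
        k += 1
    while i < mid:
        aux[k] = arr[i]
        i += 1
        k += 1
    while j < right:
        aux[k] = arr[j]
        j += 1
        k += 1

def merge_sort_bottom_up(arr):
    n = len(arr)
    aux = [None] * n
    width = 1
    while width < n:
        for i in range(0, n, 2 * width):
            merge(arr, aux, i, min(i + width, n), min(i + 2 * width, n))
        arr[:] = aux  # copy the merged pass back before doubling the width
        width *= 2
    return arr

print(merge_sort_bottom_up([8, 3, 5, 1, 9, 2, 7]))  # [1, 2, 3, 5, 7, 8, 9]
```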

**In this article, we showed how to implement top-down and bottom-up versions of Merge Sort in Python.**

The top-down method is best suited to environments without tight stack limits, where the additional memory overhead of recursion isn’t a concern.

On the other hand, the bottom-up approach to Merge Sort eliminates recursion and drastically reduces the related stack overhead. The bottom-up method is handy when considering large datasets: since stack overflow is not a consideration, we can sort arbitrarily large datasets with this method.

The size of the dataset that can be processed using the bottom-up Merge Sort is limited by the available system memory. This means that while the algorithm is efficient for large datasets, ** the amount of data it can handle is determined by the memory capacity of the hardware it runs on.**

Learn how hyperspectral imaging works.

The post What Is Hyperspectral Imaging? first appeared on Baeldung on Computer Science.]]>In this tutorial, **we’ll introduce the field of hyperspectral imaging (HSI)**. First, we’ll describe the basics of HSI and the process of capturing spectral information in images. Then, we’ll discuss some useful applications of HSI and finally mention some challenges of this technology.

In recent years, many exciting technologies have appeared in the imaging field. **Hyperspectral Imaging (HSI) has emerged as a handy tool that enables us to capture rich information across the whole electromagnetic spectrum.** This way, we can identify objects and materials far beyond what the human eye or a conventional camera can capture.

In traditional imaging, we compute the reflection intensity of the light in three bands (Red, Green, and Blue) to capture the visual information, resulting in an RGB image. However, helpful information may be hidden in the remaining light spectrum beyond these three distinct bands. That’s where HSI comes in. It** involves capturing and analyzing hundreds or thousands of bands across the electromagnetic spectrum, from ultraviolet to infrared.**

An intermediate stage between HSI and RGB is Multispectral Imaging (MSI), which captures spectral information in 10 to 100 bands. The rich amount of data in HSI allows for exact analysis, identifying materials by their spectral signatures, which detail how objects interact with light across various wavelengths.

Below, we can see a comparison between HSI and RGB imaging. While in RGB, we capture only three discrete wavelengths of an image, in HSI, we capture a large number of wavelengths, resulting in richer information:

Now, let’s delve a bit into how the technology behind HSI works.

In practice, special sensors placed on satellites, aircraft, or microscopic devices capture the light reflected from objects, dividing it into numerous bands that represent different wavelengths. **The final image is a three-dimensional data cube consisting of two spatial dimensions (height and width), as in traditional imaging, along with a third dimension containing the spectral information.**

Each cube slice represents the object’s image at a different wavelength, offering a comprehensive view of its spectral signature. Below, we can see an example of HSI capturing a flower:

The technology of HSI comes with numerous applications. Let’s dig into the most important ones.

Monitoring a large number of fields is a very difficult task. Farmers leverage the power of HSI to monitor crop health, optimize the use of pesticides, and manage their resources more efficiently. Specifically, by monitoring the fields through HSI, they gain access to very detailed information on the state of the crops in terms of diseases, nutrient deficiencies, and more without having to check every crop manually at all times.

Another useful application of HSI is its ability to track specific minerals and chemicals that impact environmental pollution. In this way, scientists can capture Earth’s atmosphere through HSI to estimate the degree of environmental pollution, or assess water quality by taking HSI images of the ocean.

In the medical domain, HSI can be very beneficial in identifying cancerous cells in tissues. Specifically, the high precision in capturing the details of a human tissue allows scientists to build better prediction systems that identify if a human tissue contains cancerous cells.

Despite its benefits, HSI comes with many challenges that we should always consider when we attempt to use this technology.

**In particular, HSI generates a vast amount of data that needs a lot of storage and computing power.** For example, an RGB image stores three values per pixel, while an HSI image with the same resolution and 1k bands stores a thousand, so even a modest image quickly reaches millions of values. Therefore, when we use HSI technology, we should always consider how we’ll save and process this massive amount of data.
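As a rough back-of-the-envelope sketch (using a hypothetical 640×480 sensor and band counts of our own choosing, not figures from the text), the data volume grows linearly with the number of bands:

```python
# Hypothetical sensor resolution
height, width = 640, 480

rgb_values = height * width * 3    # RGB: three bands per pixel
msi_values = height * width * 50   # multispectral: roughly 10-100 bands
hsi_values = height * width * 300  # hyperspectral: hundreds of bands

print(rgb_values, hsi_values, hsi_values // rgb_values)  # 921600 92160000 100
```

Even at this modest resolution, the hyperspectral cube holds two orders of magnitude more values than the RGB image.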

In this article, we discussed how hyperspectral imaging works. First, we introduced HSI and the technique behind this technology, and then we presented some of its applications and challenges.

The post What Is Hyperspectral Imaging? first appeared on Baeldung on Computer Science. ]]>