Feeds:
Posts

# Computational complexity theory

Computational complexity theory is a branch of the theory of computation in computer science that investigates the problems related to the resources required to run algorithms, and the inherent difficulty in providing algorithms that are efficient for both general and specific computational problems.

##  Definitions

An engineer might ask “As the size of the input to an algorithm increases, by what factor does the running time and memory requirements increase? And what are the implications of that?” In other words, complexity theory, among other things, investigates the scalability of computational problems and algorithms. In particular, it places practical limits on what computers can and cannot do.

A problem is a collection of related questions, where each question is a finite string, not enumerated but written in an algebra called Big O notation, where the actual amount of resources uses numbers only to represent orders of magnitude and not the exact resources used by a particular machine.

##  Analysis

###  Big O notation

Main article: Big O notation

The time complexity of a problem is the number of steps that it takes to solve an instance of the problem as a function of the size of the input, using the most efficient known algorithm. If an instance has length n and can be solved in n2 steps we can say the problem has a time complexity of n2.

But of course, the exact amount of resources will depend on what machine or language is being used. To avoid that difficulty, the Big O notation is generally used (the O stands for the “order” of the calculation). If a problem runs in O(n2) on one computer, then (barring perverse effects) it will also run in O(n2) on all others, even though it may take longer or shorter depending on what resources they have. So this notation allows us to generalize away from the details of a particular computer.

For example, searching an unsorted list of words for a particular word will take, on average, half the time of the size of the list, because if one starts from the beginning or end one must (on average) inspect half the words in the list before finding it. If the word does not exist, one must inspect the whole list to discover that fact, so actually it could be worse, depending on how likely it is that a word in the input is in the list.

But a binary search algorithm is logarithmic in time over size (it is O(log n)). By analogy, humans roughly know where an entry would be in a phone book or dictionary, and use strategies quickly to get to their target, such as using headwords or a thumb index quickly to get roughly to the right place, then use a linear search when they are close to the target.

It is important to note that most computational complexity theory is useful in other engineering fields and has made contributions to mathematics that would be of interest even if computers did not exist.

The master theorem can be useful to calculate the time complexity of a recursive algorithm.

###  Worst case analysis

A common approach to computational complexity is the worst-case analysis, the estimation of the largest amount of computational resources (time or space) needed to run an algorithm. In many cases, the worst-case estimates are rather too pessimistic to explain the good performance of some algorithm in real use on real input. To address this, other approaches have been sought, such as average-case analysis or smoothed analysis.

###  Axiomatic analysis

Main article: NP-complete

An axiomatic approach to computational complexity was developed by Manuel Blum. He introduced measures of complexity such as the size of a machine and measures of the computational overhead of recursive functions. The complexity of algorithms are special cases of the axiomatic complexity measure. Blum axioms help in the study of computational complexity in the most general cases.

An important aspect of the theory is to categorize computational problems and algorithms into complexity classes. The most important open question of complexity theory is whether the complexity class P is the same as the complexity class NP, or whether it is a strict subset as is generally believed.

Shortly after the question was first posed, it was realised that many important industry problems in the field of operations research are of an NP subclass called NP-complete. NP-complete problems have solutions that are quick to check, yet current methods to find those solutions are not “efficiently scalable” (as described more formally below). If the NP class is larger than P, then NP-complete problems admit no polynomial-time algorithm and any all-purpose solution to the problem is not efficiently scalable.

The fact that the P vs. NP problem has not been solved, prompts and justifies various research areas in the theory, such as:

• spotting and solving special cases of common computational problems
• the study of the computational complexity itself
• finding heuristic solutions
• researching into the hierarchies of complexity classes
• exploring multivariate algorithmic approaches such as Parameterized complexity.

##  History

The first publication on computational complexity appeared in 1956. But this publication in Russian was for a long time unknown in the Western world.

The beginning of studies in computational complexity can be attributed not only to Trahtenbrot, but also to Rabin.

Hartmanis and Stearns’ paper became the most popular in this area. As a result, some researchers attribute the beginning of computational complexity theory only to Hartmanis, Lewis and Stearns.

Andrey Kolmogorov‘s research in the theory of algorithms influenced the development of computational complexity theory. A notable early discovery was the Karatsuba algorithm in 1960, for the multiplication of two numbers. This algorithm disproved Kolmogorov’s 1956 conjecture that the fastest multiplication algorithm must be O(n2), and thus helped launch the study of algorithms in earnest. In 1967, Manuel Blum developed an axiomatic complexity theory based on his axioms and proved an important result, the so called, speed-up theorem.

The field was subsequently expanded by many researchers, including:

On August 6, 2002, the AKS primality test was published in a paper “PRIMES is in P” by three Indian computer scientists. They showed a deterministic primality-proving algorithm they had made. The algorithm determines whether a number is a Prime number or Composite number in polynomial time, and was soon improved by others. The key significance of AKS is that it was the first published primality-proving algorithm to be all of general, polynomial, deterministic, and unconditional. The authors received the 2006 Gödel Prize and the 2006 Fulkerson Prize for their work.

##  Computational complexity theory topics

###  Time and space complexity

Complexity theory attempts to describe how difficult it is for an algorithm to find a solution to a problem. This differs from computability theory, which describes whether a problem can be solved at all: a branch of science probably made most famous by Alan Turing‘s essay On Computable Numbers, with an Application to the Entscheidungsproblem.

###  Decision problems

Much of complexity theory deals with decision problems. A decision problem is one where the answer is always “yes” or “no”. Some problems are undecidable, or at least seem so, so complexity theory can be used to distinguish problems where it is certain to get a correct “yes” or “no” (not necessarily both). A problem that reverses which can be relied upon is called a complement of that problem.

For example, the problem IS-PRIME says “yes” if its input is a prime and “no” otherwise, while the problem IS-COMPOSITE says “yes” if the number is not prime. Either may be unpredictable when the correct result would be “no”, they may say “yes” or perhaps never finish even if they were actually going to produce the right result, given an infinite number of monkeys.

Decision problems are often interesting to researchers because a problem can always be reduced to a decision problem. For instance, a problem HAS-FACTOR may be:

Given integers n and k find whether n has any prime factors less than k.

If we can solve this problem with known maximum resources we can use that solution to solve the next problem FACTORIZE without many more, using a binary search on factors of k until we find the smallest, divide out that factor and repeat until all the factors are found (i.e. we end up with 1 or 0).

###  Resources

Complexity theory analyzes the difficulty of computational problems in terms of many different computational resources. A problem can be described in terms of many requirements it makes on resources: time, space, randomness, alternation, and other less-intuitive measures[vague]. A complexity class is the class of all problems which can be solved using a certain amount of a certain computational resource.

There are other measures of computational complexity. For instance, communication complexity is a measure of complexity for distributed computations.

A different measure of problem complexity, which is useful in some cases, is circuit complexity. This is a measure of the size of a boolean circuit needed to compute the answer to a problem, in terms of the number of logic gates required to build the circuit. Such a measure is useful, for example, when designing hardware microchips to compute the function instead of software.

Perhaps the most studied computational resources are for determinism in time or space. These resources represent the amount of time or space needed by a deterministic computer. These resources are of great practical interest, and are well-studied.

Some computational problems are easier to analyze in terms of more unusual resources. For example, a nondeterministic Turing machine is a computational model that is allowed to branch out to check many different possibilities at once. The nondeterministic Turing machine has very little to do with how we physically want to compute algorithms, but its branching exactly captures many of the mathematical models we want to analyze, so that nondeterministic time is a very important resource in analyzing computational problems.

Many more unusual computational resources have been used in complexity theory. Technically, any complexity measure can be viewed as a computational resource, and complexity measures are very broadly defined by the Blum complexity axioms.

###  Complexity classes

A complexity class is the class of all of the computational problems which can be solved using a certain amount of a certain computational resource.

The complexity class P is the class of decision problems that can be solved by a deterministic machine in polynomial time. This class corresponds to an intuitive idea of the problems which can be effectively solved in the worst cases.

The complexity class NP is the set of decision problems that can be solved by a non-deterministic Turing machine in polynomial time. This class contains many problems that people would like to be able to solve effectively, including the Boolean satisfiability problem, the Hamiltonian path problem and the vertex cover problem. All the problems in this class have the property that their solutions can be checked efficiently.

Many complexity classes can be characterized in terms of the mathematical logic needed to express them – this field is called descriptive complexity.

##  NP completeness and other open questions

###  P = NP

Main article: Complexity classes P and NP

The question of whether P = NP (can problems that can be solved in non-deterministic polynomial time also always be solved in deterministic polynomial time?) is one of the most important open questions in theoretical computer science because of the wide implications of a solution. If the answer is yes, many important problems can be shown to have more efficient solutions that are now used with reluctance because of unknown edge cases. These include various types of integer programming in operations research, many problems in logistics, protein structure prediction in biology, and the ability to find formal proofs of pure mathematics theorems.

The P = NP problem is one of the Millennium Prize Problems proposed by the Clay Mathematics Institute the solution of which is a US\$1,000,000 prize for the first person to provide a solution.

Questions like this motivate the concepts of hard and complete

###  Hard

A class of problems X is hard for a class of problems Y if every problem in Y can be transformed “easily” (that is to say, it may take a lot of effort but the person solving the problem knows a solution exists) into some problem in X that produces the same solution. The definition of “hard” (or rather “easy”) problems differs by context. For P = NP, “hard” means NP-hard— a class of problems that are not necessarily in NP themselves, but to which any they can be reduced to an NP problem in polynomial time.

###  Complete

The class X is complete for Y if it is hard for Y and is also a subset of it.

Thus, the class of NP-complete problems contains the most “difficult” problems in NP, in the sense that they are the ones most likely not to be in P. Because the problem P = NP is not solved, being able to reduce another problem, Π1, to a known NP-complete problem, Π2, would indicate that there is no known polynomial-time solution for Π1. This is due to the fact that a polynomial-time solution to Π1 would yield a polynomial-time solution to Π2. Similarly, because all NP problems can be reduced to the set, finding an NP-complete problem that can be solved in polynomial time would mean that P = NP.

###  Incomplete

Incomplete problems are those in NP that are neither NP-complete nor in P. In other words, incomplete problems can neither be solved in polynomial time nor are they hard problems.

It has been shown that if P ≠ NP then there exist NP-incomplete problems.

###  Graph isomorphism

An important unsolved problem in this context is, whether the graph isomorphism problem is in P, NP-complete, or NP-incomplete. The answer is not known, but there are strong hints that the problem is at least not NP-complete. The graph isomorphism problem asks whether two given graphs are isomorphic.

###  The NP = co-NP problem

Co-NP is the class containing the complement problems (i.e. problems with the yes/no answers reversed) of NP problems. It is believed[who?] that the two classes are not equal; however, it has not yet been proven. It has been shown that if these two complexity classes are not equal no NP-complete problem can be in co-NP and no co-NP-complete problem can be in NP.

Every problem in P is also in both NP and co-NP. Integer factorization is an example of a problem in both NP and co-NP that is not known to be in P.

##  Intractability

Problems that can be solved but not fast enough for the solution to be useful are called intractable. Naive complexity theory assumes problems that lack polynomial-time solutions are intractable for more than the smallest inputs. Problems that are known to be intractable in this sense include those that are EXPTIME-complete. If NP is not the same as P, then the NP-complete problems are also intractable in this sense. To see why exponential-time algorithms might be unusable in practice, consider a program that makes 2n operations before halting. For small n, say 100, and assuming for the sake of example that the computer does 1012 operations each second, the program would run for about 4 × 1010 years, which is roughly the age of the universe. Even with a much faster computer, the program would only be useful for very small instances and in that sense the intractability of a problem is somewhat independent of technological progress. Nevertheless a polynomial time algorithm is not always practical. If its running time is, say, n15, it is unreasonable to consider it efficient and it is still useless except on small instances. The graph shows time (average of 100 instances in msec using a 933 MHz Pentium III) vs.problem size for knapsack problems for a state-of-the-art specialized algorithm. Quadratic fit suggests that empirical algorithmic complexity for instances with 50–10,000 variables is O((log n)2). The data comes from 

###  Theory vs. practice

What intractability means in practice is open to debate. Saying that a problem is not in P does not imply that all large cases of the problem are hard or even that most of them are. For example the decision problem in Presburger arithmetic has been shown not to be in P, yet algorithms have been written that solve the problem in reasonable times in most cases. Similarly, algorithms can solve the NP-complete knapsack problem over a wide range of sizes in less than quadratic time (see graph) and SAT solvers routinely handle large instances of the NP-complete Boolean satisfiability problem.