Tuesday, January 19, 2010

Information

Information as a concept has many meanings, from everyday usage to technical settings. The concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. In its most restricted technical meaning, information is an ordered sequence of symbols.

The English word was apparently derived from the Latin accusative form (informationem) of the nominative (informatio): this noun is in its turn derived from the verb "informare" (to inform) in the sense of "to give form to the mind", "to discipline", "instruct", "teach": "Men so wise should go and inform their kings." (1330) Inform itself comes (via French) from the Latin verb informare, to give form to, to form an idea of. Furthermore, Latin itself already contained the word informatio meaning concept or idea, but the extent to which this may have influenced the development of the word information in English is unclear.

As a final note, the ancient Greek word for form was "μορφή" (morf -> morphe, Morph) and also είδος eidos (kind, idea, shape, set), the latter word was famously used in a technical philosophical sense by Plato (and later Aristotle) to denote the ideal identity or essence of something (see Theory of forms). "Eidos" can also be associated with thought, proposition or even concept.

As a message

Information is a term with many meanings depending on context, but is as a rule closely related to such concepts as meaning, knowledge, instruction, communication, representation, and mental stimulus. Simply stated, information is a message received and understood. In terms of data, it can be defined as a collection of facts from which conclusions may be drawn. There are many other aspects of information since it is the knowledge acquired through study or experience or instruction. But overall, information is the result of processing, manipulating and organizing data in a way that adds to the knowledge of the person receiving it.

Information is the state of a system of interest. Message is the information materialized.

Information is a quality of a message from a sender to one or more receivers. Information is always about something (size of a parameter, occurrence of an event, value, ethics, etc). Viewed in this manner, information does not have to be accurate; it may be a truth or a lie, or just the sound of a falling tree. Even a disruptive noise used to inhibit the flow of communication and create misunderstanding would in this view be a form of information. However, generally speaking, if the amount of information in the received message increases, the message is more accurate.

This model assumes there is a definite sender and at least one receiver. Many refinements of the model assume the existence of a common language understood by the sender and at least one of the receivers. An important variation identifies information as that which would be communicated by a message if it were sent from a sender to a receiver capable of understanding the message. In another variation, it is not required that the sender be capable of understanding the message, or even cognizant that there is a message, making information something that can be extracted from an environment, e.g., through observation, reading or measurement.

Communication theory provides a numerical measure of the uncertainty of an outcome. For example, we can say that "the signal contained thousands of bits of information". Communication theory tends to use the concept of information entropy, generally attributed to Claude Shannon, see below.

Another form of information is Fisher information, a concept of R.A. Fisher. This is used in application of statistics to estimation theory and to science in general. Fisher information is thought of as the amount of information that a message carries about an unobservable parameter. It can be computed from knowledge of the likelihood function defining the system. For example, with a normal likelihood function, the Fisher information is the reciprocal of the variance of the law. In the absence of knowledge of the likelihood law, the Fisher information may be computed from normally distributed score data as the reciprocal of their second moment.

Even though information and data are often used interchangeably, they are actually very different. Data are sets of unrelated information, and as such are of no use until they are properly evaluated. Upon evaluation, once there is some significant relation between data, and they show some relevance, then they are converted into information. Now this same data can be used for different purposes. Thus, till the data convey some information, they are not useful and therefore not information.

Measuring information entropy

The view of information as a message came into prominence with the publication in 1948 of an influential paper by Claude Shannon, "A Mathematical Theory of Communication." This thesis provides the foundations of information theory and endows the word information not only with a technical meaning but also a measure. If the sending device is equally likely to send any one of a set of N messages, then the preferred measure of "the information produced when one message is chosen from the set" is the base two logarithm of N (This measure is called self-information). In this paper, Shannon continues:

The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey. A device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits…[1]

A complementary way of measuring information is provided by algorithmic information theory. In brief, this measures the information content of a list of symbols based on how predictable they are, or more specifically how easy it is to compute the list through a program: the information content of a sequence is the number of bits of the shortest program that computes it. The sequence below would have a very low algorithmic information measurement since it is a very predictable pattern, and as the pattern continues the measurement would not change. Shannon information would give the same information measurement for each symbol, since they are statistically random, and each new symbol would increase the measurement.

123456789101112131415161718192021

It is important to recognize the limitations of traditional information theory and algorithmic information theory from the perspective of human meaning. For example, when referring to the meaning content of a message Shannon noted “Frequently the messages have meaning… these semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages” (emphasis in original).

In information theory signals are part of a process, not a substance; they do something, they do not contain any specific meaning. Combining algorithmic information theory and information theory we can conclude that the most random signal contains the most information as it can be interpreted in any way and cannot be compressed.[citation needed]

Michael Reddy noted that "'signals' of the mathematical theory are 'patterns that can be exchanged'. There is no message contained in the signal, the signals convey the ability to select from a set of possible messages." In information theory "the system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design".

As sensory input

Often information is viewed as a type of input to an organism or designed device. Inputs are of two kinds. Some inputs are important to the function of the organism (for example, food) or device (energy) by themselves. In his book Sensory Ecology, Dusenbery called these causal inputs. Other inputs (information) are important only because they are associated with causal inputs and can be used to predict the occurrence of a causal input at a later time (and perhaps another place). Some information is important because of association with other information but eventually there must be a connection to a causal input. In practice, information is usually carried by weak stimuli that must be detected by specialized sensory systems and amplified by energy inputs before they can be functional to the organism or device. For example, light is often a causal input to plants but provides information to animals. The colored light reflected from a flower is too weak to do much photosynthetic work but the visual system of the bee detects it and the bee's nervous system uses the information to guide the bee to the flower, where the bee often finds nectar or pollen, which are causal inputs, serving a nutritional function.

Information is any type of sensory input. When an organism with a nervous system receives an input, it transforms the input into an electrical signal. This is regarded information by some. The idea of representation is still relevant, but in a slightly different manner. That is, while abstract painting does not represent anything concretely, when the viewer sees the painting, it is nevertheless transformed into electrical signals that create a representation of the painting. Defined this way, information does not have to be related to truth, communication, or representation of an object. Entertainment in general is not intended to be informative. Music, the performing arts, amusement parks, works of fiction and so on are thus forms of information in this sense, but they are not necessarily forms of information according to some definitions given above. Consider another example: food supplies both nutrition and taste for those who eat it. If information is equated to sensory input, then nutrition is not information but taste is.

As an influence which leads to a transformation

Information is any type of pattern that influences the formation or transformation of other patterns. In this sense, there is no need for a conscious mind to perceive, much less appreciate, the pattern. Consider, for example, DNA. The sequence of nucleotides is a pattern that influences the formation and development of an organism without any need for a conscious mind. Systems theory at times seems to refer to information in this sense, assuming information does not necessarily involve any conscious mind, and patterns circulating (due to feedback) in the system can be called information. In other words, it can be said that information in this sense is something potentially perceived as representation, though not created or presented for that purpose.

If, however, the premise of "influence" implies that information has been perceived by a conscious mind and also interpreted by it, the specific context associated with this interpretation may cause the transformation of the information into knowledge. Complex definitions of both "information" and "knowledge" make such semantic and logical analysis difficult, but the condition of "transformation" is an important point in the study of information as it relates to knowledge, especially in the business discipline of knowledge management. In this practice, tools and processes are used to assist a knowledge worker in performing research and making decisions, including steps such as:

  • reviewing information in order to effectively derive value and meaning
  • referencing metadata if any is available
  • establishing a relevant context, often selecting from many possible contexts
  • deriving new knowledge from the information
  • making decisions or recommendations from the resulting knowledge.

Stewart (2001) argues that the transformation of information into knowledge is a critical one, lying at the core of value creation and competitive advantage for the modern enterprise.

When Marshall McLuhan speaks of media and their effects on human cultures, he refers to the structure of artifacts that in turn shape our behaviors and mindsets. Also, pheromones are often said to be "information" in this sense.

As a property in physics

In 2003, J. D. Bekenstein claimed there is a growing trend in physics to define the physical world as being made of information itself (and thus information is defined in this way) (see Digital physics). Information has a well defined meaning in physics. Examples of this include the phenomenon of quantum entanglement where particles can interact without reference to their separation or the speed of light. Information itself cannot travel faster than light even if the information is transmitted indirectly. This could lead to the fact that all attempts at physically observing a particle with an "entangled" relationship to another are slowed down, even though the particles are not connected in any other way other than by the information they carry.

Another link is demonstrated by the Maxwell's demon thought experiment. In this experiment, a direct relationship between information and another physical property, entropy, is demonstrated. A consequence is that it is impossible to destroy information without increasing the entropy of a system; in practical terms this often means generating heat. Another, more philosophical, outcome is that information could be thought of as interchangeable with energy. Thus, in the study of logic gates, the theoretical lower bound of thermal energy released by an AND gate is higher than for the NOT gate (because information is destroyed in an AND gate and simply converted in a NOT gate). Physical information is of particular importance in the theory of quantum computers.

As records

Records are a specialized form of information. Essentially, records are information produced consciously or as by-products of business activities or transactions and retained because of their value. Primarily their value is as evidence of the activities of the organization but they may also be retained for their informational value. Sound records management ensures that the integrity of records is preserved for as long as they are required.

The international standard on records management, ISO 15489, defines records as "information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business". The International Committee on Archives (ICA) Committee on electronic records defined a record as, "a specific piece of recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity".

Records may be maintained to retain corporate memory of the organization or to meet legal, fiscal or accountability requirements imposed on the organization. Willis (2005) expressed the view that sound management of business records and information delivered "…six key requirements for good corporate governance…transparency; accountability; due process; compliance; meeting statutory and common law requirements; and security of personal and corporate information."

Information and semiotics

Beynon-Davies explains the multi-faceted concept of information in terms of signs and signal-sign systems. Signs themselves can be considered in terms of four inter-dependent levels, layers or branches of semiotics: pragmatics, semantics, syntax, and empirics. These four layers serve to connect the social world on the one hand with the physical or technical world on the other.

Pragmatics is concerned with the purpose of communication. Pragmatics links the issue of signs with that of intention. The focus of pragmatics is on the intentions of human agents underlying communicative behaviour. In other words, intentions link language to action.

Semantics is concerned with the meaning of a message conveyed in a communicative act. Semantics considers the content of communication. Semantics is the study of the meaning of signs - the association between signs and behaviour. Semantics can be considered as the study of the link between symbols and their referents or concepts; particularly the way in which signs relate to human behaviour.

Syntax is concerned with the formalism used to represent a message. Syntax as an area studies the form of communication in terms of the logic and grammar of sign systems. Syntax is devoted to the study of the form rather than the content of signs and sign-systems.

Empirics is the study of the signals used to carry a message; the physical characteristics of the medium of communication. Empirics is devoted to the study of communication channels and their characteristics, e.g., sound, light, electronic transmission etc.

Nielsen (2008) discusses the relationship between semiotics and information in relation to dictionaries. The concept of lexicographic information costs is introduced and refers to the efforts users of dictionaries need to make in order to, first, find the data sought and, secondly, understand the data so that they can generate information.

Communication normally exists within the context of some social situation. The social situation sets the context for the intentions conveyed (pragmatics) and the form in which communication takes place. In a communicative situation intentions are expressed through messages which comprise collections of inter-related signs taken from a language which is mutually understood by the agents involved in the communication. Mutual understanding implies that agents involved understand the chosen language in terms of its agreed syntax (syntactics) and semantics. The sender codes the message in the language and sends the message as signals along some communication channel (empirics). The chosen communication channel will have inherent properties which determine outcomes such as the speed with which communication can take place and over what distance.

More recently Shu-Kun Lin proposed a simple definition of information: Information is the amount of the data after data compression.

Computer science

Computer science or computing science (sometimes abbreviated CS) is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems. It is frequently described as the systematic study of algorithmic processes that create, describe and transform information. According to Peter J. Denning, the fundamental question underlying computer science is, "What can be (efficiently) automated?" Computer science has many sub-fields; some, such as computer graphics, emphasize the computation of specific results, while others, such as computational complexity theory, study the properties of computational problems. Still others focus on the challenges in implementing computations. For example, programming language theory studies approaches to describing computations, while computer programming applies specific programming languages to solve specific computational problems, and human-computer interaction focuses on the challenges in making computers and computations useful, usable, and universally accessible to people.

The general public sometimes confuses computer science with vocational areas that deal with computers (such as information technology), or think that it relates to their own experience of computers, which typically involves activities such as gaming, web-browsing, and word-processing. However, the focus of computer science is more on understanding the properties of the programs used to implement software such as games and web-browsers, and using that understanding to create new programs or improve existing ones.


Visualization of a portion of the routes on the Internet.

History

The early foundations of what would become computer science predate the invention of the modern digital computer. Machines for calculating fixed numerical tasks, such as the abacus, have existed since antiquity. Wilhelm Schickard built the first mechanical calculator in 1623. Charles Babbage designed a difference engine in Victorian times helped by Ada Lovelace.[8] Around 1900, punch-card machines were introduced. However, all of these machines were constrained to perform a single task, or at best some subset of all possible tasks.

During the 1940s, as newer and more powerful computing machines were developed, the term computer came to refer to the machines rather than their human predecessors. As it became clear that computers could be used for more than just mathematical calculations, the field of computer science broadened to study computation in general. Computer science began to be established as a distinct academic discipline in the 1950s and early 1960s, with the creation of the first computer science departments and degree programs.[4][10] Since practical computers became available, many applications of computing have become distinct areas of study in their own right.

Although many initially believed it impossible that computers themselves could actually be a scientific field of study, in the late fifties it gradually became accepted among the greater academic population.[11] It is the now well-known IBM brand that formed part of the computer science revolution during this time. IBM (short for International Business Machines) released the IBM 704 and later the IBM 709 computers, which were widely used during the exploration period of such devices. "Still, working with the IBM [computer] was frustrating...if you had misplaced as much as one letter in one instruction, the program would crash, and you would have to start the whole process over again".[11] During the late 1950s, the computer science discipline was very much in its developmental stages, and such issues were commonplace.

Time has seen significant improvements in the usability and effectiveness of computer science technology. Modern society has seen a significant shift from computers being used solely by experts or professionals to a more widespread user base.

Major achievements


The German military used the Enigma machine during World War II for communication they thought to be secret. The large-scale decryption of Enigma traffic at Bletchley Park was an important factor that contributed to Allied victory in WWII.[12]

Despite its short history as a formal academic discipline, computer science has made a number of fundamental contributions to science and society. These include:

  • The start of the "digital revolution," which includes the current Information Age and the Internet.
  • A formal definition of computation and computability, and proof that there are computationally unsolvable and intractable problems.
  • The concept of a programming language, a tool for the precise expression of methodological information at various levels of abstraction.
  • In cryptography, breaking the Enigma machine was an important factor contributing to the Allied victory in World War II.
  • Scientific computing enabled advanced study of the mind, and mapping the human genome became possible with Human Genome Project.[13] Distributed computing projects such as Folding@home explore protein folding.
  • Algorithmic trading has increased the efficiency and liquidity of financial markets by using artificial intelligence, machine learning, and other statistical and numerical techniques on a large scale.

Areas of computer science

As a discipline, computer science spans a range of topics from theoretical studies of algorithms and the limits of computation to the practical issues of implementing computing systems in hardware and software. The Computer Sciences Accreditation Board (CSAB) – which is made up of representatives of the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers Computer Society, and the Association for Information Systems – identifies four areas that it considers crucial to the discipline of computer science: theory of computation, algorithms and data structures, programming methodology and languages, and computer elements and architecture. In addition to these four areas, CSAB also identifies fields such as software engineering, artificial intelligence, computer networking and communication, database systems, parallel computation, distributed computation, computer-human interaction, computer graphics, operating systems, and numerical and symbolic computation as being important areas of computer science.

Theoretical computer science

The broader field of theoretical computer science encompasses both the classical theory of computation and a wide range of other topics that focus on the more abstract, logical, and mathematical aspects of computing.

 P \rightarrow Q \, DFAexample.svg Elliptic curve simple.png 6n-graf.svg
Mathematical logic Automata theory Number theory Graph theory
\Gamma\vdash x: Int Commutative diagram for morphism.svg SimplexRangeSearching.png Blochsphere.svg
Type theory Category theory Computational geometry Quantum computing theory

Theory of computation

The study of the theory of computation is focused on answering fundamental questions about what can be computed and what amount of resources are required to perform those computations. In an effort to answer the first question, computability theory examines which computational problems are solvable on various theoretical models of computation. The second question is addressed by computational complexity theory, which studies the time and space costs associated with different approaches to solving a computational problem.

The famous "P=NP?" problem, one of the Millennium Prize Problems, is an open problem in the theory of computation.

Wang tiles.png P = NP ?
Computability theory Computational complexity theory

Algorithms and data structures

O(n2) Sorting quicksort anim.gif Singly linked list.png
Analysis of algorithms Algorithms Data structures

Programming methodology and languages

Ideal compiler.png Python add5 syntax.svg
Compilers Programming languages

Computer elements and architecture

NOR ANSI.svg Fivestagespipeline.png SIMD.svg
Digital logic Microarchitecture Multiprocessing

Numerical and symbolic computation

1u04-argonaute.png User-FastFission-brain.gif Naphthalene-3D-balls.png Neuron-no labels.png X-43A (Hyper - X) Mach 7 computational fluid dynamic (CFD).jpg Wind-particle.png y = sin(x) + c
Bioinformatics Cognitive Science Computational chemistry Computational neuroscience Computational physics Numerical algorithms Symbolic mathematics

Applied Computer Science

The following areas are often studied from a more theoretical, computer science viewpoint, as well as from a more practical, engineering perspective.

Operating system placement.svg NETWORK-Library-LAN.png 5-cell.gif Corner.png Emp Tables (Database).PNG
Operating systems Computer networks Computer graphics Computer vision Databases Information retrieval
Padlock.svg ArtificialFictionBrain.png Roomba original.jpg Wacom Pen-tablet.jpg
Computer security Artificial intelligence Robotics Human–computer interaction Ubiquitous computing Software engineering

Relationship with other fields

Despite its name, a significant amount of computer science does not involve the study of computers themselves. Because of this, several alternative names have been proposed. Certain departments of major universities prefer the term computing science, to emphasize precisely that difference. Danish scientist Peter Naur suggested the term datalogy, to reflect the fact that the scientific discipline revolves around data and data treatment, while not necessarily involving computers. The first scientific institution to use the term was the Department of Datalogy at the University of Copenhagen, founded in 1969, with Peter Naur being the first professor in datalogy. The term is used mainly in the Scandinavian countries. Also, in the early days of computing, a number of terms for the practitioners of the field of computing were suggested in the Communications of the ACMturingineer, turologist, flow-charts-man, applied meta-mathematician, and applied epistemologist. Three months later in the same journal, comptologist was suggested, followed next year by hypologist. The term computics has also been suggested. In continental Europe, names such as informatique (French), Informatik (German) or informatica (Dutch), derived from information and possibly mathematics or automatic, are more common than names derived from computer/computation.

The renowned computer scientist Edsger Dijkstra stated, "Computer science is no more about computers than astronomy is about telescopes." The design and deployment of computers and computer systems is generally considered the province of disciplines other than computer science. For example, the study of computer hardware is usually considered part of computer engineering, while the study of commercial computer systems and their deployment is often called information technology or information systems. However, there has been much cross-fertilization of ideas between the various computer-related disciplines. Computer science research has also often crossed into other disciplines, such as computational sociology, philosophy, cognitive science, economics, mathematics, physics, and linguistics.

Computer science is considered by some to have a much closer relationship with mathematics than many scientific disciplines, with some observers saying that computing is a mathematical science. Early computer science was strongly influenced by the work of mathematicians such as Kurt Gödel and Alan Turing, and there continues to be a useful interchange of ideas between the two fields in areas such as mathematical logic, category theory, domain theory, and algebra.

The relationship between computer science and software engineering is a contentious issue, which is further muddied by disputes over what the term "software engineering" means, and how computer science is defined. David Parnas, taking a cue from the relationship between other engineering and science disciplines, has claimed that the principal focus of computer science is studying the properties of computation in general, while the principal focus of software engineering is the design of specific computations to achieve practical goals, making the two separate but complementary disciplines.

The academic, political, and funding aspects of computer science tend to depend on whether a department formed with a mathematical emphasis or with an engineering emphasis. Computer science departments with a mathematics emphasis and with a numerical orientation consider alignment computational science. Both types of departments tend to make efforts to bridge the field educationally if not across all research.

Computer science education

Some universities teach computer science as a theoretical study of computation and algorithmic reasoning. These programs often feature the theory of computation, analysis of algorithms, formal methods, concurrency theory, databases, computer graphics and systems analysis, among others. They typically also teach computer programming, but treat it as a vessel for the support of other fields of computer science rather than a central focus of high-level study.

Other colleges and universities, as well as secondary schools and vocational programs that teach computer science, emphasize the practice of advanced programming rather than the theory of algorithms and computation in their computer science curricula. Such curricula tend to focus on those skills that are important to workers entering the software industry. The practical aspects of computer programming are often referred to as software engineering. However, there is a lot of disagreement over the meaning of the term, and whether or not it is the same thing as programming.