NOSQL for Fun & Profit!
TIM ANGLADE PROUDLY PRESENTS PART TWO
OF THE TOTALLY UNKNOWN “FUN & PROFIT”
SERIES. A TALE OF TECH, INTRIGUE
& FORBIDDEN LOVE. A WHIRLWIND OF
ADVENTURERS, PRODUCTION SYSTEMS
& TROLLS. A STORY SO BIG, ITS TITLE HAD TO
HAVE ITS OWN INTRODUCTION TEXT. HERE IS…
@TIMANGLADE
Hit me up. I don’t bite… too hard.
AN ANNOUNCEMENT
NØSQL Europe!
LONDON, APRIL 20TH & 21ST
WORKSHOPS AND TRAINING ON THE 22ND
FOLLOW @NOSQLEU FOR DETAILS
A WARNING
This is Tech for Managers. Don’t Blame Me.
40 YEARS
IN THE DESERT
Information Retrieval
P. BAXENDALE, Editor

A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user’s model.

KEY WORDS AND PHRASES: data bank, data base, data structure, data organization, hierarchies of data, networks of data, relations, derivability, redundancy, consistency, composition, join, retrieval language, predicate calculus, security, data integrity
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area.

In contrast, the problems treated here are those of data independence -- the independence of application programs and terminal activities from growth in data types and changes in data representation -- and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.

The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3,4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only -- that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.

A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations -- these are discussed in Section 2. The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the “connection trap”).

Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5,6,7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.

1.2.1. Ordering Dependence. Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the

Communications of the ACM, Volume 13, Number 6, June 1970, p. 377
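The ordering dependence described in the excerpt can be sketched in a few lines of Python. This is only an illustration, not anything taken from the paper: the parts records, the field names `serial` and `name`, and both helper functions are assumptions made up for the example. The first helper relies on the file being stored in ascending serial order; the second states what it wants and so survives a change in storage order.

```python
# Hypothetical parts file, stored in ascending order by part serial number.
# Field names and values are assumptions for illustration only.
parts_by_serial = [
    {"serial": 101, "name": "washer"},
    {"serial": 205, "name": "bolt"},
    {"serial": 317, "name": "nut"},
]


def highest_serial_order_dependent(records):
    """Relies on storage order: assumes the last record has the highest serial."""
    return records[-1]["serial"]


def highest_serial_order_independent(records):
    """Makes no assumption about the order in which records are presented."""
    return max(record["serial"] for record in records)


# The data bank is reorganized: the same records are now stored by name.
parts_by_name = sorted(parts_by_serial, key=lambda record: record["name"])

print(highest_serial_order_dependent(parts_by_serial))   # correct while the order holds
print(highest_serial_order_dependent(parts_by_name))     # silently wrong after the change
print(highest_serial_order_independent(parts_by_name))   # unaffected by the change
```

The order-dependent helper does not fail loudly when the representation changes; it just returns a different answer, which is the kind of logical impairment the excerpt warns about.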
WHAT DO YOU MEAN
BY “THE DESERT”?
THE GOOD
A strong ecosystem.
THE BAD
Databases on ACID.
THE UGLY
Paradigm Puzzlement.
Noun
paradigm (plural paradigms)
1. An example serving as a model or pattern.
2. A system of assumptions, concepts,
values, and practices that constitutes
a way of viewing reality.
SQL
Just
say no
A NOT-SO-NOVEL
IDEA
Information Retrieval
P. BAXENDALE, Editor
A Relational Model of Data for Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
TWO WORDS
data warehousing.
THE ODD COUPLE
FAMILY
COUCHDB
MONGODB
RIAK
REDIS
TOKYOCABINET
NEO4J
INFOGRID
SONES
HYPERGRAPHDB
HYPERTABLE
SIMPLEDB
TERRASTORE
HADOOP
MNESIA
CASSANDRA
HBASE
JACKRABBIT
VOLDEMORT
GT.M
DYNOMITE
MEMCACHEDB
BIGTABLE
DYNAMO
SHERPA
ORACLE SPATIAL
ESRI ARCGIS
SAND
CITRUSLEAF
NEPTUNE
1. DOCUMENT
2. KEY–VALUE
3. GRAPH
4. COLUMN/BIGTABLE
5. GEO
6. OBJECT
7. FILESYSTEM
1. FLAT → DOCUMENT, FILESYSTEM
2. ASSOCIATIVE → KEY–VALUE
3. HIERARCHICAL → GEO
4. NETWORK → GRAPH
5. DIMENSIONAL → COLUMN
6. OBJECTIONAL → OBJECT
FOR THE SQL-ERS
I made a relational version of that.
[Tables: brand (1 document, 2 key–value, 3 graph, 4 column, 5 geo, 6 object, 7 filesystem); paradigm (flat, associative, hierarchical, network, dimensional, objectional); join (brand id, paradigm id)]
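For readers who want to poke at the relational version, here is a minimal sketch using Python's built-in `sqlite3` module. The brand ids and the brand-to-paradigm pairings follow the lists in this deck; the paradigm ids, the table and column names (`brand`, `paradigm`, `brand_paradigm`) and the helper `brands_for` are assumptions made for the example, not taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brand (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE paradigm (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE brand_paradigm (
        brand_id INTEGER NOT NULL REFERENCES brand(id),
        paradigm_id INTEGER NOT NULL REFERENCES paradigm(id)
    );
""")

# Brand ids as numbered in the deck's list of seven families.
brands = [(1, "document"), (2, "key-value"), (3, "graph"), (4, "column"),
          (5, "geo"), (6, "object"), (7, "filesystem")]
# Paradigm ids are an assumption: numbered in the order the deck lists them.
paradigms = [(1, "flat"), (2, "associative"), (3, "hierarchical"),
             (4, "network"), (5, "dimensional"), (6, "objectional")]
# (brand_id, paradigm_id) pairs, e.g. document and filesystem are both flat.
links = [(1, 1), (7, 1), (2, 2), (5, 3), (3, 4), (4, 5), (6, 6)]

conn.executemany("INSERT INTO brand VALUES (?, ?)", brands)
conn.executemany("INSERT INTO paradigm VALUES (?, ?)", paradigms)
conn.executemany("INSERT INTO brand_paradigm VALUES (?, ?)", links)


def brands_for(paradigm_name):
    """Return the brand names linked to one paradigm, ordered by brand id."""
    rows = conn.execute(
        """
        SELECT b.name
        FROM brand AS b
        JOIN brand_paradigm AS bp ON bp.brand_id = b.id
        JOIN paradigm AS p ON p.id = bp.paradigm_id
        WHERE p.name = ?
        ORDER BY b.id
        """,
        (paradigm_name,),
    ).fetchall()
    return [row[0] for row in rows]


print(brands_for("flat"))
```

A separate join table is used because one paradigm can cover several brands (flat covers both document and filesystem).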
FLAT
(DOCUMENT)
ASSOCIATIVE
(KEY–VALUE)
HIERARCHICAL
(GEO)
NETWORK
(GRAPH)
DIMENSIONAL
(COLUMN)
Sales Fact Table
+------------------------+
| sale_amount | time_id |
+------------------------+ Time Dimension
| 2008.08| 1234 |---+ +-----------------------------+
+------------------------+ | | time_id | timestamp |
| +-----------------------------+
+---->| 1234 | 20080902 12:35:43 |
+-----------------------------+
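The fact-and-dimension layout in the diagram can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table names `sales_fact` and `time_dimension` and the variable `star` are made up for the example, while the column names and the single row of values are copied from the diagram.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
    -- Dimension table: one row per point in time.
    CREATE TABLE time_dimension (
        time_id INTEGER PRIMARY KEY,
        timestamp TEXT NOT NULL
    );
    -- Fact table: each sale points at a row of the time dimension.
    CREATE TABLE sales_fact (
        sale_amount REAL NOT NULL,
        time_id INTEGER NOT NULL REFERENCES time_dimension(time_id)
    );
    INSERT INTO time_dimension VALUES (1234, '20080902 12:35:43');
    INSERT INTO sales_fact VALUES (2008.08, 1234);
""")

# Resolve each fact's time_id through the dimension table.
sales_with_time = star.execute(
    """
    SELECT f.sale_amount, t.timestamp
    FROM sales_fact AS f
    JOIN time_dimension AS t ON t.time_id = f.time_id
    """
).fetchall()

print(sales_with_time)
```

The fact row carries only the measure and a key; everything descriptive about the moment of sale lives in the dimension table and is reached through the join.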
OBJECTIONAL
(OBJECT)
WHAT’S IN
A NAME?
ANTI-SQL?
ANTI-DATABASES?
A NEW STANDARD?
A NEW LANGUAGE?
NOT ONLY SQL?
WHAT IS NOSQL ABOUT?
SQL VS. NOSQL
VS. NOSQL
1. NOSQL SUCKS
No, really.
2. IT’S NOT ABOUT
THE SIZE. IT’S
ABOUT HOW YOU
USE IT.
3. IT’S NOT ROCKET
SCIENCE.
IT’S… ALIVE!!!
NOSQL for Fun & Profit!
THANK YOU!
SpeakerRate.com/timanglade
? for Fun & Profit!
TIM ANGLADE PROUDLY PRESENTS PART TWO
OF THE TOTALLY UNKOWN “FUN & PROFITâ€
SERIES. A TALE OF TECH,
INTRIGUE
&Â FORBIDDEN LOVE. A WHIRLWIND OF
ADVENTURERS, PRODUCTION SYSTEMS
&Â TROLLS. A STORY SO BIG, ITS TITLE HAD TO
HAVE ITS OWN INTRODUCTION TEXT. HERE IS…
@TIMANGLADE
Hit me up. I don’t bite… too hard.
AN ANNOUNCEMENT
NØSQL
rope!
Eu
LONDON, APRIL 20TH & 21ST
WORKSHOPS AND TRAINING ON THE 22ND
FOLLOW @NOSQLEU FOR DETAILS
A WARNING
This is Tech for Managers. Don’t Blame Me.
40 YEARS
IN THE DESERT
Information
Retrieval
P. BAXENDALE,
Editor
A Relational Model of Data for
Large Shared Data Banks
E. F. CODD
IBM Research Laboratory, San Jose, California
Future
users
of
large
data
banks
must
be
protected
from
having
to
know
how
the
data
is organized
in the machine
(the
internal
representation).
A
prompting
service
which
supplies
such
information
is not
a satisfactory
solution.
Activities
of
users
at
terminals
and
most
application
programs
should
remain
unaffected
when
the
internal
representation
of data
is changed
and
even
when
some
aspects
of
the
external
representation
are
changed.
Changes
in
data
representation
will
often
be
needed
as a
result
of
changes
in query,
update,
and
report
traffic
and
natural
growth
in
the
types
of
stored
information.
Existing
noninferential,
formatted
data
systems
provide
users
with
tree-structured
files
or
slightly
more
general
network
models
of
the
data.
In Section
1,
inadequacies
of
these
models
are
discussed.
A model
based
on n-ary
relations,
a
normal
form
for
data
base
relations,
and
the
concept
of
a universal
data
sublanguage
are
introduced.
In Section
2, certain
opera-
tions
on
relations
(other
than
logical
inference)
are
discussed
and
applied
to
the
problems
of
redundancy
and
consistency
in the
user’s
model.
KEY WORDS
AND
PHRASES:
data
bank,
data
base,
data structure,
data
organization,
hierarchies
of
data,
networks
of
data,
relations,
derivability,
redundancy,
consistency,
composition,
join,
retrieval
language,
predicate
calculus,
security,
data
integrity
CR CATEGORIES:
3.70,
3.73,
3.75,
4.20,
4.22,
4.29
1.
Relational
Model
and
Normal
Form
1 .I.
INTR~xJ~TI~N
This paper is concerned with the application of ele-
mentary relation theory to systems which provide shared
access to large banks of formatted data. Except for a paper
by Childs [l], the principal application of relations to data
systems has been to deductive question-answering systems.
Levein and Maron [2] provide numerous references to work
in this area.
In contrast, the problems treated here are those of data
independence-the independence of application programs
and terminal activities from growth in data types and
changes in data representation-and
certain kinds of data
inconsistency which are expected to become troublesome
even in nondeductive systems.
Volume
13
/ Number
6 /
June,
1970
The relational view (or model) of data described in
Section 1 appears to be superior in several respects to the
graph or network model [3,4] presently in vogue for non-
inferential systems. It provides a means of describing data
with its natural structure only-that
is, without superim-
posing any additional structure for machine representation
purposes. Accordingly,
it provides a basis for a high level
data language which will yield maximal independence be-
tween programs on the one hand and machine representa-
tion and organization of data on the other.
A further advantage of the relational view is that it
forms a sound basis for treating derivability,
redundancy,
and consistency of relations-these are discussed in Section
2. The network model, on the other hand, has spawned a
number of confusions, not the least of which is mistaking
the derivation of connections for the derivation of rela-
tions (see remarks in Section 2 on the “connection trapâ€).
Finally, the relational view permits a clearer evaluation
of the scope and logical limitations of present formatted
data systems, and also the relative merits (from a logical
standpoint) of competing representations of data within a
single system. Examples of this clearer perspective are
cited in various parts of this paper. Implementations of
systems to support the relational model are not discussed.
1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
The provision of data description tables in recently developed
information systems represents a major advance
toward the goal of data independence [5,6,7]. Such tables
facilitate changing certain characteristics of the data
representation stored in a data bank. However, the variety of
data representation characteristics which can be changed
without logically impairing some application programs is
still quite limited. Further, the model of data with which
users interact is still cluttered with representational properties,
particularly in regard to the representation of collections
of data (as opposed to individual items). Three of
the principal kinds of data dependencies which still need
to be removed are: ordering dependence, indexing dependence,
and access path dependence. In some systems these
dependencies are not clearly separable from one another.
1.2.1. Ordering Dependence. Elements of data in a
data bank may be stored in a variety of ways, some involving
no concern for ordering, some permitting each element
to participate in one ordering only, others permitting each
element to participate in several orderings. Let us consider
those existing systems which either require or permit data
elements to be stored in at least one total ordering which is
closely associated with the hardware-determined ordering
of addresses. For example, the records of a file concerning
parts might be stored in ascending order by part serial
number. Such systems normally permit application programs
to assume that the order of presentation of records
from such a file is identical to (or is a subordering of) the
Communications of the ACM 377
WHAT DO YOU MEAN
BY “THE DESERT”?
THE GOOD
A strong ecosystem.
THE BAD
Databases on ACID.
THE UGLY
Paradigm Puzzlement.
Noun
paradigm (plural paradigms)
1. An example serving as a model or pattern.
2. A system of assumptions, concepts,
values, and practices that constitutes
a way of viewing reality.
SQL
Just
say no
A NOT-SO-NOVEL
IDEA
[Slide image: first page of E. F. Codd, “A Relational Model of Data for Large Shared Data Banks”, Communications of the ACM, Volume 13, Number 6, June 1970, p. 377]
TWO WORDS
data warehousing.
THE ODD COUPLE
FAMILY
COUCHDB
MONGODB
RIAK
REDIS
TOKYOCABINET
NEO4J
INFOGRID
SONES
HYPERGRAPHDB
HYPERTABLE
SIMPLEDB
TERRASTORE
HADOOP
MNESIA
CASSANDRA
HBASE
JACKRABBIT
VOLDEMORT
GT.M
DYNOMITE
MEMCACHEDB
BIGTABLE
DYNAMO
SHERPA
ORACLE SPATIAL
ESRI ARCGIS
SAND
CITRUSLEAF
NEPTUNE
1. DOCUMENT
2. KEY–VALUE
3. GRAPH
4. COLUMN/BIGTABLE
5. GEO
6. OBJECT
7. FILESYSTEM
1. FLAT → DOCUMENT, FILESYSTEM
2. ASSOCIATIVE → KEY-VALUE
3. HIERARCHICAL → GEO
4. NETWORK → GRAPH
5. DIMENSIONAL → COLUMN
6. OBJECTIONAL → OBJECT
FOR THE SQL-ERS
I made a relational version of that.
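brand
+---+------------+
| 1 | document   |
| 2 | key–value  |
| 3 | graph      |
| 4 | column     |
| 5 | geo        |
| 6 | object     |
| 7 | filesystem |
+---+------------+
paradigm
+---+--------------+
| 1 | flat         |
| 2 | associative  |
| 3 | network      |
| 4 | dimensional  |
| 5 | hierarchical |
| 6 | objectional  |
+---+--------------+
join
+---+---+
| 1 | 1 |
| 7 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
+---+---+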
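For readers who prefer to see the relational version run, here is a minimal sketch using Python's built-in sqlite3 module. It encodes the brand-to-paradigm mapping from the slides (document and filesystem as flat, key-value as associative, graph as network, column as dimensional, geo as hierarchical, object as objectional). The numeric ids, the column names, and the table name `brand_paradigm` (chosen because JOIN is a reserved word in SQL) are assumptions made for illustration, not details taken from the talk.

```python
import sqlite3

# Mapping taken from the slides; the numeric ids are illustrative assumptions.
PARADIGMS = [(1, "flat"), (2, "associative"), (3, "network"),
             (4, "dimensional"), (5, "hierarchical"), (6, "objectional")]
BRANDS = [(1, "document"), (2, "key-value"), (3, "graph"), (4, "column"),
          (5, "geo"), (6, "object"), (7, "filesystem")]
# (brand_id, paradigm_id) pairs; "filesystem" shares the flat paradigm.
BRAND_PARADIGM = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 1)]


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE paradigm (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE brand (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE brand_paradigm (
            brand_id INTEGER NOT NULL REFERENCES brand(id),
            paradigm_id INTEGER NOT NULL REFERENCES paradigm(id)
        );
    """)
    conn.executemany("INSERT INTO paradigm VALUES (?, ?)", PARADIGMS)
    conn.executemany("INSERT INTO brand VALUES (?, ?)", BRANDS)
    conn.executemany("INSERT INTO brand_paradigm VALUES (?, ?)", BRAND_PARADIGM)
    return conn


def brands_in(conn: sqlite3.Connection, paradigm: str) -> list[str]:
    """Return the brand names that belong to the given paradigm."""
    rows = conn.execute(
        """
        SELECT b.name
        FROM brand AS b
        JOIN brand_paradigm AS bp ON bp.brand_id = b.id
        JOIN paradigm AS p ON p.id = bp.paradigm_id
        WHERE p.name = ?
        ORDER BY b.id
        """,
        (paradigm,),
    )
    return [name for (name,) in rows]
```

Calling `brands_in(build_catalog(), "flat")` walks the join table to collect every brand filed under the flat paradigm.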
FLAT
(DOCUMENT)
ASSOCIATIVE
(KEY–VALUE)
HIERARCHICAL
(GEO)
NETWORK
(GRAPH)
DIMENSIONAL
(COLUMN)
Sales Fact Table
+------------------------+
| sale_amount | time_id  |
+------------------------+         Time Dimension
|     2008.08 | 1234     |---+     +-----------------------------+
+------------------------+   |     | time_id | timestamp         |
                             |     +-----------------------------+
                             +---->| 1234    | 20080902 12:35:43 |
                                   +-----------------------------+
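The fact-and-dimension picture above can be sketched in a few lines of Python with the built-in sqlite3 module. The table and column names follow the diagram, and the single row of data is the one shown there; the helper function names and the choice of SQLite are assumptions made for this illustration.

```python
import sqlite3


def build_star() -> sqlite3.Connection:
    """Create the fact table and its time dimension from the diagram."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE time_dimension (
            time_id INTEGER PRIMARY KEY,
            timestamp TEXT NOT NULL
        );
        CREATE TABLE sales_fact (
            sale_amount REAL NOT NULL,
            time_id INTEGER NOT NULL REFERENCES time_dimension(time_id)
        );
    """)
    # The one sample row shown in the diagram.
    conn.execute("INSERT INTO time_dimension VALUES (?, ?)",
                 (1234, "20080902 12:35:43"))
    conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (2008.08, 1234))
    return conn


def sales_with_time(conn: sqlite3.Connection) -> list[tuple[float, str]]:
    """Resolve each fact row's time_id against the time dimension."""
    return conn.execute(
        """
        SELECT f.sale_amount, d.timestamp
        FROM sales_fact AS f
        JOIN time_dimension AS d ON d.time_id = f.time_id
        """
    ).fetchall()
```

The fact table stores only the measure and a key; the descriptive timestamp lives once in the dimension and is looked up at query time.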
OBJECTIONAL
(OBJECT)
WHAT’S IN
A NAME?
ANTI-SQL?
ANTI-DATABASES?
A NEW STANDARD?
A NEW LANGUAGE?
NOT ONLY SQL?
WHAT IS NOSQL ABOUT?
SQL VS. NOSQL
VS. NOSQL
1. NOSQL SUCKS
No, really.
2. IT’S NOT ABOUT
THE SIZE. IT’S
ABOUT HOW YOU
USE IT.
3. IT’S NOT ROCKET
SCIENCE.
IT’S…
ALIVE!!!
NOSQL for Fun & Profit!
THANK YOU!
SpeakerRate.com/timanglade
?