Example: Merging

When thinking about combining two data frames one has to distinguish between merging them by the values given in a specific column or consecutively putting them together just rows after rows.

For the following examples, we will use these two data frames.

x <- c(1, 2, 3, 4, 5)
y <- c(1.4, 2.5, 3.6, 4.0, 5.5)
z <- c("A", "B", "C", "D", "E")
df1 <- data.frame(X = x, Y = y, Z = z)
df1
##   X   Y Z
## 1 1 1.4 A
## 2 2 2.5 B
## 3 3 3.6 C
## 4 4 4.0 D
## 5 5 5.5 E
y <- c(14, 25, 36, 40, 55)
x <- c(100, 200, 300, 400, 500)
z <- c("A", "C", "E", "D", "B")
df2 <- data.frame(X = x, Y = y, Z = z)
df2
##     X  Y Z
## 1 100 14 A
## 2 200 25 C
## 3 300 36 E
## 4 400 40 D
## 5 500 55 B

To combine two data frames, use the rbind function provided that both data frames have the same column names and the same number of columns.

dfr <- rbind(df1, df2)
dfr
##      X    Y Z
## 1    1  1.4 A
## 2    2  2.5 B
## 3    3  3.6 C
## 4    4  4.0 D
## 5    5  5.5 E
## 6  100 14.0 A
## 7  200 25.0 C
## 8  300 36.0 E
## 9  400 40.0 D
## 10 500 55.0 B

Note, that it is not necessary that the columns are in the same order since the concatenation is done by the column names, not the column sequence:

df3 <- df2[,c(1,3,2)]
df3
##     X Z  Y
## 1 100 A 14
## 2 200 C 25
## 3 300 E 36
## 4 400 D 40
## 5 500 B 55
dfr <- rbind(df1, df3)
dfr
##      X    Y Z
## 1    1  1.4 A
## 2    2  2.5 B
## 3    3  3.6 C
## 4    4  4.0 D
## 5    5  5.5 E
## 6  100 14.0 A
## 7  200 25.0 C
## 8  300 36.0 E
## 9  400 40.0 D
## 10 500 55.0 B

To merge two data frames, you have to specify the column containing the values which should be used in the matching. In this example, we use the values of the column Z to merge the two data frames.

dfc <- merge(df1, df2, by = "Z")
dfc
##   Z X.x Y.x X.y Y.y
## 1 A   1 1.4 100  14
## 2 B   2 2.5 500  55
## 3 C   3 3.6 200  25
## 4 D   4 4.0 400  40
## 5 E   5 5.5 300  36

Since both data frames have identical column names, X and Y is added to the column names in the resulting data frame to indicate if the column is from the first (i.e. X) or second (i.e. Y) data frame. You can easily rename the columns using the colnames function.

If the columns which should be used for mergin have different names, no problem: just supply the column names separately for the first (i.e. X) and second (i.e. Y) data frame:

colnames(df2) <- c("H", "I", "J")
dfc <- merge(df1, df2, by.x = "Z", by.y = "J")
dfc
##   Z X   Y   H  I
## 1 A 1 1.4 100 14
## 2 B 2 2.5 500 55
## 3 C 3 3.6 200 25
## 4 D 4 4.0 400 40
## 5 E 5 5.5 300 36

Since not only the names of the columns used for merging but all column names are different, no X or Y is added in the output column names.

Updated: