How to calculate the time difference between consecutive rows

By simon at 2018-02-28 • 0 favorites • 28 views

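The raw data looks like this. I want to sort it by visitor and time, then calculate the time difference between consecutive rows before saving it to a new file.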

  visitor         v_time payment items
1    Jack 1/2/2018 16:07      35     3
2    Jack 1/2/2018 16:09     160     1
3   David 1/2/2018 16:12      25     2
4    Kate 1/2/2018 16:16       3     3
5   David 1/2/2018 16:21      25     5
6    Jack 1/2/2018 16:32      85     5
7    Kate 1/2/2018 16:33     639     3
8    Jack 1/2/2018 16:55       6     2
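The grouping and sorting both work. But the code below does not calculate the time difference, and the saved file does not contain it either.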
visitor <- c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack")
v_time <- c("1/2/2018 16:07","1/2/2018 16:09","1/2/2018 16:12","1/2/2018 16:16","1/2/2018 16:21","1/2/2018 16:32","1/2/2018 16:33", "1/2/2018 16:55")
payment <- c(35,160,25,3,25,85,639,6)
items <- c(3,1,2,3,5,5,3,2)
df <- data.frame(visitor, v_time, payment, items)

df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M")), diff_secs = as.numeric(diff, units = 'secs'))

write.csv(df,"C:/output.csv", row.names = F)
Can you point out the error and tell me the correct way to do it? Thanks.
# A tibble: 8 x 6
# Groups: visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12   25.0   2.00 NA            NA
2 David   1/2/2018 16:21   25.0   5.00 NA            NA
3 Jack    1/2/2018 16:07   35.0   3.00 NA            NA
4 Jack    1/2/2018 16:09  160     1.00 NA            NA
5 Jack    1/2/2018 16:32   85.0   5.00 NA            NA
6 Jack    1/2/2018 16:55    6.00  2.00 NA            NA
7 Kate    1/2/2018 16:16    3.00  3.00 NA            NA
8 Kate    1/2/2018 16:33  639     3.00 NA            NA

5 replies | Last updated 2018-02-28
2018-02-28   #1

If you simply add default = strptime(v_time, "%d/%m/%Y %H:%M")[1] to the lag part:

df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]),
         diff_secs = as.numeric(diff, units = 'secs'))
you get the result you expect:

> df
# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12     25.    2. 0             0.
2 David   1/2/2018 16:21     25.    5. 540         540.
3 Jack    1/2/2018 16:07     35.    3. 0             0.
4 Jack    1/2/2018 16:09    160.    1. 120         120.
5 Jack    1/2/2018 16:32     85.    5. 1380       1380.
6 Jack    1/2/2018 16:55      6.    2. 1380       1380.
7 Kate    1/2/2018 16:16      3.    3. 0             0.
8 Kate    1/2/2018 16:33    639.    3. 1020       1020.


Another option is to use difftime:

df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = difftime(strptime(v_time, "%d/%m/%Y %H:%M"), lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]), units = 'mins'),
         diff_secs = as.numeric(diff, units = 'secs'))
Now the diff column is in minutes and the diff_secs column is in seconds:

> df
# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12     25.    2. 0             0.
2 David   1/2/2018 16:21     25.    5. 9           540.
3 Jack    1/2/2018 16:07     35.    3. 0             0.
4 Jack    1/2/2018 16:09    160.    1. 2           120.
5 Jack    1/2/2018 16:32     85.    5. 23         1380.
6 Jack    1/2/2018 16:55      6.    2. 23         1380.
7 Kate    1/2/2018 16:16      3.    3. 0             0.
8 Kate    1/2/2018 16:33    639.    3. 17         1020.


You can now save the result again with write.csv(df,"C:/output.csv", row.names = FALSE)
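One likely reason the file in the question was saved without the new columns is that the piped result was only printed and never assigned back to df. The following is a minimal sketch of that behaviour; the toy data frame and the names unassigned_has_y and assigned_has_y are made up for illustration and are not part of the question.

```r
library(dplyr)

# Toy data frame, hypothetical and unrelated to the visitor data
df <- data.frame(x = 1:3)

# Without assignment the new column only exists in the printed result
df %>% mutate(y = x * 2)
unassigned_has_y <- "y" %in% names(df)   # FALSE, df itself is unchanged

# Assigning the result back keeps the column, so a later write.csv() would include it
df <- df %>% mutate(y = x * 2)
assigned_has_y <- "y" %in% names(df)     # TRUE
```

The same applies to the pipeline in the question: write.csv(df, ...) writes whatever df holds at that moment, so the mutate() result has to be stored first.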

2018-02-28   #3

The error comes from lag(strptime(v_time, "%d/%m/%Y %H:%M")). The error message is:

# Error in format.POSIXlt(x, usetz = TRUE) : 
#  invalid component [[10]] in "POSIXlt" should be 'zone'
To avoid this, try strptime(lag(v_time), "%d/%m/%Y %H:%M") instead:
df <- df %>%
    arrange(visitor, v_time) %>%
    group_by(visitor) %>%
    mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - strptime(lag(v_time), "%d/%m/%Y %H:%M"), diff_secs = as.numeric(diff, units = 'secs'))
print(df)
Output:
# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor         v_time payment items    diff diff_secs
   <fctr>         <fctr>   <dbl> <dbl>  <time>     <dbl>
1   David 1/2/2018 16:12      25     2 NA mins        NA
2   David 1/2/2018 16:21      25     5  9 mins       540
3    Jack 1/2/2018 16:07      35     3 NA mins        NA
4    Jack 1/2/2018 16:09     160     1  2 mins       120
5    Jack 1/2/2018 16:32      85     5 23 mins      1380
6    Jack 1/2/2018 16:55       6     2 23 mins      1380
7    Kate 1/2/2018 16:16       3     3 NA mins        NA
8    Kate 1/2/2018 16:33     639     3 17 mins      1020
Don't forget to save your work back to df with df <- before you export it.
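strptime() returns a POSIXlt object, which is stored as a list of components, and that appears to be what lag() trips over here. A related way around it, sketched below as an assumption rather than as this answer's method, is to parse v_time once into POSIXct (stored as a single numeric vector) before sorting. This also makes arrange() sort by actual time instead of by the text of the timestamp. The tz = "UTC" setting and the result name res are choices made for this sketch.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  visitor = c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack"),
  v_time  = c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
              "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55"),
  payment = c(35, 160, 25, 3, 25, 85, 639, 6),
  items   = c(3, 1, 2, 3, 5, 5, 3, 2),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Parse once into POSIXct so that lag() can shift the column directly
  mutate(v_time = as.POSIXct(v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  # The first row of each visitor has no predecessor, so its difference is NA
  mutate(diff_secs = as.numeric(difftime(v_time, lag(v_time), units = "secs"))) %>%
  ungroup()

print(res)
```

The diff_secs values should match the ones in the output above, with NA for the first visit of each visitor.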

2018-02-28   #4

Here is an approach with the lubridate package

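library(dplyr)
library(lubridate)
# The question parses v_time as day/month/year, so dmy_hm() is used here
df$v_time <- dmy_hm(df$v_time)
df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor)
df$diff <- rep(0, nrow(df))
for (i in seq_len(nrow(df) - 1)) {
  # Only subtract within the same visitor; the first row of each visitor stays 0
  if (df$visitor[i + 1] == df$visitor[i]) {
    # Store plain seconds so the unit does not depend on the size of the gap
    df$diff[i + 1] <- as.numeric(difftime(df$v_time[i + 1], df$v_time[i], units = "secs"))
  }
}
write.csv(df, "C:/output.csv", row.names = FALSE)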
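The row-by-row loop can also be expressed without an explicit loop. The following is a rough sketch in base R using ave() and diff(), offered as an alternative and not as what this answer proposes; the tz = "UTC" setting is an assumption, and only the two columns needed for the calculation are recreated.

```r
# Visitor and timestamp columns from the question
df <- data.frame(
  visitor = c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack"),
  v_time  = c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
              "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55"),
  stringsAsFactors = FALSE
)

# Parse the timestamps, then sort by visitor and by actual time
df$v_time <- as.POSIXct(df$v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")
df <- df[order(df$visitor, df$v_time), ]

# Within each visitor: 0 for the first row, then the gap to the previous row in seconds
df$diff <- ave(as.numeric(df$v_time), df$visitor,
               FUN = function(t) c(0, diff(t)))

print(df)
```

Because ave() applies the function per visitor, differences never cross from one visitor's last row to the next visitor's first row.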

2018-02-28   #5

Here is an option with difftime. We convert 'v_time' to a datetime with dmy_hm (from lubridate), then, after arranging and grouping by 'visitor', use difftime to get the output in seconds
library(tidyverse)
library(lubridate)  # provides dmy_hm(); older tidyverse versions do not attach it
out <- df %>% 
        mutate(v_time = dmy_hm(v_time)) %>% 
        arrange(visitor, v_time) %>% 
        group_by(visitor) %>%
        mutate(diff = difftime(v_time, lag(v_time, default = first(v_time)), units = "secs"))
# A tibble: 8 x 5
# Groups: visitor [3]
#  visitor v_time              payment items diff  
#  <fctr>  <dttm>                <dbl> <dbl> <time>
#1 David   2018-02-01 16:12:00   25.0   2.00 0     
#2 David   2018-02-01 16:21:00   25.0   5.00 540   
#3 Jack    2018-02-01 16:07:00   35.0   3.00 0     
#4 Jack    2018-02-01 16:09:00  160     1.00 120   
#5 Jack    2018-02-01 16:32:00   85.0   5.00 1380  
#6 Jack    2018-02-01 16:55:00    6.00  2.00 1380  
#7 Kate    2018-02-01 16:16:00    3.00  3.00 0     
#8 Kate    2018-02-01 16:33:00  639     3.00 1020  
Then we write it to a csv with write_csv
write_csv(out, "yourfile.csv")
