Removing empty strings from strsplit results in R

By simon at 2018-02-07

> dc1
  V1                V2
1 20140211-0100     |Box
2 20140211-1782     |Office|Ball
3 20140211-1783     |Office
4 20140211-1784     |Office
5 20140221-0756     |Box
6 20140203-0418     |Box
> strsplit(as.character(dc1[,2]),"^\\|")
[[1]]
[1] ""    "Box"


[[2]]
[1] ""             "Office" "Ball"


[[3]]
[1] ""             "Office"


[[4]]
[1] ""             "Office"


[[5]]
[1] ""    "Box"


[[6]]
[1] ""    "Box"  
How can I remove the empty strings ("") from the strsplit result? The result should look like:

[[1]]
[1] "Box"
[[2]]
[1] "Office"    "Ball"

7 replies | Last updated 2018-02-07
2018-02-07   #1

You can filter each list element with lapply. I changed the pattern in your strsplit call to match your expected output.

dc1 <- read.table(text = 'V1                V2
1 20140211-0100     |Box
2 20140211-1782     |Office|Ball
3 20140211-1783     |Office
4 20140211-1784     |Office
5 20140221-0756     |Box
6 20140203-0418     |Box', header = TRUE)

out <- strsplit(as.character(dc1[,2]),"\\|")

> lapply(out, function(x) x[x != ""])
[[1]]
[1] "Box"

[[2]]
[1] "Office" "Ball"  

[[3]]
[1] "Office"

[[4]]
[1] "Office"

[[5]]
[1] "Box"

[[6]]
[1] "Box"

2018-02-07   #3

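I don't have a general solution, but for your example you can try: strsplit(sub("^\\|", "", as.character(dc1[,2])),"\\|"). It removes the leading | (which is what the regular expression "^\\|" matches) before splitting, because that leading | is what produces the "" element.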
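
A minimal sketch of this idea, assuming the V2 column has been rebuilt as a plain character vector (the name v2 is introduced here for illustration only). strsplit emits a "" element when the separator matches at the very start of a string, so stripping the leading | first leaves nothing to turn into a blank.

```r
# Hypothetical stand-in for as.character(dc1[, 2]) from the question
v2 <- c("|Box", "|Office|Ball", "|Office", "|Office", "|Box", "|Box")

# A separator at the start of a string yields a leading "" element
with_blank <- strsplit(v2, "\\|")

# Stripping the leading "|" first means nothing precedes the first field
no_blank <- strsplit(sub("^\\|", "", v2), "\\|")

print(no_blank[[2]])
```

This only handles a single leading separator; empty fields in the middle of a string (for example "a||b") would still produce "".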

2018-02-07   #4

In this case, you can remove the first element of each vector by calling "[" inside sapply:

> sapply(strsplit(as.character(dc1[,2]), "\\|"), "[", -1)
# [[1]]
# [1] "Box"

# [[2]]
# [1] "Office" "Ball"  

# [[3]]
# [1] "Office"

# [[4]]
# [1] "Office"

# [[5]]
# [1] "Box"

# [[6]]
# [1] "Box"
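
One caveat, offered as a hedged aside: sapply simplifies its result when every element has the same length, so it returns a list here only because row 2 has two fields. A sketch using lapply, which always returns a list (v2 is a hypothetical stand-in for the V2 column):

```r
# Hypothetical stand-in for part of the V2 column
v2 <- c("|Box", "|Office|Ball", "|Office")
parts <- strsplit(v2, "\\|")

# lapply always returns a list, whatever the element lengths are
dropped <- lapply(parts, "[", -1)

# With equal-length pieces, sapply collapses the result to a character vector
simplified <- sapply(strsplit(c("|Box", "|Office"), "\\|"), "[", -1)

print(dropped)
print(simplified)
```

Dropping the first element is also only safe when every string really starts with the separator; otherwise a genuine first field would be lost.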

2018-02-07   #5

You can use:

library(stringr)
str_extract_all(dc1[,2], "[[:alpha:]]+")
[[1]]
 [1] "Box"

[[2]]
 [1] "Office" "Ball"  

[[3]]
 [1] "Office"

[[4]]
 [1] "Office"

[[5]]
 [1] "Box"

[[6]]
 [1] "Box"
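
If adding stringr is not an option, the same extraction can be sketched in base R with gregexpr and regmatches. The vector v2 below is a hypothetical stand-in for the V2 column.

```r
# Hypothetical stand-in for part of the V2 column
v2 <- c("|Box", "|Office|Ball", "|Office")

# gregexpr locates every run of letters; regmatches pulls out the matched text
fields <- regmatches(v2, gregexpr("[[:alpha:]]+", v2))

print(fields)
```

Like the stringr version, this keeps only alphabetic runs, so a field containing digits or spaces would be split or partly dropped.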

2018-02-07   #6

Another approach uses nzchar() after unlisting the result of strsplit():

out <- unlist(strsplit(as.character(dc1[,2]),"\\|"))

out[nzchar(x=out)] # removes the extraneous "" marks
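
Note that unlist() flattens everything into one vector, so the per-row grouping shown in the expected output is lost. If the grouping matters, a sketch that applies nzchar() inside each list element instead (v2 is a hypothetical stand-in for the V2 column):

```r
# Hypothetical stand-in for part of the V2 column
v2 <- c("|Box", "|Office|Ball", "|Office")
parts <- strsplit(v2, "\\|")

# Flattened: one character vector, row boundaries are gone
flat <- unlist(parts)
flat <- flat[nzchar(flat)]

# Grouped: filter inside each list element, one element per input row
grouped <- lapply(parts, function(x) x[nzchar(x)])

print(flat)
print(grouped)
```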

2018-02-07   #7

library("stringr")

lapply(str_split(dc1$V2, "\\|"), function(x) x[-1])

[[1]]
[1] "Box"

[[2]]
[1] "Office" "Ball"  

[[3]]
[1] "Office"

[[4]]
[1] "Office"

[[5]]
[1] "Box"

[[6]]
[1] "Box"
