R:STRINGdb包用于string蛋白互作分析

时间:2022-07-22
本文章向大家介绍R:STRINGdb包用于string蛋白互作分析,主要内容包括其使用实例、应用技巧、基本知识点总结和需要注意事项,具有一定的参考价值,需要的朋友可以参考一下。

使用STRING数据库进行蛋白互作分析是生信常规下游分析项目之一。

本文将通过R包STRINGdb来进行string蛋白互作分析,同时会利用igraph和ggraph对互作网络进行可视化。

STRINGdb包用于蛋白互作分析

STRINGdb包有别于其他的R包,它的帮助信息不是使用help函数查看,而是传给STRINGdbhelp(),如使用STRINGdbhelp("map")查看map函数的帮助。如需要查看它的vignette文档,可以使用vignette("STRINGdb")命令。

map和plot_network用于蛋白互作分析

进行分析前,先要创建一个STRINGdb对象:

library(tidyverse)
library(clusterProfiler)
library(org.Mm.eg.db)

library(STRINGdb)
library(igraph)
library(ggraph)
# 创建STRINGdb对象
string_db <- STRINGdb$new( version="10", species=10090,
                           score_threshold=400, input_directory="")

其中species代表物种编码,它是NCBI Taxonomy的物种编码(http://www.ncbi.nlm.nih.gov/taxonomy),其中人的编码为9606,小鼠的编码为10090,这里使用小鼠的编码,因为下面分析的基因来源于小鼠。

score_threshold是蛋白互作的得分,此值会用于筛选互作结果,400是默认分值,如果要求严格可以调高此值。

# 需要分析的基因
# 这些基因我测试过,就不再找其他的基因测试了(比如clusterProfiler里面的geneList基因集)
gene <- c('Drd2','Lrrc10b','Adora2a','Gpr6','Syndig1l','Rgs9','Drd1','Scn4b','Gpr88','Pde10a',
  'Rasd2','Ppp1r1b','Pde1b','Adcy5','Gng7','Arpp21','Hpca','Kcnab1','Ptpn5','St8sia3',
  'Spock3','Penk','Rxrg','Tmem158','Pcp4','Tac1','Gnal','Rasgrp2','Ano3','Vxn','Trbc2',
  'Ighm','Tbr1','Nptx1','Nr4a2','Ccn2','Plp1','Mbp','Trf','Cnp','Mobp','Qdpr','Mal',
  'Mag','Plekhb1','Cldn11','Cryab','Car2','Tspan2','Csrp1','Sept4','Bcas1','Mog','Qk',
  'Ermn','Ndrg1','Apod','Gatm','Olig1','Gpr37','Tmem88b','Pllp','Dbndd2','Opalin',
  'Ppp1r14a','Ugt8a','Fa2h','Grb14','Gsn','Evi2a','Lamp5','Rasgrf2','Eomes','Ms4a15',
  'Th','Cdhr1','Nmb','Doc2g','S100a5','Islr','Ly6g6e','Calb2','Trh','Fabp7','Slc6a11',
  'Csdc2','Nrip3','Lbhd2','Reln','Rab3b','Ptn','Kctd12','Ptgds','Mgp','Stx1a','Baiap3',
  'Nts','Zcchc12','Nap1l5','Resp18','Tmem130','Ndn','Gprasp2','Lypd1','Ahi1','Hap1',
  'Hpcal1','Pnck','Vat1l','Wdr6','Nnat','6330403K07Rik','Sst','Pdyn','Tac1','Strip2',
  'Drd1','Wfs1','Ppp1r2','Penk','Rgs9','Pde1b','Pde10a','Gpr88','Ppp1r1b','Tafa2',
  'Lmo3','Slc30a3','Nptxr','Rprm','Cartpt','Rhcg','Ctxn3','Sp8','Lgr6','Dlx2','Dlx1',
  'Dcx','Shisa8','Ablim3','Sp9','Tuba8','Myo16','Isoc1','Sall1','Cacng5','Pbx3','Gpsm1',
  'Gng4','Kank3','Tshz1','Pcbp3','Ptpro','Cpne4','Ppp1r12b','Synpr','Gm17750','Ampd2',
  'Meis2','Ankrd6','Ppm1e','Pbx1','Inpp5j','Grb2','Csdc2','Gprc5b','Nrep','Rgs12',
  'Cpne6','Kcnip1','Rasa2','Dclk2','Gprin1','Btg1','Dync1i1','Sox4','Gad1','Gm27032',
  'Ppp1cc','Syt6','Eml5','Rragd','Pcp4l1','Nrip3','Rasl11b','Gdpd5','Prkca','Marcksl1',
  'Tubb2b','Pacsin2','Kcnf1','Hap1','Calb2','Stxbp6','Nr2f2','Six3','Ramp3','Pvalb',
  'Unc13c','Vamp1','Gpx1','Kcnc1','Spp1','Gad1','Agt','Lgi2','Nefh','Prkcd','Esrrg',
  'Kcnip1','Prr32','Cldn2','Calml4','Slc4a5','Otx2','Kcne2','Steap1','Tmem72','Ccdc153',
  'Tmem212','F5','Col8a2','Kcnj13','Wfdc2','Folr1','Cfap126','Lbp','2900040C04Rik',
  'Trpv4','Sostdc1','Col8a1','Prlr','Tuba1c','Dynlrb2','Wdr86','Abca4','Msx1','Kl',
  'Gm19935','Foxj1','Ecrg4','Rdh5','Rsph1','Mia','Cdkn1c','Aqp1','Cox6b2','Clic6',
  'Prelp','Slc2a12','Slc13a4','Col9a3','Ace','1110017D15Rik','Igf2','Bst2','Car12',
  'Lgals3bp','Cab39l','Slc29a4','Sulf1','Pcolce','Ttr','Enpp2','Fxyd1','Slc4a2','Pltp',
  'Igfbp2','Rbp1','Rarres2','Cd24a','Ezr','Trpm3','1700094D03Rik','Car14','Ctsd',
  'Slc31a1','Ifi27','Cd9','Ifitm3','Dbi','Ccnd2','Vamp8','Ucp2','Cd63','Hemk1','Spint2',
  'Slc12a2','Plxnb2','Ephx1','Stk39','Tcn2','Vim','Emb','Hba-a2','Hbb-bs','Hba-a1',
  'Gfap','Spink8','Rasd1','Shisa6','Npy2r','Cabp7','Olfml2b','Homer3','Neurod6','Nr4a3',
  'Tanc1','Cnih2','Gabra5','Neurod2','Wipf3','Nr3c2','Zbtb18','Pcdh20','Arhgef25',
  'Prdm8','Epha4','Arpc5','Nell2','Hpca','Slit1','Crym','Hs3st4','Prkca','Chgb','Cpne6',
  'Dock9','Prkcg','Cpne7','Grin2a','Dkk3','Nt5dc3','Ak5','Rgs14','St6galnac5','Nptxr',
  'Nptx1','Bok')
# 将Gene Symbol转换为Entrez ID
gene <- gene %>% bitr(fromType = "SYMBOL", 
                      toType = "ENTREZID", 
                      OrgDb = "org.Mm.eg.db", 
                      drop = T)

使用map函数用于将基因匹配到STRING数据库的ID,map函数的帮助信息可以查看STRINGdb$help("map")。然后plot_network绘图即可。

data_mapped <- gene %>% string_db$map(my_data_frame_id_col_names = "ENTREZID", 
                removeUnmappedRows = TRUE)
string_db$plot_network( data_mapped$STRING_id )

可以发现和官网的出图是一样的。这里不是重点,只是为了说明STRINGdb的基础用法,就不再展开了。

使用get_interactions获取互作信息用于后续可视化分析

使用get_interactions获取蛋白互作信息,以用于后续可视化。

data_links <- data_mapped$STRING_id[1:100] %>% string_db$get_interactions()

data_links包含了蛋白互作的信息,比较重要的是前两列和最后一列:from、to、combined_score,前两列指定了蛋白互作关系的基因对,最后一列是此蛋白互作关系的得分。

data_links数据将用于后续分析。

使用igraph和ggraph可视化蛋白互作网络图

先使用igraph创建网络数据,并进行必要的处理,然后转到ggraph绘图。

# 转换stringID为Symbol,只取前两列和最后一列
links <- data_links %>%
  mutate(from = data_mapped[match(from, data_mapped$STRING_id), "SYMBOL"]) %>% 
  mutate(to = data_mapped[match(to, data_mapped$STRING_id), "SYMBOL"]) %>%  
  dplyr::select(from, to , last_col()) %>% 
  dplyr::rename(weight = combined_score)
# 节点数据
nodes <- links %>% { data.frame(gene = c(.$from, .$to)) } %>% distinct()
# 创建网络图
# 根据links和nodes创建
net <- igraph::graph_from_data_frame(d=links,vertices=nodes,directed = F)
# 添加一些参数信息用于后续绘图
# V和E是igraph包的函数,分别用于修改网络图的节点(nodes)和连线(links)
igraph::V(net)$deg <- igraph::degree(net) # 每个节点连接的节点数
igraph::V(net)$size <- igraph::degree(net)/5 #
igraph::E(net)$width <- igraph::E(net)$weight/10
# 使用ggraph绘图
# ggraph是基于ggplot2的包,语法和常规ggplot2类似
ggraph(net,layout = "kk")+
  geom_edge_fan(aes(edge_width=width), color = "lightblue", show.legend = F)+
  geom_node_point(aes(size=size), color="orange", alpha=0.7)+
  geom_node_text(aes(filter=deg>5, label=name), size = 5, repel = T)+
  scale_edge_width(range = c(0.2,1))+
  scale_size_continuous(range = c(1,10) )+
  guides(size=F)+
  theme_graph()

这里的参数设置是节点的大小和其连接的线的数量有关,线数量越多则点越大;线的宽度和其蛋白互作的得分有关,得分越高则越宽。只显示节点数大于5的基因名称。

布局方式有多种,除了刚才的kk方法,还有stress布局:

ggraph(net,layout = "stress")+
  geom_edge_fan(aes(edge_width=width), color = "lightblue", show.legend = F)+
  geom_node_point(aes(size=size), color="orange", alpha=0.7)+
  geom_node_text(aes(filter=deg>5, label=name), size = 5, repel = T)+
  scale_edge_width(range = c(0.2,1))+
  scale_size_continuous(range = c(1,10) )+
  guides(size=F)+
  theme_graph()

环形布局:

ggraph(net,layout = "linear", circular = TRUE)+
  geom_edge_fan(aes(edge_width=width), color = "lightblue", show.legend = F)+
  geom_node_point(aes(size=size), color="orange", alpha=0.7)+
  geom_node_text(aes(filter=deg>5, label=name), size = 5, repel = F)+
  scale_edge_width(range = c(0.2,1))+
  scale_size_continuous(range = c(1,10) )+
  guides(size=F)+
  theme_graph()

上面的几张图可以看到有几个单一的互作关系,其和主网络并不相连,像这种互作关系可以去掉,此时出来的图就会更加美观。

# 去除游离的互作关系
# 如果links数据框的一个link的from只出现过一次,同时to也只出现一次,则将其去除
links_2 <- links %>% mutate(from_c = count(., from)$n[match(from, count(., from)$from)]) %>%
  mutate(to_c = count(., to)$n[match(to, count(., to)$to)]) %>%
  filter(!(from_c == 1 & to_c == 1)) %>%
  dplyr::select(1,2,3)
# 新的节点数据
nodes_2 <- links_2 %>% { data.frame(gene = c(.$from, .$to)) } %>% distinct()
# 创建网络图
net_2 <- igraph::graph_from_data_frame(d=links_2,vertices=nodes_2,directed = F)
# 添加必要的参数
igraph::V(net_2)$deg <- igraph::degree(net_2)
igraph::V(net_2)$size <- igraph::degree(net_2)/5
igraph::E(net_2)$width <- igraph::E(net_2)$weight/10

如果去除了游离的互作关系,那么可以使用一种中心布局的方式,它是根据一个节点的连接数而排列其位置,连接数越大,节点越倾向于在中间位置排列,会更容易看得出重要节点。

ggraph(net_2,layout = "centrality", cent = deg)+
  geom_edge_fan(aes(edge_width=width), color = "lightblue", show.legend = F)+
  geom_node_point(aes(size=size), color="orange", alpha=0.7)+
  geom_node_text(aes(filter=deg>5, label=name), size = 5, repel = T)+
  scale_edge_width(range = c(0.2,1))+
  scale_size_continuous(range = c(1,10) )+
  guides(size=F)+
  theme_graph()

另外环形布局的线使用弧形线(geom_edge_arc)会更美观:

ggraph(net,layout = "linear", circular = TRUE)+
  geom_edge_arc(aes(edge_width=width), color = "lightblue", show.legend = F)+
  geom_node_point(aes(size=size), color="orange", alpha=0.7)+
  geom_node_text(aes(filter=deg>5, label=name), size = 5, repel = F)+
  scale_edge_width(range = c(0.2,1))+
  scale_size_continuous(range = c(1,10) )+
  guides(size=F)+
  theme_graph()

除了使用igraph创建网络图外,也可以使用tidygraph的as_tbl_graph函数处理数据,然后使用ggraph绘图:

links_2 %>% tidygraph::as_tbl_graph() %>%
  ggraph(layout = "kk")+
  geom_edge_fan(color = "grey")+
  geom_node_point(size=5, color="blue", alpha=0.8)+
  geom_node_text(aes(label=name), repel = T)+
  theme_void()

参考资料

网络数据可视化(3修订版)ggraph:ggplot2的网络可视化 https://mp.weixin.qq.com/s/jV8KEMudy9-L_EYwvAgYSA