Drawing scatterplot matrices with pairs

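Introduction:
If X is a numeric matrix or data frame, the command
     > pairs(X)
produces a pairwise scatterplot matrix of the columns of X. That is, every column of X is plotted against every other column of X, giving n(n-1) plots for n columns, and these plots are laid out as an array in the plotting region. Within the array, all panels in the same row share one vertical scale and all panels in the same column share one horizontal scale.
For example: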
> X <- matrix(1:8, 2, 4)
> X
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
> pairs(X)
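The layout can be checked without drawing anything. The sketch below uses a hypothetical helper, `panel_layout` (not part of R), to list which column of the matrix lands on each axis of every panel. It assumes the rule stated in the R documentation quoted at the end of this post: panel (i, j) plots `x[, i]` against `x[, j]`, i.e. column j on the horizontal axis and column i on the vertical axis.

```r
# Hypothetical helper: describe the panel grid that pairs() draws for n columns.
# Panel in row i, column j shows x[, j] on the horizontal axis and x[, i] on
# the vertical axis; diagonal panels (i == j) only show the variable label.
panel_layout <- function(n) {
  grid <- expand.grid(row = seq_len(n), col = seq_len(n))
  grid$x_axis <- grid$col
  grid$y_axis <- grid$row
  grid$scatter <- grid$row != grid$col  # FALSE on the diagonal
  grid
}

X <- matrix(1:8, 2, 4)
layout_X <- panel_layout(ncol(X))

# Column 1 of the matrix: x-axis is always column 1, y-axis runs over 1..4
layout_X[layout_X$col == 1, ]

# Number of scatterplots: n * (n - 1) off-diagonal panels
sum(layout_X$scatter)
# [1] 12
```

For the 4-column `X` above, 16 cells minus 4 diagonal cells leaves the 12 scatterplots predicted by n(n-1).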

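At first the figure is hard to read because not every panel shows its own axes. In fact every panel has axes: the tick labels are drawn only on the outer edge of the matrix, alternating between opposite sides, so a panel without visible ticks takes them from the outer edge of its row or column.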

[Figure: scatterplot matrix produced by pairs(X). Source: 德哥@Digoal - PostgreSQL research]

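Look at the first column of the figure above. The horizontal axis is fixed and corresponds to the data in column 1 of the matrix.
The vertical axis differs from panel to panel: each panel's vertical axis corresponds to the matrix column that matches that panel's row.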

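Column 1 of the figure, from top to bottom:
Panel 1 pairs column 1 with column 1: both axes would be column 1's data (no scatterplot is drawn; the diagonal shows the variable name instead).
Panel 2 pairs column 1 with column 2: the horizontal axis represents column 1's data, the vertical axis column 2's data.
Panel 3 pairs column 1 with column 3: the horizontal axis represents column 1's data, the vertical axis column 3's data.
Panel 4 pairs column 1 with column 4: the horizontal axis represents column 1's data, the vertical axis column 4's data.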

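Column 2 of the figure, from top to bottom:
Panel 1 pairs column 2 with column 1: the horizontal axis is column 2's data, the vertical axis column 1's data.
Panel 2 pairs column 2 with column 2: both axes would be column 2's data (no scatterplot is drawn; the diagonal shows the variable name instead).
Panel 3 pairs column 2 with column 3: the horizontal axis is column 2's data, the vertical axis column 3's data.
Panel 4 pairs column 2 with column 4: the horizontal axis is column 2's data, the vertical axis column 4's data.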

(The remaining columns follow the same pattern.)
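To confirm the reading of a single cell, you can draw it on its own. A minimal sketch, assuming the same 2 x 4 matrix `X` as above: the panel in row 2, column 1 of `pairs(X)` should contain the same points as an ordinary scatterplot with column 1 on the horizontal axis and column 2 on the vertical axis. The margins and framing of the standalone plot differ, and the axis labels "var 1"/"var 2" are chosen here to mimic the default diagonal labels.

```r
# Same 2 x 4 matrix as in the example above
X <- matrix(1:8, 2, 4)

# Full scatterplot matrix
pairs(X)

# Panel in row 2, column 1 of pairs(X), drawn on its own:
# horizontal axis = column 1 of X, vertical axis = column 2 of X,
# i.e. the points (1, 3) and (2, 4)
plot(X[, 1], X[, 2], xlab = "var 1", ylab = "var 2")
```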

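pairs also accepts a formula. When several variables are given, each variable becomes one column of the matrix and is plotted against every other variable.
For example: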
> x <- 1:10
> y <- 101:110
> z <- 201:210
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> y
 [1] 101 102 103 104 105 106 107 108 109 110
> z
 [1] 201 202 203 204 205 206 207 208 209 210
> pairs(~ x / y / z)
> pairs(~ x + y + z)
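Both calls produce the same figure. The formula method only uses the formula to collect the variables (it passes the formula to model.frame()), so `/` and `+` lead to the same three columns here: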
[Figure: scatterplot matrix produced by pairs(~ x + y + z)]
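Each of x, y and z becomes one column and is plotted against the other columns.
In column 1 the horizontal axis is x, and the vertical axes are y and z, giving two plots.
In column 2 the horizontal axis is y, and the vertical axes are x and z, giving two plots.
In column 3 the horizontal axis is z, and the vertical axes are x and y, giving two plots.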
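A small check of why the two formulas behave identically, assuming the vectors `x`, `y` and `z` defined above: the formula method of `pairs()` evaluates `stats::model.frame()` on the formula and hands the resulting data frame to the default method, and the model frame holds the variables themselves, not the terms built from them. Passing an equivalent data frame directly is shown as one alternative.

```r
x <- 1:10
y <- 101:110
z <- 201:210

# Model frames built from the two formulas contain the same three variables
mf_plus <- model.frame(~ x + y + z)
mf_nest <- model.frame(~ x / y / z)

names(mf_plus)
# [1] "x" "y" "z"
names(mf_nest)
# [1] "x" "y" "z"

# A data frame with the same columns gives the same plot as pairs(~ x + y + z)
pairs(data.frame(x, y, z))
```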

[References]
1.  > help(pairs)
pairs                 package:graphics                 R Documentation

Scatterplot Matrices

Description:

     A matrix of scatterplots is produced.

Usage:

     pairs(x, ...)
     
     ## S3 method for class 'formula'
     pairs(formula, data = NULL, ..., subset,
           na.action = stats::na.pass)
     
     ## Default S3 method:
     pairs(x, labels, panel = points, ...,
           lower.panel = panel, upper.panel = panel,
           diag.panel = NULL, text.panel = textPanel,
           label.pos = 0.5 + has.diag/3, line.main = 3,
           cex.labels = NULL, font.labels = 1,
           row1attop = TRUE, gap = 1, log = "")
     
Arguments:

       x: the coordinates of points given as numeric columns of a
          matrix or data frame.  Logical and factor columns are
          converted to numeric in the same way that ‘data.matrix’ does.

 formula: a formula, such as ‘~ x + y + z’.  Each term will give a
          separate variable in the pairs plot, so terms should be
          numeric vectors.  (A response will be interpreted as another
          variable, but not treated specially, so it is confusing to
          use one.)

    data: a data.frame (or list) from which the variables in ‘formula’
          should be taken.

  subset: an optional vector specifying a subset of observations to be
          used for plotting.

na.action: a function which indicates what should happen when the data
          contain ‘NA’s.  The default is to pass missing values on to
          the panel functions, but ‘na.action = na.omit’ will cause
          cases with missing values in any of the variables to be
          omitted entirely.

  labels: the names of the variables.

   panel: ‘function(x, y, ...)’ which is used to plot the contents of
          each panel of the display.

     ...: arguments to be passed to or from methods.

          Also, graphical parameters can be given as can arguments to
          ‘plot’ such as ‘main’.  ‘par("oma")’ will be set
          appropriately unless specified.

lower.panel, upper.panel: separate panel functions (or ‘NULL’) to be
          used below and above the diagonal respectively.

diag.panel: optional ‘function(x, ...)’ to be applied on the diagonals.

text.panel: optional ‘function(x, y, labels, cex, font, ...)’ to be
          applied on the diagonals.

label.pos: ‘y’ position of labels in the text panel.

line.main: if ‘main’ is specified, ‘line.main’ gives the ‘line’
          argument to ‘mtext()’ which draws the title.  You may want to
          specify ‘oma’ when changing ‘line.main’.

cex.labels, font.labels: graphics parameters for the text panel.

row1attop: logical. Should the layout be matrix-like with row 1 at the
          top, or graph-like with row 1 at the bottom?

     gap: distance between subplots, in margin lines.

     log: a character string indicating if logarithmic axes are to be
          used: see ‘plot.default’. ‘log = "xy"’ specifies logarithmic
          axes for all variables.

Details:

     The ijth scatterplot contains ‘x[,i]’ plotted against ‘x[,j]’.
     The scatterplot can be customised by setting panel functions to
     appear as something completely different. The off-diagonal panel
     functions are passed the appropriate columns of ‘x’ as ‘x’ and
     ‘y’: the diagonal panel function (if any) is passed a single
     column, and the ‘text.panel’ function is passed a single ‘(x, y)’
     location and the column name.  Setting some of these panel
     functions to ‘NULL’ is equivalent to _not_ drawing anything there.

     The graphical parameters ‘pch’ and ‘col’ can be used to specify a
     vector of plotting symbols and colors to be used in the plots.

     The graphical parameter ‘oma’ will be set by ‘pairs.default’
     unless supplied as an argument.

     A panel function should not attempt to start a new plot, but just
     plot within a given coordinate system: thus ‘plot’ and ‘boxplot’
     are not panel functions.

     By default, missing values are passed to the panel functions and
     will often be ignored within a panel.  However, for the formula
     method and ‘na.action = na.omit’, all cases which contain a
     missing values for any of the variables are omitted completely
     (including when the scales are selected).

Author(s):

     Enhancements for R 1.0.0 contributed by Dr. Jens
     Oehlschlaegel-Akiyoshi and R-core members.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

Examples:

     pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
           pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
     
     ## formula method
     pairs(~ Fertility + Education + Catholic, data = swiss,
           subset = Education < 20, main = "Swiss data, Education < 20")
     
     pairs(USJudgeRatings)
     ## show only lower triangle (and suppress labeling for whatever reason):
     pairs(USJudgeRatings, text.panel = NULL, upper.panel = NULL)
     
     ## put histograms on the diagonal
     panel.hist <- function(x, ...)
     {
         usr <- par("usr"); on.exit(par(usr))
         par(usr = c(usr[1:2], 0, 1.5) )
         h <- hist(x, plot = FALSE)
         breaks <- h$breaks; nB <- length(breaks)
         y <- h$counts; y <- y/max(y)
         rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
     }
     pairs(USJudgeRatings[1:5], panel = panel.smooth,
           cex = 1.5, pch = 24, bg = "light blue",
           diag.panel = panel.hist, cex.labels = 2, font.labels = 2)
     
     ## put (absolute) correlations on the upper panels,
     ## with size proportional to the correlations.
     panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
     {
         usr <- par("usr"); on.exit(par(usr))
         par(usr = c(0, 1, 0, 1))
         r <- abs(cor(x, y))
         txt <- format(c(r, 0.123456789), digits = digits)[1]
         txt <- paste0(prefix, txt)
         if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
         text(0.5, 0.5, txt, cex = cex.cor * r)
     }
     pairs(USJudgeRatings, lower.panel = panel.smooth, upper.panel = panel.cor)
     
     pairs(iris[-5], log = "xy") # plot all variables on log scale
     pairs(iris, log = 1:4, # log the first four
           main = "Lengths and Widths in [log]", line.main=1.5, oma=c(2,2,3,2))
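The Details section above says that, for the formula method with `na.action = na.omit`, any case with a missing value is dropped from every panel. A minimal sketch with a hypothetical data frame `df` (one `NA` in column `x2`) makes the difference visible; `model.frame()` is used only to show which rows survive.

```r
# Hypothetical data: one missing value in column x2 (row 2)
df <- data.frame(x1 = c(1, 2, 3, 4),
                 x2 = c(10, NA, 30, 40),
                 x3 = c(5, 6, 7, 8))

# Default na.action (na.pass): row 2 is kept; panels involving x2 skip that point
pairs(~ x1 + x2 + x3, data = df)

# na.omit: row 2 is removed from every panel, including x1 vs x3
pairs(~ x1 + x2 + x3, data = df, na.action = na.omit)

# Rows that remain after na.omit
kept <- model.frame(~ x1 + x2 + x3, data = df, na.action = na.omit)
nrow(kept)
# [1] 3
```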
