R 3.0.0 is released


Page 1

R 3.0.0 is released!!

30th Tokyo.R, April 20, 2013

@sfchaos

Page 2

And so, on

April 3, 2013

Page 3

R-3.0.0

released!

The R Project for Statistical Computing

Page 4

The R community

is abuzz

R-statistics blog

SOURCEFORGE.JP Magazine

Page 5

What changed?

Page 6

@kohske's page has a good summary

"Important changes in R 3.0.0"

Page 7

The big change

Introduction of long vectors

Page 8

What is a long vector?

A vector that is long

Page 9

    > x <- rep(0, 2^31-1)
    > x <- rep(0, 2^31)
    Error: cannot allocate vector of size 8.0 Gb
    In addition: Warning messages:
    1: Reached total allocation of 16375Mb: see help(memory.size)
    2: Reached total allocation of 16375Mb: see help(memory.size)
    3: Reached total allocation of 16375Mb: see help(memory.size)
    4: Reached total allocation of 16375Mb: see help(memory.size)

Up to R-2.15.3, a single vector could hold at most 2^31-1 elements (32-bit)

Note: the same limit applies not only to vectors but also to objects such as arrays and matrices
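As a rough illustration of where the 2^31-1 ceiling comes from, here is a minimal C sketch that is independent of R's own sources. It assumes the old length was held in a signed 32-bit integer, as the "(32-bit)" remark suggests. The names `bytes_for_doubles` and `print_short_limit` are hypothetical and exist only in this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of R): bytes needed for n doubles. */
uint64_t bytes_for_doubles(uint64_t n)
{
    assert(sizeof(double) == 8);  /* assumption: 8-byte IEEE 754 double */
    return n * (uint64_t) sizeof(double);
}

/* Hypothetical helper: report the largest pre-3.0.0 vector length. */
void print_short_limit(void)
{
    int32_t max_len = INT32_MAX;  /* 2^31 - 1 */
    uint64_t bytes = bytes_for_doubles((uint64_t) max_len);

    printf("max length: %ld\n", (long) max_len);
    printf("bytes     : %llu\n", (unsigned long long) bytes);
    printf("GiB       : %.1f\n", (double) bytes / (1024.0 * 1024.0 * 1024.0));
}
```

Calling `print_short_limit()` from a `main` function shows that a numeric vector at the old maximum length already needs just under 16 GiB.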

Page 10

This got much, much

longer

Note: apparently there are various caveats, though

See @kohske's page mentioned earlier

Image source: Wikipedia, "Solar System"

Page 11

The ratio is a whopping 2^21 (≈ 10^6.3)

    /* both config.h and Rconfig.h set SIZEOF_SIZE_T, but Rconfig.h is
       skipped if config.h has already been included. */
    #ifndef R_CONFIG_H
    # include <Rconfig.h>
    #endif

    #if ( SIZEOF_SIZE_T > 4 )
    # define LONG_VECTOR_SUPPORT
    #endif

    #ifdef LONG_VECTOR_SUPPORT
        typedef ptrdiff_t R_xlen_t;
        typedef struct { R_xlen_t lv_length, lv_truelength; } R_long_vec_hdr_t;
    # define R_XLEN_T_MAX 4503599627370496
    # define R_SHORT_LEN_MAX 2147483647
    # define R_LONG_VEC_TOKEN -1

R_XLEN_T_MAX = 2^52

R_SHORT_LEN_MAX = 2^31-1
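The ratio can be checked with integer arithmetic alone. The following C sketch copies the two constants from the header excerpt into its own macros and divides 2^52 by 2^31. `XLEN_MAX`, `SHORT_MAX`, `length_ratio` and `print_limits` are names made up for this example and are not defined by R.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the header excerpt above. */
#define XLEN_MAX  INT64_C(4503599627370496)  /* R_XLEN_T_MAX    */
#define SHORT_MAX INT64_C(2147483647)        /* R_SHORT_LEN_MAX */

/* Hypothetical helper: how many times longer a long vector may be. */
int64_t length_ratio(void)
{
    /* The old limit is 2^31 - 1; divide by 2^31 for a round figure. */
    return XLEN_MAX / (SHORT_MAX + 1);
}

/* Hypothetical helper: print both limits and their ratio. */
void print_limits(void)
{
    assert(XLEN_MAX == (INT64_C(1) << 52));       /* 2^52     */
    assert(SHORT_MAX == (INT64_C(1) << 31) - 1);  /* 2^31 - 1 */
    printf("new limit: %lld\n", (long long) XLEN_MAX);
    printf("old limit: %lld\n", (long long) SHORT_MAX);
    printf("ratio    : %lld\n", (long long) length_ratio());
}
```

`length_ratio()` returns 2097152, that is 2^21, or roughly 2.1 × 10^6, in line with the 10^6.3 quoted on the slide.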

Page 12

Truly super-sized ("mashi-mashi")

Page 13

The code got simpler too

Before:

    char *R_alloc(size_t nelem, int eltsize)
    {
        R_size_t size = nelem * eltsize;
        double dsize = (double)nelem * eltsize;
        if (dsize > 0) { /* precaution against integer overflow on 32-bit */
            SEXP s;
    #if SIZEOF_SIZE_T > 4
            /* In this case by allocating larger units we can get up to
               size(Rcomplex) * (2^31 - 1) bytes, approx 16Gb */
            if(dsize < R_LEN_T_MAX)
                s = allocVector(RAWSXP, size + 1);
            else if(dsize < sizeof(double) * (R_LEN_T_MAX - 1))
                s = allocVector(REALSXP, (int)(0.99+dsize/sizeof(double)));
            else if(dsize < sizeof(Rcomplex) * (R_LEN_T_MAX - 1))
                s = allocVector(CPLXSXP, (int)(0.99+dsize/sizeof(Rcomplex)));
            else {
                error(_("cannot allocate memory block of size %0.1f Gb"),
                      dsize/1024.0/1024.0/1024.0);
                s = R_NilValue; /* -Wall */
            }
    #else
            if(dsize > R_LEN_T_MAX) /* must be in the Gb range */
                error(_("cannot allocate memory block of size %0.1f Gb"),
                      dsize/1024.0/1024.0/1024.0);
            s = allocVector(RAWSXP, size + 1);
    #endif
            ATTRIB(s) = R_VStack;
            R_VStack = s;
            return (char *) DATAPTR(s);
        }
        else return NULL;
    }

After:

    char *R_alloc(size_t nelem, int eltsize)
    {
        R_size_t size = nelem * eltsize;
        /* doubles are a precaution against integer overflow on 32-bit */
        double dsize = (double) nelem * eltsize;
        if (dsize > 0) {
            SEXP s;
    #ifdef LONG_VECTOR_SUPPORT
            /* 64-bit platform: previous version used REALSXPs */
            if(dsize > R_XLEN_T_MAX) /* currently 4096 TB */
                error(_("cannot allocate memory block of size %0.f Tb"),
                      dsize/pow(1024.0, 4.0));
            s = allocVector(RAWSXP, size + 1);
    #else
            if(dsize > R_LEN_T_MAX) /* must be in the Gb range */
                error(_("cannot allocate memory block of size %0.1f Gb"),
                      dsize/pow(1024.0, 3.0));
            s = allocVector(RAWSXP, size + 1);
    #endif
            ATTRIB(s) = R_VStack;
            R_VStack = s;
            return (char *) DATAPTR(s);
        }
        /* One programmer has relied on this, but it is undocumented! */
        else return NULL;
    }

src/main/memory.c (from the unpacked R.version.tar.gz)
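Both versions guard against overflow by computing the requested size as a double before trusting the size_t product. A minimal stand-alone sketch of that idea might look like the following. It assumes a made-up limit of 2^52 bytes, and `checked_block_size` and `SKETCH_MAX_BYTES` are hypothetical names that are not part of R's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit for this sketch only (2^52 bytes). */
#define SKETCH_MAX_BYTES 4503599627370496.0

/*
 * Hypothetical helper (not R's API): store nelem * eltsize in *out and
 * return 1 if the request is positive and within the limit, else return 0.
 */
int checked_block_size(size_t nelem, int eltsize, size_t *out)
{
    /* Multiply in double first: this cannot wrap around the way size_t can. */
    double dsize = (double) nelem * eltsize;

    assert(out != NULL);
    if (dsize <= 0 || dsize > SKETCH_MAX_BYTES)
        return 0;
    /* Also refuse anything size_t cannot hold (matters on 32-bit). */
    if (dsize > (double) SIZE_MAX)
        return 0;
    *out = nelem * (size_t) eltsize;
    return 1;
}
```

Unlike R_alloc, this sketch only validates the size and leaves the actual allocation to the caller.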

Page 14

Also, note that with R 3.0.0

you need to

reinstall

your packages

Page 15

That's all!

For details, see

@wdkz's lightning talk coming up next

Page 16

    SEXP attribute_hidden do_Rprofmem(SEXP call, SEXP op, SEXP args, SEXP rho)
    {
        SEXP filename;
        R_size_t threshold;
        int append_mode;

        checkArity(op, args);
        if (!isString(CAR(args)) || (LENGTH(CAR(args))) != 1)
            error(_("invalid '%s' argument"), "filename");
        append_mode = asLogical(CADR(args));
        filename = STRING_ELT(CAR(args), 0);
        threshold = REAL(CADDR(args))[0];
        if (strlen(CHAR(filename)))
            R_InitMemReporting(filename, append_mode, threshold);
        else
            R_EndMemReporting();
        return R_NilValue;
    }

    #include "RBufferUtils.h"

    attribute_hidden
    void *R_AllocStringBuffer(size_t blen, R_StringBuffer *buf)
    {
        size_t blen1, bsize = buf->defaultSize;

        /* for backwards compatibility, probably no longer needed */
        if(blen == (size_t)-1) {
            warning("R_AllocStringBuffer(-1) used: please report");
            R_FreeStringBufferL(buf);
            return NULL;
        }

        if(blen * sizeof(char) < buf->bufsize) return buf->data;
        blen1 = blen = (blen + 1) * sizeof(char);
        blen = (blen / bsize) * bsize;
        if(blen < blen1) blen += bsize;

        if(buf->data == NULL) {
            buf->data = (char *) malloc(blen);

By the way,

isn't the source code

hard to read?

Page 17

Found some readable code

    namespace CXXR {
        class String;
        template <typename, SEXPTYPE, typename Initializer = RObject::DoNothing>
        class FixedVector;
        typedef FixedVector<int, INTSXP> IntVector;
        typedef FixedVector<RHandle<>, VECSXP> ListVector;
        typedef FixedVector<RHandle<String>, STRSXP> StringVector;

        /** @brief Untemplated base class for R vectors.
         */
        class VectorBase : public RObject {
        public:
            /**
             * @param stype The required ::SEXPTYPE.
             * @param sz The required number of elements in the vector.
             */
            VectorBase(SEXPTYPE stype, std::size_t sz)
                : RObject(stype), m_truelength(sz), m_size(sz)
            {}

            /** @brief Copy constructor.
             *
             * @param pattern VectorBase to be copied.
             */
            VectorBase(const VectorBase& pattern)
                : RObject(pattern), m_truelength(pattern.m_truelength),
                  m_size(pattern.m_size)
            {}

Written in an object-oriented style

using classes

Page 18

There is documentation too

Page 19

Its name is CXXR

CXXR: Refactorising R into C++

Page 20

    #include <iostream>
    #include <fstream>
    #include <boost/noncopyable.hpp>

    class BigDataFrame : boost::noncopyable
    {
    public:
        enum DataType {CHAR=1, SHORT=2, INT=3, DOUBLE=4, COMPLEX=5};
    public:
        BigDataFrame(index_type nrow, index_type ncol) : nrow_(nrow), ncol_(ncol)
        {
            // initializing shared pointer
            p = std::shared_ptr<Monitor>(new Monitor[ncol],
                                         std::default_delete<Monitor[]>());

A project to

rewrite R's internals

in C++

Page 22

I assumed it was a dead project, but there is a recent paper too

CXXR: an extensible R interpreter

Page 23

I'll look into it

and, someday, somewhere,

(maybe) present it