1

26 October 2013

Observation and Reflection on Official Statistics against Big Data Challenge

Yuan Pengfei

Research Institute of Statistical Sciences

National Bureau of Statistics of China

2

The situation and our preparation

Big data is rolling toward us.

In recent years, NBS has continuously strengthened the construction of its statistical informatization.

3

The characteristics of big data

Most big data are generated automatically.

Big data come from many data sources.

Unstructured data make up a large proportion of big data.

The value of big data needs to be filtered and extracted.

From 3V to 6V.

4

Why from 3V to 6V

Volume: data volume is huge.

Value: application value is huge.

Variety: data types are various.

Velocity: processing speed is rapid.

Vender: the acquisition and transmission of big data are flexible.

Veracity: truthfulness and accuracy.

5

The challenges and influences: About system design

Statistical standard.

Statistical indicators.

Statistical range.

Statistical method.

6

The challenges and influences: About data collection

By searching on the internet.

By purchasing.

By cooperation.

7

The challenges and influences: About data processing

We must do our best to explore methods and techniques for transforming unstructured data into structured data.
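As a minimal sketch of what such a transformation can look like, the snippet below turns free-text price listings into structured records with a regular expression. The listing format, the field names and the `parse_listing` helper are illustrative assumptions, not a method described in these slides.

```python
import re
from typing import Optional

# Hypothetical listing format: "<product name> <price> yuan/<unit>",
# for example "Rice 5.80 yuan/kg". Real web text would be far messier.
LISTING_RE = re.compile(
    r"^\s*(?P<product>.+?)\s+(?P<price>\d+(?:\.\d+)?)\s*yuan\s*/\s*(?P<unit>\w+)\s*$",
    re.IGNORECASE,
)

def parse_listing(text: str) -> Optional[dict]:
    """Return a structured record, or None if the text does not fit the pattern."""
    match = LISTING_RE.match(text)
    if match is None:
        return None
    return {
        "product": match.group("product"),
        "price": float(match.group("price")),
        "unit": match.group("unit").lower(),
    }

# Made-up input lines for illustration only.
raw_lines = [
    "Rice 5.80 yuan/kg",
    "Fresh pork  32 yuan / kg",
    "call us for today's offer",  # no price, so it cannot be structured
]

# Keep only the lines that could be converted into records.
records = [r for r in (parse_listing(line) for line in raw_lines) if r is not None]
```

Lines that do not match are dropped here; in practice they would more likely be logged and reviewed, since the share of unparsed text is itself a data quality signal.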

8

The challenges and influences: About data storage

Processing big data with high capacity, high speed and complexity requires a server cluster that supports a variety of tools.

Cloud computing is generally considered the most economical approach.

9

The challenges and influences: About data quality assessment

Accuracy

Timeliness

Applicability

Economy

10

The challenges and influences: About data release

Release will be more timely.

The choice of release media will be more diverse.

The content of release must be richer.

11

Some ideas for application: CPI statistics

To collect online transaction price data by searching on the internet.

To explore cooperation with online stores in order to acquire online transaction price data.

To establish a system through which malls, supermarkets and hospitals can submit their transaction records to official statistical departments.
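To illustrate how collected online prices could feed a price index, here is a minimal sketch that computes a Jevons elementary index (the geometric mean of price relatives) over items observed in both periods. The choice of the Jevons formula, the item keys and all prices are assumptions made for illustration; the slides do not say how NBS would compile such data.

```python
import math

def jevons_index(base_prices: dict, current_prices: dict) -> float:
    """Geometric mean of price relatives for items present in both periods (base = 100)."""
    common = sorted(set(base_prices) & set(current_prices))
    if not common:
        raise ValueError("no matched items between the two periods")
    # Average the log price relatives, then transform back.
    log_sum = sum(math.log(current_prices[k] / base_prices[k]) for k in common)
    return 100.0 * math.exp(log_sum / len(common))

# Made-up prices for illustration only.
base = {"rice_1kg": 5.00, "milk_1l": 10.00, "eggs_10": 8.00}
current = {"rice_1kg": 5.50, "milk_1l": 10.00, "eggs_10": 8.80, "tea_100g": 20.00}

# "tea_100g" has no base-period price, so it is left out of the matched sample.
index = jevons_index(base, current)
```

Only matched items enter the calculation, which is why products appearing or disappearing between periods, common in online stores, need separate treatment.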

12

13

14

Some ideas for application: PPI statistics

Collecting relevant online data by searching, to provide useful supplements for the compilation of PPI in NBS.

Establishing cooperation with related companies, to collect price information on related industries for the evaluation and validation of PPI.

15

[Chart: Shanghai Ganglian China Commodity Price Index and PPI year-on-year change (right axis, %), monthly axis labeled 2008-1 to 2012-9. Annotation: the Ganglian commodity index leads by about one and a half months.]

[Chart: Shanghai Ganglian China Commodity Price Index and PMI index (right axis, %), monthly axis labeled 2008-1 to 2012-9.]
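A lead of one series over another, like the one noted in the chart annotation, could be estimated by shifting the series against each other and picking the shift with the highest correlation. The sketch below does this on synthetic data; the `best_lead` helper and the data are assumptions for illustration, and the approach only resolves whole periods, so a lead of about one and a half months would need higher-frequency data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lead(leading, lagging, max_lead):
    """Return (lead, r) maximizing corr(leading[t], lagging[t + lead]) for lead in 0..max_lead."""
    best = None
    for lead in range(max_lead + 1):
        x = leading[: len(leading) - lead]  # drop the last `lead` points
        y = lagging[lead:]                  # drop the first `lead` points
        r = pearson(x, y)
        if best is None or r > best[1]:
            best = (lead, r)
    return best

# Synthetic series: the "official" series is the commodity series delayed by 2 periods.
commodity = [math.sin(0.4 * t) for t in range(60)]
official = [0.0, 0.0] + commodity[:-2]

lead, corr = best_lead(commodity, official, max_lead=6)
```

A high correlation at some shift does not by itself show that the commodity index is a reliable predictor; it is only a first screening step before proper validation.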

16

Some ideas for application: Employment survey

Statistical analysis of employment-related big data on the internet will, to some extent, be very useful for understanding the situation in the labor market.

17

Some ideas for application: Agricultural statistics

The application of spatial data.

The application of data from the Internet of Things.

The application of data from the Internet.

18

19

Some ideas for application: Wholesale and retail statistics

Collecting the base data of E-commerce transactions, including quality assessment.

Adding indicators that reflect E-commerce transactions, such as the total volume of E-commerce transactions, to statistical report forms.

Building an E-commerce index that reflects the level of E-commerce transactions.

20

Some ideas for application: Transportation statistics

Making use of the data collected from various transportation infrastructures.

Making use of the data recorded and transmitted by vehicles.

Making use of the data generated by the objects of transportation services.

21

Thank You!