Postgres as BI platform
![Page 1: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/1.jpg)
Postgres as BI platform
Andy Fefelov
mastery.pro
![Page 2: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/2.jpg)
Agenda
• Problem statement
• Open source solution
• ROLAP
• Our architecture review
• Postgres features suitable for BI
  • ETL vs ELT (stage-nds-ddm)
  • Column data storage
  • Configuration
  • Special features
• Pros and cons of our solution
![Page 3: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/3.jpg)
Problem statement
• Our customer is one of the largest pharmacy supply chain groups in Ireland
• 4 types of dispensary software
• 250 pharmacies
• To be analyzed:
  • Orders
  • Scripts (prescription, recipe)
  • Claims
• Goals to be achieved:
  • Purchasing policy optimization
  • A killer feature for marketing
![Page 4: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/4.jpg)
Open source
• SpagoBI
• Pentaho
• Mondrian
• Saiku
• Cubes (databrewery)
![Page 5: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/5.jpg)
ROLAP (R-ROLAP)
• Star schema
  • Facts
  • Dimensions
  • Measures
• No pre-calculated aggregates
  • SSD
  • Column storage
  • ???
  • Profit!
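A minimal sketch of the star schema idea above, with measures aggregated at query time instead of being pre-calculated. It uses Python's built-in SQLite as a stand-in for Postgres so that it runs anywhere; `fact_order_item` is named later in the deck, while `dim_pharmacy` and all column names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical star schema: one fact table joined to one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_pharmacy (pharmacy_id INTEGER PRIMARY KEY, county TEXT);
    CREATE TABLE fact_order_item (
        pharmacy_id INTEGER REFERENCES dim_pharmacy (pharmacy_id),
        quantity    INTEGER,  -- measure
        amount      REAL      -- measure
    );
    INSERT INTO dim_pharmacy VALUES (1, 'Dublin'), (2, 'Cork');
    INSERT INTO fact_order_item VALUES (1, 2, 10.0), (1, 1, 5.0), (2, 4, 20.0);
""")


def totals_by_county(conn):
    """Aggregate the measures on the fly; no aggregate table is maintained."""
    cursor = conn.execute("""
        SELECT d.county, SUM(f.quantity), SUM(f.amount)
        FROM fact_order_item AS f
        JOIN dim_pharmacy AS d USING (pharmacy_id)
        GROUP BY d.county
        ORDER BY d.county
    """)
    return cursor.fetchall()
```

Every report query pays the full aggregation cost, which is why the slide pairs this approach with SSDs and column storage.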
![Page 6: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/6.jpg)
ROLAP
![Page 7: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/7.jpg)
Our architecture
Extractors
Postgres (Load, Transform)
Cubes (API)
Rails + React (UI)
Saiku (UI)
![Page 8: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/8.jpg)
Architecture - extractors
• Cyclone_client
  • MSSQL (2008-2012)
  • Golang
  • CSV + rsync over ssh
• Kachok
  • Web scraper
• Skytools replication
  • From existing products
![Page 9: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/9.jpg)
Architecture - API + UI
• Cubes - cubes.databrewery.org
  • Easy drilling-down
  • Slicing and dicing
  • Serves aggregates, dimension details, facts
  • Provides all necessary metadata for a reporting application
• Rails, React
  • Authorization
  • d3, dc, crossfilter
• Saiku
  • Only for back office
![Page 10: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/10.jpg)
Architecture - Postgres (load, transform)
stage
• raw data
• load_something_to_nds(_pharmacy_id integer)
nds
• normalized data store
• load_something_to_ddm(_pharmacy_id integer)
ddm
• cubes and snapshots
• views
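A rough sketch of how the two load steps could be driven per pharmacy, in parallel. The function names `load_something_to_nds` and `load_something_to_ddm` come from the slide; the `execute` callable is a hypothetical stand-in for whatever runs SQL against Postgres (for example, a wrapper that opens one connection per call), and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_elt_for_pharmacy(pharmacy_id, execute):
    """Run both load steps, in order, for a single pharmacy."""
    execute("SELECT load_something_to_nds(%s)", (pharmacy_id,))
    execute("SELECT load_something_to_ddm(%s)", (pharmacy_id,))
    return pharmacy_id


def run_elt(pharmacy_ids, execute, max_workers=4):
    """Process pharmacies in parallel; steps within one pharmacy stay ordered."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda pid: run_elt_for_pharmacy(pid, execute), pharmacy_ids)
        )
```

Because each call touches only one pharmacy's rows, the workers do not depend on each other's results.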
![Page 11: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/11.jpg)
Architecture - Postgres (load, transform)
• Stage
  • «Raw» data
  • Cleaned up completely in every ELT cycle
  • Serves as the data source for NDS
![Page 12: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/12.jpg)
Architecture - Postgres (load, transform)
• Normalized Data Store
  • Data is normalized and validated here
  • Is the source for ddm
  • Measures for ddm are calculated here
  • The delta to load into ddm is calculated from the last_updated field
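The delta calculation based on `last_updated` can be sketched as a watermark comparison. This is an illustration only: the deck does not show the real logic, and the row layout and the `last_loaded_at` watermark are assumptions.

```python
from datetime import datetime


def select_delta(nds_rows, last_loaded_at):
    """Return rows changed since the previous load, plus the new watermark."""
    delta = [row for row in nds_rows if row["last_updated"] > last_loaded_at]
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = max(
        (row["last_updated"] for row in delta), default=last_loaded_at
    )
    return delta, new_watermark
```

Only the rows in `delta` would then be passed on to the ddm load, and the returned watermark would be stored for the next ELT cycle.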
![Page 13: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/13.jpg)
Architecture - Postgres (load, transform)
• Dimensional data model
  • Cubes
  • Snapshots
    • Deploy calmly
    • Analyze before-after release states
    • View is the entry point for the application
![Page 14: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/14.jpg)
Architecture - Postgres (snapshots)
fact_order_item
vw_order_item
s1_order_item
s2_order_item
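One plausible reading of the snapshot diagram is that the application always queries `vw_order_item`, and a release repoints that view from one snapshot table to the other. The sketch below assumes that reading; it runs on Python's built-in SQLite as a stand-in for Postgres, and the columns and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s1_order_item (order_id INTEGER, amount REAL);
    CREATE TABLE s2_order_item (order_id INTEGER, amount REAL);
    INSERT INTO s1_order_item VALUES (1, 10.0);
    INSERT INTO s2_order_item VALUES (1, 10.0), (2, 7.5);
    CREATE VIEW vw_order_item AS SELECT * FROM s1_order_item;
""")

SNAPSHOTS = ("s1_order_item", "s2_order_item")


def point_view_at(conn, snapshot_table):
    """Repoint vw_order_item at the given snapshot table."""
    if snapshot_table not in SNAPSHOTS:
        raise ValueError(f"unknown snapshot: {snapshot_table}")
    # SQLite has no CREATE OR REPLACE VIEW, so drop and recreate the view
    # inside one explicit transaction.
    conn.execute("BEGIN")
    try:
        conn.execute("DROP VIEW IF EXISTS vw_order_item")
        conn.execute(
            f"CREATE VIEW vw_order_item AS SELECT * FROM {snapshot_table}"
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def row_count(conn):
    """Count rows as the application would see them, through the view."""
    return conn.execute("SELECT COUNT(*) FROM vw_order_item").fetchone()[0]
```

Keeping the previous snapshot table around is what would allow comparing the state before and after a release.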
![Page 16: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/16.jpg)
Column storage
• Suitable for:
  • aggregations
  • showing fixed numbers of columns
• cstore_fdw -> https://github.com/citusdata/cstore_fdw
  • Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.
  • Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.
  • Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.
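The skip index idea can be illustrated with a toy model in plain Python. This is not how cstore_fdw is implemented; it only shows the principle of keeping min/max per row group and skipping groups that cannot match a filter. The function names and the group size are made up for the example.

```python
def build_skip_index(values, rows_per_group):
    """Record (start, end, min, max) for each group of consecutive rows."""
    index = []
    for start in range(0, len(values), rows_per_group):
        group = values[start:start + rows_per_group]
        index.append((start, start + len(group), min(group), max(group)))
    return index


def scan_greater_than(values, index, threshold):
    """Return the matching values and how many row groups were actually read."""
    matches, groups_read = [], 0
    for start, end, _lowest, highest in index:
        if highest <= threshold:
            continue  # no row in this group can match, so skip it entirely
        groups_read += 1
        matches.extend(v for v in values[start:end] if v > threshold)
    return matches, groups_read
```

The benefit depends on the data being roughly ordered by the filtered column; with randomly ordered values most groups overlap the threshold and few can be skipped.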
![Page 17: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/17.jpg)
Column storage
• Our experience:
  • Is not faster than vanilla postgres (say hello to cubes)
  • Volume reduced up to 12 times. Wow.
  • No way to back up the traditional way (no need?)
  • No support for delete/update (snapshots)
![Page 18: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/18.jpg)
Configuration
• Load profile:
  • Big volume of RW I/O
  • Most of the I/O happens in stage and nds
  • ddm is not heavily loaded
• shared_buffers = ½ RAM
• work_mem = 2GB
• maintenance_work_mem = 3GB
• temp_buffers = 2GB
• effective_cache_size = ½ RAM
• max_wal_size = 32GB
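As a small illustration, the settings above can be rendered into a `postgresql.conf` fragment for a given amount of RAM. The helper name and the 64 GB figure in the usage note are assumptions; the values themselves are the ones from the slide.

```python
def render_settings(ram_gb):
    """Render the slide's settings for a server with ram_gb gigabytes of RAM."""
    half_ram_mb = ram_gb * 1024 // 2
    settings = {
        "shared_buffers": f"{half_ram_mb}MB",        # 1/2 RAM
        "work_mem": "2GB",
        "maintenance_work_mem": "3GB",
        "temp_buffers": "2GB",
        "effective_cache_size": f"{half_ram_mb}MB",  # 1/2 RAM
        "max_wal_size": "32GB",
    }
    return "\n".join(f"{name} = {value}" for name, value in settings.items())
```

For example, `render_settings(64)` yields `shared_buffers = 32768MB`. Note that `work_mem` applies per sort or hash operation, so a value this large presumably relies on only a few heavy ELT queries running at once.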
![Page 19: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/19.jpg)
Features
• DDM could be placed on a dedicated server (londiste, pglogical)
• Use COPY / bulk INSERTs, don't use UPDATE (ke ke ke)
• You should think about horizontal and vertical partitioning; find proper keys for that
• You should think about parallelism from the very beginning
• Use TABLESPACES / PARTIAL INDEXES (and more and more disks)
• You should use a data store policy
• Statistics should be collected on a tmpfs volume
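A minimal sketch of preparing rows for `COPY` rather than issuing row-by-row statements. It only builds the in-memory CSV buffer and the statement text; handing them to Postgres is left to the driver in use. The table `stage.order_item` and its columns are hypothetical.

```python
import csv
import io

# Hypothetical target table and columns, for illustration only.
COPY_SQL = (
    "COPY stage.order_item (pharmacy_id, order_id, amount) "
    "FROM STDIN WITH (FORMAT csv)"
)


def rows_to_copy_buffer(rows):
    """Serialize rows into an in-memory CSV stream suitable for COPY."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # None becomes an empty unquoted field, which COPY in csv format
    # reads as NULL by default.
    writer.writerows(rows)
    buf.seek(0)
    return buf
```

The buffer can then be streamed in a single `COPY` call, which avoids per-row round trips.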
![Page 20: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/20.jpg)
Features vol 2
• Use migrations - sqitch by theory
• You'd better test ELT - sqitch by theory
• Use pg_stat_statements (add this into monitoring)
• Use profiling - plprofiler 3
• Sometimes, you have (not) to use cstore_fdw
• Sometimes, you have (not) to use unlogged tables
![Page 21: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/21.jpg)
Pros and cons
• Cons
  • No easy way to scale horizontally
  • Reasonably difficult deployment
• Pros
  • Local data (no big network transfers)
  • Effectively parallelized (thanks to pharmacy_id)
  • PL/pgSQL
![Page 23: Postgres as BI platform](https://reader030.fdocuments.net/reader030/viewer/2022012717/61af7a416f0405063f4661c5/html5/thumbnails/23.jpg)
Speed limit
• cubes is not fast (due to serialization)
  • json (12 sec)
  • ujson (4 sec)
  • postgres json output (1.5 sec)
• db self time: 0.3-0.7 sec
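The "postgres json output" line suggests letting the database serialize the result, so that the API layer passes one JSON string through instead of encoding every row itself. The slides do not show how this was done; the sketch below is one possible approach using Postgres's `json_agg`, and the helper name is made up.

```python
def wrap_as_json(query):
    """Wrap a SELECT so Postgres returns the whole result as one JSON array."""
    inner = query.strip().rstrip(";")
    # coalesce covers the empty result case, where json_agg returns NULL.
    return f"SELECT coalesce(json_agg(t), '[]'::json) FROM ({inner}) AS t"
```

The application would then fetch a single value and return it to the client without re-serializing it.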