Rediff News Publishing System using MongoDB
Subramanyam Yeleswarapu
Agenda
• Use Cases
• MongoDB Usage
• Architecture
• Q & A
Use Cases
• Rediff Maps
• Core Publishing System
• Newsletter
Rediff Maps Use Case
• Upload an Excel file and select the data
• Match data to map attributes
• Author an article that consumes data generated by the data science team
• Visualize data on the map
Upload data
[Screenshot: RPS Login and upload screen, callouts 1 to 3]
1. Login is required to post to the Rediff News backyard
2. Select your Excel file to upload
3. Upload the data file to the server; it is displayed below
Upload data…
[Screenshot: column selection options, callouts 4 to 7]
4. If checked, the first row is not read as data; it is used as the column header names
5. Select the region column, where the names of districts or states are given
6. If checked, multiple data columns can be selected
7. Select the data column that you want to show on the map
(If the first row does not contain headers, the select options are A, B, C..)
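The header and column options in steps 4 to 7 can be sketched roughly as follows. This is a minimal illustration in Python, not the RPS implementation: the function names `column_labels` and `select_columns` are hypothetical, and the rows are assumed to be already parsed from the spreadsheet into lists.

```python
from string import ascii_uppercase


def column_labels(count):
    """Return spreadsheet-style labels A, B, ..., Z, AA, AB, ... (hypothetical helper)."""
    labels = []
    for index in range(count):
        label = ""
        n = index
        while True:
            label = ascii_uppercase[n % 26] + label
            n = n // 26 - 1
            if n < 0:
                break
        labels.append(label)
    return labels


def select_columns(rows, first_row_is_header, region, data_columns):
    """Pick the region column and one or more data columns from uploaded rows."""
    if first_row_is_header:
        # Step 4: the first row supplies the column names and is not data.
        names, body = rows[0], rows[1:]
    else:
        # No header row: fall back to A, B, C.. as the selectable names.
        names, body = column_labels(len(rows[0])), rows
    region_idx = names.index(region)                      # step 5
    data_idx = [names.index(name) for name in data_columns]  # steps 6 and 7
    return [
        {"region": row[region_idx], **{names[i]: row[i] for i in data_idx}}
        for row in body
    ]
```

With a header row the caller passes the header names; without one, the same call works with the generated letters, for example `select_columns(rows, False, "A", ["B"])`.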
Workflow
1. Check the details of the data
• Area: Your data coverage. It may be India or any state of India.
• Data Unit: What each record in the data pertains to: state, district, or constituency
• Classification:
• Select Categorized for regions (records) with a category.
• Select Graduated / Quantile for regions (records) with a quantity. If you want to highlight the records on the map with respect to each other, Quantile may be a good option.
• Intervals: Select the intervals to simplify your data. For Categorized, the intervals are taken automatically from the data.
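The difference between the two quantity classifications can be sketched as below. The slides do not define the terms, so this assumes that "Graduated" means equal-width intervals and "Quantile" means intervals holding roughly equal numbers of records; the function names are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def graduated_breaks(values, intervals):
    """Equal-width break points between min and max (assumed meaning of Graduated)."""
    low, high = min(values), max(values)
    width = (high - low) / intervals
    return [low + width * i for i in range(1, intervals)]


def quantile_breaks(values, intervals):
    """Break points that place roughly the same number of records in each class."""
    return quantiles(values, n=intervals, method="inclusive")


def classify(value, breaks):
    """Return the 0-based class index of a value, given sorted break points."""
    return bisect_right(breaks, value)
```

When one region has a much larger value than the rest, equal-width breaks push most regions into the lowest class, while quantile breaks still spread them across classes. That is one reason Quantile can suit comparing records with respect to each other.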
Workflow
2. Select Colour Palette (Categorized): Click on the colour palette to select more. For Categorized data, colours can also be changed in the Legend box.
2. Select Colour Palette (Graduated / Quantile): Click on the colour palette to select more.
Output of Categorized map
If any of the options are changed, click Render Map again to reflect the changes on the map
Output of Quantile map
Output of map with time series data
Push to publishing system
Where do we use it
• Management of the life cycle of articles
• Articles’ metadata storage
• Role, access and workflow management
• Acquisition of external feeds
• Tagging
• Notification
• Search
• Integrating data on maps
• Composing newsletters
– Subscription based
– Customized newsletters based on user habits/profiling
Why MongoDB
• Write throughput performance
• Flexible schema design (document style)
– Allows the data model to be modified as the business demands
• Read throughput (moderate)
• New document storage is future ready
– Data mining, sharding and clustering as per the volume and features of the business
Architecture
• Schema is defined in POJOs
– Reflection is used to discover the data structure
• Custom dimensions are created on the fly
– Use standard indices
– Create specialized named collections
– Counters
– All defined in a simple config file
– Storage is totally abstracted from the Apps layer
• REST layer
– Auto-wiring the Apps’ collections and exposing data as resources
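The idea of defining the schema in plain classes and discovering its structure by reflection can be sketched as follows. The slides describe Java POJOs; this sketch uses Python dataclasses as a stand-in, and the `Article` class, its fields, and the `indexed` metadata key are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Article:
    """Hypothetical schema class standing in for a POJO."""
    headline: str = ""
    author: str = field(default="", metadata={"indexed": True})
    tags: list = field(default_factory=list, metadata={"indexed": True})


def describe_collection(schema_class):
    """Inspect a schema class and derive a collection name, field names and index list."""
    schema_fields = fields(schema_class)
    return {
        "collection": schema_class.__name__.lower(),
        "fields": [f.name for f in schema_fields],
        "indexes": [f.name for f in schema_fields if f.metadata.get("indexed")],
    }
```

A storage layer built this way only needs the class definition to decide which collection and indices to create, which is one way the Apps layer can stay unaware of the storage details.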
Architecture
Create additional datasets
[Diagram: RPSApps, MongoDB and ETL tools, with multiple derived datasets]
Datasets using MongoDB Map/Reduce
Map Reduce
• Based on uploaded photos’ metadata
• Trend analysis on tags
• Timelines on geo location
• Popular topics / editorial-wise analysis
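The trend analysis on tags can be pictured as a map step that emits one pair per tag and a reduce step that sums them. The sketch below shows that shape in plain Python over in-memory documents; it is not MongoDB's own map/reduce command, and the `tags` field name is an assumption.

```python
from collections import defaultdict


def map_tags(article):
    """Map step: emit a (tag, 1) pair for every tag on one article."""
    return [(tag, 1) for tag in article.get("tags", [])]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per tag."""
    totals = defaultdict(int)
    for tag, count in pairs:
        totals[tag] += count
    return dict(totals)


def tag_trends(articles):
    """Run map then reduce over a list of article documents."""
    emitted = [pair for article in articles for pair in map_tags(article)]
    return reduce_counts(emitted)
```

The result is itself a small dataset (tag to count) that could be stored as an additional collection, in line with the derived datasets shown in the architecture diagram.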
Out-bound Datasets
Use Case
• Article Publishing • Newsletter Publishing
Features
• Search filters based on author, classification and date range
• Scheduling articles to be published live
• Role-based approval process and publishing life cycle (for control and editorial reviews)
• Easy content versioning of articles
• Notification on the application’s tab / email
• Provides a channel to publish “Breaking News” on web and mobile platforms in real time
• Integrates with existing in-house systems
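A role-based approval process of this kind is often modelled as a small state machine. The sketch below is one possible shape, not the RPS design: the state names, action names and roles are hypothetical, since the slides do not list the real ones.

```python
# Hypothetical states, actions and roles; the slides do not list the real ones.
# Each key is (current state, action); each value is (required role, next state).
TRANSITIONS = {
    ("draft", "submit"): ("author", "in_review"),
    ("in_review", "approve"): ("editor", "approved"),
    ("in_review", "reject"): ("editor", "draft"),
    ("approved", "publish"): ("publisher", "published"),
}


def apply_action(state, action, role):
    """Return the next state, or raise if the action is not allowed for this role."""
    key = (state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {action} an article in state {state}")
    required_role, next_state = TRANSITIONS[key]
    if role != required_role:
        raise PermissionError(f"{action} requires role {required_role}")
    return next_state
```

Keeping the transitions in a single table makes the editorial controls easy to review and to change without touching the code that applies them.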
Add on features
• Auto RSS feed creation and publishing
• Data journalism simplified
• SEO friendly (adding meta tags that help to rank higher in search results)
• Newsletter creation and publishing process
+ A minimal set of properly positioned buttons makes publishing faster and less of a hassle; once you are used to it, it’s a game. For example, while copy editing, most of the buttons are positioned at the bottom right, so the editor does not have to scroll in search of buttons when done editing; they are always in view.
+ Image preview in slide shows lets us see which image is being uploaded with the content, so there is no mismatch of images.
+ Proper placement of the other required fields helps in updating them faster.
+ Fast navigation between slides, and swapping slides by dragging them into the required sequence.
All versions of a copy are locked when an editor opens it for editing; this keeps the data up to date and makes versioning and publishing smooth.
• The newsletter system has amazingly reduced effort; it is like selecting a headline and submitting it for today’s update.
• The newsletter system allows editing and re-processing of a copy’s headline and abstract, which can be tweaked to get better clicks from email.
• Add URL in newsletters and Breaking News allows coverage and other content to go out with regular RPS content. A faster and smoother process.
Thank You