Radu Stancut
Foundations of Urban Science
Assignment #5: Final Paper
Spatial Patterns of Urban Innovation and Productivity

The purpose of creating a science of cities is to bring fact-based rigor and standardization to a critical human subject: the way we live with others and the planet. To the extent that we can fruitfully observe our urban environment, capture accurate readings, and allow for hypothesis testing, we are beholden to do so. Affecting our surroundings in an intentional and predictive manner, ideally for the mutual benefit of our civilization and the environment, grants us greater control over our long-term success as a species. Many practices may be brought to bear when developing this new science, and we should be opportunistic in taking what works in other fields and applying their techniques. Jane Jacobs famously tackled the topic of what kind of a problem a city is.[1] Whether or not we come to agree with her assessment, that cities are problems of organized complexity, we should follow her rationale: identify the features and functions of the urban environment, see what analogous problems we have tackled in other areas, most especially the sciences, and apply similar approaches and methods, modified appropriately for the urban field and that most messy of subject matters: people. Recent increases in technological capabilities,
such as storage, computational power, and easy access to data, coupled with a belief that there are valuable and actionable insights to be found in data, have ushered in the concept of a science of cities. This paper takes the notion of a science of cities to mean that urban environments may now be considered objects of study within a scientific framework, where the structure and behavior of cities may be systematically studied via observation and experiment.[2]

[1] Jacobs, J. 1961. The Death and Life of Great American Cities. New York: Random House, Inc.

Lit Review

Any science of cities approach would appear
to require delving into big data. The availability of new forms and sources of data is opening up the possibility of taking measurements at a speed never previously available in human history.[3] The belief would seem to be that with enough data we will be able to identify patterns and delve deeper,[4] perhaps identifying underlying principles and laws. Big data is certainly a social phenomenon,[5] but its effectiveness will depend on how it is used and the principles put in place. We have, for instance, the following challenges to consider:[6]

- Exponential data growth
- New types of data
- Privacy and access
- Institutional barriers
- Use and relevance

This paper deals primarily with the last item and uses data to attempt to extract insights on urban behavior and outcomes through a modest analysis of GDP and patent information. The goal is to pick up on potential power laws and see if they hold and can tell us something about how a city behaves.[7]

[2] http://www.oxforddictionaries.com/us/definition/american_english/science
[3] Koonin, S. 2012. Big data and city living - what can it do for us? The Royal Statistical Society.
[4] http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
[5] boyd, d. and Crawford, K. 2012. Critical Questions for Big Data. Information, Communication & Society, 15:5, 662-679. DOI: 10.1080/1369118X.2012.678878
[6] Koonin, S. 2012. Big data and city living - what can it do for us? The Royal Statistical Society.
[7] Bettencourt, L.M.A. and West, G. 2011. Bigger Cities Do More with Less. Scientific American.

Materials and Methods

In exploring cities for patterns and regularities, this paper focused on economic, population, and innovation features of Metropolitan Statistical Areas (MSAs). The unit of analysis for all research below was the MSA, unless specified otherwise. The three main sections below correspond to the following questions, which will be expanded on during the analysis in each respective portion:

1. What is the relationship between patenting performance and economic performance?
2. What is the technological profile of the New York MSA? How does this profile compare with those of the Boston, Houston, and San Jose MSAs?
3. How diverse are the metropolitan patenting portfolios, and what does the resulting pattern reveal about patenting across metropolitan areas?

Data was collected from three
main sources: the Bureau of Economic Analysis (BEA), which provided
GDP per capita numbers by MSAs and broke down the technology
classes found within MSAs; the Census, for population numbers; and
the U.S. Patent Office (USPTO), where patents by both technology
class and MSAs could be found. The variables for each section are
described below, as part of the methods of data manipulation and
analysis. The table below outlines the breakdown of data by section and may be used as a reference guide.

Section | Unit of Analysis | Variables | Sources
I. Patents and Economic Development | MSA | 1) per capita real GDP; 2) population; 3) patent intensity | BEA; Census; USPTO
II. Technological Profiles of Metropolitan Areas | MSA | 1) patents by MSA by technology class | USPTO
III. Technological Heat Maps of Metropolitan Areas | MSA | 1) tally of technology class patents by MSA | BEA; USPTO

I. Patents and Economic Development

Per capita real GDP was compared against patent intensity, within each MSA, in order to better understand the relationship between patenting performance and economic performance. Per capita real GDP was acquired directly from the BEA, while patent intensity had to be constructed from MSA patents (USPTO) and population (Census). Patent intensity was defined as (MSA patents / MSA population) x 100,000; the scaling is used because the raw ratio tends to be very small for some locations. Data was collected from the sources mentioned above, loaded into Python, and the resulting data tables were matched on MSA ID codes/FIPS. Having merged the data sets, we had in one table information on population, per capita GDP, and patent counts, spanning 2001 through 2012, inclusive. As suggested in the
assignment document, per capita GDP and patents were averaged over a five-year window. The five-year time frame was used to smooth the numbers and help minimize distortions. Additionally, different time frames were used for the patents (2001-2005) and GDP (2008-2012) to account for the time delay in patents coming online and to set up the analysis for possible causality, with patents leading to greater economic activity and not the other way around. With the data merged and the averages calculated, it was now possible to construct the patent intensity variable (formula above) and generate plots.

Plotting the log of both GDP and patent intensity shows a positive correlation (coefficient of 3.45; R-squared: 0.875; see Appendix). Subsequent plots show additional positive correlations between population and patent intensity (Appendix: 'Population Influence on Patent Intensity') and between population and average GDP (Appendix: 'Population and MSA GDP'). All three plots provide evidence that size matters: the larger the MSA, the more patent activity it is likely to have and the higher its GDP.
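
The steps in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the record layout, the field names (msa_id, year, patents, population, gdp_pc), the helper names, and the three made-up MSAs are all hypothetical. Population is assumed to be averaged over the same window as patents, and regressing log GDP on log patent intensity is only one possible reading of the fit reported above.

```python
import math
from collections import defaultdict


def patent_intensity(patents, population, per=100_000):
    """Patents per `per` residents: (patents / population) x per."""
    return patents / population * per


def window_mean(records, field, start, end):
    """Average `field` for each MSA over the years start..end, inclusive."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        if start <= rec["year"] <= end:
            sums[rec["msa_id"]] += rec[field]
            counts[rec["msa_id"]] += 1
    return {msa: sums[msa] / counts[msa] for msa in sums}


def loglog_fit(xs, ys):
    """Ordinary least squares on log10(y) = intercept + slope * log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# Made-up values for three hypothetical MSAs (NOT the paper's data):
# (msa_id, population, patents per year, per capita GDP).
toy = [(1, 250_000, 25, 32_000), (2, 1_000_000, 200, 40_000), (3, 4_000_000, 1_600, 50_000)]
records = [
    {"msa_id": m, "year": y, "population": pop, "patents": pat, "gdp_pc": gdp}
    for m, pop, pat, gdp in toy
    for y in range(2001, 2013)
]

avg_patents = window_mean(records, "patents", 2001, 2005)   # earlier window
avg_pop = window_mean(records, "population", 2001, 2005)    # assumption: same window as patents
avg_gdp = window_mean(records, "gdp_pc", 2008, 2012)        # later window

msas = sorted(avg_gdp)
intensity = [patent_intensity(avg_patents[m], avg_pop[m]) for m in msas]
gdp = [avg_gdp[m] for m in msas]
slope, intercept, r_squared = loglog_fit(intensity, gdp)
print(f"slope={slope:.3f}, R-squared={r_squared:.3f}")
```

Keeping the two windows as explicit arguments makes the lag between patenting and GDP easy to vary, which would matter if the causality question raised later were tested with other lags.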
II. Technological Profiles of Metropolitan Areas

Below I describe the technological profile of the New York MSA and compare it with three other metropolitan areas: Boston, Houston, and San Jose. In each instance, the variables analyzed were counts of patents by technology class within each MSA. All data for this section was acquired from the USPTO and loaded into Python, where numbers were tabulated and plots/graphs generated. The focus was on the top 10 technologies of each MSA and what could be ascertained from this information.

New York

The top 10 technologies of New York account for nearly a third (32%, Appendix) of all patent activity. This is in line with what we will see from the other three MSAs below. As for each MSA, an index was created for the top 10 technologies, pegged against the top technology (which takes the value 1), and we see a marked drop off of ~40% from the top technology (Drug, Bio-Affecting and Body Treating Compositions) to the technology in second place (Multiplex Communications). This drop off is also not uncommon for the selected MSAs, with one exception (San Jose).
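
As a rough illustration of the index described above, the sketch below ranks classes by patent count and pegs each against the top class. The function names are hypothetical and this is not the paper's code; the three New York counts are taken from the Appendix table, while the share calculation is exercised only on made-up numbers because the MSA patent totals are not reported directly.

```python
def rank_top(counts_by_class, n=10):
    """Return the n classes with the most patents as (class, count), largest first."""
    return sorted(counts_by_class.items(), key=lambda kv: kv[1], reverse=True)[:n]


def index_of_top(ranked):
    """Peg each class against the top class, so the top class has index 1."""
    top_count = ranked[0][1]
    return {cls: count / top_count for cls, count in ranked}


def top_share_pct(ranked, total_patents):
    """Percent of all of an MSA's patents that fall in the ranked classes."""
    return 100 * sum(count for _, count in ranked) / total_patents


# New York counts for three classes, taken from the Appendix table.
new_york = {"370": 3138, "705": 1848, "424": 5212}
ranked = rank_top(new_york, n=3)
idx = index_of_top(ranked)
drop_off = 1 - idx["370"]  # drop from the top class to second place
print(ranked[0][0], round(idx["370"], 3), round(drop_off, 3))

# Share calculation on made-up numbers (hypothetical classes A, B, C).
toy_ranked = rank_top({"A": 50, "B": 30, "C": 20}, n=2)
print(top_share_pct(toy_ranked, total_patents=100))
```

The index for class 370 reproduces the 0.602 value in the Appendix, which corresponds to the ~40% drop off noted for New York.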

The final exploratory step was to plot the top 10 technologies for each city through the years collected (2000-2011) to get a better idea of patent activity over time. Plots for each of the subsequent cities may be found in the Appendix; the New York ones were presented here for convenience and as a reference.

Boston

Of the three additionally selected MSAs, Boston is the one most in line with New York. Boston's top ten technologies account for a third of all patent activity and there is a similar drop off from technology #1 (Drug, Bio-Affecting and Body Treating Compositions) to technology #2 (Chemistry: Molecular Biology and Microbiology).

Houston

Houston introduces our first differences in the MSA comparison: it is more top heavy than the two northeast MSAs, with the top 10 technologies accounting for 43% of all patent activity, and the drop off from the first place patent class (Wells [shafts or deep borings in the earth, e.g., for oil and gas]) to the second (Synthetic Resins or Natural Rubbers) is over 60%. Houston would appear to be the least diverse innovatively, with respect to patent activity over the past decade, of the four MSAs highlighted in this report.

San Jose

San Jose is also unlike the northeast MSAs but in a different way than Houston. San Jose would appear to be both more concentrated and more diverse than New York, a paradox revealed by the numbers.
San Jose's top 10 technologies account for nearly 40% of all patent activity, but within this group the patents are more evenly distributed; five of the San Jose industries are within roughly 40% of the lead patent category (Semiconductor Device Manufacturing: Process), while New York and Boston only have one such industry each within their MSA.

Lastly, the plotting of patents by year shows that 2010 and 2011 were exceptional for all four MSAs in the following technological areas, something that would require additional research to explain:

- New York: Multiplex Communications; DP: Financial, Business Practice, Management, or Cost/Price Determination (Data Processing)
- Boston: Multiplex Communications; Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems)
- Houston: Boring or Penetrating the Earth
- San Jose: Multiplex Communications; Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems); DP: Database and File Management or Data Structures (Data Processing)

III. Technological Heat Maps of Metropolitan Areas

Here again we take a
global look at MSAs and through the use of a different
visualization, a heat map, attempt to glean a better understanding
of urban innovation by comparing tallies of technology class
patents by MSAs. Two variables were mapped against one another,
patent technology classes on the vertical axis and MSAs on the
horizontal axis, both from the USPTO. This resulted in a large
grid, a 481 (patent technology classes) x 367 (MSAs) matrix. A for
loop was implemented in Python to read each instance of a
technology class per MSA and where a match was found a Y was placed
in that respective patent/MSA cell. Following the completion of the
for loop the Y instances were summed by MSA and the grid was sorted
along the horizontal axis (MSAs) from least Ys to most. Due to the
density of the matrix, Y cells were further highlighted in green to
provide a clearer visual representation.
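
The loop described above can be sketched roughly as follows. This is an illustration rather than the original script: the function names, the toy class and MSA labels, and the text rendering ("Y" for a match, "." for a blank cell) are assumptions made for the example, and the real grid is 481 x 367 rather than 3 x 3.

```python
def presence_matrix(pairs, classes, msas):
    """Place a "Y" where a (technology class, MSA) pair occurs at least once, else ""."""
    seen = set(pairs)
    return {cls: {msa: "Y" if (cls, msa) in seen else "" for msa in msas} for cls in classes}


def y_count_by_msa(matrix, msas):
    """Sum the Y cells in each MSA column."""
    return {msa: sum(1 for row in matrix.values() if row[msa] == "Y") for msa in msas}


def sort_msas(matrix, msas):
    """Order MSAs along the horizontal axis from fewest Ys to most."""
    counts = y_count_by_msa(matrix, msas)
    return sorted(msas, key=lambda msa: counts[msa])


def render(matrix, classes, ordered_msas):
    """One text row per technology class; "Y" marks activity, "." marks a gap."""
    return ["".join("Y" if matrix[cls][msa] else "." for msa in ordered_msas) for cls in classes]


# Made-up patent records as (technology class, MSA) pairs; repeats are allowed.
classes = ["424", "370", "166"]
msas = ["A", "B", "C"]
pairs = [("424", "A"), ("370", "A"), ("166", "A"), ("424", "B"),
         ("424", "C"), ("370", "C"), ("424", "A")]

matrix = presence_matrix(pairs, classes, msas)
ordered = sort_msas(matrix, msas)
for line in render(matrix, classes, ordered):
    print(line)
```

Because each cell records only whether a class appears at all, an MSA with one patent in a class and an MSA with thousands look identical in the grid; the map shows breadth of patenting, not volume.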
Above we can see the green areas, instances of patent
activity by MSAs within technology areas, picking up or becoming
denser as we scan from left to right. This is the expected and
uninteresting part; what is non-trivial, however, are the gaps or
black areas shown above. Based on the image above and corresponding
data we can report that MSAs are lagging in several patent areas
(listed in Appendix).
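
One way the gaps could be pulled out of such a grid is sketched below. The paper does not state the exact rule used, so this assumes a gap is a technology class with a Y in no MSA at all, with an optional looser threshold; the function name, the threshold parameter, and the small grid are hypothetical.

```python
def gap_classes(matrix, max_msas=0):
    """Classes that appear in at most `max_msas` MSAs (0 means no MSA at all)."""
    return [
        cls
        for cls, row in matrix.items()
        if sum(1 for cell in row.values() if cell == "Y") <= max_msas
    ]


# Made-up 3 x 3 grid in the "Y"/blank layout of the heat map (not real data).
grid = {
    "424": {"A": "Y", "B": "Y", "C": "Y"},
    "901": {"A": "", "B": "", "C": ""},
    "166": {"A": "", "B": "", "C": "Y"},
}

print(gap_classes(grid))              # classes with no activity anywhere
print(gap_classes(grid, max_msas=1))  # looser rule: active in at most one MSA
```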

Conclusion

Based on the plots and numbers presented, I would tentatively argue that MSAs, at least in the United States, show a consistent and super-linear effect both between population and GDP per capita and between population and patent intensity. Throughout our group we saw increases above the ratio of 1.0, suggesting that greater populations lead to greater returns, in this case on wealth and innovation as measured by our proxy statistics.

Additional data could be collected to investigate the topics pointed out in the Materials and Methods section more thoroughly. So far, what has been shown is correlation; it would be interesting to test for causality and see in which direction the effect is more pronounced: GDP to patent intensity or vice versa. Population was investigated at the MSA level but not in relation to land area, in other words as density. Digging into population density could help identify whether there is an optimal MSA for innovation. Patents, and specifically the top 10 patent technologies, can be examined more deeply, specifically by comparing performance against industry payroll and C-level employees due to outsourcing of industry, as well as by reviewing changes in MSA top 10 patents over time to track changes in innovation and economic drivers over decades.

New York

Class | Class Title | Total | Class % | Class IDX of Top
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 5212 | 8.462824947 | 1
370 | Multiplex Communications | 3138 | 5.095231136 | 0.602072141
705 | DP: Financial, Business Practice, Management, or Cost/Price Determination (Data Processing) | 1848 | 3.000633251 | 0.354566385
455 | Telecommunications | 1589 | 2.580089954 | 0.304873369
435 | Chemistry: Molecular Biology and Microbiology | 1509 | 2.450192411 | 0.289524175
532 | Organic Compounds (includes Classes 532-570) | 1473 | 2.391738516 | 0.282617038
375 | Pulse or Digital Communications | 1416 | 2.299186517 | 0.271680737
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 1323 | 2.148180623 | 0.253837299
438 | Semiconductor Device Manufacturing: Process | 1295 | 2.102716482 | 0.248465081
707 | DP: Database and File Management or Data Structures (Data Processing) | 1139 | 1.849416273 | 0.218534152
Top 10, % of total: 32.38021011

Boston

Class | Class Title | Total | Class % | Class IDX of Top
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 3326 | 8.274661027 | 1
435 | Chemistry: Molecular Biology and Microbiology | 2143 | 5.331508894 | 0.644317498
370 | Multiplex Communications | 1397 | 3.475556661 | 0.420024053
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 1136 | 2.826222167 | 0.341551413
128 | Surgery (includes Class 600) | 1089 | 2.709292201 | 0.327420325
250 | Radiant Energy | 1004 | 2.497823112 | 0.301864101
707 | DP: Database and File Management or Data Structures (Data Processing) | 994 | 2.472944396 | 0.298857486
606 | Surgery (instruments) | 871 | 2.166936186 | 0.261876127
532 | Organic Compounds (includes Classes 532-570) | 847 | 2.107227267 | 0.254660253
382 | Image Analysis | 631 | 1.569846996 | 0.189717378
Top 10, % of total: 33.43201891

Houston

Class | Class Title | Total | Class % | Class IDX of Top
166 | Wells (shafts or deep borings in the earth, e.g., for oil and gas) | 3259 | 15.49322558 | 1
520 | Synthetic Resins or Natural Rubbers (includes Classes 520-528) | 1272 | 6.047064416 | 0.390303774
175 | Boring or Penetrating the Earth | 1049 | 4.986926551 | 0.321877877
702 | DP: Measuring, Calibrating, or Testing (Data Processing) | 636 | 3.023532208 | 0.195151887
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 551 | 2.619443784 | 0.169070267
324 | Electricity: Measuring and Testing | 537 | 2.552888044 | 0.164774471
585 | Chemistry of Hydrocarbon Compounds | 502 | 2.386498693 | 0.15403498
532 | Organic Compounds (includes Classes 532-570) | 479 | 2.277157119 | 0.1469776
73 | Measuring and Testing | 468 | 2.224863323 | 0.143602332
507 | Earth Boring, Well Treating, and Oil Field Chemistry | 391 | 1.858806751 | 0.119975453
Top 10, % of total: 43.47040647

San Jose

Class | Class Title | Total | Class % | Class IDX of Top
438 | Semiconductor Device Manufacturing: Process | 5418 | 6.050453952 | 1
370 | Multiplex Communications | 4785 | 5.343562598 | 0.88316722
257 | Active Solid-State Devices (e.g., Transistors, Solid-State Diodes) | 3695 | 4.126324723 | 0.681985973
365 | Static Information Storage and Retrieval | 3466 | 3.870593096 | 0.639719454
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 3420 | 3.819223425 | 0.631229236
707 | DP: Database and File Management or Data Structures (Data Processing) | 3219 | 3.594760293 | 0.594130676
360 | Dynamic Magnetic Information Storage or Retrieval | 2789 | 3.114565535 | 0.514765596
711 | Memory (Electrical Computers and Digital Processing Systems) | 2578 | 2.878935084 | 0.475821336
345 | Computer Graphics Processing and Selective Visual Display Systems | 2416 | 2.698024501 | 0.445921004
714 | Error Detection/Correction and Fault Detection/Recovery | 2221 | 2.480261762 | 0.409929863
Top 10, % of total: 37.97670497

Gaps in Patent Activity

901 Robots
902 Electronic funds transfer
903 Hybrid electric vehicles (HEVs)
930 Peptide or protein sequence
968 Horology
976 Nuclear technology
977 Nanotechnology
984 Musical instruments
987 Organic compounds containing a Bi, Sb, As, or P atom or containing a metal atom of the 6th to 8th group of the periodic system
D01 Edible products
D02 Apparel and haberdashery
D03 Travel goods and personal belongings
D04 Brushware
D05 Textile or paper yard goods; sheet material
D06 Furnishings
D07 Equipment for preparing or serving food or drink not elsewhere specified
D08 Tools and hardware
D09 Packages and containers for goods
D10 Measuring, testing, or signalling instruments
D11 Jewelry, symbolic insignia, and ornaments
D12 Transportation
D13 Equipment for production, distribution, or transformation of energy
D14 Recording, communication, or information retrieval equipment
D15 Machines not elsewhere specified
D16 Photography and optical equipment
D17 Musical instruments
D18 Printing and office machinery
D19 Office supplies; artists' and teachers' materials
D20 Sales and advertising equipment
D21 Games, toys, and sports goods
D22 Arms, pyrotechnics, hunting and fishing equipment
D23 Environmental heating and cooling; fluid handling and sanitary equipment
D24 Medical and laboratory equipment
D25 Building units and construction elements
D26 Lighting
D27 Tobacco and smokers' supplies
D28 Cosmetic products and toilet articles
D29 Equipment for safety, protection, and rescue
D30 Animal husbandry
D32 Washing, cleaning, or drying machine
D34 Material or article handling equipment
D99 Miscellaneous
G9B Information storage based on relative movement between record carrier and transducer
PLT Plants