User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

Hanna Suominen (1, 2, 3, 4), Gabriela Ferraro (2, 1), Jaume Nualart (5), Leif Hanlen (2, 1, 3)

Abstract

Clarity of language improves efficiency and reduces misunderstanding. In written text, it is measured by readability, and in patent documents this readability is known to be particularly poor for layperson-users without specialized knowledge of the subject. Here we introduce a 46-question survey, founded on socio-technical theories of information technology (IT) use and users, to measure linguistic complexity and its reduction by IT on a patent website. 65 participants have taken the survey, and their responses indicate that patent language is complex for laypeople but reducible by IT that processes long claims sections and sentences, claim words with a specific meaning, the claim dependency structure, and patent classification codes. Supplementing current patent websites with these reading aids could unlock their valuable information to the general public.

1. Introduction

Clarity of language improves efficiency and reduces misunderstanding in communication. In written text, it is measured by readability. Our focus is on improving the readability of web collections of patent documents, such as The Lens by Cambia and Espacenet Patent Search by the European Patent Office. In these collections, the documents are visualized as webpages and are searchable through web-based search.

Here, we first develop a 46-question survey, founded on socio-technical theories of information technology (IT) use and users, to measure linguistic complexity on a patent website. We then apply it to five readability tests that target reducing the complexity by introducing IT to summarize, visualize, and clarify the patent classification codes, the dependencies between patent claims, the phrase structure of a long claim, its words with specific meaning, and key words. Finally, we report on the outcomes of this user study with 65 participants.

(1) The Australian National University, Canberra, ACT, Australia; (2) Data61/CSIRO, Canberra, ACT, Australia; (3) University of Canberra, Canberra, ACT, Australia; (4) University of Turku, Turku, Finland; (5) Independent researcher who was employed by (2) and (3) while this study was conducted. Correspondence to: Hanna Suominen <hanna.suominen@anu.edu.au>.

Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

2. Background

The following dichotomy between patent website users is tangible. On one side are patent specialists (e.g., intellectual property professionals, patent examiners, and patent landscape reviewers) and business people (e.g., shareholders, investors, and trade associations) with at least fair expertise in this legal genre. On the other side are laypeople. These include academics using patent documents or their bibliographies as sources for related work, public policy advisors wanting to see the impacts of a given legal framework or to gain other evidence for decision making, data traders who deliver IT or analytics services founded on the patent website, and other people outside this domain.

Patent documents should follow a predefined structure, and for readability, their claims can be seen as both the most important and the most problematic part. Each document consists of a claims section, the title, abstract, background of the invention, description of the drawings, and other predefined sections. The claims section is important because it defines the scope of legal protection of the invention. It is problematic because of jurisdictional regulations (in this study, we use those of the USA (Pressman, 2006)): every claim must be written as a single sentence and contain the following predefined parts. The preamble is an introduction that describes the class of the invention. The transition is a phrase or linking word that relates the preamble to the rest of the claim; the most common transitions are comprising, containing, including, consisting of, wherein, and characterized. The body text describes the invention and recites its limitations.

To protect the invention, wordings are chosen carefully, resulting in arcane legal jargon in terms of both vocabulary and grammar (Alberts et al., 2011). For example, consist of in a
patent claim is defined as a closed transition: before a list, it means that inventions infringing this claim need to have all and only the listed elements or steps. In contrast, consisting essentially of is a hybrid transition, with the specific meaning that other inventors might avoid infringing this claim by employing some additional elements or steps. The third transition type, the open transition (e.g., comprising), is defined as having at least the following elements or steps.
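To illustrate the preamble/transition/body structure and the three transition types, the Python sketch below splits a single-sentence claim at its first recognized transition phrase. This is a naive string heuristic, not the segmentation method of Ferraro et al. (2014). Several details are our assumptions: the three-phrase list, the mapping of comprising to the open type (common US drafting practice), the function name split_claim, and the example claim.

```python
import re

# Transition phrases mapped to the three types described above.
# Assumption: only these literal phrases are recognized; real claims use
# many more variants (e.g. "characterized in that", "including").
TRANSITION_TYPES = {
    "consisting essentially of": "hybrid",
    "consisting of": "closed",
    "comprising": "open",
}

# Try longer phrases first so multi-word transitions are matched whole.
_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, TRANSITION_TYPES), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def split_claim(claim: str) -> dict:
    """Split a single-sentence claim at its first transition phrase."""
    match = _PATTERN.search(claim)
    if match is None:
        return {"preamble": claim.strip(), "transition": None, "type": None, "body": ""}
    phrase = match.group(1).lower()
    return {
        "preamble": claim[: match.start()].strip(" ,"),
        "transition": phrase,
        "type": TRANSITION_TYPES[phrase],
        "body": claim[match.end():].strip(),
    }


# Hypothetical claim, for illustration only.
parts = split_claim("A widget comprising a base, a lid, and a hinge.")
print(parts["preamble"], "|", parts["transition"], "|", parts["type"])
```

The closed/hybrid distinction depends on the whole phrase, so the pattern must match consisting essentially of as one unit rather than stopping at consisting.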

The sentences are so long and grammatically complex that understanding them becomes difficult. This is illustrated by the following 156-word sentence from the first claim of Patent EP 0670191 B1, whose transitions include comprise of, consisting essentially of, characterized in that, and consists of: “Toolholder which could comprise of a holder body with an insert site at its forward end consisting essentially of [+ 99 words] characterized in that the wedge consists of a pair of distantly provided first protrusions for abutment against a top face of the insert and a pair of distantly provided second protrusions for abutment against an adjacent edge surface”.
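The 156-word length can be checked from the quoted fragments: 19 visible words before the elision, the 99 elided words, and 38 visible words after it. The Python sketch below counts whitespace-separated tokens as words. That is our assumption; it fits this quotation but does not match every tokenizer.

```python
# Visible fragments of the first claim of EP 0670191 B1, as quoted above;
# the middle 99 words are elided in the quotation.
head = ("Toolholder which could comprise of a holder body with an insert site "
        "at its forward end consisting essentially of")
tail = ("characterized in that the wedge consists of a pair of distantly provided "
        "first protrusions for abutment against a top face of the insert and a pair "
        "of distantly provided second protrusions for abutment against an adjacent "
        "edge surface")
ELIDED_WORDS = 99


def word_count(text: str) -> int:
    """Count whitespace-separated tokens as a simple proxy for words."""
    return len(text.split())


total = word_count(head) + ELIDED_WORDS + word_count(tail)
print(word_count(head), word_count(tail), total)  # 19 38 156
```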

3. Survey and Its Theoretical Framework

Our user study (Table 1) is founded on socio-technical theories of IT use and users, namely website user satisfaction (US, $y_{\text{US}}$), the technology acceptance model (TAM), and task technology fit (TTF). The website US theory (WUS) (Muylle et al., 2004) factorizes $y_{\text{US}}$ as a function of language customization, website layout, $y_{\text{I}}$, and $y_{\text{IT}}$. The dependent factor for information (I, $y_{\text{I}}$) is defined by accuracy, comprehensibility, comprehensiveness, and relevance. The dependent factor for IT ($y_{\text{IT}}$) is defined by entry guidance, hyperlink connotation, perceived ease of use (PEU, $y_{\text{IT-PEU}}$), speed, and structure. This factorization builds on more general theories of IT US (Doll & Torkzadeh, 1988) and IT success (DeLone & McLean, 2003). The IT US theory models $y_{\text{US}}$ through $y_{\text{IT-PEU}}$ and through the information I provided by the IT ($y_{\text{I}} \mid y_{\text{IT}}$) meeting the user U's needs. The IT success theory factorizes $y_{\text{US}}$ as a function of $y_{\text{I}}$, $y_{\text{IT}}$, service quality, $u \mid i$, and $b$. Here, $u \mid i$ refers to the IT use $u$ after having an intention $i$ to use it. The intention $i$ is defined by $y_{\text{I}}$, $y_{\text{IT}}$, service quality, $y_{\text{US}}$, and $b$. The net benefit of the use, $b$, addresses the user U and U's community and is defined by $y_{\text{US}}$ and $u \mid i$.
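Written as equations, the WUS, IT US, and IT success factorizations above can be summarized as follows. The functions $f$ and $g$ are placeholder shorthand of our own. The theories, as summarized here, name the factors but do not fix a functional form.

$$
\begin{aligned}
\text{WUS:}\quad & y_{\text{US}} = f_{\text{WUS}}(\text{language customization},\ \text{website layout},\ y_{\text{I}},\ y_{\text{IT}}),\\
& y_{\text{I}} = g_{\text{I}}(\text{accuracy},\ \text{comprehensibility},\ \text{comprehensiveness},\ \text{relevance}),\\
& y_{\text{IT}} = g_{\text{IT}}(\text{entry guidance},\ \text{hyperlink connotation},\ y_{\text{IT-PEU}},\ \text{speed},\ \text{structure}),\\
\text{IT US:}\quad & y_{\text{US}} = f_{\text{ITUS}}(y_{\text{IT-PEU}},\ y_{\text{I}} \mid y_{\text{IT}}),\\
\text{IT success:}\quad & y_{\text{US}} = f_{\text{S}}(y_{\text{I}},\ y_{\text{IT}},\ \text{service quality},\ u \mid i,\ b),\\
& i = g_{i}(y_{\text{I}},\ y_{\text{IT}},\ \text{service quality},\ y_{\text{US}},\ b), \qquad b = g_{b}(y_{\text{US}},\ u \mid i).
\end{aligned}
$$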

Similarly to the IT success theory, TAM (Davis et al., 1989) models $u$. TAM was elaborated as TAM2 (Venkatesh & Davis, 2000) and extended as the unified theory of acceptance and use of technology (UTAUT) (Venkatesh et al., 2003), and these two also model $u$. These three theories are founded on the structural equation model (SEM) (Igbaria, 1997) and the theory of reasoned action (TRA) (Fishbein & Ajzen, 1975). SEM factorizes $u$ as a function of $y_{\text{IT-PEU}}(b)$ and $y_{\text{IT-PU}}(b, y_{\text{IT-PEU}})$. It thereby considers not only U but also U's community, and it supplements the IT assessment with the IT's perceived usefulness (PU, $y_{\text{IT-PU}}$). In contrast, TRA isolates U from the community by using $u \mid i(y_{\text{UA}}, y_{\text{un}})$. Here, the factor $y_{\text{UA}}$ stands for U's attitude toward the use, formed by U's beliefs on or evaluations of the IT, and the factor $y_{\text{un}}$ stands for U's subjective normative beliefs and motivation to comply. TAM models $u \mid i(y_{\text{UA}}, y_{\text{IT-PU}})$, where $y_{\text{UA}}$ is defined by $y_{\text{IT-PEU}}(e)$ and $y_{\text{IT-PU}}(y_{\text{IT-PEU}}, e)$, and $e$ refers to external factors. In TAM2, $e$ is specified as the following:

- the voluntariness of $u$ to U;
- the benefit of $u$ to U;
- the tangibility of the results from $u$;
- U's subjective norm;
- U's experience in using the IT;
- U's perception of U's status enhancement within the community through $u$;
- U's view on the IT being relevant to U's job.

In UTAUT, the external factors are defined by U's age, experience in using the IT, and gender; facilitating conditions in U's community to support $u$; social influence of important others in U's community; and voluntariness of $u$ to U.
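In the same placeholder notation, the SEM, TRA, and TAM models read as follows. The $f$ and $g$ functions are again our own shorthand, not forms fixed by the theories. TAM2 and UTAUT keep the TAM form and only specify what the external factors $e$ are.

$$
\begin{aligned}
\text{SEM:}\quad & u = f_{\text{SEM}}\bigl(y_{\text{IT-PEU}}(b),\ y_{\text{IT-PU}}(b,\ y_{\text{IT-PEU}})\bigr),\\
\text{TRA:}\quad & u \mid i = f_{\text{TRA}}(y_{\text{UA}},\ y_{\text{un}}),\\
\text{TAM:}\quad & u \mid i = f_{\text{TAM}}(y_{\text{UA}},\ y_{\text{IT-PU}}), \qquad y_{\text{UA}} = g_{\text{UA}}\bigl(y_{\text{IT-PEU}}(e),\ y_{\text{IT-PU}}(y_{\text{IT-PEU}},\ e)\bigr).
\end{aligned}
$$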

TTF (Goodhue, 1995) and its TTF/TAM integration (Dishaw & Strong, 1999) view IT as U's means to perform a task. TTF posits that a given IT will be used if and only if its available functionalities fit the task; that is, it factorizes $y_{\text{task-IT-fit}}$ as a function of $y_{\text{task}}$ and $y_{\text{IT}}$. The theory then models U's performance as a function of $y_{\text{task}}$ and $u(y_{\text{task}})$. The integrated theory has been found to offer a significant improvement over either TTF or TAM alone. A rational, experienced U will choose to use IT that enables completing the task with the greatest $b$, and will not accept IT that does not offer a sufficient advantage. TTF/TAM links the theories in four steps:

1. It adds the aforementioned factor for U's experience in using the IT into TTF.
2. It defines $y_{\text{IT-PEU}}$ of TAM by using this experience factor, $y_{\text{IT}}$, and $y_{\text{task-IT-fit}}$ of TTF.
3. It factorizes $y_{\text{IT-PU}}$ of TAM as a function of $y_{\text{IT-PEU}}$ of TAM, the experience factor, and $y_{\text{task-IT-fit}}$.
4. It connects $u$ with $y_{\text{task}}$, $y_{\text{task-IT-fit}}$, and $i$.
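The TTF and TTF/TAM relations can be sketched in the same shorthand. The symbols $p$ (U's performance) and $x$ (U's experience in using the IT) are our own labels for quantities the text names only in words. As before, the functions $f$, $g$, and $h$ are placeholders rather than forms given by the theories.

$$
\begin{aligned}
\text{TTF:}\quad & y_{\text{task-IT-fit}} = f_{\text{fit}}(y_{\text{task}},\ y_{\text{IT}}), \qquad p = h\bigl(y_{\text{task}},\ u(y_{\text{task}})\bigr),\\
\text{TTF/TAM:}\quad & y_{\text{IT-PEU}} = g_{\text{PEU}}(x,\ y_{\text{IT}},\ y_{\text{task-IT-fit}}),\\
& y_{\text{IT-PU}} = g_{\text{PU}}(y_{\text{IT-PEU}},\ x,\ y_{\text{task-IT-fit}}),\\
& u = g_{u}(y_{\text{task}},\ y_{\text{task-IT-fit}},\ i).
\end{aligned}
$$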

These theories have been validated for evaluation and use-prediction on a wide variety of communities and IT (Davis et al., 1989; Dishaw & Strong, 1999; Horton et al., 2001; Igbaria, 1997; Venkatesh et al., 2003). They are also applicable to cases such as ours, where use is voluntary (Doll et al., 1998), development has only been initiated (Davis et al., 1989), and a web interface is used to interact with the IT users (Castaneda et al., 2007; Cheung & Sachs, 2006; D'Ambra et al., 2013; Goodhue, 1995; Lederer et al., 2000; Lin, 2012; Venkatesh et al., 2003). Moreover, only a very brief period of user interaction with the IT is needed before these theories can explain and predict user acceptance (Doll et al., 1998; Szajna, 1996). Finally, three of their factors have been used for a user study of information visualization (IV) in the context of specialists and laypeople in design (Quispel et al., 2016). These factors are the PEU of IT (included explicitly in WUS, TAM, TAM2, UTAUT, and TTF/TAM, and implicitly in TTF); the attractiveness of its graph design (compare with the website layout in WUS and the PU of IT in TAM, TAM2, UTAUT, and TTF/TAM); and U's experience in using the IT (included explicitly in TAM2 and UTAUT).

Table 1. Explicit (E) and implicit (I) inclusion of the survey questions in the socio-technical theories of IT use and users.

| No. | Question | Answer | WUS | TAM | TAM2 | UTAUT | TTF | TTF/TAM |
|---|---|---|---|---|---|---|---|---|
| 1 | Demographic questions | | | | | | | |
| 1.1 | What is your age? | < 20; 21–30; …; 51–60; ≥ 61 | – | I | I | E | – | I |
| 1.2 | What is your highest level of completed education? | High school or equivalent; Some college but not degree; Bachelor; Graduate | – | I | I | I | – | I |
| 1.3 | Which of the following best describes your current occupation? | Business and financial operations; Computer and mathematical; Architecture and engineering; Life and physical science; Community and social service; Legal; Education and training; Arts, design, and entertainment; Health care; Office and administrative; Other | I | I | E | E | E | E |
| 1.4 | Have you ever read a patent document (PD)? | Yes; No | E | E | E | E | E | E |
| 1.5 | Have you ever written a PD or its part? | Yes; No | E | E | E | E | E | E |
| 2 | Introductory questions | | | | | | | |
| 2.1 | Do you use or have you used PDs at your work on a daily basis? If yes, how and when? | Yes and free-form text; No | E | E | E | E | E | E |
| 2.2 | How do you rank next statements about PDs? | | | | | | | |
| 2.2.1 | It is important to improve their readability. | The 5-point Likert scale of Strongly agree, Somewhat agree, Neutral, Somewhat disagree, and Strongly disagree | E | E | E | E | E | E |
| 2.2.2 | It is difficult to read them. | The 5-point Likert scale | E | E | E | E | E | E |
| 2.2.3 | I do not find information I need easily from them as they are now. | The 5-point Likert scale | E | E | E | E | E | E |
| 2.3 | Please explain what aspects of the way PDs are presented are you happy with. | Free-form text | E | E | E | E | E | E |
| 2.4 | Please explain how this format could be improved. | Free-form text | E | E | E | E | E | E |
| 3 | Readability questions | | | | | | | |
| 3.1 | Please rank the solutions in your preference order. | Ranking from the most to the least preferred solution | E | E | E | E | E | E |
| 3.2 | In my opinion, there is an even better way of representing the PD. If yes, please describe it. | Yes and free-form text; No | E | E | E | E | E | E |
| 3.3 | Please tell how you agree or disagree with the following statements about PDs. | | | | | | | |
| 3.3.1 | I think my preferred solution supports reading PDs. | The 5-point Likert scale | E | E | E | E | E | E |
| 3.3.2 | I can see myself using my preferred solution when reading PDs. | The 5-point Likert scale | – | E | E | E | E | E |
| 3.3.3 | I plan to use my preferred solution (if it was made available). | The 5-point Likert scale | – | E | E | E | I | E |
| 3.3.4 | I think my preferred solution supports my work. | The 5-point Likert scale | E | E | E | E | E | E |
| 3.3.5 | In my opinion, I do not find information I need easily from PDs even if using the most preferred solution. | The 5-point Likert scale | E | E | E | E | E | E |

Figure 1. Test 1, Test 2, Test 3, Test 4, and Test 5 to improve patent readability for laypeople: (a) Test 1, phrase structure of a long claim; (b) Test 2, words with specific meaning; (c) Test 3, summarizing a long claim; (d) Test 4, interactive claims enumeration; (e) Test 5, patent classification codes as a colored table with code definitions.

The user-study method has been used in the context of readability before (Chi et al., 2007; Mayr et al., 2011). The former paper describes an IV solution that enhances the subject indexing of a book by taking into account a user's information needs, entered as keywords. The solution also provides a number of navigational cues for using the index and for skim-reading the book content by conceptually highlighting sentences. The results show that the solution enables users to be more efficient and more accurate in finding, comparing, and comprehending material than without it. The latter paper emphasizes that, as opposed to measuring accuracy or speed, IV evaluation should analyze users' problem-solving strategies and how the proposed solutions facilitate or hinder their task completion. This approach aligns with the TTF and TTF/TAM theories.

An ethical assessment of our questionnaire study involving human participants was conducted organizationally, leading to an ethics approval prior to recruiting participants. Participants were recruited through email and social media, including Facebook, Google+, LinkedIn, and Twitter. The survey was implemented in English using an open-source web survey application. It was open for anyone to participate from 19 Nov 2014 to 23 Feb 2015. Completing the survey, and any of its questions, was optional.

The front page of the survey included a participant information sheet and requested the participant's informed consent. After this, 5 demographic, 6 introductory, and 5 × 7 readability questions were asked (Table 1). The readability questions covered claims section readability in four tests: understanding a long claim (Test 1, with 8 solutions), a word with a specific meaning (Test 2, with 3 solutions), a long sentence (Test 3, with 2 solutions), and a dependence between individual claims (Test 4, with 2 solutions). They also covered document readability in Test 5 (2 solutions), which targeted understanding patent classification codes as summaries for the document (Fig. 1).
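The question counts add up to the 46 questions stated in the abstract: 5 + 6 + 5 × 7 = 46. The Python snippet below restates that arithmetic; the variable names are illustrative only.

```python
# Question counts per survey part, as stated above (see also Table 1).
DEMOGRAPHIC = 5    # questions 1.1-1.5
INTRODUCTORY = 6   # 2.1, 2.2.1-2.2.3, 2.3, 2.4
PER_TEST = 7       # 3.1, 3.2, 3.3.1-3.3.5, asked once per readability test
# Number of alternative solutions shown in each readability test.
SOLUTIONS = {"Test 1": 8, "Test 2": 3, "Test 3": 2, "Test 4": 2, "Test 5": 2}

readability = PER_TEST * len(SOLUTIONS)
total = DEMOGRAPHIC + INTRODUCTORY + readability
print(readability, total)  # 35 46
```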

4. Analysis of All Participant Responses

The number of participants was 65. Most were relatively mature professionals (79% were at least 31 years old) with a university degree (23% bachelor and 66% graduate), working in computing or mathematics (34%). The next most common fields included arts, design, and entertainment (15%); education and training (15%); business and financial operations (10%); and legal occupations (8%). Only 31% of participants had never read a patent document, and 74% had never written one, in full or in part. Only 12% used or had used patent documents at their work on a daily basis, at least occasionally.

A majority of participants strongly agreed (35%) or somewhat agreed (23%) that improving the readability of patent documents was important; nobody strongly disagreed with this claim, and only one participant somewhat disagreed. Documents were seen as extremely (29%) or somewhat (29%) difficult to read, and 39% had some difficulties in finding information.
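Shares such as "strongly agreed (35%) or somewhat agreed (23%)" come from tabulating answers on the 5-point Likert scale of Table 1. A minimal Python sketch of that tabulation follows. The toy responses are invented for illustration. Leaving skipped answers out of the denominator is our assumption, since answering was optional and the exact counting rule is not stated.

```python
from collections import Counter
from typing import Optional, Sequence

# The 5-point Likert scale of Table 1, from most to least agreeing.
LIKERT = ["Strongly agree", "Somewhat agree", "Neutral",
          "Somewhat disagree", "Strongly disagree"]


def likert_shares(responses: Sequence[Optional[str]]) -> dict:
    """Rounded percentage of answered responses at each scale point.

    Unanswered items (None) are left out of the denominator (assumption).
    """
    answered = [r for r in responses if r is not None]
    counts = Counter(answered)
    n = len(answered)
    return {level: round(100 * counts[level] / n) if n else 0 for level in LIKERT}


def agreement(responses: Sequence[Optional[str]]) -> int:
    """Rounded percentage answering 'Strongly agree' or 'Somewhat agree'."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return 0
    agreeing = sum(r in LIKERT[:2] for r in answered)
    return round(100 * agreeing / len(answered))


# Invented toy responses, for illustration only.
toy = ["Strongly agree", "Somewhat agree", "Somewhat agree", "Neutral", None]
print(likert_shares(toy))
print(agreement(toy))  # 75
```

Because each scale point is rounded separately, the reported shares need not sum to exactly 100%.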

Participants were satisfied with the document heading structure (e.g., abstract and background), the conventions in writing claims, and the consistency in legal terms. They also appreciated the figures and the fact that documents are written by professionals. On the other hand, this legal and technical jargon was also criticized as unnecessary and difficult to understand. In addition, participants mentioned the inventors' common desire to patent as broad a scope as possible, which hides the precise invention. They also expressed a desire for subheadings, a consistent format, structured content, and standardized wordings to allow machine reading. To support this, they suggested computerized search of machine-readable documents and automation to propose synonymous terms or key phrases for searching. Enriching documents with explanatory expansions of synonymous concepts and hyperlinking/cross-referencing related parts (e.g., claims and figures) were also suggested. Participants suggested a shorter and more focused description of the precise invention, for example, as a supplementary abstract written by technical specialists to detail the key features of the invention. Another supplementary suggestion was a layperson-friendly rephrasing of the patent document itself.

The most preferred Test 1 solutions for the phrase structure were indentations with colorful graphical bars (26%) or numbered bars (26%). All solutions were at least as popular as the unchanged structure. Some participants (31%) suggested IV improvements:

- cross-referencing with figures;
- collapsible/expandable tree representations;
- multiple layouts to choose and/or combine from (e.g., indentation and colorful bars);
- information from other parts of the document (e.g., inventor, classification codes, keywords, and context);
- emphasis by font weight, color, or text highlighting;
- simplifications.

Most participants strongly (28%) or somewhat (42%) agreed that their preferred solution supported the document readability, and 75% saw themselves using this solution when reading patent documents. Over half (53%) of the participants would use this solution if it was made available, and 36% thought this solution supported their work. However, 36% did not get their information needs satisfied.
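The indentation idea behind the preferred Test 1 solutions can be approximated in plain text. The Python sketch below is a text-only stand-in for the graphical bars, not the layout used in the study. It assumes that clauses are separated by semicolons or colons and that clauses opening with wherein or characterized in that qualify an element; the function name indent_claim and the example claim are hypothetical.

```python
import re


def indent_claim(claim: str, width: int = 2) -> str:
    """Render a claim as numbered, indented clauses.

    Assumptions: clauses are separated by ';' or ':'; the first clause
    (preamble and transition) is not indented, claimed elements are
    indented one level, and clauses opening with 'wherein' or
    'characterized in that' are indented two levels.
    """
    clauses = [c.strip() for c in re.split(r"[;:]", claim) if c.strip()]
    lines = []
    for number, clause in enumerate(clauses, start=1):
        if number == 1:
            depth = 0
        elif re.match(r"(wherein|characterized in that)\b", clause, re.IGNORECASE):
            depth = 2
        else:
            depth = 1
        lines.append(" " * (width * depth) + f"{number}. {clause}")
    return "\n".join(lines)


# Hypothetical claim, for illustration only.
print(indent_claim("A lamp comprising: a base; a shade; "
                   "wherein the shade is detachable from the base."))
```

Numbering each clause mirrors the numbered-bars variant, while the indentation depth plays the role of the colored bars.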

Highlighting (46%) and font color (42%) were the most preferred Test 2 solutions for visualizing claim words with specific meaning, followed by the unchanged text (13%). Few participants (18%) thought there was a better way for this representation. Their suggestions included simplification of long sentences through automated parsing and text generation methods, and a proofing tool for entering text in a new patent document. Most participants strongly (21%) or somewhat (42%) agreed that their preferred solution supported the document readability, and nearly two-thirds (61%) saw themselves using this solution when reading patent documents. 42% of participants had some plans to use this solution if it was made available, and 30% thought this solution supported their work. However, a third (33%) did not get their information needs satisfied.
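The highlighting variant of Test 2 can be sketched as marking up given terms in claim text. The Python sketch below wraps each occurrence of a term in an HTML mark element. The choice of that element, the term list, and the function name highlight_terms are our assumptions. The font-color variant would instead use a styled span.

```python
import html
import re


def highlight_terms(text: str, terms: list) -> str:
    """Wrap each occurrence of a term in an HTML <mark> element.

    The text is HTML-escaped first. Matching is case-insensitive and
    respects word boundaries; longer terms are tried first, so multi-word
    phrases win over any shorter terms they contain.
    """
    escaped = html.escape(text)
    if not terms:
        return escaped
    alternatives = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(1)}</mark>", escaped)


print(highlight_terms("A holder consisting of a body.", ["consisting of"]))
# A holder <mark>consisting of</mark> a body.
```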

Test 3 summaries were formed by automatically extracting key phrases for each long claim, using them to search for related images, and supplementing the claims section with these images. Almost all participants (92%) preferred this solution over the unchanged text (8%). Few participants (16%) thought there was an even better solution for this representation. All their suggestions were concerns about the (poor) relevance of the images in the sample IV or about the risk of artificially reducing the claim scope. Most participants strongly (32%) or somewhat (35%) agreed that their preferred solution supported the document readability, and 65% saw themselves using this solution when reading patent documents. 52% had some plans to use this solution if it was made available, and 39% thought this solution supported their work, although 36% did not get their information needs satisfied.
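The first step of Test 3, extracting key phrases from a long claim, can be illustrated with simple word frequencies. The Python sketch below does not reproduce the extractor used in the study. The stop-word list, the frequency ranking, and the function name key_terms are our assumptions, and the image-search step is not sketched.

```python
import re
from collections import Counter

# Small illustrative stop-word list (assumption); it also drops the
# transition words, which carry legal rather than topical meaning.
STOP_WORDS = {
    "a", "an", "the", "of", "and", "or", "for", "to", "in", "at", "its",
    "with", "that", "which", "against", "could", "comprise", "consisting",
    "consists", "essentially", "characterized",
}


def key_terms(claim: str, k: int = 5) -> list:
    """Return the k most frequent non-stop-words, ties in first-seen order."""
    words = re.findall(r"[a-z]+", claim.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]


# Final fragment of the EP 0670191 B1 claim quoted in Section 2.
fragment = ("characterized in that the wedge consists of a pair of distantly "
            "provided first protrusions for abutment against a top face of the "
            "insert and a pair of distantly provided second protrusions for "
            "abutment against an adjacent edge surface")
print(key_terms(fragment))
# ['pair', 'distantly', 'provided', 'protrusions', 'abutment']
```

Frequency alone ranks generic words such as pair alongside protrusions and abutment. This is one reason practical key-phrase extractors add phrase detection or corpus-level weighting.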

In Test 4, 75% of the participants preferred enriching the claims enumeration interactively over the unchanged enumeration. 13% thought there was an even better way for this representation; they were concerned about the studied interaction hiding content and wanted it to be a non-default option. A clear majority of participants strongly (23%) or somewhat (42%) agreed that their preferred solution supported the document readability, and nearly two-thirds (65%) saw themselves using this solution when reading patent documents. 48% had some plans to use this solution if it was made available, and 42% thought this solution supported their work. However, 36% did not get their information needs satisfied.
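An interactive claims enumeration needs the claim dependency structure. The Python sketch below recovers that structure and lists it with indentation. It assumes that dependent claims cite their parent as "claim N" and that the first citation is the parent. The interactive expand/collapse behavior studied in Test 4 is not modeled, and the function names and example claims are hypothetical.

```python
import re

# Assumption: a dependent claim cites its parent as "claim N" (or "claims N").
_REF = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)


def claim_tree(claims: dict) -> dict:
    """Map each claim number to the claims that directly depend on it.

    Independent claims (citing no other claim) are listed under key 0.
    If a claim cites several claims, the first citation is taken as its parent.
    """
    tree = {0: []}
    for number in sorted(claims):
        refs = [int(r) for r in _REF.findall(claims[number]) if int(r) != number]
        parent = refs[0] if refs else 0
        tree.setdefault(parent, []).append(number)
    return tree


def render(tree: dict, node: int = 0, depth: int = 0) -> list:
    """List claim numbers depth-first, indented by dependency level."""
    lines = []
    for child in tree.get(node, []):
        lines.append("  " * depth + f"Claim {child}")
        lines.extend(render(tree, child, depth + 1))
    return lines


# Hypothetical claims, abbreviated for illustration.
claims = {
    1: "A toolholder comprising a holder body ...",
    2: "The toolholder according to claim 1, wherein ...",
    3: "The toolholder according to claim 2, wherein ...",
    4: "A method of clamping an insert ...",
    5: "The method of claim 4, wherein ...",
}
print("\n".join(render(claim_tree(claims))))
```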

Most participants (80%) preferred the Test 5 solution, which presents patent classification codes as a colored table with code definitions, over the unchanged listing of alphanumeric symbols (20%). 23% suggested improvements, including hyperlinks or hovers to the definitions, expanding or filtering the search by clicking the assigned classification codes, or altering the suggested color scheme. A 66% majority strongly or somewhat agreed that their preferred solution supported the document readability, and two-thirds (67%) saw themselves using this solution when reading patent documents. 50% had some plans to use this solution if it was made available, and 40% thought this solution supported their work. However, 27% did not get their information needs satisfied.
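A colored table of classification codes starts with parsing each code. The Python sketch below assumes IPC-style symbols such as B23B 27/16: a section letter, a two-digit class, a subclass letter, then main group/subgroup. It attaches only the eight top-level IPC section titles. Full definitions, and any color scheme, would have to come from the official classification scheme, which is not reproduced here. The example codes illustrate the format only.

```python
import re
from typing import NamedTuple

# Titles of the eight top-level IPC sections.
IPC_SECTIONS = {
    "A": "Human necessities",
    "B": "Performing operations; transporting",
    "C": "Chemistry; metallurgy",
    "D": "Textiles; paper",
    "E": "Fixed constructions",
    "F": "Mechanical engineering; lighting; heating; weapons; blasting",
    "G": "Physics",
    "H": "Electricity",
}

# Section letter, two-digit class, subclass letter, main group / subgroup.
_SYMBOL = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class IpcSymbol(NamedTuple):
    section: str
    cls: str
    subclass: str
    main_group: str
    subgroup: str

    def section_title(self) -> str:
        return IPC_SECTIONS[self.section]


def parse_ipc(symbol: str) -> IpcSymbol:
    """Split an IPC symbol into its hierarchical parts."""
    match = _SYMBOL.match(symbol.strip())
    if match is None:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    return IpcSymbol(*match.groups())


# One table row per code: code | subclass | section title.
for code in ["B23B 27/16", "G06F 17/30"]:
    p = parse_ipc(code)
    print(f"{code} | {p.section}{p.cls}{p.subclass} | {p.section_title()}")
```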

5. Analysis by the User Group of Laypeople

We analyzed the responses of the 20 layperson-participants who had taken all 5 readability tests. They make up 31% of the 65 participants in total, or 69% of the 29 participants who took all tests; the responses of 2 patent examiners, 1 legal counsel, and 6 patent authors were excluded. Again, most were relatively mature professionals (80% were over 31 years old) with a university degree (25% bachelor and 65% graduate), working in computing or mathematics (50%). The next most common fields included education and training (25%) and arts, design, and entertainment (20%). 65% had read and 30% had not read a patent document. Half (50%) of the layperson-participants had not used patent documents at their work on a daily basis.
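The layperson subset follows from the counts in this paragraph, assuming the 9 excluded specialists (2 + 1 + 6) are among the 29 participants who took all five tests. The Python check below reproduces the reported shares; the variable names are ours.

```python
total_participants = 65
all_tests = 29  # participants who took all five readability tests
excluded = {"patent examiners": 2, "legal counsel": 1, "patent authors": 6}

laypeople = all_tests - sum(excluded.values())
share_of_total = round(100 * laypeople / total_participants)
share_of_all_tests = round(100 * laypeople / all_tests)
print(laypeople, share_of_total, share_of_all_tests)  # 20 31 69
```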

A clear majority of this group strongly (50%) or somewhat (25%) agreed that improving the readability of patent documents is important; nobody disagreed with this claim. Documents were seen as extremely (20%) or somewhat (40%) difficult to read. Nobody agreed that finding information in patent documents was easy, whilst 45% experienced at least some difficulties. Participants were satisfied with the figures, the predictable document structure (i.e., sections, titles, spaces, typography, and layout), and the conventional, consistent wordings and terminology. Suggested improvements consisted of improving the readability of documents and addressing legal or technical jargon “in a specific domain [that] seems at times gratuitous and unnecessary”. They also included addressing ambiguity, which was conjectured to be added to “be used to fight legal battles”. One participant suggested that “the whole patent culture and legal implications should be revised and brought to a more practical level”. Participants suggested making documents machine-readable and guiding readers in understanding the legally binding content by supplementing documents with non-binding clarifications. These clarifications include figures, hyperlinks to definitions, synonym expansions, from-legal-to-lay-term conversions, highlights of the main aspects, use cases, summaries, and further explanations of the invention.

The most preferred Test 1 solutions were indentations with colorful graphical bars (20%) or numbered bars (20%). The unchanged structure was the least (65%) or second-least (5%) preferred solution in most responses. 7 participants thought there was a better way for this representation. Of these, 3 suggested collapsible/extendable tree representations and adding complementary semantic elements from the same document or other patents. The other suggestions were multiple layouts for the user to choose from, highlighting/emphasizing/coloring text, and generating simplifications. A clear majority of participants strongly (40%) or somewhat (40%) agreed that their preferred solution supported the document readability, and 90% saw themselves using the solution. Over half (55%) of them had some plans to use this solution, and 35% thought this solution supported their work. However, one participant (5%) replied that the solution did not support his/her work, and even with the solution, 35% did not get their information needs satisfied.

In Test 2, the font color solution was the most preferred alternative (35%), followed by highlighting (25%) and the unchanged text (15%). 5 participants thought there was an even better way for this representation. They suggested shortening and simplifying sentences, providing an online proofing tool that alerts the writer if sentences are too long or complicated, and ensuring that the solutions are not too disruptive for the reader. A clear majority of participants strongly (20%) or somewhat (45%) agreed that their preferred solution supported the document readability, and 60% saw themselves using this solution. 35% had some plans to use it, and 25% somewhat agreed that it supported their work. However, 35% did not get their information needs satisfied.
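The suggested proofing tool could start from a simple sentence-length check. The Python sketch below flags sentences above a word threshold. The 40-word threshold, the naive sentence splitter, and the function name long_sentences are our assumptions. Detecting complicated sentences, rather than merely long ones, would need more than word counts.

```python
import re

MAX_WORDS = 40  # hypothetical threshold; none was specified in the study


def long_sentences(text: str, max_words: int = MAX_WORDS) -> list:
    """Return (sentence index, word count) pairs for over-long sentences.

    Sentences are split naively after '.', '!' or '?' followed by
    whitespace; words are whitespace-separated tokens.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [(i, len(s.split())) for i, s in enumerate(sentences)
            if len(s.split()) > max_words]


# Toy draft: one short sentence followed by a 45-word sentence.
draft = "The holder is short. " + " ".join(["word"] * 45) + "."
for index, count in long_sentences(draft):
    print(f"Sentence {index + 1} has {count} words; consider splitting it.")
```

Because a patent claim is a single sentence, this check treats each claim as one unit and flags a long claim as a whole.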

An impressive majority of 80% preferred the Test 3 solution over the unchanged text (5%). 3 participants thought there was an even better way for this representation; again, concerns were raised regarding image relevance for IV. Most participants strongly (35%) or somewhat (30%) agreed that their preferred solution supported the document readability, and 65% saw themselves using this solution when reading patents. Half had some plans to use the solution, and 30% thought it supported their work. However, even with this solution, 30% did not get their information needs satisfied.

As many as 65% preferred the Test 4 solution over the unchanged enumeration (20%). 3 participants thought there was an even better way for this representation and emphasized that the solution must be optional because it risks hiding content. A clear majority strongly (30%) or somewhat (35%) agreed that their preferred solution supported the document readability, and 65% saw themselves using it. 45% had some plans to use it, and 35% thought it supported their work. However, 30% did not get their information needs satisfied.

A clear 70% majority preferred the Test 5 solution over the unchanged listing (15%). 5 thought there was an even better way for this representation. They suggested keeping the table structure but using more subtle pastel colors and adding hyperlinks or hovers to definitions. A clear majority strongly (40%) or somewhat (25%) agreed that their preferred solution supported the document readability. 65% saw themselves using it, with 45% having some plans to use it and nobody reporting no plans to use it. 30% thought it supported their work, and nobody thought it did not support it. However, even with this solution, 25% did not get their information needs satisfied.

6. Discussion

Patents are nominally intended to provide a public good by releasing innovative information to the public and thus supporting greater innovation. However, papers rarely cite patents because scientists prefer other sources (Seymore, 2010). Moreover, 63% of citations in US patents are due to examiners, not inventors (Alcácer et al., 2008). For laypeople (such as our mainly university-educated layperson-participants), finding, extracting, and using information from patent documents is too difficult (Alberts et al., 2011).

For patent specialists, our survey results were positive. Most agreed that various support mechanisms were helpful, although the optimal visualization was open to argument. Equally, they tended to be concerned about the potential for misrepresenting a patent (e.g., by reducing the scope of claims), and the claims text and classification codes were designed for, and are used primarily by, them. For each test, roughly half of the cohort could see themselves using their preferred solution.

Conversely, the laypeople strongly agreed that patents should and could be made more readable and, interestingly, focused many suggestions on non-legally binding reading aids, that is, annotations that support a wider understanding of the teachings without affecting the legal nature of the patent. Compared with the overall cohort, the layperson cohort was far more supportive of IT that improved patent readability and far more likely to agree that they would use such technologies in future. Comparing the responses of (educated) laypeople with the wider cohort suggests that, as a demographic, laypeople are both under-served by patent readability technologies and highly receptive to new technologies for patent readability.
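A non-binding reading aid of this kind can leave the legal text untouched and attach explanations alongside it. The sketch below is an illustration, not the solution evaluated in Test 2: it locates the three kinds of US claim transitions (open "comprising", hybrid "consisting essentially of", and closed "consisting of") and returns glosses with character offsets, so that a viewer could render them as highlights or hovers. The phrase list and gloss wording are assumptions; real claims use many more variants (e.g., "comprise of" or "consists of").

```python
import re

# Illustrative glosses for the three kinds of US claim transitions.
TRANSITIONS = {
    "comprising": "open: covers at least the listed elements or steps",
    "consisting of": "closed: covers all and only the listed elements or steps",
    "consisting essentially of": "hybrid: unlisted elements allowed only if they do not "
    "materially affect the invention",
}

# Try longer phrases first so "consisting essentially of" is matched as a whole.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(p) for p in sorted(TRANSITIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def annotate(claim: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, phrase, gloss) per transition; the claim text is not modified."""
    return [
        (m.start(), m.end(), m.group(0), TRANSITIONS[m.group(0).lower()])
        for m in _PATTERN.finditer(claim)
    ]


claim = "A toolholder consisting essentially of a holder body and a wedge."
for start, end, phrase, gloss in annotate(claim):
    print(f"{start}-{end} {phrase!r}: {gloss}")
```

Returning offsets instead of rewritten text keeps the legally binding wording intact, which matches the participants' preference for supplementary rather than replacement aids.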

These reading aids have been developed and evaluated statistically at text units ranging from individual words in English to entire document collections. For example, Ferraro et al. (2014) have introduced a method for segmenting a patent claim first into the parts of preamble, transition, and body text, followed by further segmentation into clauses. Clause segmentation has also been addressed in a computational evaluation initiative with six participating systems (Sang & Dejean, 2001). Moreover, Sheremetyeva (2003) has proposed a syntactic dependency parser to simplify sentences of patent documents by paraphrasing, and the PATExpert patent processing service by Bouayad-Agha et al. (2009) simplifies patent claims not only by paraphrasing but also by text summarization. Finally, Koch et al. (2011) have developed the PatViz system for interactive visual search and analysis of patent information.
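The first step of the segmentation described by Ferraro et al. (2014), splitting a claim into preamble, transition, and body, can be approximated very crudely with a rule. The sketch below is not their method: it assumes that the first occurrence of a listed transition phrase marks the split and uses semicolons as a stand-in for clause boundaries, both of which are simplifying assumptions. The phrase list and function names are hypothetical.

```python
import re

# Hypothetical, deliberately short list of transition phrases; longer ones come first.
TRANSITION = re.compile(
    r"\b(consisting essentially of|consisting of|comprising|including)\b",
    re.IGNORECASE,
)


def segment_claim(claim: str) -> dict[str, str]:
    """Split a claim at its first transition phrase into preamble, transition, and body."""
    match = TRANSITION.search(claim)
    if match is None:
        # No known transition found: treat the whole claim as body text.
        return {"preamble": "", "transition": "", "body": claim.strip()}
    return {
        "preamble": claim[: match.start()].strip(" ,:"),
        "transition": match.group(0),
        "body": claim[match.end():].strip(" ,:"),
    }


def split_clauses(body: str) -> list[str]:
    """Naive clause split on semicolons, standing in for a learned clause segmenter."""
    pieces = (piece.strip(" ,.") for piece in body.split(";"))
    return [piece for piece in pieces if piece]


segments = segment_claim("A toolholder comprising: a holder body; an insert site; and a wedge.")
print(segments["preamble"], "|", segments["transition"])
print(split_clauses(segments["body"]))
```

A learned clause segmenter, as in the shared task cited above, would replace `split_clauses`; the rule-based version only shows where such a component fits.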

Turning to reading and writing aids that are domain-independent or widely applicable, Goffin et al. (2014) have studied enriching individual words with metadata (such as our Test 2 IV for words with specific meaning). The VisRA visual analysis tool by Oelke et al. (2012) can be used to assess a document with respect to 141 readability features in order to support writing simpler paragraphs and sentences. Moreover, analogously to our Test 3 solution, which uses pictures to summarize a long claim, Liu et al. (2015) have studied the use of word clouds to visually summarize important keywords from a large collection of text. For analyzing a document collection, the ThemeDelta visual analytics system by Gad et al. (2015) is available for visualizing temporal trends in its topics with respect to, for example, document publication dates. Furthermore, the Adaptive VIsualization By Example (VIBE) system by Ahn & Brusilovsky (2009) enables visual representation and exploration of search results on a website using the TaskSieve adaptive search engine; its evaluation as a user study comprised ten participants. Finally, the TopicPanorama solution by Liu et al. (2014) and Wang et al. (2016) extends the readability analysis to visual analytics for getting a full picture of relevant topics that are discussed in multiple document collections.
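Word clouds such as those studied by Liu et al. (2015) are typically weighted by how often each term occurs. The sketch below shows only that baseline weighting, not their focus+context technique: it lowercases the text, drops a small illustrative stopword list, and counts the remaining tokens. Applied to the closing clause of the EP 0670191 B1 claim quoted in Section 2, it surfaces the repeated terms that a layperson would see emphasized.

```python
import re
from collections import Counter

# Small illustrative stopword list; practical systems use a much larger one.
STOPWORDS = {"a", "an", "and", "against", "at", "by", "for", "in", "its",
             "of", "on", "that", "the", "to", "which", "with"}


def keywords(text: str, k: int = 5) -> list[tuple[str, int]]:
    """Return the k most frequent non-stopword terms, usable as word-cloud weights."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token not in STOPWORDS)
    # Ties keep first-seen order, so the result is deterministic.
    return counts.most_common(k)


clause = ("the wedge consists of a pair of distantly provided first protrusions "
          "for abutment against a top face of the insert and a pair of distantly "
          "provided second protrusions for abutment against an adjacent edge surface")
print(keywords(clause))
# [('pair', 2), ('distantly', 2), ('provided', 2), ('protrusions', 2), ('abutment', 2)]
```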

References

Ahn, J.-W. and Brusilovsky, P. Adaptive visualization of search results: Bringing user models to visual analytics. Information Visualization, 8:167–79, 2009.

Alberts, D., Yang, C. B., Fobare-DePonio, D., Koubek, K., Robins, S., Rodgers, M., Simmons, E., and DeMarco, D. Introduction to patent searching. In Current Challenges in Patent Information Retrieval, vol. 29 of the series The Information Retrieval Series, pp. 3–43, Berlin Heidelberg, Germany, 2011. Springer-Verlag.

Alcácer, J., Gittelman, M., and Sampat, B. Applicant and Examiner Citations in US Patents: An Overview and Analysis. Harvard Business School, Boston, MA, 2008.

Bouayad-Agha, N., Casamayor, G., Ferraro, G., Mille, S., Vidal, V., and Wanner, L. Improving the comprehension of legal documentation: The case of patent claims. In Proc. of the 12th International Conference on Artificial Intelligence and Law, pp. 78–87, New York, NY, 2009. ACM.

Castaneda, J. A., Munoz-Leiva, F., and Luque, T. Web acceptance model (WAM): Moderating effects of user experience. Journal of Information and Management, 44(4):384–96, 2007.

Cheung, E. Y. and Sachs, J. Test of the technology acceptance model for a web-based information system in a Hong Kong Chinese sample. Psychological Reports, 99(3):691–703, 2006.

Chi, E. H., Hong, L., Heiser, J., Card, S. K., and Gumbrecht, M. ScentIndex and ScentHighlights: Productive reading techniques for conceptually reorganizing subject indexes and highlighting passages. Information Visualization, 6(1):32–47, 2007.

D'Ambra, J., Wilson, C., and Akter, S. Application of the task-technology fit model to structure and evaluate the adoption of e-books by academics. Journal of the American Society for Information Science and Technology, 64(1):48–64, 2013.

Davis, F., Bagozzi, R., and Warshaw, P. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35(8):982–1003, 1989.

DeLone, W. H. and McLean, E. R. The DeLone and McLean model of information systems success: A ten-year update. Journal of Management Information Systems, 19(4):9–30, 2003.

Dishaw, M. T. and Strong, D. M. Extending the technology acceptance model with task-technology fit constructs. 36(1):9–21, 1999.

Doll, W. J. and Torkzadeh, G. The measurement of end-user computing satisfaction. MIS Quarterly, 12(2):259–74, 1988.

Doll, W. J., Hendrickson, A. M., and Deng, X. Using Davis' perceived usefulness and ease-of-use instruments for decision making: A confirmatory and multigroup invariance analysis. Decision Sciences, 29(4):839–69, 1998.

Ferraro, G., Suominen, H., and Nualart, J. Segmentation of patent claims for improving their readability. In Proc. of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) at EACL, pp. 66–73, Stroudsburg, PA, 2014. ACL.

Fishbein, M. and Ajzen, I. Belief, Attitude, Intention, and Behaviour: An Introduction to Theory and Research. Addison-Wesley, Reading, MA, 1975.

Gad, S., Javed, W., Ghani, S., Elmqvist, N., Ewing, T., Hampton, K. N., and Ramakrishnan, N. ThemeDelta: Dynamic segmentations over temporal topic models. IEEE Transactions on Visualization and Computer Graphics, 21:672–85, 2015.

Goffin, P., Willett, W., Fekete, J.-D., and Isenberg, P. Exploring the placement and design of word-scale visualizations. IEEE Transactions on Visualization and Computer Graphics, 20:2291–300, 2014.

Goodhue, D. Understanding user evaluations of information systems. Management Science, 41(12):1827–44, 1995.

Horton, R. P., Buck, R., Waterson, P. E., and Clegg, C. Explaining Intranet use with the technology acceptance model. Journal of Information Technology, 16(4):237–49, 2001.

Igbaria, M. Personal computing acceptance factors in small firms: A structural equation model. MIS Quarterly, 21(3):279–302, 1997.

Koch, S., Bosch, H., Giereth, M., and Ertl, R. Iterative integration of visual insights during scalable patent search and analysis. IEEE Transactions on Visualization and Computer Graphics, 17:557–69, 2011.

Lederer, A. L., Maupin, D. J., Sena, M. P., and Zhuang, Y. The technology acceptance model and the world wide web. Decision Support Systems, 29(3):269–82, 2000.

Lin, W.-S. Perceived fit and satisfaction on web learning performance: IS continuance intention and task-technology fit perspectives. International Journal of Human-Computer Studies, 70(7):498–507, 2012.

Liu, S., Wang, X., Chen, J., Zhu, J., and Guo, B. TopicPanorama: A full picture of relevant topics. In Proc. of the 2014 IEEE Conference on Visual Analytics Science and Technology, pp. 183–92, New York, NY, 2014. IEEE.

Liu, X., Shen, H.-W., and Hu, Y. Supporting multifaceted viewing of word clouds with focus+context display. Information Visualization, 14:168–80, 2015.

Mayr, E., Smuc, M., and Risku, H. Many roads lead to Rome: Mapping users' problem-solving strategies. Information Visualization, 10(3):232–47, 2011.

Muylle, S., Moenaert, R., and Despontin, M. The conceptualization and empirical validation of web site user satisfaction. Information & Management, 41(5):543–60, 2004.

Oelke, D., Spretke, D., Stoffel, A., and Keim, D. A. Visual readability analysis: How to make your writings easier to read. IEEE Transactions on Visualization and Computer Graphics, 18:662–74, 2012.

Pressman, D. Patent It Yourself. Nolo, Berkeley, CA, 2006.

Quispel, A., Maes, A., and Schilperoord, J. Graph and chart aesthetics for experts and laymen in design: The role of familiarity and perceived ease of use. Information Visualization, 15(3):238–52, 2016.

Sang, E. T. K. and Dejean, H. Introduction to the CoNLL-2001 shared task: Clause identification. In Proc. of the Fifth Conference on Computational Natural Language Learning, vol. 7, no. 8 of CoNLL, pp. 53–7, Stroudsburg, PA, 2001. ACL.

Seymore, S. B. The teaching function of patents. Notre Dame Law Review, 85:621–70, 2010.

Sheremetyeva, S. Natural language analysis of patent claims. In Proc. of the ACL-2003 Workshop on Patent Corpus Processing at ACL, pp. 66–73, Stroudsburg, PA, 2003. ACL.

Szajna, B. Empirical evaluation of the revised technology acceptance model. Management Science, 42(1):85–92, 1996.

Venkatesh, V. and Davis, F. D. A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science, 46(2):186–204, 2000.

Venkatesh, V., Morris, M. G., Davis, F. D., and Davis, G. User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3):425–78, 2003.

Wang, X., Liu, S., Liu, J., Chen, J., Zhu, J., and Guo, B. A full picture of relevant topics. IEEE Transactions on Visualization and Computer Graphics, PP:1, 2016.

Page 2: User Study for Measuring Linguistic Complexity and Its ...users.cecs.anu.edu.au/~u5422389/SuominenetalIML-ICML2017.pdf · User Study for Measuring Linguistic Complexity and Its Reduction

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

patent claim is defined to be a closed transition and beforea list it means that inventions infringing this claim need tohave all and only the listed elements or steps In contrastconsisting is a hybrid transition with the specific meaningof other inventors might not infringe this claim by employ-ing some additional elements or steps The third transitiontype is called open transition defined as having at least thefollowing elements or steps

The sentences are so long and grammatically hard that theirunderstanding becomes difficult as illustrated by the fol-lowing 156-word sentence of the first claim of Patent EP0670191 B1 (transitions emphasized) ldquoToolholder whichcould comprise of a holder body with an insert site at its forwardend consisting essentially of [+ 99 words] characterized inthat the wedge consists of a pair of distantly provided first pro-trusions for abutment against a top face of the insert and a pairof distantly provided second protrusions for abutment against anadjacent edge surfacerdquo

3 Survey and Its Theoretical FrameworkOur user study (Table 1) is founded on socio-technical the-ories of IT use and users namely website user satisfaction(US yUS) technology acceptance model (TAM) and tasktechnology fit (TTF) The website US theory (Muylle et al2004) (WUS) factorizes yUS as a function of language cus-tomization website layout yI and yIT The dependent fac-tor for information (I yI ) is defined by accuracy compre-hensibility comprehensiveness and relevance The depen-dent factor for IT (yIT ) is defined by entry guidance hy-perlink connotation perceived ease of use (PEU yIT-PEU)speed and structure This factorization builds on moregeneral theories of IT US (Doll amp Torkzadeh 1988) andIT success (DeLone amp McLean 2003) The IT US theorymodels yUS through yIT-PEU and I provided by the IT yI |yIT meeting the user U rsquos needs The IT success theory factor-izes yUS as a function of yI yIT service quality u|i and bwhere u|i refers to the IT use u after having an intention ito use it i is defined by yI yIT service quality yUS and band the net benefit of the use b addresses the user U andU rsquos community and is defined by yUS and u|i

Similarly to the IT success theory TAM (Davis et al1989) elaborated as TAM2 (Venkatesh amp Davis 2000)and extended as unified theory of acceptance and use oftechnology (UTAUT) (Venkatesh et al 2003) models uThese three theories are founded on the structural equa-tion model (SEM) (Igbaria 1997) and theory of reasonedaction (TRA) (Fishbein amp Ajzen 1975) SEM factorizesu as a function of yIT-PEU(b) and yIT-PU(b yIT-PEU) whichconsiders not only U but also U rsquos community and supple-ments the IT assessment with i ts perceived usefulness (PUyIT-PU) In contrast TRA isolates U from the communityby using u|i(yUA yun) with the factor yUA for U rsquos attitude

toward the use formed by U rsquos beliefs on or evaluations ofIT and the factor yun for U rsquos subjective normative beliefsand motivation to comply TAM models u|i(yUA yIT-PU)where yUA is defined by yIT-PEU(e) and yIT-PU(yIT-PEU e)where e refers to external factors In TAM2 e is specifiedas the voluntariness of u to U benefit of u to U tangibilityof the results from u U rsquos subjective norm U rsquos experiencein using the IT U rsquos perception on U rsquos status enhancementwithin the community through u and U rsquos view on IT beingrelevant to U rsquos job In UTAUT they are defined by U rsquosage experience in using the IT and gender facilitatingconditions in U rsquos community to support u social influenceof important others in U rsquos community and voluntariness ofu to U

TTF (Goodhue 1995) and its TTFTAM integra-tion (Dishaw amp Strong 1999) view IT as U rsquos meansto perform a task TTF posits that a given IT will be usedif and only if its available functionalities fit the taskthat is it factorizes ytask-IT-fit as a function of ytask and yIT Then the theory models U rsquos performance as a function ofytask and u(ytask) The integrated theory has been found tooffer a significant improvement over either TTF or TAMalone a rational experienced U will choose to use ITthat enable completing the task with the greatest b andnot to accept those that will not offer them a sufficientadvantage TTFTAM links the theories by first addingthe aforementioned factor for U rsquos experience in using theIT into TTF second defining yIT-PEU of TAM by usingthis experience factor yIT and ytask-IT-fit of TTF thirdfactorizing yIT-PU of TAM as a function of yIT-PEU of TAMthe experience factor and ytask-IT-fit and finally connectingu with ytask ytask-IT-fit and i

These theories have been validated for evaluation and use-prediction on a wide variety of communities and IT (Daviset al 1989 Dishaw amp Strong 1999 Horton et al 2001Igbaria 1997 Venkatesh et al 2003) They are also ap-plicable to cases such as ours where use is voluntary (Dollet al 1998) development has only been initiated (Daviset al 1989) and a web interface is used to interact with theIT users (Castaneda et al 2007 Cheung amp Sachs 2006DrsquoAmbra et al 2013 Goodhue 1995 Lederer et al 2000Lin 2012 Venkatesh et al 2003) Moreover only a verybrief period of user interaction with the IT is needed be-fore these theories are capable to explain and predict useracceptance (Doll et al 1998 Szajna 1996) Finally theirfactors of PEU of IT (included explicitly in WUS TAMTAM2 UTAUT and TTFTAM and implicitly in TTF) theattractiveness of its graph design (compare with the web-site layout in WUS and PU of IT in TAM TAM2 UTAUTand TTFTAM) and U rsquos experience in using the IT (in-cluded explicitly in TAM2 and UTAUT) have been used fora user study of information visualization (IV) in the contextof specialists and laypeople in design (Quispel et al 2016)

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

Table 1 Explicit (E) and implicit (I) inclusion of the survey questions in the socio-technical theories of IT use and users

No Question Answer WUS TAM TAM2 UTAUT TTF TTFTAM1 Demographic Questions11 What is your age lt 20 21ndash30 51ndash60 ge 61 ndash I I E ndash I12 What is your highest level of

completed educationHigh school or equivalentSome college but not degreeBachelor Graduate

ndash I I I ndash I

13 Which of the following bestdescribes your current occupa-tion

Business and financial opera-tions Computer and mathemat-ical Architecture and engineer-ing Life and physical scienceCommunity and social serviceLegal Education and trainingArts design and entertainmentHealth care Office and admin-istrative Other

I I E E E E

14 Have you ever read a patentdocument (PD)

Yes No E E E E E E

15 Have you ever written a PD orits part

Yes No E E E E E E

2 Introductory Questions21 Do you use or have you used

PDs at your work on a daily ba-sis If yes how and when

Yes and free-form text No E E E E E E

22 How do you rank next state-ments about PDs

221 It is important to improve theirreadability

The 5-point Likert scale ofStrongly agree Somewhatagree Neutral Somewhatdisagree and Strongly disagree

E E E E E E

222 It is difficult to read them The 5-point Likert scale E E E E E E223 I do not find information I need

easily from them as they arenow

The 5-point Likert scale E E E E E E

23 Please explain what aspects ofthe way PDs are presented areyou happy with

Free-form text E E E E E E

24 Please explain how this formatcould be improved

Free-form text E E E E E E

3 Readability Questions31 Please rank the solutions in

your preference orderRanking from the most to theleast preferred solution

E E E E E E

32 In my opinion there is an evenbetter way of representing thePD If yes please describe it

Yes and free-form text No E E E E E E

33 Please tell how you agreeor disagree with the followingstatements about PDs

331 I think my preferred solutionsupports reading PDs

The 5-point Likert scale E E E E E E

332 I can see myself using my pre-ferred solution when readingPDs

The 5-point Likert scale ndash E E E E E

333 I plan to use my preferred solu-tion (if it was made available)

The 5-point Likert scale ndash E E E I E

334 I think my preferred solutionsupports my work

The 5-point Likert scale E E E E E E

335 In my opinion I do not findinformation I need easily fromPDs even if using the most pre-ferred solution

The 5-point Likert scale E E E E E E

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

(a) Test 1 phrase structure of a long claim

(b) Test 2 words with specific meaning

(c) Test 3 summarizing a long claim

(d) Test 4 interactive claims enumeration

(e) Test 5 patent classification codes as acolored table with codes definitions

Figure 1 Test 1 Test 2 Test 3 Test 4 and Test 5 to improve patent readability for laypeople Click the hyperlinks for larger figures

The user-study method has been used in the context of read-ability before (Chi et al 2007 Mayr et al 2011) Theformer paper describes an IV solution to enhance subjectindexing a book by taking a userrsquos information needs en-tered as keywords into account The solution also pro-vides a number of navigational cues to use the index andskim-read the book content by conceptually highlightingsentences The results show that the solution enables usersbe more efficient and more accurate in finding compar-ing and comprehending material than without it The latterpaper emphasizes that as opposed to measuring accuracyor speed IV evaluation should analyze usersrsquo problem-solving strategies and how the proposed solutions facilitateor hinder their task completion This approach aligns withthe TTF and TTFTAM theories

An ethical assessment of our questionnaire study involvinghuman participants was conducted organizationally lead-ing to an ethics approval prior to recruiting participantsParticipant recruitment was conducted through email andsocial media including Facebook Google+ LinkedIn and

Twitter The survey was implemented in English using anopen source web survey application It was open for any-one to participate from 19 Nov 2014 to 23 Feb 2015 Com-pleting the survey and any of its questions was optional

The front page of the survey included a participant infor-mation sheet and requested the participantrsquos informed con-sent and after this 5 demographic 6 introductory and 5times7readability questions were asked (Table 1) claims sectionreadability with understanding a long claim (Test 1 with 8solutions) a word with a specific meaning (Test 2 with 3solutions) a long sentence (Test 3 with 2 solutions) and adependence between individual claims (Test 4 with 2 solu-tions and document readability with 2 solutions of Test 5for understanding patent classification codes as summariesfor the document (Fig 1)

4 Analysis of All Participant ResponsesThe number of participants was 65 Most were relativelymature professionals (79 were at least 31 years old) with

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

a university degree (23 bachelor and 66 graduate)working in computing or mathematics (34) The nextmost common fields included arts design and entertain-ment (15) education and training (15) business andfinancial operations (10) and legal occupations (8)Only 31 of participants had never read a patent document74 had never written or partially-written one Only 12used or had used patent documents at their work on a dailybasis at least occasionally

A majority of participants strongly agreed (35) or some-what agreed (23) that improving the readability of patentdocuments was important nobody strongly disagreed withthis claim and only one participant somewhat disagreedDocuments were seen as extremely (29) or somewhat(29) difficult to read 39 had some difficulties in find-ing information

Participants were satisfied with the document headingstructure (eg abstract and background) conventions inwriting claims and consistence in legal terms They alsoappreciated figures and being written by professionals Onthe other hand this legal and technical jargon was also crit-icized as being unnecessary and difficult to understand Inaddition the inventorsrsquo common desire to patent as broadscope as possible hiding the precise invention was men-tioned together with the desire for subheadings consis-tent format structured content and standardized word-ings to allow machine reading To support this comput-erized search of machine-readable documents automationto propose synonymous terms or key phrases for searchingwas suggested Enriching documents with explanatory ex-pansions of synonymous concepts and hyperlinkingcross-referencing related parts (eg claims and figures) were alsosuggested Participants suggested a shorter and more fo-cused description of the precise invention for example asa supplementary abstract written by technical specialiststo detail the key features of the invention Another supple-mentary suggestion was a layperson-friendly rephrasing ofthe patent document itself

The most preferred Test 1 solutions for the phrase struc-ture were indentations with colorful graphical bars (26)or numbered bars (26) All solutions were at least aspopular as the unchanged structure Some participants(31) suggested IV improvements (eg cross-referencingwith figures collapsibleexpendable tree representationsmultiple layouts to choose andor combine from (eg in-dentation and colorful bars) information from other partsof the document (eg inventor classification codes key-words and context) emphasizing by font weightcolor ortext highlighting and simplifications) Most participantsstrongly (28) or somewhat (42) agreed on their pre-ferred solution supporting the document readability and75 saw themselves using this solution when reading

patent documents Over half (53) of the participantswould use this solution if it was made available and 36thought this solution supported their work However 36did not get their information needs satisfied

The highlighting (46) and font color (42) were themost preferred Test 2 solutions to visualize claims wordswith specific meaning followed by the unchanged text(13) Few participants (18) thought there was a bet-ter way for this representation such as simplification oflong sentences through automated parsing and text gen-eration methods and a proofing tool for entering text in anew patent document Most participants strongly (21)or somewhat (42) agreed on their preferred solution sup-porting the document readability and nearly two-thirds(61) saw themselves using this solution when readingpatent documents 42 of participants had some plansto use this solution given it was made available and 30thought this solution supports their work However a third(33) did not get their information needs satisfied

Test 3 summaries were formed by automatically extractingkey phrases for each long claim using them to search re-lated images and supplementing the claims section withthese images Almost all participants (92) preferred thissolution over the unchanged text (8) Few participants(16) thought there was an even better solution for thisrepresentation All the suggestions were concerns of the(poor) relevance of the images related to the sample IVor the risk of artificially reducing the claim scope Mostparticipants strongly (32) or somewhat (35) agreed ontheir preferred solution supporting the document readabil-ity and 65 saw themselves using this solution when read-ing patent documents 52 had some plans to use this so-lution given it was made available and 39 thought this so-lution supported their work although 36 did not get theirinformation needs satisfied

In Test 4 75 of the participants preferred enriching theclaims enumeration interactively over the unchanged enu-meration 13 thought there was an even better wayfor this representation they were concerned of the stud-ied interaction hiding content and wanted it to be a non-default option A clear majority of participants strongly(23) or somewhat (42) agreed on their preferred solu-tion supporting the document readability and nearly two-thirds (65) saw themselves using this solution when read-ing patent documents 48 had some plans to use this so-lution given it was made available and 42 thought thissolution supports their work However 36 did not gettheir information needs satisfied

Almost all participants (80) preferred the Test 5 solu-tion of presenting patent classification codes as a coloredtable with codes definition over the unchanged listing ofalphanumeric symbols (20) 23 suggested improve-

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

ments including hyperlinks or hovers to the definitionsexpanding or filtering the search by clicking the assignedclassification codes or altering the suggested color schemeA 66 majority strongly or somewhat agreed on their pre-ferred solution supporting the document readability andtwo thirds (67) saw themselves using this solution whenreading patent documents 50 had some plans to use thissolution if it was made available and 40 thought this so-lution supported their work However 27 did not get theirinformation needs satisfied

5 Analysis by the User Group of LaypeopleWe analyzed the responses of the 20 layperson-participantswho had taken all 5 readability tests (31 out of 65 partici-pants in total or 69 out of 29 participants taking all tests)responses of 2 patent examiners 1 legal counsel and 6patent authors were excluded Again most were relativelymature professionals (80 were over 31 years old) with auniversity degree (25 bachelor and 65 graduate) work-ing in computing or mathematics (50) The next mostcommon fields included education and training (25) andarts design and entertainment (20) 65 had and 30had not read a patent document Half (50) of laypersonparticipants had not used patent documents at their work ona daily basis

A clear majority of this group strongly (50) or some-what (25) agreed on improving the readability of patentdocuments being important nobody disagreed with thisclaim Documents were seen as extremely (20) or some-what (40) difficult to read nobody agreed that find-ing information from patent documents was easy whilst45 experienced least some difficulties Participants weresatisfied with the figures predictable document structure(iesections titles spaces typography and layout) andconventional consistent wordings and terminology Sug-gested improvements consisted of improving readability ofdocuments addressing legal or technical jargon ldquoin a spe-cific domain [that] seems at times gratuitous and unnec-essaryrdquo addressing ambiguity which was conjectured to beadded to ldquobe used to fight legal battlesrdquo and one participantsuggested that ldquothe whole patent culture and legal impli-cations should be revised and brought to a more practicallevelrdquo They suggested making documents machine read-able and guiding the readers in understanding the legallybinding content by supplementing documents with the non-binding clarifications such as figures hyperlinks to defini-tions synonym expansions from-legal-to-lay-term conver-sions highlights of the main aspects use cases summariesand further explanations of the invention

The most preferred Test 1 solutions were indentations withcolorful graphical bars (20) or numbered bars (20)The unchanged structure was the least (65) or second-


User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website


References

Ahn, J.-W. and Brusilovsky, P. Adaptive visualization of search results: Bringing user models to visual analytics. Information Visualization, 8:167–79, 2009.

Alberts, D., Yang, C. B., Fobare-DePonio, D., Koubek, K., Robins, S., Rodgers, M., Simmons, E., and DeMarco, D. Introduction to patent searching. In Current Challenges in Patent Information Retrieval, vol. 29 of The Information Retrieval Series, pp. 3–43, Berlin Heidelberg, Germany, 2011. Springer-Verlag.

Alcácer, J., Gittelman, M., and Sampat, B. Applicant and Examiner Citations in US Patents: An Overview and Analysis. Harvard Business School, Boston, MA, 2008.

Bouayad-Agha, N., Casamayor, G., Ferraro, G., Mille, S., Vidal, V., and Wanner, L. Improving the comprehension of legal documentation: The case of patent claims. In Proc. of the 12th International Conference on Artificial Intelligence and Law, pp. 78–87, New York, NY, 2009. ACM.

Castañeda, J. A., Muñoz-Leiva, F., and Luque, T. Web acceptance model (WAM): Moderating effects of user experience. Journal of Information and Management, 44(4):384–96, 2007.

Cheung, E. Y. and Sachs, J. Test of the technology acceptance model for a web-based information system in a Hong Kong Chinese sample. Psychological Reports, 99(3):691–703, 2006.

Chi, E. H., Hong, L., Heiser, J., Card, S. K., and Gumbrecht, M. ScentIndex and ScentHighlights: Productive reading techniques for conceptually reorganizing subject indexes and highlighting passages. Information Visualization, 6(1):32–47, 2007.

D'Ambra, J., Wilson, C., and Akter, S. Application of the task-technology fit model to structure and evaluate the adoption of e-books by academics. Journal of the American Society for Information Science and Technology, 64(1):48–64, 2013.

Davis, F., Bagozzi, R., and Warshaw, P. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35(8):982–1003, 1989.

DeLone, W. H. and McLean, E. R. The DeLone and McLean model of information systems success: A ten-year update. Journal of Management Information Systems, 19(4):9–30, 2003.

Dishaw, M. T. and Strong, D. M. Extending the technology acceptance model with task-technology fit constructs. 36(1):9–21, 1999.

Doll, W. J. and Torkzadeh, G. The measurement of end-user computing satisfaction. MIS Quarterly, 12(2):259–74, 1988.

Doll, W. J., Hendrickson, A. M., and Deng, X. Using Davis' perceived usefulness and ease-of-use instruments for decision making: A confirmatory and multigroup invariance analysis. Decision Sciences, 29(4):839–69, 1998.

Ferraro, G., Suominen, H., and Nualart, J. Segmentation of patent claims for improving their readability. In Proc. of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) at EACL, pp. 66–73, Stroudsburg, PA, 2014. ACL.

Fishbein, M. and Ajzen, I. Belief, Attitude, Intention, and Behaviour: An Introduction to Theory and Research. Addison-Wesley, Reading, MA, 1975.

Gad, S., Javed, W., Ghani, S., Elmqvist, N., Ewing, T., Hampton, K. N., and Ramakrishnan, N. ThemeDelta: Dynamic segmentations over temporal topic models. IEEE Transactions on Visualization and Computer Graphics, 21:672–85, 2015.

Goffin, P., Willett, W., Fekete, J.-D., and Isenberg, P. Exploring the placement and design of word-scale visualizations. IEEE Transactions on Visualization and Computer Graphics, 20:2291–300, 2014.

Goodhue, D. Understanding user evaluations of information systems. Management Science, 41(12):1827–44, 1995.

Horton, R. P., Buck, R., Waterson, P. E., and Clegg, C. Explaining Intranet use with the technology acceptance model. Journal of Information Technology, 16(4):237–49, 2001.

Igbaria, M. Personal computing acceptance factors in small firms: A structural equation model. MIS Quarterly, 21(3):279–302, 1997.

Koch, S., Bosch, H., Giereth, M., and Ertl, R. Iterative integration of visual insights during scalable patent search and analysis. IEEE Transactions on Visualization and Computer Graphics, 17:557–69, 2011.

Lederer, A. L., Maupin, D. J., Sena, M. P., and Zhuang, Y. The technology acceptance model and the world wide web. Decision Support Systems, 29(3):269–82, 2000.

Lin, W.-S. Perceived fit and satisfaction on web learning performance: IS continuance intention and task-technology fit perspectives. International Journal of Human-Computer Studies, 70(7):498–507, 2012.

Liu, S., Wang, X., Chen, J., Zhu, J., and Guo, B. TopicPanorama: A full picture of relevant topics. In Proc. of the 2014 IEEE Conference on Visual Analytics Science and Technology, pp. 183–92, New York, NY, 2014. IEEE.

Liu, X., Shen, H.-W., and Hu, Y. Supporting multifaceted viewing of word clouds with focus+context display. Information Visualization, 14:168–80, 2015.

Mayr, E., Smuc, M., and Risku, H. Many roads lead to Rome: Mapping users' problem-solving strategies. Information Visualization, 10(3):232–47, 2011.

Muylle, S., Moenaert, R., and Despontin, M. The conceptualization and empirical validation of web site user satisfaction. Information & Management, 41(5):543–60, 2004.

Oelke, D., Spretke, D., Stoffel, A., and Keim, D. A. Visual readability analysis: How to make your writings easier to read. IEEE Transactions on Visualization and Computer Graphics, 18:662–74, 2012.

Pressman, D. Patent It Yourself. Nolo, Berkeley, CA, 2006.

Quispel, A., Maes, A., and Schilperoord, J. Graph and chart aesthetics for experts and laymen in design: The role of familiarity and perceived ease of use. Information Visualization, 15(3):238–52, 2016.

Sang, E. T. K. and Dejean, H. Introduction to the CoNLL-2001 shared task: Clause identification. In Proc. of the Fifth Conference on Computational Natural Language Learning, vol. 7, no. 8 of CoNLL, pp. 53–7, Stroudsburg, PA, 2001. ACL.

Seymore, S. B. The teaching function of patents. Notre Dame Law Review, 85:621–70, 2010.

Sheremetyeva, S. Natural language analysis of patent claims. In Proc. of the ACL-2003 Workshop on Patent Corpus Processing at ACL, pp. 66–73, Stroudsburg, PA, 2003. ACL.

Szajna, B. Empirical evaluation of the revised technology acceptance model. Management Science, 42(1):85–92, 1996.

Venkatesh, V. and Davis, F. D. A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science, 46(2):186–204, 2000.

Venkatesh, V., Morris, M. G., Davis, F. D., and Davis, G. User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3):425–78, 2003.

Wang, X., Liu, S., Liu, J., Chen, J., Zhu, J., and Guo, B. A full picture of relevant topics. IEEE Transactions on Visualization and Computer Graphics, PP:1, 2016.



Table 1. Explicit (E) and implicit (I) inclusion of the survey questions in the socio-technical theories of IT use and users.

| No. | Question | Answer | WUS | TAM | TAM2 | UTAUT | TTF | TTF/TAM |
|---|---|---|---|---|---|---|---|---|
| 1 | Demographic Questions | | | | | | | |
| 1.1 | What is your age? | < 20; 21–30; …; 51–60; ≥ 61 | – | I | I | E | – | I |
| 1.2 | What is your highest level of completed education? | High school or equivalent; Some college but not degree; Bachelor; Graduate | – | I | I | I | – | I |
| 1.3 | Which of the following best describes your current occupation? | Business and financial operations; Computer and mathematical; Architecture and engineering; Life and physical science; Community and social service; Legal; Education and training; Arts, design, and entertainment; Health care; Office and administrative; Other | I | I | E | E | E | E |
| 1.4 | Have you ever read a patent document (PD)? | Yes; No | E | E | E | E | E | E |
| 1.5 | Have you ever written a PD or its part? | Yes; No | E | E | E | E | E | E |
| 2 | Introductory Questions | | | | | | | |
| 2.1 | Do you use or have you used PDs at your work on a daily basis? If yes, how and when? | Yes and free-form text; No | E | E | E | E | E | E |
| 2.2 | How do you rank the next statements about PDs? | | | | | | | |
| 2.2.1 | It is important to improve their readability. | The 5-point Likert scale of Strongly agree, Somewhat agree, Neutral, Somewhat disagree, and Strongly disagree | E | E | E | E | E | E |
| 2.2.2 | It is difficult to read them. | The 5-point Likert scale | E | E | E | E | E | E |
| 2.2.3 | I do not find information I need easily from them as they are now. | The 5-point Likert scale | E | E | E | E | E | E |
| 2.3 | Please explain what aspects of the way PDs are presented you are happy with. | Free-form text | E | E | E | E | E | E |
| 2.4 | Please explain how this format could be improved. | Free-form text | E | E | E | E | E | E |
| 3 | Readability Questions | | | | | | | |
| 3.1 | Please rank the solutions in your preference order. | Ranking from the most to the least preferred solution | E | E | E | E | E | E |
| 3.2 | In my opinion, there is an even better way of representing the PD. If yes, please describe it. | Yes and free-form text; No | E | E | E | E | E | E |
| 3.3 | Please tell how you agree or disagree with the following statements about PDs. | | | | | | | |
| 3.3.1 | I think my preferred solution supports reading PDs. | The 5-point Likert scale | E | E | E | E | E | E |
| 3.3.2 | I can see myself using my preferred solution when reading PDs. | The 5-point Likert scale | – | E | E | E | E | E |
| 3.3.3 | I plan to use my preferred solution (if it was made available). | The 5-point Likert scale | – | E | E | E | I | E |
| 3.3.4 | I think my preferred solution supports my work. | The 5-point Likert scale | E | E | E | E | E | E |
| 3.3.5 | In my opinion, I do not find information I need easily from PDs even if using the most preferred solution. | The 5-point Likert scale | E | E | E | E | E | E |


(a) Test 1: phrase structure of a long claim

(b) Test 2: words with specific meaning

(c) Test 3: summarizing a long claim

(d) Test 4: interactive claims enumeration

(e) Test 5: patent classification codes as a colored table with code definitions

Figure 1. Test 1, Test 2, Test 3, Test 4, and Test 5 to improve patent readability for laypeople.

The user-study method has been used in the context of readability before (Chi et al., 2007; Mayr et al., 2011). The former paper describes an IV solution that enhances the subject index of a book by taking a user's information needs, entered as keywords, into account. The solution also provides a number of navigational cues for using the index and skim-reading the book content by conceptually highlighting sentences. The results show that the solution enables users to be more efficient and more accurate in finding, comparing, and comprehending material than without it. The latter paper emphasizes that, as opposed to measuring accuracy or speed, IV evaluation should analyze users' problem-solving strategies and how the proposed solutions facilitate or hinder their task completion. This approach aligns with the TTF and TTF/TAM theories.

An ethical assessment of our questionnaire study involving human participants was conducted organizationally, leading to an ethics approval prior to recruiting participants. Participants were recruited through email and social media, including Facebook, Google+, LinkedIn, and Twitter. The survey was implemented in English using an open-source web survey application. It was open for anyone to participate from 19 Nov 2014 to 23 Feb 2015. Completing the survey, and answering any of its questions, was optional.

The front page of the survey included a participant information sheet and requested the participant's informed consent. After this, 5 demographic, 6 introductory, and 5×7 readability questions were asked (Table 1). The readability questions covered claims-section readability, namely understanding a long claim (Test 1 with 8 solutions), a word with a specific meaning (Test 2 with 3 solutions), a long sentence (Test 3 with 2 solutions), and a dependence between individual claims (Test 4 with 2 solutions), as well as document readability, with 2 solutions of Test 5 for understanding patent classification codes as summaries for the document (Fig. 1).
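The 46-question total follows from the per-part counts above and the rows of Table 1. The short tally below only restates that arithmetic as a sanity check; the constant names are illustrative and do not come from the survey implementation.

```python
# Question counts per survey part, as stated in the text and listed in Table 1.
DEMOGRAPHIC = 5           # questions 1.1-1.5
INTRODUCTORY = 6          # 2.1, 2.2.1-2.2.3, 2.3, 2.4
READABILITY_PER_TEST = 7  # 3.1, 3.2, 3.3.1-3.3.5, asked once per test
NUM_TESTS = 5

# Number of alternative solutions shown in each readability test.
SOLUTIONS_PER_TEST = {"Test 1": 8, "Test 2": 3, "Test 3": 2, "Test 4": 2, "Test 5": 2}


def total_questions() -> int:
    """Demographic + introductory + (tests x readability questions per test)."""
    return DEMOGRAPHIC + INTRODUCTORY + NUM_TESTS * READABILITY_PER_TEST


if __name__ == "__main__":
    print(total_questions())                 # 46
    print(sum(SOLUTIONS_PER_TEST.values()))  # 17
```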

4. Analysis of All Participant Responses

The number of participants was 65. Most were relatively mature professionals (79% were at least 31 years old) with a university degree (23% bachelor and 66% graduate), working in computing or mathematics (34%). The next most common fields included arts, design, and entertainment (15%); education and training (15%); business and financial operations (10%); and legal occupations (8%). Only 31% of participants had never read a patent document, and 74% had never written one, in full or in part. Only 12% used, or had used, patent documents at their work on a daily basis, at least occasionally.

A majority of participants strongly agreed (35%) or somewhat agreed (23%) that improving the readability of patent documents was important; nobody strongly disagreed with this claim, and only one participant somewhat disagreed. Documents were seen as extremely (29%) or somewhat (29%) difficult to read, and 39% had some difficulties in finding information.

Participants were satisfied with the document heading structure (e.g., abstract and background), the conventions in writing claims, and the consistency in legal terms. They also appreciated the figures and the documents being written by professionals. On the other hand, this legal and technical jargon was also criticized as unnecessary and difficult to understand. In addition, participants mentioned the inventors' common desire to patent as broad a scope as possible, which hides the precise invention, together with a desire for subheadings, a consistent format, structured content, and standardized wordings to allow machine reading. To support this computerized search of machine-readable documents, they suggested automation that proposes synonymous terms or key phrases for searching. Enriching documents with explanatory expansions of synonymous concepts and hyperlinking or cross-referencing related parts (e.g., claims and figures) were also suggested. Participants suggested a shorter and more focused description of the precise invention, for example, as a supplementary abstract written by technical specialists to detail the key features of the invention. Another supplementary suggestion was a layperson-friendly rephrasing of the patent document itself.

The most preferred Test 1 solutions for the phrase structure were indentations with colorful graphical bars (26%) or numbered bars (26%). All solutions were at least as popular as the unchanged structure. Some participants (31%) suggested IV improvements: cross-referencing with figures; collapsible/expandable tree representations; multiple layouts to choose and/or combine from (e.g., indentation and colorful bars); information from other parts of the document (e.g., inventor, classification codes, keywords, and context); emphasis by font weight, color, or text highlighting; and simplifications. Most participants strongly (28%) or somewhat (42%) agreed on their preferred solution supporting the document readability, and 75% saw themselves using this solution when reading patent documents. Over half (53%) of the participants would use this solution if it was made available, and 36% thought this solution supported their work. However, 36% did not get their information needs satisfied.

Highlighting (46%) and font color (42%) were the most preferred Test 2 solutions for visualizing claim words with a specific meaning, followed by the unchanged text (13%). Few participants (18%) thought there was a better way for this representation, such as simplifying long sentences through automated parsing and text generation methods, or a proofing tool for entering text in a new patent document. Most participants strongly (21%) or somewhat (42%) agreed on their preferred solution supporting the document readability, and nearly two-thirds (61%) saw themselves using this solution when reading patent documents. Of the participants, 42% had some plans to use this solution, given it was made available, and 30% thought this solution supports their work. However, a third (33%) did not get their information needs satisfied.
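The highlighting and font-color variants compared in Test 2 can be produced by marking up known terms in HTML. The sketch below is one possible way to do this, not the implementation used in the study: it assumes a small hand-made glossary (`GLOSSARY` and `annotate` are hypothetical names, and the definitions are illustrative) and attaches each definition as a hover tooltip.

```python
import html
import re

# Hypothetical glossary of claim words with a specific (legal) meaning.
# Terms and definitions are illustrative assumptions, not the study's data.
GLOSSARY = {
    "comprising": "open-ended: the claim covers the listed elements and possibly others",
    "consisting of": "closed: the claim covers only the listed elements",
    "wherein": "introduces a limitation on a previously named element",
}


def annotate(claim: str, style: str = "highlight") -> str:
    """Wrap glossary terms in HTML so a browser shows them highlighted or colored.

    style="highlight" uses <mark>; style="color" uses a colored <span>.
    Each term's definition is attached as a title attribute (shown on hover).
    """
    # Longest terms first, so a multi-word term wins over a shorter overlapping one.
    terms = sorted(GLOSSARY, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )

    def wrap(match: re.Match) -> str:
        word = match.group(0)
        tip = html.escape(GLOSSARY[word.lower()], quote=True)
        if style == "color":
            return f'<span style="color:#c0392b" title="{tip}">{word}</span>'
        return f'<mark title="{tip}">{word}</mark>'

    # Escape the claim text first; the glossary terms contain no HTML-special characters.
    return pattern.sub(wrap, html.escape(claim))


if __name__ == "__main__":
    print(annotate("A device comprising a sensor, wherein the sensor is optical."))
    print(annotate("A mixture consisting of water and salt.", style="color"))
```

Matching is case-insensitive while the original capitalization is preserved inside the markup, so terms at the start of a sentence are still annotated.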

Test 3 summaries were formed by automatically extracting key phrases for each long claim, using them to search for related images, and supplementing the claims section with these images. Almost all participants (92%) preferred this solution over the unchanged text (8%). Few participants (16%) thought there was an even better solution for this representation; all of their suggestions were concerns about the (poor) relevance of the images in the sample IV or the risk of artificially reducing the claim scope. Most participants strongly (32%) or somewhat (35%) agreed on their preferred solution supporting the document readability, and 65% saw themselves using this solution when reading patent documents. Of the participants, 52% had some plans to use this solution, given it was made available, and 39% thought this solution supported their work, although 36% did not get their information needs satisfied.
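The key-phrase extraction method behind Test 3 is not specified here. As one possible stand-in, the sketch below uses a simplified RAKE-style scoring (Rapid Automatic Keyword Extraction): candidate phrases are runs of non-stopwords, each word scores degree/frequency, and a phrase scores the sum of its words. The stopword list is an assumption, and the image-search step is omitted because no search service is named.

```python
import re
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with", "by",
    "at", "said", "wherein", "comprising", "is", "are", "being", "which", "that",
    "each", "claim",
}
WORD = re.compile(r"[a-z][a-z-]*")


def candidate_phrases(text: str) -> list[list[str]]:
    """Split text into runs of non-stopword words; stopwords and punctuation end a run."""
    phrases, current = [], []
    for token in re.findall(r"[a-z][a-z-]*|\S", text.lower()):
        if token in STOPWORDS or not WORD.fullmatch(token):
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(token)
    if current:
        phrases.append(current)
    return phrases


def extract_key_phrases(text: str, k: int = 3) -> list[str]:
    """Return the k best phrases, scoring each word as degree(w) / freq(w)."""
    phrases = candidate_phrases(text)
    freq: Counter = Counter()
    degree: defaultdict = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences within the phrase, incl. itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scored, key=lambda p: (-scored[p], p))[:k]


if __name__ == "__main__":
    claim = ("A locking mechanism comprising a spring-loaded locking pin and a release "
             "lever, wherein the release lever retracts the spring-loaded locking pin.")
    print(extract_key_phrases(claim))
```

RAKE favors longer, repeated multi-word phrases, which suits claim language; the resulting phrases would then serve as queries for an image search.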

In Test 4, 75% of the participants preferred enriching the claims enumeration interactively over the unchanged enumeration. 13% thought there was an even better way for this representation: they were concerned about the studied interaction hiding content and wanted it to be a non-default option. A clear majority of participants strongly (23%) or somewhat (42%) agreed on their preferred solution supporting the document readability, and nearly two-thirds (65%) saw themselves using this solution when reading patent documents. Of the participants, 48% had some plans to use this solution, given it was made available, and 42% thought this solution supports their work. However, 36% did not get their information needs satisfied.
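Test 4 builds on the dependency structure between claims. A minimal sketch of recovering that structure follows, assuming US-style dependent claims that cite an earlier claim as "claim N". The input format and function names are assumptions, multiple-dependent claims are not handled, and the interactive collapse/expand behavior of the Test 4 prototype is not reproduced.

```python
import re
from typing import Optional

# Assumed input: one claim per string, each starting with its number ("2. The device ...").
CLAIM_NUMBER = re.compile(r"^\s*(\d+)\.\s*")
# Assumed US-style reference from a dependent claim to its parent ("... of claim 1, ...").
PARENT_REF = re.compile(r"\bclaim\s+(\d+)\b", re.IGNORECASE)


def parse_dependencies(claims: list[str]) -> dict[int, Optional[int]]:
    """Map each claim number to the claim it depends on (None for independent claims)."""
    parents: dict[int, Optional[int]] = {}
    for text in claims:
        m = CLAIM_NUMBER.match(text)
        if not m:
            raise ValueError(f"claim without a leading number: {text[:30]!r}")
        ref = PARENT_REF.search(text, m.end())
        # Only the first reference is used.
        parents[int(m.group(1))] = int(ref.group(1)) if ref else None
    return parents


def render_tree(parents: dict[int, Optional[int]]) -> list[str]:
    """List claims depth-first, indented by dependency depth."""
    children: dict[int, list[int]] = {n: [] for n in parents}
    roots: list[int] = []
    for number, parent in sorted(parents.items()):
        # Accept only references to earlier, known claims, so the result is a forest.
        if parent is not None and parent < number and parent in children:
            children[parent].append(number)
        else:
            roots.append(number)
    lines: list[str] = []

    def visit(number: int, depth: int) -> None:
        lines.append("    " * depth + f"claim {number}")
        for child in children[number]:
            visit(child, depth + 1)

    for root in roots:
        visit(root, 0)
    return lines


if __name__ == "__main__":
    sample = [
        "1. A device comprising a sensor.",
        "2. The device of claim 1, wherein the sensor is optical.",
        "3. The device of claim 2, wherein the sensor is a camera.",
        "4. The device of claim 1, further comprising a battery.",
        "5. A method comprising sensing.",
    ]
    print("\n".join(render_tree(parse_dependencies(sample))))
```

Indentation gives a static view of the tree. An interface could collapse each subtree on demand, which, given participants' concern about hidden content, would best remain optional.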

A large majority of participants (80%) preferred the Test 5 solution, which presents patent classification codes as a colored table with code definitions, over the unchanged listing of alphanumeric symbols (20%). 23% suggested improvements, including hyperlinks or hovers to the definitions, expanding or filtering the search by clicking the assigned classification codes, or altering the suggested color scheme. A 66% majority strongly or somewhat agreed on their preferred solution supporting the document readability, and two-thirds (67%) saw themselves using this solution when reading patent documents. Of the participants, 50% had some plans to use this solution if it was made available, and 40% thought this solution supported their work. However, 27% did not get their information needs satisfied.
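The Test 5 table breaks each classification symbol down into its hierarchy levels and shows a definition for each. A minimal sketch follows, assuming IPC-style symbols such as `G06F 17/30` (section letter, two-digit class, subclass letter, then main group/subgroup). The tiny definition table, the pastel palette, and the `classification_rows` name are all illustrative; a real system would load definitions from the official IPC or CPC scheme.

```python
import re
from typing import NamedTuple

# IPC symbol layout: section (A-H), 2-digit class, subclass letter, main group/subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,})$")

# Small illustrative lookup table of IPC titles (abbreviated); most codes are missing.
DEFINITIONS = {
    "G": "Physics",
    "H": "Electricity",
    "G06": "Computing; calculating or counting",
    "G06F": "Electric digital data processing",
    "H04": "Electric communication technique",
}

# Hypothetical pastel palette: one background color per hierarchy level.
LEVEL_COLORS = {
    "section": "#fde2e4",
    "class": "#e2ece9",
    "subclass": "#dfe7fd",
    "group": "#fff1e6",
}


class Row(NamedTuple):
    level: str
    code: str
    definition: str
    color: str


def classification_rows(symbol: str) -> list[Row]:
    """Expand one IPC symbol into table rows, from section down to group."""
    m = IPC_PATTERN.match(symbol.strip())
    if not m:
        raise ValueError(f"not an IPC symbol: {symbol!r}")
    section, cls, subclass, group, subgroup = m.groups()
    codes = [
        ("section", section),
        ("class", section + cls),
        ("subclass", section + cls + subclass),
        ("group", f"{section}{cls}{subclass} {group}/{subgroup}"),
    ]
    return [
        Row(level, code,
            DEFINITIONS.get(code, "(definition not in lookup table)"),
            LEVEL_COLORS[level])
        for level, code in codes
    ]


if __name__ == "__main__":
    for row in classification_rows("G06F 17/30"):
        print(f"{row.code:<12} {row.definition:<40} {row.color}")
```

Keeping one muted color per level follows the participants' suggestion of subtler pastel colors, and the `definition` field is where a hover or hyperlink to the full scheme entry could be attached.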

5. Analysis by the User Group of Laypeople

We analyzed the responses of the 20 layperson participants who had taken all 5 readability tests (31% of the 65 participants in total, or 69% of the 29 participants who took all tests); the responses of 2 patent examiners, 1 legal counsel, and 6 patent authors were excluded. Again, most were relatively mature professionals (80% were over 31 years old) with a university degree (25% bachelor and 65% graduate), working in computing or mathematics (50%). The next most common fields included education and training (25%) and arts, design, and entertainment (20%). 65% had read a patent document and 30% had not. Half (50%) of the layperson participants had not used patent documents at their work on a daily basis.
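The subgroup size and its two reported shares follow from the counts in the paragraph above. The snippet below re-derives them; it also shows that a single layperson corresponds to 5% of the group, which is why the subgroup percentages that follow move in steps of 5.

```python
# Counts stated in this section.
TOTAL_PARTICIPANTS = 65
COMPLETED_ALL_TESTS = 29
EXCLUDED = {"patent examiners": 2, "legal counsel": 1, "patent authors": 6}

laypeople = COMPLETED_ALL_TESTS - sum(EXCLUDED.values())  # 29 - 9 = 20


def pct(part: int, whole: int) -> int:
    """Share as a whole-number percentage, rounded like the figures in the text."""
    return round(100 * part / whole)


print(laypeople, pct(laypeople, TOTAL_PARTICIPANTS), pct(laypeople, COMPLETED_ALL_TESTS))
# prints: 20 31 69
```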

A clear majority of this group strongly (50%) or somewhat (25%) agreed that improving the readability of patent documents is important; nobody disagreed with this claim. Documents were seen as extremely (20%) or somewhat (40%) difficult to read; nobody agreed that finding information in patent documents was easy, whilst 45% experienced at least some difficulties. Participants were satisfied with the figures, the predictable document structure (i.e., sections, titles, spaces, typography, and layout), and the conventional, consistent wordings and terminology. Suggested improvements consisted of improving the readability of documents; addressing legal or technical jargon "in a specific domain [that] seems at times gratuitous and unnecessary"; and addressing ambiguity, which was conjectured to be added to "be used to fight legal battles". One participant suggested that "the whole patent culture and legal implications should be revised and brought to a more practical level". Participants also suggested making documents machine-readable and guiding readers in understanding the legally binding content by supplementing documents with non-binding clarifications, such as figures, hyperlinks to definitions, synonym expansions, from-legal-to-lay-term conversions, highlights of the main aspects, use cases, summaries, and further explanations of the invention.

The most preferred Test 1 solutions were indentations with colorful graphical bars (20%) or numbered bars (20%). The unchanged structure was the least (65%) or second-least (5%) preferred solution in most responses. Seven participants thought there was a better way for this representation: 3 suggested collapsible/extendable tree representations and adding complementary semantic elements from the same document or other patents, whilst the other suggestions were multiple layouts for the user to choose from, highlighting, emphasizing, or coloring text, and generating simplifications. A clear majority of participants strongly (40%) or somewhat (40%) agreed on their preferred solution supporting the document readability, and 90% saw themselves using the solution. Over half (55%) had some plans to use this solution, and 35% thought this solution supports their work. However, one participant (5%) replied that the solution did not support his/her work, and even with the solution, 35% did not get their information needs satisfied.

In Test 2, the font color solution was the most preferred alternative (35%), followed by highlighting (25%) and the unchanged text (15%). Five participants thought there was an even better way for this representation and suggested shortening and simplifying sentences, having an online proofing tool that alerts the author if sentences are too long or complicated, and making sure that the solutions are not too disruptive for the reader. A clear majority of participants strongly (20%) or somewhat (45%) agreed on their preferred solution supporting the document readability, and 60% saw themselves using this solution. Of the participants, 35% had some plans to use it, and 25% somewhat agreed that it supports their work. However, 35% did not get their information needs satisfied.

An impressive majority of 80% preferred the Test 3 solution over the unchanged text (5%). Three participants thought there was an even better way for this representation; again, concerns were raised regarding image relevance for IV. Most participants strongly (35%) or somewhat (30%) agreed on their preferred solution supporting the document readability, and 65% saw themselves using this solution when reading patents. Half had some plans to use the solution, and 30% thought it supported their work. However, even with this solution, 30% did not get their information needs satisfied.

As many as 65% preferred the Test 4 solution over the unchanged enumeration (20%). Three thought there was an even better way for this representation and emphasized that the solution must be optional because it risks hiding content. A clear majority strongly (30%) or somewhat (35%) agreed on their preferred solution supporting the document readability, and 65% saw themselves using it. Of the participants, 45% had some plans to use it, and 35% thought it supports their work. However, 30% did not get their information needs satisfied.

A clear 70% majority preferred the Test 5 solution over the unchanged listing of codes (15%). Five thought there was an even better way for this representation and suggested keeping the table structure but using more subtle pastel colors and adding hyperlinks or hovers to definitions. A clear majority strongly (40%) or somewhat (25%) agreed on their preferred solution supporting the document readability; 65% saw themselves using it, with 45% having some plans to use it and nobody reporting no plans to use it. 30% thought it supports their work, and nobody thought it did not support it. However, even with this solution, 25% did not get their information needs satisfied.
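The "clear majority" statements in this section combine the two agreeing points of the 5-point Likert scale (strongly and somewhat agree, sometimes called a top-two-box summary). Tabulating the shares reported above for statement 3.3.1 gives a compact comparison across tests; this is a re-summary of the percentages in the text, not an additional result.

```python
# Layperson shares (%) of "strongly agree" and "somewhat agree" with statement 3.3.1
# ("I think my preferred solution supports reading PDs"), as reported in Section 5.
LAYPERSON_AGREEMENT = {
    "Test 1": (40, 40),
    "Test 2": (20, 45),
    "Test 3": (35, 30),
    "Test 4": (30, 35),
    "Test 5": (40, 25),
}


def top_two_box(shares: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Combine strongly + somewhat agree into one agreement share per test."""
    return {test: strong + somewhat for test, (strong, somewhat) in shares.items()}


if __name__ == "__main__":
    for test, share in top_two_box(LAYPERSON_AGREEMENT).items():
        print(f"{test}: {share}% agreed")
```

Every test reaches at least 65% combined agreement in this group, with Test 1 highest at 80%; with 20 respondents, each 5% step corresponds to one person.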

6. Discussion

Patents are nominally intended to provide a public good by releasing innovative information to the public and thus supporting greater innovation. However, papers rarely cite patents because scientists prefer other sources (Seymore, 2010); 63% of citations in US patents are added by examiners, not inventors (Alcácer et al., 2008); and for laypeople (such as our mainly university-educated layperson participants), finding, extracting, and using information from patent documents is too difficult (Alberts et al., 2011).

For the patent specialists, our survey results were positive. Most agreed that various support mechanisms were helpful, although the optimal visualization was open to argument. Equally, they tended to be concerned about the potential for misrepresenting a patent (e.g., by reducing the scope of claims); the claims text and classification codes were designed for, and are used primarily by, them. For each test, roughly half of the cohort could see themselves using their preferred solution.

Conversely, the laypeople strongly agreed that patents should and could be made more readable and, interestingly, focused many suggestions on non-legally-binding reading aids, that is, annotations for patents that support a wider understanding of the teachings without affecting the legal nature of the patent. Compared with the overall cohort, the layperson cohort was far more supportive of IT that improves patent readability and far more likely to agree that they would use such technologies in future. Comparing the responses of (educated) laypeople with the wider cohort suggests that, as a demographic, laypeople are both under-served by patent readability technologies and highly absorptive of new technologies for patent readability.

These reading aids have been developed and evaluated statistically, at text units ranging from individual English words to entire document collections. For example, Ferraro et al. (2014) have introduced a method for segmenting a patent claim first into the parts of preamble, transition, and body text, followed by further segmentation into clauses. Clause segmentation has also been addressed in a computational evaluation initiative with six participating systems (Sang & Dejean, 2001). Moreover, Sheremetyeva (2003) has proposed a syntactic dependency parser to simplify sentences of patent documents by paraphrasing, and the PATExpert patent processing service by Bouayad-Agha et al. (2009) simplifies patent claims through not only paraphrasing but also text summarization. Finally, Koch et al. (2011) have developed the PatViz system for interactive visual search and analysis of patent information.
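To make the preamble/transition/body split concrete, the following is a deliberately simple rule-based sketch. It is not the method of Ferraro et al. (2014). It only assumes that the first standard transitional phrase (e.g., "comprising") separates the preamble from the body and that body clauses are separated by semicolons, as is customary in US claims. `segment_claim` and `TRANSITIONS` are illustrative names.

```python
import re
from typing import NamedTuple

# Common transitional phrases in US claims. Longer phrases come first so that
# "consisting essentially of" is preferred over "consisting of" at the same position.
TRANSITIONS = (
    "consisting essentially of",
    "consisting of",
    "comprising",
    "including",
    "characterized in that",
)
TRANSITION_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, TRANSITIONS)) + r")\b:?", re.IGNORECASE
)


class SegmentedClaim(NamedTuple):
    preamble: str
    transition: str
    body: list[str]  # one entry per clause


def segment_claim(claim: str) -> SegmentedClaim:
    """Split a claim at its first transitional phrase, then split the body into clauses."""
    m = TRANSITION_RE.search(claim)
    if not m:
        return SegmentedClaim(claim.strip(), "", [])
    preamble = claim[: m.start()].strip(" ,")
    body = claim[m.end():].strip().rstrip(".")
    # Semicolon-separated clauses; a leading "and" before the last clause is dropped.
    clauses = [re.sub(r"^and\s+", "", c.strip()) for c in body.split(";") if c.strip()]
    return SegmentedClaim(preamble, m.group(1).lower(), clauses)


if __name__ == "__main__":
    parts = segment_claim("A bicycle brake, comprising: a lever; a cable connected "
                          "to the lever; and a caliper actuated by the cable.")
    print(parts.preamble, "|", parts.transition)
    for clause in parts.body:
        print("    " + clause)
```

Each clause could then be indented or marked with a bar, as in the Test 1 alternatives. Nested clauses and claims without semicolons would need the finer clause segmentation discussed above.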

Turning to reading and writing aids that are domain-independent or widely applicable, Goffin et al. (2014) have studied enriching individual words with metadata (as in our Test 2 IV for words with specific meaning). The VisRA visual analysis tool by Oelke et al. (2012) can be used to codify a document with respect to 141 readability features in order to support writing simpler paragraphs and sentences. Moreover, analogously to our Test 3 solution, which uses pictures to summarize a long claim, Liu et al. (2015) have studied the use of word clouds to visually summarize important keywords from a large collection of text. For analyzing a document collection, the ThemeDelta visual analytics system by Gad et al. (2015) visualizes temporal trends in topics with respect to, for example, document publication dates. Furthermore, the Adaptive VIBE (Visual Information Browsing Environment) system by Ahn & Brusilovsky (2009) enables visual representation and exploration of search results on a website using the TaskSieve adaptive search engine; its evaluation was a user study with ten participants. Finally, the TopicPanorama solution by Liu et al. (2014) and Wang et al. (2016) extends the readability analysis to visual analytics, giving a full picture of relevant topics that are discussed in multiple document collections.

References

Ahn, J.-W. and Brusilovsky, P. Adaptive visualization of search results: Bringing user models to visual analytics. Information Visualization, 8:167–79, 2009.

Alberts, D., Yang, C. B., Fobare-DePonio, D., Koubek, K., Robins, S., Rodgers, M., Simmons, E., and DeMarco, D. Introduction to patent searching. In Current Challenges in Patent Information Retrieval, vol. 29 of The Information Retrieval Series, pp. 3–43. Springer-Verlag, Berlin Heidelberg, Germany, 2011.

Alcácer, J., Gittelman, M., and Sampat, B. Applicant and Examiner Citations in US Patents: An Overview and Analysis. Harvard Business School, Boston, MA, 2008.

Bouayad-Agha, N., Casamayor, G., Ferraro, G., Mille, S., Vidal, V., and Wanner, L. Improving the comprehension of legal documentation: The case of patent claims. In Proc. of the 12th International Conference on Artificial Intelligence and Law, pp. 78–87, New York, NY, 2009. ACM.

Castaneda, J. A., Munoz-Leiva, F., and Luque, T. Web acceptance model (WAM): Moderating effects of user experience. Journal of Information and Management, 44(4):384–96, 2007.

Cheung, E. Y. and Sachs, J. Test of the technology acceptance model for a web-based information system in a Hong Kong Chinese sample. Psychological Reports, 99(3):691–703, 2006.

Chi, E. H., Hong, L., Heiser, J., Card, S. K., and Gumbrecht, M. ScentIndex and ScentHighlights: Productive reading techniques for conceptually reorganizing subject indexes and highlighting passages. Information Visualization, 6(1):32–47, 2007.

D'Ambra, J., Wilson, C., and Akter, S. Application of the task-technology fit model to structure and evaluate the adoption of ebooks by academics. Journal of the American Society for Information Science and Technology, 64(1):48–64, 2013.

Davis, F., Bagozzi, R., and Warshaw, P. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35(8):982–1003, 1989.

DeLone, W. H. and McLean, E. R. The DeLone and McLean model of information systems success: A ten-year update. Journal of Management Information Systems, 19(4):9–30, 2003.

Dishaw, M. T. and Strong, D. M. Extending the technology acceptance model with task-technology fit constructs. Information & Management, 36(1):9–21, 1999.

Doll, W. J. and Torkzadeh, G. The measurement of end-user computing satisfaction. MIS Quarterly, 12(2):259–74, 1988.

Doll, W. J., Hendrickson, A. M., and Deng, X. Using Davis' perceived usefulness and ease-of-use instruments for decision making: A confirmatory and multigroup invariance analysis. Decision Sciences, 29(4):839–69, 1998.

Ferraro, G., Suominen, H., and Nualart, J. Segmentation of patent claims for improving their readability. In Proc. of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) at EACL, pp. 66–73, Stroudsburg, PA, 2014. ACL.

Fishbein, M. and Ajzen, I. Belief, Attitude, Intention, and Behaviour: An Introduction to Theory and Research. Addison-Wesley, Reading, MA, 1975.

Gad, S., Javed, W., Ghani, S., Elmqvist, N., Ewing, T., Hampton, K. N., and Ramakrishnan, N. ThemeDelta: Dynamic segmentations over temporal topic models. IEEE Transactions on Visualization and Computer Graphics, 21:672–85, 2015.

Goffin, P., Willett, W., Fekete, J.-D., and Isenberg, P. Exploring the placement and design of word-scale visualizations. IEEE Transactions on Visualization and Computer Graphics, 20:2291–300, 2014.

Goodhue, D. Understanding user evaluations of information systems. Management Science, 41(12):1827–44, 1995.

Horton, R. P., Buck, R., Waterson, P. E., and Clegg, C. Explaining intranet use with the technology acceptance model. Journal of Information Technology, 16(4):237–49, 2001.

Igbaria, M. Personal computing acceptance factors in small firms: A structural equation model. MIS Quarterly, 21(3):279–302, 1997.

Koch, S., Bosch, H., Giereth, M., and Ertl, R. Iterative integration of visual insights during scalable patent search and analysis. IEEE Transactions on Visualization and Computer Graphics, 17:557–69, 2011.

Lederer, A. L., Maupin, D. J., Sena, M. P., and Zhuang, Y. The technology acceptance model and the World Wide Web. Decision Support Systems, 29(3):269–82, 2000.

Lin, W.-S. Perceived fit and satisfaction on web learning performance: IS continuance intention and task-technology fit perspectives. International Journal of Human-Computer Studies, 70(7):498–507, 2012.

Liu, S., Wang, X., Chen, J., Zhu, J., and Guo, B. TopicPanorama: A full picture of relevant topics. In Proc. of the 2014 IEEE Conference on Visual Analytics Science and Technology, pp. 183–92, New York, NY, 2014. IEEE.

Liu, X., Shen, H.-W., and Hu, Y. Supporting multifaceted viewing of word clouds with focus+context display. Information Visualization, 14:168–80, 2015.

Mayr, E., Smuc, M., and Risku, H. Many roads lead to Rome: Mapping users' problem-solving strategies. Information Visualization, 10(3):232–47, 2011.

Muylle, S., Moenaert, R., and Despontin, M. The conceptualization and empirical validation of web site user satisfaction. Information & Management, 41(5):543–60, 2004.

Oelke, D., Spretke, D., Stoffel, A., and Keim, D. A. Visual readability analysis: How to make your writings easier to read. IEEE Transactions on Visualization and Computer Graphics, 18:662–74, 2012.

Pressman, D. Patent It Yourself. Nolo, Berkeley, CA, 2006.

Quispel, A., Maes, A., and Schilperoord, J. Graph and chart aesthetics for experts and laymen in design: The role of familiarity and perceived ease of use. Information Visualization, 15(3):238–52, 2016.

Sang, E. T. K. and Dejean, H. Introduction to the CoNLL-2001 shared task: Clause identification. In Proc. of the Fifth Conference on Computational Natural Language Learning, 7, no. 8 of CoNLL, pp. 53–7, Stroudsburg, PA, 2001. ACL.

Seymore, S. B. The teaching function of patents. Notre Dame Law Review, 85:621–70, 2010.

Sheremetyeva, S. Natural language analysis of patent claims. In Proc. of the ACL-2003 Workshop on Patent Corpus Processing at ACL, pp. 66–73, Stroudsburg, PA, 2003. ACL.

Szajna, B. Empirical evaluation of the revised technology acceptance model. Management Science, 42(1):85–92, 1996.

Venkatesh, V. and Davis, F. D. A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science, 46(2):186–204, 2000.

Venkatesh, V., Morris, M. G., Davis, F. D., and Davis, G. User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3):425–78, 2003.

Wang, X., Liu, S., Liu, J., Chen, J., Zhu, J., and Guo, B. A full picture of relevant topics. IEEE Transactions on Visualization and Computer Graphics, PP:1, 2016.


(a) Test 1: phrase structure of a long claim

(b) Test 2: words with specific meaning

(c) Test 3: summarizing a long claim

(d) Test 4: interactive claims enumeration

(e) Test 5: patent classification codes as a colored table with code definitions

Figure 1. Test 1, Test 2, Test 3, Test 4, and Test 5 to improve patent readability for laypeople. Click the hyperlinks for larger figures.

The user-study method has been used in the context of readability before (Chi et al., 2007; Mayr et al., 2011). The former paper describes an IV solution that enhances the subject index of a book by taking into account a user's information needs, entered as keywords. The solution also provides a number of navigational cues for using the index and skim-reading the book content by conceptually highlighting sentences. The results show that the solution enables users to be more efficient and more accurate in finding, comparing, and comprehending material than without it. The latter paper emphasizes that, rather than measuring accuracy or speed, IV evaluation should analyze users' problem-solving strategies and how the proposed solutions facilitate or hinder their task completion. This approach aligns with the TTF and TTF/TAM theories.

An ethical assessment of our questionnaire study involving human participants was conducted by our organizations, leading to ethics approval prior to recruiting participants. Participants were recruited through email and social media, including Facebook, Google+, LinkedIn, and Twitter. The survey was implemented in English using an open-source web survey application. It was open for anyone to participate from 19 Nov 2014 to 23 Feb 2015. Completing the survey, and answering any of its questions, was optional.

The front page of the survey included a participant information sheet and requested the participant's informed consent. After this, 5 demographic, 6 introductory, and 5×7 readability questions were asked (Table 1). The readability questions covered claims-section readability, namely understanding a long claim (Test 1 with 8 solutions), a word with a specific meaning (Test 2 with 3 solutions), a long sentence (Test 3 with 2 solutions), and a dependence between individual claims (Test 4 with 2 solutions), as well as document readability, namely understanding patent classification codes as summaries of the document (Test 5 with 2 solutions) (Fig. 1).
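The question counts above add up to the 46 questions of the survey. The sketch below encodes that structure as plain Python data so the total can be checked; the class and field names are our own, and the seven questions per test are read off the 5×7 count in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadabilityTest:
    number: int
    target: str         # what the test asks participants to understand
    solutions: int      # alternative presentations shown to participants
    questions: int = 7  # each of the five tests carries seven questions (5x7)


# Survey structure as described in the text; the names are hypothetical.
DEMOGRAPHIC_QUESTIONS = 5
INTRODUCTORY_QUESTIONS = 6
TESTS = [
    ReadabilityTest(1, "a long claim", solutions=8),
    ReadabilityTest(2, "a word with a specific meaning", solutions=3),
    ReadabilityTest(3, "a long sentence", solutions=2),
    ReadabilityTest(4, "a dependence between individual claims", solutions=2),
    ReadabilityTest(5, "patent classification codes", solutions=2),
]


def total_questions() -> int:
    """Demographic + introductory + readability questions."""
    return DEMOGRAPHIC_QUESTIONS + INTRODUCTORY_QUESTIONS + sum(t.questions for t in TESTS)


print(total_questions())  # 46
```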

4. Analysis of All Participant Responses

The number of participants was 65. Most were relatively mature professionals (79% were at least 31 years old) with a university degree (23% bachelor and 66% graduate), working in computing or mathematics (34%). The next most common fields included arts, design, and entertainment (15%); education and training (15%); business and financial operations (10%); and legal occupations (8%). Only 31% of participants had never read a patent document, and 74% had never written or partially written one. Only 12% used or had used patent documents at their work on a daily basis, at least occasionally.

A majority of participants strongly agreed (35%) or somewhat agreed (23%) that improving the readability of patent documents was important; nobody strongly disagreed with this claim, and only one participant somewhat disagreed. Documents were seen as extremely (29%) or somewhat (29%) difficult to read, and 39% had some difficulties in finding information.

Participants were satisfied with the document heading structure (e.g., abstract and background), the conventions in writing claims, and the consistency of legal terms. They also appreciated the figures and the fact that documents are written by professionals. On the other hand, the same legal and technical jargon was also criticized as unnecessary and difficult to understand. In addition, participants mentioned the inventors' common desire to patent as broad a scope as possible, which hides the precise invention, together with a desire for subheadings, a consistent format, structured content, and standardized wordings to allow machine reading. To support computerized search of such machine-readable documents, they suggested automation that proposes synonymous terms or key phrases for searching. Enriching documents with explanatory expansions of synonymous concepts and hyperlinking/cross-referencing related parts (e.g., claims and figures) were also suggested. Participants further suggested a shorter and more focused description of the precise invention, for example, a supplementary abstract written by technical specialists to detail the key features of the invention. Another supplementary suggestion was a layperson-friendly rephrasing of the patent document itself.

The most preferred Test 1 solutions for the phrase structure were indentations with colorful graphical bars (26%) or numbered bars (26%). All solutions were at least as popular as the unchanged structure. Some participants (31%) suggested IV improvements, e.g., cross-referencing with figures; collapsible/expandable tree representations; multiple layouts to choose and/or combine from (e.g., indentation and colorful bars); information from other parts of the document (e.g., inventor, classification codes, keywords, and context); emphasis by font weight/color or text highlighting; and simplifications. Most participants strongly (28%) or somewhat (42%) agreed that their preferred solution supported document readability, and 75% saw themselves using this solution when reading patent documents. Over half (53%) of the participants would use this solution if it were made available, and 36% thought this solution supported their work. However, 36% did not get their information needs satisfied.
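The numbered-bar layout of Test 1 can be pictured as rendering a tree of phrases with one indentation level per nesting depth. The sketch below is a plain-text approximation under our own assumptions: the phrase tree is built by hand (in practice it would come from a claim segmenter), and `Phrase` and `render_numbered` are hypothetical names. The colorful-bar variant would replace each number with a color chosen per depth.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phrase:
    text: str
    children: List["Phrase"] = field(default_factory=list)


def render_numbered(phrases: List[Phrase], prefix: str = "", depth: int = 0) -> List[str]:
    """Render a phrase tree as indented lines, each led by a numbered bar."""
    lines = []
    for i, phrase in enumerate(phrases, start=1):
        number = f"{prefix}{i}"
        lines.append("    " * depth + f"{number} | {phrase.text}")
        # Children are numbered hierarchically (1.1, 1.2, ...) one level deeper.
        lines.extend(render_numbered(phrase.children, prefix=number + ".", depth=depth + 1))
    return lines


# Hand-built phrase structure of a hypothetical claim.
claim = [
    Phrase("A cup holder for a vehicle, comprising:", [
        Phrase("a base;"),
        Phrase("a ring attached to the base; and"),
        Phrase("a spring biasing the ring,", [
            Phrase("wherein the spring is made of steel."),
        ]),
    ]),
]
print("\n".join(render_numbered(claim)))
```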

Highlighting (46%) and font color (42%) were the most preferred Test 2 solutions for visualizing claim words with specific meaning, followed by the unchanged text (13%). Few participants (18%) thought there was a better way for this representation, such as simplifying long sentences through automated parsing and text generation methods, or a proofing tool for entering text in a new patent document. Most participants strongly (21%) or somewhat (42%) agreed that their preferred solution supported document readability, and nearly two-thirds (61%) saw themselves using this solution when reading patent documents. Of the participants, 42% had some plans to use this solution if it were made available, and 30% thought this solution supported their work. However, a third (33%) did not get their information needs satisfied.
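Both the highlighting and the font-color variants amount to marking up glossary terms in the claim text. The sketch below assumes an HTML rendering and a small hand-written glossary; `GLOSSARY` and `highlight_terms` are hypothetical names, and the definitions paraphrase common US claim-drafting conventions rather than any glossary used in the study.

```python
import html
import re

# Hypothetical mini-glossary of claim words with a specific legal meaning.
GLOSSARY = {
    "consisting of": "closed: limited to the listed elements",
    "comprising": "open-ended: includes, but is not limited to",
    "said": "refers back to an element introduced earlier",
}

# Longest terms first so multi-word terms are matched before their parts.
_TERM_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(GLOSSARY, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def highlight_terms(text: str) -> str:
    """Return HTML in which glossary terms are wrapped in <mark> with a tooltip."""
    out, last = [], 0
    for m in _TERM_RE.finditer(text):
        out.append(html.escape(text[last:m.start()]))  # plain text before the term
        definition = GLOSSARY[m.group(1).lower()]
        out.append(f'<mark title="{html.escape(definition)}">{html.escape(m.group(1))}</mark>')
        last = m.end()
    out.append(html.escape(text[last:]))  # remaining plain text
    return "".join(out)


print(highlight_terms("A holder comprising a base, said base being round."))
```

Swapping `<mark>` for a colored `<span>` would give the font-color variant; the tooltip is one way to attach the term's meaning without changing the legally binding text.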

Test 3 summaries were formed by automatically extracting key phrases from each long claim, using them to search for related images, and supplementing the claims section with these images. Almost all participants (92%) preferred this solution over the unchanged text (8%). Few participants (16%) thought there was an even better solution for this representation; all of their suggestions concerned the (poor) relevance of the images in the sample IV or the risk of artificially reducing the claim scope. Most participants strongly (32%) or somewhat (35%) agreed that their preferred solution supported document readability, and 65% saw themselves using this solution when reading patent documents. Furthermore, 52% had some plans to use this solution if it were made available, and 39% thought this solution supported their work, although 36% did not get their information needs satisfied.
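The paper does not specify how the key phrases were extracted. As one simple possibility, assumed here for illustration only, the sketch below ranks content words by frequency after removing a small stop list, so single frequent words stand in for key phrases; `STOPWORDS`, `key_phrases`, and `image_query` are hypothetical names, and the sketch stops at building the query string that an image search would receive.

```python
import re
from collections import Counter
from typing import List

# Small assumed stop list: function words plus frequent claim boilerplate.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with",
    "said", "wherein", "comprising", "claim", "is", "being", "by", "at",
}


def key_phrases(claim: str, k: int = 3) -> List[str]:
    """Return the k most frequent content words (ties keep first-seen order)."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", claim)]
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(k)]


def image_query(claim: str, k: int = 3) -> str:
    # This string would be sent to an image search service (not shown here).
    return " ".join(key_phrases(claim, k))


claim = ("A cup holder comprising a ring, a spring biasing said ring, "
         "and a base holding said spring and said ring.")
print(image_query(claim))  # ring spring cup
```

Frequency ranking is the crudest choice; the participants' concern about image relevance suggests that a real system would need better phrase extraction and result filtering.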

In Test 4, 75% of the participants preferred enriching the claims enumeration interactively over the unchanged enumeration. Of the participants, 13% thought there was an even better way for this representation: they were concerned about the studied interaction hiding content and wanted it to be a non-default option. A clear majority of participants strongly (23%) or somewhat (42%) agreed that their preferred solution supported document readability, and nearly two-thirds (65%) saw themselves using this solution when reading patent documents. Furthermore, 48% had some plans to use this solution if it were made available, and 42% thought this solution supported their work. However, 36% did not get their information needs satisfied.
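An interactive enumeration first needs the claim dependency structure. The sketch below assumes that a dependent claim refers to its parent with the word "claim" followed by a number, and keeps only the first such reference; it builds the dependency map and prints it as an indented tree. The function names and example claims are hypothetical, and an interactive page would render each level as a collapsible element instead of plain text.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Matches references such as "according to claim 1" or "as claimed in claim 12".
_REF_RE = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)


def parse_dependencies(claims: List[str]) -> Dict[int, Optional[int]]:
    """Map each claim number (1-based) to the claim it depends on, or None."""
    parents: Dict[int, Optional[int]] = {}
    for number, text in enumerate(claims, start=1):
        match = _REF_RE.search(text)
        parents[number] = int(match.group(1)) if match else None
    return parents


def render_tree(parents: Dict[int, Optional[int]]) -> List[str]:
    """Indent each dependent claim under its parent claim."""
    children = defaultdict(list)
    for number, parent in parents.items():
        children[parent].append(number)

    lines: List[str] = []

    def visit(number: int, depth: int) -> None:
        lines.append("  " * depth + f"Claim {number}")
        for child in children[number]:
            visit(child, depth + 1)

    for root in children[None]:  # independent claims are the roots
        visit(root, 0)
    return lines


claims = [
    "A cup holder comprising a base and a ring.",
    "The cup holder according to claim 1, wherein the ring is metal.",
    "The cup holder according to claim 2, wherein the metal is steel.",
    "The cup holder according to claim 1, further comprising a spring.",
    "A vehicle comprising a dashboard.",
]
print("\n".join(render_tree(parse_dependencies(claims))))
```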

A large majority of participants (80%) preferred the Test 5 solution of presenting patent classification codes as a colored table with code definitions over the unchanged listing of alphanumeric symbols (20%). Of the participants, 23% suggested improvements, including hyperlinks or hovers to the definitions, expanding or filtering the search by clicking the assigned classification codes, or altering the suggested color scheme. A 66% majority strongly or somewhat agreed that their preferred solution supported document readability, and two-thirds (67%) saw themselves using this solution when reading patent documents. Half (50%) had some plans to use this solution if it were made available, and 40% thought this solution supported their work. However, 27% did not get their information needs satisfied.
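A colored code table can be generated by parsing each classification symbol and looking up a definition and color for it. The sketch below assumes IPC-style symbols and shows only section-level definitions (the eight IPC section titles); the pastel palette, `IpcSymbol`, `parse_ipc`, and `code_table_rows` are hypothetical, and a real implementation would look up the full definition of each group in the classification scheme.

```python
import html
import re
from typing import List, NamedTuple

# Titles of the eight IPC sections.
IPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}

# Hypothetical pastel palette, one color per section.
SECTION_COLORS = dict(zip("ABCDEFGH", [
    "#fde2e4", "#e2ece9", "#dfe7fd", "#fff1c1",
    "#e8e8e4", "#f0efeb", "#d8e2dc", "#ffe5d9",
]))

# Section letter, two-digit class, subclass letter, then "main group/subgroup".
_IPC_RE = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class IpcSymbol(NamedTuple):
    section: str
    ipc_class: str
    subclass: str
    group: str


def parse_ipc(code: str) -> IpcSymbol:
    m = _IPC_RE.match(code.strip())
    if m is None:
        raise ValueError(f"not an IPC symbol: {code!r}")
    section, cls, sub, main, subgroup = m.groups()
    return IpcSymbol(section, section + cls, section + cls + sub, f"{main}/{subgroup}")


def code_table_rows(codes: List[str]) -> List[str]:
    """One HTML table row per code, colored by section, with the section title."""
    rows = []
    for code in codes:
        sym = parse_ipc(code)
        rows.append(
            f'<tr style="background:{SECTION_COLORS[sym.section]}">'
            f"<td>{html.escape(code)}</td>"
            f"<td>{html.escape(IPC_SECTIONS[sym.section])}</td></tr>"
        )
    return rows


print("\n".join(code_table_rows(["B60N 3/10", "G06F 17/30"])))
```

Layperson participants later asked for subtler pastel colors and hyperlinks or hovers to definitions; both fit this design by changing the palette or wrapping each code cell in a link.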

5. Analysis by the User Group of Laypeople

We analyzed the responses of the 20 layperson-participants who had taken all 5 readability tests (31% of the 65 participants in total, or 69% of the 29 participants taking all tests); the responses of 2 patent examiners, 1 legal counsel, and 6 patent authors were excluded. Again, most were relatively mature professionals (80% were over 31 years old) with a university degree (25% bachelor and 65% graduate), working in computing or mathematics (50%). The next most common fields included education and training (25%) and arts, design, and entertainment (20%). In this group, 65% had and 30% had not read a patent document. Half (50%) of the layperson participants had not used patent documents at their work on a daily basis.
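The subgroup size and both reported shares follow from the counts above: removing the 9 specialist respondents from the 29 who took all five tests leaves the 20 laypeople, and dividing by the two reference totals gives the rounded percentages.

```latex
29 - (2 + 1 + 6) = 20, \qquad
\frac{20}{65} \approx 0.308 \approx 31\%, \qquad
\frac{20}{29} \approx 0.690 \approx 69\%
```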

A clear majority of this group strongly (50%) or somewhat (25%) agreed that improving the readability of patent documents is important; nobody disagreed with this claim. Documents were seen as extremely (20%) or somewhat (40%) difficult to read; nobody agreed that finding information in patent documents was easy, whilst 45% experienced at least some difficulties. Participants were satisfied with the figures, the predictable document structure (i.e., sections, titles, spaces, typography, and layout), and the conventional, consistent wordings and terminology. Suggested improvements consisted of improving the readability of documents; addressing legal or technical jargon “in a specific domain [that] seems at times gratuitous and unnecessary”; and addressing ambiguity, which was conjectured to be added to “be used to fight legal battles”. One participant suggested that “the whole patent culture and legal implications should be revised and brought to a more practical level”. Participants suggested making documents machine-readable and guiding readers in understanding the legally binding content by supplementing documents with non-binding clarifications, such as figures, hyperlinks to definitions, synonym expansions, from-legal-to-lay-term conversions, highlights of the main aspects, use cases, summaries, and further explanations of the invention.

The most preferred Test 1 solutions were indentations with colorful graphical bars (20%) or numbered bars (20%). The unchanged structure was the least (65%) or second-least (5%) preferred solution in most responses. Seven participants thought there was a better way for this representation: 3 suggested collapsible/extendable tree representations and adding complementary semantic elements from the same document or other patents, whilst the other suggestions were multiple layouts for the user to choose from, highlighting/emphasizing/coloring text, and generating simplifications. A clear majority of participants strongly (40%) or somewhat (40%) agreed that their preferred solution supported document readability, and 90% saw themselves using the solution. Over half (55%) had some plans to use this solution, and 35% thought this solution supported their work. However, one participant (5%) replied that the solution did not support his/her work, and even with the solution, 35% did not get their information needs satisfied.

In Test 2, the font color solution was the most preferred alternative (35%), followed by highlighting (25%) and the unchanged text (15%). Five participants thought there was an even better way for this representation and suggested shortening and simplifying sentences, providing an online proofing tool that alerts the writer if sentences are too long or complicated, and ensuring that the solutions are not too disruptive for the reader. A clear majority of participants strongly (20%) or somewhat (45%) agreed that their preferred solution supported document readability, and 60% saw themselves using this solution; 35% had some plans to use it, and 25% somewhat agreed that it supported their work. However, 35% did not get their information needs satisfied.

An impressive majority of 80% preferred the Test 3 solution over the unchanged text (5%). Three participants thought there was an even better way for this representation; again, concerns were raised regarding the relevance of the images in the IV. Most participants strongly (35%) or somewhat (30%) agreed that their preferred solution supported document readability, and 65% saw themselves using this solution when reading patents. Half had some plans to use the solution, and 30% thought it supported their work. However, even with this solution, 30% did not get their information needs satisfied.

As many as 65% preferred the Test 4 solution over the unchanged enumeration (20%). Three participants thought there was an even better way for this representation and emphasized that the solution must be optional because it risks hiding content. A clear majority strongly (30%) or somewhat (35%) agreed that their preferred solution supported document readability, and 65% saw themselves using it; 45% had some plans to use it, and 35% thought it supported their work. However, 30% did not get their information needs satisfied.

A clear 70% majority preferred the Test 5 solution over the unchanged enumeration (15%). Five participants thought there was an even better way for this representation and suggested keeping the table structure but using more subtle pastel colors and adding hyperlinks or hovers to definitions. A clear majority strongly (40%) or somewhat (25%) agreed that their preferred solution supported document readability; 65% saw themselves using it, with 45% having some plans to use it and nobody reporting no plans to use it. Of the participants, 30% thought it supported their work, and nobody thought it did not support it. However, even with this solution, 25% did not get their information needs satisfied.

6. Discussion

Patents are nominally intended to provide a public good by releasing innovative information to the public and thus supporting greater innovation. However, papers rarely cite patents because scientists prefer other sources (Seymore, 2010); 63% of citations in US patents are added by examiners, not inventors (Alcácer et al., 2008); and for laypeople (such as our mainly university-educated layperson-participants), finding, extracting, and using information from patent documents is too difficult (Alberts et al., 2011).

For patent specialists, our survey results were positive. Most agreed that various support mechanisms were helpful, although the optimal visualization was open to argument. Equally, they tended to be concerned about the potential for misrepresenting a patent (e.g., by reducing the scope of claims), and the claims text and classification codes were designed for, and used primarily by, them. For each test, roughly half of the cohort could see themselves using their preferred solution.

Conversely, the laypeople strongly agreed that patents should and could be made more readable and, interestingly, focused many suggestions on non-legally binding reading aids, that is, annotations that support a wider understanding of a patent's teachings without affecting its legal nature. Compared with the overall cohort, the layperson cohort was far more supportive of IT that improved patent readability and far more likely to agree that they would use such technologies in the future. Comparing the responses of (educated) laypeople with the wider cohort suggests that, as a demographic, laypeople are both under-served by patent readability technologies and highly absorptive of new technologies for patent readability.

These reading aids have been developed and evaluated sta-tistically from the text unit of individual words in En-glish to entire document collections For example Fer-raro et al (2014) have introduced a method for segment-ing a patent claim first to the parts of preamble transitionand body text followed by further segmentation of clausesClause segmentation has also been addressed in a com-

putational evaluation initiative leading to six participat-ing systems (Sang amp Dejean 2001) Moreover Shereme-tyeva (2003) have proposed a syntactic dependency parserto simplify sentences of patent documents by paraphrasingand the PATEexpert patent processing service by Bouayad-Agha et al (2009) exists to simplify patent claims by notonly paraphrasing but also text summarization FinallyKoch et al (2011) have developed the PatViz system forinteractive visual search and analysis of patent information

If extending to reading and writing aids that are domain in-dependent or widely applicable Goffin et al (2014) havestudied enriching individual words with metadata (suchas our Test 2 IV for words with specific meaning) TheVisRA visual analysis tool by Oelke et al (2012) can beused to codify a document with respect to 141 readabil-ity features in order to support writing simpler paragraphsand sentences Moreover analogously to our Test 3 so-lution which uses pictures to summarize a long claim Liuet al (2015) have studied the use of word clouds to visuallysummarize important keywords from a large collection oftext If analyzing a document collection the ThemeDeltavisual analytics system by Gad et al (2015) is availablefor visualizing its temporal trends in topics with respectto for example document publication dates Furthermorethe Adaptive VIsualization By Example (VIBE) system byAhn amp Brusilovsky (2009) enables visual representationand exploration of the search results as a website usingthe TaskSieve adaptive search engine Its evaluation as auser study comprises ten participants Finally the Topic-Panorama solution by Liu et al (2014) and Wang et al(2016) extends the readability analysis to visual analyticsfor getting a full picture of relevant topics that are discussedin multiple document collections

ReferencesAhn J-W and Brusilovsky P Adaptive visualization of search

results Bringing user models to visual analytics InformationVisualization 8167ndash79 2009

Alberts D Yang C B Fobare-DePonio D Koubek K RobinsS Rodgers M Simmons E and D DeMarco Introductionto patent searching In Current Challenges in Patent Informa-tion Retrieval vol 29 of the series The Information RetrievalSeries pp 3ndash43 Berlin Heidelberg Germany 2011 Springer-Verlag

Alcaer J Gittelman M and Sampat B Applicant and ExaminerCitations in US Patents An Overview and Analysis HarvardBusiness School Boston MA 2008

Bouayad-Agha N Casamayor G Ferraro G Mille S Vidal Vand Wanner L Improving the comprehension of legal docu-mentation The case of patent claims In Proc of the 12th In-ternational Conference on Artificial Intelligence and Law pp78ndash87 New York NY 2009 ACM

Castaneda J A Munoz-Leiva F and Luque T Web acceptance

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

model (WAM) Moderating effects of user experience Journalof Information and Management 44(4)384ndash96 2007

Cheung E Y and Sachs J Test of the technology acceptancemodel for a web-based information system in a Hong KongChinese sample Psychological Reports 99(3)691ndash703 2006

Chi E H Hong L Heiser J Card S K and GumbrechtM ScentIndex and ScentHighlights Productive readingtechniques for conceptually reorganizing subject indexes andhighlighting passages Information Visualization 6(1)32ndash472007

DrsquoAmbra J Wilson C and Akter S Application of the task-technology fit model to structure and evaluate the adoption ofebooks by academics Journal of the American Society for In-formation Science and Technology 64(1)48ndash64 2013

Davis F Bagozzi R and Warshaw P User acceptance of com-puter technology A comparison of two theoretical modelsManagement Science 35(8)982ndash1003 1989

DeLone W H and McLean E R The DeLone and McLean modelof information systems success A ten-year update Journal ofManagement Information Systems 19(4)9ndash30 2003

Dishaw M T and Strong D M Extending the technology accep-tance model with task-technology fit constructs 36(1)9ndash211999

Doll W J and Torkzadeh G The measurement of end-user com-puting satisfaction MIS Quarterly 12(2)259ndash74 1988

Doll W J Hendrickson A M and Deng X Using Davisrsquoperceived usefulness and ease-of-use instruments for decisionmaking A confirmatory and multigroup invariance analysisDecision Sciences 29(4)839ndash69 1998

Ferraro F Suominen H and Nualart J Segmentation of patentclaims for improving their readability In Proc of the 3rd Work-shop on Predicting and Improving Text Readability for TargetReader Populations (PITR) at EACL pp 66ndash73 StroudsburgPA 2014 ACL

Fishbein M and Ajzen I Belief Attitude Intention and Be-haviour An Introduction to Theory and Research Addison-Wesley Reading MA 1975

Gad S Javed W Ghani S Elmqvist N Ewing T HamptonK N and Ramakrishnan N ThemeDelta Dynamic segmenta-tions over temporal topic models IEEE Transactions on Visu-alization and Computer Graphics 21672ndash85 2015

Goffin P Willett W Fekete J-D and Isenberg P Exploringthe placement and design of word-scale visualizations IEEETransactions on Visualization and Computer Graphics 202291ndash300 2014

Goodhue D Understanding user evaluations of information sys-tems Management Science 41(12)1827ndash44 1995

Horton R P Buck R Waterson P E and Clegg C ExplainingIntranet use with the technology acceptance model Journal ofInformation Technology 16(4)237ndash49 2001

Igbaria M Personal computing acceptance factors in small firmsA structural equation model MIS Quarterly 21(3)279ndash3021997

Koch S Bosch H Giereth M and Ertl R Iterative integrationof visual insights during scalable patent search and analysisIEEE Transactions on Visualization and Computer Graphics17557ndash69 2011

Lederer A L Maupin D J Sena M P and Zhuang Y The tech-nology acceptance model and the world wide web DecisionSupport Systems 29(3)269ndash82 2000

Lin W-S Perceived fit and satisfaction on web learning perfor-mance IS continuance intention and task-technology fit per-spectives International Journal of Human-Computer Studies70(7)498ndash507 2012

Liu S Wang X Chen J Zhu J and Guo B TopicPanoramaA full picture of relevant topics In Proc of the 2014 IEEEConference on Visual Analytics Science and Technology pp183ndash92 New York NY 2014 IEEE

Liu X Shen H-W and Hu Y Supporting multifaceted viewingof word clouds with focus+context display Information Visu-alization 14168ndash80 2015

Mayr E Smuc M and Risku H Many roads lead to romeMapping users problem-solving strategies Information Visu-alization 10(3)232ndash47 2011

Muylle S Moenaert R and M M Despontin The conceptual-ization and empirical validation of web site user satisfactionInformation amp Management 41(5)543ndash60 2004

Oelke D Spretke D Stoffel A and Keim D A Visual read-ability analysis How to make your writings easier to readIEEE Transactions on Visualization and Computer Graphics18662ndash74 2012

Pressman D Patent It Yourself Nolo Berkeley CA 2006

Quispel A Maes A and Schilperoord J Graph and chart aes-thetics for experts and laymen in design The role of familiar-ity and perceived ease of use Information Visualization 15(3)238ndash52 2016

Sang E T K and Dejean H Introduction to the CoNLL-2001shared task Clause identification In Proc of the Fith Confer-ence on Computational Natural Language Learning 7 no 8 ofCoNLL pp 53ndash7 Stroudsburg PA 2001 ACL

Seymore S B The teaching function of patents Notre Dame LawReview 85621ndash70 2010

Sheremetyeva S Natural language analysis of patent claims InProc of the ACL-2003 workshop on Patent Corpus Processingat ACL pp 66ndash73 Stroudsburg PA 2003 ACL

Szajna B Empirical evaluation of the revised technology accep-tance model Management Science 42(1)85ndash92 1996

Venkatesh V and Davis F D A theoretical extension of thetechnology acceptance model Four longitudinal field studiesManagement Science 46(2)186ndash204 2000

Venkatesh V Morris M G Davis F D and Davis G User accep-tance of information technology Toward a unified view MISQuarterly 27(3)425ndash78 2003

Wang X Liu S Liu J Chen J Zhu J and Guo B A fullpicture of relevant topics IEEE Transactions on Visualizationand Computer Graphics PP1 2016

Page 5: User Study for Measuring Linguistic Complexity and Its ...users.cecs.anu.edu.au/~u5422389/SuominenetalIML-ICML2017.pdf · User Study for Measuring Linguistic Complexity and Its Reduction

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

a university degree (23 bachelor and 66 graduate)working in computing or mathematics (34) The nextmost common fields included arts design and entertain-ment (15) education and training (15) business andfinancial operations (10) and legal occupations (8)Only 31 of participants had never read a patent document74 had never written or partially-written one Only 12used or had used patent documents at their work on a dailybasis at least occasionally

A majority of participants strongly agreed (35) or some-what agreed (23) that improving the readability of patentdocuments was important nobody strongly disagreed withthis claim and only one participant somewhat disagreedDocuments were seen as extremely (29) or somewhat(29) difficult to read 39 had some difficulties in find-ing information

Participants were satisfied with the document heading structure (e.g., abstract and background), the conventions in writing claims, and the consistency of legal terms. They also appreciated the figures and the documents being written by professionals. On the other hand, the legal and technical jargon was also criticized as unnecessary and difficult to understand. In addition, the inventors' common desire to patent as broad a scope as possible, hiding the precise invention, was mentioned, together with a desire for subheadings, a consistent format, structured content, and standardized wordings to allow machine reading. To support computerized search of such machine-readable documents, participants suggested automation that proposes synonymous terms or key phrases for searching. Enriching documents with explanatory expansions of synonymous concepts and hyperlinking/cross-referencing related parts (e.g., claims and figures) were also suggested. Participants suggested a shorter and more focused description of the precise invention, for example, a supplementary abstract written by technical specialists to detail the key features of the invention. Another supplementary suggestion was a layperson-friendly rephrasing of the patent document itself.

The most preferred Test 1 solutions for the phrase structure were indentations with colorful graphical bars (26%) or numbered bars (26%). All solutions were at least as popular as the unchanged structure. Some participants (31%) suggested IV improvements: cross-referencing with figures; collapsible/expandable tree representations; multiple layouts to choose and/or combine from (e.g., indentation and colorful bars); information from other parts of the document (e.g., inventor, classification codes, keywords, and context); emphasis by font weight/color or text highlighting; and simplifications. Most participants strongly (28%) or somewhat (42%) agreed that their preferred solution supported the readability of the document, and 75% saw themselves using this solution when reading patent documents. Over half (53%) of the participants would use this solution if it was made available, and 36% thought this solution supported their work. However, 36% did not get their information needs satisfied.

Highlighting (46%) and font color (42%) were the most preferred Test 2 solutions for visualizing claim words with a specific meaning, followed by the unchanged text (13%). Few participants (18%) thought there was a better way for this representation, such as simplifying long sentences through automated parsing and text generation methods, or a proofing tool for entering text in a new patent document. Most participants strongly (21%) or somewhat (42%) agreed that their preferred solution supported the readability of the document, and nearly two-thirds (61%) saw themselves using this solution when reading patent documents. 42% of participants had some plans to use this solution if it was made available, and 30% thought this solution supported their work. However, a third (33%) did not get their information needs satisfied.

Test 3 summaries were formed by automatically extracting key phrases for each long claim, using them to search for related images, and supplementing the claims section with these images. Almost all participants (92%) preferred this solution over the unchanged text (8%). Few participants (16%) thought there was an even better solution for this representation; all of their suggestions concerned the (poor) relevance of the images in the sample IV or the risk of artificially reducing the claim scope. Most participants strongly (32%) or somewhat (35%) agreed that their preferred solution supported the readability of the document, and 65% saw themselves using this solution when reading patent documents. 52% had some plans to use this solution if it was made available, and 39% thought this solution supported their work, although 36% did not get their information needs satisfied.
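The paper does not state which key-phrase extractor the Test 3 summaries used. Purely as an illustration of the extraction step, the sketch below uses a simplified RAKE-style scorer: candidate phrases are split at stopwords and punctuation, each word is scored by its co-occurrence degree divided by its frequency, and a phrase scores the sum of its word scores. The stopword list, function names, and sample claim are hypothetical, and the image search step is not shown.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately small stopword list, extended with common
# claim boilerplate ("said", "wherein", "comprising"); a real system needs more.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "with",
    "by", "at", "from", "that", "which", "is", "are", "be", "being", "each",
    "said", "wherein", "comprising", "having", "least", "one",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for chunk in re.split(r"[.,;:()]", text.lower()):
        current = []
        for word in re.findall(r"[a-z][a-z-]*", chunk):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keyphrases(text, top_k=3):
    """Rank candidate phrases by summed word scores (degree / frequency)."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences within the phrase, incl. itself
    scores = {p: sum(degree[w] / freq[w] for w in p) for p in set(phrases)}
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(p) for p in ranked[:top_k]]


claim = (
    "A bicycle lock comprising a steel cable, a locking cylinder attached to "
    "the steel cable, and a mounting bracket for attaching the locking "
    "cylinder to a bicycle frame."
)
print(rake_keyphrases(claim))  # ['locking cylinder attached', 'locking cylinder', 'bicycle frame']
```

RAKE favors longer phrases and keeps verbs such as "attached"; a part-of-speech filter would be one way to restrict candidates to noun phrases before they are used as image search queries.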

In Test 4, 75% of the participants preferred enriching the claims enumeration interactively over the unchanged enumeration. 13% thought there was an even better way for this representation; they were concerned that the studied interaction could hide content and wanted it to be a non-default option. A clear majority of participants strongly (23%) or somewhat (42%) agreed that their preferred solution supported the readability of the document, and nearly two-thirds (65%) saw themselves using this solution when reading patent documents. 48% had some plans to use this solution if it was made available, and 42% thought this solution supported their work. However, 36% did not get their information needs satisfied.
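An interactive claims enumeration needs to know which claims depend on which, so that dependent claims can be nested, folded, or unfolded under their parent. The paper does not describe how Test 4 derived this structure; the sketch below is a hypothetical regex-based approach that links each claim to the first earlier claim it mentions ("of claim 1"). Function names and sample claims are assumptions, and reference forms such as "any one of claims 1 to 3" are deliberately not handled.

```python
import re

# Matches references such as "claim 1" or "Claim 12" (simplifying assumption:
# ranges like "any one of claims 1 to 3" are not handled).
CLAIM_REF = re.compile(r"\bclaim\s+(\d+)", re.IGNORECASE)


def parent_of(number, text, known):
    """Return the first earlier, existing claim that `text` refers to, or None."""
    for match in CLAIM_REF.finditer(text):
        ref = int(match.group(1))
        if ref < number and ref in known:
            return ref
    return None


def dependency_outline(claims):
    """Indented outline of claims, each dependent claim nested under its parent.

    `claims` maps claim number -> claim text.
    """
    children = {n: [] for n in claims}
    roots = []
    for number in sorted(claims):
        parent = parent_of(number, claims[number], children)
        if parent is None:
            roots.append(number)  # independent claim
        else:
            children[parent].append(number)

    lines = []

    def walk(number, depth):
        lines.append("  " * depth + f"Claim {number}")
        for child in children[number]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


claims = {
    1: "A bicycle lock comprising a steel cable and a locking cylinder.",
    2: "The lock of claim 1, wherein the cable is coated.",
    3: "The lock of claim 2, wherein the coating is rubber.",
    4: "The lock of claim 1, further comprising a mounting bracket.",
    5: "A method of locking a bicycle using the lock of any preceding claim.",
}
print("\n".join(dependency_outline(claims)))
```

Only references to earlier claims are accepted, which keeps the structure a forest and rules out cycles. Given the participants' concern about hidden content, a viewer built on such an outline would plausibly start fully expanded and offer folding only as an option.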

A large majority of participants (80%) preferred the Test 5 solution of presenting patent classification codes as a colored table with code definitions over the unchanged listing of alphanumeric symbols (20%). 23% suggested improvements, including hyperlinks or hovers to the definitions, expanding or filtering the search by clicking the assigned classification codes, or altering the suggested color scheme. A 66% majority strongly or somewhat agreed that their preferred solution supported the readability of the document, and two-thirds (67%) saw themselves using this solution when reading patent documents. 50% had some plans to use this solution if it was made available, and 40% thought this solution supported their work. However, 27% did not get their information needs satisfied.
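A table of classification codes with definitions starts from splitting each symbol into its hierarchy levels. The sketch below assumes IPC-style symbols such as `B62H 5/00` (section letter, two-digit class, subclass letter, then main group/subgroup) and attaches only the eight IPC section titles; subclass and group definitions would have to be loaded from WIPO's published scheme. The function and field names are hypothetical and do not describe the study's implementation.

```python
import re
from typing import NamedTuple

# IPC section titles (sections A-H). Subclass and group titles would come from
# the official scheme, which this sketch does not bundle.
IPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}

# Section, class, subclass, optional space, main group "/" subgroup.
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class IpcCode(NamedTuple):
    section: str
    ipc_class: str
    subclass: str
    main_group: str
    subgroup: str


def parse_ipc(code):
    """Split an IPC symbol into its hierarchy levels, or raise ValueError."""
    match = IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not an IPC symbol: {code!r}")
    return IpcCode(*match.groups())


def table_rows(codes):
    """One row per code: code, section, class, subclass, group, section title."""
    rows = []
    for code in codes:
        p = parse_ipc(code)
        rows.append((
            code,
            p.section,
            p.section + p.ipc_class,
            p.section + p.ipc_class + p.subclass,
            f"{p.main_group}/{p.subgroup}",
            IPC_SECTIONS[p.section],
        ))
    return rows


for row in table_rows(["B62H 5/00", "G06F 17/30"]):
    print(row)
```

A renderer could then, for example, color each row by its section so that codes from the same area of technology are visually grouped; the color scheme itself is one of the aspects participants wanted to adjust.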

5. Analysis by the User Group of Laypeople

We analyzed the responses of the 20 layperson-participants who had taken all five readability tests (31% of the 65 participants in total, or 69% of the 29 participants taking all tests); the responses of 2 patent examiners, 1 legal counsel, and 6 patent authors were excluded. Again, most were relatively mature professionals (80% were over 31 years old) with a university degree (25% bachelor and 65% graduate), working in computing or mathematics (50%). The next most common fields included education and training (25%) and arts, design, and entertainment (20%). 65% had read and 30% had not read a patent document. Half (50%) of the layperson-participants had not used patent documents at their work on a daily basis.
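The cohort arithmetic above can be checked directly. The constants below are taken from the text; the last figure shows why the layperson percentages in this section move in steps of 5 (one participant out of 20 is 5 percentage points).

```python
# Cohort figures as reported in this section.
TOTAL_PARTICIPANTS = 65
TOOK_ALL_TESTS = 29
LAYPEOPLE = 20
EXCLUDED = {"patent examiners": 2, "legal counsel": 1, "patent authors": 6}

# The 9 excluded specialists plus the 20 laypeople are everyone who took all tests.
assert LAYPEOPLE + sum(EXCLUDED.values()) == TOOK_ALL_TESTS

share_of_all = LAYPEOPLE / TOTAL_PARTICIPANTS      # about 0.308
share_of_completers = LAYPEOPLE / TOOK_ALL_TESTS   # about 0.690
one_person = 1 / LAYPEOPLE                         # share of a single layperson

print(f"{share_of_all:.0%} of all participants")              # 31% of all participants
print(f"{share_of_completers:.0%} of those taking all tests")  # 69% of those taking all tests
print(f"one layperson = {one_person:.0%}")                    # one layperson = 5%
```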

A clear majority of this group strongly (50%) or somewhat (25%) agreed that improving the readability of patent documents was important; nobody disagreed with this claim. Documents were seen as extremely (20%) or somewhat (40%) difficult to read; nobody agreed that finding information in patent documents was easy, whilst 45% experienced at least some difficulties. Participants were satisfied with the figures, the predictable document structure (i.e., sections, titles, spaces, typography, and layout), and the conventional, consistent wordings and terminology. Suggested improvements consisted of improving the readability of documents by addressing legal or technical jargon "in a specific domain [that] seems at times gratuitous and unnecessary" and by addressing ambiguity, which was conjectured to be added to "be used to fight legal battles"; one participant suggested that "the whole patent culture and legal implications should be revised and brought to a more practical level". They suggested making documents machine-readable and guiding readers in understanding the legally binding content by supplementing documents with non-binding clarifications, such as figures, hyperlinks to definitions, synonym expansions, from-legal-to-lay-term conversions, highlights of the main aspects, use cases, summaries, and further explanations of the invention.

The most preferred Test 1 solutions were indentations with colorful graphical bars (20%) or numbered bars (20%). The unchanged structure was the least (65%) or second-least (5%) preferred solution in most responses. Seven participants thought there was a better way for this representation: three suggested collapsible/extendable tree representations and adding complementary semantic elements from the same document or other patents, whilst the other suggestions were having multiple layouts for the user to choose from, highlighting/emphasizing/coloring text, and generating simplifications. A clear majority of participants strongly (40%) or somewhat (40%) agreed that their preferred solution supported the readability of the document, and 90% saw themselves using the solution. Over half (55%) of them had some plans to use this solution, and 35% thought this solution supported their work. However, one participant (5%) replied that the solution did not support his/her work, and even with the solution, 35% did not get their information needs satisfied.

In Test 2, the font color solution was the most preferred alternative (35%), followed by highlighting (25%) and the unchanged text (15%). Five participants thought there was an even better way for this representation and suggested shortening and simplifying sentences, having an online proofing tool to alert if sentences are too long or complicated, and ensuring that the solutions are not too disruptive for the reader. A clear majority of participants strongly (20%) or somewhat (45%) agreed that their preferred solution supported the readability of the document, and 60% saw themselves using this solution. 35% had some plans to use it, and 25% somewhat agreed that it supported their work. However, 35% did not get their information needs satisfied.

An impressive majority of 80% preferred the Test 3 solution over the unchanged text (5%). Three participants thought there was an even better way for this representation; again, concerns were raised regarding image relevance for IV. Most participants strongly (35%) or somewhat (30%) agreed that their preferred solution supported the readability of the document, and 65% saw themselves using this solution when reading patents. Half had some plans to use the solution, and 30% thought it supported their work. However, even with this solution, 30% did not get their information needs satisfied.

As many as 65% preferred the Test 4 solution over the unchanged enumeration (20%). Three thought there was an even better way for this representation and emphasized that the solution must be optional because it risks hiding content. A clear majority strongly (30%) or somewhat (35%) agreed that their preferred solution supported the readability of the document, and 65% saw themselves using it. 45% had some plans to use it, and 35% thought it supported their work. However, 30% did not get their information needs satisfied.

A clear 70% majority preferred the Test 5 solution over the unchanged enumeration (15%); 5 thought there was an even better way for this representation and suggested keeping the table structure but using more subtle pastel colors and adding hyperlinks or hovers to definitions. A clear majority strongly (40%) or somewhat (25%) agreed that their preferred solution supported the readability of the document. 65% saw themselves using it, with 45% having some plans to use it and nobody reporting no plans to use it; 30% thought it supported their work, and nobody thought it did not. However, even with this solution, 25% did not get their information needs satisfied.

6. Discussion

Patents are nominally intended to provide a public good by releasing innovative information to the public, thus supporting greater innovation. However, scientific papers rarely cite patents because scientists prefer other sources (Seymore, 2010); 63% of citations in US patents are added by examiners, not inventors (Alcácer et al., 2008); and for laypeople (such as our mainly university-educated layperson-participants), finding, extracting, and using information from patent documents is too difficult (Alberts et al., 2011).

For patent specialists, our survey results were positive. Most agreed that various support mechanisms were helpful, although the optimal visualization was open to argument. Equally, they tended to be concerned about the potential for misrepresenting a patent (e.g., by reducing the scope of claims); the claims text and classification codes were designed for, and are used primarily by, them. For each test, roughly half of the cohort could see themselves using their preferred solution.

Conversely, the laypeople strongly agreed that patents should and could be made more readable and, interestingly, focused many of their suggestions on non-legally-binding reading aids, that is, annotations that support a wider understanding of a patent's teachings without affecting its legal nature. The layperson cohort was far more supportive of IT that improves patent readability, and far more likely to agree that they would use such technologies in future, than the overall cohort. Comparing the responses of (educated) laypeople with the wider cohort suggests that, as a demographic, laypeople are both under-served by patent readability technologies and highly absorptive of new technologies for patent readability.

Such reading aids have been developed and evaluated statistically at text units ranging from individual English words to entire document collections. For example, Ferraro et al. (2014) have introduced a method for segmenting a patent claim first into its preamble, transition, and body text, followed by further segmentation into clauses. Clause segmentation has also been addressed in a computational evaluation initiative with six participating systems (Sang & Dejean, 2001). Moreover, Sheremetyeva (2003) has proposed a syntactic dependency parser to simplify sentences of patent documents by paraphrasing, and the PATExpert patent processing service by Bouayad-Agha et al. (2009) simplifies patent claims through not only paraphrasing but also text summarization. Finally, Koch et al. (2011) have developed the PatViz system for interactive visual search and analysis of patent information.
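To make the preamble/transition/body split concrete, the sketch below applies a naive rule: the first transitional phrase in the claim (e.g., "comprising" or "consisting of") ends the preamble, and semicolons separate body clauses. This is an assumption-laden simplification for illustration only, not the method of Ferraro et al. (2014) nor of the CoNLL-2001 clause identification systems; the phrase list and function name are hypothetical.

```python
import re

# Hypothetical list of transitional phrases; "consisting essentially of" is
# listed before "consisting of" so the longer phrase wins at the same position.
TRANSITION = re.compile(
    r"\b(consisting essentially of|consisting of|comprising|including|"
    r"containing|characterized by)\b",
    re.IGNORECASE,
)


def segment_claim(claim):
    """Split a one-sentence claim into preamble, transition and body clauses."""
    match = TRANSITION.search(claim)
    if match is None:
        # No transition found: treat the whole claim as preamble.
        return {"preamble": claim.strip(), "transition": "", "body": []}
    preamble = claim[: match.start()].strip().rstrip(",").strip()
    body = claim[match.end():].strip().lstrip(":").strip()
    clauses = []
    for part in body.split(";"):
        # Drop a leading "and" before the last element and the final period.
        part = re.sub(r"^and\s+", "", part.strip()).rstrip(".").strip()
        if part:
            clauses.append(part)
    return {"preamble": preamble, "transition": match.group(1), "body": clauses}


claim = (
    "A bicycle lock, comprising: a steel cable; a locking cylinder attached "
    "to the cable; and a mounting bracket."
)
print(segment_claim(claim))
```

Because the first match wins, a transitional word that happens to occur inside the preamble would end it too early, and claims whose elements are separated only by commas yield a single body clause; these are exactly the cases where learned segmenters are expected to outperform such rules.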

Among reading and writing aids that are domain-independent or more widely applicable, Goffin et al. (2014) have studied enriching individual words with metadata (similar to our Test 2 IV for words with a specific meaning). The VisRA visual analysis tool by Oelke et al. (2012) can be used to codify a document with respect to 141 readability features in order to support writing simpler paragraphs and sentences. Moreover, analogously to our Test 3 solution, which uses pictures to summarize a long claim, Liu et al. (2015) have studied the use of word clouds to visually summarize important keywords from a large collection of text. For analyzing a document collection, the ThemeDelta visual analytics system by Gad et al. (2015) is available for visualizing temporal trends in its topics with respect to, for example, document publication dates. Furthermore, the Adaptive VIsualization By Example (VIBE) system by Ahn & Brusilovsky (2009) enables visual representation and exploration of search results as a website using the TaskSieve adaptive search engine; its evaluation was a user study with ten participants. Finally, the TopicPanorama solution by Liu et al. (2014) and Wang et al. (2016) extends the readability analysis to visual analytics for getting a full picture of relevant topics that are discussed in multiple document collections.

References

Ahn, J.-W. and Brusilovsky, P. Adaptive visualization of search results: Bringing user models to visual analytics. Information Visualization, 8:167–79, 2009.

Alberts, D., Yang, C. B., Fobare-DePonio, D., Koubek, K., Robins, S., Rodgers, M., Simmons, E., and DeMarco, D. Introduction to patent searching. In Current Challenges in Patent Information Retrieval, vol. 29 of The Information Retrieval Series, pp. 3–43, Berlin Heidelberg, Germany, 2011. Springer-Verlag.

Alcácer, J., Gittelman, M., and Sampat, B. Applicant and Examiner Citations in US Patents: An Overview and Analysis. Harvard Business School, Boston, MA, 2008.

Bouayad-Agha, N., Casamayor, G., Ferraro, G., Mille, S., Vidal, V., and Wanner, L. Improving the comprehension of legal documentation: The case of patent claims. In Proc. of the 12th International Conference on Artificial Intelligence and Law, pp. 78–87, New York, NY, 2009. ACM.

Castaneda, J. A., Munoz-Leiva, F., and Luque, T. Web acceptance model (WAM): Moderating effects of user experience. Journal of Information and Management, 44(4):384–96, 2007.

Cheung, E. Y. and Sachs, J. Test of the technology acceptance model for a web-based information system in a Hong Kong Chinese sample. Psychological Reports, 99(3):691–703, 2006.

Chi, E. H., Hong, L., Heiser, J., Card, S. K., and Gumbrecht, M. ScentIndex and ScentHighlights: Productive reading techniques for conceptually reorganizing subject indexes and highlighting passages. Information Visualization, 6(1):32–47, 2007.

D'Ambra, J., Wilson, C., and Akter, S. Application of the task-technology fit model to structure and evaluate the adoption of ebooks by academics. Journal of the American Society for Information Science and Technology, 64(1):48–64, 2013.

Davis, F., Bagozzi, R., and Warshaw, P. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35(8):982–1003, 1989.

DeLone, W. H. and McLean, E. R. The DeLone and McLean model of information systems success: A ten-year update. Journal of Management Information Systems, 19(4):9–30, 2003.

Dishaw, M. T. and Strong, D. M. Extending the technology acceptance model with task-technology fit constructs. 36(1):9–21, 1999.

Doll, W. J. and Torkzadeh, G. The measurement of end-user computing satisfaction. MIS Quarterly, 12(2):259–74, 1988.

Doll, W. J., Hendrickson, A. M., and Deng, X. Using Davis' perceived usefulness and ease-of-use instruments for decision making: A confirmatory and multigroup invariance analysis. Decision Sciences, 29(4):839–69, 1998.

Ferraro, G., Suominen, H., and Nualart, J. Segmentation of patent claims for improving their readability. In Proc. of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) at EACL, pp. 66–73, Stroudsburg, PA, 2014. ACL.

Fishbein, M. and Ajzen, I. Belief, Attitude, Intention, and Behaviour: An Introduction to Theory and Research. Addison-Wesley, Reading, MA, 1975.

Gad, S., Javed, W., Ghani, S., Elmqvist, N., Ewing, T., Hampton, K. N., and Ramakrishnan, N. ThemeDelta: Dynamic segmentations over temporal topic models. IEEE Transactions on Visualization and Computer Graphics, 21:672–85, 2015.

Goffin, P., Willett, W., Fekete, J.-D., and Isenberg, P. Exploring the placement and design of word-scale visualizations. IEEE Transactions on Visualization and Computer Graphics, 20:2291–300, 2014.

Goodhue, D. Understanding user evaluations of information systems. Management Science, 41(12):1827–44, 1995.

Horton, R. P., Buck, R., Waterson, P. E., and Clegg, C. Explaining Intranet use with the technology acceptance model. Journal of Information Technology, 16(4):237–49, 2001.

Igbaria, M. Personal computing acceptance factors in small firms: A structural equation model. MIS Quarterly, 21(3):279–302, 1997.

Koch, S., Bosch, H., Giereth, M., and Ertl, R. Iterative integration of visual insights during scalable patent search and analysis. IEEE Transactions on Visualization and Computer Graphics, 17:557–69, 2011.

Lederer, A. L., Maupin, D. J., Sena, M. P., and Zhuang, Y. The technology acceptance model and the world wide web. Decision Support Systems, 29(3):269–82, 2000.

Lin, W.-S. Perceived fit and satisfaction on web learning performance: IS continuance intention and task-technology fit perspectives. International Journal of Human-Computer Studies, 70(7):498–507, 2012.

Liu, S., Wang, X., Chen, J., Zhu, J., and Guo, B. TopicPanorama: A full picture of relevant topics. In Proc. of the 2014 IEEE Conference on Visual Analytics Science and Technology, pp. 183–92, New York, NY, 2014. IEEE.

Liu, X., Shen, H.-W., and Hu, Y. Supporting multifaceted viewing of word clouds with focus+context display. Information Visualization, 14:168–80, 2015.

Mayr, E., Smuc, M., and Risku, H. Many roads lead to Rome: Mapping users' problem-solving strategies. Information Visualization, 10(3):232–47, 2011.

Muylle, S., Moenaert, R., and Despontin, M. The conceptualization and empirical validation of web site user satisfaction. Information & Management, 41(5):543–60, 2004.

Oelke, D., Spretke, D., Stoffel, A., and Keim, D. A. Visual readability analysis: How to make your writings easier to read. IEEE Transactions on Visualization and Computer Graphics, 18:662–74, 2012.

Pressman, D. Patent It Yourself. Nolo, Berkeley, CA, 2006.

Quispel, A., Maes, A., and Schilperoord, J. Graph and chart aesthetics for experts and laymen in design: The role of familiarity and perceived ease of use. Information Visualization, 15(3):238–52, 2016.

Sang, E. T. K. and Dejean, H. Introduction to the CoNLL-2001 shared task: Clause identification. In Proc. of the Fifth Conference on Computational Natural Language Learning, vol. 7, no. 8 of CoNLL, pp. 53–7, Stroudsburg, PA, 2001. ACL.

Seymore, S. B. The teaching function of patents. Notre Dame Law Review, 85:621–70, 2010.

Sheremetyeva, S. Natural language analysis of patent claims. In Proc. of the ACL-2003 Workshop on Patent Corpus Processing at ACL, pp. 66–73, Stroudsburg, PA, 2003. ACL.

Szajna, B. Empirical evaluation of the revised technology acceptance model. Management Science, 42(1):85–92, 1996.

Venkatesh, V. and Davis, F. D. A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science, 46(2):186–204, 2000.

Venkatesh, V., Morris, M. G., Davis, F. D., and Davis, G. User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3):425–78, 2003.

Wang, X., Liu, S., Liu, J., Chen, J., Zhu, J., and Guo, B. A full picture of relevant topics. IEEE Transactions on Visualization and Computer Graphics, PP:1, 2016.

Page 6: User Study for Measuring Linguistic Complexity and Its ...users.cecs.anu.edu.au/~u5422389/SuominenetalIML-ICML2017.pdf · User Study for Measuring Linguistic Complexity and Its Reduction

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

ments including hyperlinks or hovers to the definitionsexpanding or filtering the search by clicking the assignedclassification codes or altering the suggested color schemeA 66 majority strongly or somewhat agreed on their pre-ferred solution supporting the document readability andtwo thirds (67) saw themselves using this solution whenreading patent documents 50 had some plans to use thissolution if it was made available and 40 thought this so-lution supported their work However 27 did not get theirinformation needs satisfied

5 Analysis by the User Group of LaypeopleWe analyzed the responses of the 20 layperson-participantswho had taken all 5 readability tests (31 out of 65 partici-pants in total or 69 out of 29 participants taking all tests)responses of 2 patent examiners 1 legal counsel and 6patent authors were excluded Again most were relativelymature professionals (80 were over 31 years old) with auniversity degree (25 bachelor and 65 graduate) work-ing in computing or mathematics (50) The next mostcommon fields included education and training (25) andarts design and entertainment (20) 65 had and 30had not read a patent document Half (50) of laypersonparticipants had not used patent documents at their work ona daily basis

A clear majority of this group strongly (50) or some-what (25) agreed on improving the readability of patentdocuments being important nobody disagreed with thisclaim Documents were seen as extremely (20) or some-what (40) difficult to read nobody agreed that find-ing information from patent documents was easy whilst45 experienced least some difficulties Participants weresatisfied with the figures predictable document structure(iesections titles spaces typography and layout) andconventional consistent wordings and terminology Sug-gested improvements consisted of improving readability ofdocuments addressing legal or technical jargon ldquoin a spe-cific domain [that] seems at times gratuitous and unnec-essaryrdquo addressing ambiguity which was conjectured to beadded to ldquobe used to fight legal battlesrdquo and one participantsuggested that ldquothe whole patent culture and legal impli-cations should be revised and brought to a more practicallevelrdquo They suggested making documents machine read-able and guiding the readers in understanding the legallybinding content by supplementing documents with the non-binding clarifications such as figures hyperlinks to defini-tions synonym expansions from-legal-to-lay-term conver-sions highlights of the main aspects use cases summariesand further explanations of the invention

The most preferred Test 1 solutions were indentations withcolorful graphical bars (20) or numbered bars (20)The unchanged structure was the least (65) or second-

least (5) preferred solution in most responses 7 par-ticipants thought there was a better way for this repre-sentation 3 suggested collapsibleextendable tree repre-sentations and adding complementary semantic elementsfrom the same document or other patents whilst the othersuggestions were having multiple layouts for the user tochoose from highlightingemphasizingcoloring text andgenerating simplifications A clear majority of participantsstrongly (40) or somewhat (40) agreed on their pre-ferred solution supporting the document readability 90 5saw themselves using the solution Over a half (55) ofthem had some plans to use this solution and 35 thoughtthis solution supports their work However one participant(5) replied that the solution did not support hisher workand even with the solution 35 did not get their informa-tion needs satisfied

In Test 2 the font color solution was the most preferredalternative (35) followed by highlighting (25) and theunchanged text (15) 5 participants thought there was aneven better way for this representation and suggested short-ening and simplifying sentences having an online proof-ing tool to alert if sentences are too long or complicatedand assuring that the solutions are not too disruptive forthe reader A clear majority of participants strongly (20)or somewhat (45) agreed on their preferred solution sup-porting the document readability and 60 saw themselvesusing this solution 35 had some plans to use it and 25somewhat agreed with it supporting their work However35 did not get their information needs satisfied

An impressive majority of 80 preferred the Test 3 solu-tion over the unchanged text (5) 3 participants thoughtthere was an even better way for this representationagain concerns were raised regarding image relevance forIV Most participants strongly (35) or somewhat (30)agreed on their preferred solution supporting the documentreadability and 65 saw themselves using this solutionwhen reading patents Half had some plans to use the so-lution and 30 thought it supported their work Howevereven with this solution 30 did not get their informationneeds satisfied

As many as 65 preferred the Test 4 solution over the un-changed enumeration (20) 3 thought there was an evenbetter way for this representation and emphasized that thesolution must be optional because it has the danger of hid-ing content A clear majority strongly (30) or some-what (35) agreed on their preferred solution supportingthe document readability and 65 saw themselves using it45 had some plans to use it and 35 thought it supportstheir work However 30 did not get their informationneeds satisfied

A clear 70 majority preferred the Test 5 solution overthe unchanged enumeration (15) 5 thought there was

User Study for Measuring Linguistic Complexity and Its Reduction by Technology on a Patent Website

an even better way for this representation and suggestedkeeping the table structure but using more subtle pastelcolors and adding hyperlinks or hovers to definitions Aclear majority strongly (40) or somewhat (25) agreedon their preferred solution supporting the document read-ability 65 saw themselves using it with 45 havingsome plans to use it and nobody having no use plans 30thought it supports their work and nobody thought it didnot support it However even with this solution 25 didnot get their information needs satisfied

6 DiscussionPatents are nominally intended to provide a public goodby releasing innovative information to the public and thussupporting greater innovation However papers rarely citepatents due to scientist preferring other sources (Seymore2010) 63 of citations in US patents are due to examinersnot inventor (Alcaer et al 2008) and for laypeople (suchas our mainly university-educated layperson-participants)finding extracting and using information from patent doc-uments is too difficult (Alberts et al 2011)

For the patent specialist our survey results were positiveMost agreed that various support mechanisms were help-ful although the optimal visualization was open to argu-ment Equally they tended to be concerned at the potentialfor misrepresenting a patent (eg by reducing the scope ofclaims) and the claims text and classification codes weredesigned for- and used primarily by them For each testroughly half of the cohort could see themselves using theirpreferred solution

Conversely the laypeople strongly agreed that patentsshould and could be made more readable and interestinglyfocused many suggestions on non-legally binding reading-aids that is providing annotations for patents that sup-ported wider understanding of the teachings without im-pacting the legal nature of the patent The layperson cohortwere far more supportive of IT that improved patent read-ability and were far more likely to agree that they woulduse such technologies in future compared with the over-all cohort Comparing the responses of (educated) laypeo-ple with the wider cohort suggests that as a demographiclaypeople are both under-served by patent readability tech-nologies and that they are highly absorptive of new tech-nologies for patent readability

These reading aids have been developed and evaluated sta-tistically from the text unit of individual words in En-glish to entire document collections For example Fer-raro et al (2014) have introduced a method for segment-ing a patent claim first to the parts of preamble transitionand body text followed by further segmentation of clausesClause segmentation has also been addressed in a com-

putational evaluation initiative leading to six participat-ing systems (Sang amp Dejean 2001) Moreover Shereme-tyeva (2003) have proposed a syntactic dependency parserto simplify sentences of patent documents by paraphrasingand the PATEexpert patent processing service by Bouayad-Agha et al (2009) exists to simplify patent claims by notonly paraphrasing but also text summarization FinallyKoch et al (2011) have developed the PatViz system forinteractive visual search and analysis of patent information

If extending to reading and writing aids that are domain in-dependent or widely applicable Goffin et al (2014) havestudied enriching individual words with metadata (suchas our Test 2 IV for words with specific meaning) TheVisRA visual analysis tool by Oelke et al (2012) can beused to codify a document with respect to 141 readabil-ity features in order to support writing simpler paragraphsand sentences Moreover analogously to our Test 3 so-lution which uses pictures to summarize a long claim Liuet al (2015) have studied the use of word clouds to visuallysummarize important keywords from a large collection oftext If analyzing a document collection the ThemeDeltavisual analytics system by Gad et al (2015) is availablefor visualizing its temporal trends in topics with respectto for example document publication dates Furthermorethe Adaptive VIsualization By Example (VIBE) system byAhn amp Brusilovsky (2009) enables visual representationand exploration of the search results as a website usingthe TaskSieve adaptive search engine Its evaluation as auser study comprises ten participants Finally the Topic-Panorama solution by Liu et al (2014) and Wang et al(2016) extends the readability analysis to visual analyticsfor getting a full picture of relevant topics that are discussedin multiple document collections

References

Ahn, J.-W. and Brusilovsky, P. Adaptive visualization of search results: Bringing user models to visual analytics. Information Visualization, 8:167–79, 2009.

Alberts, D., Yang, C. B., Fobare-DePonio, D., Koubek, K., Robins, S., Rodgers, M., Simmons, E., and DeMarco, D. Introduction to patent searching. In Current Challenges in Patent Information Retrieval, vol. 29 of The Information Retrieval Series, pp. 3–43. Springer-Verlag, Berlin Heidelberg, Germany, 2011.

Alcácer, J., Gittelman, M., and Sampat, B. Applicant and Examiner Citations in US Patents: An Overview and Analysis. Harvard Business School, Boston, MA, 2008.

Bouayad-Agha, N., Casamayor, G., Ferraro, G., Mille, S., Vidal, V., and Wanner, L. Improving the comprehension of legal documentation: The case of patent claims. In Proc. of the 12th International Conference on Artificial Intelligence and Law, pp. 78–87, New York, NY, 2009. ACM.

Castaneda, J. A., Munoz-Leiva, F., and Luque, T. Web acceptance model (WAM): Moderating effects of user experience. Journal of Information and Management, 44(4):384–96, 2007.

Cheung, E. Y. and Sachs, J. Test of the technology acceptance model for a web-based information system in a Hong Kong Chinese sample. Psychological Reports, 99(3):691–703, 2006.

Chi, E. H., Hong, L., Heiser, J., Card, S. K., and Gumbrecht, M. ScentIndex and ScentHighlights: Productive reading techniques for conceptually reorganizing subject indexes and highlighting passages. Information Visualization, 6(1):32–47, 2007.

D'Ambra, J., Wilson, C., and Akter, S. Application of the task-technology fit model to structure and evaluate the adoption of ebooks by academics. Journal of the American Society for Information Science and Technology, 64(1):48–64, 2013.

Davis, F., Bagozzi, R., and Warshaw, P. User acceptance of computer technology: A comparison of two theoretical models. Management Science, 35(8):982–1003, 1989.

DeLone, W. H. and McLean, E. R. The DeLone and McLean model of information systems success: A ten-year update. Journal of Management Information Systems, 19(4):9–30, 2003.

Dishaw, M. T. and Strong, D. M. Extending the technology acceptance model with task-technology fit constructs. 36(1):9–21, 1999.

Doll, W. J. and Torkzadeh, G. The measurement of end-user computing satisfaction. MIS Quarterly, 12(2):259–74, 1988.

Doll, W. J., Hendrickson, A. M., and Deng, X. Using Davis's perceived usefulness and ease-of-use instruments for decision making: A confirmatory and multigroup invariance analysis. Decision Sciences, 29(4):839–69, 1998.

Ferraro, G., Suominen, H., and Nualart, J. Segmentation of patent claims for improving their readability. In Proc. of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR) at EACL, pp. 66–73, Stroudsburg, PA, 2014. ACL.

Fishbein, M. and Ajzen, I. Belief, Attitude, Intention and Behaviour: An Introduction to Theory and Research. Addison-Wesley, Reading, MA, 1975.

Gad, S., Javed, W., Ghani, S., Elmqvist, N., Ewing, T., Hampton, K. N., and Ramakrishnan, N. ThemeDelta: Dynamic segmentations over temporal topic models. IEEE Transactions on Visualization and Computer Graphics, 21:672–85, 2015.

Goffin, P., Willett, W., Fekete, J.-D., and Isenberg, P. Exploring the placement and design of word-scale visualizations. IEEE Transactions on Visualization and Computer Graphics, 20:2291–300, 2014.

Goodhue, D. Understanding user evaluations of information systems. Management Science, 41(12):1827–44, 1995.

Horton, R. P., Buck, R., Waterson, P. E., and Clegg, C. Explaining Intranet use with the technology acceptance model. Journal of Information Technology, 16(4):237–49, 2001.

Igbaria, M. Personal computing acceptance factors in small firms: A structural equation model. MIS Quarterly, 21(3):279–302, 1997.

Koch, S., Bosch, H., Giereth, M., and Ertl, T. Iterative integration of visual insights during scalable patent search and analysis. IEEE Transactions on Visualization and Computer Graphics, 17:557–69, 2011.

Lederer, A. L., Maupin, D. J., Sena, M. P., and Zhuang, Y. The technology acceptance model and the world wide web. Decision Support Systems, 29(3):269–82, 2000.

Lin, W.-S. Perceived fit and satisfaction on web learning performance: IS continuance intention and task-technology fit perspectives. International Journal of Human-Computer Studies, 70(7):498–507, 2012.

Liu, S., Wang, X., Chen, J., Zhu, J., and Guo, B. TopicPanorama: A full picture of relevant topics. In Proc. of the 2014 IEEE Conference on Visual Analytics Science and Technology, pp. 183–92, New York, NY, 2014. IEEE.

Liu, X., Shen, H.-W., and Hu, Y. Supporting multifaceted viewing of word clouds with focus+context display. Information Visualization, 14:168–80, 2015.

Mayr, E., Smuc, M., and Risku, H. Many roads lead to Rome: Mapping users' problem-solving strategies. Information Visualization, 10(3):232–47, 2011.

Muylle, S., Moenaert, R., and Despontin, M. The conceptualization and empirical validation of web site user satisfaction. Information & Management, 41(5):543–60, 2004.

Oelke, D., Spretke, D., Stoffel, A., and Keim, D. A. Visual readability analysis: How to make your writings easier to read. IEEE Transactions on Visualization and Computer Graphics, 18:662–74, 2012.

Pressman, D. Patent It Yourself. Nolo, Berkeley, CA, 2006.

Quispel, A., Maes, A., and Schilperoord, J. Graph and chart aesthetics for experts and laymen in design: The role of familiarity and perceived ease of use. Information Visualization, 15(3):238–52, 2016.

Sang, E. T. K. and Dejean, H. Introduction to the CoNLL-2001 shared task: Clause identification. In Proc. of the Fifth Conference on Computational Natural Language Learning, vol. 7, no. 8 of CoNLL, pp. 53–7, Stroudsburg, PA, 2001. ACL.

Seymore, S. B. The teaching function of patents. Notre Dame Law Review, 85:621–70, 2010.

Sheremetyeva, S. Natural language analysis of patent claims. In Proc. of the ACL-2003 Workshop on Patent Corpus Processing at ACL, pp. 66–73, Stroudsburg, PA, 2003. ACL.

Szajna, B. Empirical evaluation of the revised technology acceptance model. Management Science, 42(1):85–92, 1996.

Venkatesh, V. and Davis, F. D. A theoretical extension of the technology acceptance model: Four longitudinal field studies. Management Science, 46(2):186–204, 2000.

Venkatesh, V., Morris, M. G., Davis, F. D., and Davis, G. User acceptance of information technology: Toward a unified view. MIS Quarterly, 27(3):425–78, 2003.

Wang, X., Liu, S., Liu, J., Chen, J., Zhu, J., and Guo, B. A full picture of relevant topics. IEEE Transactions on Visualization and Computer Graphics, PP:1, 2016.


