TLScompare.org - Crowdsourcing Rules for HTTPS Everywhere


TLScompare
Crowdsourcing Rules for HTTPS Everywhere

Wilfried Mayer, Martin Schmiedecker

Introduction

• HTTPS is often not used, although deployed
  ◦ Services do not enforce it
  ◦ Service operators could enforce it with
    − TCP port redirect
    − HSTS

• HTTPS Everywhere tackles this problem
  ◦ browser extension (client side)
  ◦ prefers HTTPS over HTTP
  ◦ depends on manually crafted rules

HTTPS Everywhere Rules

• Currently: manually crafted
  ◦ <target host="example.org" />
  ◦ <rule from="^http:" to="https:" />
  ◦ <rule from="^http://([^/:@\.]+\.)?torproject.org/" to="https://$1torproject.org/" />

• Rule validation is not trivial
  ◦ Equality
  ◦ Same-same, but different
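To make the rule format concrete, here is a minimal sketch of how one from/to pair rewrites a URL. It uses Python's `re` module purely for illustration and is not the extension's actual implementation; the function name `apply_rule` is hypothetical. The rules use JavaScript-style backreferences (`$1`), so the sketch translates them to Python's `\g<1>` form first.

```python
import re


def apply_rule(url, rule_from, rule_to):
    """Rewrite url with one from/to pair; return None if the rule does not match."""
    # Translate JavaScript-style backreferences ($1) into Python's \g<1> syntax.
    py_to = re.sub(r"\$(\d+)", r"\\g<\1>", rule_to)
    new_url, count = re.subn(rule_from, py_to, url, count=1)
    return new_url if count else None


# The two example rules from the slide.
trivial = ("^http:", "https:")
tor = (r"^http://([^/:@\.]+\.)?torproject.org/", "https://$1torproject.org/")

print(apply_rule("http://example.org/", *trivial))
print(apply_rule("http://www.torproject.org/download", *tor))
```

A URL that is already HTTPS does not match `^http:`, so `apply_rule` returns `None` and the request would be left untouched.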


Similarity

Dynamic content, different ads, ...


Current Rules

Files with 1 rule               13,871
Files with 2–10 rules            4,496
Files with more than 10 rules       44
Total rules                     26,522
Trivial rules                    7,528
Without reference               19,267


Automated Rule Generation

“Is it possible to automatically generate these rules?” → Trivial

“Are two pages similar?” → Complex

• Matching algorithms
• Crowdsourcing
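The "complex" question can be framed as a similarity score between the page served over HTTP and the one served over HTTPS. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as an illustrative stand-in; it is not one of the matching algorithms from the talk, and the function names and the 0.9 threshold are assumptions chosen only for the example.

```python
from difflib import SequenceMatcher


def similarity(http_body: str, https_body: str) -> float:
    """Return a score in [0.0, 1.0]; 1.0 means the two bodies are identical."""
    return SequenceMatcher(None, http_body, https_body).ratio()


def looks_equal(http_body: str, https_body: str, threshold: float = 0.9) -> bool:
    """Treat two pages as 'equal' if their score reaches an (arbitrary) threshold."""
    return similarity(http_body, https_body) >= threshold


page_http = "<html><body><h1>Shop</h1><p>Ad: buy shoes</p></body></html>"
page_https = "<html><body><h1>Shop</h1><p>Ad: buy boots</p></body></html>"
print(similarity(page_http, page_https))
```

Dynamic content and rotating ads push the score below 1.0 even for pages a human would call the same, which is why a fixed threshold alone is hard to justify.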


Methodologies

[Figure: comparison of an algorithm-based solution (similarity scores ranging from 1.0 through 0.5 to 0.0), reality, and an empirical test via crowdsourcing (vote fractions such as 4/5).]

Use Cases for Empirical Testing

• Validating algorithms
• Validating existing rules
• Evaluating edge cases

TLScompare.org


Admin panel


TLScompare.org

• Identify domains
  ◦ Internet-wide scanning techniques
  ◦ Filter HSTS or server-side redirects
  ◦ Alexa Top 1 million ranking

• Test
  ◦ Store IP, datetime, User Agent, result
  ◦ Session ID to filter bogus attempts

• Evaluate results
• If feasible: create minimal ruleset
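The last step, creating a minimal ruleset, could look roughly like the sketch below. It builds the trivial `^http:` to `https:` rule shown earlier for a list of hosts, using `xml.etree.ElementTree`. The function name `minimal_ruleset` is hypothetical, and the enclosing `ruleset` element with a `name` attribute is taken from the HTTPS Everywhere ruleset format rather than from these slides.

```python
import xml.etree.ElementTree as ET


def minimal_ruleset(name, hosts):
    """Return ruleset XML with one target per host and the trivial rewrite rule."""
    ruleset = ET.Element("ruleset", {"name": name})
    for host in hosts:
        ET.SubElement(ruleset, "target", {"host": host})
    # Trivial rule: only the scheme changes, the rest of the URL is kept.
    ET.SubElement(ruleset, "rule", {"from": "^http:", "to": "https:"})
    return ET.tostring(ruleset, encoding="unicode")


print(minimal_ruleset("Example", ["example.org", "www.example.org"]))
```

Such a ruleset is only appropriate when the crowdsourced results say the HTTP and HTTPS versions of the site are equal.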

Results

Dataset for existing HTTPS Everywhere rules

Equal                 169
Not Equal (Total)     358

Dataset for similar Alexa Top 10k domains

Total attempts      2,600
Total results       2,267
Equal               1,688
Not Equal             579

Results

Combination   Results   %
0             842       66%
1             443       34%
00            612       53%
10            148       13%
11            394       34%
000           67        58%
100           6         14%
110           10
111           32        28%
...           ...       ...

Multiple results to ensure data quality
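A table of this shape can be produced by grouping the results collected per domain. The sketch below is one possible way to do that, not the authors' evaluation code: it treats each result as an opaque 0/1 label (the slides do not define which value means "equal"), sorts the labels in descending order so that `10` and `01` count as the same combination, and reports each combination's share among domains with the same number of results. The function name and the sample data are hypothetical.

```python
from collections import Counter


def combination_shares(results_per_domain):
    """Map each combination string to (count, percent within same-length group)."""
    counts = Counter(
        "".join(str(v) for v in sorted(votes, reverse=True))
        for votes in results_per_domain.values()
    )
    # Total number of domains for each number of collected results.
    totals = Counter()
    for combo, n in counts.items():
        totals[len(combo)] += n
    return {
        combo: (n, round(100 * n / totals[len(combo)]))
        for combo, n in counts.items()
    }


# Hypothetical sample data: domain -> list of crowdsourced results.
sample = {
    "a.example": [0], "b.example": [0], "c.example": [1],
    "d.example": [0, 1], "e.example": [1, 0],
    "f.example": [0, 0], "g.example": [1, 1],
}
for combo, (n, pct) in sorted(combination_shares(sample).items()):
    print(combo, n, f"{pct}%")
```

Mixed combinations such as `10` are the cases where several results per domain help most, since a single result would have hidden the disagreement.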

Discussion

• Tool for manual creation
• Commercial crowdsourcing platforms
• Data quality
• Reproducibility

Discussion cont.

• Future work
  ◦ Different algorithms
  ◦ Integrated algorithms
  ◦ Improve selection
  ◦ Fully automated generation

• Valid HTTPS Everywhere rules
• Increase HTTPS usage

[email protected]

Images: Sean MacEntee, http://flickr.com/photos/18090920@N07/15944989872. (cc-by-2.0)

EFF, https://www.eff.org/https-everywhere