Dsl-adsl compare

= Simple comparison of filter terms =

These are the results of applying one filter term to the gold1000 data set.

 $excludes = array('dsl');
              include  exclude    other
 totals:          303      674       23
 false pos:         0        1      919
 false neg:       303      616        1
 true pos:          0       58       22
 true neg:        697      325       58
 sum:            1000     1000     1000
 precision:      0.00     0.98     0.02
 recall:         0.00     0.09     0.96
 f-measure:      0.00     0.16     0.04

 $excludes = array('adsl');
              include  exclude    other
 totals:          303      674       23
 false pos:         0        1      940
 false neg:       303      637        1
 true pos:          0       37       22
 true neg:        697      325       37
 sum:            1000     1000     1000
 precision:      0.00     0.97     0.02
 recall:         0.00     0.05     0.96
 f-measure:      0.00     0.10     0.04

Note how recall for the exclude category drops from 0.09 (9%) to 0.05 (5%) when the longer term 'adsl' is used in place of 'dsl', while precision stays almost unchanged (0.98 versus 0.97).
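The exclude-column figures can be rechecked from the raw counts. The following is a minimal sketch in Python, not the script that produced the tables; it assumes the standard definitions of precision, recall and f-measure, and the function name `scores` is hypothetical.

```python
def scores(true_pos, false_pos, false_neg):
    """Return (precision, recall, f_measure); each is 0.0 when undefined."""
    predicted = true_pos + false_pos   # entries the filter put in this category
    actual = true_pos + false_neg      # entries the gold set puts in this category
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Counts for the exclude column, copied from the two result sets.
dsl = scores(true_pos=58, false_pos=1, false_neg=616)
adsl = scores(true_pos=37, false_pos=1, false_neg=637)

for name, (p, r, f) in (("dsl", dsl), ("adsl", adsl)):
    print(f"{name:5s} precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

Rounded to two decimals, this gives 0.98 / 0.09 / 0.16 for 'dsl' and 0.97 / 0.05 / 0.10 for 'adsl', matching the exclude column. The include column has no true or false positives, so all three of its scores are reported as 0.00.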

All Service Providers identified by 'adsl' are also identified by 'dsl', since any name containing 'adsl' also contains 'dsl'. This includes the single false positive, 'sadsl - dynamic pool', which is marked as other in the gold1000 data set but is marked as exclude by either filter term.
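The subset relationship can be illustrated with a short sketch. It assumes that a filter term is applied as a case-insensitive substring match on the Service Provider name, which is consistent with 'sadsl - dynamic pool' being caught by both terms but is not confirmed here. The function `matches` is hypothetical, and every provider name except 'sadsl - dynamic pool' is made up for the example.

```python
def matches(name, terms):
    """True if any filter term occurs anywhere in the provider name."""
    lowered = name.lower()
    return any(term.lower() in lowered for term in terms)


providers = [
    "sadsl - dynamic pool",   # the false positive named in the text
    "example adsl pool",      # hypothetical
    "example dsl customers",  # hypothetical
    "example university",     # hypothetical
]

by_dsl = {p for p in providers if matches(p, ["dsl"])}
by_adsl = {p for p in providers if matches(p, ["adsl"])}

# Every 'adsl' match is also a 'dsl' match, but not the other way round.
print(by_adsl <= by_dsl)
print(sorted(by_dsl - by_adsl))
```

Under this assumption the shorter term can only match the same entries or more, which is why switching to 'adsl' lowers recall without removing the false positive.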

This work helps us refine a generic set of filter terms that we can be confident will give good results without generating false positives. This matters because we would like a generic set of filter terms for our continuing analysis of increasing Scratchpad usage. By identifying generic terms, we hope to categorise as many new users as possible correctly, without the need for manual review on our part.