Duplicate user accounts on a single system sooner or later turn into a nightmare. One ambition of the SWITCH edu-ID has always been to prevent duplicate user accounts. However, only a few weeks after the edu-ID launch in 2015, we already found indications of a couple of duplicate accounts. How did that come about, and what can we do to prevent duplicate accounts?
Honestly, one will never be able to prevent duplicate accounts completely. However, we do our best to keep their number as low as possible.
Why is it so hard to prevent duplicate accounts? To recognise a duplicate, some unique information is needed that ties two accounts to the same user. However, an edu-ID account in its simplest form can be created with only a name and a valid e-mail address. This keeps the barrier for account creation low, which some services that make use of the edu-ID require. We have considered asking for the date of birth as well, but data privacy and other reasons speak against this.
So, let’s have a closer look at the registration form shown above. The name of a user is clearly not unique information, as the existing accounts confirm. At the time of writing, the 63’000 edu-ID accounts include, for example:
- 11 accounts with the name “Thomas Müller”
- 8 accounts with the name “Christoph Müller”
- 7 accounts with the name “Andreas Meier”
- 7 accounts with the name “David Schmid”
- … and so on
In total, about 3.2% of all edu-ID accounts have a non-unique name.
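A figure like the one above can be derived by counting how often each name occurs. The following is only a minimal sketch: the function name, the sample data, and the choice to compare names case-insensitively are assumptions made for illustration, not a description of how the edu-ID statistics were actually computed.

```python
from collections import Counter


def non_unique_name_share(names):
    """Return the fraction of accounts whose name also occurs on another account."""
    if not names:
        return 0.0
    # Assumption: names are compared after trimming whitespace and ignoring case.
    counts = Counter(name.strip().casefold() for name in names)
    # Sum the accounts belonging to every name that occurs more than once.
    shared = sum(count for count in counts.values() if count > 1)
    return shared / len(names)


# Hypothetical sample data, not real edu-ID accounts.
sample = [
    "Thomas Müller",
    "Thomas Müller",
    "Andreas Meier",
    "Anna Keller",
    "David Schmid",
]
share = non_unique_name_share(sample)
print(f"{share:.1%} of the sample accounts have a non-unique name")
```

In this sample, two of the five accounts share a name, so the share is 40%.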
Is the e-mail address a unique value? Yes, at any specific point in time, which is why we use it for duplicate prevention. However, most users nowadays have more than one e-mail address, for example a private one and an organisational one. When users move to another institution, they typically get yet another e-mail address and may then accidentally create a new edu-ID account with it.
So, what means do we have to prevent accidental duplicates? There are currently only the following two methods:
- When an edu-ID account is created, a long-term cookie is stored in the web browser. The same happens when an authenticated user accesses their edu-ID account management page. This long-term cookie is used to prevent the accidental creation of duplicate accounts. The cookie is, however, not synced between a user’s web browsers and devices. Therefore, duplicate accounts can still be created if somebody uses a different device or web browser together with a second e-mail address.
- When a user tries to link unique data (e-mail address, mobile number, ORCID, AAI unique identifier) to an account, these values are compared to the data already associated with other accounts. If a match is found, a message informs the user that they may already have another account, and a direct link to automatically deduplicate the accounts is provided. Additionally, an e-mail containing the same information is sent to the user. A reminder e-mail a few weeks later helps motivate users to proceed with the automatic account deduplication if this has not happened yet.
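The second method, comparing a newly linked value against the data of existing accounts, can be sketched as a simple lookup table. This is an illustrative sketch only: the class names, the `(kind, value)` representation, and the normalisation (trimming whitespace, lower-casing e-mail addresses) are assumptions and do not reflect the actual edu-ID implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    account_id: str
    # (kind, value) pairs, e.g. ("email", "jane.doe@example.org")
    identifiers: set = field(default_factory=set)


class IdentifierIndex:
    """Hypothetical index from a unique identifier to the account that holds it."""

    def __init__(self):
        self._owner = {}

    @staticmethod
    def _normalize(kind, value):
        value = value.strip()
        if kind == "email":
            # Assumption: e-mail addresses are compared case-insensitively.
            value = value.lower()
        return (kind, value)

    def link(self, account, kind, value):
        """Link a value to an account.

        Returns the id of a different account that already holds the value
        (a possible duplicate), or None if the value was linked.
        """
        key = self._normalize(kind, value)
        owner = self._owner.get(key)
        if owner is not None and owner != account.account_id:
            # A real system would now show a message and send an e-mail.
            return owner
        self._owner[key] = account.account_id
        account.identifiers.add(key)
        return None


# Hypothetical usage with made-up accounts and values.
index = IdentifierIndex()
first = Account("A1")
second = Account("B2")
index.link(first, "email", "Jane.Doe@example.org")
conflict = index.link(second, "email", "jane.doe@example.org ")
if conflict is not None:
    print(f"Account {second.account_id} may duplicate account {conflict}")
```

A lookup keyed on the normalised value keeps the check cheap per link attempt, but it only catches duplicates once the user adds a value that the other account already holds.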
Summary & Outlook
As shown in this article, we have only a few means to prevent duplicate accounts. It is therefore not possible to prevent all of them. However, as briefly mentioned above, there is also a deduplication process that helps reduce the number of identified duplicate accounts.
How edu-ID accounts can be automatically deduplicated will be the topic of an upcoming blog post.
3 thoughts on “Clone Wars”
Lukas, I really like how many parameters you use for finding duplicate accounts. I would be interested to see how you handle mobile numbers, because they tend to be redistributed by telco providers. The same is true for email accounts, but it is less likely because the theoretical pool of unique email addresses is much larger.
Hi Lars, thanks for your question. Currently, this case is not handled automatically. We intend to create an automatic process that reverifies email addresses and other verifiable data after a certain time. However, even with this process in place, it can happen that somebody verifies their mobile number today and gives it up tomorrow, and that a few months later another person is assigned that number. So, we have yet to get an overview of how quickly numbers and addresses are reassigned by telcos and universities.