Regular expressions and search terms for phone examiners

This page is intended to act as a resource for phone forensic examiners who are working with physical extractions (“hex dumps”) of mobile phone handsets. It aims to provide advice as well as usable search terms and regular expressions that phone examiners can use to find key data within a mass of hex data.

The page is split into two sections to reflect that certain types of data are commonly stored as files (e.g. video, MMS etc.) whereas other data types such as previous SIM details may be stored as individual “records” rather than in files. Do also refer to Gary Kessler’s excellent file signature page – we can’t live without it!

Non File Based

Data type: IMSI
Example (decoded): 234105563888549
Info on format: http://en.wikipedia.org/wiki/IMSI
Suggested search / regexp: Country/region specific (see below)

IMSI prefixes are country specific, so a single global IMSI search probably doesn’t make sense (it would return too many “false positive” hits). Encoding of the IMSI within handset memory varies, but some handsets follow the SIM card standards and store the IMSI in hex in a reverse nibbled decimal format, i.e. two decimal digits per byte with the digits of each pair swapped. For example, the IMSI 234105563888549 would appear as the hex 29 43 01 55 36 88 58 94. Some Samsung and Nokia handsets certainly use this approach.

IMSIs are normally 15 digits long. A leading “9” is added before the digits are swapped (it is the “9” in the first byte “29” above), which gives 16 digits and hence 8 pairs of hex digits. IMSIs start with the Mobile Country Code (for the UK this is “234”), which is followed by the Mobile Network Code. In the UK, the MNCs “10”, “15”, “20”, “30” and “33” are used by the five UK networks O2, Vodafone, 3, T-Mobile and Orange respectively. Once swapped, these appear as the bytes 01, 51, 02, 03 and 33. So the following regular expression:

\x29\x43[\x01\x02\x03\x33\x51]

could be used to find all occurrences of the five valid UK IMSI prefixes.
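As a rough illustration of the encoding described above, the Python sketch below nibble-swaps an IMSI and runs the UK prefix search over a byte buffer. The function names and the padded test buffer are invented for this example, and the sketch assumes a 15-digit IMSI stored SIM-style.

```python
import re

def swap_nibbles(digits: str) -> bytes:
    """Pack a string of hex digits two per byte, swapping each pair."""
    if len(digits) % 2:
        raise ValueError("expected an even number of digits")
    return bytes(int(digits[i + 1] + digits[i], 16)
                 for i in range(0, len(digits), 2))

def encode_imsi(imsi: str) -> bytes:
    """Assumes a 15-digit IMSI; the leading '9' makes 16 nibbles (8 bytes)."""
    return swap_nibbles("9" + imsi)

# Bytes pattern for the five UK prefixes (MCC 234 + MNC 10/20/30/33/15).
UK_IMSI_PREFIX = re.compile(rb"\x29\x43[\x01\x02\x03\x33\x51]")

encoded = encode_imsi("234105563888549")
dump = b"\xff" * 16 + encoded + b"\xff" * 16   # stand-in for a hex dump
hits = [m.start() for m in UK_IMSI_PREFIX.finditer(dump)]

print(encoded.hex(" "))   # 29 43 01 55 36 88 58 94
print(hits)               # [16]
```

Compiling the pattern from a bytes literal means the search runs over raw bytes rather than decoded text, which is what is wanted for a memory dump.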

The regular expression could be extended to specify that the remaining 10 digits (5 bytes) of the IMSI must be decimal digits, but this makes the regular expression much longer and somewhat confusing (because we are writing a regular expression to match hex rather than one to match text).
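One way to keep the longer expression manageable is to build it in code rather than type it by hand. The sketch below is only a suggestion: it generates a character class covering every byte whose two nibbles are both decimal digits, then requires five such bytes after the UK prefix. The names BCD_BYTE, UK_IMSI_LOOSE and UK_IMSI_STRICT are made up for the example.

```python
import re

# Character class for bytes 0x00-0x09, 0x10-0x19, ... 0x90-0x99,
# i.e. any byte holding two decimal digits.
BCD_BYTE = b"[" + b"".join(b"\\x%x0-\\x%x9" % (h, h) for h in range(10)) + b"]"

PREFIX = rb"\x29\x43[\x01\x02\x03\x33\x51]"
UK_IMSI_LOOSE = re.compile(PREFIX)
UK_IMSI_STRICT = re.compile(PREFIX + BCD_BYTE + rb"{5}")

real = bytes.fromhex("29 43 01 55 36 88 58 94")    # the example IMSI
noise = bytes.fromhex("29 43 01 ff ff ff ff ff")   # prefix followed by padding

print(bool(UK_IMSI_STRICT.search(real)))    # True
print(bool(UK_IMSI_STRICT.search(noise)))   # False
print(bool(UK_IMSI_LOOSE.search(noise)))    # True
```

The stricter pattern trades a few missed hits (for example, a record truncated by a page boundary) for fewer false positives, so it may be worth running both.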

Some Samsung handsets (e.g. E1080i, E1120) appear to always store

\x11\xF2\xFF

directly after the IMSI. Hence the regular expression above can be extended to include the last 5 bytes of the IMSI and this (Samsung-specific) trailer:

\x29\x43[\x01\x02\x03\x33\x51].{5}\x11\xF2\xFF
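Once a hit is found, it is handy to turn the bytes back into a readable IMSI. The following sketch wraps the IMSI part of the Samsung-specific expression in a capture group and reverses the nibble swap. The helper names are hypothetical and the trailer is only the one reported above for a few models.

```python
import re

# The 8 IMSI bytes are captured; the trailer is matched but not captured.
# re.DOTALL lets "." match every byte value, including 0x0A.
SAMSUNG_IMSI = re.compile(
    rb"(\x29\x43[\x01\x02\x03\x33\x51].{5})\x11\xF2\xFF", re.DOTALL)

def decode_imsi(raw: bytes) -> str:
    """Undo the nibble swap and drop the leading '9'."""
    nibbles = "".join(f"{b & 0x0F:x}{b >> 4:x}" for b in raw)
    return nibbles[1:]

def find_samsung_imsis(dump: bytes):
    """Return (offset, decoded IMSI) for each match."""
    return [(m.start(), decode_imsi(m.group(1)))
            for m in SAMSUNG_IMSI.finditer(dump)]

dump = b"\x00" * 4 + bytes.fromhex("29 43 01 55 36 88 58 94 11 f2 ff")
print(find_samsung_imsis(dump))   # [(4, '234105563888549')]
```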

Data type: ICCID
Example (decoded): 8944110064815443720
Info on format: http://en.wikipedia.org/wiki/ICCID#ICCID
Suggested search / regexp: Country/region specific (see below)

ICCID prefixes are also country specific, and country-specific searching is a good place to start. ICCIDs are also often stored (as per the SIM standards) in hex in reverse nibbled decimal format, like IMSIs. See the notes above on IMSI encoding.

ICCIDs are normally 19 or 20 digits long, so expect 10 pairs of hex digits, possibly using an “F” as padding if the ICCID is only 19 digits long (that’s how it’s done on the SIM, and hence some manufacturers will follow this). All ICCIDs start “89”, followed by the dialling code for that country and then a two-digit identifier for the issuing service provider (known as the Issuer Identifier Number). For example, all UK SIM cards start “8944”, and the Issuer Identifier Numbers “10”, “11”, “20”, “30” and “12” are used by the five networks Vodafone, O2, 3, T-Mobile and Orange respectively. Once swapped, these appear as the bytes 01, 11, 02, 03 and 21. So the following regular expression:

\x98\x44[\x01\x02\x03\x11\x21]

could be used to find all occurrences of the five valid UK ICCID prefixes.
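The ICCID encoding can be sketched in the same way as the IMSI one. The example below pads a 19-digit ICCID with “F”, swaps the nibbles and checks that the UK prefix expression finds it. The function name and the test buffer are invented for illustration.

```python
import re

def encode_iccid(iccid: str) -> bytes:
    """Nibble-swap an ICCID, padding odd lengths with 'F' as on the SIM."""
    digits = iccid if len(iccid) % 2 == 0 else iccid + "F"
    return bytes(int(digits[i + 1] + digits[i], 16)
                 for i in range(0, len(digits), 2))

# Bytes pattern for the five UK issuer prefixes listed above.
UK_ICCID_PREFIX = re.compile(rb"\x98\x44[\x01\x02\x03\x11\x21]")

encoded = encode_iccid("8944110064815443720")
dump = b"\x00" * 8 + encoded + b"\x00" * 8   # stand-in for a hex dump
hits = [m.start() for m in UK_ICCID_PREFIX.finditer(dump)]

print(encoded.hex(" "))   # 98 44 11 00 46 18 45 34 27 f0
print(hits)               # [8]
```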

Data type: SMS
Example (decoded): N/A
Info on format: GSM 03.38 (GSM 7-bit alphabet); GSM 03.40; http://www.dreamfabric.com/sms/
Suggested search / regexp: Country/region specific (see below)

The format for transmission of SMS across the network (and storage on the SIM) is standardised, but copies of SMS in handset memory (or on removable media) are often manufacturer-specific. It makes sense to start by looking for SMS using the GSM standard format and then adjust the search based on the results.

The start of a GSM standard SMS header includes the phone number of the SMSC responsible for handling the message, as well as the length of that phone number. The length and format of phone numbers vary from country to country, so it is most practical to start by looking for SMS which have SMSC numbers relating to a specific country.

The very first byte of a standards-based SMS is the status of the message, which could be “00” (deleted), “01” (read), “03” (unread), “05” (sent) or “07” (unsent). The second byte stores the length of the SMSC number counted in bytes (including the Type of Number field, which itself is 1 byte long); this should be consistent within a particular country. The Type of Number (TON) field is usually set to “91” for the SMSC number. This indicates that the number which follows is in international format (e.g. “447958879884”). Again, the first few digits of the SMSC number will be the country code, which can be included in the search expression. Taking the UK as an example, a typical SMS header for a read message might look like:

01 07 91 44

We can choose to ignore the status byte in our search (as manufacturers may have chosen not to include it) and just search for the characteristic “07 91 44” string. So, to search for SMS relating to UK subscribers, we could use the regular expression:

\x07\x91\x44
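To show how the header fields described above fit together, here is a small Python sketch that searches for the UK SMSC signature and then reads the surrounding bytes. It assumes the standard layout (status, length, TON, nibble-swapped number). The function names and the sample bytes are made up, and a real handset may omit the status byte.

```python
import re

SMSC_UK = re.compile(rb"\x07\x91\x44")
STATUS = {0x00: "deleted", 0x01: "read", 0x03: "unread",
          0x05: "sent", 0x07: "unsent"}

def decode_swapped(raw: bytes) -> str:
    """Undo the nibble swap and strip any trailing 'F' padding."""
    digits = "".join(f"{b & 0x0F:x}{b >> 4:x}" for b in raw)
    return digits.rstrip("f")

def parse_hit(dump: bytes, offset: int) -> dict:
    """Interpret the bytes around a hit; offset points at the length byte."""
    length = dump[offset]                  # counts the TON byte plus the number
    ton = dump[offset + 1]
    smsc = decode_swapped(dump[offset + 2: offset + 1 + length])
    status = STATUS.get(dump[offset - 1]) if offset > 0 else None
    return {"offset": offset, "status": status, "ton": ton, "smsc": smsc}

dump = bytes.fromhex("ff ff 01 07 91 44 97 85 78 89 48 ff")
results = [parse_hit(dump, m.start()) for m in SMSC_UK.finditer(dump)]
print(results)
```

The status value is only a guess based on the preceding byte, so it should be treated as a hint rather than a finding.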

This search term may identify SMS present within a memory dump. Further work will be required to determine the actual start and end of the SMS message. The message content itself is quite possibly encoded using the GSM 7-bit alphabet (i.e. not ASCII) so don’t expect the words of the text message to jump out at you!
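For reference, the 7-bit packing can be undone with a few lines of Python. This is a minimal sketch that assumes the number of septets is already known (in a standard SMS it comes from the user data length field), that there is no user data header, and that only the basic GSM 03.38 alphabet is used (the escape/extension table is not handled). The function names are invented for the example.

```python
# Basic GSM 03.38 default alphabet, indexed by septet value (0-127).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

def unpack_septets(data: bytes, count: int) -> list:
    """Split packed bytes into 7-bit values, least significant bits first."""
    bits = int.from_bytes(data, "little")
    return [(bits >> (7 * i)) & 0x7F for i in range(count)]

def decode_gsm7(data: bytes, count: int) -> str:
    """Map each septet to its character in the basic alphabet."""
    return "".join(GSM7_BASIC[s] for s in unpack_septets(data, count))

print(decode_gsm7(bytes.fromhex("e8329bfd06"), 5))   # hello
```

Note how the five letters of “hello” occupy five bytes whose values look nothing like the ASCII for the same word, which is why a plain text search misses packed messages.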

Some Samsung handsets (e.g. D500, D600 etc.) use the keyword “DEADBEEF” as a delimiter between text messages in the handset memory. Running an ASCII text search for “DEADBEEF” on such a handset should yield results. If you are searching the OneNAND raw memory then you’ll be looking at live and deleted messages (only live messages can be recovered from the FAT file system layer).
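A delimiter search like this is easy to script. The sketch below simply lists the offset of every ASCII “DEADBEEF” marker in a buffer so that the data between markers can be examined by hand. The function name and the sample buffer are made up, and nothing is assumed about the record layout between markers.

```python
DELIMITER = b"DEADBEEF"   # ASCII text, i.e. bytes 44 45 41 44 42 45 45 46

def find_delimiters(dump: bytes) -> list:
    """Return the offset of each delimiter occurrence in the dump."""
    offsets = []
    pos = dump.find(DELIMITER)
    while pos != -1:
        offsets.append(pos)
        pos = dump.find(DELIMITER, pos + len(DELIMITER))
    return offsets

dump = b"\x00\x00DEADBEEFmsg1DEADBEEFmsg2"
print(find_delimiters(dump))   # [2, 14]
```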

File Based

Data type: MMS
Info on format: MMS Encapsulation Protocol (Candidate Version 1.3, 28-Jan-2008), Open Mobile Alliance; MMS Conformance Document (Candidate Version 1.3, 28-Jan-2008), Open Mobile Alliance
Suggested search / regexp: \x8C[\x80\x84]\x98 OR <smil

The format for transmission is standardised, but copies of MMS in handset memory are often in a manufacturer-specific format. If the manufacturer has stored the messages in a proprietary format, it may still be possible to locate them using an ASCII text search for “<smil”, which should appear near the start of a message.

MMS messages start with a “tag” (0x8C) which indicates that the first header field will be the message type (which itself is one byte long). The message types of most interest are 0x80 (sent messages) and 0x84 (received messages). The third byte in the message should be the tag 0x98, which indicates that the second header field that follows is the Transaction-ID. As such, we can search for all sent and received messages with the regular expression:

\x8C[\x80\x84]\x98
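Both MMS searches can be run from a short script. The Python sketch below reports the offset and direction of each standard header hit and, separately, the offsets of any “<smil” tags. The names and the sample buffer are invented for the example.

```python
import re

MMS_HEADER = re.compile(rb"\x8C[\x80\x84]\x98")
SMIL_TAG = re.compile(rb"<smil", re.IGNORECASE)
MESSAGE_TYPES = {0x80: "sent", 0x84: "received"}

def find_mms_headers(dump: bytes) -> list:
    """Return (offset, direction) for each standard MMS header found."""
    return [(m.start(), MESSAGE_TYPES[dump[m.start() + 1]])
            for m in MMS_HEADER.finditer(dump)]

def find_smil(dump: bytes) -> list:
    """Return the offset of each '<smil' tag, matched case-insensitively."""
    return [m.start() for m in SMIL_TAG.finditer(dump)]

dump = b"\x00\x8c\x84\x98abc\x00" + b"<smil><head/></smil>" + b"\x8c\x80\x98"
print(find_mms_headers(dump))   # [(1, 'received'), (28, 'sent')]
print(find_smil(dump))          # [8]
```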

Data type: 3GP / MP4
Info on format: MP4 file format defined in MPEG-4 Part 14 (ISO/IEC 14496-14:2003)
Suggested search / regexp: \x00\x00\x00.\x66\x74\x79\x70

Commonly encountered “4th byte” values are 0x14, 0x18, 0x1C and 0x20. The regular expression above will match on 3GP/MP4 headers with any value present in the 4th byte; the last four bytes of the expression are the ASCII string “ftyp”.
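The first four bytes matched by the expression are the big-endian size of the “ftyp” box, and the four bytes after “ftyp” hold the major brand. The sketch below uses this to report a little more about each hit. The function name and the hand-built sample header are made up for illustration.

```python
import re
import struct

# re.DOTALL so that "." matches any byte value in the 4th position.
FTYP = re.compile(rb"\x00\x00\x00.\x66\x74\x79\x70", re.DOTALL)

def find_ftyp(dump: bytes) -> list:
    """Return (offset, box size, major brand) for each candidate header."""
    hits = []
    for m in FTYP.finditer(dump):
        start = m.start()
        (size,) = struct.unpack(">I", dump[start:start + 4])
        brand = dump[start + 8:start + 12].decode("ascii", "replace")
        hits.append((start, size, brand))
    return hits

# 24-byte ftyp box: size, "ftyp", major brand, minor version, two brands.
header = b"\x00\x00\x00\x18ftyp3gp4" + b"\x00\x00\x02\x00" + b"isom3gp4"
dump = b"\xaa" * 5 + header + b"\xaa" * 5
print(find_ftyp(dump))   # [(5, 24, '3gp4')]
```

A recognisable brand such as “3gp4” or “isom” is a useful sanity check before attempting to carve the rest of the file.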