netzob.Inference.Vocabulary.FormatOperations package

Submodules

netzob.Inference.Vocabulary.FormatOperations.ClusterByAlignment module

class ClusterByAlignment(minEquivalence=50, internalSlick=True, recomputeMatrixThreshold=None)

Bases: object

This clustering process groups messages so as to maximize their alignment. It provides the methods required to cluster multiple symbols/messages using the UPGMA algorithm (see http://en.wikipedia.org/wiki/UPGMA). During processing, the matrix of scores is computed by the C extension (_libScoreComputation) and used to regroup messages and symbols into equivalent clusters.
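The UPGMA idea mentioned above can be illustrated with a minimal pure-Python sketch. It is not Netzob's implementation: the score comes from the standard library's difflib.SequenceMatcher instead of the C extension, and the names similarity and upgma_clusters, as well as the min_equivalence parameter (a 0 to 100 threshold that echoes minEquivalence), are assumptions made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in score in [0, 100]; Netzob computes its scores in C."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def upgma_clusters(messages, min_equivalence=50.0):
    """Average-linkage (UPGMA) agglomeration, stopped at a score threshold."""
    clusters = [[m] for m in messages]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # UPGMA: the score between two clusters is the mean of all
            # pairwise scores between their members.
            scores = [similarity(a, b) for a in clusters[i] for b in clusters[j]]
            mean = sum(scores) / len(scores)
            if mean > best_score:
                best_score, best_pair = mean, (i, j)
        if best_score < min_equivalence:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


msgs = ["hello zoby, what's up in Paris ?",
        "hello toto, what's up in Munich ?",
        "My ip address is 10.0.0.1",
        "My ip address is 192.168.0.10"]
groups = upgma_clusters(msgs)
```

With these four sample strings, the two "hello" messages end up in one group and the two "My ip address" messages in another, because the mean score between the two groups stays below the threshold.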

>>> from netzob.all import *
>>> pseudos = ["zoby", "ditrich", "toto", "carlito"]
>>> cities = ["Paris", "Munich", "Barcelone", "Vienne"]
>>> ips = ["192.168.0.10", "10.120.121.212", "78.167.23.10"]
>>> # Creation of the different types of message
>>> msgsType1 = [ RawMessage("hello {0}, what's up in {1} ?".format(pseudo, city)) for pseudo in pseudos for city in cities]
>>> msgsType2 = [ RawMessage("My ip address is {0}".format(TypeConverter.convert(ip, IPv4, Raw))) for ip in ips]
>>> msgsType3 = [ RawMessage("Your IP is {0}, name = {1} and city = {2}".format(TypeConverter.convert(ip, IPv4, Raw), pseudo, city)) for ip in ips for pseudo in pseudos for city in cities]
>>> messages = msgsType1+msgsType2+msgsType3
>>> clustering = ClusterByAlignment()
>>> symbols = clustering.cluster(messages)
>>> len(symbols)
3
>>> symbols[0].addEncodingFunction(TypeEncodingFunction(HexaString))
>>> print symbols[0]
Field                                | Field    | Field | Field   
------------------------------------ | -------- | ----- | --------
'4d79206970206164647265737320697320' | '4ea717' | '0a'  | ''      
'4d79206970206164647265737320697320' | 'c0a800' | '0a'  | ''      
'4d79206970206164647265737320697320' | ''       | '0a'  | '7879d4'
------------------------------------ | -------- | ----- | --------
>>> print symbols[2]
Field    | Field     | Field             | Field       | Field
-------- | --------- | ----------------- | ----------- | -----
'hello ' | 'carlito' | ", what's up in " | 'Munich'    | ' ?' 
'hello ' | 'carlito' | ", what's up in " | 'Paris'     | ' ?' 
'hello ' | 'ditrich' | ", what's up in " | 'Munich'    | ' ?' 
'hello ' | 'ditrich' | ", what's up in " | 'Paris'     | ' ?' 
'hello ' | 'carlito' | ", what's up in " | 'Vienne'    | ' ?' 
'hello ' | 'ditrich' | ", what's up in " | 'Vienne'    | ' ?' 
'hello ' | 'toto'    | ", what's up in " | 'Paris'     | ' ?' 
'hello ' | 'zoby'    | ", what's up in " | 'Paris'     | ' ?' 
'hello ' | 'toto'    | ", what's up in " | 'Munich'    | ' ?' 
'hello ' | 'zoby'    | ", what's up in " | 'Munich'    | ' ?' 
'hello ' | 'toto'    | ", what's up in " | 'Vienne'    | ' ?' 
'hello ' | 'zoby'    | ", what's up in " | 'Vienne'    | ' ?' 
'hello ' | 'carlito' | ", what's up in " | 'Barcelone' | ' ?' 
'hello ' | 'ditrich' | ", what's up in " | 'Barcelone' | ' ?' 
'hello ' | 'toto'    | ", what's up in " | 'Barcelone' | ' ?' 
'hello ' | 'zoby'    | ", what's up in " | 'Barcelone' | ' ?' 
-------- | --------- | ----------------- | ----------- | -----
cluster(*args, **kwargs)
internalSlick

If active, the alignment is slicked during the merging process; otherwise, only the final alignment is slicked.

minEquivalence

Minimum equivalence score below which two messages are not considered similar.

Return type:float
recomputeMatrixThreshold

netzob.Inference.Vocabulary.FormatOperations.ClusterByApplicativeData module

class ClusterByApplicativeData

Bases: object

This operation clusters messages into symbols according to their embedded applicative data.

Clustering by applicative data uses netzob.Inference.Search.SearchEngine.SearchEngine to search for applicative data in messages and clusters together the messages that contain the same applicative data. In the example below, we generate two types of messages. The first type contains the pseudo and the city of the user: two pieces of applicative data. The second type includes the IP address of the user, another piece of applicative data.

>>> from netzob.all import *
>>> pseudos = ["zoby", "ditrich", "toto", "carlito"]
>>> cities = ["Paris", "Munich", "Barcelone", "Vienne"]
>>> ips = ["192.168.0.10", "10.120.121.212", "78.167.23.10"]
>>> # Build applicative data
>>> appPseudos = [ApplicativeData("Pseudo", ASCII(pseudo)) for pseudo in pseudos]
>>> appCities = [ApplicativeData("City", ASCII(city)) for city in cities]
>>> appIps = [ApplicativeData("IPs", IPv4(ip)) for ip in ips]
>>> appDatas = appPseudos + appCities + appIps
>>> # Creating messages using application data
>>> msgsType1 = [ RawMessage("hello {0}, what's up in {1} ?".format(pseudo, city)) for pseudo in pseudos for city in cities]
>>> msgsType2 = [ RawMessage("My ip address is {0}".format(TypeConverter.convert(ip, IPv4, Raw))) for ip in ips]
>>> messages = msgsType1+msgsType2
>>> appCluster = ClusterByApplicativeData()
>>> symbols = appCluster.cluster(messages, appDatas)
>>> len(symbols)
2
>>> len(symbols[0].messages) == 3 or len(symbols[0].messages) == 16
True
>>> len(symbols[1].messages) == 3 or len(symbols[1].messages) == 16
True
cluster(*args, **kwargs)
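The grouping principle described for this class can be sketched without Netzob. In the hypothetical helper below, cluster_by_app_data, each applicative datum is a (name, value) pair of plain strings and a substring test stands in for the SearchEngine; both choices are assumptions made for illustration, and IP addresses are embedded as text rather than as raw bytes.

```python
from collections import defaultdict


def cluster_by_app_data(messages, app_data):
    """Group messages by the set of applicative-data names they contain.

    `app_data` is a list of (name, value) pairs; a plain substring test
    stands in for Netzob's SearchEngine.
    """
    clusters = defaultdict(list)
    for message in messages:
        found = frozenset(name for name, value in app_data if value in message)
        clusters[found].append(message)
    return dict(clusters)


pseudos = ["zoby", "toto"]
cities = ["Paris", "Munich"]
ips = ["192.168.0.10", "10.120.121.212"]
app_data = ([("Pseudo", p) for p in pseudos]
            + [("City", c) for c in cities]
            + [("IPs", ip) for ip in ips])
messages = (["hello {0}, what's up in {1} ?".format(p, c)
             for p in pseudos for c in cities]
            + ["My ip address is {0}".format(ip) for ip in ips])
clusters = cluster_by_app_data(messages, app_data)
```

Messages that embed a pseudo and a city fall under the key {"Pseudo", "City"}, while messages that embed an IP address fall under {"IPs"}.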

netzob.Inference.Vocabulary.FormatOperations.ClusterByKeyField module

class ClusterByKeyField

Bases: object

This operation clusters the messages belonging to the specified field according to their value in the specified key field.

cluster(*args, **kwargs)

Create and return new symbols according to a specific key field.

>>> import binascii
>>> from netzob.all import *
>>> samples = ["00ff2f000000", "000020000000", "00ff2f000000"]
>>> messages = [RawMessage(data=binascii.unhexlify(sample)) for sample in samples]
>>> f1 = Field(Raw(nbBytes=1))
>>> f2 = Field(Raw(nbBytes=2))
>>> f3 = Field(Raw(nbBytes=3))
>>> symbol = Symbol([f1, f2, f3], messages=messages)
>>> symbol.addEncodingFunction(TypeEncodingFunction(HexaString))
>>> newSymbols = Format.clusterByKeyField(symbol, f2)
>>> for sym in newSymbols.values():
...     sym.addEncodingFunction(TypeEncodingFunction(HexaString))
...     print sym.name + ":"
...     print sym
Symbol_ff2f:
Field | Field  | Field   
----- | ------ | --------
'00'  | 'ff2f' | '000000'
'00'  | 'ff2f' | '000000'
----- | ------ | --------
Symbol_0020:
Field | Field  | Field   
----- | ------ | --------
'00'  | '0020' | '000000'
----- | ------ | --------
Parameters:

:raise Exception if something bad happens
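As a rough, Netzob-free illustration of clustering on a key field, the sketch below groups byte strings by the value found at a fixed byte range. The helper cluster_by_key_field and its start/end offsets are assumptions made for this example; in Netzob the key is a Field object, not a pair of offsets.

```python
import binascii
from collections import OrderedDict


def cluster_by_key_field(messages, start, end):
    """Group byte strings by the value of the slice message[start:end]."""
    clusters = OrderedDict()
    for message in messages:
        key = binascii.hexlify(message[start:end]).decode("ascii")
        clusters.setdefault("Symbol_" + key, []).append(message)
    return clusters


samples = ["00ff2f000000", "000020000000", "00ff2f000000"]
messages = [binascii.unhexlify(sample) for sample in samples]
# The key field is the 2-byte field that follows a 1-byte field.
clusters = cluster_by_key_field(messages, 1, 3)
```

The resulting keys are Symbol_ff2f (two messages) and Symbol_0020 (one message).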

netzob.Inference.Vocabulary.FormatOperations.ClusterBySize module

class ClusterBySize

Bases: object

This clustering process groups messages that have the same size.

cluster(*args, **kwargs)

Create and return new symbols according to the size of the messages.

>>> from netzob.all import *
>>> import binascii
>>> samples = ["00ffff1100abcd", "00aaaa1100abcd", "00bbbb1100abcd", "001100abcd", "001100ffff", "00ffffffff1100abcd"]
>>> messages = [RawMessage(data=binascii.unhexlify(sample)) for sample in samples]
>>> clusterer = ClusterBySize()
>>> newSymbols = clusterer.cluster(messages)
>>> for sym in newSymbols:
...     print "[" + sym.name + "]"
...     sym.addEncodingFunction(TypeEncodingFunction(HexaString))
...     print sym
[symbol_9]
Field               
--------------------
'00ffffffff1100abcd'
--------------------
[symbol_5]
Field       
------------
'001100abcd'
'001100ffff'
------------
[symbol_7]
Field           
----------------
'00ffff1100abcd'
'00aaaa1100abcd'
'00bbbb1100abcd'
----------------
Parameters:messages (a list of netzob.Common.Models.Vocabulary.Messages.AbstractMessage.AbstractMessage) – the messages to cluster.

:raise Exception if something bad happens
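The principle of size-based clustering can be reproduced in a few lines of plain Python. The helper name cluster_by_size and the "symbol_<size>" keys are chosen here only to echo the example above; this is a sketch, not the library's code.

```python
import binascii
from collections import defaultdict


def cluster_by_size(messages):
    """Group byte strings by their length in bytes."""
    clusters = defaultdict(list)
    for message in messages:
        clusters["symbol_{0}".format(len(message))].append(message)
    return dict(clusters)


samples = ["00ffff1100abcd", "00aaaa1100abcd", "00bbbb1100abcd",
           "001100abcd", "001100ffff", "00ffffffff1100abcd"]
messages = [binascii.unhexlify(sample) for sample in samples]
clusters = cluster_by_size(messages)
```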

netzob.Inference.Vocabulary.FormatOperations.FieldOperations module

class FieldOperations

Bases: object

This class offers various operations to support the manual merging and splitting of fields.

mergeFields(*args, **kwargs)

Merge specified fields.

>>> import binascii
>>> from netzob.all import *
>>> samples = ["00ff2f000000", "000010000000", "00fe1f000000"]
>>> messages = [RawMessage(data=binascii.unhexlify(sample)) for sample in samples]
>>> f1 = Field(Raw(nbBytes=1), name="f1")
>>> f2 = Field(Raw(nbBytes=2), name="f2")
>>> f3 = Field(Raw(nbBytes=2), name="f3")
>>> f4 = Field(Raw(nbBytes=1), name="f4")
>>> symbol = Symbol([f1, f2, f3, f4], messages=messages)
>>> symbol.addEncodingFunction(TypeEncodingFunction(HexaString))
>>> print symbol
f1   | f2     | f3     | f4  
---- | ------ | ------ | ----
'00' | 'ff2f' | '0000' | '00'
'00' | '0010' | '0000' | '00'
'00' | 'fe1f' | '0000' | '00'
---- | ------ | ------ | ----
>>> fo = FieldOperations()
>>> fo.mergeFields(f2, f3)
>>> print symbol
f1   | Merge      | f4  
---- | ---------- | ----
'00' | 'ff2f0000' | '00'
'00' | '00100000' | '00'
'00' | 'fe1f0000' | '00'
---- | ---------- | ----
>>> fo.mergeFields(symbol.fields[0], symbol.fields[1])
>>> print symbol
Merge        | f4  
------------ | ----
'00ff2f0000' | '00'
'0000100000' | '00'
'00fe1f0000' | '00'
------------ | ----
>>> fo.mergeFields(symbol.fields[0], symbol.fields[1])
>>> print symbol
Merge         
--------------
'00ff2f000000'
'000010000000'
'00fe1f000000'
--------------
Parameters:

:raise Exception if something bad happens
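The effect of merging two adjacent fields can be pictured on a plain table of cells. The helper merge_columns below is hypothetical and works on lists of strings rather than on Field objects; it only illustrates how the columns of the example above collapse.

```python
def merge_columns(rows, index):
    """Merge column `index` with column `index + 1` in every row."""
    return [row[:index] + [row[index] + row[index + 1]] + row[index + 2:]
            for row in rows]


rows = [["00", "ff2f", "0000", "00"],
        ["00", "0010", "0000", "00"],
        ["00", "fe1f", "0000", "00"]]
merged = merge_columns(rows, 1)        # merge the second and third columns
merged_again = merge_columns(merged, 0)  # then merge the first two columns
```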

netzob.Inference.Vocabulary.FormatOperations.FieldReseter module

class FieldReseter

Bases: object

This class defines the operation required to reset the definition of a field. It reinitializes the definition domain as a raw field and deletes its children.

>>> import binascii
>>> from netzob.all import *
>>> samples = ["00ff2f000000", "000010000000", "00fe1f000000"]
>>> messages = [RawMessage(data=binascii.unhexlify(sample)) for sample in samples]
>>> f1 = Field(Raw(nbBytes=1), name="f1")
>>> f21 = Field(Raw(nbBytes=1), name="f21")
>>> f22 = Field(Raw(nbBytes=1), name="f22")
>>> f2 = Field(name="f2")
>>> f2.fields = [f21, f22]
>>> f3 = Field(Raw(), name="f3")
>>> symbol = Symbol([f1, f2, f3], messages=messages)
>>> symbol.addEncodingFunction(TypeEncodingFunction(HexaString))
>>> print symbol
f1   | f21  | f22  | f3      
---- | ---- | ---- | --------
'00' | 'ff' | '2f' | '000000'
'00' | '00' | '10' | '000000'
'00' | 'fe' | '1f' | '000000'
---- | ---- | ---- | --------
>>> reseter = FieldReseter()
>>> reseter.reset(symbol)
>>> symbol.addEncodingFunction(TypeEncodingFunction(HexaString))
>>> print symbol
Field         
--------------
'00ff2f000000'
'000010000000'
'00fe1f000000'
--------------
reset(*args, **kwargs)

Resets the format (field hierarchy and definition domain) of the specified field.

Parameters:field (netzob.Common.Models.Vocabulary.AbstractField.AbstractField) – the field we want to reset

:raise Exception if something bad happens
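To make the reset semantics concrete, here is a minimal sketch on a toy field tree. The SimpleField class and the reset function are hypothetical stand-ins, not Netzob types; they only model "a domain plus optional children".

```python
class SimpleField(object):
    """Hypothetical stand-in for a field: a domain plus optional children."""

    def __init__(self, name, domain="Raw", children=None):
        self.name = name
        self.domain = domain
        self.children = list(children or [])

    def leaves(self):
        """Return the leaf fields, i.e. the columns a table would display."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def reset(field):
    """Drop the field hierarchy and fall back to a single raw domain."""
    field.children = []
    field.domain = "Raw"


f2 = SimpleField("f2", children=[SimpleField("f21"), SimpleField("f22")])
symbol = SimpleField("symbol", children=[SimpleField("f1"), f2, SimpleField("f3")])
before = [leaf.name for leaf in symbol.leaves()]
reset(symbol)
after = [leaf.name for leaf in symbol.leaves()]
```

Before the reset the toy symbol exposes four leaf columns (f1, f21, f22, f3); afterwards it exposes a single raw column.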

netzob.Inference.Vocabulary.FormatOperations.FieldSplitDelimiter module

class FieldSplitDelimiter

Bases: object

static split(*args, **kwargs)

Split a field (or symbol) with a specific delimiter. The delimiter can be passed as an ASCII, a Raw, a HexaString, or any object that inherits from AbstractType.

>>> from netzob.all import *
>>> samples = ["aaaaff000000ff10", "bbff110010ff00000011", "ccccccccfffe1f000000ff12"]
>>> messages = [RawMessage(data=sample) for sample in samples]
>>> symbol = Symbol(messages=messages[:3])
>>> Format.splitDelimiter(symbol, ASCII("ff"))
>>> print symbol
Field-0    | Field-sep-6666 | Field-2      | Field-sep-6666 | Field-4   
---------- | -------------- | ------------ | -------------- | ----------
'aaaa'     | 'ff'           | '000000'     | 'ff'           | '10'      
'bb'       | 'ff'           | '110010'     | 'ff'           | '00000011'
'cccccccc' | 'ff'           | 'fe1f000000' | 'ff'           | '12'      
---------- | -------------- | ------------ | -------------- | ----------
>>> samples = ["434d446964656e74696679230400000066726564", "5245536964656e74696679230000000000000000", "434d44696e666f2300000000", "524553696e666f230000000004000000696e666f","434d4473746174732300000000","52455373746174732300000000050000007374617473","434d4461757468656e7469667923090000006d7950617373776421","52455361757468656e74696679230000000000000000","434d44656e6372797074230a00000031323334353674657374","524553656e637279707423000000000a00000073707176777436273136","434d4464656372797074230a00000073707176777436273136","5245536465637279707423000000000a00000031323334353674657374","434d446279652300000000","524553627965230000000000000000","434d446964656e746966792307000000526f626572746f","5245536964656e74696679230000000000000000","434d44696e666f2300000000","524553696e666f230000000004000000696e666f","434d4473746174732300000000","52455373746174732300000000050000007374617473","434d4461757468656e74696679230a000000615374726f6e67507764","52455361757468656e74696679230000000000000000","434d44656e63727970742306000000616263646566","524553656e6372797074230000000006000000232021262724","434d44646563727970742306000000232021262724","52455364656372797074230000000006000000616263646566","434d446279652300000000","524553627965230000000000000000"]
>>> messages = [RawMessage(data=TypeConverter.convert(sample, HexaString, Raw)) for sample in samples]
>>> symbol = Symbol(messages=messages)
>>> symbol.encodingFunctions.add(TypeEncodingFunction(ASCII))  # Change visualization to ASCII
>>> Format.splitDelimiter(symbol, ASCII("#"))
>>> print symbol
Field-0         | Field-sep-23 | Field-2              | Field-sep-23 | Field-4
--------------- | ------------ | -------------------- | ------------ | -------
'CMDidentify'   | '#'          | '....fred'           | ''           | ''     
'RESidentify'   | '#'          | '........'           | ''           | ''     
'CMDinfo'       | '#'          | '....'               | ''           | ''     
'RESinfo'       | '#'          | '........info'       | ''           | ''     
'CMDstats'      | '#'          | '....'               | ''           | ''     
'RESstats'      | '#'          | '........stats'      | ''           | ''     
'CMDauthentify' | '#'          | '....myPasswd!'      | ''           | ''     
'RESauthentify' | '#'          | '........'           | ''           | ''     
'CMDencrypt'    | '#'          | '....123456test'     | ''           | ''     
'RESencrypt'    | '#'          | "........spqvwt6'16" | ''           | ''     
'CMDdecrypt'    | '#'          | "....spqvwt6'16"     | ''           | ''     
'RESdecrypt'    | '#'          | '........123456test' | ''           | ''     
'CMDbye'        | '#'          | '....'               | ''           | ''     
'RESbye'        | '#'          | '........'           | ''           | ''     
'CMDidentify'   | '#'          | '....Roberto'        | ''           | ''     
'RESidentify'   | '#'          | '........'           | ''           | ''     
'CMDinfo'       | '#'          | '....'               | ''           | ''     
'RESinfo'       | '#'          | '........info'       | ''           | ''     
'CMDstats'      | '#'          | '....'               | ''           | ''     
'RESstats'      | '#'          | '........stats'      | ''           | ''     
'CMDauthentify' | '#'          | '....aStrongPwd'     | ''           | ''     
'RESauthentify' | '#'          | '........'           | ''           | ''     
'CMDencrypt'    | '#'          | '....abcdef'         | ''           | ''     
'RESencrypt'    | '#'          | '........'           | '#'          | " !&'$"
'CMDdecrypt'    | '#'          | '....'               | '#'          | " !&'$"
'RESdecrypt'    | '#'          | '........abcdef'     | ''           | ''     
'CMDbye'        | '#'          | '....'               | ''           | ''     
'RESbye'        | '#'          | '........'           | ''           | ''     
--------------- | ------------ | -------------------- | ------------ | -------
>>> print symbol.fields[0]._str_debug()
Field-0
|--   Alt
      |--   Data (Raw='CMDidentify' ((0, 88)))
      |--   Data (Raw='RESidentify' ((0, 88)))
      |--   Data (Raw='CMDinfo' ((0, 56)))
      |--   Data (Raw='RESinfo' ((0, 56)))
      |--   Data (Raw='CMDstats' ((0, 64)))
      |--   Data (Raw='RESstats' ((0, 64)))
      |--   Data (Raw='CMDauthentify' ((0, 104)))
      |--   Data (Raw='RESauthentify' ((0, 104)))
      |--   Data (Raw='CMDencrypt' ((0, 80)))
      |--   Data (Raw='RESencrypt' ((0, 80)))
      |--   Data (Raw='CMDdecrypt' ((0, 80)))
      |--   Data (Raw='RESdecrypt' ((0, 80)))
      |--   Data (Raw='CMDbye' ((0, 48)))
      |--   Data (Raw='RESbye' ((0, 48)))

Parameters:
field (netzob.Common.Models.Vocabulary.AbstractField.AbstractField) – the field to consider when splitting
delimiter (netzob.Common.Models.Types.AbstractType.AbstractType) – the delimiter used to split the messages of the field
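The table layout produced by a delimiter split (data cells alternating with separator cells, shorter rows padded with empty cells) can be approximated on plain strings. The helper split_on_delimiter is an assumption made for this sketch and does not reflect how Netzob builds its fields internally.

```python
def split_on_delimiter(messages, delimiter):
    """Split each message around `delimiter`, keeping the delimiter cells.

    Rows are padded with empty strings so every row has the same width.
    """
    rows = []
    for message in messages:
        parts = message.split(delimiter)
        row = [parts[0]]
        for part in parts[1:]:
            row.extend([delimiter, part])
        rows.append(row)
    width = max(len(row) for row in rows) if rows else 0
    return [row + [""] * (width - len(row)) for row in rows]


samples = ["aaaaff000000ff10", "bbff110010ff00000011", "ccccccccfffe1f000000ff12"]
rows = split_on_delimiter(samples, "ff")

# A delimiter that also occurs inside the payload yields extra cells,
# so the other rows get padded.
padded = split_on_delimiter(["CMDinfo#....", "RESencrypt#....# !&'$"], "#")
```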

netzob.Inference.Vocabulary.FormatOperations.FindKeyFields module

class FindKeyFields

Bases: object

This class provides methods to identify potential key fields in symbols/fields.

execute(*args, **kwargs)

Try to identify potential key fields in a symbol/field.

>>> import binascii
>>> from netzob.all import *
>>> samples = ["00ff2f000011", "000010000000", "00fe1f000000", "000020000000", "00ff1f000000", "00ff1f000000", "00ff2f000000", "00fe1f000000"]
>>> messages = [RawMessage(data=binascii.unhexlify(sample)) for sample in samples]
>>> symbol = Symbol(messages=messages)
>>> Format.splitStatic(symbol)
>>> symbol.addEncodingFunction(TypeEncodingFunction(HexaString))
>>> print symbol
Field-0 | Field-1 | Field-2 | Field-3
------- | ------- | ------- | -------
'00'    | 'ff2f'  | '0000'  | '11'   
'00'    | '0010'  | '0000'  | '00'   
'00'    | 'fe1f'  | '0000'  | '00'   
'00'    | '0020'  | '0000'  | '00'   
'00'    | 'ff1f'  | '0000'  | '00'   
'00'    | 'ff1f'  | '0000'  | '00'   
'00'    | 'ff2f'  | '0000'  | '00'   
'00'    | 'fe1f'  | '0000'  | '00'   
------- | ------- | ------- | -------
>>> finder = FindKeyFields()
>>> results = finder.execute(symbol)
>>> for result in results:
...     print "Field name: " + result["keyField"].name + ", number of clusters: " + str(result["nbClusters"]) + ", distribution: " + str(result["distribution"])
Field name: Field-1, number of clusters: 5, distribution: [2, 1, 2, 2, 1]
Field name: Field-3, number of clusters: 2, distribution: [1, 7]
Parameters:field (netzob.Common.Models.Vocabulary.AbstractField.AbstractField) – the field in which we want to identify key fields.

:raise Exception if something bad happens
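One plausible way to rank candidate key fields is to count, for each column, how many distinct values it takes and how the messages are distributed among them. The sketch below does this on a plain table of cells. The selection criterion (more than one distinct value, but fewer than the number of rows) and the helper name find_key_columns are assumptions made for this example; the actual heuristics used by FindKeyFields may differ.

```python
from collections import Counter


def find_key_columns(rows):
    """Report, per column, how many clusters its values would produce.

    Assumed criterion: keep columns with more than one distinct value
    but fewer distinct values than there are rows.
    """
    results = []
    for index in range(len(rows[0])):
        counts = Counter(row[index] for row in rows)
        if 1 < len(counts) < len(rows):
            results.append({"column": index,
                            "nbClusters": len(counts),
                            "distribution": sorted(counts.values())})
    return results


rows = [["00", "ff2f", "0000", "11"],
        ["00", "0010", "0000", "00"],
        ["00", "fe1f", "0000", "00"],
        ["00", "0020", "0000", "00"],
        ["00", "ff1f", "0000", "00"],
        ["00", "ff1f", "0000", "00"],
        ["00", "ff2f", "0000", "00"],
        ["00", "fe1f", "0000", "00"]]
results = find_key_columns(rows)
```

On this table the static columns (indexes 0 and 2) are skipped, and the sketch reports columns 1 and 3 with 5 and 2 clusters respectively. The distributions are sorted here, so their order is not expected to match the library's output.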

Module contents