Skip to main content

Consuming Socratex information.

In the previous blog items about Socratex we talked about how to get an XML representation of the Abstract syntax trees into the BaseX XML database, and we have seen that we can use the XQuery language to express queries over those ASTs. In the previous posts we showed how the Socratex Explorer can be used to author and run such queries, and how to consume the results – The source code is shown in the Socratex explorer (SE) if the result elements have a few special attributes. This is all good, and it will satisfy a lot of requirements, but I do want to show some more advanced uses that you may want to use for special purposes. Some of these purposes will become clearer as we continue our Socratex learning path.

Getting data from the database

In this section we will talk about how you can extract data from the database. There are several ways, each one applicable for different tasks.

Using the BaseX client application

The BaseX client is a simple command line tool that connects to the BaseX server using the credentials provided, sets the context to the database you provide with the -i flag, and runs the query provided in a file that you provide as an argument to the tool. Here is a sample invocation:

$ basexclient.bat -Uadmin -Padmin -iApplicationFoundation Desktop\cnt.xq

You will need to provide the proper installation path for the tool if you do not have it in your path. If the tooling is installed using the default options it will live in the C:\Program Files (x86)\BaseX\bin  directory. 

Using the Bulk Extractor

The basex client tool mentioned above works well for simple, one-off situations. However, if you need to execute many queries it becomes tedious to log in, set the context (opening the database) and log out for every query. If you need to do multiple extractions, there is better option in the Socratex github repo, namely the BulkQuery tool. This tool is given a pattern that matches the files that you want to use to query. There is only one login, and one opening of the database. The tool is multithreaded, and can evaluate many queries at the same time, so it is far more effective than running a series of basex client invocations. We will see why this is important in subsequent blogs. Here is the information you will get if you use the -? switch:

Usage
    BulkQuery [options] [<args>...]

Arguments:
    <args> The files containing the queries to execute. A list of query files can be provided, including wildcards.

Options:
    --server <server> The name of the BaseX server.
    --password <password> The password used to log into the BaseX server.
    --database  <database> The name of the database to be queries on the BaseX server.
    --username <username> The user name. The default is "admin".
   --output-directory <output-directory> The target directory.
   --extension <extension> The file extension of the extracted files.
   --port <port> The port number where the BaseX server listens for connections.
   --verbose Print extra information to the console.
   --version Show version information
    -?, -h, --help Show help and usage information

In this example:

$ BulkQuery --server localhost --password secret --database mycodebase *.xq

the tool will pick up app the .xq files in the current directory and execute those, leaving the results in the current directory (with the same names, with the original extensions replaced by .xml). You can provide a directory where the results are stored with the –output-directory flag.

The Socratex Explorer

This is the tool that we have been advocating for usage in the previous blogs. It is convenient because it allows you to immediately see the code that is matched if you provide certain attributes (namely the Artifact, StartLine, EndLine, StartCol and EndCol attributes) on your results. We will not say more about this here.

Executing queries from code.

This is useful mainly if you want to integrate results into your own custom applications. The BaseX repo contains drivers for most programming languages. I will present a simple case using Python below:

from BaseXClient import BaseXClient
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET

# create session
session = BaseXClient.Session('localhost', 1984, 'admin', 'admin')

try:
    # create query instance
    input = '''let $results :=
<Results>
{
    for $c in /Class | /Table

    let $allMethods := count($c/Method)
    let $privateMethods := count($c/Method[@IsPrivate = 'true'])
    let $protectedMethods := count($c/Method[@IsProtected = 'true']) (: Explicitly marked with protected keyword :)
    let $publicMethods := count($c/Method[@IsPublic = 'true'])
    let $internalMethods := count($c[@IsInternal = 'true'])

    (: Compensate for methods that lack a visibility keyword: They are protected. :)
    let $protectedMethods := $protectedMethods + ($allMethods - $privateMethods - $protectedMethods - $publicMethods - $internalMethods)

    return <Result Class='{$c/@Name}' PrivateMethodCount='{$privateMethods}'ProtectedMethodCount='{$protectedMethods}'
                                     PublicMethodCount='{$publicMethods}' InternalMethodCount='{$internalMethods}'/>

}
</Results>
return <Totals
    PrivateMethodCount='{sum($results/Result/@PrivateMethodCount)}'
    ProtectedMethodCount='{sum($results/Result/@ProtectedMethodCount)}'
    PublicMethodCount='{sum($results/Result/@PublicMethodCount)}'
    InternalMethodCount='{sum($results/Result/@InternalMethodCount)}' />'''

    model = "ApplicationPlatform"
    session.execute("open " + model)
    query = session.query(input)

    # There is only one result, an XML tag called Totals with the
    # attributes provided.
    res = query.execute()

    # Do something with the result.

    tree = ET.fromstring(res)
    privateCount = tree.get("PrivateMethodCount")
    protectedCount = tree.get("ProtectedMethodCount")
    publicCount = tree.get("PublicMethodCount")
    internalCount = tree.get("InternalMethodCount")

    fig = plt.figure(figsize=(5,5)) # In inches(!)
    piechart = fig.add_subplot(111)

    piechart.pie([privateCount, protectedCount, internalCount, publicCount],
                 labels=("Private", "Protected", "Internal", "Public"),
                 shadow=True, startangle=90, explode=(0.1, 0,0,0), autopct='%1.1f%%')
    piechart.set_title ("Distribution of method visiblity\nModel: " + model, fontsize=20)
    piechart.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.


    plt.tight_layout()
    plt.show()

    # close query object
    query.close()

finally:
    # close session
    if session:
        session.close()

When run (make sure you install the BaseXClient and matplotlib libraries using the pip tool) you will get this bitmap:

Bitmap generated with Python

Using REST queries

There is one more way to get data from the XML database: Using REST queries. To use these you need to start the BaseX HTTP server. If you are using large databases, you may need to use the -Xmx java switch to allow the server to use more memory. You can find the details about what is possible on this page: REST - BaseX Documentation. Let us start by using the Advanced Rest Client app (the open source Postman eqiuvalent) to experiment with this feature. Set up a POST query on the endpoint for the BaseX rest server, appended with the name of the database we want to use (ApplicationFoundation in this case). Set up the Authorization to be Basic (i.e. name and password) and provide that information. In the body of the request, up in the query that you want to execute, as shown in the example below.

Advanced REST Client doing POST against BaseX

You can easily write the code to send a REST request in any language, on any platform. I am not going to show that here, since it is common knowledge.

Data Visualization

In the python example above, I went to extreme measures (at least for me) to get a bitmap representation of my results. It is safe to argue that you should not have to write all the plumbing code for the graphics yourself, of course. Instead include the information in Excel or Power BI and have them create the artwork for you.

Excel

Sometimes you may need to deliver results in other formats than XML. In such cases you will be happy to know that this is easy to provide results in other formats than XML in the BaseX system. Let us say that you wanted to get the code metrics in a comma separated file instead of the normal XML format (and, as we will see later on our journey there are good reasons why you would need this): All you need to do is to use the csv:serialize(doc, options) from your query. You can even save the information in JSON if you need to, by using the json serializer. This is useful mostly when using the REST protocol, since many services work well with JSON.

Let’s run through a little example where we run some analysis of the classes. We want to show the complexity and other interesting metrics of each of the classes:

(: Provide interesting metrics for classes. The metrics provided follow the naming conventions used in the literature:
NOAM: The number of abstract methods.
LOC: Lines of Code.
NOS: Number of statements.
NOM: Number of methods.
NOPM: The number of private methods.
NOA (Number of Attributes): The number of fields.
NOPA: Number of public fields.
WMC: Weighted Method Complexity, the sum of the method complexities across all methods. :)

declare function local:MethodComplexity($m as element(Method)) as xs:integer
{
  1 + count($m//IfStatement) + count($m//IfThenElseStatement)
    + count($m//WhileStatement) + count($m//DoWhileStatement)
    + count($m//ForStatement) + count($m/SearchStatement)
    + count($m//CaseValues/*) + count($m//ConditionalExpression)
};

let $options := map { 'lax': false(), 'header': true(), 'format': 'attributes' }


let $r := <ClassMetrics>
{
    for $a in / (Class | Table)
    let $weightedMethodComplexity := sum(for $m in $a/Method return local:MethodComplexity($m))
    order by $weightedMethodComplexity descending
    return <Record>
        <Artifact name="Artifact">{ lower-case("/" || $a/@Package || "/classes/" || $a/@Name)}</Artifact>
        <Name name="Name">{lower-case($a/@Name)}</Name>
        <NOAM name='NOAM'>{count($a/Method[@IsAbstract="true"])}</NOAM>
        <LOC name='LOC'>{$a/@EndLine - $a/@StartLine + 1}</LOC>
        <NOM name='NOM'>{count($a/Method)}</NOM>
        <NOA name='NOA'>{count($a/FieldDeclaration)}</NOA>
        <WMC name='WMC'>{$weightedMethodComplexity}</WMC>
        <NOPM name='NOPM'>{count($a/Method[@IsPublic="true"])}</NOPM>
        <NOPA name='NOPA'>{count($a/FieldDeclaration[@IsPublic="true"])}</NOPA>
        <NOS name='NOS'>{count(for $stmt in $a/Method//* where ends-with(name($stmt), "Statement") return $stmt)}</NOS>
     </Record>
}
</ClassMetrics>

return csv:serialize($r, $options)

The query may seem a little complex because of the logic to calculate the complexity metrics. Ignore that for now, that is not the point here. Run the query generating a csv file, say classes.csv.

$ basexclient.bat -Uadmin -Padmin -iApplicationFoundation classMetrics.xq >classMetrics.csv

Once you have the csv file, open it in excel you can slice and dice the data and present it with some interesting graphics. There is one thing you need to be aware of: When Excel opens the csv file it does not try to guess what the type of the column is. This means that the integer fields are not marked as such, which causes trouble when trying to build graphs etc. Always change the column types to the appropriate type. Once this is done it is easy to use Excel to sort the data and create any graphic that  you want.

Power BI

What we have mentioned above is useful for one-off analysis, where a file is created and analyzed by some external tool (either Excel or custom written tools). However, there is an interesting scenario where you want to build dashboards from the information in the database; one interesting case is where the important code quality metrics are shown on a web page. The tool commonly used to build such dashboards is Power BI.
Power BI is great at getting data from many different places, transforming it and rendering it in interesting ways. By now it should be no surprise that we will be getting the data from a REST POST call to the server. Unfortunately, this is not as easy as I had expected – I had expected to be able to connect to a web API with a great user experience, in pretty much the same way as Postman does it, but instead you have to do it in code. Below you will find the power query in M that I ended up with. Note that you must include the authorization in the header in the Web.Contents options; do this by using the user name and password separated by colon, and base64 encode it. You must select Anonymous as the authentication on the connection, otherwise you will get an error at runtime (i.e. when the data is fetched). The power query below shows the fetching of the query, and the selection of the data we need, as well as  typing the relevant columns as integers, not strings. It looks a little ominous, but almost all of the code was generated by Power BI as the result of manually selecting and transforming the data. The interesting parts are in bold below: The POST query is performed with the Web.Content call. Everything else is just a repetition of the query we have already seen, and the code that Power BI inserted to extract the information from the XML document and making it the correct type.

let
  url = "http://localhost:8984/rest/ApplicationFoundation",
  query = "
<query>
  <text><![CDATA[declare function local:MethodComplexity($m as element(Method)) as xs:integer
{
  1 + count($m//IfStatement) + count($m//IfThenElseStatement)
  + count($m//WhileStatement) + count($m//DoWhileStatement)
  + count($m//ForStatement) + count($m/SearchStatement)
  + count($m//CaseValues/*) + count($m//ConditionalExpression)
};

let $options := map { 'lax': false(), 'header': true(), 'format': 'attributes' }

let $r := <ClassMetrics>
{
  for $a in /(Class | Table)
  let $weightedMethodComplexity := sum(for $m in $a/Method return local:MethodComplexity($m))
  order by $weightedMethodComplexity descending

  return <Record>
    <Artifact name='Artifact'>{ lower-case('/' || $a/@Package || '/classes/' || $a/@Name)}</Artifact>
    <Name name='Name'>{lower-case($a/@Name)}</Name>
    <NOAM name='NOAM'>{count($a/Method[@IsAbstract='true'])}</NOAM>
    <LOC name='LOC'>{$a/@EndLine - $a/@StartLine + 1}</LOC>
    <NOM name='NOM'>{count($a/Method)}</NOM>
    <NOA name='NOA'>{count($a/FieldDeclaration)}</NOA>
    <WMC name='WMC'>{$weightedMethodComplexity}</WMC>
    <NOPM name='NOPM'>{count($a/Method[@IsPublic='true'])}</NOPM>
    <NOPA name='NOPA'>{count($a/FieldDeclaration[@IsPublic='true'])}</NOPA>
    <NOS name='NOS'>{count(for $stmt in $a/Method//* where ends-with(name($stmt), 'Statement') return $stmt)}</NOS>
  </Record>
}
</ClassMetrics>

return $r]]>
  </text>
</query>",
  options = [Content=Text.ToBinary(query), Headers =[#"Content-type"="application/xml",#"authorization"="Basic YWRtaW46YWRtaW4="]],
  result = Web.Contents(url, options),
  #"Imported XML" = Xml.Tables(result,null,65001),
  #"Expanded Table" = Table.ExpandTableColumn(#"Imported XML", "Table", {"Artifact", "Name", "NOAM", "LOC", "NOM", "NOA", "WMC", "NOPM", "NOPA", "NOS"}, {"Table.Artifact", "Table.Name", "Table.NOAM", "Table.LOC", "Table.NOM", "Table.NOA", "Table.WMC", "Table.NOPM", "Table.NOPA", "Table.NOS"}),
  #"Expanded Table.Artifact" = Table.ExpandTableColumn(#"Expanded Table", "Table.Artifact", {"Element:Text"}, {"Table.Artifact.Element:Text"}),
  #"Expanded Table.Name" = Table.ExpandTableColumn(#"Expanded Table.Artifact", "Table.Name", {"Element:Text"}, {"Table.Name.Element:Text"}),
  #"Expanded Table.NOAM" = Table.ExpandTableColumn(#"Expanded Table.Name", "Table.NOAM", {"Element:Text"}, {"Table.NOAM.Element:Text"}),
  #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table.NOAM",{{"Table.NOAM.Element:Text", Int64.Type}}),
  #"Expanded Table.LOC" = Table.ExpandTableColumn(#"Changed Type", "Table.LOC", {"Element:Text"}, {"Table.LOC.Element:Text"}),
  #"Changed Type1" = Table.TransformColumnTypes(#"Expanded Table.LOC",{{"Table.LOC.Element:Text", Int64.Type}}),
  #"Expanded Table.NOM" = Table.ExpandTableColumn(#"Changed Type1", "Table.NOM", {"Element:Text"}, {"Table.NOM.Element:Text"}),
  #"Changed Type2" = Table.TransformColumnTypes(#"Expanded Table.NOM",{{"Table.NOM.Element:Text", Int64.Type}}),
  #"Expanded Table.NOA" = Table.ExpandTableColumn(#"Changed Type2", "Table.NOA", {"Element:Text"}, {"Table.NOA.Element:Text"}),
  #"Changed Type3" = Table.TransformColumnTypes(#"Expanded Table.NOA",{{"Table.NOA.Element:Text", Int64.Type}}),
  #"Expanded Table.WMC" = Table.ExpandTableColumn(#"Changed Type3", "Table.WMC", {"Element:Text"}, {"Table.WMC.Element:Text"}),
  #"Changed Type4" = Table.TransformColumnTypes(#"Expanded Table.WMC",{{"Table.WMC.Element:Text", Int64.Type}}),
  #"Expanded Table.NOPM" = Table.ExpandTableColumn(#"Changed Type4", "Table.NOPM", {"Element:Text"}, {"Table.NOPM.Element:Text"}),
  #"Changed Type5" = Table.TransformColumnTypes(#"Expanded Table.NOPM",{{"Table.NOPM.Element:Text", Int64.Type}}),
  #"Expanded Table.NOPA" = Table.ExpandTableColumn(#"Changed Type5", "Table.NOPA", {"Element:Text"}, {"Table.NOPA.Element:Text"}),
  #"Changed Type6" = Table.TransformColumnTypes(#"Expanded Table.NOPA",{{"Table.NOPA.Element:Text", Int64.Type}}),
  #"Expanded Table.NOS" = Table.ExpandTableColumn(#"Changed Type6", "Table.NOS", {"Element:Text"}, {"Table.NOS.Element:Text"}),
  #"Changed Type7" = Table.TransformColumnTypes(#"Expanded Table.NOS",{{"Table.NOS.Element:Text", Int64.Type}}),
  #"Removed Columns" = Table.RemoveColumns(#"Changed Type7",{"Name"})
in
  #"Removed Columns"

Now you can do something with the data. Let's create a bar graph that shows the artifacts that have a WMC over 250. I also added a word cloud for good measure:

Power BI showing top WMC

Now you can publish the information so it can be consumed by your audience and start enforcing processes that will bring better quality to your code.

We will talk about code metrics and how they impact your code in another blog item. I hope you have enjoyed this one. Thanks for now.

Comments

*This post is locked for comments

  • _Frank Profile Picture _Frank 15
    Posted at
    @Denis: ˋWMC: Weighted Method Complexity, the sum of the method complexities across all methodsˋ. Regarding the calculation see code above [emoticon:4191f5ee34e248a29fa0dbe8d975f74a] And I guess SysQueryForm is the most „complex“ class in the analyzed model ˋApplicationFoundationˋ
  • Community Member Profile Picture Community Member Microsoft Employee
    Posted at
    What is "WMC over 250"? And why SysQueryForm is the biggest one?