Portfolio, Parsing Logs To Find Troublesome Files

Standard

Portfolio Admin

As mentioned before, we use Portfolio to archive old projects.

However since moving to Portfolio v1.x (from v11), we’ve had numerous issues. The main one has been getting the 40TB of data we host re-cataloged into Portfolio.

I’ve recently deep dived into Portfolio, & found some files that cause issues & these can be found via the logs. Parsing them allows me to “fix” the troublesome files or delete any corrupt ones.

Below is the how I’m parsing the logs.

Log Files

On OS X , Portfolio stores it’s logs in /Applications/Extensis/Portfolio Server/logs/, however this location can be set within Portfolio Administration under “Global Settings” > “Logging Configuration”.

The full list of logs can be found below:

admin_server_access.log
admin_server.log
boot.log
elasticsearch_index_indexing_slowlog.log
elasticsearch_index_indexing_slowlog.log
elasticsearch.log
extensis.admin.log
jboss.out
media-delegate_access.log
media-delegate.log
native-agent.log
native-processor.log
server.log
usage.log

The two main logs files that hold the information  we need are:

media-delegate.log
server.log

media-delegate.log

Within media-delegate.log you’ll see entries like the below for any files that Portfolio has had an issue processing:

2016-01-11 04:43:28,896 [] ERROR [extensis.portfolio.mediaprocessor.BaseClassServlet] invalid json
 org.json.JSONException: A JSONArray text must start with '[' at character 1 of Error: File format error - /Volumes/Archive2/autumn research/new research/b110.gif
 [{
 "SourceFile": "/Volumes/Archive2/autumn research/new research/b110.gif"
 }]
at org.json.JSONTokener.syntaxError(JSONTokener.java:448)
 at org.json.JSONArray.<init>(JSONArray.java:107)
 at org.json.JSONArray.<init>(JSONArray.java:146)
 at extensis.portfolio.mediaprocessor.ExifToolFacility.process(ExifToolFacility.java:175)
 at extensis.portfolio.mediaprocessor.DefaultMediaHandler.handleMedia(MediaHandler.java:248)
 at extensis.portfolio.mediaprocessor.MediaProcessorServlet.handleProcessFile(MediaProcessorServlet.java:543)
 at extensis.portfolio.mediaprocessor.MediaProcessorServlet.doPost(MediaProcessorServlet.java:451)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:644)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
 at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
 at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
 at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:655)
 at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:222)
 at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
 at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1523)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
 at java.lang.Thread.run(Thread.java:745)

You’ll see multiple entries with similar for the same file whilst Portfolio tries & tries again to generate a preview for the file.

The bit we need is:

"SourceFile": "/Volumes/Archive2/autumn research/new research/b110.gif"

And from that, just the path to the file itself.

Some other entries might look like:

2016-01-11 07:31:37,617 [] ERROR [extensis.portfolio.mediaprocessor.BaseClassServlet] No workfile produced from /Volumes/Archive2/A8 Q3 CL 000 HOVER.eps. Ghostscript output: Error: /configurationerror in --setpagedevice--
Additional information: [/PageSize [0 0]]
Operand stack:
 --nostringval-- (%%BoundingBox: 0 0 0 0) --dict:1/2(ro)(G)-- --nostringval-- --nostringval-- (%%BoundingBox: 0 0 0 0) 1 0 0 --dict:1/1(L)--
Execution stack:
 %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1914 1 3 %oparray_pop 1914 1 3 %oparray_pop 1913 1 3 %oparray_pop --nostringval-- 1897 1 3 %oparray_pop 1787 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- %ztokenexec_continue (scanner state) --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- 1882 10 3 %oparray_pop --nostringval-- --nostringval-- --dict:1/10(L)-- --nostringval-- 9 %dict_continue
Dictionary stack:
 --dict:1174/1684(ro)(G)-- --dict:0/20(G)-- --dict:81/200(L)--
Current allocation mode is local
GPL Ghostscript 9.06: Unrecoverable error, exit code 1

Again from that all we want is the path from the below:

2016-01-11 07:31:37,617 [] ERROR [extensis.portfolio.mediaprocessor.BaseClassServlet] No workfile produced from /Volumes/Archive2/A8 Q3 CL 000 HOVER.eps. Ghostscript output: Error: /configurationerror in --setpagedevice--

This log seems to pick up files that might be corrupted or damaged as well as some that are in a format that Portfolio cannot generate a preview from.

Some example errors received from opening some files found via the above are shown below:
Screen Shot 2016-01-11 at 3.09.36 PM Screen Shot 2016-01-11 at 3.10.57 PM
Screen Shot 2016-01-12 at 8.47.43 AM Screen Shot 2016-01-10 at 11.36.03 AM

server.log

Within server.log, we find entries for troublesome files like the below. The files seem to be in a “bad format” that causes Portfolio to struggle generating previews.

2016-01-11 23:59:53,126 ERROR [extensis.portfolio] (Asset Processor 7) error processing file ::SERVER:Archive2:clan crest.eps: java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost:40372/portfolio-media-processor/MediaProcessor/
 at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626) [rt.jar:1.7.0_71]
 at extensis.portfolio.server.mediaprocessing.MediaProcessor$FileProcessor.handleInput(MediaProcessor.java:603) [portfolio-classes.jar:]
 at extensis.portfolio.util.HttpURLConnectionWrapper.sendHttpRequest(HttpURLConnectionWrapper.java:24) [portfolio-classes.jar:]
 at extensis.portfolio.server.mediaprocessing.MediaProcessor.processFile(MediaProcessor.java:813) [portfolio-classes.jar:]
 at extensis.portfolio.server.mediaprocessing.MediaProcessor.sendRequestInner(MediaProcessor.java:54) [portfolio-classes.jar:]
 at extensis.portfolio.server.mediaprocessing.MediaProcessor.sendRequest(MediaProcessor.java:43) [portfolio-classes.jar:]
 at extensis.portfolio.server.service.Cataloger.processFile(Cataloger.java:990) [portfolio-classes.jar:]
 at extensis.portfolio.server.service.Cataloger.updateAsset(Cataloger.java:843) [portfolio-classes.jar:]
 at extensis.portfolio.server.service.Cataloger.updateAssetById(Cataloger.java:444) [portfolio-classes.jar:]
 at extensis.portfolio.server.async.QueueItems$UpdateItem.executeInternal(QueueItems.java:165) [portfolio-classes.jar:]
 at extensis.portfolio.server.async.QueueItems$RetryItem.executeInternal(QueueItems.java:247) [portfolio-classes.jar:]
 at extensis.portfolio.server.async.QueueItems$MediaEngineItem.executeWithMediaEngine(QueueItems.java:107) [portfolio-classes.jar:]
 at extensis.portfolio.server.async.QueueItems$UpdateItem.execute(QueueItems.java:156) [portfolio-classes.jar:]
 at extensis.portfolio.server.async.Runner.run(AssetProcessingEngine.java:502) [portfolio-classes.jar:]
 at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_71]

Where the bit we’re interesting in is:

error processing file ::SERVER:Archive2:clan crest.eps:

I’ve seen this will files in legacy formats (such as .art files) as well as EPS files that are not PDF compatible (that option is shown below).

Screen Shot 2016-01-12 at 11.30.50 AM

As mentioned, EPS files seem to be a big issue. On opening some of these i’ve had prompts like the below. Now note, this is 2016 & i’m opening files in CS6 that are considered “legacy”. So some of these files & formats go way back:

Screen Shot 2016-01-10 at 10.22.00 AM Screen Shot 2016-01-10 at 11.41.07 AM Screen Shot 2016-01-11 at 2.00.55 PM

More parsing

More parsing can be done as Portfolio leverages several open source technologies under such as:

Amongst others.

Each outputs a differing error to the logs, & so you might want to change what is parsed depending on your needs.

The Script(s)

I have 2 versions of the Python scripts I’m using to parse the logs.

One script is for running frequently to check todays logs, the other is to run to check yesterdays.

The idea is to run the today script whilst cataloging a lot of items, & then only run the yesterday script going forward.

However, Portfolio does not always rotate it’s logs. So the yesterday script will try & find the log files that are appended with todays date. If that fails it will use the current logs.

If neither script can find todays logs, it will error & send an email.

If any entries based on the above criteria are found in either script, they are then put into a list & the duplicates are removed, entries ordered & then the script writes the files to the scripts logs file which it overwrites with every run & will email you then contents of if any files are found.

You’ll need to set the values of the variables found in the “# Variables to set manually” section yourself.

Output

The scripts will write to their log file with every run, with each run overwriting it’s contents.

If any files found or the log files are missing, they will then email. The email will have a subject like the 1st line & content like the following.

Today Bad Files.py: 6 Bad files & 5 Corrupt files found
 
----- 6 Bad Format Files ------
SERVER:Archive2:artwork:chav bodies.art:
SERVER:Archive2:artwork:lines.art:
SERVER:Archive2:artwork:moon surface.art:
SERVER:Archive2:artwork:mother trucker.art:
SERVER:Archive2:artwork:pink drawing.art:
SERVER:Archive2:scan in girls:illustration1.art:
 
----- 5 Corrupt Files -----
"/Volumes/Archive1/Portfolio/image ref/3752674533_3063245233_b-1.jpg"
"/Volumes/Archive1/Portfolio/image ref/3752674533_3063245233_b.jpg"
"/Volumes/Archive1/Portfolio/image ref/3752676055_6491eed6c9_b.jpg"
"/Volumes/Archive1/Portfolio/image ref/3752677315_64bdf53fd7_b.jpg"
"/Volumes/Archive1/Portfolio/image ref/3768850119_33bfb67d2b_b.jpg"

Today

This script can be found here or below:

Yesterday

This script can be found here or below:

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.